On the Records: Useful Data Tool Visualizes Tweets

A new data visualization tool, the Tweet Topic Explorer, makes analyzing a user’s Twitter feed quick, painless and beautiful. 

“It’s fairly commonly stated nowadays that there’s lots of information, lots of data out there and a lot of people are overwhelmed by it,” said Jeff Clark, a programmer and data visualization enthusiast who runs the site neoformix.com and created the Tweet Topic Explorer. “I try and use visual ways to summarize that information or illustrate patterns in it that you wouldn’t normally see.”

The Tweet Topic Explorer can be used to figure out at a glance what a user tweets about. Clark said the program is useful for deciding whether to follow a user, because it clearly shows the topics that user tweets about most. Below are screenshots of a few Twitter accounts: the state’s Twitter feed, Gov. Rick Perry’s personal Twitter account and our account at The Texas Tribune.

The application uses Processing.js, a JavaScript port of the Processing visual programming language, to grab the latest set of tweets from a user. It then finds the most used words, ignoring trivial words like prepositions, and groups them using a clustering algorithm that places together words used in the same tweet or near each other. The words are displayed in bubbles whose size represents the frequency of the word and whose color reflects the cluster the algorithm assigned.
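The counting and grouping steps described above can be sketched in a few lines of Python. This is only an illustration, not Clark's code: the stop-word list, the sample tweets and the function names are all made up for the example, and the article does not say which clustering algorithm the tool uses. The sketch stops at counting how often two words share a tweet, which is the kind of input a clustering step could work from.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop-word list; the real tool's list is not described.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "for",
              "and", "is", "at", "with", "about"}

def tokenize(tweet):
    """Lowercase a tweet and return its words, minus trivial ones."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return [w for w in words if w not in STOP_WORDS]

def top_words(tweets, n=5):
    """Return the n most frequent non-trivial words with their counts."""
    counts = Counter(w for t in tweets for w in tokenize(t))
    return counts.most_common(n)

def cooccurrence(tweets):
    """Count how many tweets each pair of words appears in together."""
    pairs = Counter()
    for t in tweets:
        unique = sorted(set(tokenize(t)))
        pairs.update(combinations(unique, 2))
    return pairs

# Invented sample tweets, for illustration only.
tweets = [
    "Water talks continue in the Legislature",
    "Drought drains water supply",
    "Legislature debates water funding",
]

word_counts = top_words(tweets)
pair_counts = cooccurrence(tweets)
```

Here "water" is the most frequent word, and the pair ("legislature", "water") shares two tweets, so a clustering step would likely place those two words in the same group.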

Clicking on a word will bring up all the tweets in which it is used. On the state’s Twitter feed, @texasgov, clicking on the word “water” will bring up all the stories the account has posted relating to the drought, and water talks between different groups of developers. In this way, the application can be used to quickly discern the topics an individual or organization is tweeting about, and find the tweets related to those topics. (Clicking on any of the images below will take you to the interactive visualization.)

On his personal Twitter feed, @governorperry, Gov. Rick Perry tweets most often about “Texas.” But he also tweets frequently about his daily activities. Clustered around the word “run” are the words Perry uses most often to describe his regular exercise routine, including “beautiful,” “dog,” “warrior” and “hill.”

The most used word on the Texas Tribune’s Twitter feed is “txlege,” the hashtag used for events in the Legislature. Because this is a newsroom Twitter feed, it’s not surprising that “says” is the second biggest. We often use Twitter to report quickly what lawmakers are saying about an issue.

The yellow cluster in the top right of the Trib’s explored Twitter feed displays words related to education finance reform, a hot topic in the special session that the Trib has tweeted about frequently. The major players in the House debate, “Eissler,” “Turner” and “Christian,” each have their own bubble.

Other word cloud programs, such as wordle.net, adjust the height or area of a word to show its relative frequency. “In some cases, they don’t do a good job of showing the relative importance of the words,” said Clark. “For example if you have a very short word with only three letters in it that’s used 100 times and you have a very long word with 10 letters that’s used 100 times, how much area should the word take up in the display?” If both words are given the same area, the three-letter word is drawn in much taller type and looks more important, even though the two are used equally often. To get around this issue, Clark used circles whose area is directly proportional to the frequency of the word.

“It’s not a perfect solution either because the words are made as big as you can fit them in the circle,” Clark said.
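The circle sizing described above comes down to one line of arithmetic: if a circle's area is proportional to a word's frequency, its radius grows with the square root of that frequency. The Python sketch below illustrates the idea. The function name and the area_per_use scale factor are assumptions made for the example, not values taken from the Tweet Topic Explorer.

```python
import math

def bubble_radius(frequency, area_per_use=100.0):
    """Radius of a circle whose area equals frequency * area_per_use.

    area_per_use is a hypothetical scale factor (area units per use of
    the word); the real tool's scaling is not described in the article.
    """
    return math.sqrt(frequency * area_per_use / math.pi)

# A three-letter word and a 10-letter word, each used 100 times,
# get circles of exactly the same size.
short_word_radius = bubble_radius(100)
long_word_radius = bubble_radius(100)

# A word used four times as often gets four times the area,
# which means only twice the radius.
common_word_radius = bubble_radius(400)
```

Sizing by area rather than radius matters here: doubling the radius for a word used twice as often would quadruple its circle and overstate its importance.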

 
