This map shows each county’s favorite Major League Baseball (MLB) team. Or, technically, the MLB team that was mentioned the most in each county. The data come from an analysis of 15 million geo-tagged tweets, 45k of which were determined to mention a team.
A matching tweet had to use at least one of the following:
- the official team Twitter handle
- the official team hashtag (if one exists), e.g., #brewers
- the team name with the city
- the team name with “mlb” anywhere
- the team city with “mlb” anywhere, if the city is not ambiguous (e.g., “mlb Milwaukee” counts; “new york mlb” does not)
These requirements virtually eliminate false matches of the kind that matching on team names alone would cause. For example, if any mention of the word “cubs” counted in favor of the Chicago Cubs, tweets such as “The new cubs at the zoo are adorable” would count when they should not.
Unfortunately, these requirements also exclude many tweets that really are about MLB teams but do not meet the criteria. For example, “Cubs won a tight one last night” is about the MLB team, but wouldn’t count using the criteria.
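To make the criteria above concrete, here is a minimal sketch of how such a matcher could look. It is not the code used for the map: the `TEAMS` table, the `city_is_ambiguous` flag, and the function names are all made up for illustration, and “team name with city” is assumed to mean the phrase “city name” (e.g., “milwaukee brewers”).

```python
import re

# Hypothetical per-team matching data. The entries and the ambiguity flag
# are illustrative assumptions, not the lists used to build the map.
TEAMS = {
    "Milwaukee Brewers": {"handle": "@brewers", "hashtag": "#brewers",
                          "city": "milwaukee", "name": "brewers",
                          "city_is_ambiguous": False},
    "Chicago Cubs": {"handle": "@cubs", "hashtag": "#cubs",
                     "city": "chicago", "name": "cubs",
                     "city_is_ambiguous": True},   # two MLB teams in Chicago
    "New York Yankees": {"handle": "@yankees", "hashtag": "#yankees",
                         "city": "new york", "name": "yankees",
                         "city_is_ambiguous": True},  # two MLB teams in New York
}


def tokens(text):
    # Lowercase and split into words, keeping a leading @ or # so that
    # handles and hashtags stay distinct from plain words.
    return re.findall(r"[@#]?\w+", text.lower())


def has_phrase(toks, phrase):
    # True if the words of `phrase` appear as consecutive tokens.
    words = phrase.split()
    n = len(words)
    return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))


def match_teams(text):
    toks = tokens(text)
    # Assumption: "mlb", "#mlb", and "@mlb" all count as a mention of "mlb".
    has_mlb = any(tok.lstrip("@#") == "mlb" for tok in toks)
    matched = []
    for team, t in TEAMS.items():
        if (
            t["handle"] in toks                                   # official handle
            or t["hashtag"] in toks                               # official hashtag
            or has_phrase(toks, t["city"] + " " + t["name"])      # name with city
            or (has_mlb and has_phrase(toks, t["name"]))          # name with "mlb"
            or (has_mlb and not t["city_is_ambiguous"]
                and has_phrase(toks, t["city"]))                  # unambiguous city with "mlb"
        ):
            matched.append(team)
    return matched
```

With this sketch, `match_teams("mlb Milwaukee")` returns the Brewers, while both “The new cubs at the zoo are adorable” and “Cubs won a tight one last night” return no match, which mirrors the trade-off described above.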
Comparing to the Facebook Map
In 2014, a similar map of favorite MLB teams was produced by analyzing likes to teams’ Facebook pages.
The Facebook MLB map produces regions that are much more stable and consistent, which is not surprising given that it had a lot more data to work with. One takeaway is that, with more data, the Twitter map may begin to look more like the Facebook map in terms of stable and consistent boundaries of fandom, while hopefully still providing some new information that the Facebook map does not.
Still, the two maps have similarities. Major regions are replicated, with boundaries in roughly the same places. The pockets of Red Sox fans in the West also show up on both maps.
The coolest thing is that Twitter can be used to track something like preferences, in this case favorite baseball teams. Pockets of (unexpected) fandom are captured nicely and are easy to represent visually. To me, this is a proof of concept for what I’m able to do in terms of Twitter analysis going forward.
Also exciting are the possibilities that open up with more data. I expect these results to become more stable and less messy as data accumulate over time. The strict requirements on what counts limit the data to only 45k tweets, making for a noisy map. More data means a better map in the future.
Reasons for caution
There are also reasons for caution in reading this map. Because the data are sparse (especially in certain regions of the country), in many places just one tweet decides the winning team for a county. This also means the next tweet could flip an entire county. In fact, it could flip a whole region on the map, because the data are spatially smoothed: data are aggregated across neighboring counties to take some of the noise out. In most cases, this reduces errors and removes outliers. However, it also means that counties with no data are heavily influenced by their neighbors on the map. When those neighbors have only a few tweets (or just one), the next tweet may have implications for several counties on the map.
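A toy example shows how a single tweet can change several counties at once. The smoothing scheme here is an assumption chosen for simplicity (each county adds half of each neighbor's raw counts to its own); the map's actual smoothing method may differ, and the county names, adjacency list, and `neighbor_weight` value are all hypothetical.

```python
from collections import Counter


def smooth_counts(county_counts, neighbors, neighbor_weight=0.5):
    # For each county, combine its own team counts with a weighted share
    # of the raw counts from each adjacent county.
    smoothed = {}
    for county, adjacent in neighbors.items():
        total = Counter()
        for team, n in county_counts.get(county, {}).items():
            total[team] += n
        for nb in adjacent:
            for team, n in county_counts.get(nb, {}).items():
                total[team] += neighbor_weight * n
        smoothed[county] = total
    return smoothed


def winners(smoothed):
    # County -> team with the highest smoothed count, or None when neither
    # the county nor its neighbors have any tweets.
    return {c: (max(t, key=t.get) if t else None) for c, t in smoothed.items()}


# Three hypothetical counties in a row: A - B - C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# Before: the only matching tweet in the area is one Cubs tweet in county C.
before = {"C": {"Chicago Cubs": 1}}
winners_before = winners(smooth_counts(before, neighbors))

# After: one Brewers tweet arrives in county B.
after = {"B": {"Milwaukee Brewers": 1}, "C": {"Chicago Cubs": 1}}
winners_after = winners(smooth_counts(after, neighbors))
```

In this toy setup, county B starts out as Cubs territory purely because of its neighbor C, and county A has no winner at all. After the single Brewers tweet in B, both A and B show the Brewers as the favorite, while C stays with the Cubs.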