Analyzing Twitter sentiment and topics with Python and Streamlit
Data science app built using Python and Streamlit to run a text and sentiment analysis on Twitter data in real-time.
The glorious purpose of this project is to estimate sentiment from NHL fans and insiders on Twitter during the Seattle Kraken’s 2021 NHL Expansion Draft.
On July 21, 2021, the NHL will welcome the Seattle Kraken as the 32nd franchise in the league’s history. The much-anticipated expansion to the league requires the Kraken to select players from incumbent NHL teams around the league in what’s appropriately called an Expansion Draft.
The player selection rules and guidelines are the same as the ones used back in 2017 when the Vegas Golden Knights entered the league. To learn more about the rules, read this short article from the NHL.
This project is an attempt to understand how people around the NHL feel about the Expansion Draft, in real-time. Specifically, treating NHL fans and NHL insiders differently and applying text analysis to both in separate pages in the app.
I’ve iframed the app below. You can check out the live app deployed on Streamlit here.
Data is retrieved from the Twitter API v2.0 using the tweepy Python library and top secret personal Twitter developer credentials. To differentiate between NHL fans and insiders, we chose to leverage two different endpoints: Twitter Search (NHL fans) and Twitter Timeline (NHL insiders). We will go into more detail in the app overview section.
When we refer to text analytics, you can think of it as a set of different processes to create high-quality insight from tweet data.
Below is a list of the different processes that were used:
Topic modeling with Latent Dirichlet Allocation (LDA) to understand the main themes apparent in the cluster of tweets. This was done primarily through the gensim Python library.
Rule-based mixed-membership classification of text to label each of the NHL teams mentioned in a tweet. That is, each tweet can belong to 0, 1, or multiple NHL teams. The rules are imperfect keyword matches (if a keyword exists, label the tweet with the related NHL team). We used a combination of data manipulation and visualization libraries such as pandas, numpy, and altair.
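That rule-based team labeling can be sketched roughly as below. The keyword map is a hypothetical subset, not the app's actual rule set:

```python
import pandas as pd

# Hypothetical subset of the keyword-to-team rules used for labeling.
TEAM_KEYWORDS = {
    "Seattle Kraken": ["kraken", "seakraken"],
    "Vegas Golden Knights": ["golden knights", "vegasborn"],
    "Montreal Canadiens": ["canadiens", "habs"],
}

def tag_teams(text: str) -> list[str]:
    """Return every team whose keywords appear in the tweet (0, 1, or many)."""
    text = text.lower()
    return [team for team, keywords in TEAM_KEYWORDS.items()
            if any(kw in text for kw in keywords)]

tweets = pd.DataFrame({"text": [
    "The Kraken might grab a goalie from the Habs",
    "Expansion draft day is finally here!",
]})
tweets["teams"] = tweets["text"].apply(tag_teams)
print(tweets["teams"].tolist())
# → [['Seattle Kraken', 'Montreal Canadiens'], []]
```

Because a tweet can match several keyword lists at once (or none), the label column holds a list per row rather than a single category, which is the "mixed-membership" part.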
Word clouds and custom top-tweet logic to get a quick overview of the top words seen on Twitter. We used the wordcloud Python library in addition to various data manipulation libraries to accomplish this.
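The top-word counting that feeds the word cloud can be sketched with the standard library alone; a frequency dict like the one below can then be handed to wordcloud's `WordCloud.generate_from_frequencies` for rendering. The stopword list is an illustrative stub:

```python
from collections import Counter

# Illustrative stub; the real app would use a much fuller stopword list.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and"}

def top_words(tweets: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Count non-stopword tokens across all tweets; return the n most common."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tweet.lower().split()
        if word not in STOPWORDS
    )
    return counts.most_common(n)

tweets = ["Kraken take a goalie", "goalie rumours in the expansion draft"]
print(top_words(tweets, n=2))
# → [('goalie', 2), ('kraken', 1)]
```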
We used Streamlit to surface insights from the various analyses.
What is Streamlit?
Streamlit is the fastest way to build and share data apps. It turns Python scripts into shareable web apps.
In the following section, we will go into the nitty-gritty of our application.
The app uses Streamlit v0.84, which at the time of deployment did not natively support multi-page apps. As a result, we turned to the wonderful world of Stack Overflow and Streamlit community threads and found a methodology to treat individual Python scripts as pages in a Streamlit app. For credit and resources, see the resources section.
At a high level, the app structure behaved as described below:
app_multipage.py is set up to enable multiple pages to be run
Twitter developer credentials are stored in a local folder named .streamlit, ignored by our .gitignore file
Required inputs to the app such as nhl_app_accounts.csv, nhl_app_teams.csv, and nhl_app_logo.png are stored in the folder assets
fans.py, insiders.py, and readme.py are created within a folder named pages. Each file is its own Streamlit application
app.py combines user input and the functions from app_multipage.py to display the app page selected by the user (e.g. the default page is readme.py)
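A minimal sketch of the multi-page dispatcher pattern is below. The method names follow common Streamlit community examples and are an assumption, not necessarily the app's exact code:

```python
# app_multipage.py — minimal multi-page dispatcher sketch.
class MultiPage:
    def __init__(self):
        self.pages = []

    def add_page(self, title, func):
        """Register a page: a display title plus the function that renders it."""
        self.pages.append({"title": title, "function": func})

    def run(self):
        # Imported here so the class itself can be used without Streamlit.
        import streamlit as st
        page = st.sidebar.selectbox(
            "Navigation", self.pages, format_func=lambda p: p["title"]
        )
        page["function"]()

# app.py would then wire the pages together, roughly:
# app = MultiPage()
# app.add_page("Readme", readme.app)
# app.add_page("Fan analyzer", fans.app)
# app.add_page("Insider analyzer", insiders.app)
# app.run()
```

Each script in the pages folder only needs to expose a render function; the sidebar selectbox decides which one runs on each rerun.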
As mentioned in the future releases section, we are working on a process diagram to visualize the flow of data used in our app.
The goal of this page is to estimate how NHL fans around the league feel towards the NHL Expansion Draft.
To do this, we used the Twitter Search API endpoint as it allows for flexible query search parameters. Every time this page runs, the twitter_get_nhl function is used to search for the most recent x many tweets containing the keywords expansion draft or expansiondraft. Following this, various transformations are called to deliver text analytics insight and data visualizations.
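The search step can be sketched as below. The function body and parameters around the API call are assumptions about how such a fetch is typically written with tweepy's v2 client; the query string mirrors the keywords above:

```python
def build_query() -> str:
    """Match either keyword; exclude retweets to cut down on duplicates."""
    return '("expansion draft" OR expansiondraft) -is:retweet'

def twitter_get_nhl(bearer_token: str, max_results: int = 100):
    """Hypothetical sketch of the search fetch using tweepy's v2 Client."""
    import tweepy  # imported here so the query builder works without tweepy
    client = tweepy.Client(bearer_token=bearer_token)
    return client.search_recent_tweets(
        query=build_query(), max_results=max_results
    )

print(build_query())
```

The `-is:retweet` operator is part of the v2 search syntax and keeps the sentiment estimate from being dominated by one viral tweet retweeted hundreds of times.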
Similar to the Fan-analyzer page, the goal of this page is to estimate sentiment on Twitter. However, this page is all about analyzing the sentiment of professionals in the NHL, which we refer to as insiders. Really, it is just a custom list of hockey reporters and analytics folks who offer popular takes on Twitter.
This required using the Twitter Timeline API endpoint, which provides the tweets posted by a specified user. Our logic simply loops through the aforementioned list of accounts and returns all of the tweets posted within the last 48 hours. Following this, similar transformations are made to deliver text analytics insight and data visualizations.
To run this app on your own, either fork this repository or git clone it to your local machine.
See the requirements.txt file for the libraries and versions required to run this app.
You will need approval from Twitter to get data from their API. You can apply for a Twitter developer account at this link.
There are various resources to help you with setting up your credentials - we found this article to be a good resource.
Important reminder on protecting your credentials: if you plan on using git for version control, don't forget to add a .gitignore file with the proper configurations to ensure your credentials stay private!
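As a sketch, the relevant entry might look like this (the exact folder layout is an assumption based on the credentials folder described in the app overview):

```
# .gitignore — keep the local credentials folder out of version control
.streamlit/
```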
Coming soon! We are working on putting a visual demo here.