To Impossible and Beyond: Social Media Sentiment Analysis of Plant-Based Patties
Plant-based alternatives to meat have become a staple on supermarket shelves and restaurant menus. This trend has been bolstered by the increasing popularity of veganism and sustainable diets, as well as significant product innovations. Gone are the days of veggie burgers that bore no resemblance to burgers made with beef. The plant-based patties of today, primarily produced by Impossible Foods and Beyond Meat, claim to be nearly identical to the real deal in appearance, texture, and taste. As these alternatives steadily gain popularity and eat up the meat industry’s market share, I am curious about public opinion towards these products.
Social media offers a window into public opinion, and I chose Twitter as my data source. I collected 40,000 recent tweets for each of the search terms “impossible burger” and “beyond burger”.
Goals:
- Get tweets: twitter_scraper.pdf
- Create training sets: make_training_set.pdf
- Impossible Burger: impossible_analyze_tweets.pdf
- Beyond Burger: beyond_analyze_tweets.pdf
After analyzing the tweets for positive or negative sentiment using the rule-based method VADER, I visualized the positive and negative sets of tweets separately in word clouds to gain familiarity with each set. These positive and negative sets of tweets will be used to train a machine learning model to perform sentiment prediction.
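The labeling step can be sketched roughly as follows. VADER produces a compound score between -1 and 1 for each text, and a cutoff turns that score into a label. The ±0.05 cutoff is the one commonly recommended for VADER, but it is an assumption here, as is dropping neutral tweets; the function names are hypothetical and the scoring function is passed in so the sketch runs without extra packages.

```python
# Cutoff commonly recommended for VADER's compound score. This is an
# assumption: the article does not state which threshold was used.
THRESHOLD = 0.05


def label_from_compound(compound, threshold=THRESHOLD):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def split_by_sentiment(tweets, score):
    """Group tweets into positive and negative lists.

    `score` is any callable that returns a compound score for a tweet.
    Neutral tweets are dropped (an assumption, not confirmed by the article).
    """
    positive, negative = [], []
    for tweet in tweets:
        label = label_from_compound(score(tweet))
        if label == "positive":
            positive.append(tweet)
        elif label == "negative":
            negative.append(tweet)
    return positive, negative
```

With the vaderSentiment package installed, `score` could be `lambda t: analyzer.polarity_scores(t)["compound"]`, where `analyzer` is a `SentimentIntensityAnalyzer` instance.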
[Figure: word clouds for Impossible Burger (positive sentiment, negative sentiment) and Beyond Burger (positive sentiment, negative sentiment)]
I extracted the hashtags from the positive and negative sets of tweets and calculated the frequency of usage for each.
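A minimal sketch of hashtag extraction and counting, using only the standard library. The regular expression, the lowercasing of tags, and the sample tweets are illustrative assumptions, not taken from the original analysis.

```python
import re
from collections import Counter

# A hashtag is treated as "#" followed by word characters (an assumption;
# Twitter's own rules are more detailed).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def hashtag_frequencies(tweets):
    """Count how often each hashtag appears across tweets, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(tweet))
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Tried the #ImpossibleBurger today #vegan",
    "Lunch was great #vegan #plantbased",
    "No hashtags here",
]
frequencies = hashtag_frequencies(sample_tweets)
most_used = frequencies.most_common(1)  # the single most frequent hashtag
```

Running the same function separately on the positive and negative sets gives one frequency table per sentiment.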
[Figure: hashtag frequencies for Impossible Burger (positive sentiment, negative sentiment) and Beyond Burger (positive sentiment, negative sentiment)]
Feature extraction converts text into numeric vectors that can be used as input to train machine learning models. Bag-of-Words and TF-IDF (term frequency-inverse document frequency) are two feature extraction methods. Bag-of-Words represents each document (tweet) by how often each word occurs in it. TF-IDF weights each word by its importance to a document relative to the whole group of documents (set of tweets).
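To make the difference concrete, here is a small from-scratch sketch of both methods. It uses one common textbook variant of TF-IDF (relative term frequency times the natural log of N / document frequency); libraries such as scikit-learn use smoothed and normalized variants, so the exact numbers would differ. The whitespace tokenizer and the sample documents are simplifying assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one word-count vector per document."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    vocab, count_vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = [sum(1 for row in count_vectors if row[j] > 0)
                for j in range(len(vocab))]
    vectors = []
    for row in count_vectors:
        total = sum(row) or 1  # avoid dividing by zero for empty documents
        vectors.append([(count / total) * math.log(n_docs / doc_freq[j])
                        for j, count in enumerate(row)])
    return vocab, vectors


# Made-up documents, for illustration only.
docs = ["burger tastes great", "burger tastes bad", "great great burger"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

In this toy example "burger" appears in every document, so its TF-IDF weight is zero everywhere, while "bad" appears in only one document and gets a positive weight there. Bag-of-Words, by contrast, simply counts "burger" once per document.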
Machine learning models: Logistic Regression, XGBoost, Decision Trees
The F1 score measures the effectiveness of each combination of machine learning model and feature extraction method. For Impossible Burger, the combination with the highest F1 score (0.665513) is logistic regression using TF-IDF features. For Beyond Burger, the combination with the highest F1 score (0.623679) is logistic regression using Bag-of-Words features.
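For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from scratch for binary labels. Treating label 1 as the class of interest is an assumption, and the example labels are made up rather than drawn from the tweet data.

```python
def f1(y_true, y_pred, positive_label=1):
    """Compute the F1 score for one class of a binary classification."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs
                   if t == positive_label and p == positive_label)
    false_pos = sum(1 for t, p in pairs
                    if t != positive_label and p == positive_label)
    false_neg = sum(1 for t, p in pairs
                    if t == positive_label and p != positive_label)
    if true_pos == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
```

Because F1 balances precision and recall, it is less forgiving than plain accuracy when one class is much rarer than the other.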
[Figure: F1 scores for Impossible Burger and Beyond Burger, each shown for Bag-of-Words features, TF-IDF features, and Bag-of-Words vs TF-IDF]
Sentiment predictions for tweets (0 for positive, 1 for negative) from the logistic regression model. The Impossible Burger set used TF-IDF features, while the Beyond Burger set used Bag-of-Words features.