To Impossible and Beyond: Social Media Sentiment Analysis of Plant-Based Patties

Plant-based alternatives to meat have become a staple on supermarket shelves and restaurant menus. This trend has been bolstered by the increasing popularity of veganism and sustainable diets, as well as significant product innovations. Gone are the days of veggie burgers bearing no resemblance to burgers made with beef. The plant-based patties of today, primarily produced by Impossible Foods and Beyond Meat, claim to be nearly identical to the real deal in appearance, texture, and taste. As these alternatives steadily gain popularity and eat into the meat industry’s market share, I wanted to gauge public opinion of these products.

Social media offers a window into public opinion, so I chose Twitter as my data source. I collected 40,000 recent tweets for each of the search terms “impossible burger” and “beyond burger”.

Goals:

  1. Get 40,000 tweets (30,000 to train the machine learning model and 10,000 to test it) for each of the "impossible burger" and "beyond burger" search terms
  2. Create training sets by labeling tweets as positive (0) or negative (1) sentiment using the rule-based method VADER (Valence Aware Dictionary and sEntiment Reasoner)
  3. Visualize the positive and negative training sets of tweets in word clouds to gain familiarity with each set
  4. Compare combinations of feature extraction methods (Bag-of-Words and TF-IDF) and machine learning models (Logistic Regression, XGBoost, Decision Trees) to find the combination with the highest F1 score
  5. Train the chosen machine learning model using features extracted from the training sets of tweets
  6. Apply the trained machine learning model to predict the sentiments of tweets in the test sets

Files:

  1. Get tweets: twitter_scraper.pdf
  2. Create training sets: make_training_set.pdf
  3. Impossible Burger: impossible_analyze_tweets.pdf
  4. Beyond Burger: beyond_analyze_tweets.pdf

After analyzing the tweets for positive or negative sentiment using the rule-based method VADER, I visualized the positive and negative sets of tweets separately in word clouds to gain familiarity with each set. These positive and negative sets of tweets will be used to train a machine learning model to perform sentiment prediction.
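As a rough illustration of the labeling step, the sketch below turns per-tweet sentiment scores into the 0/1 labels used in this project. The function name `label_tweets`, the `score_fn` parameter, the cut-off of 0.0, and the toy scorer are all assumptions made for illustration and are not taken from the project notebooks. With the `vaderSentiment` package installed, `score_fn` could instead return the `compound` value from `SentimentIntensityAnalyzer().polarity_scores(text)`.

```python
def label_tweets(tweets, score_fn, threshold=0.0):
    """Label each tweet 0 (positive) or 1 (negative).

    score_fn maps a tweet to a sentiment score where higher means more
    positive. The threshold is an assumed cut-off, not the project's.
    """
    labeled = []
    for tweet in tweets:
        label = 0 if score_fn(tweet) >= threshold else 1
        labeled.append((tweet, label))
    return labeled


def toy_score(tweet):
    """Hypothetical stand-in scorer so the sketch runs without extra packages."""
    words = tweet.lower().split()
    if "gross" in words:
        return -0.5
    if "love" in words:
        return 0.5
    return 0.0


labeled = label_tweets(
    ["I love the impossible burger", "beyond burger was gross"],
    toy_score,
)
positive_set = [tweet for tweet, label in labeled if label == 0]
negative_set = [tweet for tweet, label in labeled if label == 1]
```

How tweets with a neutral score are handled is a design choice; this sketch simply assigns them to the positive set.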

Impossible Burger

Positive Sentiment
Wordcloud for Impossible tweets with positive sentiment
Negative Sentiment
Wordcloud for Impossible tweets with negative sentiment

Beyond Burger

Positive Sentiment
Wordcloud for Beyond tweets with positive sentiment
Negative Sentiment
Wordcloud for Beyond tweets with negative sentiment

I extracted the hashtags from the positive and negative sets of tweets and calculated the frequency of usage for each.
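A minimal sketch of this hashtag count, using only the Python standard library, might look like the following. The function name `hashtag_frequencies`, the regular expression, and the choice to lowercase hashtags before counting are assumptions for illustration; the project notebooks may do this differently.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_frequencies(tweets):
    """Count how often each hashtag appears across a set of tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase so that #Vegan and #vegan are counted together.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return counts


sample_tweets = [
    "Loved it #vegan #ImpossibleBurger",
    "So good #Vegan",
]
frequencies = hashtag_frequencies(sample_tweets)
top_hashtags = frequencies.most_common(1)
```

Running the same function separately on the positive and negative sets gives one frequency table per sentiment.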

Impossible Burger

Positive Sentiment
Dataframe for Impossible hashtags with positive sentiment
Negative Sentiment
Dataframe for Impossible hashtags with negative sentiment

Beyond Burger

Positive Sentiment
Dataframe for Beyond hashtags with positive sentiment
Negative Sentiment
Dataframe for Beyond hashtags with negative sentiment

Feature extraction converts text into numeric vectors that can be used as input to train machine learning models. Bag-of-Words and TF-IDF (term frequency-inverse document frequency) are two methods of feature extraction. Bag-of-Words represents each document (tweet) by the number of times each word occurs in it. TF-IDF weights each word by its importance to a document within a group of documents (set of tweets), so a word that is frequent in one tweet but rare across the set scores highest.
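To make the two representations concrete, here is a small from-scratch sketch. It assumes whitespace tokenization and the textbook weighting tf × log(N / df); library implementations typically use smoothed variants, so their numbers will differ. The function names are illustrative and not from the project notebooks.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one word-count vector per document."""
    vocab = sorted({token for doc in docs for token in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    vocab, count_vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = [sum(1 for row in count_vectors if row[j] > 0)
                for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    vectors = []
    for row in count_vectors:
        total = sum(row)
        vectors.append([(count / total) * weight if total else 0.0
                        for count, weight in zip(row, idf)])
    return vocab, vectors


docs = ["tasty burger", "gross burger"]
vocab, bow_vectors = bag_of_words(docs)
_, tfidf_vectors = tf_idf(docs)
```

In this toy example "burger" appears in every document, so its TF-IDF weight is zero, while "tasty" and "gross" keep a positive weight in the document that contains them.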

Machine learning models: Logistic Regression, XGBoost, Decision Trees

The F1 score, the harmonic mean of precision and recall, is used to measure the effectiveness of each combination of machine learning model and feature extraction method. For Impossible Burger, the combination with the highest F1 score (0.665513) is logistic regression using TF-IDF features. For Beyond Burger, the combination with the highest F1 score (0.623679) is logistic regression using Bag-of-Words features.
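For reference, the F1 score can be computed from true and predicted labels as sketched below. Which label the project treated as the positive class when scoring is not stated, so the `positive_label=1` default here is an assumption, as is the function name.

```python
def f1_score(y_true, y_pred, positive_label=1):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs
                   if t == positive_label and p == positive_label)
    false_pos = sum(1 for t, p in pairs
                    if t != positive_label and p == positive_label)
    false_neg = sum(1 for t, p in pairs
                    if t == positive_label and p != positive_label)
    if true_pos == 0:
        # No correct positive predictions: treat F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Here there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is 2/3 as well.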

Impossible Burger

Bag-of-Words Features
Dataframe for Impossible Burger Bag-of-Words comparison among machine learning models
TF-IDF Features
Dataframe for Impossible Burger TF-IDF comparison among machine learning models
Bag-of-Words vs TF-IDF
Dataframe for Impossible Burger comparison of Bag-of-Words and TF-IDF for logistic regression

Beyond Burger

Bag-of-Words Features
Dataframe for Beyond Burger Bag-of-Words comparison among machine learning models
TF-IDF Features
Dataframe for Beyond Burger TF-IDF comparison among machine learning models
Bag-of-Words vs TF-IDF
Dataframe for Beyond Burger comparison of Bag-of-Words and TF-IDF for logistic regression

The logistic regression model predicted the sentiment (0 for positive, 1 for negative) of each tweet in the test sets. The Impossible Burger set used TF-IDF features, while the Beyond Burger set used Bag-of-Words features.
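To show what training and prediction involve, here is a from-scratch logistic regression on toy word-count features. It is only a sketch: the project presumably used a library implementation, and the function names, learning rate, epoch count, and toy data below are all assumptions for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(X, weights, bias):
    """Return 1 (negative sentiment) when the predicted probability is >= 0.5."""
    predictions = []
    for features in X:
        z = sum(w * x for w, x in zip(weights, features)) + bias
        predictions.append(1 if sigmoid(z) >= 0.5 else 0)
    return predictions


# Toy features: [count of "love", count of "gross"] per tweet (hypothetical).
X_train = [[1, 0], [2, 0], [0, 1], [0, 2]]
y_train = [0, 0, 1, 1]  # 0 = positive, 1 = negative, as in this project

weights, bias = train_logistic_regression(X_train, y_train)
predictions = predict([[1, 0], [0, 1]], weights, bias)
```

With real data, each row of `X_train` would be a Bag-of-Words or TF-IDF vector for one training tweet, and `predict` would be applied to the vectors of the test tweets.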

Impossible Burger

Dataframe for Impossible Burger Sentiment Predictions

Beyond Burger

Dataframe for Beyond Burger Sentiment Predictions