Analysis of the Attitude towards the 2021 United Nations Climate Change Conference (COP26)
As reported by NASA, the planet’s average surface temperature has risen by 1.18 degrees Celsius since the late 19th century, with most of that warming occurring in the past 40 years. Oceans are warming, ice sheets are shrinking, and glaciers are retreating. In 2018, the Intergovernmental Panel on Climate Change (IPCC) released a special report documenting the impacts of global warming reaching 1.5 degrees Celsius above pre-industrial levels and the catastrophic consequences such warming would bring.
The United Nations Climate Change Conference is a Conference of the Parties (COP) held under the 1992 United Nations Framework Convention on Climate Change (UNFCCC). The 26th iteration, COP26, was held in Glasgow, United Kingdom, from 31st October to 13th November 2021. The conference aimed to keep the goal of limiting the temperature increase to 1.5 degrees Celsius above pre-industrial levels within reach and to strategize on reaching net-zero emissions by 2050. This study used computational and conceptual text analysis to examine the response to COP26 as expressed on Twitter.
This study applied volume analysis, sentiment analysis, and topic modeling to a set of Tweets to compare results over space and time. The steps involved data collection, data preprocessing for each model, model selection, and model evaluation. Volume analysis establishes temporal and geospatial facts about the dataset. Sentiment analysis gauges users’ attitudes toward the conference. Finally, topic modeling determines the topics users discussed. Each analysis was combined with the results of the previous one to glean more nuanced insights; thus, the analyses were performed in the order listed above.
The Twitter API was used to collect all public Tweets containing the terms “cop26” or “climate change conference” posted between 31st October and 13th November. Restrictions were applied to filter out retweets and to collect Tweets in English only. Standard preprocessing techniques were applied to the Tweet text: first, all Tweets were converted to lowercase; then all URLs were removed, followed by special characters. According to Saif et al. (2014), removing stop words, especially in the context of short Tweets, would hamper the subsequent sentiment analysis and topic modeling; thus, stop words were retained.
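A minimal sketch of this preprocessing pipeline is shown below; the function name and exact regular expressions are illustrative, not taken from the original implementation.

```python
import re

def preprocess(tweet: str) -> str:
    """Normalize a raw Tweet: lowercase, strip URLs, drop special characters.
    Stop words are deliberately retained (Saif et al., 2014)."""
    text = tweet.lower()                          # 1. lowercase
    text = re.sub(r"http\S+|www\.\S+", "", text)  # 2. remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)      # 3. remove special characters
    return re.sub(r"\s+", " ", text).strip()      # collapse leftover whitespace

print(preprocess("COP26 is here!! https://t.co/xyz #ClimateAction"))
# -> "cop26 is here climateaction"
```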
After the data was cleaned, an exploratory analysis was conducted. 10,268 unique Tweets in English were extracted from the API, of which 67.15% had usable locations. Each Tweet had an average of 31.15 words within the 280-character limit. The dataset contained 6,932 unique users. The 100 most frequently used words (after filtering out ‘cop26’, ‘climate’, ‘change’, and ‘conference’) were visualized to get a brief understanding of the Tweets’ contents.
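This word-frequency pass can be sketched as follows; the Tweet strings are hypothetical placeholders for the real dataset.

```python
from collections import Counter

# Hypothetical preprocessed Tweets standing in for the real dataset
tweets = [
    "cop26 leaders pledge net zero emissions",
    "leaders fly private jets to the cop26 conference",
]
filtered = {"cop26", "climate", "change", "conference"}

counts = Counter(
    word for tweet in tweets for word in tweet.split() if word not in filtered
)
print(counts.most_common(100))  # the 100 words behind the visualization
```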
Volume Analysis
The volume of Tweets on each day or from a particular country gives insight into the Tweeting patterns of the dataset.
As seen in the graph, most Tweets came from developed countries, namely the United Kingdom, United States, and Canada (with the UK having the highest count at 1,978 Tweets). These were followed by India, Australia, the Philippines, Kenya, and others, all of which had substantially fewer Tweets. As in Dahal et al. (2019), the predominance of English-speaking countries is due to the keyword search and language filter applied while extracting Tweets. Overall, with 144 countries represented by at least one Tweet, the worldwide reach of the conference is evident.
To reduce bias, the Tweets per country were normalized by each country’s population. To avoid skew from very small countries or islands with small populations, only the top 50 countries by Tweet volume were considered for this. Many countries with a high volume of raw Tweets (such as Australia) had much lower normalized counts, while others, like the USA, UK, and Canada, had similar values for both. It must be noted that this normalization is biased against countries with few Tweets in English and those with extremely large populations. The results also show that, relative to their populations, countries classified by the World Bank as high-income or upper-middle-income are more likely to be involved in climate change discussions. This could also be due to the relatively larger populations of lower-middle-income and low-income countries and the use of Twitter in regional languages.
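The normalization step can be illustrated as below; only the UK’s Tweet count comes from the text, and the other figures are placeholders.

```python
import pandas as pd

# Hypothetical counts: only the UK figure (1,978) appears in the text
volumes = pd.DataFrame({
    "country": ["United Kingdom", "United States", "India"],
    "tweets": [1978, 1500, 600],
    "population": [67_000_000, 331_000_000, 1_380_000_000],
})

# Restrict to the top 50 countries by raw volume, then normalize
top50 = volumes.nlargest(50, "tweets").copy()
top50["tweets_per_million"] = top50["tweets"] * 1_000_000 / top50["population"]
print(top50.sort_values("tweets_per_million", ascending=False))
```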
Sentiment Analysis
Sentiment analysis is useful for analyzing the emotional states and opinions in the dataset. To carry out this supervised technique, 10% of the dataset (1,046 Tweets) was randomly selected and manually labeled as 1 (positive), 0 (neutral), or -1 (negative). The labeled data was split into training and testing sets to train and evaluate different models. After tuning, the model with the highest accuracy was chosen as the final model and applied to the entire dataset.
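A sketch of this labeling and splitting step follows; the paper does not state the train/test ratio, so an 80/20 split is assumed, and the example Tweets are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the manually labeled 10% subset
labeled = pd.DataFrame({
    "text": [
        "great progress at cop26 today",
        "proud of the cop26 agreement",
        "cop26 is all talk and no action",
        "another wasted summit at cop26",
        "cop26 opens today in glasgow",
        "day two of the cop26 conference",
    ],
    "label": [1, 1, -1, -1, 0, 0],  # 1 = positive, 0 = neutral, -1 = negative
})

# The split ratio is not stated in the text; 80/20 is assumed here
X_train, X_test, y_train, y_test = train_test_split(
    labeled["text"], labeled["label"], test_size=0.2, random_state=42
)
```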
Valence Aware Dictionary and sEntiment Reasoner (VADER)
The Valence Aware Dictionary and sEntiment Reasoner (VADER) was created specifically to analyze social media sentiment. It examines the lexical features of a Tweet to determine a preliminary sentiment score, which is then adjusted using syntactic and grammatical conventions to give each Tweet a score between -1 and 1 (Dahal et al., 2019). Tweets with scores between -0.25 and 0.25 were classified as neutral. While 0 seems like the intuitive threshold, preliminary experimentation showed that 0.25 yielded the best precision and recall. Tweets scoring below -0.25 were classified as negative, and those above 0.25 as positive. A drawback of this model is that, being a lexicon-based method not trained on the data at hand, its accuracy is questionable for shorter Tweets, and it is prone to false positives. To test this, its performance was evaluated against the manually labeled data. The accuracy was found to be 0.3263.
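A minimal sketch of this thresholded classification, using the vaderSentiment package (the helper name is illustrative):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def vader_label(tweet: str, threshold: float = 0.25) -> int:
    """Map VADER's compound score to -1/0/1 using the 0.25 cutoff."""
    score = analyzer.polarity_scores(tweet)["compound"]
    if score > threshold:
        return 1    # positive
    if score < -threshold:
        return -1   # negative
    return 0        # neutral

print(vader_label("cop26 is a huge step forward"))  # expected: 1
```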
Support Vector Machine
The next algorithm tested was the Support Vector Machine (SVM). This algorithm is based on a computational learning theory principle, Structural Risk Minimization. SVMs are universal learners and can learn irrespective of the feature space’s dimensionality, making them a great fit for text categorization. Furthermore, since document vectors are sparse and text categorization problems tend to be linearly separable, it was hypothesized that the SVM model would perform well (Joachims, 2005). GridSearchCV was used to select the vectorizer and tune the hyperparameters. The accuracy was found to be 0.5933.
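The paper does not list the exact parameter grid; the sketch below shows one plausible GridSearchCV setup over vectorizers and the SVM regularization parameter, assuming X_train and y_train hold the labeled training split from the earlier sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Vectorizer + classifier pipeline; GridSearchCV swaps the vectorizer itself
svm_pipeline = Pipeline([
    ("vec", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

svm_grid = {
    "vec": [TfidfVectorizer(), CountVectorizer()],  # vectorizer selection
    "vec__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],                         # regularization strength
}

svm_search = GridSearchCV(svm_pipeline, svm_grid, cv=5, scoring="accuracy")
svm_search.fit(X_train, y_train)
print(svm_search.best_params_, svm_search.score(X_test, y_test))
```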
K-Nearest Neighbors
K-Nearest Neighbors (KNN) is an instance-based (“lazy”) learning algorithm. Its output is a class membership determined by a majority vote among the object’s nearest neighbors. The algorithm relies on the assumption that documents can be represented as points in Euclidean space and classified by proximity (Trstenjak, 2014). Values of K ranging from 5 to 25 were tested, with the best results found at K = 15. The accuracy was found to be 0.4450.
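A sketch of the K sweep follows; the step size and choice of vectorizer are assumptions, and the train/test variables come from the earlier split sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Vectorize once, then sweep K
vec = TfidfVectorizer()
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

best_k, best_acc = None, 0.0
for k in range(5, 26, 2):  # K from 5 to 25, as described in the text
    knn = KNeighborsClassifier(n_neighbors=k).fit(Xtr, y_train)
    acc = knn.score(Xte, y_test)
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)  # the paper reports the best results at K = 15
```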
Naive Bayes Classifier
The Naïve Bayes Classifier is an extremely fast algorithm for large volumes of data. It uses Bayes’ theorem to compute the probability of each class for an unseen document and assigns the class with the highest probability. In this paper, Multinomial Naïve Bayes was applied to the vectorized data, with GridSearchCV used to tune the parameters and pick the vectorizer. The accuracy was found to be 0.6651.
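A comparable sketch for the Multinomial Naïve Bayes pipeline, again with an assumed grid and the same train/test variables as before:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

nb_pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", MultinomialNB()),
])

nb_grid = {
    "vec": [CountVectorizer(), TfidfVectorizer()],  # vectorizer selection
    "clf__alpha": [0.1, 0.5, 1.0],                  # smoothing values (assumed)
}

nb_search = GridSearchCV(nb_pipeline, nb_grid, cv=5, scoring="accuracy")
nb_search.fit(X_train, y_train)
print(nb_search.score(X_test, y_test))  # the paper reports 0.6651
```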
On comparing the results of the four models tested, it was found that Multinomial Naïve Bayes had the best performance for this dataset. It was selected as the final model and applied to the remaining 90% of the dataset.
Of all the Tweets, the largest category was neutral/informative, with 4,789 Tweets. This was followed by positive Tweets (3,963), implying that individuals showed more satisfaction with the conference than dissatisfaction. The dataset contained 1,516 negative Tweets. The trend for the top six countries was similar to that of the overall dataset, with Australia showing more negativity than the others.
Considering how sentiment varied over the course of the conference, the average sentiment improved as the conference progressed. It is likely that people who started off concerned grew satisfied with the conference’s proceedings. However, another possibility is that cynical users simply stopped Tweeting as the conference progressed.
People tended to use “government” or “leader” with more negative connotations. This was likely due to the several Tweets dismissing the conference as mere talk, or framing the event as driven by political leaders seeking to please industry leaders without addressing the true causes of climate change. Consistent with this hypothesis, people held more positive opinions of Greta Thunberg and of youth, implying that they likely resonated with youth-led climate activism.
Topic Modeling
Topic modeling is an unsupervised machine learning technique relying on Bayesian probability to determine the thematic structure of a corpus of text (Lafferty and Blei, 2007). Since it does not require labeled data, it is well suited to analyzing corpora too large for humans to code realistically and reliably. While there is no way to validate this model externally, it is extremely useful for getting an overview of the thematic structure of the dataset. This paper uses a hierarchical, clustering-based method called Correlation Explanation (CorEx) in a semi-supervised manner. The model tries to fit the topics using the given anchor words, but these anchors are overridden if the model cannot fit the topics around them, which prevents artificial manipulation of the topics. Furthermore, Tweets did not have to be allocated to any topic, which further guarded against such manipulation.
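A minimal semi-supervised CorEx sketch using the corextopic package follows; the documents and anchor lists here are illustrative stand-ins, as the paper’s full anchor lists are not given.

```python
from corextopic import corextopic as ct
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the preprocessed Tweets and anchor lists
docs = [
    "boris johnson opens cop26 with a speech on coal",
    "cop26 is just blah blah blah all talk and no action",
    "draft agreement approved a success for the paris goals",
]
anchors = [
    ["boris", "johnson"],      # e.g. political entities
    ["blah", "talk"],          # e.g. conference failure
    ["agreement", "success"],  # e.g. conference success
]

vectorizer = CountVectorizer(binary=True)
doc_word = vectorizer.fit_transform(docs)
words = list(vectorizer.get_feature_names_out())

# Semi-supervised fit: anchors nudge, but do not force, the topics
model = ct.Corex(n_hidden=8, seed=42)
model.fit(doc_word, words=words, anchors=anchors, anchor_strength=3)

for i, topic in enumerate(model.get_topics(n_words=20)):
    print(f"Topic #{i + 1}:", [w for w, *_ in topic])
```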
CorEx topic modeling was used to create eight topics seeded by the given anchors. For each topic, the 20 most commonly associated words were found, based on which a broad category name was given to each topic:
Topic #1 (Political Entities): biden, johnson, boris, trump, joe, asleep, falls, sleepy, sleeping, president, mocks, nodding, spotted, squirms, nicola, fire, prime, planning, worst, mistake
Topic #2 (Conference Accessibility): access, accessible, wheelchair, owe, israeli, disabled, partnerships, unable, generations, build, energy, chapters, recommits, forward, unlock, handicap, back, future, journal, elsevier
Topic #3 (Transportation): private, prince, charles, transport, fly, jet, car, vip, jets, bikes, jailed, regime, environmentalactivists, pressure, release, detaining, duke, joyofleaving, william, harassing
Topic #4 (Food at the Conference): food, vegan, intelligent, system, meat, launches, management, livekinder, fourpawsuk, practices, unhumanrights, sponsors, serving, sold, menu, cancer, cigarettes, diet, treat, humanrights
Topic #5 (Global Warming): united, nations, methane, australia, rising, temperatures, parties, contest, countries, convention, framework, scotland, un, stands, attended, xi, jinping, klimaatconferentie, snubs, held
Topic #6 (Non-political Entities): speech, greta, thunberg, david, blackpink, attenborough, billie, eilish, sir, knee, seawater, tuvalu, deep, standing, blackpinkclimateadvocates, climateactioninyourarea, ambassador, lying, activist, legendary
Topic #7 (Conference Failure): failure, short, talk, fall, giant, blah, mirage, breakthrough, women, sorry, caught, eclipsed, empire, fear, water, fawning, incompetents, rapidly, west
Topic #8 (Conference Success): agreement, success, paris, goals, draft, phaseout, guidelines, accelerate, towards, finalising, implementation, approved, revised, fulfill, together, pray, raising, finalise, cryosphere, agreements
To understand people’s opinions towards the conference, the sentiments for each topic were compared. In general, people were highly pessimistic regarding political entities and transportation but were much more positive towards food at the conference, non-political entities, and the overall success of the conference. Interestingly, the general sentiment toward the conference’s failures was positive. On further scrutiny of these specific Tweets, it was found that the sentiment analysis model could not pick up on their sarcastic or mocking tones. Before the conference began, Greta Thunberg famously lambasted world leaders and their empty promises. Her use of “blah, blah, blah” in her speech was quoted by several users expressing their opinions in her words. Others expressed themselves by stating the conference had met all their expectations — of world leaders not achieving anything real.
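The per-topic comparison reduces to a simple aggregation over the merged sentiment labels and topic assignments; the rows below are placeholders.

```python
import pandas as pd

# Hypothetical merged frame: one row per Tweet, with its predicted
# sentiment (-1/0/1) and assigned CorEx topic label
merged = pd.DataFrame({
    "topic": ["Political Entities", "Political Entities", "Conference Success"],
    "sentiment": [-1, 0, 1],
})

# Mean sentiment per topic, the basis of the comparison above
print(merged.groupby("topic")["sentiment"].mean().sort_values())
```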
As seen in the overall daily sentiment of the Tweets, those regarding the conference’s success (Topic #8) and failure (Topic #7) show that while people were skeptical to begin with, they possibly held a more positive attitude towards the conference by the time it ended.
This study examined people’s attitudes toward the COP26 conference from a temporal and geospatial perspective, by analyzing people’s sentiments and their topics of discussion.
Overall, it was found that, barring the informational (neutral) Tweets, people had a positive opinion of the conference. The trend of sentiment through the conference period also showed that many people likely began with a negative outlook but became more positive as the conference proceeded. The peaks in negative sentiment were particularly related to topics concerning politics and political entities, where individuals tended to align more with activists such as Greta Thunberg.
Topic modeling results showed a clear division between the topics people aligned with positively and negatively. People were particularly unhappy regarding political diplomacy and the means of transportation (private jets and fleets of cars) used to arrive in Glasgow for the conference. Conversely, they were more positive regarding the vegan options available at the conference, the ideologies of non-political entities such as Greta Thunberg and David Attenborough, and the overall stance the conference took on global warming. This positivity must be weighed against the sentiment model’s proneness to false positives, given its inability to pick up on sarcastic tones.
Finally, it is interesting that, beyond the great number of Tweets carrying news and information about the conference and calling for immediate action against global warming, most discussions concerned the logistics of the conference rather than the content discussed during it. From references to ‘sleepy Joe’ to pointing out the hypocrisy of political leaders travelling to the conference in a non-sustainable manner, few people engaged in discussions regarding the conference’s substantive highlights. Many Tweets provided links to news sources that dove deeper into these highlights; mapping a network model of these conversations would be an interesting way to understand conversational dynamics across countries for this truly global topic.