CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA...
By Donald Trump
What are we trying to do?
Quotations are often used to represent someone’s point of view. We believe that the quotations used in news related to China represent the attitudes of Western politicians and, to some extent, the Western public towards China. By analyzing these quotations, we hope to answer the question: what are the attitudes of the Western world towards China?
We use keyword filtering to select the related quotations, then apply KeyBERT, a minimal keyword extraction technique, to each quotation and article title to extract the keywords and key phrases that are most similar to the text itself. We also aggregate keywords into more general topics. Finally, we run Twitter-roBERTa-base sentiment analysis on each quotation to obtain a sentiment score from -1 to 1 that represents how negative or positive the attitude of that quotation is.
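As a rough illustration of how a three-class sentiment model can yield a single score between -1 and 1, here is a minimal sketch. It assumes the model returns raw logits ordered as (negative, neutral, positive), and it assumes the score is the positive probability minus the negative probability. That mapping is our illustrative choice, not necessarily the exact formula used in this project, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_score(logits):
    """Map (negative, neutral, positive) logits to a score in [-1, 1].

    Assumption: score = P(positive) - P(negative). A fully neutral
    prediction therefore lands near 0.
    """
    p_neg, _p_neu, p_pos = softmax(logits)
    return p_pos - p_neg
```

With this mapping, a quotation the model considers clearly negative gets a score close to -1, a clearly positive one gets a score close to 1, and neutral or ambiguous quotations fall around 0.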
What does our data look like?
The Quotebank dataset we use contains English quotations gathered from online news articles published between 2008 and 2020. We filter the data to obtain quotations relating to China that were uttered by a person with a Western background, as defined by their nationality.
The data filtering was done by checking whether specific keywords relating to China appear in the title, the quotation, or the local left and right context of the quotation. We checked for the following keywords:
['jinping xi', 'xi jing ping', 'xi jinping', 'president xi', 'china', 'chinese', 'beijing', 'peking', 'sino-']
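The keyword check described above can be sketched as follows. This is a minimal sketch assuming case-insensitive substring matching. The function name `mentions_china` and its parameter names are hypothetical and do not reflect the actual Quotebank field names.

```python
KEYWORDS = ['jinping xi', 'xi jing ping', 'xi jinping', 'president xi',
            'china', 'chinese', 'beijing', 'peking', 'sino-']


def mentions_china(title, quotation, left_context="", right_context=""):
    """Return True if any keyword occurs in any of the four text fields.

    Assumption: matching is a case-insensitive substring search, so
    'china' also matches inside longer words such as 'Indochina'.
    """
    for field in (title, quotation, left_context, right_context):
        if not field:
            continue  # skip missing or empty fields
        text = field.lower()
        if any(keyword in text for keyword in KEYWORDS):
            return True
    return False
```

Substring matching is simple and fast, but it can produce false positives on words that merely contain a keyword. A stricter variant could match on word boundaries instead.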
In order to determine whether a certain speaker is from a ‘Western’ background, we check whether they come from one of the countries in the political definition of Western countries. As mentioned before, after filtering we are left with approximately 2 million article-quotation pairs.
Overview of the Filtered Data
Below is a map of the number of quotations per country in the Western world. Note that QuotationScale is the logarithm of the number of quotes. As we can see, the vast majority of quotes come from the US, with far fewer coming from all other countries. Therefore, when interpreting the results, please keep in mind that most of the data underlying this investigation come from the US.
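To make the log scaling concrete, here is a small sketch of how a per-country QuotationScale value could be derived from a list of speaker countries. The base of the logarithm is not stated above, so base 10 is an assumption. The function name and the toy input in the usage note are purely illustrative and are not taken from the real data.

```python
import math
from collections import Counter


def quotation_scale(speaker_countries):
    """Count quotations per country and return log10 of each count.

    Assumption: base-10 logarithm. Each list entry is the country of
    the speaker of one quotation.
    """
    counts = Counter(speaker_countries)
    return {country: math.log10(n) for country, n in counts.items()}
```

For example, calling `quotation_scale(["US"] * 100 + ["FR"] * 10)` on this made-up input gives the US a value of 2.0 and France a value of 1.0. A tenfold difference in raw counts becomes a difference of only 1 on the map, which is why the log scale keeps smaller countries visible next to the US.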