China

A data-driven Western perspective

A presentation of Western attitudes towards China
through the exploitation of 1,954,147 quotations from Quotebank

The story

Over the years, relations between the US and China have deteriorated. Tensions have escalated as the US trade deficit with China has grown. More recently, the news has been dominated by the worldwide outbreak of Covid-19, for which some in the US blame China. The European Union has also expressed dissatisfaction with some of China's policies, such as the "Belt and Road Initiative".
Because many of the opinions people hold seem to originate in statements by public figures, we analyze what public figures say about China as a proxy for the general view of the West. This data story therefore explores the Western perspective on China through an analysis of approximately 2 million China-related quotations from Quotebank.

CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA, CHINA, CHINA, CHINA CHINA, CHINA...
By Donald Trump

What are we trying to do?

Quotations are often used to represent someone’s point of view. We believe that the quotations used in China-related news represent the attitudes of Western politicians and, to some extent, the Western public towards China. By analyzing these quotations, we hope to answer the question: what are the attitudes of the Western world towards China?

We use keyword filtering to select the relevant quotations, then apply KeyBERT, a minimal keyword extraction technique, to each quotation and article title to extract the keywords and key phrases most similar to the text itself. We also aggregate keywords into more general topics. Finally, we use Twitter-roBERTa-base for sentiment analysis on each quotation, obtaining a sentiment score between -1 and 1 that represents how negative or positive the attitude of that quotation is.
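The text above does not say how the classifier output becomes a single number between -1 and 1. As a minimal sketch of one possibility, assuming the model returns three logits in the order (negative, neutral, positive) and that the score is the positive probability minus the negative probability, the mapping could look like this. The function names `softmax` and `sentiment_score` are hypothetical and not taken from the project.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_score(logits):
    """Map (negative, neutral, positive) logits to a score in [-1, 1].

    Assumption: score = P(positive) - P(negative). The original analysis
    may have derived its score differently.
    """
    p_neg, _p_neu, p_pos = softmax(logits)
    return p_pos - p_neg


# Illustrative logits only, not real model output.
print(sentiment_score([0.2, 1.0, 2.5]))   # leans positive
print(sentiment_score([3.0, 0.5, -1.0]))  # leans negative
```

Under this assumption, a quotation the model considers purely neutral scores close to 0, while confident positive or negative predictions approach 1 or -1.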

What does our data look like?

The Quotebank dataset we use contains English quotations gathered from online news articles published between 2008 and 2020. We filter the data to obtain quotations relating to China that were uttered by a person with a Western background, as defined by their nationality.

The data was filtered by checking whether specific keywords relating to China appear in the title, the quotation, or the local left and right context of the quotation. We checked for the following keywords:

['jinping xi', 'xi jing ping', 'xi jinping', 'president xi', 'china', 'chinese', 'beijing', 'peking', 'sino-']
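The keyword check can be sketched roughly as follows. This is a minimal illustration, assuming case-insensitive substring matching on each text field separately; the function name `mentions_china` and its parameter names are hypothetical, and the actual matching rules used for the project may differ.

```python
KEYWORDS = ['jinping xi', 'xi jing ping', 'xi jinping', 'president xi',
            'china', 'chinese', 'beijing', 'peking', 'sino-']


def mentions_china(title, quotation, left_context="", right_context=""):
    """Return True if any keyword occurs in at least one text field.

    Assumption: matching is a case-insensitive substring test, so a word
    such as "chinaware" would also match the keyword "china".
    """
    fields = (title, quotation, left_context, right_context)
    for field in fields:
        if not field:
            continue  # skip empty or missing fields
        lowered = field.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            return True
    return False


# Illustrative records only, not taken from Quotebank.
print(mentions_china("Trade talks resume", "We will reach a deal with Beijing."))
print(mentions_china("Local weather", "It will be sunny tomorrow."))
```

Checking each field on its own, rather than concatenating them, avoids accidental matches for multi-word keywords that would span the boundary between two fields.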

To determine whether a speaker has a ‘Western’ background, we check whether their nationality is one of the countries in the political definition of Western countries. As mentioned before, after filtering we are left with approximately 2 million article-quotation pairs.
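The nationality check could look roughly like the sketch below. The country set here is a hypothetical and deliberately incomplete placeholder, since the full list is not given in this text. It also assumes that a speaker may have several recorded nationalities and counts as Western if any one of them is in the set, which is a design choice of this sketch rather than a documented rule.

```python
# Hypothetical, incomplete placeholder: the actual list of countries used
# in the analysis is not given in the text.
WESTERN_COUNTRIES = {
    "United States of America",
    "United Kingdom",
    "Canada",
    "Australia",
    "France",
    "Germany",
}


def is_western(nationalities):
    """Return True if at least one nationality is in the Western set.

    Assumption: `nationalities` is a list of country names, possibly empty
    when the speaker's nationality is unknown.
    """
    return any(country in WESTERN_COUNTRIES for country in nationalities)


print(is_western(["United States of America"]))
print(is_western(["China"]))
```

Speakers without a known nationality are excluded under this sketch, because an empty list never matches.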

Overview of the Filtered Data

Below is a map of the number of quotations per country in the Western world. Note that QuotationScale is the logarithm of the number of quotations. The vast majority of quotations come from the US, with far fewer coming from all other countries. When interpreting the results, please keep in mind that most of the data underlying this investigation comes from the US.
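To illustrate why a logarithmic scale helps when one country dominates, here is a minimal sketch of how such a scale could be computed from per-speaker nationalities. The base of the logarithm is not stated in the text, so base 10 is an assumption, and the function name `quotation_scale` is hypothetical.

```python
import math
from collections import Counter


def quotation_scale(nationalities):
    """Count quotations per country and return log10 of each count.

    Assumption: base-10 logarithm. The map's QuotationScale may use a
    different base.
    """
    counts = Counter(nationalities)
    return {country: math.log10(n) for country, n in counts.items()}


# Made-up counts for illustration only, not the real distribution.
sample = ["US"] * 1000 + ["UK"] * 100 + ["France"] * 10
for country, scale in quotation_scale(sample).items():
    print(country, round(scale, 2))
```

With these made-up counts, a hundredfold difference between the largest and smallest country becomes a difference of 2 on the scale, which keeps the smaller countries visible on the map.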