The Dangers of Data Illiteracy: Coronavirus and Data Distortions

By John Ladley and Len Silverston

People are unconsciously incompetent at communicating with
data. This applies to our personal and professional lives. This ignorance
creates an ethical issue that is profoundly important to all business and
societal communities.

This inability to understand and communicate with data has caused aggravation and divisiveness. One can argue that data issues around COVID-19 have cost lives. This is a problem for society to address, not just governments and companies.

A key issue is that users of data manipulate data to support
and foster their own position. People regurgitate charts and other data without
an understanding of the source or context. Many “official sources of data” are
biased or taken out of context. Often the same data seems to result in
differing opinions or is dismissed because it does not support a certain context
or point of view. This behavior has certainly been a factor in the current
pandemic. 

We have reached a tipping point, but none of this is
surprising. Humanity fails at data, which is astonishing given how much of it we produce.

The authors of this article are “data professionals.” We
help organizations get “data literate,” improving the management and use of data. We
feel a broader view is required, and we need to address social Data Literacy,
i.e., society adopting standards of behavior for communicating and using data.
Think of other standards we use in communicating, such as in legal matters —
terms are defined at the beginning of a contract. We are accustomed to
standardization in communication. With data, anything goes. There are no
guardrails or commonly accepted standards for communicating and using data in
our society. We are data illiterate, and if we don’t become more “data literate,”
harmful trends will continue.  

People have always spread information well before it is
verified. Historically, data has been filtered or adjusted to support opinions,
motivations, perceptions, stories, and judgments. Rumors around the village
campfire gave way to court gossip, which gave way to yellow journalism. Now we have
false internet posts. None of this is new.

However, we are now in a period of human existence where
data growth is exposing us to greater numbers of opinions and judgments. Our
ability to effectively perceive is compromised since we are overloaded. The
viral aspects of the internet encourage nasty and imprudent actions. Data is no
longer just information you pull in and consider. It is intertwined in all
human activity. Ethically, as a species, we can no longer tolerate the
attitudes that led to yellow journalism and now lead to the manipulation of
elections, environmental actions, health trends, and other endeavors.    

Data should represent the truth or at least come close to it. But in
today’s world, the vast proliferation of data means each of us will be a perpetrator or
victim of the misuse of data.

This is a new problem. We now live in a world where data is
ubiquitous. We did not see this coming. There are many examples that
demonstrate our challenges, and we are living such an example now.

Our current challenge, specifically, is a novel coronavirus that
causes a disease called coronavirus disease 2019, shortened to COVID-19. [1] We have
all seen the published data and charts, with one side saying “We should do A”
based on the data, and the other side saying “We should do B” based on the same
data. The following are examples that illustrate this Data Literacy problem.

A news story compared the 2002-2004 SARS outbreak to the
COVID-19 pandemic. It mentioned the total deaths from SARS, but only in the
context of what it interpreted as a “higher percentage of fatalities from SARS vs.
COVID-19.” The story then concluded SARS was far more dangerous, yet the global
economy was not shut down. Based on that interpretation, the news story
editorialized that society was overreacting. Some government officials accepted
information such as this and limited their actions based on these perceptions. You
could argue that people died because of this data interpretation.

From a data standpoint, there were multiple issues with the above.

  • The percentage of fatalities was measured
    differently. The population affected by SARS was much smaller than that affected by COVID-19. The
    data sampling was different, and, thus, the comparison is not uniform or fair
    without taking the various differentiating factors into consideration.
  • The aggressive reaction taken in response to SARS,
    such as regions in Asia shutting down travel, was not mentioned in the news story.
  • Worst of all, SARS is over. It mutated. It faded
    away. COVID-19, as of this writing, is expanding. We do not have all of the
    COVID-19 data yet, so the time frames cannot be accurately compared.

In layman’s terms, we aren’t talking apples to apples. The
article could be right or wrong. There is no way to draw a reasoned conclusion
if you do not understand the proper use of data. But that did not stop people
from acting upon skewed judgments.
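
The measurement problem in the bullets above can be made concrete with a little arithmetic. The following is a minimal sketch, not an epidemiological method: the `Outbreak` class and every number in it are hypothetical, chosen only to show how the same death counts produce different “fatality percentages” depending on the denominator and on whether the outbreak has finished.

```python
from dataclasses import dataclass


@dataclass
class Outbreak:
    """Hypothetical outbreak summary; all figures below are invented."""
    name: str
    cases: int      # confirmed cases reported so far
    deaths: int     # deaths reported so far
    recovered: int  # cases known to have recovered

    def naive_fatality_rate(self) -> float:
        """Deaths divided by all confirmed cases, resolved or not."""
        return self.deaths / self.cases

    def resolved_fatality_rate(self) -> float:
        """Deaths divided by cases whose outcome is already known."""
        return self.deaths / (self.deaths + self.recovered)


# An outbreak that is over: every case has an outcome.
finished = Outbreak("Outbreak A (over)", cases=8_000, deaths=800, recovered=7_200)
# An outbreak that is still growing: most cases have no outcome yet.
growing = Outbreak("Outbreak B (ongoing)", cases=1_000_000, deaths=30_000, recovered=300_000)

for o in (finished, growing):
    print(
        f"{o.name}: deaths={o.deaths:,} "
        f"naive={o.naive_fatality_rate():.1%} "
        f"resolved={o.resolved_fatality_rate():.1%}"
    )
```

In this made-up case, the ongoing outbreak has the lower naive percentage but far more deaths, and its percentage moves a great deal depending on how unresolved cases are treated. Neither figure is the “right” one; the point is that a headline percentage means little unless the denominator and the time frame are stated.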

Here are other so-called “facts” about COVID-19. The
following quotes were published and spread widely on social media:

  • “Every election year has a disease. SARS-2004 Avian-2008 Swine-2010 MERS-2012 Ebola-2014 Zika-2016 Ebola-2018 Corona-2020.”
  • “COVID-19 has a 99.7 percent cure rate for people under 50, and its spread is leveling off.”
  • “Coronavirus has a contagion factor of 2, SARS was 4, and the measles was 18.”

One of the authors, Len, saw variations of these points in a healthcare professional’s office. However, all of the above statements were challenged by many sources. We all know, intellectually, that posting something on the internet does not make it a “fact.” Yet there is evidence that continuously sharing and forwarding these so-called “facts” creates an aura of truth. Many people accepted these “facts” and acted accordingly, which could have increased the spread of the disease.

Besides the accuracy of “facts,” consider muddled terminology. The above-mentioned social media posts called these “coronavirus facts,” comparing coronavirus with SARS. But coronaviruses are a family of viruses: some cause the common cold, another is SARS-CoV (which caused the 2002-2004 outbreak), and another is SARS-CoV-2, which causes COVID-19. When “facts” are stated with inconsistent definitions, the doors to distortion are opened. When a category of virus is confused with a specific disease, the data is not distinguished or classified properly, and accurate measurement is difficult.

Understanding context is key to Data Literacy. People often look at isolated pieces of information out of context, and this skews their conclusions. For example, South Korea was declared one of the more dangerous countries because of the large number of people testing positive for COVID-19. The same data was used to compliment South Korea because it had run almost one test for every 150 people. Because South Korea tested many more people than most countries as a percentage of its population, it appeared to have more sick people. Various presenters ignored this context: the number of positive tests alone does not give the truthful picture. Again, opinion was applied to the interpretation instead of viewing appropriate contextual factors.
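
The testing example comes down to which denominator is reported. As a rough sketch, assume two hypothetical countries with the same population; the names, the `summarize` helper, and all of the figures are invented for illustration and are not actual COVID-19 statistics.

```python
# Hypothetical countries; every number here is invented for illustration.
countries = {
    "Country X": {"population": 50_000_000, "tests": 330_000, "positives": 9_000},
    "Country Y": {"population": 50_000_000, "tests": 30_000, "positives": 4_500},
}


def summarize(stats: dict) -> dict:
    """Put the raw positive count next to the context needed to read it."""
    return {
        "positives": stats["positives"],
        # How many residents there are for each test performed.
        "people_per_test": stats["population"] / stats["tests"],
        # Share of performed tests that came back positive.
        "positivity_rate": stats["positives"] / stats["tests"],
    }


summary = {name: summarize(stats) for name, stats in countries.items()}

for name, s in summary.items():
    print(
        f"{name}: positives={s['positives']:,} "
        f"one test per {s['people_per_test']:.0f} people "
        f"positivity={s['positivity_rate']:.1%}"
    )
```

Ranked by raw positives, Country X looks worse. Ranked by the share of tests that are positive, Country Y looks worse, and it has also tested far fewer of its people. The same table supports both headlines, which is why the context has to travel with the number.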

A lack of understanding of basic statistics is an obstacle to Data Literacy. The difference between correlation and causation is important. A correlation in the data does not mean that there is causation, in other words, a cause-and-effect relationship between the variables. For instance, a study suggested that a certain vaccination offers some level of protection against infection by the novel coronavirus and even reduces mortality. This sounds promising. However, experts communicated that we must be careful not to suggest that the vaccination definitively affects the mortality rate. Thus, correlation is useful to suggest a plausible hypothesis, but clearly, “We need more data from trials to be able to say anything with confidence.”
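
One way to see why correlation alone is not enough is to simulate two measurements that never influence each other but share a hidden common driver. The sketch below uses purely synthetic data: the variable names, the hidden factor, and the noise levels are assumptions made for illustration and say nothing about the actual vaccination study mentioned above.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2_000

# Hidden factor that drives both measurements (hypothetical, e.g. general
# access to healthcare). Neither measurement affects the other.
hidden = [rng.gauss(0, 1) for _ in range(n)]
vaccination_score = [h + rng.gauss(0, 0.5) for h in hidden]
survival_score = [h + rng.gauss(0, 0.5) for h in hidden]

# Correlation as an observer would see it, without knowing the hidden factor.
raw_r = pearson(vaccination_score, survival_score)

# Correlation after subtracting the hidden factor from both measurements.
# This works here only because the simulation's structure is known.
adjusted_r = pearson(
    [v - h for v, h in zip(vaccination_score, hidden)],
    [s - h for s, h in zip(survival_score, hidden)],
)

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

The raw correlation is strong even though, by construction, neither score causes the other; once the shared factor is removed, almost nothing is left. In real data the hidden factor is usually not known, which is why controlled trials are needed before a causal claim is made.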

The issue is now one of ethics as it relates to data. Spreading
misinformation causes harm. Interestingly, we use data every day for ourselves with
few problems. We compare prices. We review the statistics of our favorite team.
But passing on inaccurate or inappropriate data, or misinformation, is an ethical
issue. When you do something that creates harm, you have done something
unethical.

So, what is to be done?  

First, before automatically sharing data, create some space
and wait. “Before fear makes you press ‘share,’ take a deep breath and check.
A dose of caution can stop you from making a bad situation worse.”

Second, do not assume that data is factual. Know
the difference between a story with data and the data itself. Consider
asking the following questions:

  • How true is this?
  • What are possible biases?
  • What about this is a fact or a story?

When accessing data, understand the data supply chain. What is the source of your data? What are the underlying motivations of the data suppliers? Sites like Media Bias/Fact Check (MBFC) and Snopes provide some insight into biases in data sources and offer fact-checking.

Third, consider the context used in creating the data. Often
an isolated fact is exploited without reference to the bigger picture. Your
use of the data may be in opposition to that context. Understand the variables in
use. Don’t look at one piece of data in isolation. What are the important
factors about the data you receive?

Fourth, and perhaps most painfully, we all need to grasp
some basic statistics to be responsible data citizens. Besides the
aforementioned difference between correlation and causation, statistics behind
data can be very misleading. Show your data with statistical margins of error
and probabilities. Be careful about rounding numbers to prove a point.
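
As a small illustration of a margin of error, the sketch below applies the standard normal-approximation formula for a proportion, z * sqrt(p * (1 - p) / n), to a hypothetical survey result. The survey counts are invented, the `margin_of_error` helper is our own, and the formula assumes a simple random sample that is reasonably large.

```python
import math

Z_95 = 1.96  # approximate z-score for a 95% confidence level


def margin_of_error(successes: int, n: int, z: float = Z_95) -> float:
    """Normal-approximation margin of error for a sample proportion."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


# The same observed 52%, from three hypothetical sample sizes.
for successes, n in [(52, 100), (520, 1_000), (5_200, 10_000)]:
    p = successes / n
    moe = margin_of_error(successes, n)
    print(f"n={n:>6}: {p:.1%} ± {moe:.1%} (from {p - moe:.1%} to {p + moe:.1%})")
```

With 100 responses, “52 percent” is compatible with anything from a clear minority to a clear majority; only the largest sample supports saying “more than half.” Reporting the bare 52 percent, or rounding it to “a majority,” hides that difference.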

Fifth, consider the amount of data that was used to provide
evidence of truth. In general, a larger data set with uniform data collection
processes can provide a more reliable analysis. This includes limiting the
types of data to what is most important for the analysis. Collecting a great
variety of different variables can confuse and complicate the findings, making
it difficult to assess useful data.

Sixth, do a gut check on the data. Check in to see if the
data seems truthful. In addition to intellectual understanding, use intuitive
capabilities to assess how true things are. [2]

Challenge yourself when you receive or send data. Is it
possible to pause, question the veracity of the data, consider context,
correctly use applicable statistics, consider the amount of supporting
evidence, and/or intuitively check in regarding how true the data is?

References

[1] A new strain of virus, not previously
identified in humans, per the World Health Organization Regional Office for
Europe.

[2] For example, studies using muscle
response testing (MRT) have shown that people’s muscles often test stronger
when presented with true statements and weaker when presented with false
information, thus helping to distinguish lies from truths.
