Nov 28

Information vs. Data - Are you conscious?

Introduction

In this article, we will discuss the difference between data and information. There are many ways to approach this topic. Here, we will explain the main concepts behind them and connect them to our daily lives. We will also cover information theory, but don’t worry if there is some math involved: this article is not about math. The goal of this article is to make you conscious of what surrounds you.

Please read this article to the end to understand how everything connects. You will not regret it; I promise.

The calculations are done with Python, but you can easily follow them using a calculator and the provided equations (only simple math).

Informal definition

Let’s start with a simple explanation of what data and information are and how they differ. Depending on the author, you will find different definitions, some more detailed than others. Here I will try to keep it as simple as possible:

Data: Individual facts that are neither organized nor self-explanatory. For instance: Revenue: 3,800 USD; Month: December; Customers: 14. On its own, each of these data points doesn’t tell us much until we try to connect them. That brings us to the next concept.
Information: An organized and connected set of facts that can be interpreted to provide insights. For instance: In December we made 3,800 USD in revenue, which is expected, since December is usually the month with the fewest customers.

How to organize, connect, and interpret data to get information is a big topic, and depending on the situation, different techniques can be applied. For now, keep in mind the map below.

You have to “mine” your data to get the gold out of it.

Now that we have an intuition about what data and information are, let’s dig deeper into the differences. What does data not have that information has? Or in other words, what does data need to become information?

The three components of information

Right now, if I handed you a spreadsheet, you would probably ask me what you are looking at. You would ask me which area that data belongs to, or you would try to connect that data on your own. In all cases, you are trying to figure out the Context. If you want to turn data into information, you must understand what the data refers to. What is happening in that context? Coming back to the example above, if you want to know whether a revenue of 3,800 USD in December is high, you need to know how much it was in previous months, the previous year, and so on.

Now, not all data is the same. If I sent you data about a technology company, it could be the results of a marketing campaign, the financial analysis for the previous year, or any other kind of data. Let’s suppose that you are a financial analyst. You would feel comfortable reading the financial report: you are used to seeing these metrics, and you know how they are calculated and how to interpret them. What if I accidentally gave you the marketing report? You would feel lost at first. That’s because you may not have the Domain Knowledge to read that kind of report. Different areas require different domain knowledge. Even if all areas rely on data at some point to get information, without the right domain knowledge you will have trouble getting there.

Let’s keep the previous example. You’re a financial analyst, and you need to make a decision about the company’s next step. A first option could be to look at the data to get the information you need to make that decision. But, surprise! The company you are working for delivers its data with a one-month delay. Depending on the kind of decision you make, one month of data could be crucial. That’s why Timing is so critical: data is only useful if it helps us solve a problem or understand what is happening, and if it arrives too late, it’s worthless. As a side comment, if you have never worked with data, let me tell you that there are many reasons why data can be delayed.

To get information you have to take into account CONTEXT, DOMAIN KNOWLEDGE, AND TIMING

By now I hope you have a pretty clear idea of what data and information are, and what you need in order to get information from data. Now, come with me to the dark side, where you will find math and information theory.

The Information theory

Introduction

What is information theory? This theory describes how information is transmitted over different channels, and how we can measure it no matter which channel is used. A picture contains information, just as a written message or a voicemail does, and we can measure the amount of information they contain and compare them. The question is: how do we compare that information? With a single unit to measure it. Say hello to the Bit. Before explaining the interpretation of a bit, let’s start with an example.

Let’s say you want to know if a child likes chocolate. What would you do? You can just ask. That is an example of a binary question: the possible answers are True (Yes) or False (No). That message has a value of 1 bit. But why 1 bit? Because the question splits the possible answers into two equally sized groups (assuming both answers are equally likely).

Let’s continue with an example that has more possible answers to make the concept clear. Say that we have 32 blocks, and we have to figure out which one is the diamond block.

In the previous example we had only 2 possible answers; here we have 32. We need a binary question that splits the answers into two equal groups to start narrowing down the right block. One example would be: Is the diamond block on the right side? The answer is yes, so we can discard all the blocks on the left side. As you can see, with that question we discarded half of all the possible answers; the other half are the answers that are still possible. In this article, we will call this kind of question a “standard binary question”.

Using another standard binary question, we can ask our second question: Is the diamond block on the upper side? Yes, so let’s discard the rest. Notice that, again, we discarded half of the remaining possible answers.

As you can see, we are getting closer to the diamond block. Let’s speed this up: after asking all the standard binary questions needed, this is what the graph looks like.

Now, how many standard binary questions did we ask to get to the diamond block? Each red line is one question, so we asked 5; you can check it yourself. So, the information that tells us where the diamond block is has a value of 5 bits.

Finally, what is a Bit? A Bit is one standard binary question, and information is measured in Bits. In other words, we measure information as the number of standard binary questions we would need to ask in order to get the same result. In this case, we would need to ask 5 standard binary questions. Since each standard binary question adds 1 Bit, with each question I am 1 bit closer to reaching the right answer.
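The halving process above can be sketched in a few lines of Python. This is a minimal illustration under a simplifying assumption, not the author’s code: the 32 blocks are numbered 0 to 31 in a single line rather than laid out in a grid, and `count_standard_questions` is a hypothetical helper name. Each loop iteration is one standard binary question that discards half of the remaining candidates.

```python
def count_standard_questions(num_blocks: int, diamond: int) -> int:
    """Count the halving questions needed to locate `diamond` among
    `num_blocks` candidates (assumes num_blocks is a power of two)."""
    low, high = 0, num_blocks  # remaining candidates are low .. high-1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1
        # Standard binary question: "Is the diamond in the upper half?"
        if diamond >= mid:
            low = mid   # yes: discard the lower half
        else:
            high = mid  # no: discard the upper half
    return questions


# Wherever the diamond is hidden, 32 blocks always take 5 questions.
print({count_standard_questions(32, d) for d in range(32)})  # {5}
```

Whichever block holds the diamond, the count is the same, which is why we can say the answer is worth 5 bits before knowing where the diamond is.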

The example above with 32 possible answers took a bit of time, so suppose we want to find out the PIN code of a credit card. Since the code is 4 digits long, there are 10000 possible answers. We can’t follow the approach above by hand when the number of answers is this large. Luckily, there is a mathematical way to do it. To calculate the number of bits in a piece of information, we can use the equation below, where x is the number of possible answers:

$$\text{bits} = \log_2(x)$$

How to calculate it?

Let’s apply it to the example we already analyzed with the blocks:

$$\log_2(32) = 5$$

The result is 5, which matches our previous result.

Now, let’s apply that equation to find out how many binary questions (bits) we would need to ask to get the PIN of a credit card:

$$\log_2(10000) \approx 13.29$$

The result is about 13.29 bits. We can apply the same equation to any case where all the possible answers are equally likely.
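The equation is a one-liner with Python’s `math.log2`. This is a minimal sketch; `bits` is a hypothetical helper name, and the rounding-up step is an addition for illustration: since you can’t ask a fraction of a question, a real sequence of yes/no questions needs the next whole number.

```python
import math


def bits(possible_answers: int) -> float:
    """Bits needed to single out one of `possible_answers` equally
    likely answers: log2(x)."""
    return math.log2(possible_answers)


examples = [("child likes chocolate?", 2), ("diamond block", 32), ("4-digit PIN", 10_000)]
for label, x in examples:
    # Round up: a fractional bit still costs one whole extra question.
    print(f"{label}: {bits(x):.2f} bits, questions needed: {math.ceil(bits(x))}")
```

For the PIN, 14 questions suffice because 2^14 = 16384 is at least 10000, while 13 questions (2^13 = 8192) are not enough.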

How to calculate the amount of bits of an image?

If you remember, I told you that with this theory we can calculate the number of bits in any message, no matter the channel. Let’s look at an example with photos. Note that you don’t need this example to understand the difference between data and information from a mathematical point of view. I’m adding it here to show that we can indeed measure the information in bits for any message, no matter the channel.

This is a photo of a cat. From a technical point of view, there are many ways to represent a photo; one of the most popular is RGB. It means that each photo consists of three layers (one for each of the colors Red, Green and Blue), so every pixel is the result of combining these three layers at different intensities. The intensity of each layer, for each pixel, is represented by a number between 0 and 255 (256 possible values). For example, one possible combination for a single pixel could be:

Pixel 1 = 26 * Red + 255 * Green + 0 * Blue

The result of that equation is the color we would see in the image at that pixel.

Now that we know how an image is built, let’s calculate the number of bits. Suppose a picture is 120 pixels x 120 pixels, and it has three layers. So, the number of data points in that image is:

$$120 \times 120 \times 3 = 43200$$

The result is 43200 data points.

Now, how many standard binary questions do we need to pin down a single data point, given that it can take 256 different values?

If you use the information equation above, you will see that we get 8 bits:

$$\log_2(256) = 8$$

But we have 43200 data points, not just 1, so we have to multiply these 8 bits by the number of data points. The complete equation is:

$$120 \times 120 \times 3 \times \log_2(256) = 43200 \times 8 = 345600$$

Doing the math, we get 345600. This means the image has 345600 bits of information; in other words, you would need 345600 standard binary questions to define that image.
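The same arithmetic in Python, as a minimal sketch: `image_bits` is a hypothetical helper, and, like the calculation above, it assumes every channel value is one of 256 equally likely values and ignores any compression.

```python
import math


def image_bits(width: int, height: int, channels: int = 3, levels: int = 256) -> int:
    """Raw bits in an image where each channel value can take `levels`
    equally likely values."""
    data_points = width * height * channels  # e.g. 120 * 120 * 3 = 43200
    bits_per_point = math.log2(levels)       # log2(256) = 8 bits
    return int(data_points * bits_per_point)


print(image_bits(120, 120))  # 345600
```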

Again, if you are not familiar with the technical representation of an image in the RGB system, this example could be challenging. You don’t need to understand it to follow the rest of the article. But take a moment to think about it: isn’t it amazing that there is a way to compare the amount of information in a photo with the amount of information in a sentence, in a song, or in any other channel?

Can we do better?

Now the fun part begins. Do you remember the example of the blocks? Well, what happens if I start by asking whether the diamond block is in the second row? Perhaps I have already gathered some contextual information suggesting that the block is more likely to be in the second row, and that’s why I ask that question.

With the previous equation, I can’t measure this. However, based on the image above, I can see that by asking this question I would do better than with a standard binary question. Why? Because I’m discarding more wrong options than the half I would discard with a standard binary question.

However, if we change our equation a little bit, we get another equation that lets us calculate how much information we gained from that NON-standard binary question:

$$\text{bits} = \log_2\left(\frac{b}{a}\right)$$

Now our equation depends on two variables: b is the number of possible answers before you ask the question, and a is the number of answers remaining after asking it. Before selecting the second row we had 32 options, and now we have 8, so:

$$\log_2\left(\frac{32}{8}\right) = \log_2(4) = 2$$

The result is 2 bits (remember that the total number of bits we calculated for this case was 5). So, that question is twice as good as a standard binary question. In other words, by asking that question we gained twice as much information as with a standard binary question.

Now, let’s do it the other way around. What if I have some wrong context indicating that the diamond block is in the third row? Before the question I would still have 32 options. After it, I would have 24, since none of the blocks in the third row is the diamond block:

$$\log_2\left(\frac{32}{24}\right) \approx 0.415$$

The result is about 0.415 bits, which means that the wrong context led us to ask a poor question that is worth less than 1 standard binary question.

As we can see here, the right context can help us get to the answer faster, but the wrong context can also slow down our journey a lot.
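The two-variable equation can be checked the same way. In this minimal sketch, `question_bits` is a hypothetical helper name; `before` plays the role of b and `after` the role of a, and the layout of 4 rows of 8 blocks is inferred from the counts in the text (32 blocks, 8 per row).

```python
import math


def question_bits(before: int, after: int) -> float:
    """Information gained by a question that narrows `before` possible
    answers down to `after` remaining ones: log2(b / a)."""
    return math.log2(before / after)


print(round(question_bits(32, 16), 3))  # standard binary question: 1.0
print(round(question_bits(32, 8), 3))   # "second row?" (right context): 2.0
print(round(question_bits(32, 24), 3))  # "third row?" (wrong context): 0.415
```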

How much information does a question add?

Until now we have worked only with standard binary questions. With the examples above, we showed that, given the context, we can come up with other questions that perform better or worse than standard binary questions. But can we generalize how much information we can gain from a question? Yes, and for this we will do a little bit of math. Please pay attention to this.

If you look closely at the second equation, you will see that the argument of the logarithm is an inverted probability:

$$\log_2\left(\frac{b}{a}\right) = \log_2\left(P^{-1}\right), \qquad P = \frac{a}{b}$$

That P to the power of negative 1 means the inverse (reciprocal) of the probability; don’t pay too much attention to it. What is important to notice here is that if we turn the fraction around, a/b is the probability of the remaining answers given all the answers. Using the logarithm rule log(1/x) = -log(x), we can rewrite the equation as:

$$\log_2\left(\frac{b}{a}\right) = -\log_2\left(\frac{a}{b}\right)$$

And since a/b is a probability P, we can simplify that equation into:

$$\text{bits} = -\log_2(P)$$

With this, we can plot a graph that returns the value in bits for any possible question based on that probability.

From this graph we can see how much information a single question can add, based on the probability. If I ask a question that leaves only a few possible answers behind, the probability will be low and the gain will be huge. That’s because we eliminated far more than half of the answers, which is more than a standard binary question does.

Let’s read that graph together.

1st example:

What does a probability of 100% mean in this context? Let’s use the blocks case again as an example. We have 32 possible answers. What happens if I ask:
Is the diamond one of these 32 blocks?
The answer would be yes. However, since that question doesn’t discard any option, the amount of information it gains is 0, as you can see on the graph.

2nd example:

A probability of 50% means that half of the answers are left after our question. The bit gain associated with 50% is 1 Bit. This makes a lot of sense if you think about it! In the initial examples we said that 1 Bit is one standard binary question, which splits the answers into two equally sized groups. We are saying exactly the same thing here.

3rd example:

Finally, if the probability is less than 50%, in other words, if fewer than half of all the possible answers remain after our question, we are doing better than a standard binary question. As a result, our question gains more than 1 bit. The fewer answers we leave, the more bits we gain.
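The three readings of the graph can be checked numerically, with a few values standing in for the plot. This is a minimal sketch; `bits_from_probability` is a hypothetical helper name, and P is assumed to be the fraction of equally likely answers that remain after the question.

```python
import math


def bits_from_probability(p: float) -> float:
    """Information gained by a question whose remaining answers make up
    a fraction p of all answers (0 < p <= 1): -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if p == 1:
        return 0.0  # nothing discarded; also avoids printing "-0.000"
    return -math.log2(p)


# A text stand-in for the graph: bits gained at a few probabilities.
# 24/32 is the "third row" question, 8/32 the "second row" one.
for p in [1.0, 24 / 32, 0.5, 8 / 32, 1 / 32]:
    print(f"P = {p:.3f} -> {bits_from_probability(p):.3f} bits")
```

P = 1 gives 0 bits (1st example), P = 0.5 gives exactly 1 bit (2nd example), and every P below 0.5 gives more than 1 bit (3rd example). For 8/32 the result matches the 2 bits we got from log2(32/8).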

Data vs. Information

Now, I ask you, what is data and what is information? Do you remember the definitions with which we started?

Reread them and come back. Now, let’s relate them to the mathematical definition we made.

The first thing I have to do is correct how I described one of the equations. I started by saying that the first equation, $\log_2(x)$, gives the information in a message. In fact, that equation gives the amount of data in a message.

Using standard binary questions, we get the raw amount of data in a message. But with the right questions, given the Context, the Domain Knowledge, and the right Timing, we can make that data much easier to interpret. Using the right questions, we can gain much more information from it. But just as the right context helps us, the wrong context makes it much harder.

Do you remember how many bits of information we said the cat photo had? 345600. Do we really need 345600 questions to figure out that this is a cat photo, or can you do it with fewer?

Conclusion

In a way, data is a raw resource, like so many others we have on Earth, but at the same time it is very different. Every area needs it in one way or another. These days we are constantly overwhelmed by data, and our workplaces are no exception. The question is: how can we make information out of that data? Do we need to extract information from all the data we receive? That would be impossible, and we didn’t even discuss how deeply we should process each data source to gain more information.

We might not find an answer to all these questions in this article, but that doesn’t mean we shouldn’t think about them. Take them into consideration and be conscious of what surrounds you. Being conscious of that is the goal of this article.

There may be a lack of context, domain knowledge, timing, or all three, keeping you from accurately extracting information from your data. The problem is that most of the time we are not aware of it. The first step to improving is being aware: identifying the problem and finding a solution. That way, you will be able to get the information you need out of your data.

Each of us, as members of a society, communicates with others. It can be spoken or written, but when you present a dashboard or a report to another person, you are also communicating. In doing so, do you provide context in addition to the report? Do you make sure the person has the domain knowledge to understand it, and do you deliver everything in a timely manner? Only then will that person be able to make the most of what you’re presenting.

I hope that by now, reaching the end of this article, you have become a little bit more conscious of what surrounds you…

Alejando Attento