What the Heck is "Data" | Breaking Down Big Data for Business | Data 101 in 15 minutes
What is data? We'll answer that and then get into some Data 101. Just so you know, I'm developing a course on this; once that's available I'll add the link in the description below. In the meantime, let's start with a quick and dirty, high-level discussion about data.
So what is data? Literally everything around us has the potential to become data. But the way I define it, data is something that is explicitly defined by a person or a business, and that is then observed or measured, and recorded.
Let's think about that in the context of what's going on right now. You're watching this video, and we could start collecting data about this interaction. First we need to define what we want, since there's an infinite number of options, so we start with what is going to provide value to the business. Let's define a few of these things.
Location: where is everybody who's watching this video located? That could be valuable to internet service providers. They want to know where the population densities are if they want to expand service or bring internet to new places. What are you wearing: looking at your clothing can be valuable to somebody who wants to design the next Snuggie for people lounging around watching YouTube videos. The device you're using: obviously, device manufacturers want to know the trends in the devices that people are using.
That way they can decide what to make next, where to improve, and where to invest, because ultimately, if they make something that's of value to you, it's of value to the business: it increases sales, things like that.
Are you an adult or not? Marketers love this type of demographic information, because it lets them make something you're very used to by now, I'm sure: targeted ads. Marketers consume data at an amazing level nowadays, and a data point as simple as "are you an adult" is important to them. Look at YouTube Kids: very different ads show up there than on something like this, which is tailored for adults or businesses.
Then there are things that might be of use to me as a creator, or to advertisers who might want to use my videos. How long did you watch this video? Did you give up two minutes in? Then it probably wasn't useful, and I might want to change things up. How many videos have you watched? That lets me see what's getting the most views and make content that's similar. And even something like: when are you watching this?
I hope that helps demonstrate what it is. Data can literally be anything, but it does not become data until you define it, and you define it because it brings some value; therefore you're going to observe it and record it.
Now let's talk specifically about business data. Data can be used for a multitude of things, but we want to talk about business data.
When we say "record it," we want to be recording it in systems. The things I mentioned before could very easily be taken down with pen and paper: we could write down the observations we made or the measurements we took and then just set them aside. But that's not really of use to us as a business. When we think of business data, all this data gets put into a data repository. I'm sure you've heard of databases, data warehouses, or data lakes. We want to put the data in these systems because that is what ultimately makes it accessible.
I'm going to keep mentioning this point, because I want it to underpin all of our discussions about data and big data: using the data should be the ultimate goal as a business. If we're collecting data, then we're paying for these systems, or paying for data that somebody else already collected, secondhand data. Ultimately our goal is to use that data, so we want to make it accessible and we want to make it actionable, and the way we do that is usually with data systems. That's a little bit of a tangent.
So I digress. Let's continue with Data 101. Data comes in different flavors: we have data categories, and then we have data types.
There are basically two categories: quantitative data and qualitative data. For quantitative data, think of numbers, things you can do math on: I'm measuring it, and therefore I can add, subtract, things like that. Qualitative data is observations, and these can be categorized and counted. I'm wearing a black shirt; how many other people out there are wearing black shirts? That's what I want you to think about with these two categories, quantitative versus qualitative.
There are different names for them. I usually like to call them attributes versus measures. Measures are simple: think of that as quantitative, because you measure it. Attributes are your qualitative data, because they're observed. The attribute of this shirt I'm wearing is that it is black.
So let's bring our table back up. There we go. Now I've added another column to say what category each of these data points falls in. As you can see, location is an attribute, what you're wearing is an attribute, device is an attribute, and "are you an adult" is an attribute. Then you get into things that are measured. How long did you watch: we can measure the time you watched the video. How many videos: the count of how many videos you have watched.
That's a measure. And then, when did you watch it? This one is a little tricky, but when you watched it would be considered a measure. You might be thinking, well, that's a calendar date; is that really a measure? Think about it like this: time is on a continuous spectrum, and we can measure where on that spectrum we are. Did this occur in 700 BC or AD 2021? When we think of it as a continuous line, a number line, it's easier to see that time is actually a measure. We can subtract dates from each other, things like that.
Now I want to go back to something I said before, that quantitative things are numbers. Think of dates as numbers. If you ever look at how dates are represented in something like Excel, it's actually a number. It depends on where you set the starting point for your dates, but you're able to measure it along this time scale, and it can be represented as a number.
So quantitative data, I want you to consider as numbers. But that doesn't mean that every number is quantitative. Let's look at our location here. What if we decided to define that location as the zip code?
So what if we said: what is the five-digit zip code of where you're watching this, within the United States? That looks like a number, and we'll call it a number time in and time out, every time we see it. But it is not a quantitative piece of data, because we don't do math on zip codes. You wouldn't take my zip code minus your zip code to come up with any sort of answer. So that would actually be qualitative, as we've annotated here: it's an attribute of where you are living.
Let me make that distinction again: quantitative data is a number, but not all numbers are quantitative. Whenever you're deciding what category something should be, ask: am I going to be doing math on this? Can I subtract these two values? If not, it's an attribute, a category. Another common one is employee IDs, or any sort of ID. That's usually a number, but it's actually qualitative: it's an attribute of a person.
With that being said, that's a good segue into data types. These can be broken down into a lot more detail, but like I said, we're just doing quick and dirty. For all intents and purposes, when we talk about the basic data types, there are four of them: text, number, date and time like we mentioned, and binary.
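Here is a minimal sketch of those four basic types and the "can I do math on it?" test, in Python. The field names and sample values (the `viewing_record` dictionary, the zip codes, the dates) are made up for illustration and are not from the table in the video.

```python
from datetime import date

# Hypothetical record for one viewer; names and values are invented.
viewing_record = {
    "zip_code": "02134",              # text: looks like a number, but it is an attribute
    "is_adult": True,                 # binary: only two possible values
    "minutes_watched": 12.5,          # number: a measure you can add or average
    "watched_on": date(2021, 3, 14),  # date/time: a point on a continuous timeline
}

# Measures support arithmetic: subtracting two dates gives a meaningful answer.
days_since_new_year = (viewing_record["watched_on"] - date(2021, 1, 1)).days

# A date can also be represented as a plain number
# (here: days counted from 0001-01-01, with that day as 1).
as_number = viewing_record["watched_on"].toordinal()

# Attributes get counted or grouped, not subtracted.
zip_codes = ["02134", "90210", "02134"]
viewers_per_zip = {z: zip_codes.count(z) for z in set(zip_codes)}

# Treating a zip code as a number also silently drops its leading zero.
as_int = int(viewing_record["zip_code"])
```

The zip code is kept as text on purpose: nothing useful comes from subtracting two of them, and converting `"02134"` to an integer loses a digit.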
Okay, so let's look at our table, where I've added in the types of data. As we mentioned already with location, we'd want to define that as text data, even if it's a zip code. We want to treat it as text because it's an attribute. Usually attributes are going to be text. Usually, not always: go down to the adult question. "Are you an adult" is a good example of binary, meaning there are only two options: one or zero, on or off, true or false. In this case, if you're asking "are you an adult watching this," it's either a yes or a no, so that would be a binary data type. The next two would be numbers: how long have you watched, and how many have you watched. And when you watched this would be our date/time data type.
Now, why is the type of data that you define important? Like I said, you are defining what data you want to collect and why you want to collect it, and, when we're talking about business, you're deciding where it's going. As these system repositories are created, along with the analytics and the reporting, the data types matter.
For starters, data, even though it's virtual, just like everything else on your computer, takes up space and requires memory to be processed. Depending on the data type you have and how many characters are in each data value, it's going to be more memory intensive or less memory intensive. So let's look at these four basic data types. Binary would be the least memory intensive, and text would be the most memory intensive.
Why is that? Let's compare "what are you wearing" with "are you an adult." "Are you an adult" is always going to have only one character, a yes or a no. Say you had a billion data points, a billion observations. For each of those billion, the computer only has to evaluate that one character: is it a one or a zero, a yes or a no? Whereas something like "what are you wearing" could be multiple characters. Let's say it's a Snuggie. I don't know why I'm talking about Snuggies, but whatever, you're wearing a Snuggie. Now we have, what is that, seven characters. Multiply that by a billion observations, and then think of it from the computer's side of things.
Let's say you want your computer to compare the differences between these answers. Computers have to compare these things one character at a time. If you wanted to compare how many people are adults versus how many aren't, the computer just has to ask: is this a one or a zero? Next one: is it a one or a zero? All the way down to a billion. Whereas with "snuggie" it has to ask: is the first character an S? Is the second character an N? Is the third character a U? And so on. And it's no longer just one or zero, two options. You get the entire alphabet, so 26 letters. You get the entire number pad, which is 10 digits.
You get all the special characters that can be used in text, and so forth. So the number of things it has to evaluate, especially when comparing these values to each other, becomes much larger. That's why binary is generally going to be your fastest and text is generally going to be the most intensive: there are more characters, and more possibilities to evaluate for each character. Obviously, if we're able to limit the character length of a text field, then it becomes more optimized and faster for the computer to process, but that's going into a little more detail. I just want you to understand those basics. That's the first thing we should consider with data types.
The second thing, and the more important thing, is how they interact with each other. For all intents and purposes, you cannot be mixing data types together. Think again about quantitative versus qualitative. Let's say you want to start comparing things across the board with these different data points we've identified; say you're trying to evaluate location together with when the video was watched. Now you're intermixing these data points, and you can't create formulas with different data types. We could take steps to convert the data into numbers that we could then use in formulas, but at the outset we cannot be intermixing the data types.
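Both points can be sketched in a few lines of Python. This is a rough illustration under simple assumptions: one byte per yes/no value and one byte per plain ASCII character, which real databases will not match exactly. The variable names and the weekly video counts are made up.

```python
import struct

OBSERVATIONS = 1_000_000_000  # a billion rows, as in the example above

# Point 1: rough storage per value, assuming 1 byte per ASCII character.
bytes_per_flag = struct.calcsize("=?")           # one packed true/false value
bytes_per_word = len("snuggie".encode("ascii"))  # one byte for each character

flag_total = bytes_per_flag * OBSERVATIONS
word_total = bytes_per_word * OBSERVATIONS

# Point 2: different data types don't mix in a formula until you convert them.
videos_this_week = 4      # number
videos_last_week = "3"    # text that happens to look like a number

try:
    total_videos = videos_this_week + videos_last_week
except TypeError:
    # Python refuses to add a number to text; convert first, then calculate.
    total_videos = videos_this_week + int(videos_last_week)
```

Under these assumptions the text column needs several times the space of the yes/no column before any comparison work starts, and the addition only succeeds once both values are the same type.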
Generally, if you want to be mixing things together, they should be the same data type. That's why it's important, number one, to define data types ahead of time whenever we're creating a data system or a reporting structure, if we have that ability: all right, let's optimize the system, here's the data that I need and here are the data types they should be. And then, on the back end, after the system is built and we're going to create reporting, I need to know the data types I'm using, so that I know whether I can blend things together or whether I need to take those steps to make it possible to do analytics and aggregations.
Okay, I want to close by going over some key things. Number one: what is data? Again, it can literally be anything, but it does not become data until you have defined it, and you defined it because it is of value to you or your business. Ultimately, that's the answer to this video's question of what data is.
Again, data gets stored in systems for business use because we want it to be accessible and actionable. And that data takes up space on these systems, and these systems cost money to implement, to maintain, or even just to use.
If we're thinking about platform as a service and things like that, all of that costs money. Think about your data plan. With AT&T, I have a plan that comes with 30 gigabytes of hotspot data per month. The point is that data, even though it's virtual, can be thought of as having weight, mass, and volume, because you have to pay for it. You have to pay for the processing power, and if you're a business with these large systems, you're paying to maintain them. So when we think about data from the business side, if it's not accessible and it's not actionable, it's worse than useless, because you are paying for it.
So there you have it. We discussed what data is and even went over some basics of Data 101. If you learned something or liked it, please be sure to like, subscribe, and leave a comment.