2. Structured and Unstructured Data
This lesson introduces the concepts of structured and unstructured data. We’ll also discuss the growth of unstructured data in the modern world.
What is Structured and Unstructured Data?
Structured data is data that has a clearly defined format. Usually, this is a tabular format, like the tables found in a database. For example, a table recording sales figures would be structured data.
Unstructured data is data that cannot be readily categorized. For example, text and images would be unstructured data. Finding insights from unstructured data is more difficult than finding insights from structured data.
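To make the contrast concrete, here is a minimal Python sketch. The sales figures, field names, and email text are all invented for illustration. It shows that a question about structured data can be answered directly from named fields, while free text offers no such fields to read from.

```python
# Structured data: every record has the same named fields.
# These figures are made up for illustration.
monthly_sales = [
    {"month": "Jan", "units": 120, "revenue": 2400.0},
    {"month": "Feb", "units": 95, "revenue": 1900.0},
    {"month": "Mar", "units": 140, "revenue": 2800.0},
]

# Because the format is defined, "what was total revenue?" is one line.
total_revenue = sum(row["revenue"] for row in monthly_sales)

# Unstructured data: free text with no predefined fields (hypothetical email).
customer_email = (
    "Hi, I bought two units in February and one arrived damaged. "
    "Could you send a replacement?"
)

# There is no "units" field to read here. Without further processing,
# all we can do directly is treat the text as a sequence of words.
word_count = len(customer_email.split())

print(total_revenue)  # 7100.0
print(word_count)     # 16
```

Answering "how many units did this customer buy?" from the email would require interpreting the text itself, which is why insights are harder to extract from unstructured data.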
The Rise of Unstructured Data
Traditionally, most business data was stored in structured formats. However, this is changing. More and more companies now collect and store unstructured data. For example, social media companies store images people upload, and the text of people’s posts. There is a need to generate insights from unstructured data.
Dealing With Structured and Unstructured Data
When working with applications like Alteryx, Tableau or Power BI, all your data must be stored in structured tables. In Excel, you can lay out your data in a structured manner, but it’s not a requirement, and you can create an unstructured layout instead.
When you collect data, you’ll often collect structured and unstructured data. For example, in a survey, questions asking people to rate something from 1 to 10 produce structured data. Questions asking people to respond with free text produce unstructured data.
In the previous lesson, we went through the outline for this course and briefly discussed the concept of structured data.
In this lesson, we'll explain the concepts of structured and unstructured data in greater detail and learn how Excel differs from other applications in its approach to structured data.
Companies often have information of interest spread across multiple tables.
In this case, the tables will combine to form a data model or database.
The data model will define what information is stored in each table as well as how all of the tables relate to each other. We'll learn more about these concepts in our later lessons on databases.
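As a rough sketch of what such a data model can look like, the Python example below uses the standard-library `sqlite3` module to build a tiny in-memory database. The table names (`customers`, `sales`), the columns, and the rows are assumptions made up for illustration, not part of the course material. Each table definition states what information it stores, and the `customer_id` column defines how the two tables relate.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical two-table data model: each sale points at one customer.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Acme Ltd'), (2, 'Globex');
    INSERT INTO sales VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# The relationship lets us combine the tables: total sales per customer.
totals = conn.execute("""
    SELECT c.name, SUM(s.amount)
    FROM customers AS c
    JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(totals)  # [('Acme Ltd', 350.0), ('Globex', 75.0)]
```

The join only works because both tables agree on what `customer_id` means, which is the kind of relationship a data model records.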
Structured data is data that has a clearly defined format allowing it to be organized and analyzed with relative ease.
For example, a table of data recording sales figures for the last 12 months would be structured data.
Unstructured data can't be easily categorized. The most common example of unstructured data is text data such as email contents or Word documents.
Compiling and searching through text data like this is much more difficult than the structured tables we saw previously.
Historically, businesses aimed to store their data in structured formats. However, unstructured data is becoming increasingly common.
With modern technology, it's easier than ever for companies to collect data and cheaper than ever to store it.
As a result, the amount of unstructured data collected by companies is constantly growing.
In fact, it's often companies that exploit modern technology that collect the most unstructured data. Consider a social network like Facebook.
Some of the data they collect is structured, such as a list of someone's friends or the pages they like. However, much of their data is unstructured, such as posts people write and pictures people upload.
For Facebook, generating analytical insights from this unstructured data is key to delivering value to users and advertisers.
When you use advanced data analysis applications like Tableau, Power BI or Alteryx, data must be stored in a structured tabular format.
If you're familiar with Excel, you might notice that it works slightly differently.
While you can format Excel data in a neat tabular format, there's nothing to stop you from laying out your data in a less structured manner. So in that sense, Excel allows structured data but does not demand it.
Most data you'll encounter will be a mix of structured and unstructured data.
Let's say you're surveying customers to find out what they think of a specific product.
Most of the questions will return structured data. For example, you might ask them to rate various features from 1 to 10.
However, the final question might be a comment box allowing users to write whatever text they want. These responses will be unstructured data.
In this case, the last question might be the most valuable of all as customers may point out issues that you hadn't previously thought of. However, to analyze their responses, you'll need to find some structure in them.
Doing so makes the answers easier to analyze and interpret, which helps you better understand your business.
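One simple way to find structure in free-text responses is to tag each comment with the themes it mentions. The Python sketch below does this with keyword matching. It is only an illustration under stated assumptions: the theme names, keyword lists, and sample comments are invented, and this is not necessarily the approach covered later in the course.

```python
from collections import Counter

# Hypothetical themes and the keywords that signal them.
THEMES = {
    "price": ["price", "expensive", "cost"],
    "delivery": ["delivery", "shipping", "late"],
    "quality": ["broke", "broken", "quality", "faulty"],
}

def tag_comment(comment):
    """Return the themes whose keywords appear in the comment."""
    text = comment.lower()
    return [
        theme
        for theme, keywords in THEMES.items()
        if any(keyword in text for keyword in keywords)
    ]

# Made-up survey responses from the free-text comment box.
comments = [
    "Great product but the delivery was late.",
    "Too expensive for what you get.",
    "The handle broke after a week.",
]

# Each free-text comment becomes a structured record with a "themes" field.
tagged = [{"comment": c, "themes": tag_comment(c)} for c in comments]

# Once tagged, the comments can be counted like any other structured data.
theme_counts = Counter(theme for row in tagged for theme in row["themes"])

print(theme_counts)
```

Substring matching like this is crude (for example, "late" would also match "plate"), so treat it as a starting point for exploring responses, not a finished method.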
We'll consider how to go about this in a later lesson.
For now, we'll end the lesson here.
Now that we know what structured and unstructured data are, we'll consider how to obtain data for use in business projects in the next lesson.