Think for a moment about all of the things you used this morning. Your toothbrush, your coffee cup, the handle on your front door: they were all designed by someone, down to the number of ridges on the cap of your toothpaste tube. Even though we rarely give these seemingly small design choices a second thought, they can impact us in very noticeable ways. What separates the frustrating from the intuitive seems to be the level of consideration given to the context around these choices, and this is as true for analyzing data as it is for designing a door handle. Every dataset is the product of decisions, and as analysts, we need to account for those decisions when mining for insights.
To help you think through this context, take a moment to ask three questions:
- Who or what is this data about?
What story is this data telling? It can be helpful just to brainstorm nouns, verbs, and adjectives that relate to the data in order to paint a clearer picture of the subject. Consider what might be missing from the data as well – is everything being captured, or do some activities not generate data? For instance, if no violations are found on an inspection, would that visit still be logged in this dataset? Relating your data to a broader perspective and other datasets can help give you a sense of what is reliable and what may be incomplete.
- Who collected this data, and how?
It’s not uncommon for data to be collected automatically in the background or directly from individuals, but in many cases, there’s a human element to how the data ended up in your files. Think through what the process of collecting this data looks like, and especially what decisions were made while logging information. Sometimes, an unexpected distribution of data can be explained by a quick-and-dirty solution implemented in the field, or by a workaround to a tedious categorization process. It’s worth exploring what sort of training data collectors receive, how they translate raw information into structured data, and what their experiences of collecting data have been.
- Who decided what data to collect, and how?
Before any data is collected, someone must decide to collect it, which fields to capture, and how to capture them. Often, this is in response to a specific need for reporting and tracking, which is not necessarily how the data would be collected if rigorous analysis were the goal. Asking about the purpose behind the dataset can give you deeper insight into trends over time, help you gauge what might not be reflected in your data, and even point to additional data to incorporate.
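The inspection example from the first question can be turned into a quick check before you talk to anyone. The sketch below is only an illustration: the `inspections` rows, their field names, and both helper functions are hypothetical and not taken from any real dataset. It counts how many logged visits recorded each number of violations, and flags the case where no visit recorded zero.

```python
from collections import Counter

# Hypothetical inspection log: one row per recorded visit.
# The field names and values are invented for illustration only.
inspections = [
    {"site": "A", "date": "2023-01-10", "violations": 2},
    {"site": "B", "date": "2023-01-11", "violations": 1},
    {"site": "A", "date": "2023-02-14", "violations": 3},
    {"site": "C", "date": "2023-02-20", "violations": 1},
]


def violation_counts(rows):
    """Count how many logged visits recorded each number of violations."""
    return Counter(row["violations"] for row in rows)


def has_clean_visits(rows):
    """Return True if at least one logged visit recorded zero violations."""
    return violation_counts(rows)[0] > 0


counts = violation_counts(inspections)
print(sorted(counts.items()))

if not has_clean_visits(inspections):
    # Absence of zeros is a prompt for a conversation, not a conclusion.
    print("No zero-violation visits found: ask whether clean inspections are logged at all.")
```

A result like this cannot tell you why clean visits are missing. It only gives you a specific question to bring to the people who collect and own the data.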
Finding the answers to these questions can be uncomfortable for analysts on a deadline – it’s time-consuming, and because the answers are rarely written down, it usually means talking to lots of people – but it’s a crucial step in working with any dataset. Analysis made without context, like a poorly designed door handle or toothpaste cap, might look pretty, but could lead to some painful and messy results. So the next time you start exploring a dataset, take a moment to reach out to the owners of that data and set up a time to talk. Who knows what you might find out!
What are some things that you do to add context to your datasets? Post in the comments to let us know!