Data is like a single brushstroke in a painting: one stroke means little on its own, but together the strokes create a picture that tells a story we couldn’t see before. A single data point is relatively useless without context, but collectively, data points help us see patterns we would never have seen otherwise.
So how can you better understand the data analytics process and more clearly see the “business” picture that is sitting right in front of you? Here are the areas that we will be covering in this white paper.
• Cleaning up the Data
• Segmenting your Data
• Understanding Insights
• Visualizing Data
• Visualization Biases
Cleaning up the Data
Download the data and get oriented
Looking at hundreds, thousands or hundreds of thousands of named rows and columns can be daunting at first. Sometimes people think that the data science ‘wizards’ can just look at a spreadsheet and see a story emerge. Nothing could be further from the truth. Getting oriented in your data set involves breaking down the basics systematically so that you can ask good questions.
You can use a program like Excel or Google Sheets to do basic exploration to answer questions like:
• How many observations (rows of data) do you have?
• Is it clear what each row is counting?
• What is the time period of the data?
• What is the geographic extent of the data?
• Does there appear to be a lot of missing data?
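If you prefer a script to a spreadsheet, the same orientation questions can be answered in a few lines. The following is only a minimal sketch: the sample data, the column names (order_id, order_date, city, amount) and the orient function are all made up for illustration, and your own export will look different.

```python
import csv
import io
from datetime import date

# Hypothetical sample standing in for a downloaded export; in practice
# you would read your own CSV file instead of this string.
SAMPLE_CSV = """order_id,order_date,city,amount
1001,2023-01-05,Denver,49.90
1002,2023-01-17,Austin,
1003,2023-02-02,,120.00
1004,2023-03-11,Denver,15.25
"""

def orient(csv_text):
    """Return basic orientation facts about a CSV data set."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Count empty cells per column to gauge how much data is missing.
    missing = {
        col: sum(1 for r in rows if not (r[col] or "").strip())
        for col in columns
    }
    # Time period: earliest and latest order date (assumes ISO dates).
    dates = [date.fromisoformat(r["order_date"]) for r in rows if r["order_date"]]
    # Geographic extent: the distinct cities that appear in the data.
    cities = sorted({r["city"] for r in rows if r["city"]})
    return {
        "observations": len(rows),
        "columns": columns,
        "missing_by_column": missing,
        "period": (min(dates), max(dates)) if dates else None,
        "cities": cities,
    }

summary = orient(SAMPLE_CSV)
for key, value in summary.items():
    print(key, "->", value)
```

Each key in the summary maps to one of the questions above: how many observations, what period, what geographic extent, and how much is missing.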
This process of getting oriented with a new data set is no small task.
Before you conduct your analysis or try to figure out the story that the data is trying to tell you, make sure that you have specific questions you want to ask. Examples might be: “Why are our bounce rate and click-through rate high but our total sales low?” “How do they define conversion/engagement?” “Did women purchase at a higher rate than men?”
It’s important to write down all of these questions, because as you go through the next couple of steps, you can try to answer them.
Explore all available metadata
Metadata is data about data and can be your golden ticket to establishing context for a data set. In an ideal world, any data set that you download would have a detailed and up-to-date data dictionary. This is a document that provides a column-by-column description of the data set, along with guidelines for using it.
Finding metadata is not always easy. Make sure to have some form of data dictionary available. You will need this because you will not be alone in sifting through the data. Others who are not familiar with your project, or someone who looks at the data at some future point, will not have the luxury of knowing what you were thinking as you compiled your data set.
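A data dictionary does not need to be fancy. As a rough illustration, it can be as simple as one entry per column, plus a small check that flags columns nobody has documented yet. The column names and the undocumented_columns helper below are hypothetical.

```python
# Hypothetical data dictionary: one entry per column with its type and meaning.
DATA_DICTIONARY = {
    "order_id": {"type": "int", "description": "Unique identifier, one per order."},
    "order_date": {"type": "date", "description": "Date the order was placed."},
    "city": {"type": "str", "description": "Shipping city; blank when not provided."},
}

def undocumented_columns(columns, dictionary):
    """Return the columns that have no data dictionary entry."""
    return [col for col in columns if col not in dictionary]

# Columns found in a (hypothetical) export; "amount" has no entry yet.
export_columns = ["order_id", "order_date", "city", "amount"]
gaps = undocumented_columns(export_columns, DATA_DICTIONARY)
print("Columns missing from the data dictionary:", gaps)
```

Running a check like this whenever the export changes keeps the dictionary from quietly going out of date.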
Segmenting your Data
Once you have your foundation set, start looking at your analytics. The key is to start segmenting your analytics right from the get-go. Grouping similar customers and visitors together will help you with the insights section that comes next.
The purpose of segmentation is to better understand your customers and the individuals who interact with your company. Segmentation allows you to reduce the number of variables in your data so that you have better context for understanding the analytics. It will help you improve your digital focus, expand into new verticals, increase your customer retention and refine your pricing strategies.
Typically when you segment your audience, you will segment them by the following:
• Demographical segmentation
• Geographical segmentation
• Psychographic segmentation
• Behavioral segmentation
A few examples of possible segments that you can do for your business are:
• Visits originating only from direct traffic and utilizing Chrome as their browser (Behavioral)
• Visitors who remained on your site for longer than “x” minutes. (Behavioral)
• Customers who have the highest CLV (Demographical)
• % of college students who are customers (Demographical)
• Lifestyle choices (Psychographic)
• Individuals who tend to purchase online vs. in store (Psychographic)
• Customers’ location by city (Geographical)
• Do your customers live close to other competitors? (Geographical)
Don’t worry so much about collecting every possible data set that you can (unless you have an unlimited budget or process to go through it all).
Focus on the segments that you think matter most and then do the following:
Estimate the size of each segment: Use surveys, Google Analytics, Facebook Analytics or similar tools to help you estimate the size of each segment. Because these estimates usually come from samples, you will also need confidence intervals to understand how much fluctuation you can expect to see in the numbers.
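To make the confidence interval idea concrete, here is a minimal sketch using the normal-approximation (Wald) interval for a proportion, which is one common choice among several. The survey numbers and the segment_share_interval function are made up for illustration.

```python
import math
from statistics import NormalDist

def segment_share_interval(in_segment, sample_size, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a segment's share."""
    p = in_segment / sample_size
    # z-score for the requested two-sided confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical survey: 120 of 400 respondents are college students.
low, high = segment_share_interval(120, 400)
print(f"Observed share: {120 / 400:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```

With these made-up numbers the observed share is 30%, but the interval runs from roughly 25.5% to 34.5%, which is the kind of fluctuation you should plan around. For very small samples or shares close to 0% or 100%, this approximation gets unreliable and another interval would be a better fit.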
Estimate the value of each segment: Compute the average revenue and profitability of each segment. Also use variables like the Net Promoter Score to differentiate between “good” and “bad” profits. Customers who generate a high proportion of revenue but have a bad experience are more likely to say negative things about you, which can hurt profits down the road through word of mouth. In the end, they may not be worth having as customers.
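As a rough sketch of this step, the snippet below computes average revenue and a Net Promoter Score per segment. It uses the standard NPS definition (percentage of promoters scoring 9 or 10 minus percentage of detractors scoring 0 to 6). The customer records and segment names are invented, and profitability is left out to keep the example short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records: (segment, annual revenue, 0-10 survey score)
customers = [
    ("students", 120.0, 9),
    ("students", 80.0, 10),
    ("students", 100.0, 7),
    ("enterprise", 900.0, 3),
    ("enterprise", 1100.0, 6),
    ("enterprise", 1000.0, 9),
]

def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Group the records by segment.
by_segment = defaultdict(list)
for segment, revenue, score in customers:
    by_segment[segment].append((revenue, score))

# Average revenue and NPS for each segment.
report = {
    segment: {
        "avg_revenue": mean(r for r, _ in rows),
        "nps": net_promoter_score([s for _, s in rows]),
    }
    for segment, rows in by_segment.items()
}

for segment, stats in report.items():
    print(f"{segment}: avg revenue {stats['avg_revenue']:.2f}, NPS {stats['nps']:.1f}")
```

In this invented data the enterprise segment brings in far more revenue per customer but has a negative NPS, which is exactly the “bad profit” situation worth a closer look.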
Cross-Tabbing: After estimating the size and value of a single segment, you’ll want to “cross” more than one segment because very few segments are completely isolated. For example, cross gender by location or company size by industry to look at the number of men in urban areas or large manufacturing businesses. Keep experimenting with different dimensions to find patterns or valuable segments.
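A cross-tab is just a count of how often each combination of two dimensions occurs. Here is a minimal sketch with made-up customer records, crossing gender by location type.

```python
from collections import Counter

# Hypothetical customer records: (gender, location type)
customers = [
    ("male", "urban"), ("female", "urban"), ("male", "rural"),
    ("female", "urban"), ("male", "urban"), ("female", "rural"),
]

# Count each (gender, location) combination.
crosstab = Counter(customers)

genders = sorted({g for g, _ in customers})
locations = sorted({loc for _, loc in customers})

# Print a small table: one row per gender, one column per location type.
print("gender".ljust(8) + "".join(loc.rjust(8) for loc in locations))
for g in genders:
    print(g.ljust(8) + "".join(str(crosstab[(g, loc)]).rjust(8) for loc in locations))
```

Swapping in other dimensions (age band, company size, industry) only requires changing what goes into each tuple.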
Cluster Analysis: A more advanced way to identify segments relies on statistical techniques such as cluster analysis, factor analysis and multiple regression analysis. These techniques identify statistical patterns that are hard to detect intuitively.
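To give a feel for what cluster analysis does, here is a bare-bones sketch of k-means, which is one popular clustering technique picked here purely for illustration. The customer points, the starting centroids and the k_means function are assumptions; a real project would more likely use an established statistics library.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to this point (Euclidean distance).
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average order value)
points = [(1, 20), (2, 25), (1, 22), (9, 200), (10, 210), (8, 190)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 250)])

for center, members in zip(centroids, clusters):
    print("center:", center, "members:", members)
```

The algorithm separates the occasional low spenders from the frequent high spenders without being told those groups exist. In practice you would usually rescale the variables first so that one with large numbers (order value) does not dominate the distance.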
Understanding Insights
The most difficult part of dealing with data and analytics is simply trying to understand what it is that you are observing. How are your customers actually behaving? What do they really want to know more about? How do they actually interact with your business? Analytics could be telling you a million different stories, but finding insights is the process of understanding the true story of what is going on with your business and your customers.
To frame this another way, every business can be viewed as a complicated mesh of different systems. While we love to think that we understand exactly how everything works, no one actually knows how everything works 100% of the time. Not the founder, not the CEO… no one understands it completely.
Because of this, there is a gap between an employee’s understanding of the business and how it actually works. With this framework in mind, insights help individuals bridge the gap between their understanding of how the system works and how it actually works.
An insight is the “ah-ha” moment when data and analytics come together into a cohesive story that allows you to better see the reality of what’s going on with your business.
Remember, reporting does not equal insights. Reporting is the process of organizing data into summaries. Insights are the results of exploring data and reports in order to extract meaningful information to improve business performance. Reporting translates raw data into information. Analysis transforms data and information into insights.
Investigating the “that’s funny…”
Isaac Asimov captured the spirit of discovering insights perfectly when he said: “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…’”
As I mentioned earlier, the biggest issue businesses face when it comes to data and analytics is the gap between how they think the business runs and how it actually runs. These “that’s funny” moments allow us to see areas where we are blinded by our own assumptions or by previous experiences that we have had with the business. They allow us to step back and say to ourselves, “we really need to look at this process because clearly something is going on here.”
How to turn Data into Insights
So now that we have all of the definitions out of the way, how do you actually pull insights from the data that you are collecting? Here are some useful tips to help you accomplish this goal.
• Ask yourself “what questions do we need to answer in order to succeed?”
• Create a specific hypothesis prior to running an analysis.
• Start with a small data set, then filter and segment it to build up larger segments.
• Work on a single problem at a time.
• Break complex systems into smaller pieces.
• Ask specific questions. Generic questions will produce generic answers.
• Measure loss/gain caused by your findings.
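To illustrate the tips about stating a specific hypothesis and measuring the gain, here is a minimal sketch that tests one of the earlier example questions, “Did women purchase at a higher rate than men?”, with a two-proportion z-test. The counts are invented, and the test itself is one reasonable choice rather than the only one.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

# Hypothetical counts: 130 of 1,000 women and 100 of 1,000 men purchased.
diff, z, p_value = two_proportion_z_test(130, 1000, 100, 1000)
print(f"Difference in purchase rate: {diff:.1%} (z = {z:.2f}, p = {p_value:.3f})")
```

The difference in rates is the gain you are measuring, and the p-value gives a sense of whether a gap that size could plausibly be noise.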
Gif Credit: FlowingData.com
Visualizing Data
As human beings, we hate uncertainty. It makes us feel uneasy and anxious, and the assumption is that data is supposed to provide the opposite of that. The problem is that raw data by itself does very little to help you see insights.
It’s hard to focus on specific areas and segments in an Excel sheet. When you have lines and lines of data, it’s hard to see trends and patterns within that data set. However, when you visualize the data, it becomes much easier to pull the insights from the data sets. This is one of my favorite things about visualization. Once you unearth and see those hidden messages in the data, it’s hard to unsee them. Sometimes visualization forces us to see things we don’t want to see, but that’s okay. In our case, seeing is definitely believing.
The following are different charts/graphics that you can use to help you visualize your data:
• Line Chart
• Bar Graph
• Column Chart
• Pie Chart
• Area Chart
• Pivot Table
• Scatter Chart
• Scatter Map
It’s also not a bad idea to take the same data set and visualize it with several of the charts above to help you see the “story” that the data is trying to tell you. Pulling insights from data is an art, and simply having a different way of looking at the same data will help you pull insights from your data sets.
Visualization Biases
Bar charts use length as their visual cue, so when someone leaves out the starting point of the value axis (usually zero) and thus shortens the bars, the differences look more dramatic. A perfect example is the one below. Both charts utilize the same data, but visually, the truncated graph makes it appear as if there is a steep rise in the chart.
With dual axes, the scale of each metric can be shrunk or expanded independently. This is typically done to imply correlation and causation (“because of this, this other thing happened”), but more often than not it is confusing at best and deceptive at worst.
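A bit of arithmetic shows how much a truncated axis distorts things. The sketch below compares how long two bars are drawn relative to each other when the axis starts at zero versus at a higher value. The numbers and the drawn_length_ratio function are made up for illustration.

```python
def drawn_length_ratio(value_a, value_b, axis_start=0.0):
    """How many times longer bar B is drawn than bar A for a given axis start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical metric values for two bars.
a, b = 50.0, 55.0

honest = drawn_length_ratio(a, b)                      # axis starts at zero
truncated = drawn_length_ratio(a, b, axis_start=45.0)  # axis starts at 45

print(f"Axis from 0:  bar B is drawn {honest:.2f}x as long as bar A")
print(f"Axis from 45: bar B is drawn {truncated:.2f}x as long as bar A")
```

The real difference is 10%, yet with the axis starting at 45 the second bar is drawn twice as long as the first.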
Everything is relative. You can’t say a digital campaign was more effective just because the company has more gym signups. That total does not factor in other variables, such as whether the company is growing and adding new gyms. Most of the time when I present data, I try to focus the charts and graphs on percentages and ratios, as these tend to show what is actually going on better than absolute totals.
One of the biggest traps when it comes to understanding the data in front of you is not having the proper scope. Is your business seasonal? Did you run large promotions or sales? Did you get a big feature in a large publication? All of these factors can greatly exaggerate or change the data that you are looking at. Make sure to account for these anomalies when creating your data sets.
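Sticking with the gym example, here is a small sketch of how an absolute total and a ratio can tell opposite stories. The yearly figures are entirely hypothetical.

```python
# Hypothetical yearly figures: total signups and number of gyms open.
years = {
    2022: {"signups": 10_000, "gyms": 20},
    2023: {"signups": 12_000, "gyms": 30},
}

# Relative view: signups per gym in each year.
per_gym = {year: d["signups"] / d["gyms"] for year, d in years.items()}

# Year-over-year change for the absolute total and for the ratio.
total_change = years[2023]["signups"] / years[2022]["signups"] - 1
per_gym_change = per_gym[2023] / per_gym[2022] - 1

print(f"Total signups changed by {total_change:+.0%}")
print(f"Signups per gym changed by {per_gym_change:+.0%}")
```

Total signups rose by 20%, but signups per gym fell by 20%, so the campaign may not deserve the credit the headline number suggests.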
This is one of the most common things I run into when people send me reports or analysis. Instead of showing the full range of variation in a data set, people try to oversimplify its complexity by putting all of the data into unrelated but simpler data buckets. The first graph is misleading because it makes it seem that the metric being measured is closer to 1 than to 10. When you try to oversimplify complex patterns or data sets, you necessarily have to make large assumptions and leave out valuable information in the process.
If area is the visual encoding, then values have to be sized by area. When someone sizes an area-based shape, like a square or a circle, by a linear dimension such as its side length or radius, the differences get exaggerated, and they might be sniffing for dramatics.
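Here is a small sketch of how bucket choice changes the story. The ratings and the bin_counts function are invented for illustration: the same ten scores are counted once in two broad buckets and once per individual score.

```python
from collections import Counter

# Hypothetical ratings on a 1-10 scale.
ratings = [1, 2, 5, 5, 5, 5, 6, 6, 9, 10]

def bin_counts(values, bin_width):
    """Count values per bin, where each bin covers `bin_width` consecutive scores."""
    counts = Counter((v - 1) // bin_width for v in values)
    return {
        f"{b * bin_width + 1}-{(b + 1) * bin_width}": counts[b]
        for b in sorted(counts)
    }

coarse = bin_counts(ratings, 5)  # two buckets: 1-5 and 6-10
fine = bin_counts(ratings, 1)    # one bucket per score

print("Two buckets:   ", coarse)
print("One per score: ", fine)
```

With two buckets, most ratings appear to sit at the low end of the scale. With one bucket per score, it becomes clear that most of them cluster in the middle at 5 and 6.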
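The arithmetic behind this is short. In the sketch below, the values and the radius_for_value function are hypothetical: to make a circle’s area proportional to a value, the radius has to grow with the square root of that value, not with the value itself.

```python
import math

def radius_for_value(value):
    """Radius of a circle whose AREA is proportional to `value`."""
    return math.sqrt(value / math.pi)

# Hypothetical values: the large one is 4x the small one.
small, large = 10.0, 40.0

# Correct sizing: a 4x value needs only a 2x radius.
correct_radius_ratio = radius_for_value(large) / radius_for_value(small)

# Misleading sizing: scaling the radius 4x makes the area 16x larger.
misleading_area_ratio = (large / small) ** 2

print(f"Radius ratio when sizing by area: {correct_radius_ratio:.1f}x")
print(f"Area ratio when scaling the radius linearly: {misleading_area_ratio:.0f}x")
```

A value that is four times bigger ends up looking sixteen times bigger when the radius is scaled linearly, which is where the drama comes from.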
You should be constantly scrutinizing the data and charts as you see them. Does it make sense? Is it overly dramatic or shocking? If it is, why? What error might cause such a drastic result? As the old adage goes, data doesn’t make something true. You add value by making sense of what is true based on the data in front of you.
If you are not used to thinking about data beyond looking at a couple of random KPIs, some of the information above may be a bit much. The biggest thing you can do to help your company start down this data-driven path is to simply start. Just focus on collecting the right data or on creating one customer segment that you can use later. You’re not going to be able to transform your company into an IBM or Amazon overnight. Be patient and take it one step at a time.