tl;dr Large amounts of data aren’t always useful, so make sure the data you’re collecting and analyzing has a measurable business impact, and grow into “big data” initiatives over time as you prove the benefit of each effort, rather than starting with a company-wide “collect first, analyze later” push, collecting too much, and producing too little value for the business. Don’t focus on what’s interesting to analyze; focus on what’s going to drive profitability and positively impact stakeholders across the company. Less data can sometimes be better data, when it drives a more meaningful business impact with less cost and complexity along the way.

———–

I often hear the quote “Data is the New Oil,” given the growth of initiatives at companies around the world to mine and refine their data. However, like oil, data has to be refined and put to use in the right way to really be effective; oil left collecting in a tank isn’t useful and will go bad over time. And if putting all your data to use is already hard, “Big Data” initiatives can make it even harder. If you look up the definition of “Big Data” on Wikipedia, the first sentence reads: “Big Data refers to data sets that are too large or complex to be dealt with by traditional data-processing application software”.

As companies evolve to be more data-driven, the volume and variety of the data they collect grow as well. From the early days of capturing transaction data in structured tables to now grabbing everything from log files to “unstructured data” in the form of images, videos, and the like, the state of the art in data analysis is pushing toward advanced techniques like machine learning, applied to data sets that are ever larger, less structured, and more unwieldy.

An article on statista.com regarding “Worldwide Data Created” states that “The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 64.2 zettabytes in 2020. Over the next five years up to 2025, global data creation is projected to grow to more than 180 zettabytes.”

As someone who has gone from setting up databases at startups to managing the worldwide analytics team for Alexa at Amazon, I can say that in my experience, more data doesn’t always translate to more insight and impact. There’s a lot of buzz out there about how data is the new oil, becoming an asset class that aids in growing the valuation of a company, but like oil, not all of it is created equal, and not all of it is useful.

What does it take to make data useful then? In my experience, it needs to start with the problem you’re trying to solve and the questions you’re looking to answer. Data should be treated as a means to an end around business effectiveness, not an end in itself. A lot of tech companies stand to benefit from more data being analyzed by more people inside a company (the global big data market is projected to generate $103bn by 2027), but many companies still find themselves lost regardless of how much data they have in front of them.

In a joint effort between Informatica and Capgemini, a white paper was produced in 2016 on “the Big Data Payoff”. In it, Capgemini surveyed 210 executives and found that only 27% said their company’s Big Data initiatives were profitable. 50% of US executives and 39% of European executives said budget constraints were the primary hurdle in turning Big Data into a profitable business asset.

It’s no doubt expensive to collect, store, and analyze large volumes of data once you consider the hurdles around security, integration challenges, technical talent, data silos, legacy infrastructure, divided company sponsorship, and so on.

If instead you looked at your data problem the way an entrepreneur looks at starting a tech company and building a “Minimum Viable Product” or MVP, perhaps it wouldn’t be so difficult to demonstrate the value of data as it grew over time. Take the concept of an MVP and apply it as a “Minimum Viable Dashboard,” or perhaps a “Minimum Viable KPI”: develop your data capability so that every data initiative you launch grows from the previous effort, and ensure the end result demonstrates impact back to the business. Even at companies with mature BI practices, centralized data teams, and data scientists developing complex forecasting tools, the real usefulness can often get lost in the complexity. Imagine how many dashboards get built with the intention of being used, only to be passed over by end-users after the first look, never to be opened again.

I think if companies started by making descriptive analytics as useful as possible, ensuring that the most basic use cases around data collection, modeling, and analysis are fully utilized and that everyone in the org is benefiting from them, then it wouldn’t be so tricky to justify the budget to go after more advanced initiatives involving increasingly complex data sets.

The temptation to keep up with competitors, keep your quants interested, and leverage advanced data capabilities to build a data-driven moat isn’t too unlike the rush among companies to build mobile apps in 2007, each trying to gain a competitive advantage by being first to deliver something compelling on smartphones. Yet how many of those apps really moved the needle for companies?

Big Data no doubt has its place, as tech companies demonstrate the power of analyzing increasingly large data sets and the insights those data sets can produce. However, if your company isn’t taking a “crawl, walk, run” approach toward data competency, making sure you’re exploiting all the benefits of basic data analysis before working your way up the analytics maturity chain (see https://computd.nl/demystification/4-levels-of-data-maturity/), you’ll not only leave valuable insights on the table but also find yourself spending increasingly more capital for less value while engaging fewer and fewer people across your organization.

At the end of the day, how much data you have doesn’t matter so much as how much impact you get from the data you’re using, regardless of its size. Make small data sets impactful first, grow from that initial impact to increase the level of insight, and always measure against the business value you’re creating as you add more data sources to the mix. If the value isn’t there, drop the data sets and/or analysis in question, and find a path to measurable business impact. At Loftus Labs, we like to say that “companies have business problems, not data problems”: data is the means to solving those problems. Perhaps it’s a matter of understanding the problems you need to solve before diving into dashboard development.

Here are some tips for maximizing the usefulness of your data, regardless of its size:

1) Analyze what’s being analyzed today. Track how often your stakeholders are using what’s been developed, and use surveys or personal check-ins to make sure people are using what’s provided effectively (most BI servers make it easy to track who’s using what, and how often). Just because it exists doesn’t mean it’s being fully utilized by end-users, so treat your centralized data team as its own internal startup, where the “revenue,” or measure of success, is the number of users regularly taking advantage of what you’ve built to date.

2) Make sure someone is asking for what you’re analyzing. If no one is asking for it, it’s not going to be utilized. I’ve made the mistake of building dashboards I thought would be useful to a department, only to find out that they were busy solving other problems and never actually spent time with what I built. If no one is worried about customer trends at the moment, don’t spend time collecting and analyzing customer trend data. That’s not to say customer trends aren’t important, but the data needs an end-user who cares enough to spend time with what you’re building in order to generate business impact. Wait for the need to be raised before developing the related analysis, if at all possible. Better yet, consider whether you even need to start collecting data in that area. Yes, for things like trends you often need to have already collected data over time in order to circle back and analyze it, but that’s also the excuse companies use to collect and store EVERYTHING on the off chance someone wants to analyze the full corpus someday.

3) Get rid of what’s not tied to a measurable business impact. Don’t be afraid to drop what’s not being used or to stop collecting data that isn’t useful. A lean, effective data warehouse driving quantifiable business impact is far more useful at the end of the day than a massive data lake with very little tangible business value. Data infrastructure built up around data that isn’t entirely useful will only grow in cost and complexity to maintain. It’s better to ramp down the parts of your data organization that aren’t beneficial than to keep it all going on the off chance the less useful parts of your data ecosystem get utilized at some point down the line.
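To make that pruning decision concrete, here’s a small sketch of the kind of triage you might run. It assumes you can pull, per table, a last-queried date and a monthly storage cost; the field names and the 90-day staleness window are illustrative assumptions, not a real warehouse’s metadata API.

```python
from datetime import date, timedelta

def flag_stale_tables(tables, stale_days=90):
    """Flag warehouse tables that haven't been queried recently.

    `tables` is a list of dicts with hypothetical fields:
    name, last_queried (datetime.date), monthly_cost_usd.
    Returns the stale tables (costliest first) and the total
    monthly spend you could recover by dropping them.
    """
    cutoff = date.today() - timedelta(days=stale_days)
    stale = [t for t in tables if t["last_queried"] < cutoff]
    savings = sum(t["monthly_cost_usd"] for t in stale)
    return sorted(stale, key=lambda t: t["monthly_cost_usd"], reverse=True), savings
```

Putting a dollar figure next to each unused table turns “we might need it someday” into a budget conversation, which is usually the easier one to win.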

4) Use what’s easy to collect and analyze first, then work your way up to bigger and more complex data ecosystems. It doesn’t have to be state of the art to be useful, and it doesn’t have to be big data to drive a business impact. Consider the right tool for the job, and only grow your ecosystem into more data sets and more complexity as what you’ve built today ties to a measurable ROI with the full buy-in of your stakeholders. If everyone can point to the value of what you’ve built and maintained today, it’s much easier to justify going to the next level in both time and cost.