10,000 hours is not enough
However, being data-driven is not a recent phenomenon.
In fact, it is fundamental to both human and societal existence and has been part of civilisation for several thousand years. The collection of taxes, for example, has always been driven by data gathered through censuses.
Even if we narrow our focus to organisations, we’ve had a few hundred years of experience in being data-driven since the organisational structure we recognise today started to emerge during the Industrial Revolution. The submission of financial statements, for instance, is very much driven by data collected by the finance department.
However, the unfortunate fact is that only a select few have succeeded in leveraging data for significantly better outcomes. Instead, many organisations are realising that even though they have been spending massive amounts on various data initiatives, they have gotten only meagre tangible returns on their investments.
In his 2008 book “Outliers”, Malcolm Gladwell popularised the “10,000-hour rule”, which describes the notion that expertise in a given domain requires around 10,000 hours of practice.
Apparently, 10,000 hours is not enough when it comes to the data domain. Even with several hundred years of practice and the clear benefits of getting better at data, many organisations are still struggling to become more data-driven.
The evolution of the data estate
The reasons for this are manifold, but most, if not all, can be tied back to the evolution of data systems within organisations. These have progressed from simple to complex, marked by the emergence of specialised operational and analytical systems. This bifurcation has led to the establishment of four primary data movement patterns, each with its own characteristics and complexities.
Initially, operational systems dominated the data landscape, focusing on transaction processing and day-to-day business functions. These systems were designed for speed, efficiency, and reliability, handling tasks like order processing, inventory management, and customer relationship management. Data flowed primarily within these operational systems (intra-operational movement), ensuring smooth business operations.
As data volumes and business complexity grew, the need for deeper insights into this operational data became apparent. Analytical systems emerged to meet this need, designed to process and analyse historical data, revealing trends and patterns that could inform decision-making. This led to the development of the second data movement pattern: from operational to analytical systems (operational to analytical movement), where data was transferred from the realm of daily operations to the sphere of reporting and analytics.
Analytical systems evolved to become more sophisticated, encompassing tools for data warehousing, business intelligence, and advanced analytics. Within these systems, data moved and interacted in increasingly complex ways (intra-analytical movement), supporting activities such as reporting, advanced analytics, and predictive modelling. This intra-analytical movement allowed organisations to delve deeper into their data, extracting valuable insights to support business initiatives.
The fourth data movement pattern emerged as organisations recognised the value of feeding insights and predictions from analytical systems back into operational processes (analytical to operational movement). This flow of data allowed organisations to be more adaptive, responsive, and proactive, enhancing decision-making and operational efficiency.
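To make the four patterns concrete, here is a minimal sketch in Python; the system names and the classification logic are purely illustrative, not a description of any particular architecture:

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    OPERATIONAL = "operational"
    ANALYTICAL = "analytical"


@dataclass(frozen=True)
class DataFlow:
    source_system: str
    target_system: str
    source_layer: Layer
    target_layer: Layer

    @property
    def pattern(self) -> str:
        """Classify the flow into one of the four movement patterns."""
        if self.source_layer == self.target_layer:
            return f"intra-{self.source_layer.value}"
        return f"{self.source_layer.value} to {self.target_layer.value}"


# Purely illustrative flows; the system names are hypothetical.
flows = [
    DataFlow("expense app", "finance system", Layer.OPERATIONAL, Layer.OPERATIONAL),
    DataFlow("finance system", "data warehouse", Layer.OPERATIONAL, Layer.ANALYTICAL),
    DataFlow("data warehouse", "sales dashboard", Layer.ANALYTICAL, Layer.ANALYTICAL),
    DataFlow("churn model", "CRM system", Layer.ANALYTICAL, Layer.OPERATIONAL),
]

for flow in flows:
    print(f"{flow.source_system} -> {flow.target_system}: {flow.pattern}")
```

The only thing that determines the pattern is which layer the source and the target sit in, which is also why the same piece of data can pass through several patterns over its lifetime.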
These four types of data movement serve as the backdrop for understanding why becoming more data-driven remains a challenge.
Why must it be so hard?
First off, data ownership is unclear. Organisations have long experience in assigning ownership of systems and processes, and there is usually an easy answer to questions such as “who does what” and “who is responsible”, either implicitly through the organisational structure or explicitly in process documentation and standard operating procedures. For instance, you can usually identify the person responsible for the CRM system, or the person responsible for accounts receivable.
For data, this assignment of ownership is much harder to pin down; it has not kept up with the evolution of the data estate. Think, for instance, about the relatively simple process of claiming expenses. You submit the expense claim, it is validated, and it is then approved or rejected. Up until this point, the data movement has happened in the operational layer, most likely involving an expense (mobile) app and the organisation’s finance system. At some point, the transactional data is moved from the operational layer to the analytical layer and fed into a data warehouse, where it is transformed and aggregated before being surfaced in the financial reporting.
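As a minimal sketch of that last hop into the analytical layer, assuming a hypothetical set of expense transactions with a posting date, an account, and an amount, the warehouse step might boil down to an aggregation like this:

```python
import pandas as pd

# Hypothetical expense transactions landed from the operational layer.
transactions = pd.DataFrame(
    {
        "posting_date": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-03"]),
        "account": ["travel", "travel", "office supplies"],
        "amount": [120.50, 80.00, 45.25],
    }
)

# Transform and aggregate: monthly totals per account for the financial reporting.
monthly_totals = (
    transactions
    .assign(month=transactions["posting_date"].dt.to_period("M"))
    .groupby(["month", "account"], as_index=False)["amount"]
    .sum()
)

print(monthly_totals)
```

Even a transformation this small sits squarely between the two layers, which is exactly where the ownership questions start.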
Who in this chain of events is responsible for the quality of the data in the reports? The owner of the expense application? The owner of the finance system? There could be issues with the movement of the transaction from the expense application to the finance system. Or there might be issues with the master data feeding the chart of accounts from the finance system into the expense application.
And what happens to the ownership when data is moved out of the operational layer? Who is the owner of the data integration from the operational layer to the data warehouse? The data warehouse owner? The owner of the report?
The key issue is that while system and process ownership is assigned, the data being used traverses systems and processes, making it difficult to establish and enforce ownership and often resulting in lengthy resolution times whenever problems arise.
A second type of problem arising from the lack of ownership is that data is often used in a “grab-n-go” fashion: just grab the data you need directly from whichever source holds it. This might work at the time of implementation, but the result is a multitude of point-to-point integrations which quickly become unwieldy to update and maintain.
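A caricature of the grab-n-go pattern, using a toy in-memory database and hypothetical consumers, looks something like this:

```python
import sqlite3

# A toy source system: an orders table that several consumers pull from directly.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (customer_id TEXT, order_total REAL, order_date TEXT)")
source.execute("INSERT INTO orders VALUES ('c1', 100.0, '2024-01-15')")

# Point-to-point: every consumer embeds its own copy of the extraction logic.
sales_dashboard_rows = source.execute(
    "SELECT customer_id, order_total FROM orders"
).fetchall()
finance_report_rows = source.execute(
    "SELECT customer_id, order_total, order_date FROM orders"
).fetchall()
churn_model_rows = source.execute(
    "SELECT customer_id, order_total FROM orders WHERE order_total > 50"
).fetchall()

# Renaming or restructuring the orders table now breaks three integrations,
# each maintained (or not maintained) by a different team.
```

With a handful of sources and consumers this is manageable; with dozens of each, the number of private integrations grows roughly with the product of the two, and nobody owns the whole.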
Finally, the maturity of development and maintenance practices for analytical solutions is trailing that of traditional systems and software development. Operational systems support the execution of processes which are often highly business-critical. Consequently, stringent standards for reliability and quality are imperative to prevent expensive operational interruptions. To meet these standards, the development and maintenance of solutions in the operational layer follow well-established methodologies and processes.
Arguably, analytical systems have historically not provided the same level of mission-critical support. The impact of not being able to receive customer payments, trade securities, or operate factory machinery is far greater than that of a sales manager being unable to see last month’s sales report. As a consequence, the development of solutions in the analytical layer has not followed stringent methodologies and practices to the same extent.
As organisations move towards tighter integration between operational and analytical systems, and as analytical systems become more critical to the successful execution of business processes, this lax attitude towards development and maintenance can have severe consequences, ranging from the inability to automate processes to difficulty in explaining exactly how analytical results are generated.
Running to stand still
The issues mentioned above are, however, often invisible to the untrained eye. After all, organisations are exchanging data between systems, creating reports, gaining insights, and building and deploying machine learning models.
The key thing to realise is that the investments made to achieve this have resulted in a data estate that is a patchwork of poorly designed and implemented solutions, which on top of everything else are often undocumented.
The result is that many organisations today are stuck in a situation where they need to keep investing heavily just to maintain the status quo. Introducing new solutions, or developing existing ones further, is very costly, if not impossible.
If your current peak performance level only serves to maintain the status quo, any aspiration of becoming more data-driven is naive.
Breaking the status quo
Thankfully, several organisations have already embarked on changing how they work with data. Even though all organisations are different, and hence pursue paths that differ in the details, certain characteristics recur among those that have succeeded in becoming more data-driven.
Going back to Malcolm Gladwell’s “Outliers”, a key point is that the 10,000 hours of practice needed to master a domain must be “deliberate”: you need to ensure that the practice you go through is designed to provide the optimal conditions for learning. Transferred to the data domain, this means that to become more data-driven you need to be deliberate in the way you work with data. One way of ensuring this is to create a data strategy and execute the initiatives identified therein.
Even without a data strategy, organisations can break free of the status quo by clearly defining data ownership and making the required organisational changes to support this ownership structure. The insight here is that there is no single blueprint that fits all organisations at all points in time. You cannot just copy what your competitors are doing. You need to do the work. You need to monitor the progress. And you need to adjust as needed.
Adding to the insights above, more and more organisations are also realising that for data-driven solutions to be treated as business-critical assets, they need to be developed with the same rigour and structure applied to other IT software solutions. Treating data as products, complete with product owners and lifecycle management, is one approach often adopted. Adopting DevOps ways of working in the data domain, in the form of DataOps, is another.
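What treating data as a product means in practice varies from organisation to organisation, but a minimal, illustrative descriptor might look like the sketch below; the fields and names are assumptions, and the point is simply that ownership, interface, and lifecycle are declared explicitly rather than left implicit:

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A minimal, illustrative descriptor for a data product."""
    name: str
    owner: str                     # the accountable product owner, not just a system owner
    description: str
    output_schema: dict[str, str]  # column name -> type: the product's public interface
    freshness_sla: str             # e.g. "updated by 06:00 every business day"
    quality_checks: list[str] = field(default_factory=list)
    version: str = "1.0.0"         # versioned so consumers can plan for breaking changes


# Hypothetical example: the monthly expense figures from earlier, published as a product.
monthly_expenses = DataProduct(
    name="monthly_expense_totals",
    owner="finance-analytics-team",
    description="Monthly expense totals per account, as surfaced in financial reporting.",
    output_schema={"month": "date", "account": "string", "amount": "decimal"},
    freshness_sla="updated by 06:00 on the first business day of each month",
    quality_checks=["amount is non-negative", "account exists in chart of accounts"],
)
```

None of this is tied to a specific tool; the discipline of stating an owner, an interface, and a service level is what moves analytical solutions towards the rigour long applied in the operational layer.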
Finally, a word of caution. Some organisations find the gap between their current capabilities and their data-driven ambitions seemingly too large to bridge. Feeling the need to do something, anything, they turn to various democratisation initiatives: self-service BI and citizen data science, to name a few. However, democracy is not a gift shop. Instead, democracies are characterised by citizens having both rights and obligations. Focusing only on providing rights while failing to satisfy the corresponding obligations will not make you more data-driven. Instead, it will only exacerbate the problems you set out to solve.
Democratisation is great, provided your starting point is the right one. Going directly from the Stone Age to a post-industrial democracy is not recommended.