Roni Kobrosly Ph.D.'s Website

More thoughts on "data maturity" at organizations

written by Roni Kobrosly on 2024-12-19 | tags: personal updates career data engineering


I recently posted on LinkedIn about the state of "data maturity" at organizations. I'll spare you the trouble of linking to it:


It’s almost 2025, but I can’t help but feel like the VAST majority of orgs lack “data maturity”. By this I mean:

* A centralized data store with good metadata, dictionaries, and processes for ensuring the data remains clean
* Leadership that understands how to effectively deploy BI, analytics, ML and quantify the real impact it has on KPIs
* Teams of DS/MLEs that have the tools they need and the discipline to produce rigorously tested and well-written software

Instead, I still hear a lot of stories about places that:

* have a data science team to sort of check a box or say that they do data science. They rarely have the tools, clean data, or cultural buy-in to make an impact. The leadership can’t wrap their heads around why “ML magic” isn’t happening, and they ultimately don’t see the value.
* have shadow data and tech orgs all doing their own things
* use decision-driven data analysis instead of data-driven decisioning (“hey, can you re-run that analysis to show something that looks better for us?”)
* store data in 10s of different databases/lakes/whatever, and the ETL to get things done is a nightmare
* partly rely on third-party data vendors that don’t play well with internal data tooling, causing frustration.

Sure, not all organizations care to be data-driven (maybe a 60-person e-commerce org just wants to know their sales numbers and inventory count, and that’s fine!). But I’m staggered by how many small, medium, and enterprise-scale orgs that want that data-driven edge are sooo data immature. I don’t have the data, but I might guess 1 out of 30 orgs do this reasonably well. I imagine the numbers are worse in the more regulated industries (i.e., healthcare and finance) that have heavy data governance requirements on top of everything.

Am I crazy?! I sort of figured these were 2015 issues that would be resolved in a decade. 

Increasingly I’m coming around to Benn Stancil’s views on data (he seems to regard the ability of data to drive value with some skepticism). I’m paraphrasing, but in 2022 he said something to the effect of “businesses are all gambling; with data we have very slightly better odds, but it’s still a gamble”. As in, data teams don’t make or break a business’s success. I’m a bit skeptical GenAI will change this dynamic. I can already see lots of companies marketing that they employ GenAI, but is it done effectively, or just to show they’re in the GenAI game?

I think the proliferation of blog posts entitled “why I left data science” is telling (https://towardsdatascience.com/why-youll-quit-your-data-science-job-6079d407bbeb, https://ryxcommar.com/2022/11/27/goodbye-data-science/, https://nirantk.com/writing/why-i-quit-data-science/).

And to be clear, I’m not having a career crisis 🙂 I’m just reflecting on my decade in the industry and all of the expectations those of us who started in the 2010s had. I still believe there is value in data to inform what decisions to make; I’m just a bit discouraged by how many orgs still haven’t figured this out.

At the risk of sounding ageist, I do have some hope that as a generation of data-savvy folks enters the C-suite, things may get better. I feel like the current generation has a mentality of “hey, we’ve got a lot of data, right? Just sprinkle some ML on top of what we have and we’re off to the races, right?”

After re-reading it, I didn't mean for it to come off as that spicy. It was meant to be 30% venting and 70% honest reflection on the state of the industry. It got a surprising number of private replies and actually led to a number of Zoom call conversations.

Here are some of the replies:

  • "Definitely not crazy! I'm constantly surprised and confused how many teams are struggling with the same challenges that we heard team leaders talking about in 2014, 2015 ... ten years ago?! I think data science / ML / AI are all in a weird place where the amount of hype around them creates so many new tools, job titles, opportunities -- and a lot of that is great! -- but many (most?) companies haven't really found structures / tools / processes that work for them."

  • "FWIW - I sorta relate, in that I largely transitioned away from DE and working as a backend engineer. Generally found it more rewarding to maintain core systems that make the product function. The challenges of ensuring large high traffic complex core services are reliable are generally more subtle and more varied than a typical “productionized sql” data pipeline. For data engineering, I sorta found the career paths were either transition to MLE, go into management, start a company, or maybe work in tooling. None of which particularly interest me at the moment, but to each their own!"

  • "You’re not crazy but any team beyond maintenance is a gamble"

  • "Maybe I've been lucky to work at organizations where I was part of a team building, maintaining, scaling and growing a core data engineering infrastructure that serves the need of a multitude of user groups, such as data scientists, business intelligence analysts, application developers, machine learning engineers. The work has been a mix of building dbt models, integrating apis / ingesting new data sources, building software pipelines to train and predict ML models and a multitude of other things. I don't think the journey has been unique but maybe different from others on this thread."

Here's my attempt at bulleting out the key takeaways I learned from the phone call conversations:

  • Progress has been made, but it has occurred spottily.
  • Like a decade ago, there is still a percentage of companies that excel at the data game and a percentage that don't, and the ones that excel are a minority. Maybe in 2015 it was 5%; perhaps now it's 10%.
  • There is a spectrum of companies. At one end, data and the data work are close to the core of the business (to name some big names: LinkedIn, Spotify, Facebook, and Glassdoor have that data work at their very core). At the other end, the core of the business is something more traditional: selling clothes, selling bank accounts, selling software for writing documents (yes, even companies selling software), and the data team/component is essentially an appendage / arm of the company. Data maturity is less important to companies in the latter category.
  • Classic ML has become less sexy than generative AI, but data engineering is as unsexy as it was a decade ago. And without a solid data engineering foundation, you cannot do BI, analytics, ML, or AI well.
  • Some interesting questions to ask during a data job interview process if you want to suss out a company's degree of data maturity:
    • If your data team vanished tomorrow, how much would that impact the business? (OBVIOUSLY you need to frame this gently.) Maybe something like: "How would your world be different without the presence of your team?"
    • How much data do you have and can you quantify it? Look for specificity in the answer (we've got petabytes of data in X, terabytes of Y, etc.)
    • What does your data team look like? How many other types of role X are there?
    • What impact have your existing models or dashboards had, specifically?
    • It is important to me that I make an impact in my work. If I were to step in tomorrow, would I be able to deploy something within a month that would be used?
    • Is your data team considered first-class citizens in the company?