• May 10

Causal Inference: A Data Science Skill AI Can’t Fully Replace

  • Jamilla Cooiman, Founder Causal Academy

We are all seeing how quickly AI is developing and changing the world around us. It is not difficult to imagine that AI, given the tasks it can already handle today, could soon take over certain jobs completely. AI can work without breaks, without sleep, and at a lower cost than hiring a human. It is still unclear how these developments will be regulated, but for many of us, the question has probably crossed our minds: will AI replace my job?

Why Data Scientists Are at Risk of Being Replaced

I am not someone who can predict the future, and I don’t claim to know exactly how things will go. But my own view is that AI will replace some jobs, or at least parts of them. Roles that already rely heavily on repetitive computational work may be more vulnerable, and one of those roles is data science.

When I say this, I don’t mean that everything a data scientist does is replaceable, but many common tasks are. In particular, most data science work today centers on predictive modelling, which tends to follow a typical and fairly repetitive workflow. That way of working is relatively easy for machines to copy.

Think about the steps we often take in a typical project. We split our data into training and testing sets, try different models, tune the hyperparameters, and look for the best performance (i.e. the minimum or maximum of some evaluation metric).

Steps like these follow a routine, which can be fully automated by AI. It can try out different models, tune them, and evaluate their performance without human input. It can also handle tasks like data cleaning and exploratory analysis to a large extent. Monitoring models after they are deployed is often already managed by software.
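
To make the routine concrete, here is a minimal sketch of such a loop in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the toy k-nearest-neighbour model, and the helper names (make_data, knn_predict, auto_tune) are hypothetical and do not come from any particular AutoML tool. The point is only that the search needs no human in the loop once the metric is fixed.

```python
import random

def make_data(n=300, seed=0):
    """Synthetic (x, y) pairs; the linear relationship is invented for illustration."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2.0 * x + rng.gauss(0, 1.0)))
    return data

def knn_predict(train, x, k):
    """Predict y as the average y of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, rows, k):
    """Mean squared error of the k-NN model on the given rows."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in rows) / len(rows)

def auto_tune(data, candidate_ks, seed=1):
    """Split the data, score every candidate on validation data, report test error."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    train = rows[: int(0.6 * n)]
    valid = rows[int(0.6 * n): int(0.8 * n)]
    test = rows[int(0.8 * n):]
    valid_scores = {k: mse(train, valid, k) for k in candidate_ks}
    best_k = min(valid_scores, key=valid_scores.get)  # the "best" model is just an argmin
    return best_k, valid_scores, mse(train, test, best_k)

best_k, valid_scores, test_mse = auto_tune(make_data(), [1, 3, 5, 10, 25])
print("chosen k:", best_k)
print("test MSE:", round(test_mse, 3))
```

Swapping in other model families or a larger hyperparameter grid changes the contents of the loop, not its shape, which is why this part of the job lends itself so well to automation.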

Of course, human input is still important. We are needed for defining the business problem, deciding whether a model is ethical or makes sense for a specific situation, doing more thoughtful feature engineering, and more. And of course, some highly specific domains might require more human involvement. But that’s a smaller slice of the overall data science pipeline and industry, and businesses may start to question whether they really need large data science teams when so many parts of the job can now be automated. This could lead to a significant reduction in data science roles.

Is Predictive Modelling All There Is to Us Data Scientists?

Whether or not data scientists are ‘easily’ replaceable by AI depends largely on the range and depth of their skillsets. The broader and more versatile your skills, the more value you’re likely to offer beyond what AI can currently replicate.

However, the reason automation poses a real risk to the job security of many data scientists is that a large number have been primarily trained in predictive modeling. They’re skilled at building and deploying models that optimize for predictive accuracy, and that’s precisely the type of work AI is increasingly capable of doing independently, and at scale.

But data science isn’t just about prediction. At its core, data science is meant to support better decision-making. And making effective decisions often requires more than just knowing what is likely to happen: it requires understanding how our actions will change what will happen.

For example, it’s one thing to predict sales of a product next quarter. It’s another thing to figure out how to increase those sales. This shift from passively predicting outcomes to actively changing them is a fundamentally different problem. It’s a causal problem. In these cases, we want to know how different decisions or interventions (like pricing changes, marketing campaigns, or product tweaks) will affect the outcome, so we can choose the most effective one.

These kinds of causal questions are often among the most valuable for businesses. They’re central to strategy, policy-making, marketing, pricing, product development, and more. Answering them well requires a different set of skills: causal inference, or what some call causal data science. Compared to predictive modelling, causal inference is a lot harder to fully automate with AI.

In my view, this creates a clear opportunity: data scientists who invest in learning causal inference will develop a skillset that is both highly relevant and less easily replaced by automation. In a field where parts of the job are becoming increasingly automated, that kind of edge can make all the difference.

Why Causal Inference Skills Are Harder for AI to Replace

Here are two things that AI automation likes: repetitiveness and large amounts of data. Predictive modeling fits perfectly into that world. It’s based on finding associations in observational data (which we often have large amounts of) and it follows a fairly standard and repetitive process (like selecting candidate models, tuning hyperparameters, and optimizing performance through cross-validation). Steps like these can be automated quite easily.

But causal inference is different. It doesn’t follow a highly standardized or repeatable process and relies heavily on human judgment. That makes it much harder for AI to fully replace.

To understand this, let’s start with the difference between association and causation. Associations can be found directly from observational data. Causation, on the other hand, can generally only be discovered in two ways: either through experimental data, or by combining observational data with external knowledge about how the world works.

The first option, experimental data, is relatively rare. This kind of data is often created through controlled experiments, which are expensive and time-consuming, and usually designed by humans for specific business problems. The second option relies on observational data, which is everywhere. It’s collected continuously by apps, platforms, websites, and business systems. It’s what most of our predictive models rely on.

But observational data on its own never reveals cause and effect. It only shows associations. This means that AI models cannot rely on observational data alone to perform causal inference.

To estimate causal effects using observational data, we have to bring in assumptions about the underlying causal processes that generated the data. These assumptions don’t come from the data; they come from domain knowledge and human reasoning. As Judea Pearl puts it in The Book of Why:

“You are smarter than your data. Data do not understand causes and effects; humans do.”

This external knowledge is not only human-dependent but also highly specific to the problem at hand, which makes it harder to automate.
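
A small simulation can illustrate the gap between association and causation. This is a sketch under invented assumptions: a hypothetical promotion that raises sales by 2 units, and a high-demand season that both makes promotions more likely and raises sales by 5 units. None of these numbers come from real data. The naive comparison of promoted versus non-promoted periods mixes the two effects; the adjusted estimate only recovers the true effect because we told the code, as outside knowledge, that season is the confounder to stratify on.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed causal effect of the promotion on sales (made up)

def simulate(n=20000, seed=0):
    """Rows of (high_season, promotion, sales) from an invented causal process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                   # confounder: high season
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0   # promotions favour high season
        y = 10.0 + TRUE_EFFECT * t + 5.0 * z + rng.gauss(0, 1)
        rows.append((z, t, y))
    return rows

def diff_in_means(rows):
    """Association: average sales with promotion minus average sales without."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def adjusted_effect(rows):
    """Stratify on the confounder and average the within-stratum differences."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total

rows = simulate()
naive = diff_in_means(rows)
adjusted = adjusted_effect(rows)
print("naive association:", round(naive, 2))
print("adjusted for season:", round(adjusted, 2))
print("true effect:", TRUE_EFFECT)
```

Nothing in the rows themselves says that season is the variable to adjust for. That choice is the causal assumption, and in a real project it has to come from someone who understands the business.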

There’s another layer to this as well. In predictive modeling, we have a clear ground truth to optimize for. For example, if we build a model to predict sales, we can compare our predictions to the actual sales figures and refine the model accordingly. The ‘best’ model is often clearly defined as one that minimizes or maximizes some metric – something AI can easily check without human involvement.

In causal inference, we are estimating causal effects, like the effect of price changes on sales. But we don’t have ground truth for these effects. There’s no column in our dataset that shows the true causal effects we can use to compare our model’s predictions against.
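
One way to picture the missing ground truth is through potential outcomes. In the sketch below, every simulated unit has two possible sales figures, one with a price cut and one without, but the dataset an analyst would receive contains only the one that actually happened. The numbers, the random assignment, and the function names are all hypothetical; the simulation knows the true effect only because we wrote it into the code.

```python
import random
from statistics import mean

def simulate_units(n=10000, seed=0):
    """Each unit carries both potential outcomes; real data never does."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(100, 10)        # sales without the price cut
        y1 = y0 + rng.gauss(5, 2)      # sales with the price cut
        treated = rng.random() < 0.5   # assumed: price cut assigned at random
        units.append({"y0": y0, "y1": y1, "treated": treated})
    return units

def observe(units):
    """What the analyst actually sees: one outcome per unit, no effect column."""
    return [
        {"treated": u["treated"], "sales": u["y1"] if u["treated"] else u["y0"]}
        for u in units
    ]

units = simulate_units()
data = observe(units)

# Only computable inside a simulation, where both outcomes exist.
true_ate = mean([u["y1"] - u["y0"] for u in units])

# Computable from observed data; trustworthy here only because assignment was random.
estimate = (mean([r["sales"] for r in data if r["treated"]])
            - mean([r["sales"] for r in data if not r["treated"]]))

print("columns available to the analyst:", sorted(data[0]))
print("true average effect (simulation only):", round(true_ate, 2))
print("estimate from observed data:", round(estimate, 2))
```

With real data, the line that computes true_ate cannot be written, so there is nothing to score a causal model against in the way a predictive model is scored against actual sales.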

Because of this, building a strong causal model is much more problem-specific and requires continuous human judgment. It calls for a deep understanding of the data, the domain, and the assumptions being made. Because we lack the strong validation procedures available in predictive modelling, tools like sensitivity analysis become a lot more important. With such approaches, there are no clear rules for what’s acceptable or not. Interpreting the results and deciding what counts as “too sensitive” is again a human call.
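
As a rough illustration of what a sensitivity analysis can look like, the sketch below uses a deliberately simple bias formula. It assumes a linear outcome model and a single unobserved binary confounder: the bias in a difference-in-means estimate is then the confounder's effect on the outcome multiplied by how much more common the confounder is among treated units. The observed effect of 2.0 and the grid values are hypothetical, and real analyses typically use richer methods.

```python
def adjusted_estimate(observed_effect, effect_on_outcome, imbalance):
    """Observed effect minus the bias implied by a hypothetical hidden confounder.

    effect_on_outcome: assumed effect of the confounder on the outcome.
    imbalance: assumed difference in the confounder's prevalence,
               treated minus control.
    """
    return observed_effect - effect_on_outcome * imbalance

def sensitivity_grid(observed_effect, outcome_effects, imbalances):
    """Adjusted estimate for every combination of assumed confounder strengths."""
    return {
        (g, d): adjusted_estimate(observed_effect, g, d)
        for g in outcome_effects
        for d in imbalances
    }

grid = sensitivity_grid(
    observed_effect=2.0,                    # hypothetical estimate
    outcome_effects=[0.0, 2.0, 5.0, 10.0],  # hypothetical confounder strengths
    imbalances=[0.0, 0.1, 0.3],
)

for (g, d), value in sorted(grid.items()):
    flag = "  <- sign flips" if value < 0 else ""
    print(f"effect_on_outcome={g:>4}, imbalance={d}: adjusted={value:.2f}{flag}")
```

The code can fill in the grid, but it cannot say whether a hidden confounder with an effect of 10 and an imbalance of 0.3 is plausible for this product and market. That judgment stays with the analyst.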

Don’t get me wrong, AI can automate many parts of the causal inference process, like the causal effect estimation procedure, or suggesting potential causal assumptions. But unlike predictive modeling, where automation can handle large parts of the process entirely on its own, causal inference depends a lot more on human reasoning and involvement. That’s what makes these skills harder to fully automate.

Conclusion

In my view, learning causal inference is no longer just a bonus. It’s a skill that is, and will remain, highly relevant, and an opportunity to make yourself stand out in a job market where AI automation is starting to shake things up.
