Growing up – all the way to engineering school and beyond – I was obsessed with mathematical modelling and statistics. The ability to model (correctly) the predictive behaviour of a system (oftentimes a complex system that involves a fairly large number of variables) can seem like pure magic to an outsider (and is a whole lot of fun science to an insider).
My most cherished memories as a game designer all involve working with humongous spreadsheets – tuning complex game economies, knowing that one mistake could lead to hundreds of thousands of dollars' worth of losses, tying up dozens of loose ends and running countless simulations to correctly predict the behaviour of millions of players. Over time, the virtual economies of large games start looking like the economies of small countries – and a game designer/product manager starts resembling the Chairman of the Federal Reserve or the Governor of the Reserve Bank of India. Exhilarating!
So when I see mathematics and especially statistics being used incorrectly – it rankles me to the core.
“A glass of wine a day keeps heart disease away.”
“People with black hair more likely to turn to a life of crime.”
“Martians are attacking vineyards across the country.”
“People with black hair who drink wine are prime targets for alien abduction and a heart-disease-free life of crime on Mars.”
You get the gist: a research institute (funded by some kind of lobby) will run a study with a sample of people (usually of the order of a few hundred, which they would inflate to a few thousand). These people will be asked to repeat one action (drink wine/beer) over time, and another variable (their blood pressure/height/length of toe-nails) will be recorded over the same period by a researcher.
At the end of the study, the researcher will compute the correlation between the repeated action and the variable and publish the findings in a journal. A journalist will come across this paper, refuse to go over the details and jump directly to the conclusion section. And we will see a sensationalist headline the next day in every newspaper, website and blog across the world.
People will take this headline as a cardinal truth and vow to change their lives accordingly (start consuming beer, stop wearing undergarments, stop showering, etc).
Randall Munroe sums it up beautifully in XKCD:
I am not trying to discount the work of serious researchers and scientists here. Statistics is a very tough business – especially in the real world:
- Experiments cannot be conducted in a controlled environment (and a true A/B test requires a highly controlled environment)
- Too many variables cannot be held constant, which makes it hard to isolate the interplay between just the two variables of interest
- Sample sizes remain small because large sampling is costly and not scalable
- The law of large numbers offers little protection when the sample size is small: estimates stay noisy and chance patterns can look like real effects
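To make the small-sample point concrete, here is a minimal sketch (my own illustration, not data from any real study). It draws two completely unrelated random variables again and again and counts how often they show a "noticeable" correlation purely by chance. The function names, the 0.3 threshold and the sample sizes are all assumptions picked for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_spurious(sample_size, trials=2000, threshold=0.3, seed=42):
    """Fraction of trials in which two INDEPENDENT variables still
    show |correlation| >= threshold, purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


# Hypothetical study sizes: 10 participants versus 1,000 participants.
small = share_of_spurious(10)
large = share_of_spurious(1000)
print(f"n=10:   {small:.1%} of studies show |r| >= 0.3 by chance")
print(f"n=1000: {large:.1%} of studies show |r| >= 0.3 by chance")
```

With a handful of participants, a large share of these "studies" of pure noise produce a correlation worth a headline; with a thousand participants, almost none do.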
However, I see a lot of these realities being ignored – not just in research, but also in the world of technology product management (which actually has much better instrumentation and data gathering tools).
Example: Start-up XYZ releases a new feature in their app that has about 10,000 daily active users (DAU). A week after the feature release the DAU jumps to 15,000 suddenly. The team is elated. They run a couple of quick queries on their analytics system. They see a couple of different metrics on the rise and are convinced that it was the feature that caused the jump in DAU. What else could it be? The VP of product immediately calls a meeting between all functional leads: “We need to put everything else on hold. Forget the product roadmap that we took a whole month to arrive at. This new feature is our future. Let’s commit all our resources to this.” Everyone nods in agreement.
Three months later the start-up is dead.
“Correlation does not equal Causation.”
People will point to scrappy start-ups that did exactly this and became huge. Words like “pivot” and “experimentation” will be thrown around a lot.
People only remember the exceptions. People are usually wrong. (I completely get the irony of that statement given what I am discussing in this post.)
A vast majority of product teams forget the fundamentals. Be wary of strong positive correlations between a feature and the rise in a particular product metric (active users, retention, engagement, etc). Unless an experiment has been set up correctly and the product hypothesis validated, one should not jump to conclusions.
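Here is a rough sketch of how such a misleading correlation can arise. It is a toy simulation under assumptions I made up: a hidden trait ("already keen" users) makes people both more likely to use the new feature and more likely to be retained, while the feature itself has zero effect by construction. All probabilities and names are hypothetical.

```python
import random


def simulate(n_users=20000, seed=7):
    """Return (uses_feature, retained, randomly_assigned) per user."""
    rng = random.Random(seed)
    users = []
    for _ in range(n_users):
        keen = rng.random() < 0.3  # hidden confounder
        # Keen users adopt the feature more often...
        uses_feature = rng.random() < (0.8 if keen else 0.2)
        # ...and are retained more often, whether or not they use it.
        retained = rng.random() < (0.7 if keen else 0.2)
        # A coin flip that ignores keenness, as a proper A/B test would.
        randomly_assigned = rng.random() < 0.5
        users.append((uses_feature, retained, randomly_assigned))
    return users


def retention_rate(users, index, value):
    """Retention among users whose field `index` equals `value`."""
    group = [u for u in users if u[index] == value]
    return sum(u[1] for u in group) / len(group)


users = simulate()
# Naive comparison: feature users versus non-users.
observational_gap = retention_rate(users, 0, True) - retention_rate(users, 0, False)
# Randomised comparison: treatment versus control by coin flip.
randomised_gap = retention_rate(users, 2, True) - retention_rate(users, 2, False)
print(f"Feature users vs non-users: {observational_gap:+.1%}")
print(f"Random treatment vs control: {randomised_gap:+.1%}")
```

The naive comparison shows a large retention gap in favour of feature users, even though the feature does nothing in this toy world. The randomised comparison hovers around zero, because a coin flip cannot be correlated with how keen a user already was.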
The correct experimental method involves the following steps:
– Come up with a hypothesis. “What will this feature achieve?” “What metrics will it move and by how much?” Having a clear purpose for an experiment and estimating expected results are both critical.
– The second step is to set up the experiment correctly. Do we have the ability to measure the results? What are we gauging/ matching the results against? Do we have a sturdy A/B testing platform (and how many people in the team have a clear understanding of what an A/B test is)?
– The third step is to draw inferences from the results – the actuals versus the expected. Inferences are drawn using a mix of instincts and numbers. Between the two, though, most product teams almost always seem to err on the side of instincts. This is really bad. A tech start-up needs to find a balance between instincts and numbers.
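The "numbers" half of that third step can be sketched with a standard two-proportion z-test, which asks whether the gap between control and treatment is bigger than chance alone would explain. This is only one common way to read an A/B test, not a complete testing platform, and the user counts below are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns (lift, z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of "no difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical experiment: 5,000 users per arm, retention as the metric.
lift, z, p_value = two_proportion_z_test(
    conv_a=1000, n_a=5000,  # control: 20% retained
    conv_b=1100, n_b=5000,  # treatment: 22% retained
)
print(f"lift={lift:+.1%}, z={z:.2f}, p={p_value:.3f}")
```

The hypothesis from the first step decides what counts as success: if the team predicted a two-point lift in retention, they can check both whether the observed lift matches and whether it is distinguishable from noise at the threshold they agreed on before the experiment started.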
I think this strip best sums up this post: