Before I was CEO of Fivetran, I was a scientist, and when I wasn't helping give monkeys telepathic powers I was doing my best to learn and apply the principles of scientific thinking. These principles continue to influence how I lead Fivetran. Recently I've been thinking a lot about evidence-based medicine (EBM), and I've found that it contains a lot of lessons that apply in a business context. We can think of evidence as a hierarchy, with higher-quality evidence at the top:

- Randomized controlled trials (strongest)
- Natural experiments
- Observational evidence
- Mechanistic reasoning (weakest)
When drawing conclusions, we should give priority to higher-quality evidence over lower-quality evidence. If randomized controlled trials disagree with observational studies, the controlled trials are probably right and the observational studies are probably wrong. There are exceptions to this principle, which we will return to later. But first, let's consider some examples of each category of evidence.
Mechanistic reasoning
Mechanistic reasoning is not so much a form of evidence as a way of generating hypotheses. Paradoxically, mechanistic reasoning is incredibly persuasive, and in the medical field this frequently leads to reversal, where a common practice is discontinued when it is discovered after many years that it was ineffective all along.
Despite its flaws, mechanistic reasoning is frequently all we have to go on when making decisions in business. If we are starting a new company trying to find product-market fit, many of the decisions we make are going to be based on mechanistic reasoning. At Fivetran, arguably the core decision we made that differentiated us from existing ETL tools was that the data warehouse schema was not user-configurable. This allowed us to automate a large class of data pipeline maintenance activities and initiated a virtuous cycle of centralized bug-fixing that over time produced a superior product. This core decision was based on mechanistic reasoning.
Observational evidence
Observational evidence is inherently weak but extremely common, because it is so much easier to create. All we need is an existing dataset and a computer. Most medical research is observational in nature, and observational evidence is valuable! Many randomized controlled trials test hypotheses that were initially generated by observational studies.
In business, observational evidence is usually all that we have, and we have to make decisions based on the evidence at hand. A question we recently asked at Fivetran was: does committing to use Fivetran for a year, and paying up-front, change customer behavior? We looked at customers who switched from month-to-month to commit, and found that they behaved similarly to customers who did not switch. Based on this evidence and some mechanistic reasoning, we decided to de-emphasize commit in 2023. This change will generate a natural experiment that will provide additional evidence, and we may ultimately run experiments to understand the true value of commit.
Controlling for X
Researchers trying to make the best of observational data will frequently "control for" confounders using multiple regression or re-weighting the data. These techniques are usually necessary to learn anything from observational data, but they are fraught. After-the-fact "controls" only mitigate confounding; there is always residual confounding in observational data. Furthermore, if we control for too many variables, and try too many models, we will fall into a different trap: data mining. When we give ourselves many choices and try many different analyses, the choices we make shape the result.
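To make this concrete, here is a minimal sketch of how an after-the-fact "control" works, using simulated data (the variables `commit`, `company_size`, and `retention` are purely hypothetical, loosely echoing the commit example above). The treatment has no true effect; a naive regression finds a large spurious one, and adding the confounder mostly removes it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: company size confounds both the "treatment" (commit)
# and the outcome (retention). Commit has NO true effect in this simulation.
company_size = rng.normal(0, 1, n)
commit = (company_size + rng.normal(0, 1, n)) > 0   # larger companies commit more often
retention = 2.0 * company_size + rng.normal(0, 1, n)

def ols_coef(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols_coef(commit.astype(float), retention)[1]                        # omits the confounder
adjusted = ols_coef(np.column_stack([commit, company_size]), retention)[1]  # "controls for" size

print(f"naive estimate:    {naive:.2f}")     # large spurious effect
print(f"adjusted estimate: {adjusted:.2f}")  # close to the true effect of zero
```

Even here the adjustment only works because we simulated the confounder and can measure it perfectly; in real observational data there is always residual confounding from variables we didn't record.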
Natural experiments
A natural experiment occurs when something happens in the world that naturally creates two groups that we can compare. For example, the introduction of public smoking bans in many countries in the 2000s created a natural experiment that researchers have used to study the effects of second-hand smoke. Natural experiments are not as good as true randomized controlled trials: they are still vulnerable to confounding if something else changes alongside the natural experiment.
Natural experiments are common in business. For example, we recently wanted to better understand the effectiveness of our sales team at Fivetran. Leads are assigned to sales representatives based on territories, and there is considerable random variation in the number of leads assigned to each representative in a given month. This is a natural experiment: leads that come in during a "heavy month" will receive less attention from sales. Our results indicated that our sales activities probably have less effect than we would like to think. But this conclusion must be regarded as tentative, because there are many possible sources of confounding in the data. We have planned a true experiment for next year to better understand the real effect of early engagement with sales.
Randomized controlled trials
The strongest form of evidence is the randomized controlled trial: for example, the drug trials that the FDA requires pharmaceutical companies to run prior to approval. These trials can cost hundreds of millions of dollars to run, so pharma companies only run them when there is extensive evidence that the drug is likely to work. Nonetheless, drugs frequently fail in FDA trials! There is an important lesson here: many things seem to work until they are subjected to a true experiment.
A-B tests are the most common form of randomized controlled trial in a business context. We recently conducted an A-B test at Fivetran where we gave new users the option of signing up with Google instead of creating a username and password, and found a meaningful increase in the conversion rate to an active free trial. A-B tests are most common in marketing and product, but they can and should be applied to sales as well.
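As an illustration of how a test like this is read out, here is a standard two-proportion z-test. The counts are made up for the sketch, not Fivetran's actual results:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A-B test results (illustrative numbers only):
# control = username/password signup, variant = sign-up-with-Google
control_n, control_conv = 5000, 400   # 8.0% start an active free trial
variant_n, variant_conv = 5000, 480   # 9.6%

p1, p2 = control_conv / control_n, variant_conv / variant_n
pooled = (control_conv + variant_conv) / (control_n + variant_n)
se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"lift: {p2 - p1:.1%}, z = {z:.2f}, p = {p_value:.4f}")
```

With these assumed counts the lift is statistically significant, but significance alone says nothing about whether the surrogate endpoint (free trials) translates into the endpoint we care about (paying customers), a point we return to below.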
When randomized trials are wrong
True experiments are the strongest form of evidence, but they are not always right. There are a number of ways that randomized trials can go wrong.
False negatives
An underpowered experiment can produce negative results for a treatment that really does work. If we're designing an A-B test, we should always do a power analysis beforehand to ensure we have a large enough sample to produce meaningful negative results. Even a well-powered trial can produce false negatives when the treatment is applied to the wrong group. For example, a marketing campaign might fail when tested against small companies but succeed for large enterprises.
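The power analysis mentioned above can be sketched with the standard sample-size formula for comparing two proportions. The baseline rate and minimum detectable effect below are illustrative assumptions, not real Fivetran figures:

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, mde, alpha=0.05, power=0.8):
    """Approximate sample size per arm to detect an absolute lift `mde`
    over baseline rate `p_base` with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p2 = p_base + mde
    p_bar = (p_base + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p_base * (1 - p_base) + p2 * (1 - p2))) ** 2) / mde ** 2
    return ceil(n)

# e.g. to detect a 2-point lift over an assumed 8% baseline conversion rate
print(sample_size_per_arm(0.08, 0.02))
```

The formula makes the cost of small effects vivid: halving the detectable lift roughly quadruples the required sample, which is why a negative result from an undersized test tells us very little.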
False positives
A poorly chosen surrogate endpoint can produce positive results for a treatment that does not really work. Surrogate endpoints are easy-to-measure proxies for some other endpoint that we truly care about. In the A-B test of the Fivetran signup flow that we described earlier, initiating a free trial is a surrogate endpoint for becoming a paying customer. It is possible that the additional trials we gain by offering signup-with-Google never convert to paying customers, and so the modification to our signup flow might not actually improve the outcome that we care about.
Accepting negative results
When a practice backed by strong mechanistic reasoning and observational evidence fails in an experiment, it can be difficult to accept. This is especially true when people have taken action for years based on the theory that has just been disproven. Accepting a negative result means acknowledging that not only were we wrong, but that our past efforts were wasted. Overcoming this bias is the hardest part of putting evidence into practice.
How to make your business evidence-based
We hear a lot about data-driven decision making. The difference between a data-driven decision and an evidence-based decision is that the decision-maker considers whether the data represents strong evidence for the conclusion. The point of evidence is to guide decisions, so making a business evidence-based has to start with its leaders.
Talk about evidence quality
When team members present data in meetings, get into the habit of talking about evidence quality.
- Is this data observational, a natural experiment, or a randomized experiment?
- What are the known confounds and what has been done to mitigate them?
- Are there signs of other confounding variables that might make our results misleading?
Anyone presenting data should be prepared to answer these questions. Teammates can use these questions as a framework to critique the presenter's interpretation of the data. It's OK to present data with weaknesses! Most business decisions must be made based on low and medium-quality evidence.
Bigger decisions need better evidence
High-quality evidence takes time and effort to collect. We should only expend that effort when it is justified by the magnitude of the decision. Low-risk decisions can be made based on observational evidence or just mechanistic reasoning. High-risk decisions should ideally be made based on high-quality evidence.
Sometimes, a high-risk decision needs to be made quickly, so there is not enough time to collect high-quality evidence. In these cases, try to implement the decision in a way that generates a natural experiment. For example, in 2022 Fivetran reduced our pricing at the low end. This change took effect on a single day. We were able to use this as a natural experiment to study customer behavior before and after the change.
When considering a big change that is not urgent, start with low-quality evidence before putting in the effort to generate high-quality evidence. For example, start with a simple retrospective analysis of observational data that a single analyst can do at their desk. If the results are promising, it's time to do a real A-B test.
Evidence needs support from leaders
Evidence-based decisions can encounter fierce resistance from the organization. Everyone, starting with the CEO, needs to make a commitment to change their minds based on evidence. This is most challenging when we have prior beliefs that are contradicted by new evidence. We can use a simple Bayesian framework for incorporating new evidence into our prior beliefs, depending on the strengths of those beliefs and the quality of the evidence:
- When weak evidence challenges a strong belief, ignore the evidence.
- When weak evidence challenges a weak belief, become even more uncertain.
- When strong evidence challenges a weak belief, change our minds.
- When strong evidence challenges a strong belief, change our minds unless we can articulate a convincing theory for why the evidence is misleading.
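One hypothetical way to make these rules concrete is Bayes' rule in odds form: represent the belief as a probability and the evidence as a likelihood ratio (how much more likely we'd be to see this evidence if the belief were true than if it were false). The numbers below are purely illustrative:

```python
def update(prior, likelihood_ratio):
    """Posterior probability of a belief after seeing evidence with the
    given likelihood ratio (Bayes' rule in odds form)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

strong_belief, weak_belief = 0.95, 0.60
weak_evidence, strong_evidence = 1 / 2, 1 / 20   # likelihood ratios against the belief

print(f"{update(strong_belief, weak_evidence):.2f}")    # strong belief survives weak evidence
print(f"{update(weak_belief, weak_evidence):.2f}")      # weak belief becomes more uncertain
print(f"{update(weak_belief, strong_evidence):.2f}")    # weak belief is overturned
print(f"{update(strong_belief, strong_evidence):.2f}")  # strong belief drops to roughly a coin flip
```

The four cases reproduce the four rules above: only the last one leaves us genuinely torn, which is exactly when it is worth asking whether the evidence might be misleading.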
The most important experiments encounter the greatest organizational resistance, because they call for big changes. It's part of the job of every leader to help overcome this resistance.
Reversals are OK
In business, it's OK to go forward with a strategy based on weak evidence, discover later that it doesn't work, and reverse the decision. Business decisions are usually made under time constraints. If we put too much pressure on ourselves to avoid mistakes, this will paradoxically incentivize everyone to resist high-quality evidence that emerges later. Instead, we need to celebrate changing our minds in the face of new evidence, and encourage everyone to revisit important past decisions using high-quality evidence.
What success looks like
As our organizations implement evidence-based business, we should see these telltale signs of success:
- Team members presenting data talk about evidence quality and potential confounds.
- Team members ask questions and offer critiques based on the "hierarchy of evidence" framework.
- Decision-makers acknowledge the uncertainty of their decisions, and offer examples of the type of evidence that could prove them wrong.
- Decision-makers change their minds regularly based on high-quality evidence, and past decisions are reversed based on new evidence.
The use of evidence in business is in its infancy in many ways. We have a huge opportunity to improve by adopting the same ideas that have revolutionized medicine. Using data to make decisions is only the first step. If we are going to use data to make the right decisions, we must make our businesses evidence-based.
Read more articles by our co-founder and CEO George Fraser on the Fivetran blog.