How To Survive a Ground-Up Rewrite Without Losing Your Sanity

By Dharmesh Shah on April 8, 2013

aka: Screw you Joel Spolsky, We're Rewriting It From Scratch!

This is a guest post by Dan Milstein (@danmil), co-founder of Hut 8 Labs.

Disclosure: Joel Spolsky is a friend and I'm an investor in his company, Stack Exchange (which powers the awesome Stack Overflow) -Dharmesh

So, you know Joel Spolsky's essay Things You Should Never Do, Part I? In which he urgently recommends that, no matter what, please god listen to me, don't rewrite your product from scratch? And lists a bunch of dramatic failures when companies have tried to do so?

First off, he's totally right. Developers tend to spectacularly underestimate the effort involved in such a rewrite (more on that below), and spectacularly overestimate the value generated (more on that below, as well).

But sometimes, on certain rare occasions, you're going to be justified in rewriting a major part of your product (you'll notice I've shifted to saying you're merely rewriting a part, instead of the whole product. Please do that. If you really are committed to just rewriting the entire thing from scratch, I don't know what to tell you).

If you're considering launching a major rewrite, or find yourself as the tech lead on such a project in flight, or are merely toiling in the trenches of such a project, hoping against hope that it will someday end... this post is for you.

Hello, My Name is Dan, and I've Done Some Rewrites

A few years back, I joined a rapidly growing startup named HubSpot, where I ended up working for a good solid while (which was a marvelous experience, btw -- you should all have Yoav Shapira as a boss at some point). In my first year there, I was one of the tech leads on a small team that rewrote the Marketing Analytics system (one of the key features of the HubSpot product), totally from scratch. We rewrote the back end (moving from storing raw hit data in SQLServer to processing hits with Hadoop and storing aggregate reports in MySQL); we rewrote the front end (moving from C#/ASP.Net to Java/Tomcat); we got into the guts of a dozen applications which had come to rely on that store of every-hit-ever, and found a way to make them work with the data that was now available. (Note: HubSpot is now primarily powered by MySQL/Hadoop/HBase. Check out the HubSpot dev blog).

It took a loooong time. Much, much longer than we expected.

But it generated a ton of value for HubSpot. Very Important People were, ultimately, very happy about that project. After it wrapped up, 'Analytics 2.0', as it was known, somehow went from 'that project that was dragging on forever', to 'that major rewrite that worked out really well'.

Then, after the Analytics Rewrite wrapped up, in my role as 5 Whys Facilitator, I led the post-mortem on another ambitious rewrite which hadn't fared quite so well. I'll call it The Unhappy Rewrite.

From all that, some fairly clear lessons emerged.

First, I'm going to talk about why these projects are so tricky. Then I'll pass on some of those hard-won lessons on how to survive.

Prepare Yourself For This Project To Never Fucking End

The first, absolutely critical thing to understand about launching a major rewrite is that it's going to take insanely longer than you expect. Even when you try to discount for the usual developer optimism. Here's why:

  • Migrating the data sucks beyond all belief

I'm assuming your existing system has a bunch of valuable data locked up in it (if it doesn't, congrats, but I just never, ever run into this situation). You think, we're going to set up a new db structure (or move it all to some NoSQL store, or whatever), and we'll, I dunno, write some scripts to copy the data over, no problem.

Problem 1: there's this endless series of weird crap encoded in the data in surprising ways. E.g. "The use_conf field is 1 if we should use the auto-generated configs... but only if the spec_version field is greater than 3. Oh, and for a few months, there was this bug, and use_conf was left blank. It's almost always safe to assume it should be 1 when it's blank. Except for customers who bought the Express product, then we should treat it as 2". You have to migrate all your data over, checksum the living hell out of it, display it back to your users, and then figure out why it's not what they expect. You end up poring over commit histories, email exchanges with developers who have long since left the company, and line after line of cryptic legacy code. (In prep for writing this, when I mentioned this problem to developers, every single time they cut me off to eagerly explain some specific, awful experience they've had on this front -- it's really that bad)
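To make that concrete, here's a minimal sketch of the kind of normalization logic this forces on you. The field names (`use_conf`, `spec_version`, `product`) are hypothetical, echoing the example above; a real migration accumulates dozens of rules like this, each discovered the hard way:

```python
def normalize_use_conf(row):
    """Return the effective use_conf value for a legacy row (hypothetical rules)."""
    use_conf = row.get("use_conf")
    # Auto-generated configs only apply from spec_version 4 onward.
    if row.get("spec_version", 0) <= 3:
        return 0
    if use_conf is None or use_conf == "":
        # A months-long bug left use_conf blank. Blank almost always means 1 --
        # except for Express customers, where it should be treated as 2.
        return 2 if row.get("product") == "Express" else 1
    return use_conf
```

Every one of these rules starts as a checksum mismatch and ends as a unit test, which is why the migration code deserves real engineering (more on that below).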

Problem 2: But, wait, it gets worse: because you have a lot of data, it often takes days to migrate it all. So, as you struggle to figure out each of the above weird, persnickety issues with converting the data over, you end up waiting for days to see if your fixes work. And then to find the next issue and start over again. I have vivid, painful memories of watching my friend Stephen (a prototypical Smart Young Engineer), who was a tech lead on the Unhappy Rewrite, working, like, hour 70 of an 80 hour week, babysitting a slow-moving data export/import as it failed over and over and over again. I really can't communicate how long this takes.

  • It's brutally hard to reduce scope

With a greenfield (non-rewrite) project, there is always (always) a severe reduction in scope as you get closer to launch. You start off, expecting to do A, B, C & D, but when you launch, you do part of A. But, often, people are thrilled. (And, crucially, they forget that they had once considered all the other imagined features as absolutely necessary)

With a rewrite, that strategy fails. People are really unhappy if you tell them: hey, we rewrote your favorite part of the product, the code is a lot cleaner now, but we took away half the functionality.

You'll end up spending this awful series of months implementing all these odd edge cases that you didn't realize even existed. And backfilling support for features that you've been told no one uses any more, but you find out at the last minute some Important Person or Customer does. And, and, and...

  • There turn out to be these other systems that use "your" data

You always think: oh, yeah, there are these four screens, I see how to serve those from the new system. But then it turns out that a half-dozen cron jobs read data directly from "your" db. And there's an initialization step for new customers where something is stored in that db and read back later. And some other screen makes a side call to obtain a count of your data. Etc, etc. Basically, you try turning off the old system briefly, and a flurry of bug reports show up on your desk, for features written a long time ago, by people who have left the company, but which customers still depend on. This takes forever all over again to fix.

Okay, I'm Sufficiently Scared Now, What Should I Do?

You have to totally own the business value.

First off, before you start, you must define the business value of this rewrite. I mean, you should always understand the big picture value of what you do (see: Rands Test). But with rewrites, it's often the tech lead, or the developers in general, who are pushing for the rewrite -- and then it's absolutely critical that you understand the value. Because you're going to discover unexpected problems, and have to make compromises, and the whole thing is going to drag on forever. And if, at the end of all that, the Important People who sign your checks don't see much value, it's not going to be a happy day for you.

One thing: be very, very careful if the primary business value is some (possibly disguised) version of "The new system will be much easier for developers to work on." I'm not saying that's not a nice bit of value, but if that's your only or main value... you're going to be trying to explain to your CEO in six months why nothing seems to have gotten done in development in the last half year.

The key to fixing the "developers will cry less" thing is to identify, specifically, what the current, crappy system is holding you back from doing. E.g. are you not able to pass a security audit? Does the website routinely fall over in a way that customers notice? Is there some sexy new feature you just can't add because the system is too hard to work with? Identifying that kind of specific problem both means you're talking about something observable by the rest of the business, and also that you're in a position to make smart tradeoffs when things blow up (as they will).

As an example, for our big Analytics rewrite, the developers involved sat down with Dan Dunn, the (truly excellent) product guy on our team, and worked out a list of business-visible wins we hoped to achieve. In rough priority order, those were:

  • Cut cost of storing each hit by an order of magnitude

  • Create new reports that weren't possible in the old system

  • Serve all reports faster

  • Serve near-real-time (instead of cached daily) reports

And you should know: that first one loomed really, really large. HubSpot was growing very quickly, and storing all that hit data as individual rows in SQLServer had all sorts of extra costs. The experts on Windows ops were constantly trying to get new SQLServer clusters set up ahead of demand (which was risky and complex and ended up touching a lot of the rest of the codebase). Sales people were told to not sell to prospects with really high traffic, because if they installed our tracking code, it might knock over those key databases (and that restriction injected friction into the sales process). Etc, etc.

Solving the "no more hits in SQLServer" problem is the Hard kind for a rewrite -- you only get the value when every single trace of the old system is gone. The other ones, lower down the list, those you'd see some value as individual reports were moved over. That's a crucial distinction to understand. If at all possible, you want to make sure that you're not only solving that kind of Hard Problem -- find some wins on the way.

For the Unhappy Rewrite, the biz value wasn't perfectly clear. And, thus, as often happens in that case, everyone assumed that, in the bright, shiny world of the New System, all their own personal pet peeves would be addressed. The new system would be faster! It would scale better! The front end would be beautiful and clever and new! It would bring our customers coffee in bed and read them the paper.

As the developers involved slogged through all the unexpected issues which arose, and had to keep pushing out their release date, they gradually realized how disappointed everyone was going to be when they saw the actual results (because all the awesome, dreamed-of stuff had gotten thrown overboard to try to get the damn thing out the door). This is a crappy, crappy place to be -- stressed because people are hounding you to get something long-overdue finished, and equally stressed because you know that thing is a mess.

Okay, so how do you avoid getting trapped in this particular hell?

Worship at the Altar of Incrementalism

Over my career, I've come to place a really strong value on figuring out how to break big changes into small, safe, value-generating pieces. It's a sort of meta-design -- designing the process of gradual, safe change.

Kent Beck calls this Succession, and describes it as:

"Design changes are usually most efficiently implemented as a series of safe steps. Succession is the art of taking a single conceptual change, breaking it into safe steps, and then finding an order for those steps that optimizes safety, feedback, and efficiency."

I love that he calls it an "art" -- that feels exactly right to me. It doesn't happen by accident. You have to consciously work at it, talk out alternatives with your team, get some sort of product owner or manager involved to make sure the early value you're surfacing matters to customers. It's a creative act.

And now, let me say, in an angry Old Testament prophet voice: Beware the false incrementalism!

False incrementalism is breaking a large change up into a set of small steps, but where none of those steps generate any value on their own. E.g. you first write an entire new back end (but don't hook it up to anything), and then write an entire new front end (but don't launch it, because the back end doesn't have the legacy data yet), and then migrate all the legacy data. It's only after all of those steps are finished that you have anything of any value at all.

Fortunately, there's a very simple test to determine if you're falling prey to the False Incrementalism: if after each increment, an Important Person were to ask your team to drop the project right at that moment, would the business have seen some value? That is the gold standard.

Going back to my running example: our existing analytics system supported a few thousand customers, and served something like a half dozen key reports. We made an early decision to: a) rewrite all the existing reports before writing new ones, and b) rewrite each report completely, push it through to production, migrate any existing data for that report, and switch all our customers over. And only then move on to the next report.

Here's how that completely saved us: 3 months into a rewrite which we had estimated would take 3-5 months, we had completely converted a single report. Because we had focused on getting all the way through to production, and on migrating all the old data, we had been forced to face up to how complex the overall process was going to be. We sat down, and produced a new estimate: it would take more like 8 months to finish everything up, and get fully off SQLServer.

At this point, Dan Dunn, who is a Truly Excellent Product Guy because he is unafraid to face a hard tradeoff, said, "I'd like to shift our priorities -- I want to build the Sexy New Reports now, and not wait until we're fully off SQLServer." We said, "Even if it makes the overall rewrite take longer, and we won't get off SQLServer this year, and we'll have to build that one new cluster we were hoping to avoid having to set up?" And he said "Yes." And we said, "Okay, then."

That's the kind of choice you want to offer the rest of your larger team. An economic tradeoff where they can choose between options of what they see when. You really, really don't want to say: we don't have anything yet, we're not sure when we will, your only choices are to keep waiting, or to cancel this project and kiss your sunk costs goodbye.

Side note: Dan made 100% the right call (see: Excellent). The Sexy New Reports were a huge, runaway hit. Getting them out sooner than later made a big economic impact on the business. Which was good, because the project dragged on past the one year mark before we could finally kill off SQLServer and fully retire the old system.

For you product dev flow geeks out there, one interesting piece of value we generated early was simply a better understanding of how long the project was going to take. I believe that is what Beck means by "feedback". It's real value to the business. If we hadn't pushed a single report all the way through, we would likely have had, 3-4 months in, a whole bunch of data (for all reports) in some partially built new system, and no better understanding of the full challenge of cutting even one report over. You can see the value the feedback gave us--it let Dan make a much better economic choice. I will make my once-per-blog-post pitch that you should go read Donald Reinertsen's Principles of Product Development Flow to learn more about how reducing uncertainty generates value for a business.

For the Unhappy Rewrite, they didn't work out a careful plan for this kind of incremental delivery. Some Totally Awesome Things would happen/be possible when they finished. But they kept on not finishing, and not finishing, and then discovering more ways that the various pieces they were building didn't quite fit together. In the Post-Mortem, someone summarized it as: "We somehow turned this into a Waterfall project, without ever meaning to."

But, I Have to Cut Over All at Once, Because the Data is Always Changing

One of the reasons people bail on incrementalism is that they realize that, to make it work, there's going to be an extended period where every update to a piece of data has to go to both systems (old and new). And that's going to be a major pain in the ass to engineer. People will think (and even say out loud), "We can't do that, it'll add a month to the project to insert a dual-write layer. It will slow us down too much."

Here's what I'm going to say: always insert that dual-write layer. Always. It's a minor, generally somewhat fixed cost that buys you an incredible amount of insurance. It allows you, as we did above, to gradually switch over from one system to another. It allows you to back out at any time if you discover major problems with the way the data was migrated (which you will, over and over again). It means your migration of data can take a week, and that's not a problem, because you don't have to freeze writes to both systems during that time. And, as a bonus, it surfaces a bunch of those weird situations where "other" systems are writing directly to your old database.
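The shape of that layer is simple. Here's a minimal sketch (the store interfaces are hypothetical): every mutation goes to both systems, the old store stays the source of truth, and a failed write to the new store never breaks production -- it just gets logged for the repair job to catch up on:

```python
class DualWriteStore:
    """Write to both the old and new stores; read only from the old (trusted) one."""

    def __init__(self, old_store, new_store, on_error=None):
        self.old = old_store
        self.new = new_store
        self.on_error = on_error or (lambda exc: None)

    def write(self, key, value):
        # The old store is still the source of truth: if this fails, the write fails.
        self.old.write(key, value)
        try:
            self.new.write(key, value)
        except Exception as exc:
            # The new store must never take down production; record the miss
            # so a later bulk-migration pass can repair it.
            self.on_error(exc)

    def read(self, key):
        return self.old.read(key)
```

The cost is roughly fixed (one wrapper at the write path), which is exactly why it's such cheap insurance.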

Again, I'll quote Kent Beck, writing about how they do this at Facebook:

"We frequently migrate large amounts of data from one data store to another, to improve performance or reliability. These migrations are an example of succession, because there is no safe way to wave a wand and migrate the data in an instant. The succession we use is:

  1. Convert data fetching and mutating to a DataType, an abstraction that hides where the data is stored.

  2. Modify the DataType to begin writing the data to the new store as well as the old store.

  3. Bulk migrate existing data.

  4. Modify the DataType to read from both stores, checking that the same data is fetched and logging any differences.

  5. When the results match closely enough, return data from the new store and eliminate the old store.

You could theoretically do this faster as a single step, but it would never work. There is just too much hidden coupling in our system. Something would go wrong with one of the steps, leading to a potentially disastrous situation of lost or corrupted data."
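Steps 2 through 5 of that succession fit in one small abstraction. Here's a sketch (store interfaces hypothetical, not Facebook's actual DataType): dual writes, shadow reads that compare both stores and log mismatches, and a flag you flip once the mismatch rate drops to roughly zero:

```python
class MigratingDataType:
    """Succession sketch: dual-write, shadow-read-and-compare, then cut over."""

    def __init__(self, old_store, new_store, log_mismatch):
        self.old = old_store
        self.new = new_store
        self.log_mismatch = log_mismatch
        self.serve_from_new = False  # flip when results "match closely enough"

    def put(self, key, value):
        # Step 2: every mutation goes to both stores.
        self.old.put(key, value)
        self.new.put(key, value)

    def get(self, key):
        # Step 4: read from both, log any difference, keep trusting the old data.
        old_val = self.old.get(key)
        new_val = self.new.get(key)
        if new_val != old_val:
            self.log_mismatch(key, old_val, new_val)
        # Step 5: once mismatches dry up, serve from the new store instead.
        return new_val if self.serve_from_new else old_val
```

The mismatch log is where all those "data infelicities" surface, one safe step at a time, instead of as lost or corrupted data.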

Abandoning the Project Should Always Be on the Table

If a 3-month rewrite is economically rational, but a 13-month one is a giant loss, you'll generate a lot of value by realizing which of those two you're actually facing. Unfortunately, the longer you soldier on, the harder it is for people to avoid the Fallacy of Sunk Costs. The solution: if you have any uncertainty about how long it's going to take, sequence your work to reduce that uncertainty right away, and give people some "finished" thing that will let them walk away. One month in, you can still say: we've decided to only rewrite the front end. Or: we're just going to insert an API layer for now. Or, even: this turned out to be a bad idea, we're walking away. Six months in, with no end in sight, that's incredibly hard to do (even if it's still the right choice, economically).

Some Specific Tactics

Shrink Ray FTW

This is an excellent idea, courtesy of Kellan Elliot-McCrea, CTO of Etsy. He describes it as follows:

"We have a pattern we call shrink ray. It's a graph of how much the old system is still in place. Most of these run as cron jobs that grep the codebase for a key signature. Sometimes usage is from wire monitoring of a component. Sometimes there are leaderboards. There is always a party when it goes to zero. A big party.

Gives a good sense of progress and scope, especially as the project is rolling, and a good historical record of how long this shit takes."

I've just started using Shrink Ray on a rewrite I'm tackling right now, and I will say: it's fairly awesome. Not only does it give you the wins above, but, it also forces you to have an early discussion about what you are shrinking, and who in the business cares. If you make the right graph, Important People will be excited to see it moving down. This is crazy valuable.

Engineer The Living Hell Out Of Your Migration Scripts

It's very easy to think of the code that moves data from the old system to the new as a collection of one-off scripts. You write them quickly, don't comment them too carefully, don't write unit tests, etc. All of which are generally valid tradeoffs for code which you're only going to run once.

But, see above, you're going to run your migrations over and over to get them right. Plus, you're converting and summing up and copying over data, so you really, really want some unit tests to find any errors you can early on (because "data" is, to a first approximation, "a bunch of opaque numbers which don't mean anything to you, but which people will be super pissed off about if they're wrong"). And this thing is going to happen, where someone will accidentally hit ctrl-c, and kill your 36 hour migration at hour 34. Thus, taking the extra time to make the entire process strongly idempotent will pay off over and over (by strongly idempotent, I mean, e.g. you can restart after a failed partial run and it will pick up most of the existing work).

Basically, treat your migration code as a first class citizen. It will save you a lot of time in the long run.
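Strong idempotence mostly comes down to checkpointing. Here's a minimal sketch (`checkpoint` is any dict-like store that survives a crash -- a file, a DB table; the names are mine, not from any particular migration): a re-run after a ctrl-C at hour 34 skips everything already done instead of starting over:

```python
def run_migration(rows, migrate_row, checkpoint):
    """Migrate rows, skipping any recorded as done; returns (migrated, skipped)."""
    migrated = skipped = 0
    for row in rows:
        if checkpoint.get(row["id"]):
            skipped += 1
            continue  # already migrated on a previous run
        migrate_row(row)
        checkpoint[row["id"]] = True  # mark done only after success
        migrated += 1
    return migrated, skipped
```

For this to be safe, `migrate_row` itself has to tolerate being re-run on the one row that was in flight when the last run died -- an insert-or-update, not a blind insert.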

If Your Data Doesn't Look Weird, You're Not Looking Hard Enough

What's best is if you can get yourself to think about the problem of building confidence in your data as a real, exciting engineering challenge. Put one of your very best devs to work attacking both the old and the new data, writing tools to analyze it all, discover interesting invariants and checksums.

A good rule of thumb for migrating and checksumming data: until you've found a half-dozen bizarre inconsistencies in the old data, you're not done. For the Analytics Rewrite, we created a page on our internal wiki called "Data Infelicities". It got to be really, really long.
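The checksum tooling doesn't have to be fancy to start. Here's a toy sketch of the idea (the column names are hypothetical aggregates, in the spirit of the analytics example): total up each metric in both stores and report every disagreement -- each one is an infelicity to chase down before cutover:

```python
def compare_stores(old_rows, new_rows, keys=("visits", "contacts")):
    """Return (metric, old_total, new_total) for every metric whose totals differ."""
    infelicities = []
    for key in keys:
        old_total = sum(r.get(key, 0) for r in old_rows)
        new_total = sum(r.get(key, 0) for r in new_rows)
        if old_total != new_total:
            infelicities.append((key, old_total, new_total))
    return infelicities
```

Real versions slice the same comparison by customer, by day, by report, so a mismatch points you at the weird legacy rule that caused it rather than just telling you one exists.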

With Great Incrementalism Comes Great Power

I want to wrap up by flipping this all around -- if you learn to approach your rewrites with this kind of ferocious, incremental discipline, you can tackle incredibly hard problems without fear. Which is a tremendous capability to offer your business. You can gradually rewrite that unbelievably horky system that the whole company depends on. You can move huge chunks of data to new data stores. You can pick up messy, half-functional open source projects and gradually build new products around them.

It's a great feeling.

---

What's your take?  Care to share any lessons learned from an epic rewrite?


Measuring What Matters: How To Pick A Good Metric

By Dharmesh Shah on March 29, 2013

Ben Yoskovitz is the co-author of Lean Analytics, a new book on how to use analytics successfully in your business. Ben is currently VP Product at GoInstant, which was acquired by Salesforce in 2012. He blogs regularly at Instigator Blog and can be followed @byosko.

We all know metrics are important. They help report progress and guide our decision making. Used properly, metrics can provide key insights into our businesses that make the difference between success and failure. But as our capacity to track everything increases, and the tools to do so become easier and more prevalent, the question remains: what is a worthwhile metric to track?

Before you can really figure that out it's important to understand the basics of metrics. There are in fact good numbers and bad numbers. There are numbers that don't help and numbers that might save the day.

First, here's how we define analytics: Analytics is the measurement of movement towards your business goals.

The two key concepts are "movement" and "business goals". Analytics isn't about reporting for the sake of reporting, it's about tracking progress. And not just aimless progress, but progress towards something you're trying to accomplish. If you don't know where you're going, metrics aren't going to be particularly helpful.

With that definition in mind, here's how we define a "good metric".

A good metric is:

  • Comparative

  • Understandable

  • A ratio or rate

  • Behavior changing

A good metric is comparative. Being able to compare a metric across time periods, groups of users, or competitors helps you understand which way things are moving. For example, "Increased conversion by 10% from last week" is more meaningful than "We're at 2% conversion." Using comparative metrics speaks clearly to our definition of "movement towards business goals".

A good metric is understandable. Take the numbers you're tracking now--the ones you think are the most important--and show those to outsiders. If they don't instantly understand your business and what you're trying to do, then the numbers you're tracking are probably too complex. And internally, if people can't remember the numbers you're focused on and discuss them effectively, it becomes much harder to turn a change in the data into a change in the culture. Try fitting your key metrics on a single TV screen (and don’t cheat with a super small font either!)

A good metric is a ratio or a rate. Ratios and rates are inherently comparative. For example, if you compare a daily metric to the same metric over a month, you'll see whether you're looking at a sudden spike or a long-term trend. Ratios and rates (unlike absolute numbers) give you a more realistic "health check" for your business and as a result they're easier to act on. This speaks to our definition above about "business goals"--ratios and rates help you understand if you're heading towards those goals or away from them.

A good metric changes the way you behave. This is by far the most important criterion for a metric: what will you do differently based on changes in the number? If you don't know, it's a bad metric. This doesn't mean you don't track it--we generally suggest that you track everything but only focus on one thing at a time because you never know when a metric you're tracking becomes useful. But when looking at the key numbers you're focused on today, ask yourself if you really know what you'd do if those numbers go up, down or stay the same. If you don't, put those metrics aside and look for better ones to track right now.



Now that we've defined a "good" metric let's look at five things you should keep in mind when choosing the right metrics to track:



  • Qualitative versus quantitative metrics

  • Vanity versus actionable metrics

  • Exploratory versus reporting metrics

  • Leading versus lagging metrics

  • Correlated versus causal metrics


1. Qualitative versus Quantitative metrics

Quantitative data is easy to understand. It's the numbers we track and measure--for example, sports scores and movie ratings. As soon as something is ranked, counted, or put on a scale, it's quantified. Quantitative data is nice and scientific, and (assuming you do the math right) you can aggregate it, extrapolate it, and put it into a spreadsheet. Quantitative data doesn't lie, although it can certainly be misinterpreted. It's also not enough for starting a business. To start something, to genuinely find a problem worth solving, you need qualitative input.

Qualitative data is messy, subjective, and imprecise. It's the stuff of interviews and debates. It's hard to quantify. You can't measure qualitative data easily. If quantitative data answers "what" and "how much," qualitative data answers "why." Quantitative data abhors emotion; qualitative data marinates in it.

When you first get started with an idea, assuming you're following the core principles around Lean Startup, you'll be looking for qualitative data through problem interviews. You're speaking to people--specifically, to people you think are potential customers in the right target market. You're exploring. You're getting out of the building.

Collecting good qualitative data takes preparation. You need to ask specific questions without leading potential customers or skewing their answers. You have to avoid letting your enthusiasm and reality distortion rub off on your interview subjects. Unprepared interviews yield misleading or meaningless results. We cover how to interview people in Lean Analytics, but there have been many others that have done so as well. Ash Maurya’s book Running Lean provides a great, prescriptive approach to interviewing. I also recommend Laura Klein’s writing on the subject.

Sidebar: In writing Lean Analytics, we proposed the idea of scoring problem interviews. The basic concept is to take the qualitative data you collect during interviews and codify it enough to give you (hopefully!) new insight into the results. The goal of scoring problem interviews is to reduce your own bias and ensure a healthy dose of intellectual honesty in your efforts. Not everyone agrees with the approach, but I hope you'll take a look and try it out for yourself.

2. Vanity versus Actionable metrics

I won't spend a lot of time on vanity metrics, because I think most people reading OnStartups understand these. As mentioned above, if you have a piece of data that can't be acted upon (you don't know how movement in the metric will change your behavior) then it's a vanity metric and you should ignore it.

It is important to note that actionable metrics don't automatically hold the answers. They're not magic. They give you an indication that something fundamental and important is going on, and identify areas where you should focus, but they don't provide the answers. For example, if "percent of active users" drops, what do you do? Well, it's a good indication that something is wrong, but you'll have to dig further into your business to figure it out. Actionable metrics are often the starting point for this type of exploration and problem solving.

3. Exploratory versus Reporting metrics

Reporting metrics are straightforward--they report on what's going on in your startup. We think of these as "accounting metrics", for example, "How many widgets did we sell today?" Or, "Did the green or the red widget sell more?" Reporting metrics can be the results of experiments (and therefore actionable), but they don't necessarily lead to those "eureka!" moments that can change your business forever.

Exploratory metrics are those you go looking for. You're sifting through data looking for threads of information that are worth pursuing. You're exploring in order to generate ideas to experiment on. This fits what Steve Blank says a startup should spend its time doing: searching for a scalable, repeatable business model.

A great example of using exploratory metrics is from Mike Greenfield, co-founder of Circle of Moms. Originally, Circle of Moms was Circle of Friends (think: Google Circles inside Facebook). Circle of Friends grew very quickly in 2007-2008 to 10 million users, thanks in part to Facebook's open platform. But there was a problem--user engagement was terrible. Circle of Friends had great virality and tons of users, but not enough people were really using the product.

So Mike went digging.

And what Mike found was incredible. It turns out that moms, by every imaginable metric, were insanely engaged compared to everyone else. Their messages were longer, they invited more people, they attached more photos, and so on. So Mike and his team pivoted from Circle of Friends to Circle of Moms. They essentially abandoned millions of users to focus on a group of users that were actually engaged and getting value from their product. From the outside looking in this might have been surprising or confusing. You might find yourself at a decision point like Mike and worry about what investors will think, or other external influencers. But if you find a key insight in your data that’s incredibly compelling, you owe it to yourself to act on it, even if it looks crazy from the outside. For Mike and Circle of Moms, it was the right decision. The company grew their user base back up to 4 million users and eventually sold to Sugar Inc.

4. Leading versus Lagging metrics

Leading and lagging metrics are both useful, but they serve different purposes. Most startups start by measuring lagging metrics (or "lagging indicators") because they don't have enough data to do anything else. And that's OK. But it's important to recognize that a lagging metric reports on the past; by the time you know the number, whatever you're tracking has already happened. A great example of this is churn. Churn tells you what percentage of customers (or users) abandon your service over time. But once a customer has churned, they're not likely to come back. Measuring churn is important, and if it's too high you'll absolutely want to address the issue and try to fix your leaky bucket, but it lags behind reality.
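To make the lagging nature of churn concrete, here's the simplest period-based churn calculation. The numbers and the period-based definition are my own illustration, not from the authors; many products use more elaborate cohort-based definitions.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the customers you had at the start of a period
    who cancelled during that period."""
    return customers_lost / customers_at_start

# Hypothetical month: 1,000 customers at the start, 50 cancelled.
monthly_churn = churn_rate(1000, 50)  # 0.05, i.e. 5% monthly churn
```

Notice that every input is already in the past by the time you compute it: you can only calculate this month's churn after the customers are gone.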

A leading metric, on the other hand, tries to predict the future. It gives you an indication of what is likely to happen, so you can act on it more quickly and change outcomes going forward. For example, the volume of customer complaints is often a leading indicator of churn. If complaints are going up, you can expect that customers will abandon the service and churn will also go up. But instead of responding to something that's already happened, you can dive into the complaints immediately, figure out what's going on, resolve the issues, and hopefully minimize the future impact on churn.
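One simple way to check whether a candidate metric actually leads another is to correlate it against the target metric shifted forward in time: if complaints correlate more strongly with next month's churn than with this month's, they're a plausible leading indicator. The monthly numbers below are invented for illustration, and the Pearson correlation is hand-rolled to keep the sketch self-contained; this is a rough idea, not a method from the post.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical monthly data: complaints per 1,000 users, and churn (%).
complaints = [12, 15, 22, 30, 41, 55]
churn      = [2.0, 2.1, 2.3, 2.9, 3.6, 4.8]

# Same-month correlation...
same_month = pearson(complaints, churn)
# ...versus complaints in one month against churn in the next:
lead_one = pearson(complaints[:-1], churn[1:])
```

With real data you'd want far more than six months of history, and you'd still need an experiment (see the correlation/causation discussion below) before treating complaints as a lever rather than just a signal.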



Ultimately, you need to decide whether the thing you're tracking helps you make better decisions sooner. Remember: a real metric has to be actionable. Lagging and leading metrics can both be actionable, but leading indicators show you what will happen, reducing your cycle time and making you leaner.

5. Correlated versus Causal metrics

A correlation is a seeming relationship between two metrics that change together, but are often changing as a result of something else. Take ice cream consumption and drowning. If you plotted these over a year, you'd see that they're correlated--they both go up and down at the same time. The more ice cream that's consumed, the more people drown. But no one would suggest that we reduce ice cream consumption as a way of preventing drowning deaths. That's because the numbers are correlated, and not causal. One isn't affecting the other. The factor that affects them both is actually the time of year--when it's summer, people eat more ice cream and they also drown more.



Finding a correlation between two metrics is a good thing. Correlations can help you predict what will happen. But finding the cause of something means you can change it. Causal relationships are usually not simple one-to-one affairs--there are lots of factors at play--but even a degree of causality is valuable.

You prove causality by finding a correlation, then running experiments where you control the other variables and measure the difference. It's hard to do, but causality is really an analytics superpower--it gives you the power to hack the future.
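The experiment loop described above can be sketched as a randomized A/B test: random assignment balances the other variables across groups on average, so a persistent gap between the groups is evidence of causation rather than mere correlation. Everything below (the churn rates, the effect size) is invented purely for illustration.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def run_experiment(n_users, base_churn=0.08, treatment_effect=-0.02):
    """Toy randomized experiment: assign each user to control or
    treatment at random, simulate whether they churn, and compare
    the observed churn rates of the two groups."""
    control, treated = [], []
    for _ in range(n_users):
        if random.random() < 0.5:
            control.append(random.random() < base_churn)
        else:
            treated.append(random.random() < base_churn + treatment_effect)

    def rate(group):
        return sum(group) / len(group)

    return rate(control), rate(treated)

control_rate, treated_rate = run_experiment(100_000)
# Because assignment was random, confounders (season, user age, etc.)
# are balanced across the groups on average; a persistent gap in churn
# is evidence the treatment itself caused the change.
```

This is exactly why no amount of observing ice cream sales and drownings together can establish causation, but deliberately varying one factor while randomizing away the rest can.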

So what metrics are you tracking?

We've covered some fundamentals about analytics and picking good metrics. It's not the whole story (to learn more, see our presentations and workshops on Lean Analytics), but I'd encourage you to take a look at what you're tracking and see if the numbers you care the most about meet the criteria defined in this post. Are the metrics ratios/rates? Are they actionable? Are you looking at leading or lagging metrics? Have you identified any correlations? Could you experiment your way to discovering causality?

And remember: analytics is about measuring progress towards goals. It's not about endless reports. It's not about numbers that go constantly "up and to the right" to impress the press, investors or anyone else. Good analytics is about making better decisions, faster, and developing key insights that become cornerstones of your startup.
