OnStartups

We Like Humans, But We LOVE Code (Machine Learning Case Study)

Posted by Dharmesh Shah on April 16, 2015

The following is a guest post by Jonah Lopin (@jonahlopin), founder at Crayon.  Jonah is a HubSpot alumnus, and I'm an angel investor in the company.  

Here at Crayon, we love humans. 

In fact, all our moms and many of our best friends are humans! 

But when it comes to solving problems, we prefer code.

Humans are awesome at certain things, like closing enterprise sales deals, delivering rousing speeches, and comforting small children. For these tasks and lots of others, humans are far superior to code.

But in a business context, humans have some serious drawbacks:

1. They’re hard to recruit (Crayon is hiring, by the way)

2. They’re expensive (compared to servers)

3. They sometimes get sick and can’t come to work (☺)

For a product-driven software company, there’s something else to consider. It’s subtle, but important.

Software is better than humans at providing an elegant solution to complex problems at scale.

Tomasz from Redpoint put together some fascinating data back in 2012 showing that billion-dollar public SaaS companies have revenue per employee of around $200k per year. Jeremiah Owyang puts revenue per employee at Google & Facebook at around $1M, about 5x higher! Clearly, Facebook & Google solve more problems with code than a typical B2B software company does.

The fact that Google & Facebook solve their primary problems with code rather than people doesn’t just impact their metrics and margins, it shapes the elegance and efficiency of the solutions they provide. The subtle advantage of solving problems with code rather than humans is that it tends to make your core product better over time.

Crayon is small, but we dream big. We solve problems with code, not humans, because we want to serve millions of users as elegantly as possible. It’s just down-right hard to do that with too many humans in the mix, especially if you want to scale quickly. (And especially if you’re not Uber.)

The rest of this post is about a big problem we had and how we solved it with code.

The Problem: Some Web Pages Don’t Look So Hot

Crayon is a visual inspiration platform. We help marketers get great ideas so they can do better marketing.

We’re programmatically adding about half a million pages to our system every day. As I write this article, we’ve got about 13 million designs in the system, and we’ll have close to 100 million by the end of the year.

We let users vote designs up and down, so most categories on Crayon like Startup Home Pages, B2B Pricing Pages and Landing Pages have always looked pretty good.

But we still had a problem in deeper categories: folks would be happily browsing, only to have the smooth inspirational vibe unceremoniously broken by a crappy looking page. How could we get the “best” designs to the top of the result set and the “least inspiring” designs to the bottom?

We Challenged Ourselves to Solve it With Code

Can code separate “inspiring” marketing designs from “crappy” ones?

Many said it couldn’t be done… but if they were right, would I be writing this article?

Here’s How We Solved It

Step 1: Pick Training Sets

We picked a set of 200 “inspiring” marketing designs and a set of 200 “uninteresting” marketing designs.

Yes, this is a bit of a fuzzy thing to do because it’s based on human judgment. But after running the services team at HubSpot for 5.5 years, and working closely with more than 8,000 professional marketers, I felt qualified to judge which marketing designs were likely to be interesting and instructive to other marketers. So sue me!

Step 2: Make Some Guesses About Why The Good Ones Are Good

This part is like picking a basketball team without being able to watch the candidates play basketball. What characteristics of the players would you look at? For instance, you might pick taller players or players wearing high-tops.

In the style of Jeff Foxworthy’s comedy routine You Might Be a Redneck:

If your HTML is all based on tables… you might not have a great marketing design

If you've got lots of inline styles… you might not have a great marketing design

If you don't use media queries in your CSS… you might not have a great marketing design

You get the idea.
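To make that concrete, here's a rough sketch of what checks like these could look like in Python. BeautifulSoup and the specific thresholds are illustrative assumptions, not necessarily our exact production code:

    # Illustrative heuristics; the thresholds are guesses, not tuned values.
    from bs4 import BeautifulSoup

    def uses_table_layout(html):
        # Lots of <table> tags usually means old-school table-based layout.
        soup = BeautifulSoup(html, "html.parser")
        return len(soup.find_all("table")) > 3

    def heavy_inline_styles(html):
        # Many elements carrying style="..." attributes is a bad sign.
        soup = BeautifulSoup(html, "html.parser")
        return len(soup.find_all(style=True)) > 20

    def lacks_media_queries(css_text):
        # No @media rules suggests the design isn't responsive.
        return "@media" not in css_text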

Step 3: Test Your Guesses

We wrote some code to test each “guess” from step 2 against each design in the training sets from step 1. We were hoping to find things that were “true” for the “inspiring” pages and “untrue” for the “uninteresting” pages.

We looked at 45 discrete “guesses”, and found 25 of them were predictive of “inspiring” marketing designs. Success!
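The harness for this is simple in spirit. Here's a sketch; the 20-point gap cutoff and the function names are assumptions for illustration:

    # Run each "guess" (a boolean predicate on a page) against both
    # training sets and keep the ones with a meaningful gap.
    def hit_rate(predicate, pages):
        return sum(1 for page in pages if predicate(page)) / len(pages)

    def find_predictive_guesses(guesses, inspiring, uninteresting, min_gap=0.2):
        # guesses: dict mapping a name to a predicate function
        # inspiring / uninteresting: the two 200-page training sets
        predictive = {}
        for name, guess in guesses.items():
            gap = hit_rate(guess, inspiring) - hit_rate(guess, uninteresting)
            if abs(gap) >= min_gap:
                predictive[name] = gap  # the sign says which set it favors
        return predictive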

Some of the best factors were things like:

  • Setting the viewport meta tag
  • Including Facebook Open Graph tags
  • Using the Bootstrap framework
  • Specifying Apple touch icons

Note that these factors don’t directly predict which pages are “inspiring”. Rather, these factors indicate that someone clueful created the page, which is directly predictive of a page being “inspiring”.
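Detecting factors like these is straightforward. Here's a sketch, again with BeautifulSoup as an assumed tool rather than our actual implementation:

    # Illustrative detectors for the four factors above.
    from bs4 import BeautifulSoup

    def winning_factors(html):
        soup = BeautifulSoup(html, "html.parser")
        return {
            "viewport_meta": soup.find("meta", attrs={"name": "viewport"}) is not None,
            "open_graph": soup.find("meta", property="og:title") is not None,
            "bootstrap_css": any("bootstrap" in (link.get("href") or "").lower()
                                 for link in soup.find_all("link", rel="stylesheet")),
            "apple_touch_icon": soup.find("link", rel="apple-touch-icon") is not None,
        }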

Some of the things we thought might work, but weren’t predictive at all include:

  • Whether a page uses jQuery doesn’t matter… it’s just too ubiquitous
  • The total amount of text on the page doesn’t matter. It turns out there are just as many crappy short pages as there are long pages

The final step was some mathematical mojo, based on how strong each signal proved to be in step 3, to come up with an overall “inspiringness score” for each page.
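We won't spell out the mojo here, but one standard shape for it is a naive-Bayes-style log-odds sum. Here's a sketch under that assumption (the real weighting may differ):

    # One plausible form for the scoring step: add up log-odds for each
    # signal, weighted by the hit rates measured in step 3.
    import math

    def inspiringness_score(page, signals):
        # signals: dict mapping a predicate to (p_inspiring, p_uninteresting),
        # the per-set hit rates (smoothed away from 0 and 1).
        score = 0.0
        for predicate, (p_insp, p_unin) in signals.items():
            if predicate(page):
                score += math.log(p_insp / p_unin)
            else:
                score += math.log((1 - p_insp) / (1 - p_unin))
        return score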

The Results

Pages like the chatterbox.co homepage did very well:

[image: chatterbox.co homepage]

While our friends at biomarkerstrategies.com didn’t fare as well:

[image: biomarkerstrategies.com homepage]

Overall, here’s what we got:

[chart: humans vs. code results]

How awesome is that?

The top 10% of Crayon search results just went from 50% “inspiring” to a whopping 93%!

We have 13 million designs in the system today, and reviewing those manually would take a 10-person team about 5 years. Ouch! But we’ll have 100 million designs by the end of the year, and that’s just the beginning. If we didn’t solve this problem with code, we’d be in serious trouble. And no humans were involved… except for the one human writing this article.

Your thoughts?

Does your business have a tough problem you plan to solve with code rather than humans?

Have you used machine learning to deliver elegant solutions to customers at scale?

Please continue the conversation in the comments.

Go solve something with code!


An Insider's Look At HubSpot Sidekick's Growth Approach

Posted by Dan Wolchonok on March 18, 2015

The Sidekick growth team is a small, data-driven, and aggressive group within HubSpot that works on new, emerging products with massive audiences and a freemium business model (similar to Dropbox and Evernote).  We are constantly pushing ourselves to learn new growth strategies, tactics, and techniques. I have personally become more data-driven and model-driven since joining the team, and I wanted to walk through an example of one decision that became much easier with the use of our generic problem-solving framework.

I am a big believer in the idea that complicated problems look simple when you are able to break them down.  Don’t take my word for it - this quote is attributed to Einstein:

If he had one hour to save the world he would spend fifty-five minutes defining the problem and only five minutes finding the solution

The Sidekick growth team follows a very straightforward process that strives to take complicated choices and analyze them to produce areas of opportunity:

Step 1: Choose a goal

Step 2: Build a model

Step 3: Analyze the inputs

Step 4: Identify opportunities

These steps are generic enough that they can be applied to many kinds of problems.  Whether you’re on a sales, marketing, product, support, services, or any other type of team, this framework is incredibly valuable.

Step 1: Choose a goal

Choosing the right metric / goal is very challenging and critical to our success.  If our team optimizes for the wrong metric, it doesn’t matter how well we execute because our efforts won’t translate into success.

Our identified goal:

We ultimately chose to define our goal as increasing the number of people active on a weekly basis.  Rather than counting just any activity, a person has to take one of six key actions that demonstrate they are getting value from our product.  Brian Balfour talks about the cycle of meaningless growth here, but the key takeaway for our product is that more people using Sidekick helps us grow faster.

Some of the attributes we used when picking our goal:

  • It is a holistic representation of our product. We thought about all of the ways that someone uses Sidekick, and thought about the best way to represent them.

  • It’s authentic (hard to fake).  If you optimize for a hollow metric that is easy to attain, but doesn’t translate into success later on, you might fool yourself into thinking you’re making progress. If we were to pick signups, we might crush that goal and get a lot of users to sign up, but they might not stick around.

  • It represents real value.  If you solve for your own needs instead of the customer’s needs, you may be successful in achieving your goal but it won’t translate into true success down the road.  We tried to pick a goal that represented users getting value out of the product, which results in people upgrading to the paid version of Sidekick.

Step 2: Build a model

With the goal established, we then set out to build a model to understand what will have the biggest impact on weekly active users (WAUs).  If we simply tried to increase our top level goal on its own, we wouldn’t have an understanding of where to start.  In order to understand where to focus our efforts, the model breaks down the goal into manageable pieces.  The whole point of building the model is to understand what the inputs are, and what the biggest contributor to our goal is.

Our Excel model breaks down our goal (WAUs) into the individual components that drive it on a weekly basis.  It’s a simple equation:

WAU = (New people) + (People from previous weeks who continue to use Sidekick)

We broke down each of these two buckets into their individual components (a code sketch of the resulting model follows the list below).

WAU =

  • New People:
    • Channels:
      • People we acquire through paid acquisition
      • People we acquire through content marketing
      • People we acquire through SEO
      • People we acquire virally from existing Sidekick users (invites, etc.)
      • Activation Rate
        • Not everyone who signs up ends up using the product.  Therefore, we measure the people who install our software and get it up and running correctly through an activation rate. Rather than look at the activation rate across all channels, it’s important to understand how each one is different, and if there are isolated pockets ripe for improvement. For example: users who read our content are more likely to set it up than someone who clicked through on an advertisement before signing up.
  • People from previous weeks who continue to use Sidekick
    • We look at the number of people who sign up each week, and then look to see how many of them are active each week since they signed up.
    • We look at retention, which is critical in freemium businesses.  In order to accomplish our goal of having millions of users, we have to retain the users we acquire.
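In code rather than Excel, a minimal sketch of this kind of model might look like the following. All the numbers are made up, and the flat retention tail is a simplification:

    # Toy WAU model: each week a new cohort arrives via four channels,
    # activates at a per-channel rate, then decays along a retention curve.
    signups_per_week = {"paid": 400, "content": 600, "seo": 300, "viral": 200}
    activation_rate = {"paid": 0.30, "content": 0.55, "seo": 0.45, "viral": 0.50}
    weekly_retention = [1.0, 0.55, 0.50, 0.45]  # share active k weeks after signup

    def forecast_wau(weeks):
        cohort = sum(n * activation_rate[ch] for ch, n in signups_per_week.items())
        wau = []
        for week in range(weeks):
            active = 0.0
            for age in range(week + 1):  # every cohort signed up so far
                r = weekly_retention[min(age, len(weekly_retention) - 1)]
                active += cohort * r
            wau.append(active)
        return wau

    print(forecast_wau(12)[-1])  # WAU after 12 weeks under these assumptions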

This is a screenshot of our model:

The numbers in this screenshot have been changed so they are not reflective of our true numbers.

Step 3: Analyze the inputs

The model above is extremely valuable because it allows us to use our week-over-week growth to forecast the long-term impact of any change. It’s incredibly hard to understand how multiple factors could interact over a long period of time. Someone might reasonably predict the implications of any one change, but without the model it is easy to be short-sighted.

With our model built, it was easy for us to test the sensitivity of the inputs.  For example, if we were to increase the number of users we acquire from our paid acquisition budget, how would that impact our WAUs in a year? Instead, if we focused on retention and user acquisition rates stayed the same, would we have more users a year later?  What about if we improved the conversion rate for a different area of the funnel?  

Rather than sporadically tackling new campaigns or projects, the goal is to understand the most impactful focus area for the business.
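Reusing the toy model sketched above, sensitivity testing amounts to nudging one input at a time and comparing the one-year forecast:

    # Sensitivity check using forecast_wau from the previous sketch.
    baseline = forecast_wau(52)[-1]

    signups_per_week["paid"] *= 1.2          # what if paid acquisition grows 20%?
    more_paid = forecast_wau(52)[-1]
    signups_per_week["paid"] /= 1.2          # restore

    weekly_retention[1] = 0.65               # what if week-1 retention improves?
    better_retention = forecast_wau(52)[-1]
    weekly_retention[1] = 0.55               # restore

    print(baseline, more_paid, better_retention)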

In looking at the Sidekick funnel, we found that two of our biggest drivers were retention and viral growth.  We modeled how changes to each of them would impact our goal, and decided to focus on retention first before looking at increasing the number of new people through viral channels.  At the end of the day, these are some of the factors we always consider:

  • What is the current state of the metric?
  • How much do we think we can improve this metric?  What’s the ceiling on any improvement?
  • What are the resources required to have a meaningful impact?  How long would it take?

Factoring in answers to those questions, and including the estimates in our model, we decided to focus on improving our retention in Q4 2014.  There was a lot of analysis that went into picking retention; it was the result of repeating Steps 1 through 3 multiple times.  By going through the process of evaluating different levers in the model, it becomes much easier to weigh different options against one another and impartially judge alternatives.

For the Sidekick team, it wasn’t as simple as saying that we wanted to improve retention.  Just like WAUs, retention itself has many inputs that we had to evaluate.

  • Of the people who stop using Sidekick, we lose the majority in their first couple of weeks

  • In the hypothetical example below, we have sample numbers of how we retain users over time:

    • 45% of a cohort stops using Sidekick one week after signing up

    • 5% of a cohort stops using Sidekick two weeks after signing up

Cohort Size | Signup Week | Active 1 Week Later | Active 2 Weeks Later | Active 3 Weeks Later
100 | 11/3 | 55 | 50 | 45
110 | 11/10 | 61 | 55 | 50
120 | 11/17 | 66 | 60 | 54

The numbers in this table have been changed from their real values for this post.

  • Given the size of our user base, we determined that week 1 retention was our biggest issue and opportunity.  If our existing user base was larger, our long term retention might have been a more important issue.  The lesson is that your biggest areas of opportunity depend on your current context.
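From a cohort table like the one above, the retention curve falls out of a few divisions. Here's a quick sketch using the sample numbers:

    # Retention rates from the sample cohort table: actives divided by
    # cohort size, per week since signup.
    cohorts = {"11/3": (100, [55, 50, 45]),
               "11/10": (110, [61, 55, 50]),
               "11/17": (120, [66, 60, 54])}

    for signup_week, (size, actives) in cohorts.items():
        rates = [round(a / size, 2) for a in actives]
        print(signup_week, rates)  # every cohort: ~55%, 50%, 45%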

Once we isolated the fact that people stopped using it after their first week, we set out to understand why someone who installed Sidekick would stop using it after they signed up.

Step 4: Identify Opportunities

At this point of the process, we know what’s most important to our goal and the implications of an improvement.  The next step is to start identifying how we can make that improvement.  Depending on the lever, a mix of elements is helpful in breaking down the opportunity.  We used quantitative analysis to identify a problem segment, qualitative analysis to flesh out its symptoms, and our understanding of our product to come up with ideas to address the issue.

To identify a problem area, we did a quantitative analysis of the people that only used Sidekick for a single week.  We segmented these users to look for patterns, such as:

  • Where were these users coming from?  Was there an issue for a single channel of users?

  • What technology were these users using?  Was it an issue with Gmail, Microsoft Outlook, or Apple Mail?

  • What part of the application were they using the most?

  • How much do they use Sidekick?  How many days did they use it?  How much did they use it their first day?

In asking these questions, we found that Gmail users were more likely to stop using the product when compared with other email clients. This was a complete shock to us.  We had figured that Gmail would retain fairly well, and that an issue would be likely to exist in one of our other email clients.  We found that a large number of these people were only active the day that they signed up.  To understand their usage on their first day, we created a histogram that showed how many tracked emails this population of users sent their first day.  

The numbers in this chart have been changed from their real values for this post
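The chart itself isn't reproduced here, but a histogram like it can be built from an email-tracking event log. Here's a sketch assuming a pandas DataFrame called events with user_id and sent_at columns (the schema is an assumption for illustration):

    # First-day usage: how many tracked emails each user sent on the day
    # of their first activity.
    import pandas as pd

    def first_day_counts(events):
        events = events.assign(day=pd.to_datetime(events["sent_at"]).dt.normalize())
        first_day = events.groupby("user_id")["day"].transform("min")
        return events[events["day"] == first_day].groupby("user_id").size()

    # first_day_counts(events).value_counts().sort_index() is the
    # histogram: how many users tracked 1, 2, 3, ... emails on day one.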

For the Sidekick team, it wasn’t surprising that people who only tracked a single email their first day didn’t come back.  The surprising element was that such a large percentage of these people were only tracking one email.  We wondered why someone would go through the Sidekick onboarding process only to never use it again.  Wouldn’t you at least test it out with a couple of friends or coworkers?

To understand why these people stopped using Sidekick, we sent out a simple email to a thousand users.  I emailed them individually by BCCing them from my HubSpot account, asking for feedback on a specific question designed to bring insight to the pattern we discovered.   We bucketed the replies to our email, and found that there were big opportunities to improve our week 1 retention.

The numbers in this chart have been modified from their real values for this post.

I was personally ecstatic when I saw these distributions.  It wasn’t that a competitor was better, or that there was a mismatch between the features people were looking for and what our product offered.  The issue was a psychological one:

We weren’t doing a very good job explaining what our product did, and how people could get value from using it.

Rather than having to build a lot of new features, we needed to experiment with explaining the value of the product.  It’s much easier to test out different ways of describing the product than to address weird edge cases or build entirely new features.

With our quantitative analysis done and having received qualitative feedback from the segment of users we were most interested in, we spent time brainstorming ideas to address the opportunity.  We looked at how competitors accomplish the same task, how companies in other industries educate their new users, and researched why our most passionate users like Sidekick.  I’ve included a list of sample experiments we’ve tested:

  • Only show the Sidekick web application once we have value to demonstrate

  • Show a video of someone using Sidekick and how they get value out of it

  • Ask users whether they intended to use Sidekick for personal or business use cases, then decide whether we should try to change their mind or give them examples that align with their mindset

  • Show a narrative of how someone uses Sidekick over a period of time

  • Incorporate our onboarding into the Gmail interface rather than in our web app

Conclusion

Looking at the opportunity we have focused on for Q4 2014, it seems kind of simple and obvious.  Setting an appropriate goal, understanding the inputs to that goal, and finding the biggest contributor led us down a path that clearly defined our next steps.  While finding a solution isn’t guaranteed, the team is confident that if we are successful it’ll have a big impact on our trajectory for 2015.

This framework isn’t perfect and isn’t for everyone - for instance, it breaks down if you are creating a new product or process and have a small sample size.  However, for the Sidekick team, this process has been an enormous help in prioritizing where to focus energy and resources and in getting the team aligned behind a common goal.

This framework is incredibly valuable to the Sidekick team for multiple reasons:

  • It breaks down large, complicated problems into actionable and manageable tasks.

  • We have confidence that the opportunities we are working on will have a big impact.

  • We understand the relative importance of different initiatives and are able to make conscious decisions about areas to pursue and the resulting trade offs.  It’s also easier to decide what we shouldn’t be working on, even if it may feel important.

  • Our team can see the direct impact on individual metrics, and understand how any improvements translate into the success of our team.  Teams like being able to track their progress and see how their efforts translate into success.

  • It’s a repeatable and scalable process.

  • The insights aren’t isolated to technology solutions - they can be as simple as messaging and the way instructions are displayed.

Great!  How do I apply this?

Your time is extremely valuable. Whenever you can, make data-informed choices.  You don’t have to be a slave to your model, but you should be making conscious decisions about what you’re focusing on and why it’s most important. Hold your colleagues accountable - ask them why an initiative is important, and what the impact will be.

The mantra of our team is that we want to be the best at getting better.  If you’re interested in learning about growth and seeing your contributions all the way to a business’ bottom line, our team is hiring.  You can see a list of the open positions here.

Thanks to these wonderful readers for their feedback: Brian Balfour, Anum Hussain, Maggie Georgieva, Jeremy Crane, and Andy Cook.
