PPC: Methods for Evaluating Tests

My monthly column at Search Engine Land, in case you missed it.

Measuring test results can be surprisingly difficult. One reason for this is order latency. The orders generated by today's clicks don't all come in today but instead trail in over time, which makes analyzing new launches and tests tricky. Two ways to address this complication are described below.

PROBLEM: Successful tests can look bad initially because of order latency

For example, let's say the order latency for a particular advertiser with a 14-day cookie window looks like this: 56% of the orders come in within 24 hours of the click, 10% come during the next 24-hour period, etc. So, on that first day you see 100% of the clicks, but not nearly all of the orders those clicks will drive (actually, quite a bit less than 56%, as clicks late in the day have less time to "mature"). For the sake of simplicity, let's ignore that bit.

Doing so allows us to map out what a tremendously successful test might look like. Let's say an advertiser launches a new product category and new keyword ads are developed. Let's say the advertiser's efficiency target is a 25% cost-to-sales ratio, and let's say their brilliant PPC firm nails the bids right out of the gate. The clicks generated on day 1 cost $1,000 and will eventually drive $4,000 in sales, but here's what the results look like as they unfold, spending $1,000 every day at perfect efficiency:

[Table: cost and observed sales by day]

Yielding a day-to-day apparent cost-to-sales ratio that looks like this:

[Chart: apparent cost-to-sales ratio by day]

For the first few days of the test, it appears that the cost-to-sales ratio is way above the advertiser's comfort threshold. It takes the full duration of the cookie window for the Observed Efficiency to match the Actual Efficiency of the advertising. Advertisers who don't recognize this effect may cancel tests or pull back on the rudder too quickly.
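To make the effect concrete, here is a rough Python sketch of the scenario described above. The column gives only the first two days of the latency curve (56% and 10%); spreading the remaining 34% evenly across days 3 through 14 is purely my placeholder assumption, so the exact numbers it prints are illustrative only. The names `latency` and `observed_ratio` are made up for this example.

```python
# Assumed latency curve for a 14-day cookie window. Only the first two values
# (56% and 10%) come from the example; the even spread of the remaining 34%
# over days 3-14 is a placeholder assumption, not advertiser data.
WINDOW_DAYS = 14
latency = [0.56, 0.10] + [0.34 / 12] * 12

DAILY_COST = 1_000.0      # spend per day, from the example
EVENTUAL_SALES = 4_000.0  # sales each day's clicks eventually drive


def observed_ratio(day: int) -> float:
    """Apparent cost-to-sales ratio on a given test day (1-indexed).

    Sales booked on that day come from that day's clicks plus the trailing
    orders from every earlier test day still inside the cookie window.
    """
    matured_share = sum(latency[:min(day, WINDOW_DAYS)])
    return DAILY_COST / (EVENTUAL_SALES * matured_share)


ratios = [observed_ratio(day) for day in range(1, WINDOW_DAYS + 1)]
for day, ratio in enumerate(ratios, start=1):
    print(f"day {day:2d}: apparent cost/sales = {ratio:.1%}")
```

Under these assumptions the apparent ratio starts well above the 25% target and only settles at 25% once a full cookie window of click days is contributing orders.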
Indeed, what this suggests is that every launch and every extra bid push will appear less helpful to the top line and more harmful to the bottom line than it really is. On the flip side, every pull back on bids will appear more helpful to the bottom line and less harmful to the top line than it really is, because of the lagging orders from the higher click volumes that preceded the test.

The greater the order latency, the bigger the impact. We typically find that more considered purchases and higher-AOV advertisers have greater latency than average, which affects the proper length for the cookie window.

However, no one wants to wait 14 or 30 or 90 days to read the results of a test. In the example above, the PPC agency hit the bids right on the head from day 1. When that doesn't happen, it's good to find out sooner rather than later that you're undershooting or overshooting.

Two Methods for Evaluating Tests:
  1. Shorten the Sales Window. Instead of evaluating the test based on the full cookie window, study the data based on a same-session or one-hour sales interval. In the example above, if 35% of the eventual orders normally come within the first hour of the click, extrapolate the results from the first few days based on that number. If the ratio of cost to (observed 1-hr sales / 0.35) is on target, the test is probably on target. If an advertiser is attempting to learn the top-line vs. bottom-line trade-off of bidding to a 30% A/S target rather than a 25% A/S target, compare the % increase in 1-hour sales to the % increase in cost. That should be a pretty good proxy for the A/S ratio on the incremental sales.
  2. Tie Orders to the Time of the Click. Most reports show the PPC costs for the day, and the PPC sales taken that day. It's entirely likely that half of the sales taken that day came from earlier clicks. By running reports tying the sales to the time of the click, rather than the time of the order, you get a much clearer picture of what your actions on that day did for you over the long haul. This is particularly useful for studying past tests and anticipatory bidding at the holidays to see whether you anticipated the improvement in traffic quality appropriately.
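The arithmetic behind method 1 can be sketched in a few lines of Python. The 35% one-hour share is the figure from the example; the function names and the dollar amounts in the usage lines are hypothetical, chosen only to show the calculation.

```python
def projected_cost_to_sales(cost: float, one_hour_sales: float,
                            one_hour_share: float = 0.35) -> float:
    """Cost divided by sales extrapolated from the one-hour window.

    one_hour_share is the fraction of eventual orders that historically
    arrive within an hour of the click (0.35 in the example above).
    """
    if not 0 < one_hour_share <= 1:
        raise ValueError("one_hour_share must be in (0, 1]")
    if one_hour_sales <= 0:
        raise ValueError("need some one-hour sales to extrapolate from")
    projected_sales = one_hour_sales / one_hour_share
    return cost / projected_sales


def incremental_cost_to_sales(base_cost: float, base_one_hour_sales: float,
                              test_cost: float, test_one_hour_sales: float,
                              one_hour_share: float = 0.35) -> float:
    """Cost-to-sales ratio on only the extra spend from a bid push."""
    extra_cost = test_cost - base_cost
    extra_one_hour_sales = test_one_hour_sales - base_one_hour_sales
    return projected_cost_to_sales(extra_cost, extra_one_hour_sales,
                                   one_hour_share)


# Hypothetical numbers: $3,000 spent over the first three days of a test,
# $4,200 of sales observed within an hour of the click.
print(f"{projected_cost_to_sales(3_000.0, 4_200.0):.1%}")

# Hypothetical bid push: daily cost rises from $1,000 to $1,300 while
# one-hour sales rise from $1,400 to $1,645.
print(f"{incremental_cost_to_sales(1_000.0, 1_400.0, 1_300.0, 1_645.0):.1%}")
```

The second call is the one to watch in a bid test: the blended ratio can still look fine while the ratio on the incremental dollars is far worse.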
The problem with the first method is that it assumes the latency for the new product category, or incremental traffic, will be the same as it's been in the past. Not a bad guess, but potentially misleading. The problem with the second method is that you can't use it fully until the cookie windows have elapsed. By using method 1 during the early phases of the test and method 2 after the test is "complete", a good analyst can avoid missing opportunities and overspending during the test, and get a dead-eye accurate read on the results after the fact.
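For method 2, a report tied to the time of the click might look something like the Python sketch below. The record layout (click date, order date, revenue), the function names, and the sample figures are all assumptions made for illustration; a real report would come out of whatever tracking system records both timestamps. The `mature` flag marks click days whose 14-day cookie window has fully elapsed, since only those rows are final.

```python
from collections import defaultdict
from datetime import date, timedelta

COOKIE_WINDOW = timedelta(days=14)  # window length from the example


def sales_by_click_date(orders):
    """Sum revenue by the date of the click that drove each order.

    orders: iterable of (click_date, order_date, revenue) tuples.
    Orders arriving after the cookie window are ignored.
    """
    totals = defaultdict(float)
    for click_date, order_date, revenue in orders:
        if order_date - click_date <= COOKIE_WINDOW:
            totals[click_date] += revenue
    return dict(totals)


def click_day_report(costs, orders, as_of):
    """One row per click date: cost, attributed sales, ratio, maturity."""
    sales = sales_by_click_date(orders)
    rows = []
    for click_date in sorted(costs):
        cost = costs[click_date]
        revenue = sales.get(click_date, 0.0)
        rows.append({
            "click_date": click_date,
            "cost": cost,
            "sales": revenue,
            "cost_to_sales": cost / revenue if revenue else None,
            # Only final once the whole cookie window has elapsed.
            "mature": as_of - click_date >= COOKIE_WINDOW,
        })
    return rows


# Hypothetical sample data.
costs = {date(2024, 1, 1): 1_000.0, date(2024, 1, 2): 1_000.0}
orders = [
    (date(2024, 1, 1), date(2024, 1, 1), 2_000.0),
    (date(2024, 1, 1), date(2024, 1, 3), 2_000.0),  # order trails the click
    (date(2024, 1, 2), date(2024, 1, 2), 1_500.0),
]
report = click_day_report(costs, orders, as_of=date(2024, 1, 15))
for row in report:
    print(row)
```

Rows that are not yet mature are best read alongside the method 1 extrapolation rather than taken at face value.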