When Is A/B Testing Your Website a Bad Idea?
A/B testing has been a valuable tool for marketers and researchers seeking to improve digital properties for well over a decade. Today, publishers are leveraging new testing technologies to conduct A/B tests on everything from site layouts to brand imagery. I’m willing to bet that more A/B tests are being conducted on websites across the globe now than ever before.
Unfortunately, A/B testing is often treated as the gold standard for decision-making in situations where it is actually a fairly blunt tool. How can data-driven decisions be anything but good, Tyler? That’s a great question. The answer is that you can never go wrong by following the data; however, it’s easy to be misled by what we intuitively think is the right decision.
Below, I’ll cover some of the common ways that A/B testing is used improperly and outline a model that can be applied more effectively.
When testing works
When performed correctly, any test with a sufficient amount of data should allow you to make objective decisions that ultimately move your website towards your end goal, whether that is improved UX, increased revenue, more form fills, or something else. The problem that many publishers are discovering about A/B testing in particular is that it is impossible to scale.
In the past five years, user behavior has undergone an unprecedented shift. Due to the increased use of mobile devices and improvements in connectivity, visitor behavior is no longer as predictable as it once was. While there are many variables to measure, data scientists have learned that the following visitor conditions have the largest impact on how a user responds to different website elements…
- internet connection type/speed
- geo-location
- device type
- time of day
- referral source
One of the things our research team has learned from looking at these data points for thousands of publishers is that these variables often influence and change a user’s preferences on the same site. In one scenario a visitor may prefer combination D; in another, that same visitor may prefer combination A.
What this means is that on Monday at 7 am, a user coming from Facebook on a mobile phone connected to wifi may prefer a hamburger-style menu with no ads above the fold. That exact same user may return Friday at 8 pm on a tablet device from a Google search and prefer a different style of menu and a totally different ad treatment.
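To make that concrete, here is a minimal sketch of how those contextual signals might be captured for each visit. The field names and values are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class VisitContext:
    """Contextual signals that shape how a visitor responds to page elements."""
    connection: str    # e.g. "wifi", "4g"
    geo: str           # e.g. "US"
    device: str        # e.g. "mobile", "tablet"
    hour_of_day: int   # 0-23
    referrer: str      # e.g. "facebook", "google"

# The same person showing up in two very different contexts:
monday_visit = VisitContext("wifi", "US", "mobile", 7, "facebook")
friday_visit = VisitContext("4g", "US", "tablet", 20, "google")
```

The same person arrives as two very different contexts, which is exactly why a single site-wide winner can miss the mark.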
This is the role of context in testing. User intent and user behavior have much more to do with individual preferences than we may have realized. Gathering sufficient data is vitally important, but contextualizing it in an actionable way may be even more important.
The flaw in A/B website testing
This exposes the fundamental flaw in A/B website testing. If an A/B test reveals that 64% of visitors prefer A and only 36% of visitors prefer B, then A is the clear winner, right?
What happens to all of the visitors that prefer B? What if the test was comparing a hamburger menu vs. a horizontal pop-out menu? We already saw from the example above that context could lead the exact same user to cast “votes” for both A and B.
This is where the traditional application of A/B testing starts getting slow and a bit murky. You could run an experiment, pick the majority winner, and apply it in the hope that your goal metrics will improve. The downside of this approach is that you’ll never know how the B-side visitors will respond to the change, nor will you be able to account for changes in user behavior or context.
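To make the 64/36 framing concrete, here is a minimal sketch with made-up numbers (not real test data) showing how a variant can win in aggregate while losing within a segment such as mobile:

```python
# Illustrative numbers only (not real data): aggregate vs. per-segment results.
results = {
    # segment: {variant: (visitors, conversions)}
    "desktop": {"A": (1000, 90), "B": (1000, 60)},
    "mobile":  {"A": (1000, 30), "B": (1000, 50)},
}

def rate(visitors, conversions):
    return conversions / visitors

# Aggregate view (what a classic A/B test reports): A "wins" overall
for variant in ("A", "B"):
    visitors = sum(results[seg][variant][0] for seg in results)
    conversions = sum(results[seg][variant][1] for seg in results)
    print(f"overall {variant}: {rate(visitors, conversions):.1%}")

# Per-segment view (what contextual testing reveals): B wins on mobile
for seg, variants in results.items():
    winner = max(variants, key=lambda v: rate(*variants[v]))
    print(f"{seg}: best variant is {winner}")
```

If A ships site-wide based on the aggregate number, every mobile visitor ends up on the variant that performs worse for them.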
The application of contextual testing
As you can tell from the above, what the data seems to advocate for is contextual testing and delivery. Visitors should be measured by profiles, and the same criteria listed above should be used to bucket users into groups to see how each group responds to different variable changes.
While there should always be a control group in this type of segmentation, testing should always be split so you can see how all visitor types respond to variable changes.
The implementation should apply the same type of methodology. Visitors that prefer A should get A, and visitors that prefer B should get B. There isn’t enough data on individual users to do this on a per-visitor basis; however, you can easily bucket visitors into segments using the following categories (a rough sketch follows the list)…
- geo-location
- UTM source
- time of day
- device type
- avg. user pageload speed
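As an illustration of that bucketing step, here is a minimal sketch that groups visits by these criteria and tallies a goal metric per segment and variant. The field names, thresholds, and sample records are assumptions for the sake of the example:

```python
from collections import defaultdict

# Hypothetical visit records; the field names are assumptions for illustration.
visits = [
    {"geo": "US", "utm_source": "facebook", "hour": 7, "device": "mobile",
     "pageload_ms": 1800, "variant": "A", "converted": True},
    {"geo": "US", "utm_source": "google", "hour": 20, "device": "tablet",
     "pageload_ms": 900, "variant": "B", "converted": False},
    # more records would come from your analytics pipeline
]

def segment_key(visit):
    """Bucket a visit by geo, UTM source, time of day, device type, and load speed."""
    daypart = "morning" if visit["hour"] < 12 else "evening"
    speed = "fast" if visit["pageload_ms"] < 1500 else "slow"
    return (visit["geo"], visit["utm_source"], daypart, visit["device"], speed)

# Tally visitors and conversions per (segment, variant) pair
tally = defaultdict(lambda: {"visitors": 0, "conversions": 0})
for visit in visits:
    bucket = tally[(segment_key(visit), visit["variant"])]
    bucket["visitors"] += 1
    bucket["conversions"] += visit["converted"]

for (segment, variant), stats in tally.items():
    print(segment, variant, f"{stats['conversions'] / stats['visitors']:.1%}")
```

Each (segment, variant) pair then gets its own result instead of one blended site-wide number.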
To do this, a webmaster would need to do the following (a sketch of the delivery step follows the list)…
- proxy the control page
- pick page variants that you would like to partition traffic to
- measure the results on a per segment basis
- deliver the winning variant via the proxied page, but only to the segments in which that variant was a clear winner
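Here is a minimal sketch of that last delivery step, assuming the per-segment winners from the measurement phase are already stored in a lookup table. The segment keys, variant names, and function names are illustrative, not any particular product’s API:

```python
# Illustrative delivery logic: serve each segment the variant that won for it,
# and fall back to the control page when no clear winner was found.
segment_winners = {
    ("mobile", "facebook", "morning"): "variant_b",
    ("desktop", "google", "evening"): "variant_a",
    # segments with no clear winner are simply absent
}

def choose_page(device, referrer, daypart, control="control"):
    """Return which page the proxy should serve for this visitor's segment."""
    return segment_winners.get((device, referrer, daypart), control)

print(choose_page("mobile", "facebook", "morning"))    # -> variant_b
print(choose_page("tablet", "newsletter", "evening"))  # -> control
```

Segments without a clear winner simply keep receiving the control page, so the change never degrades the experience for visitors whose preference is unknown.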
There you have it. That’s a better way to do A/B testing. It’s still not as good as true-to-form multivariate testing, but it is a much more effective and modern way of approaching how a website should be tested.
Website elements worth testing
The next question that follows from this revelation is: what should actually be tested? This was something I dug into a bit when discussing ways that websites could reduce their bounce rates. Our data science team has identified several basic elements that most clearly affect digital revenue and user experience.
The website elements worth testing are the ones that have the largest impact on user experience metrics (bounce rate, time on site, and pageviews per visit). These metrics have statistically demonstrated impacts on both revenue and other popular conversion metrics. These basic web elements include (a hypothetical example follows the list)…
- menu location
- background color
- text size and color
- other navigational elements
- ad placements
- layouts
- image locations
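If it helps to see these elements expressed as concrete test variants, here is a small, hypothetical configuration sketch. Every value is a placeholder assumption, not a recommendation:

```python
# Hypothetical variant definitions covering the elements listed above.
variants = {
    "control": {
        "menu": {"style": "horizontal", "location": "top"},
        "background_color": "#ffffff",
        "text": {"size_px": 16, "color": "#222222"},
        "ads": {"above_the_fold": True, "sidebar": True},
        "layout": "two_column",
        "hero_image": "above_headline",
    },
    "variant_a": {
        "menu": {"style": "hamburger", "location": "top"},
        "background_color": "#fafafa",
        "text": {"size_px": 18, "color": "#111111"},
        "ads": {"above_the_fold": False, "sidebar": True},
        "layout": "single_column",
        "hero_image": "below_headline",
    },
}
```

Each variant bundles the elements above so that a segment can be served one coherent combination rather than isolated tweaks.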
These are the elements that, in our experience, have had the largest impact on those core user experience metrics over time. Keep in mind that they should be tested across all devices.
How testing affects revenue
This is something we’ve dug into in great depth before, but testing can clearly improve experience metrics. Arguably, this is the best use of your time when testing, as UX metrics correlate directly with just about any end testing goal, including most forms of digital revenue.
By optimizing your website on a per-visitor basis, you can continually extend the length of a user session, driving up revenue and providing the user with a better experience on your site.
This is where I ultimately believe artificial intelligence will be able to help digital publishers, now and in the future. Collecting, measuring, and acting on all of this data on a per-visitor basis is very time-consuming, and arguably impossible to do comprehensively by hand. Offloading this process to a machine would allow us to make much more sophisticated decisions with all of this existing information.
Above, you can see a short clip of my good friend, Dr. Greg Starek, explaining where this may ultimately take us.
Always follow the data
Ultimately, data and proper testing will always lead us in the right direction. The hard part is conducting sufficient tests and contextualizing the data in a way that keeps us from being misled.
Questions, thoughts, rebuttals? Keep the conversation going below.