Thank you to all of you who attended what was our most successful webinar yet!
In this webinar, it’s all about the numbers.
As marketers, we live in an age of increasingly complex data. Many of the tools we deal with day to day have complex maths at their core, and whether you’re doing CRO, SEO or just looking at analytics, it’s important to have a grasp of what you’re seeing.
Do you really understand the data? Is a significant result always a valid one?
To help you along, we’ll be looking at the weirdness of statistics and probability, and taking a look under the hood at the theory behind the numbers. You’ll learn what our tools do to manipulate data, how to create your own tooling for when you need something that doesn’t exist, and how to turn raw analytics into actionable business knowledge.
Let’s get started!
Host: Hello everybody. Welcome back to another SEOgadget webinar. This time we have Pete Wailes, our Operations Director, talking us through the statistics and science behind the tools we trust. So it should be a really interesting one. We’ve got about a 30-minute presentation that Pete’s going to run us through, and then we’ve got some Q&A afterwards.
It would be really good to hear from you, so please do send your questions in, either via email to firstname.lastname@example.org or by tweeting us on the hashtag #SEOgadget, and we’ll do our best to answer as many questions as we possibly can.
So without any further ado, I’m going to pass over to Pete, and we’ll crack on.
Pete: Very little ado indeed. So this is “Predictably Vague,” looking at statistics and probability, how they affect us in marketing and the differences between them.
Statistics and probability, people kind of use them interchangeably, especially in the media. Fairly often I will be looking through a piece of reporting on a new scientific discovery or whatever it might be, and people are using probability when they mean statistics or statistics when they mean probability, or they are using it right but they’re not really understanding what they’re talking about.
So I thought we’d start off with just a quick definition as to what the two of them actually are, which will set us up a little bit for the rest of the presentation.
So statistics. Statistics is when you have a data set that exists already, be it analytics or it might be social data or data from an email campaign or whatever it might be, medical trials, and you infer a cause. So it’s a correlative thing. As an example, you could look at this statement.
You’d see you’ve got a data set on where people tend to die, and you’ve got the following conclusion, which is that people tend to die more often in hospitals than when they are out shopping. You can infer one of two things from that, that either people who are in a hospital are probably ill and dying, or people who are ill should go shopping to increase their chances of survival. One of those two statements is obviously true.
Probability, on the other hand, is kind of the other way around. With statistics, we’ve got data and we infer a cause. Probability takes the cause and predicts data. So say we’ve got a coin that we toss. For any particular toss of the coin, that can land heads or tails. So if we know that, we can conclude one of two things about if we toss a coin a bunch of times. We can say that either we should see heads, tails, heads, tails, heads, tails, ad infinitum as a series, or if we toss the coin ten times, we should see probably around five instances of heads and five instances of tails.
The latter is obviously the one that’s correct here, and it raises an important point about the difference between the probability of a sequence and the probability of a discrete event. But that’s something that we’ll come on to later.
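As a rough sketch of that distinction, assuming a fair coin, the chance of one exact sequence can be compared with the chance of landing near five heads in ten tosses. The function names below are illustrative and are not from the webinar.

```python
from math import comb


def prob_exact_sequence(n_tosses: int) -> float:
    """Probability of one specific sequence (e.g. HTHTHT...) from a fair coin."""
    return 0.5 ** n_tosses


def prob_heads_between(n_tosses: int, low: int, high: int) -> float:
    """Probability that the number of heads falls in [low, high] for a fair coin."""
    favourable = sum(comb(n_tosses, k) for k in range(low, high + 1))
    return favourable / 2 ** n_tosses


alternating = prob_exact_sequence(10)        # exactly H, T, H, T, ...
exactly_five = prob_heads_between(10, 5, 5)  # exactly five heads, any order
around_five = prob_heads_between(10, 4, 6)   # four, five or six heads

print(f"exact alternating sequence: {alternating:.2%}")
print(f"exactly five heads:         {exactly_five:.2%}")
print(f"four to six heads:          {around_five:.2%}")
```

Under those assumptions the perfectly alternating sequence comes out at about 0.1%, exactly five heads at about 24.6%, and "around five" (four to six) at about 65.6%.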
So statistics can be really weird. We’re just going to go over a quick example as to how something can be a little bit counterintuitive, and we’re going to use a real world example.
So the thing with statistics is you’re never entirely certain, because what you’re doing is a correlative study and not a study where you can definitely infer a cause. The accuracy of your conclusion is going to be based on two different things, the first of which is the cohort or population density. So this is looking at the actual size of the data set that you’re dealing with and how robust that is. And the second is looking at how valid your data is.
There are all sorts of ways that you can mess up data validity. For example, if you’re dealing with testing blood pressure, if you take the blood pressure of 10 people 10 times and say that’s 100 samples, that’s very different than if you take 1 sample from 100 people. That’s an example of how you can have invalid data that looks like it’s more robust than it actually is.
So one of the ways that we can deal with statistics is using a thing called Bayesian statistics or Bayesian analysis. Now, there’s a long formula that goes into this, which is a bit hard to understand. So rather than spend time on that, we’re going to run through an example.
I’ve got a friend who’s doing cancer research, and we were talking through a little while ago the test that they use for this particular cancer, because it seems on the surface very good, except that they don’t use it in the real world, and I was curious as to why. So I had a chat with my friend, and we looked into the numbers behind it.
So as it turns out, in this particular instance it’s breast cancer, and the likelihood that any particular woman has this cancer is about 1 in 200, or half a percent. So if you’ve got 100,000 people, you can pretty much be guaranteed that about 500 of them will have this particular disease.
The probability that you get a true positive result, so if somebody has this disease, the likelihood that the test will say that they do have it, is 99%. But it also has a false positive rate. So if somebody is tested and they don’t have this particular cancer, it will say that they do have it 5% of the time.
So bear these things in mind. It’s a 1 in 200 probability that you have it. If you have it, 99% of the time the test will say that you do have it. 5% of the time, if you’re healthy, it will say that you have it.
So let’s say that you’ve got 200,000 people. We test all 200,000, and we know that about 1,000 of them have this disease. Of the 1,000 that have this cancer, when we run the test on them, for 990 it says, “You have cancer.” For ten of them it will say that they don’t have cancer, so it got them wrong. Of the 199,000 who are healthy, for about 189,000 of them it will say, “Yes, you are healthy and that’s fine.” But it gets it wrong quite a lot of the time as well, so it says about 10,000 people have it who don’t.
So we’ve got the numbers for healthy and unhealthy and the likelihood that the test works. We multiply that out by the likelihood that you actually have it, and if we take the positive values, we end up realizing that this test is 99% accurate at picking up real cases, but if it comes back with a positive result, there’s less than a 10% chance that you actually have that particular cancer: roughly 990 true positives out of about 11,000 positive results. So this is where all these numbers start to get a bit weird.
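The arithmetic behind that result can be sketched in a few lines of Python using the figures quoted in the talk (1 in 200 prevalence, 99% true positive rate, 5% false positive rate). The function and variable names are illustrative only.

```python
def positive_predictive_value(prevalence: float,
                              true_positive_rate: float,
                              false_positive_rate: float) -> float:
    """Bayes' rule: P(has disease | test is positive)."""
    true_positives = prevalence * true_positive_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures quoted in the talk.
prevalence = 1 / 200
true_positive_rate = 0.99
false_positive_rate = 0.05

ppv = positive_predictive_value(prevalence, true_positive_rate, false_positive_rate)

# The same calculation as head counts for 200,000 people.
population = 200_000
sick = round(population * prevalence)                    # 1,000 people
healthy = population - sick                              # 199,000 people
flagged_sick = round(sick * true_positive_rate)          # 990 correct positives
flagged_healthy = round(healthy * false_positive_rate)   # 9,950 false alarms

print(f"chance a positive result is real: {ppv:.1%}")
print(f"positives: {flagged_sick} real vs {flagged_healthy} false alarms")
```

With those inputs a positive result is right only about 9% of the time, because the false alarms from the large healthy group swamp the true positives from the small sick group.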
So despite the fact that this is a test that looks brilliant, it’s actually completely useless in the real world. So you might say, “Why do statistics matter?” Well, there’s a bunch of things that we do.
So when we look at marketing and we look at the testing and optimization of systems, I think to a large extent we tend to do our tests wrong. We tend to test against everybody, and that’s awkward, because if you’re running a test that’s dealing with conversion, it may be that only 10% of the people who come to our website are ever going to convert at any given time.
So if we’ve got a conversion rate of 3% and we’re trying to move that up by testing against everybody, 90% of the people that we’re putting that test to wouldn’t convert anyway. So there’s a lot of slightly extraneous data in there that’s going to make our results go wonky, and secondly, we tend to optimize based around interfaces rather than experiences. We tend to optimize the design of a site or the way a checkout is laid out, rather than the experience of actually interacting with the business, the experience of using the site as a whole.
So I’m going to give you some examples of some tests that wouldn’t be particularly worth doing, and then we’re going to go into looking at why they don’t work well and what we do to make them work better.
So examples of bad tests to run. This button is red and red is bad. Stop signs are red. Red isn’t a color that we tend to associate with things that you should do. Fire extinguishers are red and stoplights are red and all that kind of stuff. Therefore, if we change it to green, which is to do with go, then that would be a good thing. So let’s try that.
There are obviously two massive problems with that. Firstly, there’s absolutely no data behind it to say why that would be a good test to run, and a colorblind person is going to see exactly the same thing anyway whether it’s red or green. So there’s a percentage of the population for whom it’s never going to work.
Another type of test you could run, this checkout is a multistage checkout. There are many different components to it. A multistage checkout might introduce more friction, and that could be bad. So let’s test a short checkout. It seems like a valid idea. But again, there’s no data as to why this is what they should have done, and there’s a fully fleshed example of taking this to an extreme. It sounds like a really good test, but falls down.
This page has a title which might not be as good as it could be if it’s a header on a piece of copy. Let’s create different treatments and see which test works best.
But the problem is that in none of these examples was there any data actually used to formulate those tests, to understand why this is a thing that should be tested, or why altering this or influencing it might have a useful outcome. In none of them were any metrics stated or picked, whether a single metric or a series of metrics combined into one, to validate whether or not this would actually work.
So I’m going to outline what I think we should be doing when we’re doing testing. There are basically six parts to running any test scientifically, and whenever we’re doing a test in marketing, that’s basically what we’re doing. We’re doing science. So the six parts are: you form a hypothesis, which should be based on some data. You design an experiment. You collect some data. You then investigate and interrogate that data. You find a conclusion. Then you make alterations to the system that you’re looking at.
So a hypothesis, there are three things to a hypothesis to make it decent. It should be specific. You should be saying, “We’ve been looking at some data on our conversions, and we’ve noticed there’s a drop off in the funnel for conversion for checkout. It happens specifically at step two where we ask people for delivery information. We have noticed that at that point people tend to go off and look at our delivery details, and they tend not to return.”
You should also be realistic when you’re forming a hypothesis. So if you change this, is it actually likely that it’s going to do anything? So we might decide that we’re going to try testing and putting some delivery information from the delivery page into the checkout process itself, so that hopefully people don’t abandon from that point, and the people that go on further through the funnel are more qualified.
It should be probabilistic. You should be looking at the data that you’ve got and then being able to make some sort of prediction as to what sort of improvement you can expect to see. This should also be looked at from the point of view of how long do you need to run the test to have a statistically valid sample of results to know that your conclusion is solid.
That’s a different thing, and if you need to get into that kind of stuff, I would suggest Googling statistical significance calculators. There is a whole bunch of ready-made ones online. They will work that out for you.
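For a feel of what such a calculator does, here is a minimal sketch of one common textbook approximation for comparing two conversion rates. The 5% significance level, 80% power, the 3% to 4% uplift and the daily traffic figure are all assumptions chosen for illustration, and real calculators may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, target_rate: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each variant to detect the given uplift
    with a two-sided test on two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + target_rate * (1 - target_rate))
    effect = target_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def days_to_run(visitors_needed: int, daily_visitors_per_variant: int) -> int:
    """How many whole days the test must run to reach the required sample."""
    return ceil(visitors_needed / daily_visitors_per_variant)


needed = visitors_per_variant(baseline_rate=0.03, target_rate=0.04)
duration = days_to_run(needed, daily_visitors_per_variant=400)  # assumed traffic

print(f"visitors needed per variant: {needed}")
print(f"days at 400 visitors per variant per day: {duration}")
```

Under these assumptions, moving from 3% to 4% needs roughly 5,300 visitors in each variant, and smaller expected uplifts push that number up sharply.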
So now we get on to experiment design. We have decided that we’re going to test our checkout. So what do we do?
We have decided that we need to alter a single KPI. Now that might be abandonments from the checkouts. It might be time spent on page. It might be completions through that particular stage to the next step in the funnel.
The reason that you need that is so that you have a reference point around which you can analyze and report and iterate on your experiment, because that’s going to give you a single number that changes as you perform your experiments, so you then know whether or not the experiment has been successful.
We then collect our data, and as we said earlier you need to collect it until statistical significance is achieved, so that has to be done.
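One way to check significance for a conversion test is a two-proportion z-test. The sketch below assumes that approach and uses made-up visitor and conversion counts; it is not necessarily the method any particular testing tool applies.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conversions_a: int, visitors_a: int,
                           conversions_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled)
                          * (1 / visitors_a + 1 / visitors_b))
    z_score = (rate_b - rate_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


# Hypothetical counts: 3% versus 4% conversion on 5,000 visitors each.
p_value = two_proportion_p_value(150, 5_000, 200, 5_000)
print(f"p-value: {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "keep collecting data")
```

With these illustrative counts the p-value is well below 0.05, so the difference would usually be treated as significant at the 5% level.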
We then investigate the data. There’s an important thing to bring up here, which is that it’s very easy to bias the results of data towards the things that we want to see. If we’re testing for a certain thing, especially if we’re getting paid for it, we presumably want to see that our test worked.
There’s a whole bunch of psychological biases where you can read into data and read into things more than is actually there. So I put in a link here to psychology.wikia.com and a list of cognitive biases. Alternatively, you can just search for a list of cognitive biases in Google, and you’ll get a link to it.
It will take you a while to go through this. There’s quite a lot of them. I would suggest you look at it as a project to look into over the course of a few months rather than necessarily something that’s going to be gone through in five minutes.
When you’re looking at data, this is a useful thing, not just for testing, but for analytics or any piece of marketing material. It will mean that when you’re looking at things, you’re better able to understand and account for the ways that you will influence what it is that you’re looking at, but not necessarily realize that you’re doing it.
So we then draw a conclusion from our analysis. So we’ve performed our experiment. We’ve got our data. We have validated whether our hypothesis was correct or not. We’ve got an idea as to how valid our conclusion is. Bear in mind, of course, this is statistics, so we’re never going to be 100% certain. But we can say about how certain we are, and we make alterations to the system.
So if we’re seeing, again, that we’re looking at conversion rates through a checkout, we can then go back and we can amend that checkout based on the conclusions of our test. We can refine that system, and we can build something that hopefully works better and converts better.
So I’m going to bring up something else here, which is that, when it comes to statistics and probability, specifically with regards to analytics and testing, there’s something that’s more important than even this, which is to actually understand the business that we’re dealing with.
There are KPIs that might be more obvious or might be hidden behind the data at first that we can sometimes ignore. So as an example, let’s imagine that we have our checkout and we have run some tests on it and we’ve got Test A, which converted at 2.30%, Test B, 3.97% and Test C at 3.73%.
It’s fairly easy to look at this and go, “Well, of course, Test B wins,” because it’s the one that has the highest conversion rate.
The difficulty is, if we break that down by channel, so we’ve got Test A, Test B and Test C, and we look at the conversion rates on a per medium basis, we get something quite different. So we can see that, for example, for PPC, Test B converted best. But for organic, Test C worked best. For email, Test B worked best. For social, Test C. For none of them did Test A work particularly well, but we can see that it worked better for PPC relative to the other two [inaudible 16:49] for organic. So we can see that.
And then what happens if we feed in the average order value for each of those different mediums? So let’s say that PPC converts at £100 an order, whereas organic, there’s people coming through from brands and then [inaudible 17:03] and have a higher order value. Email is people being given offers and discount cards, so they’re kind of lower, more like PPC. Social is, again, it’s multiple. It’s at the higher end.
So with those, rather than looking at just the conversion rate now, we’ve actually got the amount of money that results in for each of the different tests, and because we’ve got different average order values for different mediums, despite the fact that the conversion rate was better for B, implementing B across everything would actually generate less revenue for the company than C.
Off those figures it would generate around £70,000 versus about £77,500. So we have lost a fair chunk of the revenue that we could have, despite the fact that we’ve implemented the result that has the highest conversion rate.
Well, we can get a little bit more scientific than this because we can look at the channel that’s driving the traffic. We can potentially implement a hybrid model so that if somebody comes in from PPC or email, they get a different version to follow than if they come in from organic or social. And by doing this, we can get a combined revenue that’s even higher. In fact, it adds more than £7,000 extra. So if we just implemented B, which had the highest conversion rates, we’d be seeing around £70,000 for this.
Conversely, if we had managed to have a nice hybrid solution, we’re up to almost £85,000. So we have managed to get out another £15,000 worth of revenue from exactly the same test, assuming our original treatment converted at 100%, B converted to 200%, the hybrid version 242%. So we have managed to create an extra 42% of revenue just by implementing a slightly different result than a different conversion rate would have indicated.
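The per-channel reasoning can be sketched as follows. Every number in this example (visits, conversion rates and average order values) is invented for illustration and does not reproduce the figures from the slides; only the shape of the calculation follows the talk.

```python
# Hypothetical inputs for illustration only, not the webinar's data.
VISITS = {"ppc": 10_000, "organic": 10_000, "email": 10_000, "social": 10_000}
AVERAGE_ORDER_VALUE = {"ppc": 100, "organic": 150, "email": 90, "social": 140}
CONVERSION_RATE = {
    "A": {"ppc": 0.025, "organic": 0.020, "email": 0.022, "social": 0.018},
    "B": {"ppc": 0.050, "organic": 0.030, "email": 0.050, "social": 0.028},
    "C": {"ppc": 0.035, "organic": 0.042, "email": 0.032, "social": 0.040},
}


def overall_conversion(variant: str) -> float:
    """Blended conversion rate across all channels for one variant."""
    orders = sum(VISITS[ch] * CONVERSION_RATE[variant][ch] for ch in VISITS)
    return orders / sum(VISITS.values())


def revenue(variant: str, channel: str) -> float:
    """Revenue one variant generates from one channel."""
    return (VISITS[channel] * CONVERSION_RATE[variant][channel]
            * AVERAGE_ORDER_VALUE[channel])


def total_revenue(variant: str) -> float:
    """Revenue if one variant is shown to every channel."""
    return sum(revenue(variant, ch) for ch in VISITS)


best_by_conversion = max(CONVERSION_RATE, key=overall_conversion)
best_by_revenue = max(CONVERSION_RATE, key=total_revenue)

# Hybrid: for each channel, serve whichever variant earns the most there.
hybrid_choice = {ch: max(CONVERSION_RATE, key=lambda v: revenue(v, ch))
                 for ch in VISITS}
hybrid_revenue = sum(revenue(hybrid_choice[ch], ch) for ch in VISITS)

print("highest conversion rate:", best_by_conversion)
print("highest revenue:", best_by_revenue)
print("hybrid choice per channel:", hybrid_choice)
print("hybrid revenue:", round(hybrid_revenue))
```

With these made-up inputs, B wins on conversion rate, C wins on revenue, and serving B to PPC and email with C to organic and social beats both.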
So there are some issues that we can obviously see with this, with tests and with the way we perform our tests. So three [inaudible 19:10], the first of which is not measuring enough, which comes in two ways. You either don’t measure the thing that you actually need to to confirm whether your conclusion’s valid, so like in the last instance where we looked at conversion rate, whereas, actually, if you had looked at revenue, we would get a very different story than that. Or just not testing to a point where there are statistically significant results, and that can have all sorts of interesting problems.
If you want to see an example of that in real life, look up the statistics behind the right-on-red policy of turning at traffic junctions in the U.S., how that was brought into effect in the ’70s and what has happened since. That’s a fantastic example of something that seemed to be a good idea at the time, because the test wasn’t run for long enough and the conclusions weren’t particularly valid. Those results have had some fairly serious real world consequences.
The second fairly obvious failure in analysis is just simply not analyzing enough, so not digging deep enough into the data, not trying to figure out what’s the real story behind what we’re looking at. And the last one is simply not understanding statistics. There’s a lot of people who assume that because something looks right or looks intuitively correct, that they are going to go with that, and obviously that must be the case.
But as we saw with the [inaudible 20:33] example earlier, I’ll give you another wonderful example, which is called the ants on a rubber rope problem. I suggest you Google that as a wonderful example of counterintuitive maths. It’s very easy to draw a picture produced from numbers that look obvious and realistic and sensible, but are in fact woefully wrong.
So what does this mean? Well, tools are great. Google Analytics, optimize [inaudible 21:02], all these things that we deal with on a daily basis are wonderful. But if you don’t plan the tests in the marketing campaigns that you’re creating, you could end up with all sorts of noisy data.
Finally, there are non-obvious pitfalls. There are things that, it’s quite easy not to understand that they’re going to be an issue. But if you don’t really dig into the data and understand from a planning point what it is you’re trying to see and how you’re going to run that test, how you’re going to run that campaign, what’s going to validate whether it’s successful or not, then you can potentially end up skewing the results.
So let’s look at another example.
Bingo Card Creator, created by Patrick McKenzie. Now, as a quick one, I’m not going to rag on Patrick. I have huge admiration for what he has done in the conversion space. A/Bingo, the A/B testing framework that he produced for Ruby, is fantastic. The guy is far more renowned for this than I am, but he blogged publicly about this. It’s a fantastic example of where testing can go wrong, and I admire the guy for finding this out. So let’s have a look at the example.
This is the Bingo Card Creator, which is one of Patrick’s products, and it’s the one that he’s most known for. This is what the site used to look like and he redesigned it recently to look like this. So if we just kind of flick back between them for a minute, obviously this looks a little bit tired, a little bit dated. He redesigned it. It looks much more clean, it’s got pretty iconography, it has a much more clear call to action, the navigation has moved.
Interestingly, all the navigation and text has stayed the same. There’s no copy change. This is only a design change. It’s truly made for all of the user interface, but it’s a pretty radical overhaul. There are actually 60 different changes that have gone on here.
So what was the result of this? Well, users were more satisfied. Conversion rates went up, and sales stayed constant, despite the site looking completely different and an awful lot better. There was no increase in revenue from this.
So let’s go through Patrick’s own deconstruction of this. Either a), the new site is converting more parents than the old one used to, since parents rarely have 15 children (it has to do with the free trial size), and they’re simply having a happy Bingo experience and not paying, or b), for indefatigable reasons, users simply get what they need out of the free trial and don’t convert. It’s entirely possible that any of these 60 small tweaks I had to make to the site nudged people away from hitting those limitations.
So it’s entirely possible that the reason people are having a better user experience is because the site is now less clunky, but because the site is less clunky they’re not moving over to the paid version, and because they’re not moving over to the paid version there’s less revenue, with the drop-off in revenue being made up for by the increase in the conversion rate, so everything stays the same.
So what went wrong with this as a test? Well, there are two things, the first of which is a slightly woolly hypothesis: a better design would make people convert better. And from the point of view of converting better, it worked. People did convert better. They had a better user experience. All the things that you would expect and want from the redesign occurred, bar the increase in revenue.
Undefined data. It’s a particularly interesting product in that it appears to appeal to distinct groups with very different needs, which are parents and teachers. They are both buying the same product, but they are buying for different reasons. And the problem with this is, and I don’t know how you would analyze separately for these two things, but because there’s two different user groups going on to purchase this, but they have very different motivations, it leads to very difficult data to analyze.
You can’t see what the conversion rate for the teachers cohort is against the parents cohort, so it’s entirely possible that the teachers conversion rate went up enormously but the parents conversion rate plummeted, in which case it might be worth devising a test to look at, could you have a separate product or design a user experience to appeal to parents differently to teachers. That might be something worth testing, but very difficult.
So the last thing that we’re going to look at here is the idea of optimizing everything. Sometimes the tool that you need doesn’t exist. We actually have this here all the time. There are some things that Google Analytics doesn’t do, so we built the tools to do those things. There are some tools that we would like to have in the social space that don’t exist, so we have just built them. And we had a big thought about content sharing, so we built the tool to do it.
So the hypothesis is that people seem to have patterns in terms of how they tweet and how they share URLs, and that content and hashtags seem to be motivated more by self-interest or by emotion. Based on that, could you collect some data that would be useful, analyze it, find out whether this is valid or not and then find out a useful way of fitting this into marketing campaigns?
So understanding the hypothesis, if we can store enough tweets that contain “I”, which denotes that it’s about the person themselves, or “feel”, which denotes that it might be something to do with emotion, and most people tweet about things where they are in one of the two states, we can find out how people use URLs and hashtags in what they share. We can then use that to predict the marketing impact of marketing campaigns, and we can feed that into strategic decisions at the start of campaigns.
So we designed our experiments. We monitored tweets looking at self references and emotional references to give us a data set, and we collected this data. We stored just over six and a third million tweets containing the “I” and “feel”. We broke out these tweets to monitor hashtags, nearly 600,000, and URLs, about 160,000, and we had a sample of about 3.55 million users. We held the whole thing in a relational database. We actually used [inaudible 27:25] for this so that we could do SQL-based querying and do a lot of relational querying.
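As a loose illustration of that collection step, the sketch below filters tweet text for the words “I” or “feel” and breaks out hashtags and URLs with regular expressions. The sample tweets and the patterns are assumptions made for the example; the talk describes the real pipeline only as a relational database queried with SQL.

```python
import re
from collections import Counter

# Illustrative patterns; the matching rules used in the study were not stated.
SELF_OR_EMOTION = re.compile(r"\bI\b|\bfeel\b")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

# Hypothetical sample tweets, not real data.
sample_tweets = [
    "I love this song http://example.com/video #music",
    "Traffic is terrible today",
    "feel so tired after work #sad #monday",
    "New post at http://example.com/blog",
]

# Keep only tweets that reference the author ("I") or an emotion ("feel").
matching = [text for text in sample_tweets if SELF_OR_EMOTION.search(text)]

# Break the kept tweets out into hashtag and URL counts.
hashtag_counts = Counter(tag.lower() for text in matching
                         for tag in HASHTAG.findall(text))
url_counts = Counter(url for text in matching for url in URL.findall(text))

print(f"{len(matching)} of {len(sample_tweets)} tweets kept")
print("hashtags:", dict(hashtag_counts))
print("urls:", dict(url_counts))
```

Counting in this way gives the two breakdowns the talk goes on to analyze, hashtags and URLs, from the same filtered set of tweets.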
We analyzed it using SQL queries and Excel and we found some interesting results. So on the left we have an axis for hashtags, and on the right we have an axis for URLs. So we found that URLs get shared less than hashtags do, but interestingly, the velocity of sharing the URLs is much greater than that for hashtags.
We looked at the times, this is over around about a six-day period. So starting in the morning there’s not much activity. It goes up to a peak in the early evening, drops off very sharply at night, picks up again in the morning and repeats fairly predictably.
Most hashtags, interestingly, when we started to look at the data, seem to bias around teens and early twenties as a demographic, so that one is interesting. So we’re going to have a look into what happened with that.
The next thing that we looked at is the number of tweets per user. Most users, in fact, 3.2 out of the 3.24, I think it was, out of the 3.25 million users. Think of an [inaudible 28:36] less than ten times. Now bear in mind we’re only looking for tweets matching specific things and we’re using sample data, so there’s not enough to use this and this.
It looks like most users don’t tweet very much. And when they’re tweeting they are tweeting about a fairly common set of topics. On the other hand, there are some users, and there’s actually one that was so far off to the right hand side of this that I left it off from the data sets: one account turned up 980 times over the course of the six days that we collected data for.
That was fairly obviously a bot, and pretty much everything that turns up more than around about 50 or 60 was either a bot, or it was an official account of a band or a celebrity fairly obviously being run by some form of PR agency.
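A minimal sketch of that per-user count, assuming the data is simply a list of tweet authors: the account names are invented, and the threshold of 50 comes from the rough “50 or 60” figure mentioned above rather than from any tested rule.

```python
from collections import Counter

# Rough cut-off taken from the talk; treat it as a heuristic, not a rule.
AUTOMATION_THRESHOLD = 50


def likely_automated(tweet_counts: Counter,
                     threshold: int = AUTOMATION_THRESHOLD) -> list:
    """Accounts tweeting more than `threshold` times in the sample window."""
    return sorted(user for user, count in tweet_counts.items()
                  if count > threshold)


# Hypothetical authors: one entry per collected tweet.
sample_authors = ["alice"] * 3 + ["bob"] + ["band_official"] * 980

tweet_counts = Counter(sample_authors)
low_volume_users = sum(1 for count in tweet_counts.values() if count < 10)
flagged = likely_automated(tweet_counts)

print(f"users with fewer than ten tweets: {low_volume_users}")
print("accounts worth checking by hand:", flagged)
```

A threshold like this only shortlists accounts for a manual look, which matches how the bots and agency-run accounts were identified in the talk.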
So what can we conclude from this? Well, the vast majority of users don’t tweet very much. Most people seem to be using it to talk to themselves, they’re not using hashtags most of the time; they’re not sharing content most of the time. And interestingly, high volume doesn’t necessarily mean low engagement or vice versa. Some of the highest volume accounts were some of the most engaged with. One Direction’s account, they were actually responsible for the most tweeted URL, which is a link to their music video that came out at the same time, and that created more engagement than anything else [inaudible 29:57].
So you can get accounts that put out a huge amount of content that still get interacted with heavily. Equally you can get accounts that have very low levels of interaction online, but everything that they put out gets shared by users as well. So there seems to be a lot of brand of that happening.
Then I thought we’d have a look at shares per domain, and I have not concatenated these two together. So YouTube turns up a few times under different guises, so YouTube.[inaudible 30:25] one of those short ones, the actual YouTube domain. It’s the URL shorteners that turn up as well.
But interestingly, if you look at the results for the un-shortened and unfiltered domains, Vine turned out to be the most shared site, which is interesting, because this is something that has only just started to come around, and if that growth continues on the trajectory that it is at the moment, it’s not hard to imagine that even if you combine the things like the YouTube shares or the Facebook shares together, you have [inaudible 31:00] different sites. Then presumably Vine will be, by far, the most shared domain on Twitter.
What can we conclude from this? Well, most shares seem to be self promotion. Most of the things that turned up seem to be being tweeted or retweeted versions of music videos or Instagram photos from celebrity accounts and the like, the kind of stuff that you would find in Daily Mail and similar sorts of places, and fandoms completely dominated the most shared lists. Less than 10% of the domains that turned up produced pretty much 70% of the content shared.
It’s interesting that most sharing seems to be cohort-incestual. There’s very little content that got shared that actually managed to break out from any particular group of users. So if something was started off by, for instance, a One Direction member or one of their accounts, it would be very unlikely to end up being shared by anyone outside of people who self identify with the One Direction fan group. People who share the same content seem to do so over and over again, and it never really leached out.
We can also look at the hashtags that turned up themselves and what they were. It was interesting. You get obvious ones in there like “LOL” and “FFL” and “WTF”, but you also get things like “love” and “bless” and “sad” tweets. One that really got me was, “is that weird”, and frankly I think if you’re hashtagging a tweet as “is that weird” the answer is almost certainly “yes”.
We looked at a little more data behind it and most hashtags seem to have a half-life of about an hour. Pretty much all of them were dead inside of the day unless they were connected to specific ongoing news events. For instance, “Syria” turned up over and over again, but most of them came and died.
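To see why a one-hour half-life means a hashtag is effectively dead within a day, here is a small sketch that assumes simple exponential decay. That model is an assumption made for illustration; the talk reports only the observed half-life.

```python
def remaining_fraction(hours_elapsed: float, half_life_hours: float = 1.0) -> float:
    """Fraction of peak activity left after `hours_elapsed`, assuming the
    activity halves every `half_life_hours` (simple exponential decay)."""
    return 0.5 ** (hours_elapsed / half_life_hours)


for hours in (1, 3, 6, 12, 24):
    print(f"after {hours:>2} h: {remaining_fraction(hours):.6%} of peak activity")
```

Under that assumption activity is down to an eighth of its peak after three hours and to a tiny fraction of a percent after a full day.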
Interestingly, many of the hashtags that turned up were recurring, so you get things like “Follow Friday” and their equivalents that you could absolutely found a marketing campaign around. And you get non-recurring hashtags that tend to be driven by high-powered accounts. The ones that just take off and die away are almost invariably, bar one instance in the data that we collected, driven by users that have hundreds of thousands if not millions of followers, and as a result of having lots of people following them they are able to drive that level of engagement.
So what can we do to alter how we engage with social? Well, the morning is the most important time in terms of latching on to volatile content and hashtags. If you can get on early in the day, the stuff that’s produced between around 9:00 and 11:00 seems to be what then gets shared over and over through the course of the day as it propagates through.
Certain hashtags can be fed into marketing efforts ahead of time because you know they are going to come up on a certain day. So let’s say that you wanted to tag onto the Apollo Friday thing because you know that that’s going to come up every Friday. If you miss one particular week, then you’re going to have to wait until the next week to be able to run that campaign again. And it doesn’t matter that it didn’t work the first time, because nobody saw it the first time around, so running it again doesn’t really annoy anyone. So you can just keep on going until it finally does go viral.
The result: say we have an agency of about 10 to 15 people. If you’re trying to take this and apply it to social activity, I reckon that, loosely speaking, you could probably save something like £90 to £200 worth of agency time per year by limiting the time you’re monitoring and engaging to only the times when it’s most likely to produce results.
We have obviously gained a whole lot of new insight about how people are using Twitter and what they’re sharing that we can fit into our own marketing efforts.
So that’s about it for this. It’s time for some Q&A.
Host: Yeah, thanks very much Pete. Really interesting. I guess there are two sides to this. One is looking at the stats behind the conversion rates and how you can use this information to generate more revenue, and the second aspect is how you can use it to implement your viral marketing activities, right?
So having an understanding of this, even just a basic understanding of probability or statistics, understanding what you’re seeing has far reaching effects. It’s not just, “Shall I write at all,” is it?
Pete: Yeah. This is the big thing. We work in such a data driven industry. Everybody that I know who works at this is looking at analytics, if not every day, every couple of days. And it’s interesting because it’s only getting more technical in a bunch of different ways.
We’ve got things like schema.org. Traditionally, microformats were just limited to the realms of, frankly, esoteric front-end development, but that’s now part of what SEOs have to deal with in their daily life, and it’s a technical subject.
So I think it’s interesting that the whole discipline of marketing is becoming more technical, and absolutely so as we get more data in and as that data has to be analyzed in more and more complex ways. I don’t think you need to be a statistician or to be an expert on probability, but you absolutely need to have an understanding of this stuff because you are going to be making business decisions and advising businesses based on the data that you’re seeing, and if you don’t understand, at least, what’s being talked about, then potentially you’re going to be advising businesses to make pretty catastrophic decisions.
So in the same way that I don’t think everyone needs to become an expert in front-end web dev, I don’t think every SEO should become an expert developer, but I do think you should have an understanding of development so that when you’re dealing with development teams, you’re able to at least talk in their language and in their terms so they are able to understand the changes that you want them to make to a website.
Similarly, I don’t think you need to be necessarily an expert in statistics and probability, but when you’re dealing with data science teams, and we’re now starting to see even smaller companies have data science teams, or people who are just dedicated to analytics, to be able to advise them and interact with them in a meaningful way and produce useful marketing based on that data, you absolutely need to have a solid understanding of the numbers that are going on.
Host: Yeah, absolutely. It’s pretty much having an ability to question the numbers that you’re seeing, so you’re not just going, “Hey, wow, my conversion rate has gone up. Isn’t that amazing?” You’re actually questioning whether that’s a good thing, and having enough understanding to actually think of questioning it in the first place.
Pete: Yeah, and this is the thing. It’s very easy to solve the problem that you’re obviously presented with, despite the fact that it’s not the problem that you really need to solve, so in that example, if you’re trying to optimize your conversion rates you can optimize your conversion rate and then end up producing a result that had better return on revenue than the other test result, but it wasn’t the best that it could be simply because you’re looking at the wrong metric. You’re looking at it as, the point of doing a conversion optimization is to increase conversion, rather than to increase revenue.
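As a rough illustration of that point, the sketch below compares two test variants on conversion rate and on revenue per visitor. The `Variant` class and every number in it are hypothetical, invented purely to show how the two metrics can disagree; they are not figures from the webinar.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One arm of a split test (all figures below are made up)."""
    name: str
    visitors: int
    orders: int
    revenue: float

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.visitors

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors


# Hypothetical results: B converts more visitors, but at a smaller basket size.
variants = [
    Variant("A", visitors=10_000, orders=300, revenue=27_000.0),
    Variant("B", visitors=10_000, orders=360, revenue=25_200.0),
]

best_by_conversion = max(variants, key=lambda v: v.conversion_rate)
best_by_revenue = max(variants, key=lambda v: v.revenue_per_visitor)

if __name__ == "__main__":
    for v in variants:
        print(f"{v.name}: conversion {v.conversion_rate:.2%}, "
              f"revenue per visitor {v.revenue_per_visitor:.2f}")
    print("winner on conversion rate:", best_by_conversion.name)
    print("winner on revenue per visitor:", best_by_revenue.name)
```

With these made-up numbers the variant that wins on conversion rate loses on revenue per visitor, so picking the metric before running the test decides which variant gets shipped.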
Lately I have seen people produce marketing campaigns where the end goal was to increase awareness among a particular cohort. That was what the business wanted, but they didn’t communicate that quite clearly enough to the marketing department in question, and so then they are producing a lot of roundabouts, but it produces negative sentiments among the people that they are trying to get to, and just general awareness but no real positive sense among the wider community at large. Yeah, it’s a tricky one.
Host: Excellent. So if anyone else has any other questions please do feel free to tweet them out. We will get to them. We will keep running through some here until such point, but, yeah, do please send them in. It would be good to hear what you think.
So one really good question I have seen here is, “How would you go about either training someone or convincing someone that just going about the surface numbers isn’t correct?”
A lot of people tend to see things on the surface [inaudible 39:11].
What is the first step if you’re trying to train someone or educate someone in this?
Pete: This is a question that has two answers, which is if you’re looking to train someone specifically to become a data analyst or to work in an analytics department, I’d kind of say don’t bother if they don’t have a background in stats. If you’re hiring someone for that role, really, you want to be hiring someone who has a background in mathematics, because all of this stuff gets pretty hard pretty quickly when you start getting into the formulae behind it, and that’s before we go into more of these terabits.
So if they have not got some sort of background in mathematics, I would say you have probably hired the wrong person. That said, if you’re not trying to train someone for that, but to be able to understand numbers rather than to be doing the stats raw, if it’s just somebody in marketing and they need to know enough about it to be able to be producing the right answers and interpreting data correctly, I think probably the best thing to do is to start off looking at some real world examples as to how understanding numbers can produce some really good, useful benefits.
Host: Well, the case studies you have in here of where, on the surface, things look great, but when you dig behind it, actually, the revenue didn’t go up.
Pete: Yeah. It’s that kind of stuff. And equally showing them examples of how, if you misunderstand this, you can cause a lot of damage. There’s a use for being able to use both the carrot and the stick in this, right? You can show the example I talked briefly about earlier about the statistics for right on red in America. That decision has been responsible for tens of thousands of deaths and serious injuries since the ’70s when it was introduced, all because of how they tested, at the time, whether or not it would be a problem.
They looked at 20 intersections, didn’t get enough data, said that the numbers look like there’s no statistically significant increase, and then when they finally did the study much later, discovered that actually in some places the increase went up to 100% on the number of fatal and serious accidents. So there has been a huge amount of real harm caused just because someone couldn’t be bothered to find out a little bit about statistics and how to run a test properly.
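To see why a small sample can hide even a large effect, here is a minimal sketch of a standard sample-size calculation for comparing two proportions (normal approximation, two-sided test). The accident rates used are hypothetical placeholders, not figures from the right-on-red studies, and `sample_size_per_group` is a name made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate observations needed per group to detect p1 vs p2.

    Uses the usual normal-approximation formula for a two-sided
    two-proportion test at significance `alpha` with the given power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical: a rare event whose rate doubles from 1% to 2%.
    needed = sample_size_per_group(0.01, 0.02)
    print("observations needed per group:", needed)
```

Under these assumed rates, detecting a doubling of a rare event takes thousands of observations per group, so a study with only a handful of sites can easily report "no statistically significant increase" while a real increase is there.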
Host: So perfect example, slightly horrible, but . . .
Pete: It’s a horrible example! This is the problem. I mean, when people get statistics wrong, the sad fact is that the result of that tends to be either that people die or businesses go bust, because pretty much the only domains that we tend to use it in are places where it affects lives in some sort of fairly serious way, either in business or in government policy decisions.
Host: This leads to a really good other question, and that’s, really, I think you can look at any one piece of data or any one number, and if you look at it from the right angle, it adds up to the answer you’re looking for.
Pete: Lie [inaudible 42:25] in statistics.
Host: Yeah, exactly. So when you’re looking at CRO, surely the trick is in combining some data or stats. If you were to pick three what would you combine?
Pete: Three pieces of data?
Host: Yeah, to get a bigger, more accurate picture.
Pete: It’s really tricky. I would rather it be a really easy answer where I can say, “This is the thing that you look at. Go and do that and then you’re golden.” But to be honest, it kind of depends on what the test is for.
To go back to the example earlier where you’re testing the funnel of checkouts. It’s pretty clear that what you’re trying to optimize there is the amount of revenue that’s generated. It’s an obvious money point in your process. But there’s also much more soft touch points, right? So let’s say that you’re optimizing the experience of people who have either never engaged with your brand before or have only just recently come to be aware of the brand when they first come to the website.
Most people who come to that page fall into that category. Optimizing the experience for them, you’re not optimizing around something that’s revenue based. You’re probably optimizing around information delivery, retention, so the KPIs for those sorts of things are wireless communications, sign ups to emails, sign ups to social, further engagement with the website, repeat visit, that kind of stuff.
It’s not the kind of thing you’re going to be able to measure using any sort of revenue KPI, especially if you have a sales cycle or a lead time that’s six or nine months long. If you’re trying to optimize around revenue for users who aren’t going to buy, if they first engage with your company at the start, they’re not going to be buying until September, October maybe. Optimizing around a metric that’s revenue based is really not going to work, so it’s really understanding the micro conversions along that path and being able to measure those goals on a granular level and optimizing the interactions that drive those things.
So looking at that as an example, you might want to try testing the placement of social links, or the placement of email sign up form, or testing a pop up for people who have never been on the site before, or you might want to test alterations to copy to try and alter people’s navigation behavior, or alter the navigation to make a certain section of the site more prominent than another if you happen to find that that section tends to entice people to come back.
For instance, something like a blog, where you get regular data content. So I would say try and understand what it is that you can measure that’s going to lead to the outcome that you want, and understand that you’re not just dealing with, not every user is the same. You’re dealing with different users in different states with different needs, so optimize around one specific cohort of users, fixing one specific thing for them, and then test what that does.
Host: Of course. So really, actually, understand how the business works or how the website actually works and go from there.
Pete: Yeah. I mean, you can turn this back into a question, which is, you know, if you were to take three metrics in these conversion states that you’re trying to alter for people who are just about to purchase, you might want to look at revenue and time taken to get to the checkout and time spent in the checkout process. Those would be three good metrics. If you could increase the basket size, decrease the time taken to get to the conversion points, and then decrease the time spent in that basket, in that checkout process, you’re probably going to result in increased revenue.
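As a small sketch of tracking those three checkout metrics together, the code below averages them over a list of session records. The field names and the sample sessions are hypothetical; a real analytics export will be shaped differently.

```python
from statistics import mean

# Hypothetical session records; field names are assumptions for this sketch.
sessions = [
    {"revenue": 40.0, "secs_to_checkout": 300, "secs_in_checkout": 120},
    {"revenue": 0.0, "secs_to_checkout": 480, "secs_in_checkout": 200},
    {"revenue": 65.0, "secs_to_checkout": 240, "secs_in_checkout": 90},
]


def checkout_metrics(records):
    """Average revenue, time to reach checkout, and time spent in checkout."""
    return {
        "revenue_per_session": mean(r["revenue"] for r in records),
        "avg_secs_to_checkout": mean(r["secs_to_checkout"] for r in records),
        "avg_secs_in_checkout": mean(r["secs_in_checkout"] for r in records),
    }


if __name__ == "__main__":
    for metric, value in checkout_metrics(sessions).items():
        print(f"{metric}: {value:.1f}")
```

Computing the same three numbers for each test variant makes it easier to check that a change which shortens the checkout has not also cut revenue.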
But if you’re dealing with something like a desire to get people to connect with you socially, then you’re probably looking at things like exits to your Facebook page or exits to your Twitter page, or increased propensity to tweet at your account, which are very different metrics and measured using completely different tools, and tools that might [inaudible 46:33]. So you go and look at different metrics for different things.
You probably can get it down to two or three for any particular action, but what those two or three are is going to change depending on what that action is.
Host: Cool. We probably have time for one more point. We do have one final point to make. So one thing, what would it be?
Pete: What’s the key takeaway? I think the key point with this is being able to actually understand what it is when people talk about these things, right? Whether it’s Bayesian statistics or [inaudible 47:18] tests or whatever it might be, the theory of what’s going on is not that hard to understand. The mathematics behind it, if you want to get into all of this, gets really hard, and you can get that even at the basic level, right? Things like not passing corrections for small data sets, or how you analyze data sets where you have gappy data. That gets pretty technical pretty quickly, and if you’re not from a mathematical background, it can get pretty intimidating.
But it’s not that hard to learn the language and to start to understand if somebody says, “You’ve done a Bayesian analysis,” to understand what that actually means, what things they have looked at and to be able to understand whether they actually knew what they were talking about.
Again, I came across a blog post pretty recently on a major agency website where someone was talking about using Bayesian analysis, and they were sadly using it in a completely wrong context. I spent 15 minutes reading this thing over and over again to try and understand. I was convinced that I just missed the point; that they were talking about something and I just missed the key element of it, and if I could just wrap my head around it then it would be fine.
It wasn’t. They had obviously read a Wikipedia article or done something on Coursera or whatever, not really understood what it was that they had looked at and then started applying it to numbers like there was no tomorrow and producing all sorts of woolly data as a result.
So don’t try and get the maths. Or if you’re going to, spend enough time that you really do get it.
There’s this other example. What does being wrong feel like?
Host: Well, sometimes when you’re wrong you don’t know it.
Pete: Yeah. Well, this is the problem, right? Being wrong feels like being right, and that’s the problem. It’s very easy with maths, because if you feed numbers in, you’re going to get something out. It’s very easy to think that you understand stuff when you don’t, and still be producing data, and not get that it’s not right.
So if you’re going to get into this, do it properly, take your time, accept that it’s not going to be quick and learn it properly.
Host: But start with the principles.
Pete: But start with the principles. Just learn the language. Learn what these things are actually doing. Learn that if you’re dealing with Bayesian, there should be a data set that you know about and a data set where there is something missing, and you’re making an inferred statement about the bit that’s missing.
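One common way to make that idea concrete is a Beta-Binomial model for a conversion rate, sketched below. Here the "data set you know about" is the observed visitors and conversions, and the "bit that's missing" is the true underlying conversion rate. The uniform Beta(1, 1) prior, the function names, and the visitor counts are all assumptions chosen for illustration, not anything taken from the talk.

```python
import random


def posterior(conversions: int, visitors: int,
              prior_a: float = 1.0, prior_b: float = 1.0):
    """Beta posterior parameters for a conversion rate.

    Starts from a Beta(prior_a, prior_b) prior (uniform by default)
    and updates it with the observed conversions and non-conversions.
    """
    return prior_a + conversions, prior_b + (visitors - conversions)


def prob_b_beats_a(a_post, b_post, draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under the two posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*b_post) > rng.betavariate(*a_post)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical test results: 30/1000 conversions for A, 45/1000 for B.
    a_post = posterior(30, 1000)
    b_post = posterior(45, 1000)
    print("posterior for A:", a_post)
    print("posterior for B:", b_post)
    print("estimated P(B beats A):", prob_b_beats_a(a_post, b_post))
```

The output is a statement about the unknown rate, such as how likely it is that B's true rate is higher than A's, and that inferred statement is what separates this from simply comparing the two observed percentages.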
Host: At least have an understanding that this is out there.
Pete: Yeah. Know that this stuff exists.
Host: Yeah. Cool. Excellent.
Thank you so much, Pete. That was really, really cool. Thank you to everyone who joined in. We will be sticking the slides and the recording up on the blog within the next couple of days, so do check back. If you do have any other questions just drop a comment on the blog and we will hopefully see you next time. Thank you.
Image credit: Bedtime Champ