Friday 16 September 2011

Failed experiments

Everyone likes to talk about the successes of their research techniques and methodologies, but we tend to brush the failures under the carpet and don't like to talk about them. This is understandable; nobody likes to be associated with failure. But this silence can result in other people wasting a lot of time and effort going down the same road and making the same mistakes. I am afraid I witnessed a case of this at a conference I recently attended, where someone was talking about conducting video-based online interviews, something we experimented with a few years ago without much success, and I could see they were heading in the same direction.

I feel a bit guilty now that we did not publish the findings from these failed experiments at the time, so I thought I'd use this blog to make up for it a little by laying a few of our failures on the table, as a gesture towards encouraging more open collaboration on failures as well as successes in the market research industry...


The failed video experiments 

Three years ago we invested a lot of thought and effort in experimenting with the idea of video surveys, where we got actors to deliver the questions in the hope of making the surveys more engaging. It was a failure on several fronts. Firstly, the research company we worked with kept changing the questions (not an uncommon issue; in fact it's the norm), so we had to re-record the actors speaking the script three times, at tremendous cost. Then, integrating these videos made the survey really slow and cumbersome to load, restricting it to people with decent broadband connections, perhaps less of a problem now in many markets. The third factor, which we did not even think about when starting out, was the realisation that up to a third of people were doing our surveys in public situations like offices, where it would be annoying for other people to hear the survey questions being spoken aloud.

All things combined, we experienced a drop-out rate of more than 30% from these surveys, which rather undermined the whole point of the exercise: improving the volume of feedback.


That is not to say video did not work as an engagement technique; it did, for those prepared to watch it. But we found that a silent animation technique could stimulate a similar quality of response and was far more cost-effective.

Difficulties of implementing virtual shopping

Another area of online research we have had real problems with is virtual shopping. Virtual shopping is a very popular idea, and a virtual shopping module looks great in anyone's portfolio of online survey solutions; you always see them being shown off and demonstrated at conference events. But to be candid, we have found it almost impossible to properly emulate a shopping experience online.

Here are the problems. Firstly, looking at a group of products on a web page is nothing like the experience of standing in a shop looking at products on the shelves. In a shop, the shelf at eye level gets the most attention and the products on the top and bottom shelves are looked at far less often; on a web page the eye naturally scans from top left to bottom right (in Western markets), so there is no way you can effectively model the same experience.

The second issue is one of pure statistics. Most real shopping experiences involve around 20 or so competing products to choose between, but do the maths: with 20 products there is only a 1 in 20 chance that a test product will be selected, and to get a statistically significant measure of whether one product will sell more than another you need at least 50 selections, ideally 100, which means sample cells of 1,000 to 2,000 per design variant. So if you were testing, say, 5 designs you might need to interview 10,000 people, which is simply not economic.
Naive to this fact, when we first started creating virtual shops we sometimes designed them with 50 to 100 products and samples of a few hundred, resulting in only 2 or 3 purchase instances of any one product, which made it almost impossible to make any real sense of the data.
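To make that arithmetic concrete, here is a minimal sketch of the back-of-the-envelope calculation. It is purely illustrative, and the 50 to 100 selection threshold is our rule of thumb rather than a formal power calculation:

```python
# Rough sample-size arithmetic for a virtual shelf test (illustrative only).
# Rule of thumb from the post: you want 50-100 selections of the test product
# before you can say much about whether one design outsells another.

def respondents_needed(n_products, selections_wanted):
    """Respondents per design variant, assuming every product on the shelf
    is equally likely to be picked (a 1-in-n_products chance), so on average
    you need n_products respondents for each selection of the test product."""
    return selections_wanted * n_products

n_products = 20        # a typical real-world shelf
n_variants = 5         # number of designs being tested

for target in (50, 100):
    per_cell = respondents_needed(n_products, target)
    print(f"{target} selections -> {per_cell:,} respondents per variant, "
          f"{per_cell * n_variants:,} in total across {n_variants} designs")

# 50 selections -> 1,000 respondents per variant, 5,000 in total across 5 designs
# 100 selections -> 2,000 respondents per variant, 10,000 in total across 5 designs
```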


The third factor is cost. When we started out we would spend days designing wonderfully accurate renditions of a supermarket shelf, with 3D depth effects and shadowing, and we even went to the length of experimenting with full 3D environments, which took 2 or 3 weeks to create with a price tag of £10k to £20k per project just to build the shelves. The simple fact, though, is that the average budget to design-test a product is less than £10k including sample, and there are often significant time constraints, so these more elaborate techniques were simply not economic.

The fourth factor is screen resolution. If you cram 20 or so items onto a page it becomes almost impossible to get a real sense of what each product looks like or to read the labels and details, and this factor alone is enough to make many comparisons almost meaningless. When you are in a shop looking at products, it's amazing how much detail you can see and pick up on without even picking up the items; that is simply missing when you are looking at fuzzy pixels on a screen.

The solution we have reverted to for design testing is a really quite simple shelf with between 3 and 8 competing products on display, and we have developed a virtual shopping module that lets us create these shelves dynamically, without any elaborate design work, cutting down dramatically on cost and creation time.
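Our module itself is proprietary, but to illustrate what "creating a shelf dynamically" can mean in practice, a hypothetical sketch might be as simple as generating a small layout structure from a list of product images (all names below are made up for illustration):

```python
# Hypothetical sketch, not our actual module: a "dynamic" shelf is just a small
# data structure built from product image references, with the layout generated
# automatically rather than hand-crafted as a bespoke 3D render per project.
import random

def build_shelf(competitor_images, test_image, max_facings=8):
    """Return a simple shelf layout: the test design plus a handful of
    competitor products in a randomised order to avoid position bias."""
    competitors = random.sample(competitor_images,
                                k=min(len(competitor_images), max_facings - 1))
    facings = competitors + [test_image]
    random.shuffle(facings)
    return [{"position": i, "image": img} for i, img in enumerate(facings)]

shelf = build_shelf(
    competitor_images=["competitor_a.png", "competitor_b.png", "competitor_c.png",
                       "competitor_d.png", "competitor_e.png"],
    test_image="test_design.png",
)
for facing in shelf:
    print(facing)
```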


Yet still, to this day, clients come to us unaware of these issues, asking for a virtual shopping project to test 10 designs on a shelf of 100 products, and we have to explain the problems to them.

The failed ad evaluation time experiments

We have done a lot of work looking at consideration time when it comes to making online survey decisions, and it is clear that there is an underlying relationship between consideration time and uncertainty, though it is complicated by a wide range of factors. Another thing we were aware of from reading psychology books was that people spend longer looking at more appealing images. So we thought: what about using this as a test of advertising effectiveness? Surely if respondents spend longer looking at one ad than another, it is likely to be more effective?


Well, we conducted a whole series of experiments to see if we could measure this, and they were an abject failure. It's not that the basic premise is untrue; the problem is that there are so many other, more significant factors influencing how long people spend looking at an ad that, in research terms, viewing time seemed to be almost a random number. Confusion was an issue, and the amount of visual clutter, the clarity of the message, layout factors, colours, the style of visual content and so on all had measurable effects. Often, for some of the most effective ads respondents were merely glancing, while for some of the worst they were spending ages looking, perhaps in the same way you stare at a train wreck. So we gave up on these experiments with our tails between our legs.

Failed question formats

Now, I am rather sensitive and somewhat defensive about this, as we are in the game of developing more creative questioning techniques and have developed quite a number over the last few years, but it has to be said that some of them have crashed and burnt along the way.

Multiple drag and drop questions

Dragging and dropping is a particularly problematic format, especially if you want people to drag multiple items: we have found that people get bored of doing it very rapidly, which restricts some of the most creative applications of the technique. Drag and drop is a brilliant solution for a single-choice selection process, because it can produce measurable reductions in straightlining and improved data granularity. But if, say, you had 3 brands and a range of attributes you asked respondents to drag and drop onto those brands, you would be much better off doing it with conventional click selections. As we have found to our cost, there is simply a limit to how many things people can be bothered to drag and drop, 2 or 3 at most, before they feel they have done their job.

The flying words question


We had this idea that we could make tick selection more fun if we made the options fly across the screen and asked people to click on them as they went past. It took one experiment to realise it was not going to be a very usable question format. A combination of bemusement among respondents seeing the options flying past and the realisation that not everyone is 100% focused 100% of the time resulted in about half the number of clicks being registered compared to a traditional question.

Opinion Snowboarding 


This is a question format where respondents snowboard down a hill and pass through gates along the way to indicate their choices. It looked fantastic, and I was very excited when we first developed it. Most respondents thought it was a fun way of answering the questions. The problem was that in the world of survey design "most" is not enough: around 15% of people found it totally annoying, and what we got out of the back of it was really quite chaotic data from people who could not make up their minds in time. We tried slowing it down to give people more time to think, but that just started to annoy another group of people who became frustrated with the time it was taking.

We have not given up on this question format yet, though. We have been working on a simplified version using a single pole that respondents ski to the left or right of, which seems to perform much better, and we feel it may have an interesting use for implicit-association-style research where you force people to make quick decisions. But as a swap-out variant for a conventional grid question, forget it!





1 comment:

  1. Nice post.

    On the snowboarding example, could you not have the snowboarder start slow but speed up once the selection has been made (via either a click or a unique key assigned to each option)?
