## Saturday, 26 April 2014

### How many people need to answer that question?

When we set up a survey we tend to think about how many people should answer the survey as a whole. I would challenge you to think a bit beyond this and consider how many people should answer each question in your survey, as there are potentially significant savings in survey length to be had by optimizing the sample for each question.

A typical survey might be sent out to around 400 respondents. In many respects this is an arbitrary figure: the number of people who need to answer any one question in a survey to get a statistically significant answer might range from as few as 5 to several thousand.

The number of people who need to answer each question depends on the level of variance in the answers respondents give. Say I am testing 2 ads to find out which is better, and the first 5 people I interview all prefer ad A over ad B: there is better than 95% certainty that ad A is going to be preferred by everyone, so job done. If, on the other hand, I ask respondents to evaluate a brand on 15 different brand metrics using a 5-point Likert rating scale and the scores range from 3.0 to 3.5 (which is not an uncommon level of differentiation), then to pull apart the differences between all 15 metrics you would need to interview around 5,000 people.
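The two-ads example is essentially a coin-flip (sign test) calculation, and it is easy to check, assuming a 50/50 null hypothesis that the ads are equally appealing (the function name below is mine, for illustration):

```python
def chance_of_streak(n, p=0.5):
    """Probability that all n respondents prefer ad A purely by chance,
    if the two ads were actually equally appealing (p = 0.5)."""
    return p ** n

# 5 respondents in a row preferring ad A:
print(chance_of_streak(5))  # 0.03125 -> under 5%, so more than 95% confidence
```

Because the chance of a 5-for-5 streak under a dead heat is about 3%, you can stop asking that question after a handful of consistent answers.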

An average survey contains a range of questions between these two extremes, so it makes sense to stop thinking about sample requirements for the survey as a whole and start thinking about sample requirements at a question level.

Now the problem is that it is difficult to know in advance exactly how much sample you will need for each question, because working it out accurately requires some data.

The solution is a more iterative survey design approach, where you don't set your sample quotas until you have interviewed enough people to estimate the sample size requirements. This is easily done: instead of sending out your survey in one go, you send it out in 2 batches. Send the first batch to what I would normally recommend as 100 respondents and pause the survey; this gives you enough data to roughly assess the sample requirements for each question, and you can then set quotas on each question for the second batch of sample.
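As a sketch of that second step, the spread of answers in the pilot batch can be turned into a per-question quota using the standard sample-size formula (the helper name and pilot numbers here are hypothetical, not from the post):

```python
import math
import statistics

def quota_from_pilot(pilot_answers, acceptable_error, z=1.96):
    """Estimate the total sample a question needs, using the standard
    deviation of answers from the ~100-person pilot batch."""
    sd = statistics.stdev(pilot_answers)
    return math.ceil((sd * z / acceptable_error) ** 2)

# e.g. a 5-point rating question from the pilot batch:
pilot = [3, 4, 3, 5, 2, 3, 4, 4, 3, 2]
print(quota_from_pilot(pilot, acceptable_error=0.2))  # 87
```

A question whose pilot answers cluster tightly will get a small quota; one with widely spread answers will get a large one, which is exactly the behaviour the two-batch approach exploits.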

Now there are obviously a few things you need to consider, for example how you are going to sub-divide the data. If you want to analyse sub-demographic trends for any one question, e.g. compare men v women or look at an age split within each of those groups, each sub-group needs a minimum sample, so you may need to double or even quadruple the basic sample requirement for some questions to account for this.

When you do this across a survey you get a chart like the example below:

In this example you can clearly see that there are some questions that require a lot more sample than others.

If you were interviewing, say, 400 respondents in total, then for some of these questions you will already have enough data from the first batch of responses, and some of the others need only be answered by 1 in n of the remaining respondents. What this means is that if you randomize at a respondent level who answers each question, the survey overall gets shorter for each respondent.
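One way to implement that respondent-level randomization, as a sketch (the function and question names are mine, not from the post): show each respondent each question with probability proportional to how much sample that question needs.

```python
import random

def assign_questions(question_needs, total_respondents, seed=0):
    """For each respondent, randomly choose which questions they see, so
    that each question is answered by roughly the number of people it
    needs rather than by everybody."""
    rng = random.Random(seed)
    plans = []
    for _ in range(total_respondents):
        plan = [q for q, need in question_needs.items()
                if rng.random() < min(1.0, need / total_respondents)]
        plans.append(plan)
    return plans

# A question needing 400 of 400 respondents is asked of everyone;
# one needing only 100 is asked of roughly a quarter of them.
plans = assign_questions({"brand_rating": 400, "ad_preference": 100}, 400)
```

Each respondent then sees a different, shorter subset of the questionnaire while every question still hits its quota on average.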

So how do you actually work out sample sizes for a question?

There is a relatively basic formula that you can use to calculate the minimum sample size for a question:

Minimum sample size = [(Standard Deviation × Z) / Acceptable Error]²

Z is the factor that determines the level of statistical confidence you wish to apply: for 90% confidence I would recommend Z = 1.64, and for 95% confidence Z = 1.96.

You can see from this formula that it's all driven by the standard deviation, i.e. the level of variance in the answers, together with the acceptable error you set. In the brand example I quoted above, if the overall spread in answers is 0.5 and there are 15 metrics to differentiate between, the acceptable error would be around 0.03 (0.5/15).
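Putting the formula into code with some illustrative inputs (the numbers below are mine, not the post's):

```python
import math

def minimum_sample_size(std_dev, acceptable_error, z=1.96):
    """Minimum sample size = [(Standard Deviation x Z) / Acceptable Error]^2,
    rounded up to a whole respondent."""
    return math.ceil((std_dev * z / acceptable_error) ** 2)

# Standard deviation of 1.0 on a 5-point scale, acceptable error of
# +/- 0.1, at 95% confidence (Z = 1.96):
print(minimum_sample_size(1.0, 0.1))  # 385
```

Halving the acceptable error quadruples the required sample, which is why tightly differentiated metrics like the brand example get so expensive so quickly.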
