*Note: This is the first in a series of blog posts about the computational analysis of survey questions and responses. You’re encouraged to read the other posts for greater context.*
Part II: Preparing Text for Analysis with Natural Language Toolkit (NLTK)
Part III: How to Find Near Duplicate Text and Recognize Named Entities in Survey Responses
Best practices for constructing open-ended questions
Open-ended survey questions are those that allow a free-form response in the participant’s own voice. While they provide important nuance and additional information that may not be revealed by closed-ended questions (i.e., those with fixed response options), they are more difficult to assess. Typically, analysis is labor intensive, involving qualitative categorization or the application of decision rules by a human coder to create a quantitative measure (i.e., content analysis). At its worst, “analysis” may entail the unsystematic selection of examples, a practice that is scientifically indefensible.
Recently, Evolytics was asked to analyze survey data containing approximately 68,000 open-ended responses to nine survey questions. At that scale, the data was clearly not amenable to manual analysis. In this post, we’ll consider best practices for constructing surveys with open-ended questions that can be computationally analyzed. We’ll cover specific computational techniques for analyzing the text in later posts.
Many of the guidelines for good survey construction also produce open-ended questions that are easier to analyze computationally. Ideally, we want as many responses as possible because, unlike manual methods, computational analysis can use all of the data. We also want responses that are rich in detail, diverse in vocabulary, and reasonably long. Therefore, we must consider how to elicit these types of responses.
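As a rough illustration of what “rich” means in practice, here is a minimal sketch that scores responses by length and lexical diversity (the ratio of unique words to total words). The sample responses are hypothetical, and whitespace tokenization is a deliberate simplification (Part II of this series discusses preparing text with NLTK):

```python
# Minimal sketch: gauge response richness by token count and lexical
# diversity (type-token ratio). The sample responses are hypothetical.

responses = [
    "I use it to plan my weekly routes and track my mileage over time.",
    "Tracking.",
]

for text in responses:
    tokens = text.lower().split()   # naive whitespace tokenization
    n_tokens = len(tokens)
    n_types = len(set(tokens))      # unique words
    ttr = n_types / n_tokens if n_tokens else 0.0
    print(f"{n_tokens:3d} tokens, type-token ratio {ttr:.2f} -> {text!r}")
```

Longer, more varied responses score higher on both measures, which is exactly what richer downstream analysis needs.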
First, limit your questions to what you really want to know. Would you be more likely to respond to two open-ended questions or to nine? Since each open-ended question requires a high level of effort, most people would be more likely to respond to two rather than nine. Similarly, take care to avoid redundant questions. Consider the following pair drawn from a real survey:
- “Please explain why you are likely to recommend this product to others.”
- “What do you like about the product?”
Although superficially different, the reasons you would recommend a product to others probably overlap substantially with the things you like about it. In short, by asking only essential questions, you maintain the goodwill of the participant and get more and better responses.
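If you suspect two questions are redundant, one after-the-fact check is to compare each participant’s paired answers: consistently high similarity suggests the questions elicited the same information. Here is a minimal sketch using scikit-learn; the paired answers are hypothetical, and a real check would average over the full sample.

```python
# Sketch: estimate redundancy between two open-ended questions by
# comparing each participant's paired answers with TF-IDF cosine
# similarity. The answers below are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

answers_q1 = ["It saves me time on reporting.", "Easy to share dashboards."]
answers_q2 = ["The time it saves on reports.", "Dashboard sharing is easy."]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(answers_q1 + answers_q2)
n = len(answers_q1)

# Compare each participant's answer to question 1 with their answer
# to question 2; consistently high values point to redundancy.
for i in range(n):
    sim = cosine_similarity(matrix[i], matrix[n + i])[0, 0]
    print(f"Participant {i + 1}: similarity {sim:.2f}")
```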
Second, carefully consider how you phrase your open-ended questions. A good open-ended question encourages the participant to elaborate with more details. Think of it as an invitation to tell a story! Which of these do you think would elicit a better response?
- “What do you use our product for?”
- “Describe to us how you use our product.”
A participant might respond to the first question with a terse list, while the second encourages them to provide contextual details. If the participant can respond with a single discrete answer or a simple list, it’s probably not a good open-response question.
Finally, if applicable, structure your questions in a way that makes it easy to attribute responses to a given brand or product. I once analyzed a survey that used open-ended questions to ask participants to list up to five competitor brands, rate them, and then explain their ratings. The problem was that only one response field was available for participants to explain their ratings for all of their listed brands! This is a cousin of the double-barreled question, as it makes it exceedingly difficult to attribute the response to a particular brand. In other words, pay attention to how your survey software structures the data on the backend so that responses can be attributed.
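To make the point concrete, here is a sketch of the two layouts in plain Python. The field names and records are hypothetical; the takeaway is that one record per participant-brand pair preserves attribution, while a single catch-all explanation field destroys it.

```python
# Problematic backend layout: one free-text field covers every listed
# brand, so the explanation can't be attributed to a specific brand.
bad_row = {
    "participant_id": 101,
    "brands": ["Brand A", "Brand B"],
    "ratings": [4, 2],
    "explanation": "One is reliable but pricey; the other had outages.",
}

# Better layout: one record per (participant, brand) pair, so every
# explanation maps to exactly one brand. Field names are hypothetical.
good_rows = [
    {"participant_id": 101, "brand": "Brand A", "rating": 4,
     "explanation": "Reliable, but pricey."},
    {"participant_id": 101, "brand": "Brand B", "rating": 2,
     "explanation": "Frequent outages last year."},
]

# Attribution is now trivial.
for row in good_rows:
    print(f'{row["brand"]} ({row["rating"]}/5): {row["explanation"]}')
```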
The next posts in this blog series discuss preparing text for analysis and finding near-duplicate text and named entities.