Survey text analysis beyond word clouds

Quick disclaimer/TLDR.

This article is written for fairly technical people who love detail. If you don’t, here are a few key things you can take away:

  • Traditional survey text analysis is basically limited to word clouds (which were cool in 2001) and yields little to no actual insight from your hard-earned opinion data.
  • If you think machine learning will fix it, you’re close, but survey text alone isn’t rich enough to yield a useful output.
  • Networked surveys are a new kind of survey that creates way richer insights from your open-ended responses.
  • Networked surveys multiply text data per-respondent 27x on average and provide higher ROI on your panel investment.
  • Networked surveys automatically discover respondent personas based on their behavior and reactions (not just text).
  • Networked surveys mean change, and if you don’t like change, stick to what you know.
  • If you want to get started, or learn more about networked surveys in a more market-y way, click on this.

Alright, game on.

Survey text analysis is limited.

Are you a technical/analytical thinker? Do you run surveys often? If you are, and you do, then you probably already know where they are strong and where they have limits.

One of those limits is text analysis, and as markets move to contextual insights to comply with GDPR, this barrier can be paralyzing. At the center of the challenge is the very nature of survey text data collection.

You invest cash to get a representative sample of your market, collect answers to open-ended questions, and then invest again in time spent tagging each of the ~3,000 opinions you collected for keywords and themes. If you spend one minute tagging each opinion, that’s 50 billable hours ($7,500 @ $150/hr) spent on tagging alone. You spent all of that time tagging text and what you got back was… the number of times you tagged your text… the depth ends there.
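The tagging math above is easy to rerun with your own numbers. Here is a minimal sketch; the function name and parameters are made up for illustration, and the inputs are the ones from the paragraph above.

```python
def tagging_cost(num_opinions: int, minutes_per_opinion: float, hourly_rate: float) -> tuple[float, float]:
    """Return (hours, dollars) spent on manually tagging open-ended responses."""
    hours = num_opinions * minutes_per_opinion / 60
    return hours, hours * hourly_rate


# Figures from the text: ~3,000 opinions, one minute each, $150/hr.
hours, cost = tagging_cost(3000, 1, 150)
print(f"{hours:.0f} hours, ${cost:,.0f}")  # 50 hours, $7,500
```

Swap in your own sample size and billing rate to see what tagging alone costs you.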

Survey text machine learning is limited.

Now, you might say, ‘but what about natural language processing (NLP) and machine learning (ML) approaches to analyzing text for sentiment?’ The problem with survey opinion data is that it’s flat text, with no metadata. Because you’re feeding the machine learning algorithm short-length, flat text with no additional descriptors, there’s just not enough training data for the ML algorithm to yield useful or relevant insights.

If you’ve ever made use of these machine learning text analysis tools for surveys, you’ve probably already been disappointed by the obviousness of the conclusions they draw. You’ve likely also had to manually adjust outputs, because your human perspective is more relevant to the problem you’re solving. That human perspective is what makes networked surveys so powerful. But, before we get into what they are and why they’re useful, let’s travel back in time to simpler days.

Remember when search engines sucked?

If you’re ancient enough to remember Lycos, Ask Jeeves, and Yahoo! search, you remember that they were pretty basic. Put a keyword in the box, and if the keyword appeared most often in the text of a result, that result displayed highest for your search. That was it: flat text-level thinking. This led SEOs of yesteryear to stuff keywords into content in hopes that it would rank higher, and it worked… This made for a frustrating search experience and was the de facto standard until Google entered the market with a simple and obvious discovery: links.

They realized that text alone wasn’t enough to determine that a result was relevant to your search. There was a trove of additional non-text human behavioral signals in the form of links between web content that formed a massive network. This network dramatically improved the quality of search results and set a new bar for what search engines needed to be.

Networked surveys go beyond text-level thinking.

Ok, back to 2018. Traditional survey text analysis is a lot like Lycos right now. You tag your text, and those tags that show up most often get ranked higher in your report (or worse, your word cloud, but don’t get me started there). That’s where your insight really ends – frequency – making your text analysis output about as high-quality as a circa-‘95 search result.

Now let’s talk about networked surveys. Networked surveys work exactly the same way as a traditional survey in terms of sampling and survey distribution. You log into the software, create a survey, get a link, and share it with your panel. They integrate with panel recruiters like Research Now/SSI and with panel marketplaces. They support Google Analytics URL tracking parameters for digital survey recruitment attribution. They support demographic exclusion rules to filter out bad sample fits and save you budget. All of the basics are covered, including basic survey question types (like Likert scales, multiple choice single-answer, multiple choice multiple-answer, open-ended, etc.). They also support a new question type: the networked question.

Networked questions spin up miniature disposable social networks inside of your survey. Sounds badass, but what does it mean? The simplest version of a networked question takes an open-ended text response and lets other respondents react to it along a rating scale (from positive to negative, agree to disagree, not interesting to interesting, etc.).

Example of a networked survey in action:

These non-text human rating signals between opinions and respondents create a large network of opinions (the opinion network) and have a huge multiplying effect on text data. Now, in addition to each respondent’s own open-ended text, you get on average an extra 27 open-ended data points, each with that respondent’s reaction on a rating scale.

Here’s an example of an opinion network (nodes are opinions, edges are shared respondents).

That many more qualitative data points per respondent means a significantly increased depth of knowledge when multiplied across your sample and a much bigger return on your survey investment. But the increased quantity of qualitative data isn’t the only advantage networked surveys give you. The extra data enables much richer discovery of patterns that only occur in the network of opinions that forms (e.g. Craig likes pizza, but not spicy pizza, but spicy tacos are yum).

Here’s a comparison between traditional and networked (you’ll want to scroll over it)


Networked surveys let you cluster your tags together to form respondent segments.

Now here’s something cool you can do with all of your tags. Because networked surveys track respondent scores for each opinion, and those scores can be aggregated by tags, we can ask questions like “which tags tend to be scored similarly by the same people?” If tags have high score similarity and respondent overlap, we group them together into segments and find useful patterns that are not text-dependent. For example, what do you think pool goers care about in parks? Take a minute.

Ok, time’s up. Were canopies on the list? It might seem pretty obvious once you hear it, and it makes sense, but I know it wasn’t top of mind for me. Did you know people who care about fitness stations also care about the same things? Not to mention other seemingly unrelated things like “invasive control.”

Here’s an example of a single respondent segment (“the pool goer”). The bars represent levels of agreement with each tag/factor.
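For the technically curious, the tag grouping described above can be sketched in a few lines. This is a rough illustration, not the product’s actual algorithm: the data, the -2 to +2 scale, the similarity measure (mean score gap among shared respondents), and the thresholds are all assumptions made up for the example.

```python
from itertools import combinations

# Hypothetical data: tag -> {respondent_id: average score that respondent gave
# to opinions carrying the tag}, on an assumed -2 (disagree) .. +2 (agree) scale.
tag_scores = {
    "pool": {"r1": 2, "r2": 2, "r3": -1, "r4": 1},
    "canopies": {"r1": 2, "r2": 1, "r3": -1},
    "fitness stations": {"r1": 1, "r2": 2, "r4": 1},
    "dog park": {"r3": 2, "r5": 2, "r6": 1},
}


def similarity(a, b, scale_width=4.0, min_overlap=2):
    """1.0 means shared respondents scored both tags identically.

    Returns None when fewer than min_overlap respondents scored both tags.
    """
    shared = a.keys() & b.keys()
    if len(shared) < min_overlap:
        return None
    mean_gap = sum(abs(a[r] - b[r]) for r in shared) / len(shared)
    return 1.0 - mean_gap / scale_width


def segments(tag_scores, threshold=0.8, min_overlap=2):
    """Group tags that are linked, directly or through other tags, by similar scoring."""
    parent = {t: t for t in tag_scores}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for t1, t2 in combinations(tag_scores, 2):
        sim = similarity(tag_scores[t1], tag_scores[t2], min_overlap=min_overlap)
        if sim is not None and sim >= threshold:
            parent[find(t1)] = find(t2)

    groups = {}
    for t in tag_scores:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


print(segments(tag_scores))
# [['canopies', 'fitness stations', 'pool'], ['dog park']]
```

Note that nothing here looks at the words in the tags. “Canopies” and “fitness stations” end up in the pool segment purely because the same people scored them the same way.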

Networked surveys also measure complex behavior like persuasion.

More advanced versions of networked surveys can even measure persuasion. Take the Net Promoter Score for example. If you ask an NPS scale question, followed by an open-ended, networked surveys make it possible to segment customer loyalty feedback along an extra dimension that traditional surveys can’t access. You get to find out which opinions were exclusively those of your promoters (your promoter tribe), which were written by your promoters and got your detractors to agree with them (positive persuasive), which opinions were exclusively those of your detractors (your staunch opposition), and which were written by your detractors that got your promoters to agree with them (negative persuasive/leaks in your dam). This means you can tell which opinions persuade your market to (and not to) buy your product, recommend your brand, vote for your candidate, use your service, engage in your workplace, donate to your cause… etc.
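The four buckets above can be sketched as a small classifier. Treat this as an illustration only: it assumes the standard NPS cut-offs (9-10 promoters, 7-8 passives, 0-6 detractors), it assumes you already know the NPS scores of the respondents who agreed with each opinion, and it counts a single agreeing respondent from the other camp as persuasion. The function names are hypothetical, and a real analysis would likely use a minimum share of cross-camp agreement instead.

```python
def nps_group(score: int) -> str:
    """Standard NPS grouping of a 0-10 likelihood-to-recommend score."""
    if score >= 9:
        return "promoter"
    if score <= 6:
        return "detractor"
    return "passive"


def classify_opinion(author_nps: int, agreeing_nps_scores: list[int]) -> str:
    """Place one opinion in a persuasion bucket.

    author_nps: NPS score of the respondent who wrote the opinion.
    agreeing_nps_scores: NPS scores of respondents who agreed with it.
    """
    author = nps_group(author_nps)
    if author == "passive":
        return "unclassified"  # the text only describes promoters and detractors
    other = "detractor" if author == "promoter" else "promoter"
    # Assumption: one agreeing respondent from the other camp is enough.
    crossed = any(nps_group(s) == other for s in agreeing_nps_scores)
    if author == "promoter":
        return "positive persuasive" if crossed else "promoter tribe"
    return "negative persuasive" if crossed else "staunch opposition"


print(classify_opinion(10, [9, 10]))  # promoter tribe
print(classify_opinion(9, [3]))       # positive persuasive
print(classify_opinion(2, [1, 7]))    # staunch opposition
print(classify_opinion(4, [10]))      # negative persuasive
```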

Here’s an example of how networked surveys can be used to increase donations.

Networked surveys are actually pretty straightforward to learn and run with, but they’re not for people who don’t like change. If you’re cool with word clouds and basic validation of your assumptions, there’s nothing wrong with a traditional survey. But, if you’re an innovator, and angling for a way to get past word clouds, you have little to lose and a lot (27x) to gain.

This has been fun! If you’re planning research for yourself or a client and want a trial, contact us below and we’ll help you figure out the right setup.

~ Alan


Survey Research Bias: Finding THE Answer vs. YOUR Answer

This is a look into an example of survey research bias, survey tactics and methodology, and how they can actually affect your brand perception. I tried to do this without political judgment or taking sides. 

Have you ever seen this type of poll (below)? I received this one a few months ago, and then again recently in an email from the Republican National Committee. Just a quick look shows key problems in the survey’s execution:


Simply based on what I’m seeing in my inbox and on social media, this kind of survey is becoming more popular: one question, multiple choice, I only need 15 seconds of your time…

I don’t know the ultimate objective of this particular survey (I’ll get to that point later), but I find three big problems with a survey constructed this way:

  1. Researcher bias
  2. Respondent bias
  3. Inconsistent messaging

In this post I examine all three of these problems, with examples and suggested quick fixes for each. For more information on how our Networked Survey™ technology helps limit researcher and respondent bias, take a look at Agreeable Research solutions.

Addressing researcher bias

Researcher bias comes in many different forms, but at a high level it occurs when survey creators knowingly or unknowingly influence the answers respondents provide (through survey design or methodology), or direct the subsequent analysis of the survey data toward desired results.

In short, it’s how question-askers influence question-takers, or how survey data analysts guide the creation of insights.

Looking at this example, if respondents are only allowed four options to rate the president’s performance—Great, Good, Okay or Other—the results will be heavily skewed toward the apparent desired result of the questioner (i.e. that the president’s job performance so far has been “Okay” or better). Even if enough respondents answer “Other” and provide reasons why, the default positive answers shed light on the results that this survey seeks to gather.

This issue is known as “wording bias,” where the words or options provided have the power to shape respondent answers and ultimately the survey results. Therefore, it’s difficult to see the results of a survey like this as wholly accurate, or accurately representative of the audience being surveyed.

QUICK FIX: The survey can be better constructed using a variable range of answers, like a Likert-type scale, where 1 = Poor and 5 = Great. This limits the influence of the survey language on respondent answers, and the survey creator can still allow participants to submit verbatim answers.
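One rough way to sanity-check an answer set before sending a survey is to count how many options lean positive versus negative. The sketch below is only an illustration: the polarity assigned to each label (including treating “Okay” as neutral) is my own assumption, and the helper name is hypothetical.

```python
def is_balanced(options: dict) -> bool:
    """options maps each answer label to a polarity: +1, 0, -1, or None for open-ended."""
    rated = [p for p in options.values() if p is not None]
    positives = sum(1 for p in rated if p > 0)
    negatives = sum(1 for p in rated if p < 0)
    return positives == negatives


# The poll's options, with assumed polarities.
original = {"Great": 1, "Good": 1, "Okay": 0, "Other": None}
# A 5-point Likert-type scale where 1 = Poor and 5 = Great.
likert = {"1 - Poor": -1, "2": -1, "3": 0, "4": 1, "5 - Great": 1}

print(is_balanced(original))  # False
print(is_balanced(likert))    # True
```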

Addressing respondent bias

When thinking ahead to the ultimate value of this survey’s data, there are two major considerations on the side of the respondent (question taker). The first is the respondent pool itself. The email I received begins with this message: “The President has asked us to reach out to some of our top supporters for a one-question poll, and as one of our best, you’ve been chosen to participate.”

Again, no political judgments, but the only reason I got this email is that I’m on the RNC email list. That fact in itself does not determine my level of support for the organization (remember this when I discuss the objectives of the survey).

However, their email shows that the RNC wants to measure the president’s job performance rating only among its top supporters. This is not necessarily an issue, so long as the results are accurately positioned as such.

But when you click on any answer in the email, you’re taken to a page titled “Official Presidential Job Performance Poll” (pictured). This title insinuates a level of authority, definitiveness and objectivity (it is “official,” after all) that’s inconsistent with the limited respondent pool.

While this issue can be addressed with better messaging and positioning, the issue of “sponsor bias” cannot. Sponsor bias occurs when respondents know (or think they know) who’s asking the survey questions. This often results in respondents providing answers that are skewed by their feelings toward the survey sponsor.

Since I received this email from the RNC, and the email ends with the sentence, “We’ll be sure to pass it along to President Trump,” my responses may be influenced by my sentiment toward both parties.

QUICK FIX: If continuing with this methodology, position the survey and its results as a study of RNC supporters. If looking to create an “official” performance rating, work with a third party to administer the survey anonymously so respondents don’t know who’s gathering the information.

Addressing inconsistent survey messaging

As I mentioned, the email I received begins with, “The President has asked us to reach out to some of our top supporters…” and ends with, “We’ll be sure to pass it along to President Trump.” All of this leads me to believe the RNC is working to better inform and provide research to the president.

However, the footer of the email (pictured) contradicts this thought entirely, stating this email is “Not authorized by any candidate or candidate’s committee.”


For context it should be noted that on Inauguration Day, President Trump filed the paperwork to be an official candidate for re-election. So it’s further confusing that when I reply to the email, my message is directed to “”

If the RNC’s email is not authorized by any candidate or candidate’s committee, then why is the president requesting this survey outreach, and how is the RNC authorized to direct funds to “Donald J. Trump for President”? This could be a semantic argument along political, legal or regulatory lines. Regardless, these points of inconsistency would make me suspicious as to the motives of the survey issuer, the objectives of the survey, and any findings coming out of the research.

QUICK FIX: Draft clearer language as to the motivations and objectives of the survey research, what it will be used for, and how its results will be shared. In addition, ensure coordination between the sponsor of the survey, the email communications and the response mechanisms.

The objectives of the survey

Many of the issues I brought up in this post can be addressed or cleared up by determining the ultimate objectives of the survey itself.

  • The “wording bias” could be a deliberate tactic to generate a study with more favorable results to share within the Republican Party.
  • The limited respondent pool could be specifically used for a survey designed to provide insights into RNC supporters.
  • The “sponsor bias” could be a concerted effort by the RNC to be more transparent in its data gathering initiatives.

All of these are completely valid.

One consideration I didn’t address was the valuable information this kind of survey would provide the RNC about its email list. Remember I said earlier that the only reason I got this email is that I’m on the RNC email list. They likely didn’t have very much information on my level of support for their organization or the president.

By having respondents select among their options—Great, Good, Okay, Other—members of their email list are self-selecting into categories based on their approval of the president’s job performance. With this information, the RNC can more effectively target email communications to each category, and even use these email addresses to create custom audiences on Facebook and Twitter for more effective ad targeting.

If this is the ultimate objective of their survey outreach, then they will most likely gain the kind of data and insights they’re looking for. Otherwise, the RNC should reevaluate its research tactics and methodology to produce more accurate and defensible findings.

Interested? Let Us Share More Materials With You!

AI and the Opinion Network

Our Vision for Artificial Intelligence

Artificial Intelligence is automated enlightenment. It has the power to solve hard problems that are normally handled by people, because it’s trained by people. It is a mirror onto ourselves and as a result an incredible catalyst for human innovation. For that reason, we have decided to double-down on our vision:

To train social-facing Artificial Intelligence systems with a deeper qualitative understanding of people, using our Networked Survey™.

Individuals have opinions on a near infinite number of topics. Groups of people do too, and there is significant overlap. We can imagine connecting people based on the opinions they have in common. If we transposed our thinking, we would find relationships between opinions based on the people that align with them. This network of connected opinions is what we call the “Opinion Network.” The Opinion Network is ancient, yet it is fairly new terrain. It needed a new research methodology, so we developed a solution, the Networked Survey. Networked Surveys allow us to sample and map the Opinion Network from topic to topic.
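The “transposed” view described above, where opinions are connected through the people who align with them, can be sketched as follows. This is a minimal illustration under assumptions: the data is invented, “align” is taken to mean “agreed with,” and the edge weight is simply the count of shared respondents.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: respondent -> set of opinions that respondent agreed with.
agreements = {
    "r1": {"o1", "o2"},
    "r2": {"o1", "o2", "o3"},
    "r3": {"o3"},
}


def opinion_network(agreements):
    """Connect opinions by shared respondents.

    Returns {(opinion_a, opinion_b): number of respondents who agreed with both}.
    """
    edges = defaultdict(int)
    for opinions in agreements.values():
        # Every pair of opinions one respondent agreed with gains one unit of weight.
        for a, b in combinations(sorted(opinions), 2):
            edges[(a, b)] += 1
    return dict(edges)


print(opinion_network(agreements))
# {('o1', 'o2'): 2, ('o1', 'o3'): 1, ('o2', 'o3'): 1}
```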

Because the Opinion Network is pervasive across verticals, segments, and cultures, it fits nicely as a general source of training data for machine learning applications. With this in mind, our AI strategy is unique in that it is not centered on the end goal of building systems that autonomously make decisions.

Predictions made by machine learning algorithms are only as good as the training data informing them. For this reason, our AI strategy is to support its development across an unlimited field of qualitative applications using our Networked Survey technology.

We are already making headway proving this strategy out with two major brands, each developing feature selection and data collection strategies using our Networked Survey technology. Active applications range from predictive modeling of consumer behavior to qualitative team alignment inputs for a global brand.

We are also making strides in developing a normative set of training data on topics relevant to our user base, which is made up of strategists, researchers, and marketers at global brands, political organizations, and membership organizations. Agreeable will iterate on these topical areas in a longitudinal study that maps small mutations to the Opinion Network over time. We plan on launching this initiative in Q2 2018 and making results available quarterly to our customers.

We are truly excited to be a part of the growing field of Artificial Intelligence.


~ Alan

Alan Garcia
Founder & CEO, Agreeable Research
