Saturday, May 20, 2017

Sensitive survey questions

Do you steal from your employer? Do you lie on your taxes? Have you cheated on your wife?  If you want to gather statistical information about these questions, you can't ask directly.  Most respondents will lie.  I'm aware of three methods for addressing the problem. Two of them are quite clever.

Bogus Pipeline

The first one is not particularly clever.  Hook the subject to a machine.  Tell them it's a lie detector even though it's not. Ask them to respond honestly and pose a few baseline questions to which you know the answer (What's your name? What day is it? etc). After each answer, have the machine indicate that it detected truth.  Now ask the subject to respond deceptively and ask more baseline questions. After each response, have the machine indicate that it detected a lie.  Now hide the machine's truth/lie indicator and ask your questions.  Most subjects will tell the truth.

This is called a bogus pipeline. It's complicated to implement, requires physical access to the subject, and is not as accurate as other techniques.

Randomized Response

Ask the subject to flip a coin without telling you the result.  If it's heads, they should answer truthfully. If it's tails, they should answer yes (or whatever the socially unfavorable answer is).  Now ask your question.  Because you know roughly half of the answers are forced, you can apply some simple math to the aggregate responses and estimate the percentages you want to know.

This one's pretty helpful, but it requires the subject to have a coin (who uses coins anymore?).  The subject must also be smart enough to recognize that the coin gives them deniability. It seems obvious, but it's not obvious to everyone.
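
For the curious, here's a rough sketch of the "simple math" in Python. With a fair coin, the observed yes rate should be about half the true rate plus one half (the forced yeses), so you can solve backwards for the true rate. The function name and the p_truth parameter are my own inventions for illustration, and the sketch assumes subjects actually follow the coin instructions.

```python
def estimate_randomized_response(yes_count, total, p_truth=0.5):
    """Estimate the true 'yes' rate from randomized-response answers.

    Assumes each subject answers truthfully with probability p_truth
    (heads on a fair coin by default) and is forced to say yes otherwise.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must be in (0, 1]")

    observed_yes_rate = yes_count / total
    # observed = p_truth * true_rate + (1 - p_truth), solved for true_rate
    estimate = (observed_yes_rate - (1 - p_truth)) / p_truth
    # Sampling noise can push the raw estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, estimate))
```

So if 600 out of 1000 subjects say yes, the estimate comes out to about 20%: roughly 500 of those yeses were forced by the coin, leaving about 100 truthful yeses among the roughly 500 subjects who answered truthfully.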

Unmatched Count

Construct an innocuous survey along these lines: "How many of the following statements are true about you? I own a dog. I drink coffee. I've been married. I have brown hair."  Construct a second survey, identical to the first except that it adds your sensitive statement, "I cheat on my taxes".  Randomly give each subject one survey or the other.  Calculate the average answer for each type of survey.  The difference between the two averages estimates the fraction of subjects for whom the sensitive statement is true.

This one's my favorite. Since the subject only tells you their final count, it's obvious to them that they've divulged no sensitive information.  The math for analyzing the results is similarly easy.
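
Here's a minimal sketch of that analysis in Python. The innocuous statements contribute the same expected count to both groups, so whatever is left over in the difference comes from the sensitive statement. The function name is hypothetical, and the sketch assumes subjects were assigned to the two surveys at random.

```python
from statistics import mean


def estimate_unmatched_count(control_counts, treatment_counts):
    """Estimate how common the sensitive statement is.

    control_counts:   answers from the survey without the sensitive statement
    treatment_counts: answers from the survey that includes it
    """
    if not control_counts or not treatment_counts:
        raise ValueError("both groups need at least one response")
    # Difference of the group averages; with small samples this can
    # land below 0 or above 1 because of noise.
    return mean(treatment_counts) - mean(control_counts)
```

For example, if the first group averages 2.0 true statements and the second averages 2.5, the estimate is that about half of the subjects cheat on their taxes.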

Do you know of any other techniques?

