What the Questionnaire Score Doesn't Tell You

4/22/2026

The GAD-7 and PHQ-9 have been used in clinics for decades. They are brief, validated, and widely understood. Every clinician knows them. Every client has likely seen them. They produce a number in under five minutes, and that number is defensible.

That familiarity is part of the problem.

Both instruments were designed as screening tools. Their original purpose was initial detection: identify likely cases in general care settings and flag who needs further assessment. The research is explicit on this. The GAD-7 "cannot be used as a replacement for clinical assessment." The PHQ-9 requires additional comprehensive evaluation before a clinical diagnosis can be confirmed.

Used as intended, they are useful. Used as a destination rather than a starting point, they produce a number that answers a narrower question than most clinicians realize they are asking.

What a Checkbox Can and Cannot Capture

The format of these tools shapes what they can find.

When a client answers a PHQ-9 item about feeling down or hopeless, they select from four options: not at all, several days, more than half the days, nearly every day. The frequency is captured. The context is not.

A score of 14 on the PHQ-9 indicates moderate depression. It does not indicate whether those symptoms are a response to a recent bereavement, a longstanding pattern, a medication side effect, or a trauma presentation that depression screening was never designed to detect. Research confirms this: PHQ-9 and GAD-7 scores are more sensitive to situational distress than to formal clinical pathology, meaning they can over-identify cases during acute stress without distinguishing clinical from contextual origin.
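The arithmetic behind that number is simple enough to sketch. The function below is an illustration only, not a clinical tool: it totals the nine item scores (each answered 0 to 3) and applies the standard severity cutoffs, showing how little information survives the reduction to a band.

```python
# Illustrative sketch, not a clinical instrument: PHQ-9 total scoring
# and the standard severity bands. Each of the nine items is answered
# 0-3 (not at all ... nearly every day), so totals range from 0 to 27.

def phq9_severity(item_scores):
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores, each 0-3")
    total = sum(item_scores)
    if total <= 4:
        band = "minimal"
    elif total <= 9:
        band = "mild"
    elif total <= 14:
        band = "moderate"
    elif total <= 19:
        band = "moderately severe"
    else:
        band = "severe"
    return total, band

# Many different answer patterns produce the same total of 14 ("moderate").
# Nothing in the score records why the client endorsed the items.
print(phq9_severity([2, 2, 2, 2, 2, 2, 1, 1, 0]))  # (14, 'moderate')
```

Note that the bereavement, the medication side effect, and the trauma presentation described above would all pass through this function identically.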

Forced-choice formats produce this limitation by design. The client selects from options the questionnaire author anticipated. Anything outside those categories goes unrecorded. The research literature on open-ended versus structured questions is consistent on this point: closed formats miss comorbidities, context, and patterns that narrative responses surface. One body of research found that structured clinical interviews identified at least two psychiatric diagnoses in one-third of patients, information that single-condition screening tools had not detected.

The checkbox tells you what the client endorsed. It does not tell you what the client meant.

Why Clinicians Don't Branch Out

The simplicity of familiar tools is also what keeps many clinicians inside them.

More comprehensive questionnaires exist. They ask about more domains, in more depth, with normative data that allows a result to be interpreted against a relevant population rather than a generic cutoff. But they produce more data. More data has historically meant more reading time, more scoring time, and more cognitive load before the interview has even started.

The practical conclusion many clinicians reach is that the efficient option and the thorough option are in conflict. The GAD-7 is quick. The alternative is another hour of work. So the GAD-7 stays.

That calculation made sense when reading and synthesizing open-ended responses was an unavoidably time-consuming task. It no longer needs to be.

Where AI Changes the Equation

History gathering is data collection. Data collection is a task AI handles with accuracy and without fatigue.

When a client completes open-ended history forms before an assessment, they produce responses that no checkbox could have contained. They describe what their childhood felt like, not just whether it was difficult. They explain why sleep has been a problem for ten years rather than selecting a frequency band. They surface details they would not have thought to mention in a structured interview because the question was open enough to reach them.

Reading through those responses used to take time a busy clinician did not have. AI can now process that narrative, identify relevant clinical themes, and surface what is worth the clinician's attention before they walk into the room. The reading is done. The synthesis is ready. The clinician arrives at the interview with a picture, not a score.

What AI cannot do is what no tool should try to replace.

Recognizing that a client's sleep disruption, irritability, and avoidance are not three separate problems but a single trauma response takes years of clinical training. That is pattern recognition across domains. It is the integration of developmental history, current presentation, and diagnostic criteria in a way that produces a clinical formulation, not an output. A client describes their childhood sleep difficulties, their hypervigilance at work, and their difficulty with emotional regulation in three separate form fields. The information is there. The clinical judgment that connects them into a coherent picture belongs to the clinician.

That distinction matters because it is frequently obscured. AI in clinical contexts is often described in terms that imply interpretive capacity it does not have and should not have. A diagnostic conclusion is not a pattern match. It is a clinical judgment that carries professional and legal weight.

The tool that informs it is not the same as the clinician who makes it.