LBL Depression Test methodology — instrument, scoring, limits

1. Instrument selection & naming

The screening questionnaire used in this tool was developed by Kroenke, Spitzer, and Williams and published in 2001 in the Journal of General Internal Medicine (16:606-613). It is a 9-item self-report instrument derived directly from the 9 DSM-IV criteria for a major depressive episode, with each item rated 0-3 on a frequency scale over the past 2 weeks. It is the most widely used brief depression screener in the world, with over 100,000 citations across psychology, sleep medicine, primary care, occupational health, and psychiatry.

Why this instrument over alternatives

Brevity. 9 items completed in 2-3 minutes. Beck Depression Inventory has 21 items and is impractical for primary-care screening; Hamilton Depression Rating Scale is clinician-administered.
Validation depth. Cronbach's α = 0.89 in original (Kroenke 2001). Sensitivity 88%, specificity 88% at cutoff ≥10 for major depression. Test-retest reliability r = 0.84.
Standard cutpoints. 0-4 / 5-9 / 10-14 / 15-19 / 20-27 are used by NHS IAPT (Improving Access to Psychological Therapies), VA/DoD clinical practice guidelines, and APA major depressive disorder guidelines.
Permissive licensing. Pfizer (which funded the original PRIME-MD project) states explicitly: "No permission required to reproduce, translate, display or distribute."
Direct DSM mapping. Each of the 9 items corresponds to one of the 9 DSM major depressive episode criteria — symptom-by-symptom transparency.
Built-in suicidality item. Item 9 captures self-harm ideation directly, which enables crisis-aware design that screeners without such an item (HADS, for example) cannot match.

Naming convention used in this tool

Public name: "LBL Depression Test"
Internal code: "LBL-DEP"
Glossary entry slug: /glossary/depression-screener/
Academic attribution: Always credit Kroenke, Spitzer & Williams (2001) in references and methodology. First in-prose mention says "the 9-item depression screener developed by Kroenke, Spitzer & Williams (2001)"; subsequent mentions say "the screener" or "the LBL-DEP instrument."
What we do not do: Lean on "PHQ-9" as user-facing branding. The acronym appears only inside reference list citations where it is part of the original work's title.

2. The 9 items + functional impairment item

The standard implementation includes the 9 symptom items (scored, sum 0-27) plus a 10th functional-difficulty question that is not added to the score but informs interpretation.

#	Item	Cluster (author choice)	DSM-5 criterion
1	Little interest or pleasure in doing things	Anhedonic	A2 (anhedonia)
2	Feeling down, depressed, or hopeless	Cognitive-emotional	A1 (depressed mood)
3	Trouble falling or staying asleep, or sleeping too much	Somatic	A4 (sleep)
4	Feeling tired or having little energy	Somatic	A6 (fatigue)
5	Poor appetite or overeating	Somatic	A3 (appetite)
6	Feeling bad about yourself, or that you are a failure	Cognitive-emotional	A7 (worthlessness)
7	Trouble concentrating	Cognitive-emotional	A8 (concentration)
8	Moving/speaking slowly, or fidgety/restless (psychomotor)	Somatic	A5 (psychomotor)
9	Thoughts that you would be better off dead, or of hurting yourself	Self-harm (singleton)	A9 (suicidality)

Functional impairment item (not added to score):

If you checked off any problems, how difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?

Response options: Not difficult at all (0) / Somewhat difficult (1) / Very difficult (2) / Extremely difficult (3). This item is captured but not summed. It informs the results copy: a moderate-band score with "extremely difficult" functional impairment is interpreted as warranting more urgent follow-up than a moderate score with "not difficult at all."

Reference period: Past 2 weeks. Response options for items 1-9: 0 = Not at all, 1 = Several days, 2 = More than half the days, 3 = Nearly every day. Total score range: 0-27.

3. Scoring algorithm

The scoring algorithm follows Kroenke 2001 directly. Pseudocode for the implementation:

// Inputs: responses[1..9] each in {0,1,2,3}; functional in {0,1,2,3,null}

// Total score
total = responses[1] + responses[2] + ... + responses[9]

// Sub-dimension scores (author choice — see §7)
cognitive  = responses[2] + responses[6] + responses[7]      // max 9
somatic    = responses[3] + responses[4] + responses[5] + responses[8]  // max 12
anhedonic  = responses[1]                                    // max 3
suicidality = responses[9]                                   // singleton, not in profile

// Band assignment
if total <= 4:        band = 'minimal'
elif total <= 9:      band = 'mild'
elif total <= 14:     band = 'moderate'
elif total <= 19:     band = 'mod-severe'
else:                  band = 'severe'

// Item-9 crisis modal — triggered immediately on any non-zero response
on item_9_change(value):
    if value > 0:
        open_crisis_modal(tier = (2 if value >= 2 else 1))

// Functional impairment override
if functional == 3 and total >= 5:
    show_functional_override_note()

// Severe-band clinician prompt
if total >= 20:
    show_severe_clinician_prompt()

The implementation is straightforward: integer addition with no weights, no missing-data imputation (the user must answer all 9 items to enable the submit button), and clear band cutoffs.
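The pseudocode above can be sketched as runnable Python. This is a minimal illustration rather than the tool's production code: the function name `score_screener` and the `ScreenerResult` container are assumptions introduced here, and responses are passed as a 0-indexed list, so item 1 is `responses[0]`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenerResult:  # hypothetical container, not part of the original tool
    total: int
    cognitive: int
    somatic: int
    anhedonic: int
    suicidality: int
    band: str


def score_screener(responses: List[int]) -> ScreenerResult:
    """Score nine item responses (each 0-3); item 1 is responses[0]."""
    if len(responses) != 9:
        raise ValueError("all 9 items must be answered")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")

    total = sum(responses)  # unweighted integer sum, range 0-27

    # Band cutpoints: 0-4 / 5-9 / 10-14 / 15-19 / 20-27
    if total <= 4:
        band = "minimal"
    elif total <= 9:
        band = "mild"
    elif total <= 14:
        band = "moderate"
    elif total <= 19:
        band = "mod-severe"
    else:
        band = "severe"

    return ScreenerResult(
        total=total,
        cognitive=responses[1] + responses[5] + responses[6],  # items 2, 6, 7
        somatic=responses[2] + responses[3] + responses[4] + responses[7],  # items 3, 4, 5, 8
        anhedonic=responses[0],  # item 1
        suicidality=responses[8],  # item 9, reported as a singleton
        band=band,
    )


result = score_screener([2, 2, 3, 3, 2, 2, 2, 0, 0])
print(result.total, result.band)
```

The sample input yields a total of 16 with sub-scores of 6 (cognitive), 8 (somatic), and 2 (anhedonic). Rejecting incomplete input mirrors the rule that all 9 items must be answered before submission.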

4. Five severity bands

The 5-band structure comes directly from Kroenke 2001 and is used by NHS IAPT, VA/DoD, and APA. We do not modify the cutpoints.

Band	Score	Label	Clinical interpretation
1	0-4	Minimal depression	Symptoms unlikely to be clinically significant. Most adults score here.
2	5-9	Mild depression	Some symptoms, subthreshold for major depressive episode (MDE). Watchful waiting + self-care typical.
3	10-14	Moderate depression	Probable MDE per the standard ≥10 cutoff (sensitivity 88%, specificity 88%). Active treatment warranted.
4	15-19	Moderately severe depression	Active treatment with therapy and/or medication usually appropriate.
5	20-27	Severe depression	High symptom burden. Combined treatment (therapy + medication) often warranted. Professional consultation strongly recommended.

5. The cutoff debate: ≥10 vs ≥8

The standard cutoff of ≥10 has been validated repeatedly. Manea, Gilbody & McMillan's 2012 meta-analysis of 18 studies confirmed the ≥10 cutoff has the best balance of sensitivity (0.78-0.88) and specificity (0.85-0.94) across primary care populations.

Cutoff	Sensitivity	Specificity	Trade-off
≥8	0.93-0.96	0.71-0.78	Maximizes detection at cost of false positives
≥10 (standard)	0.78-0.88	0.85-0.94	Best balance per Manea 2012
≥12	0.65-0.75	0.92-0.97	Conservative — minimizes false positives at cost of missed cases

Some research contexts use ≥8 to maximize sensitivity in screening protocols where missing a case is more costly than over-flagging (e.g., perinatal depression screening). For most clinical and self-screen contexts, ≥10 remains the most-cited and clinically established cutpoint, and is what this tool uses for the moderate-band threshold.
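To make the trade-off in the table concrete, the sketch below converts each cutoff's sensitivity and specificity into expected missed cases and false positives per 1,000 people screened. The 10% prevalence and the use of range midpoints are illustrative assumptions chosen for round numbers, not figures from Manea 2012.

```python
# Midpoints of the sensitivity/specificity ranges in the table above
# (taking midpoints is an illustrative simplification).
CUTOFFS = {
    ">=8": (0.945, 0.745),
    ">=10": (0.83, 0.895),
    ">=12": (0.70, 0.945),
}

ASSUMED_PREVALENCE = 0.10  # hypothetical base rate, not from the source


def per_thousand(sensitivity: float, specificity: float, prevalence: float):
    """Return (missed cases, false positives) per 1,000 people screened."""
    cases = 1000 * prevalence
    non_cases = 1000 - cases
    missed = cases * (1 - sensitivity)
    false_positives = non_cases * (1 - specificity)
    return round(missed, 1), round(false_positives, 1)


for label, (sens, spec) in CUTOFFS.items():
    missed, fp = per_thousand(sens, spec, ASSUMED_PREVALENCE)
    print(f"{label}: about {missed} missed cases, {fp} false positives per 1,000")
```

Under these assumptions, lowering the cutoff from ≥10 to ≥8 recovers roughly a dozen missed cases per 1,000 at the price of well over a hundred additional false positives, which is why the lower cutoff is reserved for settings where a missed case is especially costly.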

7. Sub-dimension symptom profile

Author choice: The sub-dimension symptom profile is an interpretive framework added by LifeByLogic. The original instrument is largely unidimensional in factor analysis. This is documented transparently rather than presented as a validated subscale.

Three dimensions (3/4/1 split)

Dimension	Items	Max score	What it captures
Cognitive-emotional	2, 6, 7	9	Depressed mood, worthlessness/guilt, concentration
Physical / somatic	3, 4, 5, 8	12	Sleep, fatigue, appetite, psychomotor
Pleasure-motivation (anhedonic)	1	3	Anhedonia (loss of interest/pleasure)

Item 9 (self-harm ideation) is not included in any sub-dimension grouping. It triggers crisis escalation and is reported as a singleton in the results.

Why these three dimensions specifically

Clinical depression treatment differentiates strongly between intervention classes:

Cognitive-emotional dominant → CBT for depression, cognitive restructuring, IPT (interpersonal therapy) — strongest evidence per Cuijpers et al. 2013 meta-analysis (115 studies, large effect sizes).
Somatic dominant → behavioral activation, exercise, sleep stabilization, SSRIs/SNRIs — strongest evidence for somatic-loaded depression per Cooney et al. 2013 (exercise) and Hollon 2005 (medication vs CBT relapse prevention).
Anhedonic-motivational dominant → behavioral activation specifically, together with novel reward exposure. Brief Behavioral Activation Treatment for Depression (BATD; Lejuez 2001) is a structured 8-15 session protocol with strong RCT support.

Why this is presented as an author choice rather than a validated subscale

Factor analysis on the original instrument finds primarily a single-factor structure (Kroenke 2001, Cameron 2008). Some studies find a 2-factor solution (cognitive-affective + somatic), but no replicated 3-factor structure exists in the literature. The 3-dimension grouping in this tool is interpretive — useful for matching pattern to evidence-based intervention class, but not a validated psychometric subscale.

8. Asymmetric scaling for archetype matching

Author choice: The 1.0 / 0.75 / 3.0 scaling factors used in archetype matching are author-derived to bring all three sub-dimensions to a comparable theoretical maximum.

The three sub-dimensions have asymmetric maximum scores (cognitive-emotional max=9, somatic max=12, anhedonic max=3) because they have asymmetric item counts (3 / 4 / 1). For archetype matching to work — i.e., for any of the three dimensions to be capable of dominating when it is the leading edge — they need to be brought to comparable scale.

The scaling factors

Cognitive scaled by 1.0 (baseline, max 9)
Somatic scaled by 0.75 (= 9/12, brings max to 9)
Anhedonic scaled by 3.0 (= 9/3, brings max to 9)

Why asymmetric scaling is necessary

Without scaling, somatic dominance would be too easy to achieve (4 items at maximum total 12, against a cognitive maximum of 9), and anhedonic dominance would be nearly impossible (maximum 3 against cognitive's 9). The result would be archetype assignment biased toward "Depleted" (somatic-dominant) and against "Disconnected" (anhedonic-dominant), even when the underlying symptom pattern doesn't warrant it.

Mathematical example

User scores: cognitive=6, somatic=8, anhedonic=2 (total=16).

Without scaling: somatic 8 > cognitive 6 > anhedonic 2 → "The Depleted"
With scaling: somatic 8×0.75=6.0; cognitive 6×1.0=6.0; anhedonic 2×3.0=6.0 → tied; first-match-wins logic returns "The Inner Critic" (cognitive)

The scaling reveals when the underlying symptom pattern is genuinely balanced vs genuinely dominated by one dimension. This is documented as an author choice because no published precedent for these specific scaling factors exists; they are derived from the asymmetric maximums alone.
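The scaling step and the worked example above can be checked with a short sketch. The dictionary layout and the function name `scale_subscores` are illustrative assumptions; only the factors and maximums come from this section.

```python
# Scaling factors from this section: each brings its sub-dimension's max to 9.
SCALE = {"cognitive": 1.0, "somatic": 0.75, "anhedonic": 3.0}
RAW_MAX = {"cognitive": 9, "somatic": 12, "anhedonic": 3}


def scale_subscores(raw: dict) -> dict:
    """Multiply each raw sub-dimension score by its scaling factor."""
    return {name: raw[name] * SCALE[name] for name in SCALE}


# Every dimension reaches the same ceiling after scaling.
assert all(RAW_MAX[name] * SCALE[name] == 9.0 for name in SCALE)

example = {"cognitive": 6, "somatic": 8, "anhedonic": 2}
print(scale_subscores(example))
```

For the example user, all three scaled values come out at 6.0, confirming the tie described above.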

9. Five-archetype framework

Author choice: The 5 archetypes are LBL-derived interpretive frameworks, not a published clinical typology.

Five archetypes match symptom profile to evidence-based intervention pathways. Order matters — first match wins, more specific archetypes tested first.

Archetype	Trigger condition	Primary intervention class	Key citations
The Steady	total < 5	Maintenance practices, watchful waiting	—
The Inner Critic	cognitive_scaled ≥ somatic_scaled AND ≥ anhedonic_scaled	CBT for depression, cognitive restructuring, MBCT, IPT	Cuijpers 2013, Kuyken 2016
The Depleted	somatic_scaled > cognitive_scaled AND ≥ anhedonic_scaled	Behavioral activation, exercise, sleep stabilization, SSRIs/SNRIs	Cooney 2013, Hollon 2005
The Disconnected	anhedonic_scaled ≥ cognitive_scaled AND ≥ somatic_scaled	Behavioral activation, BATD, novel reward exposure	Dimidjian 2006, Ekers 2014, Lejuez 2001
The Pervasive	(default) — multidimensional, no single dimension dominates	Combined therapy + medication, MBCT, integrated CBT-D	Hollon 2005, Kuyken 2016
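A literal, first-match-wins reading of the trigger table can be sketched as follows. The function name `match_archetype` is hypothetical, and the sketch encodes only the conditions as tabulated. One consequence of that literal reading is worth noting: whichever scaled score is largest always satisfies one of the three dominance rows, so the default row is not reached here. The tool may apply an additional rule for "no single dimension dominates" that this page does not specify.

```python
def match_archetype(total: int, cognitive: int, somatic: int, anhedonic: int) -> str:
    """First-match-wins reading of the trigger table, taking raw sub-scores."""
    c = cognitive * 1.0
    s = somatic * 0.75
    a = anhedonic * 3.0

    if total < 5:
        return "The Steady"
    if c >= s and c >= a:
        return "The Inner Critic"
    if s > c and s >= a:
        return "The Depleted"
    if a >= c and a >= s:
        return "The Disconnected"
    # Default row. With the three conditions exactly as tabulated, one of
    # them always holds, so this line is not reached in this sketch.
    return "The Pervasive"


print(match_archetype(total=16, cognitive=6, somatic=8, anhedonic=2))
```

For the tied example from §8 this returns "The Inner Critic", because the cognitive condition uses ≥ and is tested first.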

Pathway count per archetype

Each archetype carries 5-7 evidence-based pathway recommendations:

The Steady: 5 pathways (maintenance practices, re-screen, strengthen what's working, read about risk factors, build support map)
The Inner Critic: 7 pathways (CBT-D, cognitive restructuring, MBCT, IPT, compassion-focused therapy, internet-delivered CBT, defer to clinician at moderate+)
The Depleted: 7 pathways (behavioral activation, cardiovascular exercise, sleep stabilization, SSRI/SNRI consultation, light therapy, nutritional review, caffeine/alcohol audit)
The Disconnected: 6 pathways (BATD specifically, activity scheduling with reward monitoring, novel reward exposure, social re-engagement, mindfulness for noticing pleasure, prescriber consultation re: emerging treatments)
The Pervasive: 6 pathways (combined therapy + medication, MBCT, integrated CBT-D, therapist consultation, intensive outpatient programs, Unified Protocol)

Total: 31 evidence-cited pathways across the 5 archetypes.

10. Diagnostic probability per band

Author choice: The approximate confirmed-MDE rates per band (1/10/50/75/90 in 100) are derived from the Kroenke 2001 ROC data and rounded for intuitive interpretation.

Band	Score range	Approximate confirmed-MDE rate
Minimal	0-4	≈ 1 in 100
Mild	5-9	≈ 10 in 100
Moderate	10-14	≈ 50 in 100
Moderately severe	15-19	≈ 75 in 100
Severe	20-27	≈ 90 in 100

Derivation

Values derived from the Kroenke 2001 ROC analysis (sensitivity 88%, specificity 88% at ≥10 cutoff in n=580 with structured diagnostic interview). The approximate per-band rates assume a base rate of MDE consistent with the original primary-care validation sample (12-month prevalence ~5-9% per Kroenke 2007). They are explicitly labeled "approximate, derived from Kroenke 2001" on the tool page.

Population-dependence caveat

These probability values depend on the base rate of MDE in the user's population. In primary care (~5-9% 12-month prevalence per Kroenke 2007), they are roughly accurate. In psychiatric outpatient samples (much higher base rate, typically 50-70% per Beard 2016), the per-band rates would be higher. In low-base-rate populations (e.g., asymptomatic community samples), they would be lower. The values serve as an intuitive interpretation aid, not a precise probabilistic statement applicable to every user.
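The base-rate dependence can be illustrated with Bayes' rule. The sketch below computes the overall predictive value of a positive screen (score ≥10), which is a coarser quantity than the per-band rates in the table; it is meant only to show the direction and size of the base-rate effect. The two base rates are illustrative points within the ranges quoted above, and the function name is an assumption.

```python
def positive_predictive_value(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(MDE | score >= cutoff) via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


SENS, SPEC = 0.88, 0.88  # Kroenke 2001 at the >=10 cutoff

# Illustrative base rates: one inside the 5-9% primary-care range,
# one inside the 50-70% psychiatric-outpatient range.
for label, base_rate in [("primary care", 0.07), ("psychiatric outpatient", 0.60)]:
    ppv = positive_predictive_value(SENS, SPEC, base_rate)
    print(f"{label}: base rate {base_rate:.0%} -> PPV {ppv:.0%}")
```

With identical sensitivity and specificity, the predictive value of a positive screen rises from roughly a third at a 7% base rate to above 90% at a 60% base rate, which is the effect the caveat describes.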

11. Population norms

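The tool compares the user's score against two reference populations, the general population and primary care. The table also lists the sex-stratified general-population norms and a psychiatric-outpatient sample: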

Population	Sample	Mean	SD	Source
General population (Germany)	n = 5,018	2.91	3.52	Kocalevent et al. 2013
Female (Germany)	n ≈ 2,608	3.13	3.61	Kocalevent et al. 2013
Male (Germany)	n ≈ 2,410	2.66	3.41	Kocalevent et al. 2013
Primary care (US)	n = 6,000	3.3	3.8	Kroenke et al. 2001
Psychiatric outpatient	n = 502	13.8	6.5	Beard 2016

The general-population norm (Kocalevent 2013) anchors comparison against people not seeking help; the primary-care norm (Kroenke 2001) anchors against people who consulted a clinician about symptoms. Both are reported on the tool page with z-scores and percentiles computed via standard normal CDF.
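The z-score and percentile computation can be sketched with the standard library alone. The score of 10 is a hypothetical input and the function names are assumptions. Because both means sit less than one SD above the floor of 0, the underlying distributions are right-skewed, so a normal-CDF percentile should be read as an approximation.

```python
import math

# (mean, SD) pairs from the norms table above.
NORMS = {
    "general population (Kocalevent 2013)": (2.91, 3.52),
    "primary care (Kroenke 2001)": (3.3, 3.8),
}


def z_score(score: float, mean: float, sd: float) -> float:
    """Distance of a score from the reference mean, in SD units."""
    return (score - mean) / sd


def normal_percentile(z: float) -> float:
    """Standard normal CDF, expressed as a percentile (0-100)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


user_total = 10  # hypothetical user score
for label, (mean, sd) in NORMS.items():
    z = z_score(user_total, mean, sd)
    print(f"{label}: z = {z:.2f}, percentile = {normal_percentile(z):.1f}")
```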

12. Documented author choices summary

Six components of this tool are not directly derivable from published literature and are documented here as transparent author choices:

Component	Author choice	Rationale
Sub-dimension item assignments	3 cognitive-emotional / 4 somatic / 1 anhedonic	Maps items to evidence-based intervention classes; original instrument is unidimensional in factor analysis
Asymmetric scaling factors	1.0 / 0.75 / 3.0	Brings sub-dimensions to comparable theoretical maximum (9) so any can dominate archetype matching
Approximate dx-prob values per band	1 / 10 / 50 / 75 / 90 in 100	Derived from Kroenke 2001 ROC data, rounded for intuitive interpretation; population-dependent
5-archetype framework	Steady / Inner Critic / Depleted / Disconnected / Pervasive	LBL-derived interpretive labels matched to evidence-based intervention classes; not a published clinical typology
Item 9 modal threshold	Triggered at item 9 ≥ 1 (any non-zero)	Clinical convention strongly favors over-flagging suicidality; cost of over-flagging is low (resources displayed), cost of under-flagging is potentially severe
Tier 2 escalation threshold	Item 9 ≥ 2 (more than half the days)	Distinguishes occasional ideation from sustained ideation; aligns with research distinguishing passive vs active ideation

13. Validation evidence

Internal consistency

Cronbach's α = 0.89 in the original Kroenke 2001 validation (n=580). Hinz et al. 2017 confirmed α = 0.88 in a German general-population sample (n=5,018). This is solid internal consistency for a 9-item instrument.

Test-retest reliability

Test-retest correlation r = 0.84 in Kroenke 2001 (n=300, ~48 hours between administrations).

Diagnostic accuracy at ≥10 cutoff

Study	Population	Sensitivity	Specificity	NPV
Kroenke 2001	Primary care (n=580)	0.88	0.88	≈ 0.99
Manea 2012 (meta-analysis)	18 primary care studies	0.78-0.88	0.85-0.94	≈ 0.96-0.99
Beard 2016	Psychiatric outpatient (n=502)	0.85	0.78	≈ 0.92

Convergent validity

The instrument correlates strongly with the Beck Depression Inventory (r = 0.73 in Kroenke 2001), the Hamilton Depression Rating Scale (r = 0.79 in Cameron 2008), and clinician-rated severity (r = 0.59 in Kroenke 2001). Its correlation with anxiety is high, as expected (GAD-7 r ≈ 0.6), reflecting genuine comorbidity between two distinct constructs.

Cross-cultural validity

The instrument has been validated in dozens of cross-cultural studies (English, German, Spanish, Mandarin Chinese, Japanese, Arabic, Brazilian Portuguese, French, others). Cultural expression of depression varies, and somatic-vs-cognitive emphasis differs across cultures, but the instrument's basic psychometric properties hold across translations.

14. Functional impairment override

The 10th item (functional difficulty) is captured but not added to the score. It influences the results in two ways:

Functional override note

If functional difficulty = "extremely difficult" (value 3) AND total score ≥ 5 (any band above minimal), the results panel surfaces a contextual note:

"You indicated these symptoms have been extremely difficult to live with. That impairment level — regardless of the precise band score — warrants a conversation with a clinician. Difficulty functioning is itself a clinical signal, separate from the symptom-frequency score."

Rationale

Functional impairment is a separate clinical signal from symptom frequency. DSM-5 MDE criteria require "clinically significant distress or functional impairment" — not just symptom presence. A user with mild symptom-frequency but extreme functional difficulty is in a clinically more concerning state than a user with the same symptom score and minimal functional difficulty. The override note surfaces this without modifying the underlying severity-band assignment.

15. Care-aware infrastructure

Depression carries higher absolute mortality risk than anxiety, primarily through suicide. The care-aware infrastructure is consequently more prominent than in the LBL Anxiety Test:

Always-on (independent of any score)

Persistent crisis bar at the top of every page (tool, methodology, glossary). Visible from the moment the page loads. Provides 988 / 741741 / "More resources" links. Cannot be dismissed.

Score-triggered

Care-aware top block at score ≥ 10 — surfaces above the results when moderate or higher
Care-aware bottom block at score ≥ 10 — surfaces below the pathways
Severe-band clinician prompt at score ≥ 20 (a different threshold from the Anxiety Test's ≥ 15 prompt because this screener has 5 bands rather than 4)

Item-9 specific (the key escalation path)

Crisis modal triggered immediately when item 9 ≥ 1. Modal cannot be dismissed without explicit acknowledgment.
Modal copy escalation at item 9 ≥ 2 (tier-2 emergency callout moves to top)
Item-9 readout in results panel with red-bordered styling when non-zero

Functional impairment override

Surfaces at functional = "extremely difficult" AND total ≥ 5, regardless of band
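The thresholds in this section can be collected into one function for review or testing. This is a sketch under stated assumptions: the returned labels are hypothetical identifiers, the always-on crisis bar is omitted because it does not depend on any input, and the timing difference (the crisis modal fires as soon as item 9 is answered, the other elements appear with the results) is collapsed into a single call.

```python
from typing import List, Optional


def care_triggers(total: int, item9: int, functional: Optional[int]) -> List[str]:
    """List the care-aware elements the thresholds above would surface.

    Labels are hypothetical identifiers used only in this sketch.
    """
    triggered = []
    if item9 >= 1:
        # Tier 2 copy when ideation is reported on more than half the days.
        triggered.append("crisis_modal_tier_2" if item9 >= 2 else "crisis_modal_tier_1")
        triggered.append("item9_readout_highlighted")
    if total >= 10:
        triggered += ["care_block_top", "care_block_bottom"]
    if total >= 20:
        triggered.append("severe_clinician_prompt")
    if functional == 3 and total >= 5:
        triggered.append("functional_override_note")
    return triggered


print(care_triggers(total=12, item9=1, functional=3))
```

Keeping the thresholds in one place makes it easy to confirm, for instance, that a minimal-band score with no item 9 response surfaces nothing beyond the persistent crisis bar.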

16. Privacy & data handling

Completely browser-local. Specifically:

The 9 item responses, the functional impairment response, sub-dimension scores, archetype, and any optional inputs (such as sex for sex-stratified norms) never leave the device
No item-level data is transmitted to any server; the only outbound data are the aggregate analytics events described below
Nothing is stored in localStorage, sessionStorage, cookies, or any other persistence mechanism
Closing the tab clears the session entirely
The tool uses Google Analytics 4 in a privacy-respecting manner for aggregate page-level metrics only — never individual responses. IP anonymization is enabled.
Two kinds of GA event are fired, neither containing PII: tool_complete (with band label and total score for population-level analytics), and the crisis-modal events crisis_modal_shown and crisis_modal_self_initiated (for understanding crisis-pathway usage rates)
If the user copies their share text using the "copy summary" button, item 9 details are deliberately omitted from the shareable text for privacy

17. Limitations

Snapshot, not trajectory. The instrument measures last-2-weeks symptoms. Depression naturally fluctuates with life events. A high score during a difficult period doesn't necessarily mean MDE; a low score during a calm period doesn't mean you've never had a depression problem.
Self-report dependent. Honest self-report is the foundation. Strong self-criticism, alexithymia, denial, or recall bias can all distort scores in either direction.
Not diagnostic. MDE diagnosis requires DSM-5 or ICD-11 clinical interview by a qualified professional. The screen is sensitive (88% per Kroenke 2001) but not diagnostic.
Cultural variation. The instrument was developed in US/European primary care. Validation in non-Western populations exists but cultural expression of depression varies.
Differential diagnosis ignored. A high score may reflect bipolar depression rather than unipolar, persistent depressive disorder rather than MDE, adjustment disorder, or grief. The screen flags "depression symptoms present" not "the source or kind of depression."
Sub-dimension scoring is interpretive, not validated. The 3-dimension profile is an author-derived interpretive aid. Use as a conversation starter, not a clinical typology.
Archetypes are interpretive frameworks. The 5 archetypes are LBL author-derived. They are not a published clinical typology.
Functional impairment item is captured but not summed. The 10th question informs interpretation but does not change the 0-27 score.
Validation in adolescents and children is separate. The instrument has separate adolescent validation (PHQ-A, n=403, Richardson 2010) but the cutoffs differ. This tool is not validated for users under 18.
Documented author choices. Six components are author-derived (sub-dimension assignments, asymmetric scaling factors, approximate dx-prob values per band, the 5-archetype framework, the item 9 modal threshold, and the tier 2 escalation threshold). All are documented in §12.

18. Frequently asked questions

Why use the 9-item Kroenke 2001 instrument and not something longer like the Beck Depression Inventory?

The 9-item depression screener developed by Kroenke, Spitzer & Williams in 2001 is purpose-built for primary-care screening contexts where time is limited. Its brevity (2-3 minutes), strong validation (Cronbach α=0.89; sensitivity 88%, specificity 88% at cutoff ≥10 per the original validation), and direct mapping to all 9 DSM major depressive episode criteria make it the most widely-used brief depression screener in the world, with over 100,000 citations. The Beck Depression Inventory has 21 items; the Hamilton Depression Rating Scale is clinician-administered. Both have their place, but neither is appropriate for a 2-3 minute self-screen.

How were the severity bands chosen?

Directly from Kroenke et al. 2001: 0-4 minimal, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe. These cutpoints are used by NHS IAPT (Improving Access to Psychological Therapies), VA/DoD clinical practice guidelines, and the APA major depressive disorder treatment guidelines. The standard ≥10 cutoff was confirmed by Manea, Gilbody & McMillan's 2012 meta-analysis of 18 studies as having the best sensitivity/specificity balance across primary care populations (sensitivity 0.78-0.88, specificity 0.85-0.94).

Why is the sub-dimension symptom profile considered an author choice?

The original 9-item screener is largely unidimensional in factor analysis (Kroenke 2001 confirmed single-factor structure; Cameron 2008 replicated). Splitting items into cognitive-emotional, physical/somatic, and pleasure-motivation dimensions is an interpretive aid added by LifeByLogic to surface action-relevant patterns. It is presented transparently as an author choice rather than a validated subscale, and the methodology section documents the item assignments and asymmetric scaling factors used.

Why does item 9 trigger an immediate crisis modal that cannot be dismissed casually?

Item 9 captures self-harm ideation. Mann et al. 2005's systematic review of suicide prevention strategies concluded that means restriction and direct connection to crisis resources are among the most evidence-supported interventions. Surfacing crisis resources immediately upon any non-zero response on item 9, and requiring explicit acknowledgment to dismiss them, is deliberate friction designed to ensure the user encounters the resources, not to obstruct their use of the tool. The modal does not end the session; the user can continue to results after acknowledging.

How were the 5 archetypes derived?

The 5 archetypes (The Steady, The Inner Critic, The Depleted, The Disconnected, The Pervasive) are LBL author-derived interpretive frameworks that match symptom profile to evidence-based intervention classes. Each archetype maps to interventions with strong meta-analytic or RCT support — CBT for cognitive-emotional dominance (Cuijpers 2013), behavioral activation for anhedonic dominance (Dimidjian 2006, Ekers 2014), exercise/sleep stabilization for somatic dominance (Cooney 2013). Archetypes are explicitly named as interpretive frameworks, not a published clinical typology.

Why are the diagnostic probability values labeled 'approximate'?

The values (1/10/50/75/90 in 100 across the 5 bands) are derived from the Kroenke 2001 ROC data showing 88% sensitivity and 88% specificity at ≥10 cutoff in a primary care sample of n=580. They are population-dependent — the actual confirmed-MDE rate in any individual user's case depends on the base rate of major depression in their context. They serve as an intuitive interpretation aid, not a precise probabilistic statement, and the methodology page documents the derivation transparently.

Why include a functional impairment item if it's not added to the score?

The 10th item asking 'how difficult have these problems made it' is part of the original Kroenke 2001 instrument and is captured for the same reason: functional impairment is a separate clinical signal from symptom frequency. A user with moderate symptom-frequency but extreme functional difficulty may need more urgent follow-up than a user with the same symptom score and minimal functional difficulty. This tool surfaces a 'functional override' note in the results when functional difficulty is reported as 'extremely difficult' alongside any symptom score above minimal.

How does this tool handle privacy?

Completely browser-local. The 9 responses, the functional impairment response, archetype, and any optional inputs (such as sex for sex-stratified norms) never leave the device. Nothing is stored or logged, and closing the tab clears the session. The tool uses Google Analytics 4 for aggregate metrics only: the completion event carries the band label and total score, never individual item responses.

What is this tool not for?

Not a diagnosis. Not a treatment plan. Not a substitute for clinical judgment. Not appropriate for users currently in psychiatric crisis (the crisis modal will surface the relevant resources immediately, but anyone in active crisis should call 988 or local emergency services). Not validated for use under age 18 (the original instrument was validated in adult primary care). Not a longitudinal tracking tool — this is a self-screen at a single moment in time.

How does this differ from the LBL Anxiety Test?

The Anxiety Test uses the 7-item GAD-7 (Spitzer 2006) and produces a 0-21 score with 4 severity bands. The Depression Test uses the 9-item depression screener (Kroenke 2001) and produces a 0-27 score with 5 severity bands. The Depression Test adds an item-9 hard-escalation crisis modal because of the self-harm ideation item, which has no parallel in the GAD-7. Anxiety and depression are highly comorbid (40-60% overlap in clinical samples per Kroenke 2007); both screens together give a fuller picture than either alone.

19. References & citations

Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613. doi.org/10.1046/j.1525-1497.2001.016009606.x
Manea, L., Gilbody, S., & McMillan, D. (2012). Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. CMAJ, 184(3), E191–E196. doi.org/10.1503/cmaj.110829
Kocalevent, R. D., Hinz, A., & Brähler, E. (2013). Standardization of the depression screener Patient Health Questionnaire (PHQ-9) in the general population. General Hospital Psychiatry, 35(5), 551–555. doi.org/10.1016/j.genhosppsych.2013.04.006
Beard, C., Hsu, K. J., Rifkin, L. S., Busch, A. B., & Björgvinsson, T. (2016). Validation of the PHQ-9 in a psychiatric sample. Journal of Affective Disorders, 193, 267–273. doi.org/10.1016/j.jad.2015.12.075
Hasin, D. S., Sarvet, A. L., Meyers, J. L., et al. (2018). Epidemiology of adult DSM-5 major depressive disorder and its specifiers in the United States. JAMA Psychiatry, 75(4), 336–346. doi.org/10.1001/jamapsychiatry.2017.4602
Cuijpers, P., Berking, M., Andersson, G., Quigley, L., Kleiboer, A., & Dobson, K. S. (2013). A meta-analysis of cognitive-behavioural therapy for adult depression. Canadian Journal of Psychiatry, 58(7), 376–385. doi.org/10.1177/070674371305800702
Cooney, G. M., Dwan, K., Greig, C. A., et al. (2013). Exercise for depression. Cochrane Database of Systematic Reviews, (9), CD004366. doi.org/10.1002/14651858.CD004366.pub6
Dimidjian, S., Hollon, S. D., Dobson, K. S., et al. (2006). Randomized trial of behavioral activation, cognitive therapy, and antidepressant medication in the acute treatment of adults with major depression. Journal of Consulting and Clinical Psychology, 74(4), 658–670. doi.org/10.1037/0022-006X.74.4.658
Ekers, D., Webster, L., Van Straten, A., Cuijpers, P., Richards, D., & Gilbody, S. (2014). Behavioural activation for depression: an update of meta-analysis. PLOS ONE, 9(6), e100100. doi.org/10.1371/journal.pone.0100100
Hollon, S. D., DeRubeis, R. J., Shelton, R. C., et al. (2005). Prevention of relapse following cognitive therapy vs medications. Archives of General Psychiatry, 62(4), 417–422. doi.org/10.1001/archpsyc.62.4.417
Kuyken, W., Warren, F. C., Taylor, R. S., et al. (2016). Efficacy of mindfulness-based cognitive therapy in prevention of depressive relapse. JAMA Psychiatry, 73(6), 565–574. doi.org/10.1001/jamapsychiatry.2016.0076
Mann, J. J., Apter, A., Bertolote, J., et al. (2005). Suicide prevention strategies: a systematic review. JAMA, 294(16), 2064–2074. doi.org/10.1001/jama.294.16.2064
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th ed.). DSM-5. APA Publishing.
Lejuez, C. W., Hopko, D. R., & Hopko, S. D. (2001). A brief behavioral activation treatment for depression: Treatment manual. Behavior Modification, 25(2), 255–286. doi.org/10.1177/0145445501252005
Kroenke, K., Spitzer, R. L., Williams, J. B. W., Monahan, P. O., & Löwe, B. (2007). Anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. Annals of Internal Medicine, 146(5), 317–325.

Read the LBL Depression Test tool itself, or browse the depression screener glossary entry.
