The universal GP Training website for everyone, not just Bradford. Created in 2002 by Dr Ramesh Mehay

Evidence-Based Practice & Medical Statistics - Bradford VTS
Bradford VTS · AKT Mastery Series

Because "I read something online" doesn't quite cut it in the AKT.

High-yield tips for AKT · High-impact learning in minutes · Hidden gems they forget to teach

Last updated: April 2026 · Also known as Evidence-Based Medicine (EBM)

One-Minute Recall

Scanning this before clinic, or the night before an AKT paper? These are the things that score you marks.

Risk Formulas (CER = control event rate; EER = experimental event rate; ARI = absolute risk increase)

  • ARR = CER − EER
  • NNT = 1 ÷ ARR
  • RRR = ARR ÷ CER
  • RR = EER ÷ CER
  • NNH = 1 ÷ ARI

Diagnostic Testing

  • Sens = TP ÷ (TP+FN) → SnNout
  • Spec = TN ÷ (TN+FP) → SpPin
  • PPV = TP ÷ (TP+FP) ← falls with low prevalence
  • NPV = TN ÷ (TN+FN) ← falls with high prevalence
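The four formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the 90%/90% test and the two populations are invented for the example, not taken from any real study. It shows the same test giving a very different PPV when the disease is rare.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected TP, FP, FN, TN when a test is applied to n people."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity
    tn = healthy * specificity
    return tp, healthy - tn, diseased - tp, tn


# Hypothetical test (90% sensitive, 90% specific) in two made-up populations.
common = diagnostic_metrics(*expected_counts(1000, 0.50, 0.90, 0.90))
rare = diagnostic_metrics(*expected_counts(1000, 0.01, 0.90, 0.90))

for label, m in (("prevalence 50%", common), ("prevalence 1%", rare)):
    print(f"{label}: PPV = {m['ppv']:.2f}, NPV = {m['npv']:.3f}")
```

Sensitivity and specificity stay at 90% in both populations; only the predictive values move with prevalence.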

Study Designs

  • SR/Meta-analysis → highest evidence
  • RCT → gold standard for treatment
  • Cohort → RR & incidence
  • Case-control → OR, rare diseases
  • Cross-sectional → prevalence

Graphs

  • Forest plot diamond crosses line = not significant
  • Funnel asymmetry = publication bias
  • Cates plot: NNT = 100 ÷ yellow faces
  • Box plot middle line = median
  • I² >50% = substantial heterogeneity
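Two of the bullets above are simple arithmetic, sketched here. The I² calculation uses the usual formula based on Cochran's Q, which is an assumption on my part because the text does not give a formula. The Q value and face count are invented.

```python
def i_squared(q, n_studies):
    """I-squared (%) from Cochran's Q, using I2 = max(0, (Q - df) / Q) * 100."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def nnt_from_cates(yellow_faces, total_faces=100):
    """NNT read off a Cates plot: total faces divided by the yellow faces."""
    return total_faces / yellow_faces


# Hypothetical meta-analysis: 9 studies with Q = 20.
heterogeneity = i_squared(20, 9)
print(f"I-squared = {heterogeneity:.0f}%")

# Hypothetical Cates plot with 5 yellow faces out of 100.
print(f"NNT = {nnt_from_cates(5):.0f}")
```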

Significance

  • p < 0.05 = statistically significant (by convention)
  • CI crosses 1.0 (ratio) = not significant
  • CI crosses 0 (difference) = not significant
  • Mean = average; Median = middle value
  • 68-95-99.7 rule for normal distribution
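A quick sketch of two of these rules, using only the Python standard library. The helper names and the example confidence intervals are made up for illustration.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def ci_is_significant(lower, upper, null_value):
    """True if the confidence interval excludes the 'no effect' value
    (1.0 for ratios such as RR or OR, 0 for differences such as ARR)."""
    return not (lower <= null_value <= upper)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# Hypothetical results: a relative risk CI and a mean-difference CI.
print(ci_is_significant(0.62, 0.91, null_value=1.0))  # excludes 1.0
print(ci_is_significant(-0.4, 2.1, null_value=0))     # crosses 0
```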

Bias Types

  • Selection → unrepresentative sample
  • Recall → cases remember more
  • Publication → positive studies only
  • Lead time → screening illusion
  • Attrition → dropout distorts results

Why This Matters in GP & the AKT

EBM and statistics aren't just theoretical. In your consulting room, every conversation about treatment options involves NNTs whether you name them or not. Every blood test has a sensitivity and specificity. Every new guideline is based on a study design that affects how much trust you should place in it.

In the AKT, this topic accounts for a significant proportion of marks: roughly 10–15% of the paper according to RCGP guidance. It is one of the few areas where a small amount of targeted revision pays dividends immediately. Many candidates lose easy marks here not because the concepts are difficult, but because they've never sat down and learned them systematically.

The statistics questions in the AKT often present a table of trial data and ask you to calculate a value, interpret a graph, or identify the best study design. They reward methodical thinking, not medical knowledge. This makes them the most "learnable" marks in the paper.

Once you know how to calculate NNT, interpret a forest plot, and understand the effect of prevalence on PPV, a whole category of AKT questions becomes straightforward rather than scary.

Evidence-Based Medicine (EBM)

"When I was in training in the mid-1980s, I gave an intravenous infusion of lidocaine to every patient who came through the door after a heart attack. That was the standard. Everyone did it. It seemed to make perfect sense."

Professor Gordon Guyatt, the physician who coined the term "Evidence-Based Medicine", describing his own training before EBM existed

He later discovered that the practice he'd been trained in (and that hundreds of thousands of doctors worldwide were performing) was not only useless, but potentially killing people. Not through negligence. Not through incompetence. But because no one had ever properly tested whether it actually worked.

That story is why Evidence-Based Medicine exists, and why it matters deeply to every patient you will ever see.

What Is Evidence-Based Medicine?

"The conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients."

David Sackett, BMJ 1996: the most widely cited definition in medicine

In plain English: rather than treating patients based on habit, opinion, tradition, or what your professor told you, EBM requires you to base clinical decisions on the best available research, rigorously conducted, critically appraised, and honestly interpreted.

It does not replace clinical judgement; it informs it. EBM rests on three inseparable pillars that must work together:

  • Best Research Evidence: clinically relevant, high-quality research, especially from RCTs and systematic reviews
  • Clinical Expertise: the skill and experience of the individual clinician, built through years of practice
  • Patient Values & Preferences: the individual patient's circumstances, values, and what matters most to them

How Did EBM Come About? A Brief History

EBM didn't appear from nowhere. It was the culmination of decades of quietly revolutionary thinking in Canada and the UK.

Year | Event
1938 | John Paul (Yale) coins the term "clinical epidemiology": the idea that medicine should be studied scientifically in populations, not just observed in individual patients
1967 | McMaster University (Hamilton, Canada) opens its new medical school with a Department of Clinical Epidemiology and Biostatistics, radical at the time and dedicated to applying research methods to clinical decisions
1972 | Archie Cochrane, a Scottish epidemiologist, publishes Effectiveness and Efficiency: Random Reflections on Health Services, a landmark text arguing that medicine must test its own treatments rigorously. His work eventually gives birth to the Cochrane Collaboration, named in his honour.
1981 | David Sackett and colleagues at McMaster publish a nine-article series in the Canadian Medical Association Journal teaching clinicians how to critically appraise medical literature. This is the formal beginning of the EBM movement.
1990 | Gordon Guyatt, a young resident director at McMaster, designs a new teaching programme and initially calls it "Scientific Medicine." Colleagues recoil: the implication that current practice isn't scientific is too direct.
1991 | Guyatt renames the approach "Evidence-Based Medicine" and publishes the term in an editorial in the ACP Journal Club. The phrase sticks immediately.
1992 | The landmark JAMA paper, "Evidence-Based Medicine: A New Approach to Teaching the Practice of Medicine", introduces EBM to the world. The response, Guyatt recalls, was initially "rage." Colleagues felt they were being told they weren't good doctors.
1993 | The Cochrane Collaboration is formally founded: an international network to produce and disseminate systematic reviews of healthcare evidence.
1996 | David Sackett publishes the definitive three-pillar definition in the BMJ. EBM becomes mainstream.
2000s–present | EBM becomes embedded in UK training: NICE guidelines, the GMC's Good Medical Practice, and the RCGP curriculum all require it. It is the foundation of how every UK doctor is now trained and assessed.

What Was Medicine Like Before EBM? The Problem It Solved

Before EBM, medicine ran on what Gordon Guyatt memorably called "GOBSAT" (Good Old Boys Sitting Around a Table). Clinical guidelines were written by senior experts who pooled their personal opinions, and what happened to your patient depended entirely on which doctor happened to see them.

The Pre-EBM World
  • Eminence-based medicine: You treated patients the way your professor did. Authority came from seniority, not evidence. A consultant who had "always done it this way" for 30 years was deferred to, even if "this way" had never been tested.
  • Intuition-based medicine: If a treatment seemed to make physiological sense, it was used. If suppressing abnormal heart rhythms seemed logical, you suppressed them. Whether it actually helped patients was rarely tested.
  • Anecdote-based medicine: "In my experience, I've found that..." was the standard of evidence. Individual cases drove practice, even when those cases were statistical outliers.
  • Enormous variation: The same patient presenting to two different hospitals, or even two different doctors in the same hospital, might receive completely different treatment for exactly the same condition.

Would you want a bridge built based on "I've been designing bridges this way for 30 years and none have fallen down yet"? Or based on tested structural engineering science? The answer is obvious. But for most of medicine's history, the "none have fallen down yet" approach was exactly how treatments were chosen.

The Example That Changed Medicine: The CAST Trial

This is not a hypothetical. It is one of the most important true stories in modern medicine, and one of the strongest arguments for EBM ever made.

The Setup: The Logic That Seemed Unassailable

Heart attacks cause dangerous heart rhythm abnormalities (ventricular arrhythmias). Ventricular arrhythmias cause sudden death. Therefore: suppress the arrhythmias → prevent sudden death. This seemed so obviously right that from the 1970s onwards, antiarrhythmic drugs (particularly lidocaine, flecainide, and encainide) were routinely given to post-MI patients in hospitals across the world. Not occasionally. Routinely. As standard care.

The Trial: Someone Actually Tested It

In 1987, the Cardiac Arrhythmia Suppression Trial (CAST) enrolled over 1,700 post-MI patients and randomised them to antiarrhythmic drugs (flecainide or encainide) or placebo. The drugs did exactly what they were supposed to: they successfully suppressed the arrhythmias. But something unexpected happened.

The Result: What Nobody Expected

Patients on the drugs were 2.5 times more likely to die than those on placebo. The trial had to be stopped early because the harm was so clear. The drugs had been killing the very patients they were meant to protect.

NNH = 21. For every 21 patients treated with flecainide or encainide, one additional person died who would otherwise have survived.
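To see how a "2.5 times" relative risk and an NNH of about 21 can describe the same result, here is a minimal sketch. The two event rates are illustrative values I have chosen so the arithmetic lands near the figures quoted above; they are not given in this text and should not be read as the trial's reported data.

```python
def harm_metrics(eer, cer):
    """Relative risk, absolute risk increase (ARI) and NNH from two event rates.

    eer: event (death) rate in the treated group
    cer: event (death) rate in the control group
    """
    ari = eer - cer
    return {"rr": eer / cer, "ari": ari, "nnh": 1 / ari}


# Illustrative rates only (assumed for this example): 7.7% vs 3.0%.
result = harm_metrics(eer=0.077, cer=0.030)

print(f"RR  = {result['rr']:.2f}")
print(f"ARI = {result['ari']:.3f}")
print(f"NNH = {result['nnh']:.1f}")
```

Note how the relative figure sounds dramatic while the absolute increase is a few percentage points. Both are true, and both are needed to judge the harm.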

The Lesson

Gordon Guyatt, then a young doctor in training, had personally given lidocaine infusions to every post-MI patient who came through his ward. He was following best practice. He had been taught correctly. He had good intentions. And yet, without the rigorous test of an RCT, neither he nor his colleagues had any way of knowing the treatment was harmful. This experience became central to why he dedicated his career to EBM. The history of medicine, he later said, "is full of treatments that were based mostly on guess-work and intuition rather than solid evidence."

How EBM Changed Everything: Standardisation and Unity

Before EBM, what you got depended on where you happened to live, which hospital you attended, and which doctor saw you. The same patient with the same condition might receive completely different treatments in Leeds and London. Different hospitals. Different countries. Wildly different outcomes.

EBM changed this. By anchoring clinical decisions to the same body of evidence (the same trials, the same systematic reviews, the same guidelines), it gave medicine a common language and a common standard. Today, a patient presenting with an MI in Bradford and a patient presenting with an MI in Bristol should receive essentially the same evidence-based care. Not because doctors are identical, but because the treatment is driven by the evidence, not by individual preference.

In the UK, this is operationalised through NICE guidelines, the RCGP curriculum, QOF indicators, clinical audits, and MRCGP examinations, all of which require and assess evidence-based practice. When you sit the AKT, you are being tested on your ability to apply this framework.

A World Without EBM: The International Picture

The UK's commitment to EBM (through NICE, the NHS, and postgraduate training) is not universal. In many parts of the world, what you receive as a patient still depends heavily on who sees you, where you present, and how much you can pay. Understanding this helps you appreciate what EBM protects your patients from.

Important Note Before Reading

The variation described below reflects healthcare systems and structures, not the competence or dedication of individual doctors. Many brilliant, hard-working physicians practise in every country listed. The issue is the absence of the standardising infrastructure (guidelines, oversight, training frameworks) that EBM provides. Individual doctors cannot overcome systemic problems alone.

Country / Region | How Practice Varies Without Strong EBM Frameworks
India (private sector) | A 2018 Lancet study found C-section rates of 40–58% in private hospitals compared to 10–14% in public facilities, often driven by financial incentives rather than clinical need. Over-investigation and polypharmacy are widely documented in the private sector. The same cancer patient may receive dramatically different treatment based on where they present and what they can afford.
Pakistan | Significant variation in adherence to antibiotic guidelines, with one of the highest antibiotic prescription rates in South Asia. Drug-resistant TB and antimicrobial resistance are direct consequences. Access to specialist care and standardised management pathways is highly dependent on geography and income.
Nigeria / Ghana | Magnesium sulphate is the WHO-recommended, evidence-based, inexpensive treatment for eclampsia. Studies show it reduces maternal mortality significantly. Yet availability and actual use in Nigerian and Ghanaian facilities varies enormously depending on hospital resources and clinician training, meaning whether a woman with eclampsia lives or dies may depend on which facility she reaches.
Sudan / Iraq | Prolonged conflict and instability have devastated healthcare infrastructure. In Iraq after 2003, public health services collapsed, guideline implementation stalled, and access to basic drugs became geography-dependent. In Sudan, conflict has disrupted vaccination programmes, maternal health services, and chronic disease management. Practice variation in such environments is not a matter of preference; it is a matter of what is available.
Egypt / Iran | Both countries have medical schools producing skilled physicians and have published EBM guidelines, but implementation is inconsistent between public and private sectors, and between urban and rural areas. In Iran, international sanctions have affected drug availability, forcing adaptations that diverge from evidence-based protocols.
Romania | Romania has a documented practice of plicul (the "envelope"): informal cash payments to doctors and nurses to ensure care. Officially illegal, widely practised. Parliamentary enquiries and investigative journalism have confirmed that the quality of surgical care can depend on what a patient can pay privately, regardless of their official NHS-equivalent entitlement. Brain drain has removed an estimated 14,000+ doctors since EU accession.
United States | The most expensive healthcare in the world (over $12,000 per person per year), yet outcomes are often no better than in the UK. The Dartmouth Atlas project has documented enormous geographic variation in clinical practice: a patient in Miami may receive twice as many investigations and procedures as a comparable patient in Minneapolis, with no difference in outcomes. A 2019 JAMA study estimated that up to $935 billion, roughly a quarter of all US healthcare spending, is wasted, with overtreatment and low-value care among the contributors. The opioid crisis was partly fuelled by pharmaceutical companies influencing prescribing practices outside of EBM frameworks.
France | France has excellent healthcare, but antibiotic prescribing rates have historically been among the highest in Europe, driven partly by cultural expectations that a consultation should always end with a prescription. Campaigns to reduce this ("Antibiotics are not automatic") have helped, but the pattern illustrates how cultural and commercial pressures can override evidence-based guidance even in sophisticated systems.
Italy | Healthcare quality in Northern Italy (Milan, Bologna) is among the best in Europe. In parts of Southern Italy, the picture is very different: longer waiting times for cancer surgery, less consistent application of screening programmes, lower adherence to guideline-based care. Where you live within the same country can significantly affect outcomes.
Greece / Spain | Greece's austerity crisis (2010–2015) led to healthcare spending cuts of over 25%, causing documented shortages of medicines, staff reductions, and quality deterioration. Over 35,000 healthcare workers emigrated. In Spain, significant regional variation in cancer survival rates has been documented: the tumour you develop may behave differently depending on which region you happen to live in, not because the biology differs, but because the system's application of evidence-based treatment does.

When What You Get Depends on What You Pay

In healthcare systems without robust EBM frameworks or universal entitlements, the relationship between payment and treatment quality is often direct and documented:

  • In some Indian private hospitals, a patient presenting with chest pain who can pay for private care may receive immediate catheterisation and stenting. The same patient in the public system may wait hours for an ECG.
  • In parts of Sub-Saharan Africa, whether a child with severe malaria receives artemisinin-based combination therapy (the evidence-based standard) or an older, less effective drug depends on which facility they reach and what their family can pay.
  • In countries without universal drug access, cancer chemotherapy agents may be available only to those who can pay out-of-pocket, meaning identical cancers have dramatically different outcomes based solely on income.
  • In Romania and some other Eastern European countries, the quality of a surgical procedure (the surgeon's diligence, the quality of anaesthesia monitoring, even the availability of post-operative nursing) has been documented to depend on informal payment, not clinical need.

The Analogy That Makes It Clear

Every time you board a commercial flight, you benefit from one of the most effective safety systems humans have ever built. Not because individual pilots are exceptionally talented (though they are), but because aviation is built around standardised, evidence-tested protocols. Every pilot, every airline, every country follows the same pre-flight checklists, the same landing procedures, the same emergency protocols. The system protects you regardless of which individual pilot you get.

Medicine without EBM is aviation without checklists. Your safety depends entirely on whether you happen to get a good pilot, whether that pilot trained recently enough, whether they are having a good day, and whether they've heard the latest thinking from someone they trust.

EBM replaces luck with systems. It replaces "in my experience" with "in 15,000 trials involving 2 million patients." It replaces the opinion of whoever happens to be most senior in the room with the accumulated evidence of humanity's collective clinical experience. Wouldn't you prefer that for your patients? Wouldn't your patients prefer it for themselves?

Why This Matters for You, as a GP Trainee in the UK
  • Every NICE guideline you follow is the product of systematic evidence review: someone has done the work of ensuring that what you do is supported by the best available science
  • Every AKT question on statistics and research methods is testing your ability to critically appraise evidence, to be an active consumer of EBM and not a passive follower of instructions
  • When you explain an NNT to a patient, or discuss the limitations of a screening test, or refuse to prescribe an antibiotic that isn't indicated, you are practising EBM: conscientiously, explicitly, and judiciously
  • And when a drug rep sits across from you and tells you their new medication reduces cardiovascular events by 35%, your first question ("35% relative or absolute?") is the question that EBM taught medicine to ask

1. Study Designs & Hierarchy of Evidence

Before you interpret any result, you need to know where it came from. Different study designs answer different questions, generate different statistics, and carry different levels of reliability.

The Evidence Pyramid

Strongest evidence at the top; weakest at the bottom

Systematic Reviews & Meta-Analyses
Randomised Controlled Trials (RCTs)
Cohort Studies
Case-Control Studies
Cross-Sectional Studies
Case Reports & Expert Opinion

⬆ Strongest evidence   |   ⬇ Weakest evidence

Study Design | Direction | Best For | Generates | Key Weakness
Systematic Review / Meta-Analysis | n/a | Best overall evidence on a question | Pooled effect size | Only as good as underlying studies; heterogeneity
RCT (Randomised Controlled Trial) | Forward | Does treatment X work? | RR, ARR, NNT | Expensive, artificial setting, ethical issues
Cohort Study | Forward (prospective) or backward (retrospective) | Does exposure cause outcome? Incidence? | Relative Risk (RR), Incidence | Attrition; expensive over time; confounding
Case-Control Study | Backward | Rare diseases; risk factors | Odds Ratio (OR) | Recall bias; cannot calculate incidence directly
Cross-Sectional Study | Single snapshot | How common is X right now? (prevalence) | Prevalence | Cannot establish causation; temporal ambiguity
Case Report / Expert Opinion | n/a | Hypothesis generation; rare events | Description only | Highly susceptible to bias; not generalisable
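The "Generates" column is worth seeing as arithmetic. A cohort study follows exposed and unexposed groups forward, so it can calculate risks and a relative risk. A case-control study starts from cases and controls, so it can only compare odds of exposure. The sketch below uses invented counts and hypothetical function names.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """RR from a cohort study: risk in the exposed divided by risk in the unexposed."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """OR from a case-control study: odds of exposure in cases vs odds in controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed develop the disease.
rr = relative_risk(30, 1000, 10, 1000)

# Hypothetical case-control study: 40 of 100 cases and 20 of 100 controls were exposed.
or_ = odds_ratio(40, 60, 20, 80)

print(f"Cohort RR = {rr:.2f}")
print(f"Case-control OR = {or_:.2f}")
```

The case-control function never sees a denominator for "everyone exposed", which is why incidence cannot be calculated directly from that design.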

Qualitative vs Quantitative Research
Quantitative Research

Answers "how many" or "how much." Uses numbers, statistics, and structured data. Examples: RCTs, cohort studies, surveys with numerical outcomes. Generates p-values, CIs, NNTs.

Qualitative Research

Answers "why" or "how." Uses words, themes, and interviews. Examples: focus groups, ethnographic studies, grounded theory. Explores patient experiences and beliefs.

In the AKT: a question about patient attitudes, experiences, or understanding of illness → qualitative. A question about rates, outcomes, or effectiveness → quantitative.

Systematic Review vs Meta-Analysis: Not the Same Thing

Systematic Review: A rigorous, structured literature search that identifies, selects, and critically appraises all relevant studies on a question. The result is a qualitative summary of the evidence.

Meta-Analysis: A systematic review that goes one step further: it mathematically pools the quantitative results from multiple studies into a single combined estimate. Not all systematic reviews include a meta-analysis (e.g., if studies are too heterogeneous to combine).

Think of a systematic review as reading every book about a topic and writing a report. A meta-analysis is that report with a formula at the end that averages all the authors' conclusions into one number.
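What "mathematically pools" means can be sketched with one common approach: inverse-variance, fixed-effect pooling on the log scale. The text does not say which pooling method is meant, so treat the method, the function name and the three study results below as assumptions for illustration.

```python
import math


def pool_fixed_effect(log_estimates, standard_errors):
    """Inverse-variance fixed-effect pooling.

    Each study is weighted by 1 / SE^2, so larger, more precise studies count for more.
    Returns the pooled log estimate and its standard error.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, log_estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return pooled, pooled_se


# Three hypothetical trials reporting relative risks of 0.80, 0.70 and 0.95.
log_rrs = [math.log(0.80), math.log(0.70), math.log(0.95)]
ses = [0.10, 0.20, 0.25]  # standard errors of the log RRs (invented)

pooled, se = pool_fixed_effect(log_rrs, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se

print(f"Pooled RR = {math.exp(pooled):.2f}")
print(f"95% CI = {math.exp(low):.2f} to {math.exp(high):.2f}")
```

The pooled estimate and its interval are what the diamond at the bottom of a forest plot represents. If the interval excludes 1.0, the diamond does not cross the line of no effect.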

The PICO Framework

PICO is the standard framework for structuring a clinical research question. It is used to search the literature and design studies.

Letter | Stands For | Example
P | Population / Patient | Adults with type 2 diabetes
I | Intervention | SGLT2 inhibitors
C | Comparison | Metformin alone
O | Outcome | Cardiovascular events at 5 years

The AKT may present a scenario and ask which study design best answers the PICO question. Match the outcome type to the correct design using the table above.

RCT Design Features (commonly tested)

Feature | What It Means | Why It Matters
Randomisation | Participants allocated to groups by chance | Eliminates selection bias; balances confounders
Single blind | Participants don't know their allocation | Reduces placebo effect and participant bias
Double blind | Neither participants nor investigators know allocation | Eliminates observer bias AND participant bias
Triple blind | Participants, investigators, AND data analysts blinded | Maximum bias reduction
Intention to Treat (ITT) | Analysed in their original group regardless of adherence | Preserves randomisation; reflects real-world use
Per Protocol | Analysed only if they completed the protocol | Shows biological efficacy but overestimates real-world benefit
Crossover design | Participants receive both treatments in sequence | Each person acts as their own control; needs washout period
Allocation concealment | The person recruiting participants cannot see which group the next participant will be assigned to until after they have been enrolled | Prevents the recruiter from subconsciously (or deliberately) allocating healthier patients to the treatment group, a form of selection bias that randomisation alone does not prevent
Cluster RCT | Whole groups (e.g. GP practices, wards, schools) are randomised rather than individuals | Used when individual randomisation is impractical (e.g. testing a new consultation style). Requires larger sample sizes and statistical adjustment for clustering effect

AKT Trap: ITT vs Per Protocol

Intention to treat gives a more conservative (lower) estimate of effectiveness, because it includes non-adherent participants. This is the preferred analysis for clinical decisions. Per protocol overestimates effectiveness but is useful for understanding biological mechanism.
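The difference is easiest to see on a small worked example. All counts below are invented: 100 patients randomised to each arm, with 20 of the treatment arm stopping the drug early.

```python
def event_rate(events, n):
    """Proportion of a group experiencing the outcome."""
    return events / n


# Hypothetical treatment arm, split by whether patients actually took the drug.
adherent = {"n": 80, "events": 8}
non_adherent = {"n": 20, "events": 6}
# Hypothetical control arm.
control = {"n": 100, "events": 20}

cer = event_rate(control["events"], control["n"])

# Intention to treat: everyone randomised to treatment is analysed in that arm.
itt_eer = event_rate(
    adherent["events"] + non_adherent["events"],
    adherent["n"] + non_adherent["n"],
)

# Per protocol: only those who completed treatment are analysed.
pp_eer = event_rate(adherent["events"], adherent["n"])

itt_arr = cer - itt_eer
pp_arr = cer - pp_eer

print(f"ITT ARR = {itt_arr:.2f}")
print(f"Per-protocol ARR = {pp_arr:.2f}")
```

In this made-up trial the per-protocol analysis reports a larger benefit than intention to treat, because the patients who stopped the drug (and did worse) have been dropped from the treatment arm.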

Superiority, Non-Inferiority & Equivalence Trials

Not all trials ask the same question. The AKT occasionally tests whether you understand what a trial was actually designed to show, and why that matters when interpreting its results.

Trial Type | The Question Being Asked | Common Context
Superiority trial | "Is the new treatment better than the comparator?" | Most standard RCTs, testing a genuinely new drug or approach
Non-inferiority trial | "Is the new treatment no worse than the comparator by more than a pre-specified small margin?" | New drug with fewer side effects, lower cost, or easier to administer; aim is to show it's "good enough"
Equivalence trial | "Are the two treatments essentially the same?" | Biosimilar drugs; generic medicines; different routes of administration

Why Non-Inferiority Trials Matter in GP

A new anticoagulant might be shown to be "non-inferior" to warfarin for stroke prevention (not better, but not meaningfully worse) while being easier to use (no INR monitoring). That's a clinically valuable finding even if the drug didn't "beat" warfarin. The AKT may ask you to interpret a non-inferiority trial result correctly.

AKT Trap

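A non-inferiority trial that shows "no significant difference" is not the same as a superiority trial that shows "no significant difference." In a superiority trial, a non-significant result means you failed to prove the new drug works better. In a non-inferiority trial, the new drug passes only if the confidence interval for the difference stays inside the pre-specified margin. A difference small enough to sit within that margin is exactly what you were hoping for, but "no significant difference" on its own does not prove non-inferiority.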
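One way to keep the two designs straight is to look at where the confidence interval sits relative to zero and to the pre-specified margin. The sketch below is a simplified decision rule written for this page, with hypothetical numbers; real trials define their margins and analyses in the protocol.

```python
def interpret_difference(ci_lower, ci_upper, margin):
    """Classify a result from the 95% CI for (new minus comparator) bad-outcome rate.

    Positive values mean the new treatment has MORE bad outcomes.
    margin is the pre-specified non-inferiority margin (a positive number).
    """
    if ci_upper < 0:
        return "superior"        # whole interval favours the new treatment
    if ci_upper < margin:
        return "non-inferior"    # worst plausible case is still inside the margin
    if ci_lower > margin:
        return "inferior"        # whole interval is beyond the margin
    return "inconclusive"        # interval crosses the margin


margin = 0.02  # hypothetical: up to 2 percentage points worse is acceptable

# Both intervals cross 0, so both show "no significant difference"...
print(interpret_difference(-0.010, 0.015, margin))
print(interpret_difference(-0.010, 0.030, margin))
```

Both example intervals cross zero, yet only the first establishes non-inferiority, because the second leaves open a difference larger than the margin.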

2. Research Bias, Validity & Reliability

Type of Bias | Definition | Which Study Designs? | How to Reduce It
Selection Bias | Participants are not representative of the target population | All types | Randomisation; careful sampling
Recall Bias | Cases (people with disease) remember past exposures more vividly than controls | Case-control studies | Objective data sources; standardised questioning
Publication Bias | Positive/significant studies are published more often than negative ones | Meta-analyses (detected via funnel plot) | Trial registration; grey literature search
Attrition Bias | Loss of participants to follow-up distorts results (dropouts differ from completers) | Cohort studies, RCTs | Intention-to-treat analysis; minimise dropout
Lead Time Bias | Screening gives the illusion of improved survival by detecting disease earlier | Screening studies | Use disease-specific mortality, not survival from diagnosis
Length Bias | Screening detects more slow-growing (less aggressive) disease | Screening studies | RCTs with disease-specific outcomes
Observer / Assessment Bias | Knowledge of treatment allocation affects outcome assessment | RCTs | Blinding (single, double, triple)
Hawthorne Effect | Participants change their behaviour because they know they are being observed | All types, especially observational | Control groups; blind observers
Verification Bias | Only patients with a positive test result get the gold-standard confirmatory test, so sensitivity appears falsely high and specificity falsely low | Diagnostic test studies | Ensure all patients (positive and negative) receive the gold-standard test
Confounding | A third variable is associated with both exposure and outcome, distorting the apparent relationship | Observational studies | Randomisation (RCTs); stratification; multivariate analysis

Classic confounding example: "Ice cream sales are associated with drowning." Is ice cream dangerous? No: both go up in summer. Season is the confounder. This illustrates why correlation ≠ causation and why confounders must be controlled for.

Confounding: When Two Things Look Linked But Aren't

Confounding is one of the most important concepts in research methodology, and one of the most common reasons why apparently convincing observational findings turn out to be wrong. The key idea is simple: two things can appear to be linked not because they directly affect each other, but because both are linked to a hidden third variable.

The Core Definition

A confounder (or confounding variable) is a third variable that is independently associated with both the exposure and the outcome. It creates a spurious (false) association, or masks a real one, between the exposure and outcome you are studying.

Imagine you notice that people who carry lighters are more likely to develop lung cancer. Should we ban lighters? No: the real culprit is smoking. Smoking is associated with both carrying a lighter and developing lung cancer. Smoking is the confounder. The lighter-cancer association is entirely explained by it.

Classic Examples

Apparent Link | The Confounder | Why It Explains Everything
Ice cream sales → drowning deaths | Hot weather (summer) | Hot weather causes both more ice cream eating AND more swimming → more drownings. Ice cream doesn't cause drowning.
Coffee drinking → lung cancer | Smoking | Smokers drink more coffee on average. Early studies linked coffee to cancer, until smoking was controlled for.
Carrying a lighter → lung cancer | Smoking | Smokers carry lighters. The lighter has no biological effect; smoking does.
Grey hair → heart disease | Age | Both grey hair and heart disease increase with age. Age is the confounder, not hair colour.
Shoe size → reading ability (in children) | Age | Older children have bigger feet AND read better. Age explains both.

The Three Criteria for a True Confounder

A variable is a confounder if it meets all three of these:

  1. It is associated with the exposure (the thing you're studying)
  2. It is associated with the outcome (the result you're measuring)
  3. It is not on the causal pathway between exposure and outcome (it's a separate third variable, not a step in between)

How to Control for Confounding
  • Randomisation (in RCTs): distributes known and unknown confounders equally between groups. This is the strongest protection.
  • Restriction: only enrol participants who are similar on the confounder (e.g. only non-smokers in the coffee study)
  • Matching: pair cases and controls on the confounder variable
  • Stratification: analyse results separately for each level of the confounder
  • Multivariate statistical adjustment: statistically account for multiple confounders simultaneously
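
Stratification is the easiest of these to see with numbers. The sketch below uses an entirely invented cohort for the coffee and lung cancer example: coffee looks harmful in the crude analysis, but the association disappears once smokers and non-smokers are analysed separately.

```python
# (cancer cases, people) for each group; all counts are invented for illustration.
strata = {
    "smokers": {"coffee": (80, 800), "no_coffee": (20, 200)},
    "non_smokers": {"coffee": (2, 200), "no_coffee": (8, 800)},
}


def relative_risk(exposed, unexposed):
    """RR from two (events, total) pairs."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


def pooled(group):
    """Add up events and totals for one exposure group across all strata."""
    events = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return events, total


# Crude analysis: ignore smoking and compare all coffee drinkers with all non-drinkers.
crude_rr = relative_risk(pooled("coffee"), pooled("no_coffee"))

# Stratified analysis: compare coffee vs no coffee within each smoking group.
stratum_rr = {
    name: relative_risk(s["coffee"], s["no_coffee"]) for name, s in strata.items()
}

print(f"Crude RR = {crude_rr:.2f}")
for name, value in stratum_rr.items():
    print(f"RR among {name} = {value:.2f}")
```

In this made-up data most coffee drinkers are smokers, which is what drives the crude association. Within each stratum the risk is identical with or without coffee.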
โš ๏ธ Why Confounding Is Mainly a Problem in Observational Studies

In a well-conducted RCT, randomisation distributes confounders (both known and unknown) equally between groups โ€” eliminating confounding as an explanation for differences. In cohort and case-control studies, you can only adjust for confounders you know about and have measured. Unmeasured confounders always remain a potential explanation for any observed association โ€” which is why observational studies can never definitively prove causation.

✅ Validity & Reliability: What's the Difference?
Internal Validity

Does the study measure what it claims to measure within the study population? Are the results of this study trustworthy? Threatened by bias and confounding.

External Validity (Generalisability)

Can the results be applied to other populations or real-world settings? A highly controlled RCT in a specialist centre may not reflect what happens in primary care.

Reliability (Reproducibility)

Does the test produce consistent results when repeated under the same conditions? Measured by inter-rater reliability (kappa statistic) or test-retest reliability.

Kappa Statistic (κ)

Measures agreement between two raters beyond chance. κ = 1 (perfect agreement); κ = 0 (agreement no better than chance); κ < 0 (worse than chance).
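
To make "agreement beyond chance" concrete, here is a minimal sketch of Cohen's kappa for two raters giving yes/no answers. The function name and the ECG counts are hypothetical, chosen only for illustration.

```python
def cohens_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters making a yes/no judgement."""
    total = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / total
    # Agreement expected by chance, from each rater's overall "yes" proportion.
    a_yes = (both_yes + a_yes_b_no) / total
    b_yes = (both_yes + a_no_b_yes) / total
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)

# Hypothetical data: two GPs each read the same 100 ECGs as normal/abnormal.
kappa = cohens_kappa(both_yes=40, a_yes_b_no=10, a_no_b_yes=10, both_no=40)
print(round(kappa, 2))
```

In this made-up example the raters agree on 80% of ECGs, but half of that agreement would be expected by chance alone, so kappa comes out well below 0.8.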

3

Measuring Risk & Treatment Effect

This is the most heavily tested area in AKT statistics. You need to know these formulas cold and be able to apply them to trial data tables under exam conditions.

Imagine a lottery advert says you've doubled your chances of winning. Sounds amazing. But if you went from 1-in-a-million to 2-in-a-million, the relative improvement is 100% while the absolute difference is still essentially nothing. That's the difference between RRR and ARR. Drug companies love to quote RRR. You should always ask for ARR.

The Key Metrics

Absolute Risk Reduction (ARR)
ARR = CER − EER
The actual difference in event rates between groups (CER = control event rate, EER = experimental event rate). The most clinically honest metric.
Relative Risk Reduction (RRR)
RRR = (CER − EER) ÷ CER
Often sounds more impressive, but can be misleading without knowing baseline risk. Alternative formula: RRR = 1 − RR (since RR = EER ÷ CER).
Relative Risk (RR)
RR = EER ÷ CER
Used in cohort studies and RCTs. RR of 1 = no effect; <1 = reduced risk; >1 = increased risk.
Number Needed to Treat (NNT)
NNT = 1 ÷ ARR
How many patients to treat to prevent one bad outcome. Lower = more effective. Always round UP.
Number Needed to Harm (NNH)
NNH = 1 ÷ ARI
How many patients to treat to cause one additional adverse effect. Higher = safer treatment.
Absolute Risk Increase (ARI)
ARI = EER − CER
The additional risk conferred by a harmful exposure or treatment side effect.
📊 NNT Interpretation: What Do The Numbers Actually Mean?
⚠️ Common Confusion: Lower NNT = Better (not worse)

NNT tells you how many patients you need to treat for one to benefit. So the fewer patients you need to treat to get one benefit, the more effective the treatment. NNT = 2 means 1 in every 2 patients benefits, which is excellent. NNT = 100 means only 1 in 100 benefits, which is much weaker.

Think of giving out umbrellas on a rainy day. NNT = 2 → give 2 umbrellas, 1 person stays dry: very effective. NNT = 100 → give out 100 umbrellas, only 1 person avoids getting wet: pretty underwhelming.
NNT Range | Rough Interpretation | Example Context
< 10 | Very effective | Antibiotics for certain infections; some acute treatments
10 – 50 | Moderate effect | Many common preventative medications
> 100 | Weak effect | Some population-level preventive strategies
⚠️ These Bands Are Informal: Context Is Everything

These thresholds are not from NICE or RCGP; they are rough teaching aids only. The "right" NNT is always context-dependent:

  • An NNT of 100 might still be worthwhile if the outcome prevented is death or serious irreversible harm
  • An NNT of 5 might not be acceptable if the treatment has frequent or serious side effects

One-line rule: NNT tells you how many patients you treat for one to benefit (lower = stronger effect). But always weigh it against the severity of the outcome and the burden of treatment.

🔄 Don't Confuse NNT with % Benefit

If treatment means 60 more patients in every 100 benefit, that is an ARR of 60% (= 0.6), giving NNT = 1 ÷ 0.6 ≈ 1.7, which rounds up to 2: an excellent result. The NNT is not the number who benefit; it is the number you treat to get one benefit.

📖 Odds Ratio & Hazard Ratio: When Are These Used?
Measure | Used In | Interpretation
Relative Risk (RR) | Cohort studies, RCTs | Directly compares risk in two groups. More intuitive than OR.
Odds Ratio (OR) | Case-control studies | Compares odds of exposure in cases vs controls. Approximates RR when disease is rare.
Hazard Ratio (HR) | Survival analysis (time-to-event) | Like RR but accounts for when events occur over time. HR <1 = reduced hazard in treatment group.
⚠️ AKT Trap: OR ≠ RR

For common outcomes, the OR exaggerates the effect: it sits further from 1 than the RR. For rare outcomes (roughly <10% of people affected), the two are approximately equal. A case-control study generates an OR; you cannot directly calculate incidence or RR from a case-control study.
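
The gap between OR and RR is easiest to see with numbers. The sketch below uses two hypothetical cohorts (all counts invented for illustration) that share the same relative risk of 2, one with a rare outcome and one with a common outcome.

```python
def risk_and_odds_ratio(exposed_events, exposed_total, control_events, control_total):
    """Return (RR, OR) from cohort-style counts."""
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    odds_exposed = exposed_events / (exposed_total - exposed_events)
    odds_control = control_events / (control_total - control_events)
    return risk_exposed / risk_control, odds_exposed / odds_control

# Hypothetical cohorts of 1,000 per arm, both with a relative risk of 2.
rare = risk_and_odds_ratio(20, 1000, 10, 1000)      # outcome in 2% vs 1%
common = risk_and_odds_ratio(600, 1000, 300, 1000)  # outcome in 60% vs 30%
print("rare:   RR=%.2f OR=%.2f" % rare)
print("common: RR=%.2f OR=%.2f" % common)
```

With the rare outcome the OR is almost identical to the RR; with the common outcome the OR is noticeably larger even though the RR has not changed.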

🧮 Worked Example: Calculating NNT from Trial Data

🎯 Scenario: Statin Trial

A 5-year RCT shows that among patients with high cardiovascular risk: 6% in the placebo group had a heart attack, compared to 4% in the statin group.

1. CER (Control Event Rate) = 6% = 0.06; EER (Experimental Event Rate) = 4% = 0.04
2. ARR = CER − EER = 0.06 − 0.04 = 0.02 (2%)
3. NNT = 1 ÷ ARR = 1 ÷ 0.02 = 50
   Treat 50 patients for 5 years to prevent 1 heart attack
4. RRR = ARR ÷ CER = 0.02 ÷ 0.06 ≈ 0.33 (33%)
   Sounds impressive! But the absolute benefit is only 2%.
5. RR = EER ÷ CER = 0.04 ÷ 0.06 ≈ 0.67
   The statin group had 67% of the risk of the placebo group: a 33% relative reduction.
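
The same steps can be checked in a few lines of code. This is a minimal sketch, not an official calculator: the function name `trial_summary` is hypothetical, it assumes the treatment reduces risk (ARR > 0), and it uses exact fractions so that rounding NNT up is not thrown off by floating-point noise.

```python
from fractions import Fraction
from math import ceil

def trial_summary(cer, eer):
    """Return ARR, RRR, RR and NNT from control and experimental event rates."""
    cer, eer = Fraction(cer), Fraction(eer)   # exact arithmetic from decimal strings
    arr = cer - eer                           # assumes a benefit, i.e. arr > 0
    return {
        "ARR": arr,
        "RRR": arr / cer,
        "RR": eer / cer,
        "NNT": ceil(1 / arr),                 # round up, as the notes advise
    }

statin = trial_summary("0.06", "0.04")
for name, value in statin.items():
    print(name, round(float(value), 3))
```

Changing the two inputs is a quick way to see how the same RRR can correspond to very different NNTs at different baseline risks.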
💡 The Clinical Bottom Line

When a patient asks "Will this statin help me?", use NNT. "If 50 people like you take this tablet for 5 years, 1 heart attack will be prevented. For you personally, it's a 2% absolute benefit." That's far more honest than "It reduces your risk by 33%."

💬 Communicating Risk to Patients (AKT Favourite)

The AKT often tests how you would explain statistical information to patients. There are four main formats:

Format | Example | Best For
Natural frequency | "5 out of every 100 people" | Easiest for patients to understand
Percentage | "5% of people" | Widely used but can mislead
NNT | "Treat 20 to prevent 1 event" | Communicates absolute benefit clearly
Cates plot | Visual grid of 100 faces | Best visual aid for shared decision-making
🚨 Never Use RRR Alone When Communicating Risk

Saying "this reduces your risk by 33%" without giving the baseline risk is misleading. A 33% RRR from a baseline of 0.3% means your absolute benefit is 0.1%. Always pair RRR with baseline risk or give ARR/NNT instead.

💊 The Pharma Rep Trick: How Drug Companies Spin Statistics

Drug company representatives are trained to present trial data in the most favourable light. They use a simple but effective statistical sleight of hand:

  • Benefits → quoted as Relative Risk Reduction (RRR), because it sounds bigger and more impressive
  • Harms → quoted as Absolute Risk Increase (ARI), because it sounds smaller and less concerning

Worked Example: a fictional statin rep visit

A new statin reduces heart attacks from 2% to 1% over 5 years, but increases myopathy from 1% to 2%.

What the rep says | What it actually means
"This drug reduces heart attacks by 50%" | RRR = 50%, but ARR is only 1%. NNT = 100. Treat 100 people for 5 years to prevent 1 heart attack.
"The myopathy risk increases by only 1%" | ARI = 1%, but that's actually a doubling of the myopathy risk (Relative Risk Increase = 100%).

The honest version: "If 100 patients take this drug for 5 years, 1 extra person avoids a heart attack, and 1 extra person develops myopathy." Same data. Completely different impression.

The antidote: always ask "What is the absolute difference?" When a rep quotes a relative figure, convert it yourself: ARR = CER − EER, then NNT = 1 ÷ ARR. This applies equally to benefits and harms.
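
As a quick check of the rep's figures, the sketch below puts the benefit and the harm through the same calculation so that both are reported in absolute and relative terms. The helper name is hypothetical; the rates are the fictional ones from the example above.

```python
def absolute_and_relative(control_rate, treated_rate):
    """Return (absolute change, relative change) in event rate, treated vs control."""
    absolute = treated_rate - control_rate
    return absolute, absolute / control_rate

# Fictional 5-year rates from the rep visit.
mi_abs, mi_rel = absolute_and_relative(0.02, 0.01)     # heart attacks: 2% -> 1%
myo_abs, myo_rel = absolute_and_relative(0.01, 0.02)   # myopathy: 1% -> 2%

# round() is enough here because both reciprocals are whole numbers.
print(f"Heart attack: ARR {-mi_abs:.0%}, RRR {-mi_rel:.0%}, NNT {round(1 / -mi_abs)}")
print(f"Myopathy:     ARI {myo_abs:.0%}, RRI {myo_rel:.0%}, NNH {round(1 / myo_abs)}")
```

Seen side by side, the benefit and the harm are the same size in absolute terms; only the choice of relative versus absolute framing makes them sound different.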

4

Diagnostic Testing & Screening: The 2×2 Table

The 2×2 contingency table is the foundation of all diagnostic statistics. If you can build and read this table, you can answer most diagnostic AKT questions.

The 2×2 Table

ACTUAL DISEASE STATUS
TEST RESULT | Disease Present | Disease Absent
Test Positive | True Positive (TP): test says YES → has disease ✓ | False Positive (FP): test says YES → no disease ✗
Test Negative | False Negative (FN): test says NO → has disease ✗ | True Negative (TN): test says NO → no disease ✓
Sensitivity
TP ÷ (TP + FN)
Proportion of people WITH disease who test positive. "How good at catching disease?"
Specificity
TN ÷ (TN + FP)
Proportion of people WITHOUT disease who test negative. "How good at correctly clearing healthy people?"
Positive Predictive Value (PPV)
TP ÷ (TP + FP)
Probability that a positive test truly means disease. Affected by prevalence.
Negative Predictive Value (NPV)
TN ÷ (TN + FN)
Probability that a negative test truly means no disease. Affected by prevalence.
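
The four formulas map directly onto the four cells of the table. Here is a small sketch with hypothetical counts (1,000 people tested, 100 with the disease); the function name is an illustration, not a standard library call.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # down the "disease present" column
        "specificity": tn / (tn + fp),   # down the "disease absent" column
        "ppv": tp / (tp + fp),           # across the "test positive" row
        "npv": tn / (tn + fn),           # across the "test negative" row
    }

# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note the pattern: sensitivity and specificity read down the columns (true disease status), while PPV and NPV read across the rows (test result).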
🧠 SnNout & SpPin: The Memory Aids (and Why They Work)
SnNout

Snout: A highly Sensitive test, if Negative, rules the disease Out.

If sensitivity is very high, a negative test result means the disease is very unlikely (low false-negative rate). Use a sensitive test to screen and rule out.

SpPin

Spin: A highly Specific test, if Positive, rules the disease In.

If specificity is very high, a positive test result means disease is very likely (low false-positive rate). Use a specific test to confirm diagnosis.

Think of a highly sensitive test as a very sensitive smoke alarm: it goes off for everything, including burnt toast, so it will almost never miss a real fire. If it stays silent, you can be confident there is no fire (SnNout). A highly specific test is a well-calibrated alarm that only triggers for actual fires: if it goes off, you can be confident there really is a fire (SpPin).
📊 The Effect of Prevalence on PPV & NPV: Critical AKT Topic

This is one of the most important and most tested concepts in diagnostic statistics. Sensitivity and specificity are fixed properties of the test. But PPV and NPV change dramatically depending on how common the disease is in the population you're testing.

Imagine you use the same pregnancy test in a fertility clinic (60% of women are pregnant) versus at a random GP surgery (5% of women could be pregnant). Same test, same sensitivity/specificity. But a positive result means something very different in each setting.
Scenario | Prevalence | PPV | NPV
Screening the general population for a rare disease (e.g. HIV in low-risk) | 1% | LOW (~16%) | HIGH (~99.9%)
Testing in a high-risk specialist clinic | 50% | HIGH (~95%) | HIGH (~95%)
🔴 Key Rules to Remember
  • As prevalence ↑ → PPV ↑, NPV ↓
  • As prevalence ↓ → PPV ↓, NPV ↑
  • In a low-prevalence population, most positive results are false positives (low PPV), even with a highly specific test
  • A negative test in a high-prevalence population may still miss disease (lower NPV)
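
The table's figures can be reproduced with Bayes' theorem. The notes do not state the accuracy of the test behind the table, so the sketch below assumes 95% sensitivity and 95% specificity, which happens to give numbers very close to those quoted.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a test applied at a given prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Assumed accuracy (not given in the table): 95% sensitivity and specificity.
for prevalence in (0.01, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

The test itself never changes between the two runs; only the prevalence does, and that alone moves the PPV from roughly one in six to roughly nineteen in twenty.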
📐 Likelihood Ratios: When You Want to Go Further

Likelihood ratios (LRs) combine sensitivity and specificity into a single number that tells you how much a test result shifts the probability of disease. More advanced than PPV/NPV, but useful, and occasionally tested in AKT.

Positive Likelihood Ratio (LR+)
LR+ = Sensitivity ÷ (1 − Specificity)
How much more likely a positive test is in disease vs. no disease. LR+ >10 = strong evidence for disease.
Negative Likelihood Ratio (LR−)
LR− = (1 − Sensitivity) ÷ Specificity
How much more likely a negative test is in disease vs. no disease. LR− <0.1 = strong evidence against disease.
LR | Effect on Post-Test Probability
>10 | Large and often conclusive increase in probability
5–10 | Moderate increase
2–5 | Small increase
1 | No change (test is useless)
0.1–0.5 | Small to moderate decrease
<0.1 | Large decrease: strong negative rule-out
Fagan's nomogram uses LRs graphically: you draw a line from pre-test probability through the LR to read off post-test probability. If you see this in the AKT, remember: a positive test shifts probability up, a negative test shifts it down.
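
What the nomogram does graphically can be done with odds: convert the pre-test probability to odds, multiply by the LR, and convert back. A minimal sketch follows; the test accuracy and the 20% pre-test probability are hypothetical values chosen for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity

def post_test_probability(pre_test_probability, lr):
    """Probability -> odds, multiply by the LR, odds -> probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical test: 90% sensitive, 95% specific; pre-test probability 20%.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
print(f"after a positive test: {post_test_probability(0.20, lr_pos):.0%}")
print(f"after a negative test: {post_test_probability(0.20, lr_neg):.1%}")
```

An LR of exactly 1 leaves the odds, and therefore the probability, unchanged, which is why such a test is described as useless.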
5

Population Statistics & Epidemiology

Incidence
New cases ÷ Population at risk × multiplier
Measures risk. Counts only NEW cases over a defined time period.
Point Prevalence
All existing cases ÷ Total population
Measures disease burden. Counts ALL cases (new and old) at a specific point in time (Time T). Also written as: Cases at Time T ÷ Population at Time T.
Incidence is the number of new students joining a school this year. Prevalence is the total number of students currently in the school (new + existing). A chronic disease with low incidence can still have high prevalence if people live with it for years.
📊 Standardised Mortality Ratio (SMR)
SMR Formula
(Observed deaths ÷ Expected deaths) × 100
Adjusts for age and sex when comparing mortality between populations.
SMR Value | Interpretation
= 100 | Mortality same as reference population
> 100 | Excess mortality (higher than expected)
< 100 | Lower mortality than expected
AKT question type: "A mining community has an SMR of 145. What does this mean?" → Excess mortality: 45% more deaths than expected in a comparable general population.
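
The "expected deaths" figure is where the age adjustment happens: reference age-specific death rates are applied to the study population's own age structure (indirect standardisation). The sketch below uses invented rates and person-years, picked so that the result matches the SMR of 145 in the example question.

```python
def smr(observed_deaths, reference_rates, person_years):
    """SMR via indirect standardisation.

    Expected deaths = reference age-specific rate x person-years, summed over age bands.
    """
    expected = sum(reference_rates[band] * person_years[band] for band in person_years)
    return observed_deaths / expected * 100

# Hypothetical figures for illustration only.
reference_rates = {"40-59": 0.002, "60-79": 0.010}   # deaths per person-year
person_years = {"40-59": 5000, "60-79": 3000}        # study population
print(round(smr(58, reference_rates, person_years)))
```

Here 40 deaths would be expected if the community had the reference population's rates, and 58 were observed.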
🎯 Screening: Lead Time & Length Bias (Key AKT Traps)
🔴 Lead Time Bias

Screening detects a disease earlier, making it appear that survival has improved, even if the patient dies at the same point in time. The "survival time from diagnosis" has simply been extended by early detection, not by actual treatment benefit. The patient doesn't live longer; they just know for longer.

🟠 Length Bias

Screening programmes are more likely to detect slow-growing, indolent disease (which has a longer "detectable preclinical phase") than aggressive disease that progresses quickly. This makes screening look more effective than it really is for severe disease.

Lead time bias: Imagine you know you're going to die at age 60. Without screening you find out aged 55; with screening you find out aged 40. Screening appeared to give you 15 extra years of survival from diagnosis, but you still die at 60.
✅ Wilson & Jungner Screening Criteria

Before a screening programme is introduced, it should satisfy the Wilson & Jungner criteria (originally published 1968, still the standard framework). The AKT tests both knowledge of these criteria and application of them to scenarios.

# | Criterion | What It Means In Practice
1 | Important health problem | The condition has significant morbidity or mortality, so it is worth the effort of screening
2 | Accepted treatment available | No point detecting disease you cannot treat
3 | Facilities for diagnosis & treatment exist | Infrastructure must be in place before launching
4 | Recognisable latent or early stage | The disease must have a detectable pre-symptomatic phase
5 | Suitable test available | Test must be acceptable to the population, safe, and reasonably accurate
6 | Test acceptable to the population | People must be willing to undergo it; invasive or uncomfortable tests may deter uptake
7 | Natural history adequately understood | Must know how the disease progresses if left untreated
8 | Agreed policy on who to treat | Clear protocols needed: not just detection but what happens next
9 | Cost-effective | Cost of finding each case must be balanced against benefit
10 | Continuous process, not one-off | Screening must be ongoing because disease incidence continues
⚠️ AKT Application: Why PSA Screening Fails These Criteria

PSA screening for prostate cancer is not part of the NHS national screening programme precisely because it struggles with criteria 5 and 7: PSA is not a sufficiently accurate test (low specificity → many false positives), and the natural history of many low-grade prostate cancers means they would never cause symptoms in the patient's lifetime. This links directly to overdiagnosis (see below).

Memory aid: think of the criteria as three groups: the disease (important, understood, has a latent stage), the test (suitable, acceptable, accurate), the system (treatment exists, facilities exist, policy agreed, cost-effective, continuous).
🔍 Overdiagnosis: Finding Problems That Wouldn't Have Caused Problems

Overdiagnosis occurs when a real disease is detected (one that truly exists) but that disease would never have caused symptoms, harm, or death during the patient's lifetime if left undetected. It is not a false positive (the disease is real); it is a true positive that did not need to be found.

🔴 Why It Matters

Overdiagnosis converts well people into patients. It exposes them to the anxiety, side effects, and risks of treatment for a condition that would never have harmed them. It is one of the most important harms of screening programmes, and one the AKT tests directly.

Condition | Overdiagnosis Example
Prostate cancer | Many low-grade cancers detected by PSA would never progress or cause symptoms; men die with them, not from them
Thyroid cancer | Ultrasound finds tiny papillary thyroid cancers that are almost universally indolent; detection has soared but mortality is unchanged
DCIS (breast) | Ductal carcinoma in situ detected by mammography; some would never become invasive cancer
Overdiagnosis is like finding a tiny crack in a wall that would never have caused the house to fall, but now you've spotted it, you feel compelled to fix it. The crack was real; the harm from finding it came from the treatment, not the crack.
Overdiagnosis vs Overtreatment

Overdiagnosis = finding a disease that didn't need finding. Overtreatment = treating a disease that didn't need treating (which may follow overdiagnosis, or may occur independently). They are related but distinct concepts.

🔢 Age Standardisation

When comparing disease rates or mortality between different populations (e.g. different countries, different time periods), you need to adjust for the fact that those populations may have different age distributions. Older populations will naturally have higher mortality even if their health is equally good.

Key Concept

Age standardisation is a statistical technique that removes the distorting effect of different age distributions when comparing health outcomes between populations. It produces a rate that would be observed if both populations had the same age structure (the "standard population").
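
One common way to do this is direct standardisation: apply each population's age-specific rates to a shared standard population. The sketch below is an illustration with invented towns and rates; it is deliberately set up so that both towns have identical age-specific death rates and differ only in age structure.

```python
def weighted_rate(age_specific_rates, population):
    """Average the age-specific rates, weighted by the given population's age structure."""
    total = sum(population.values())
    return sum(rate * population[band] / total
               for band, rate in age_specific_rates.items())

# Hypothetical data: deaths per 1,000 per year, the same in both towns.
rates = {"under 65": 2.0, "65 and over": 40.0}
young_town = {"under 65": 9000, "65 and over": 1000}
old_town = {"under 65": 6000, "65 and over": 4000}
standard = {"under 65": 8000, "65 and over": 2000}   # assumed standard population

crude_young = weighted_rate(rates, young_town)   # weights = the town's own structure
crude_old = weighted_rate(rates, old_town)
standardised = weighted_rate(rates, standard)    # identical for both towns
print(crude_young, crude_old, standardised)
```

The crude rates make the older town look far less healthy, yet once both are weighted to the same standard population the difference disappears, because it was entirely due to age structure.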

🎗️ Cancer Statistics: AKT Must-Knows

The AKT occasionally tests specific cancer statistics. You do not need exhaustive oncology knowledge, but these headline figures come up and are worth knowing.

Cancer | Key Statistic | Why It Matters
Testicular cancer | >98% 10-year survival | One of the most treatable cancers; important to know for counselling young men
Lung cancer | Leading cause of cancer death (UK) | Despite not being the most common cancer, it kills more people than any other
⚠️ Survival Rate vs Mortality Rate: Don't Confuse These

A cancer can have a high incidence (common) but low mortality (treatable), like breast cancer. Or it can have a lower incidence but very high mortality, like pancreatic cancer. The AKT may test your ability to interpret cancer statistics correctly, so don't assume the most common cancer is the deadliest.

⚖️ Health Inequalities

Health inequalities describe unfair, avoidable differences in health between different groups of people. They are a significant focus of UK public health policy and appear in the AKT in the context of epidemiology and social determinants of health.

Definition

Health inequalities are unfair, avoidable differences in health status or in the distribution of health determinants between different population groups. They are "avoidable" because they stem from social, economic, or environmental conditions that could in principle be changed, not from random chance or biological variation.

Type | Examples in the UK
Socioeconomic | Lower life expectancy in deprived areas; higher rates of cardiovascular disease, diabetes, and mental illness in poorer communities
Geographic | "North-South divide": poorer health outcomes in parts of Northern England compared to the South
Ethnic | Higher rates of type 2 diabetes in South Asian populations; higher cardiovascular risk in Black populations
Gender | Men have lower life expectancy but women have more years of ill health (morbidity)
In the AKT, health inequalities often appear as questions about the social determinants of health or about interpreting population data that shows differences between groups. Key tool: the Marmot Review (Fair Society, Healthy Lives) and the concept of proportionate universalism, meaning universal services delivered with more intensity to those with greater need.
6

Data Distribution & Statistical Significance

Measures of Central Tendency

Measure | Definition | Best Used When | Watch Out
Mean | Sum of all values ÷ number of values | Data is normally distributed | Easily skewed by outliers
Median | Middle value when sorted in order. If there is an even number of values, the median is the average of the two middle numbers. | Skewed data (e.g. income, hospital stay length) | Ignores the actual values at extremes
Mode | Most frequently occurring value | Categorical data; bimodal distributions | Can be meaningless with continuous data
Range | Maximum − Minimum | Quick sense of spread | Entirely determined by outliers
IQR (Interquartile Range) | 75th percentile − 25th percentile | Paired with median for skewed data | Ignores upper and lower 25%
Standard Deviation (SD) | Average spread from the mean | Normally distributed data | Misleading if data is not normally distributed
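
A short example shows how differently these measures behave when one outlier is present. The lengths of stay below are invented for illustration, and the quartiles use the default method of Python's `statistics.quantiles`; other software may define quartiles slightly differently.

```python
import statistics

# Hypothetical lengths of hospital stay in days; one long stay skews the data.
stays = [2, 3, 3, 4, 5, 6, 40]

mean_stay = statistics.mean(stays)
median_stay = statistics.median(stays)
mode_stay = statistics.mode(stays)
stay_range = max(stays) - min(stays)
q1, _, q3 = statistics.quantiles(stays, n=4)   # quartile cut points
iqr = q3 - q1
sd = statistics.stdev(stays)                   # sample standard deviation

print(f"mean {mean_stay}, median {median_stay}, mode {mode_stay}")
print(f"range {stay_range}, IQR {iqr}, SD {sd:.1f}")
```

The single 40-day stay drags the mean, range and SD upwards, while the median and IQR still describe the typical patient.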
🗂️ Types of Data: Nominal, Ordinal, Interval, Ratio

Understanding what type of data you have determines which summary statistics and which statistical tests are appropriate. The AKT tests this, usually by presenting a dataset and asking which test or measure to use.

Type | Definition | Examples | Analogy
Nominal | Categories with no natural order | Blood group (A, B, AB, O); sex; eye colour; cause of death | A fruit bowl: apples and bananas are just different, neither is "more"
Ordinal | Ordered categories, but the gaps between them are not necessarily equal | NYHA heart failure class (I–IV); pain scale (mild/moderate/severe); Likert scales | Race positions: 1st, 2nd, 3rd. We know the order, but the gap between 1st and 2nd may be very different to the gap between 2nd and 3rd
Interval | Ordered with equal gaps between values, but no true zero | Temperature in °C; calendar dates; IQ scores | A thermometer: 0°C doesn't mean "no temperature." You can't say 20°C is "twice as warm" as 10°C in any absolute sense
Ratio | Ordered, equal gaps, AND a true absolute zero | Height, weight, blood pressure, age, income, drug dose | Money: £0 means you genuinely have nothing. £40 is twice as much as £20
💡 Why It Matters for the AKT
  • Nominal/Ordinal data → use non-parametric tests; summarise with the mode (nominal) or median (ordinal)
  • Interval/Ratio data (normally distributed) → can use mean, SD, parametric tests
  • You cannot meaningfully calculate a mean for nominal data (e.g. "mean blood group" is nonsense) or make proportional statements with interval data (you can't say someone with an IQ of 120 is "twice as clever" as someone with 60)
🔬 Parametric vs Non-Parametric Tests

Statistical tests fall into two families depending on the assumptions they make about your data. The AKT tests which type of test is appropriate for a given scenario. You do not need to perform the calculations, just know when to use which.

The Core Distinction

Parametric tests assume the data is normally distributed (or the sample is large enough that this doesn't matter much) and that the data is at least interval-level. Non-parametric tests make no such assumptions: they work with ranks or categories and are suitable for skewed data, small samples, or ordinal/nominal data.

Parametric tests are like a recipe that only works if your oven is exactly the right temperature. Non-parametric tests are more forgiving: they work even if the oven runs a bit hot or cold.
Purpose | Parametric Test | Non-Parametric Equivalent
Compare means of 2 independent groups | Independent t-test | Mann-Whitney U test
Compare means: same group, 2 time points | Paired t-test | Wilcoxon signed-rank test
Compare means of 3 or more groups | ANOVA (Analysis of Variance) | Kruskal-Wallis test
Correlation between two continuous variables | Pearson correlation | Spearman rank correlation
Compare proportions / categorical data | n/a | Chi-squared test (χ²)
⚠️ AKT Decision Rules
  • Data is skewed or not normally distributed → use non-parametric
  • Small sample size (and normality uncertain) → use non-parametric
  • Data is ordinal (e.g. pain scores, Likert) → use non-parametric
  • Comparing proportions or categories → chi-squared test
  • Large sample, continuous, approximately normal → parametric test is fine
🔔 Normal Distribution: The 68-95-99.7 Rule

A normal distribution is a symmetrical bell-shaped curve. The mean, median, and mode are all equal and sit at the centre.

Range | % of Values Included
Mean ± 1 SD | 68%
Mean ± 2 SD | 95%
Mean ± 3 SD | 99.7%
AKT application: Reference ranges for blood tests are usually defined as mean ± 2 SD, meaning about 5% of normal healthy people will fall "outside the normal range." This is why a mildly abnormal result in an asymptomatic patient often doesn't need treatment.
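
The 68-95-99.7 figures are rounded values from the normal curve, and they can be checked with Python's standard library. This is only a verification sketch; nothing here needs memorising beyond the rule itself.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values lying within k standard deviations of the mean.
for k in (1, 2, 3):
    within = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} SD: {within:.1%}")

# Number of SDs that captures exactly the central 95% of values.
z_95 = standard_normal.inv_cdf(0.975)
print(round(z_95, 2))
```

The last line explains why confidence intervals use 1.96 rather than 2: "mean ± 2 SD" is a convenient rounding that captures slightly more than 95%.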
⚠️ Skewed Data

With positively skewed data (e.g. income, GP waiting times, serum bilirubin in a ward), the mean is pulled to the right by a few high outliers. In this case, use the median โ€” it better represents the typical value. The AKT loves testing this distinction.

📉 P-Values & Confidence Intervals: What They Really Mean

P-Values

The p-value is the probability of observing results at least as extreme as those seen if the null hypothesis is true (i.e. if there is no real effect). It is not the probability that the null hypothesis is correct.

p-value | Interpretation
p < 0.05 | Statistically significant: if there were no real effect, a result this extreme would occur less than 5% of the time
p > 0.05 | Not statistically significant: the result is compatible with chance
p = 0.01 | 1% chance of getting a result at least this extreme if there is no real effect

Confidence Intervals (CIs)

A 95% CI means: if you repeated the study 100 times and calculated a CI each time, about 95 of those intervals would contain the true population value.

A CI is like a weather forecast saying "expect between 14°C and 22°C tomorrow." It doesn't mean the temperature will definitely be in that range, but a forecasting method like this gets it right about 95% of the time.
🔴 The Most Tested Rule: Does the CI Cross the Magic Number?
  • For a ratio (RR, OR, HR): if 95% CI includes 1.0 → not statistically significant
  • For a difference (mean difference, ARR): if 95% CI includes 0 → not statistically significant
  • If the CI is entirely above 1 (for ratios) → significantly increased risk
  • If the CI is entirely below 1 (for ratios) → significantly reduced risk
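
To see where such an interval comes from, here is a sketch that calculates a relative risk and its approximate 95% CI from trial counts, then applies the "does it cross 1.0?" rule. It uses the standard normal approximation on the log scale; the trial counts are hypothetical and the function name is an illustration.

```python
from math import exp, log, sqrt

def relative_risk_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with an approximate 95% CI (normal approximation on the log scale)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = sqrt(1 / events_treated - 1 / n_treated
                     + 1 / events_control - 1 / n_control)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical trial: 40/1000 events on treatment vs 60/1000 on placebo.
rr, lower, upper = relative_risk_ci(40, 1000, 60, 1000)
significant = not (lower <= 1.0 <= upper)   # CI excluding 1.0 = statistically significant
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), significant: {significant}")
```

Rerunning with a tenth of the patients (4/100 vs 6/100) gives the same RR but a much wider interval that crosses 1.0, which is the link between sample size and precision.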
❌ Type I & Type II Errors: Crying Wolf vs Missing the Wolf
Type I Error (α, Alpha)

Rejecting the null hypothesis when it is actually true. In other words: concluding that a treatment works when it doesn't. The false positive rate. Conventionally acceptable at 5% (α = 0.05, p < 0.05).

"Crying wolf": sounding the alarm when there's no wolf.

Type II Error (β, Beta)

Failing to reject the null hypothesis when it is actually false. In other words: missing a true treatment effect. The false negative rate. Conventionally acceptable at 20% (β = 0.2, power = 80%).

"Missing the wolf": saying no wolf when there is one.

Statistical Power

Power = 1 − β. It is the probability of correctly detecting a true effect. Higher power = less likely to miss a real effect. Power increases with larger sample sizes. A well-designed trial needs ≥80% power.

⚡ Statistical Significance ≠ Clinical Importance

This is one of the most important and most tested nuances in AKT statistics, and one that many trainees miss.

🔴 The Core Rule

A result can be statistically significant (p < 0.05) while being clinically meaningless. Statistical significance only tells you the result is unlikely to be due to chance. It says nothing about whether the effect is large enough to matter in practice.

A trial of 50,000 patients finds that a new antihypertensive reduces systolic BP by 1 mmHg. p = 0.001. Highly significant, but would you change your prescribing for a 1 mmHg drop? No. The p-value is tiny because the sample is enormous, not because the effect is important.
Scenario | Statistically Significant? | Clinically Important?
BP drops 1 mmHg, n=50,000, p=0.001 | Yes | No
BP drops 15 mmHg, n=30, p=0.08 | No | Probably yes
BP drops 12 mmHg, n=500, p=0.02 | Yes | Yes
💡 What To Use Instead

Always look at the effect size (ARR, NNT, mean difference) alongside p-values and CIs. A narrow confidence interval that excludes zero or one is more meaningful than a p-value alone: it tells you both the direction and the precision of the effect.

📐 Confidence Interval Width: Precision at a Glance
  • Narrow CI → precise estimate → high confidence the true value is close to the point estimate (usually from a large sample)
  • Wide CI → uncertain estimate → true value could be anywhere in a broad range (usually from a small or heterogeneous sample)

Example: RR = 1.5 (95% CI 1.4–1.6) → precise, convincing. RR = 1.5 (95% CI 0.6–3.8) → wide, uncertain, and crossing 1.0 so not even significant.

📈 Regression to the Mean: The Hidden Confounder of Clinical Practice

Regression to the mean is the statistical tendency for an extreme measurement to be closer to the average on a second measurement, regardless of any intervention. It is one of the most under-recognised sources of misleading conclusions in medicine.

🔴 Why It Matters Clinically

Patients are often investigated or treated precisely when their symptoms or measurements are at their worst. Natural variation means those measurements are likely to improve anyway on re-testing, not necessarily because of your intervention. Without a control group, it is impossible to distinguish regression to the mean from genuine treatment effect.

Scenario | What Looks Like Treatment Effect | What May Actually Be Happening
Patient has very high BP on one reading → started on medication → BP lower at next visit | Drug is working | First reading may have been an outlier; BP would have been lower anyway on repeat measurement
Pupil scores very poorly on a test → gets extra tuition → scores better next time | Tuition helped | The poor score may have been unrepresentative; natural performance tends toward their average
Patient with severe flare of eczema starts a new cream → flare improves | Cream is effective | Severe flares naturally improve over time regardless of treatment
Imagine you only measure someone's height when they're standing on a box. They look very tall. Next time you measure them normally, they appear to have "shrunk." The measurement changed; the person didn't. That's regression to the mean.
💡 How to Protect Against It
  • Take multiple baseline measurements and use the average before starting treatment
  • Use a control group: regression to the mean affects both groups equally, so any difference between groups is more likely to be a real treatment effect
  • This is one of the strongest arguments for RCTs over uncontrolled before-and-after studies
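
A small simulation makes the effect visible. The model below is an assumption made purely for illustration: every patient has a stable "true" systolic BP, each clinic reading adds independent random noise, and nobody receives any treatment between the two readings.

```python
import random

random.seed(1)   # fixed seed so the simulation is repeatable

# Assumed model: stable true BP per patient plus random noise on every reading.
true_bp = [random.gauss(140, 10) for _ in range(10_000)]
first = [bp + random.gauss(0, 10) for bp in true_bp]
second = [bp + random.gauss(0, 10) for bp in true_bp]   # no treatment given

# Select only the patients whose first reading was very high.
high = [i for i, reading in enumerate(first) if reading >= 160]
mean_first = sum(first[i] for i in high) / len(high)
mean_second = sum(second[i] for i in high) / len(high)

print(f"{len(high)} patients selected")
print(f"mean first reading {mean_first:.1f}, mean second reading {mean_second:.1f}")
```

The selected group's average falls on the second reading even though nothing was done, which is exactly the drop an uncontrolled before-and-after study would credit to the drug.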
7

Statistical Graphs โ€” What to Look For

The AKT regularly presents graphs and asks you to interpret them. Learn to spot the key feature in each graph type โ€” do not try to read everything. One targeted observation is all that's needed.

Graph Type | Primary Use | The One Thing to Look For
Forest Plot | Meta-analysis results | Does the diamond cross the vertical line of no effect?
Funnel Plot | Publication bias detection / GP outlier monitoring | Is the plot asymmetrical? (Gap = missing unpublished studies)
Cates Plot | Communicating NNT visually to patients | Count the coloured "benefit" faces; NNT = 100 ÷ those faces
L'Abbé Plot | Exploring heterogeneity in meta-analyses | Dot on the diagonal line = zero treatment effect
Box-and-Whisker Plot | Data distribution and spread | Middle line = median (not mean); dots beyond whiskers = outliers
Fagan's Nomogram | Pre-test → post-test probability | Draw a line from pre-test probability through LR to read post-test probability
Stem-and-Leaf Plot | Distribution, preserving original values | Like a histogram but shows individual data points
Kaplan-Meier Curve | Survival analysis (time to event) | Steps drop when events occur; curves that diverge early and stay apart suggest sustained treatment benefit
Histogram | Distribution of continuous data | Shape of curve: symmetrical = normal distribution; skewed = use median not mean
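
The line drawn on a Fagan's nomogram is doing a short piece of arithmetic: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The sketch below uses the standard definitions LR+ = sensitivity ÷ (1 − specificity) and LR− = (1 − sensitivity) ÷ specificity. The function names are hypothetical, and the 20% pre-test probability is an assumed figure chosen only for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a test's sensitivity and specificity (as fractions)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Probability -> odds, multiply by the LR, then odds -> probability."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Test with 85% sensitivity and 80% specificity; assumed pre-test probability of 20%.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.85, specificity=0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"After a positive result: {post_test_probability(0.20, lr_pos):.0%}")
print(f"After a negative result: {post_test_probability(0.20, lr_neg):.0%}")
```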
🌲 Forest Plots: In Detail
How to Read a Forest Plot
  • Each square = one study. The size of the square = the study's weight (usually driven by sample size)
  • Horizontal lines = 95% confidence interval for that study
  • The diamond at the bottom = the pooled overall estimate. Its width = the pooled 95% CI
  • Vertical line at 1.0 = "line of no effect" (for ratios; the line sits at 0 for differences). If a CI line or the diamond crosses this line, that result is not statistically significant
Heterogeneity: The I² Statistic

I² measures how much of the variation between studies is due to true heterogeneity (real differences) rather than chance.
I² < 25% = low heterogeneity ✓
I² = 25–50% = moderate heterogeneity
I² > 50% = substantial heterogeneity; pooled result is less reliable ⚠️

Think of the diamond as the jury's final verdict. If the diamond crosses the vertical "line of no effect," the jury is hung and there is no definitive conclusion. The I² statistic tells you whether the jurors even agreed on the same case.
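
For readers curious where the I² figure comes from, it is usually derived from Cochran's Q, a heterogeneity statistic this page does not otherwise cover: I² = max(0, (Q − df) ÷ Q) × 100, with df = number of studies − 1. The sketch below applies that formula and then labels the result using the cut-offs quoted above. The function names and example Q values are hypothetical.

```python
def i_squared(cochran_q, n_studies):
    """I² as a percentage, from Cochran's Q and the number of pooled studies."""
    degrees_of_freedom = n_studies - 1
    if cochran_q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (cochran_q - degrees_of_freedom) / cochran_q) * 100


def heterogeneity_band(i2_percent):
    """Label I² using the cut-offs quoted in this section."""
    if i2_percent < 25:
        return "low"
    if i2_percent <= 50:
        return "moderate"
    return "substantial"


for q in (5, 20, 40):  # hypothetical Q values for a meta-analysis of 11 studies
    i2 = i_squared(q, n_studies=11)
    print(f"Q = {q}: I² = {i2:.0f}% ({heterogeneity_band(i2)})")
```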
📯 Funnel Plots: Publication Bias & GP Outlier Monitoring

Funnel plots appear in two entirely different contexts in the AKT. Make sure you recognise both.

In Meta-Analysis (Publication Bias)

Plots effect size (x-axis) against study precision or size (y-axis). A symmetric inverted funnel = no bias. An asymmetric funnel with a gap at the bottom-left = missing small negative studies → publication bias.

In Practice Performance Monitoring

Compares GP practices on metrics (e.g. referral rates, mortality). Data points outside the funnel lines are statistical outliers warranting investigation, not necessarily proof of poor performance.

😊 Cates Plots: How to Extract NNT Visually

A Cates plot (sometimes called a "smiley face plot") uses a grid of 100 faces to help patients visualise absolute benefit and harm.

  • Each of the 100 faces represents one person per 100 treated
  • Yellow/green faces = people who benefited (events prevented)
  • Red faces = people who experienced harm
  • Grey/plain faces = no effect either way
NNT from Cates Plot
NNT = 100 ÷ number of benefit faces
Example: 4 yellow faces → NNT = 100 ÷ 4 = 25
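
The same count-and-divide step can be written as a one-line function. This is only a sketch with a hypothetical function name; it assumes the usual 100-face grid and rounds up, in line with the NNT rounding rule used elsewhere on this page.

```python
import math


def nnt_from_cates_plot(benefit_faces, total_faces=100):
    """NNT = total faces / benefit faces, rounded up to a whole patient."""
    if benefit_faces <= 0:
        raise ValueError("No benefit faces: NNT is undefined")
    return math.ceil(total_faces / benefit_faces)


for faces in (4, 5, 8):
    print(f"{faces} benefit faces -> NNT = {nnt_from_cates_plot(faces)}")
```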
📉 Kaplan-Meier Survival Curves: How to Read Them

Kaplan-Meier (KM) curves show the probability of surviving (or remaining event-free) over time. They appear in AKT questions about cancer trials, cardiovascular studies, and any research tracking time to an event.

Feature | What It Means
The Y-axis | Probability of survival (or event-free survival); starts at 1.0 (100%) and falls over time
The X-axis | Time (days, months, years)
Each downward step | An event has occurred (e.g. death, relapse). Larger steps = more events at that point
Tick marks on the line | Censored patients: lost to follow-up or study ended. Not events.
Two curves diverging early and staying apart | Suggests sustained treatment benefit throughout follow-up
Curves crossing | Suggests hazard is not proportional over time; complicates interpretation
Median survival | The time point where the survival curve crosses 50%, i.e. the point where half the patients have had the event
Link to Hazard Ratio

The log-rank test compares two KM curves statistically. The hazard ratio (HR) summarises the overall difference between curves. HR < 1 = treatment group has fewer events over time. If the curves overlap substantially, the HR will be close to 1 (no benefit).

Think of a KM curve as a staircase going down. Each step down is someone having the event. A treatment that works keeps people on the higher steps for longer, so the treated group's staircase descends more slowly.
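
To see how the staircase is built, the sketch below computes a Kaplan-Meier curve by hand: at each event time, the running survival probability is multiplied by (1 − events ÷ number still at risk), and censored patients simply leave the at-risk pool without causing a step. The function names and the eight-patient dataset are hypothetical, and this is an illustration rather than a substitute for a statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time an event occurs.

    times:  follow-up time for each patient
    events: True if the event occurred at that time, False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone who leaves the study at this time point.
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk  # a downward step
            curve.append((current_time, survival))
        n_at_risk -= n_leaving  # events and censored patients both leave
    return curve


def median_survival(curve):
    """First time at which survival falls to 50% or below (None if never)."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical follow-up in months; False marks a censored patient.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(times, events)
for time, survival in curve:
    print(f"month {time}: S = {survival:.3f}")
print("Median survival:", median_survival(curve), "months")
```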
📊 Histograms: Distribution at a Glance

A histogram displays the distribution of a continuous variable (e.g. age, blood pressure, BMI) by grouping values into intervals (bins) and showing how many observations fall in each. Unlike a bar chart, the bars touch, because the data is continuous.

Shape | Meaning | Use Mean or Median?
Symmetrical bell shape | Normal distribution: mean, median, mode all equal | Either, but conventionally mean ± SD
Right (positive) skew | Long tail to the right: a few very high values pulling mean up | Median (more representative)
Left (negative) skew | Long tail to the left: a few very low values pulling mean down | Median (more representative)
Bimodal (two peaks) | Two distinct subgroups in the data | Report both peaks separately
AKT tip: if a histogram shows positive skew, the correct measure of central tendency to quote is the median. If it shows a normal distribution, the mean is appropriate.
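
A quick numerical check shows how a single extreme value drags the mean but leaves the median almost untouched. The length-of-stay figures below are made up for illustration, and comparing mean with median is only a rough rule of thumb for skew direction, not a formal test.

```python
import statistics

# Hypothetical hospital length of stay (days): nine short stays and one very long one.
length_of_stay_days = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

mean_stay = statistics.mean(length_of_stay_days)
median_stay = statistics.median(length_of_stay_days)


def rough_skew_direction(values):
    """Rule of thumb: mean above median suggests right skew, below suggests left skew."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "right (positive) skew"
    if mean < median:
        return "left (negative) skew"
    return "roughly symmetrical"


print(f"Mean = {mean_stay:.1f} days, median = {median_stay:.1f} days")
print("Suggests:", rough_skew_direction(length_of_stay_days))
```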
8

Quality Improvement & Clinical Audit

Tool | Purpose | Key Features
Clinical Audit | Measure practice against explicit standards; identify and close gaps | Requires a defined standard. Uses PDSA cycle. Involves closing the loop (re-audit). Not research: no hypothesis, no new knowledge generated.
Significant Event Analysis (SEA) | Systematic multidisciplinary review of a single significant event | Blame-free. Focuses on system learning, not individual fault. Shared within the team. Documents learning and action.
Root Cause Analysis (RCA) | In-depth investigation of serious incidents | More structured than SEA. Uses "5 Whys" technique. Identifies contributory and root causes. Often for never events or serious harm.
QOF Exception Reporting | Appropriate exclusion of patients from QOF indicators | Clinically appropriate for: maximal tolerated therapy, informed patient dissent, extreme frailty, or clinical contraindication.
🔄 Clinical Audit vs Research: Key Differences
Feature | Clinical Audit | Research
Purpose | Improve care by comparing with standards | Generate new knowledge
Hypothesis | None; compares against existing standard | Always has a hypothesis
Ethics approval | Usually not required | Usually required
Consent | Usually not required | Usually required
Randomisation | Never | May include RCTs
End result | Service improvement; action plan | New evidence; publication
AKT trap: "A GP wants to investigate whether a new antibiotic achieves better cure rates than standard treatment in his practice." → This is research (generates new knowledge, has a hypothesis, needs ethics). "A GP measures whether his practice's asthma review rate meets the NICE standard of 80%." → This is audit.
🔁 PDSA Cycle

The Plan-Do-Study-Act (PDSA) cycle is the continuous improvement framework used in clinical audit and quality improvement.

Stage | What Happens
Plan | Define the question, set the standard, plan data collection
Do | Implement the change or measure current practice
Study | Analyse results; compare against the standard; identify gaps
Act | Implement improvements; plan re-audit to close the loop
9

Clinical Calculations (AKT Formula Bank)

These formulas crop up in AKT calculation questions. Learn each one and know the clinical cut-offs that trigger action.

Body Mass Index (BMI)
Weight (kg) ÷ Height² (m²)
Underweight <18.5 | Normal 18.5–24.9 | Overweight 25–29.9 | Obese ≥30 | Morbidly obese ≥40
Ankle-Brachial Pressure Index (ABPI)
Highest ankle systolic ÷ Highest brachial systolic
Normal: 1.0–1.3 | PAD: <0.9 | Severe PAD: <0.5 | Non-compressible (calcified): >1.3
Alcohol Units
(Volume in ml × ABV%) ÷ 1000
e.g. 750 ml × 12% ÷ 1000 = 9 units. Low risk: ≤14 units/week for both sexes. No safe level in pregnancy.
Non-HDL Cholesterol
Total Cholesterol − HDL Cholesterol
Target: <2.5 mmol/L in those on statin therapy (NICE guidance). Includes all atherogenic lipoproteins.
Paediatric Drug Dose Volume
(Desired dose ÷ Available strength) × Volume
e.g. Child needs 250 mg, suspension is 125 mg/5 ml: (250 ÷ 125) × 5 ml = 10 ml
Percentage Weight Loss (Neonates)
[(Birth wt − Current wt) ÷ Birth wt] × 100
>10% loss in first 5 days requires review. Normal to lose up to 10%; should regain by day 10–14.
Cockcroft-Gault (CrCl)
[(140 − Age) × Weight (kg) × K] ÷ Creatinine (µmol/L)
K = 1.23 (men), 1.04 (women). Estimates creatinine clearance in ml/min; used for drug dosing in renal impairment.
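
Writing the formulas out as small functions is one way to practise them and to check your arithmetic. The sketch below covers BMI, Cockcroft-Gault, non-HDL cholesterol and neonatal weight loss as stated above. Function names and the example patients are hypothetical, and this is a revision aid, not a clinical tool.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m²."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Bands as listed in the formula bank above."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    if value < 40:
        return "obese"
    return "morbidly obese"


def cockcroft_gault(age_years, weight_kg, creatinine_umol_l, is_male):
    """Estimated creatinine clearance (ml/min); K = 1.23 for men, 1.04 for women."""
    k = 1.23 if is_male else 1.04
    return (140 - age_years) * weight_kg * k / creatinine_umol_l


def non_hdl_cholesterol(total_mmol_l, hdl_mmol_l):
    """Non-HDL cholesterol in mmol/L."""
    return total_mmol_l - hdl_mmol_l


def neonatal_weight_loss_percent(birth_weight_g, current_weight_g):
    """Percentage of birth weight lost."""
    return (birth_weight_g - current_weight_g) / birth_weight_g * 100


# Hypothetical examples
print(f"BMI: {bmi(70, 1.75):.1f} ({bmi_category(bmi(70, 1.75))})")
print(f"CrCl: {cockcroft_gault(72, 60, 100, is_male=False):.0f} ml/min")
print(f"Non-HDL: {non_hdl_cholesterol(5.2, 1.3):.1f} mmol/L")
print(f"Weight loss: {neonatal_weight_loss_percent(3500, 3150):.0f}%")
```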
🧮 ABPI Worked Example

🎯 Scenario: Mrs T, 72, with leg pain on walking

1 Ankle pressure (highest of DP/PT): 90 mmHg  |  Brachial pressure (highest arm): 130 mmHg
2 ABPI = 90 ÷ 130 = 0.69
3 ABPI 0.69 = Peripheral Arterial Disease confirmed (cut-off <0.9)
4 Compression bandaging is CONTRAINDICATED (ABPI is below 0.8): refer to vascular surgery
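
Mrs T's calculation can be reproduced in a few lines. This sketch uses the cut-offs from the formula bank; the function names are hypothetical, and the label for values between 0.9 and 1.0 is an assumption, because the text above does not classify that band.

```python
def abpi(highest_ankle_systolic, highest_brachial_systolic):
    """Ankle-brachial pressure index from the two highest systolic readings (mmHg)."""
    return highest_ankle_systolic / highest_brachial_systolic


def interpret_abpi(value):
    """Apply the cut-offs quoted in the formula bank."""
    if value > 1.3:
        return "non-compressible (calcified) vessels, unreliable result"
    if value >= 1.0:
        return "normal"
    if value >= 0.9:
        return "borderline (band not defined in the text)"  # assumption
    if value >= 0.5:
        return "peripheral arterial disease"
    return "severe peripheral arterial disease"


mrs_t = abpi(highest_ankle_systolic=90, highest_brachial_systolic=130)
print(f"ABPI = {mrs_t:.2f}: {interpret_abpi(mrs_t)}")
```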
🍷 Alcohol Unit Worked Examples
Drink | Volume (ml) | ABV% | Calculation | Units
Large glass wine | 250ml | 13% | 250 × 13 ÷ 1000 | 3.25
Bottle of wine | 750ml | 12% | 750 × 12 ÷ 1000 | 9.0
Pint lager (strong) | 568ml | 5% | 568 × 5 ÷ 1000 | 2.84
Single spirit measure | 25ml | 40% | 25 × 40 ÷ 1000 | 1.0
Low-risk guidance: ≤14 units/week (men and women), spread across 3+ days, with 2–3 alcohol-free days per week. This is what you'd discuss in a consultation about alcohol reduction.
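
The table rows can be checked, and a weekly total built up, with a short function. The drink diary below is hypothetical, and the 14-unit threshold is the low-risk figure quoted above.

```python
LOW_RISK_WEEKLY_UNITS = 14  # low-risk limit quoted in the text above


def alcohol_units(volume_ml, abv_percent):
    """Units = (volume in ml x ABV%) / 1000."""
    return volume_ml * abv_percent / 1000


# Hypothetical week: (description, volume in ml, ABV %, number of drinks)
drink_diary = [
    ("large glass of wine", 250, 13, 3),
    ("pint of strong lager", 568, 5, 4),
]

weekly_units = sum(alcohol_units(ml, abv) * count for _, ml, abv, count in drink_diary)
print(f"Weekly total: {weekly_units:.2f} units")
print("Above low-risk guidance" if weekly_units > LOW_RISK_WEEKLY_UNITS else "Within low-risk guidance")
```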
🧮

Worked Examples

🎯 Example 1: Calculating All Risk Metrics from a Trial Table

An RCT randomises patients to receive either Drug X or placebo. After 3 years: 80 out of 500 placebo patients had a stroke; 40 out of 500 Drug X patients had a stroke.

1 CER = 80/500 = 0.16 (16%)    EER = 40/500 = 0.08 (8%)
2 ARR = 0.16 − 0.08 = 0.08 (8%)
3 NNT = 1 ÷ 0.08 = 12.5 → round up to 13
4 RRR = 0.08 ÷ 0.16 = 0.5 = 50%
5 RR = 0.08 ÷ 0.16 = 0.5 (Drug X halves the risk of stroke)
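
The five steps above can be bundled into one function that takes the raw trial counts. This is a sketch with a hypothetical function name; it returns no NNT when the treatment shows no absolute benefit, which is a design choice made here for simplicity.

```python
import math


def risk_metrics(control_events, control_total, treated_events, treated_total):
    """CER, EER, ARR, RRR, RR and NNT from the four numbers in a trial table."""
    cer = control_events / control_total   # control event rate
    eer = treated_events / treated_total   # experimental event rate
    arr = cer - eer                        # absolute risk reduction
    # Round before taking the ceiling so floating-point noise cannot push 20.0 up to 21.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return {"CER": cer, "EER": eer, "ARR": arr, "RRR": arr / cer, "RR": eer / cer, "NNT": nnt}


drug_x = risk_metrics(control_events=80, control_total=500,
                      treated_events=40, treated_total=500)
for name, value in drug_x.items():
    print(f"{name} = {value}")
```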

🎯 Example 2: Constructing a 2×2 Table and Calculating Sensitivity & Specificity

A new test is applied to 200 patients, 100 of whom have the disease. The test correctly identifies 85 of the 100 with disease, and correctly identifies 80 of the 100 without disease.

1 TP = 85  |  FN = 15  |  TN = 80  |  FP = 20
2 Sensitivity = TP ÷ (TP+FN) = 85 ÷ (85+15) = 85/100 = 85%
3 Specificity = TN ÷ (TN+FP) = 80 ÷ (80+20) = 80/100 = 80%
4 PPV = TP ÷ (TP+FP) = 85 ÷ (85+20) = 85/105 = 81%
5 NPV = TN ÷ (TN+FN) = 80 ÷ (80+15) = 80/95 = 84%
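
The same 2×2 table can be worked through in code. The function name is hypothetical; the four formulas are the ones used in the steps above.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # of those WITH disease, how many test positive
        "specificity": tn / (tn + fp),  # of those WITHOUT disease, how many test negative
        "PPV": tp / (tp + fp),          # of positive results, how many are true
        "NPV": tn / (tn + fn),          # of negative results, how many are true
    }


new_test = diagnostic_metrics(tp=85, fn=15, tn=80, fp=20)
for name, value in new_test.items():
    print(f"{name}: {value:.0%}")
```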

🎯 Example 3: Paediatric Dose Calculation

A child requires 300 mg of amoxicillin. The available suspension is 250 mg/5 ml. What volume should be given?

1 Formula: (Desired dose ÷ Available strength) × Volume
2 (300 ÷ 250) × 5 = 1.2 × 5 = 6 ml
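
For practice, the dose-volume formula can be expressed as a function and checked against both worked examples on this page. The function and parameter names are hypothetical, and this is an arithmetic illustration only, not a prescribing aid.

```python
def dose_volume_ml(desired_dose_mg, strength_mg, strength_volume_ml):
    """Volume to give = (desired dose / available strength) x volume of that strength."""
    return desired_dose_mg * strength_volume_ml / strength_mg


print(dose_volume_ml(300, 250, 5), "ml")  # amoxicillin example above
print(dose_volume_ml(250, 125, 5), "ml")  # formula-bank example
```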
📋

Formulas Cheat Sheet & Memory Aids

Risk & Treatment Formulas

ARR = CER − EER | RRR = ARR ÷ CER | RR = EER ÷ CER | NNT = 1 ÷ ARR | NNH = 1 ÷ ARI

Diagnostic Testing

Sensitivity = TP ÷ (TP+FN) | Specificity = TN ÷ (TN+FP) | PPV = TP ÷ (TP+FP) | NPV = TN ÷ (TN+FN)

Population & Clinical Formulas

SMR = (Obs ÷ Exp) × 100 | BMI = Wt (kg) ÷ Ht (m)² | ABPI = Ankle SBP ÷ Brachial SBP | Alcohol units = (Vol in ml × ABV%) ÷ 1000
🧠 Master Memory Aids
  • SnNout: Sensitive test, Negative rules OUT disease (screening)
  • SpPin: Specific test, Positive rules IN disease (confirmation)
  • CI crosses 1.0 (ratio) or 0 (difference) → not significant
  • Forest plot diamond crosses line → not significant
  • Funnel asymmetry → publication bias (or GP outlier if performance context)
  • Box plot middle line → MEDIAN (not mean)
  • 68-95-99.7 → ±1SD, ±2SD, ±3SD in normal distribution
  • I² >50% → substantial heterogeneity
  • OR from case-control | RR from cohort | Prevalence from cross-sectional
  • Audit ≠ Research (audit measures vs standard; research generates new knowledge)
📊 Study Design Mnemonic: "For Cases, OR; For Cohorts, RR"
  • Case-control → OR (Odds Ratio): looking BACK at exposures
  • Cohort → RR (Relative Risk): going FORWARD from exposure
  • Cross-sectional → Prevalence: SNAPSHOT in time
  • RCT / SR → Gold standard for treatment questions
🎓

Trainer & Teaching Pearls

Common Trainee Blind Spots on This Topic
  • Trainees often learn NNT as a formula without understanding what it actually means clinically. Ensure they can explain it in a sentence a patient would understand
  • The distinction between sensitivity/specificity (test properties) and PPV/NPV (influenced by prevalence) is poorly understood. The pregnancy test analogy works well
  • Many trainees know what a forest plot looks like but cannot explain what to look at. Focus on the diamond and whether it crosses the line of no effect
  • Lead time bias is frequently confused with length bias. Use diagrams or timelines to illustrate the distinction
  • Trainees routinely confuse clinical audit with research. Use the RCGP's own examples from assessments
Tutorial Ideas & Discussion Starters
  • "Here's the summary table from a drug trial โ€” can you tell me the NNT and whether you'd prescribe it for this patient?"
  • "Look at this forest plot. Is the intervention effective? How confident are you in that answer?"
  • "Your patient has a positive FIT test. The PPV in a low-risk population is about 3%. How do you explain this to him?"
  • "We're seeing a lot of high PSA results. What's the issue with using PSA as a screening test?"
  • "A colleague wants to compare our sepsis outcomes with the Trust's. Is that audit or research? Does it need ethics?"
  • "How would you explain a 1-in-50 chance of a side effect to a patient who asks 'Is it safe?'"
Reflective Questions for Tutorials
  • How do you currently explain risk to patients? Do you use absolute or relative figures?
  • Can you recall a recent clinical guideline that cited a significant NNT? What was it, and how did it influence your practice?
  • Have you ever ordered a test and not known its sensitivity/specificity? How did you interpret the result?
  • Why might a drug company choose to present their trial results as RRR rather than ARR?

🔥 AKT High-Yield Tips

These are the patterns that repeatedly appear in AKT papers. Memorise these and you will score marks.

🎯 NNT Calculation

Always convert percentages to decimals first. ARR = 5% → NNT = 1 ÷ 0.05 = 20. Always round up to the nearest whole number. A lower NNT = more effective treatment.

🎯 CI Crosses the Magic Number

For ratios (RR, OR, HR): CI crosses 1.0 = not significant. For differences: CI crosses 0 = not significant. This comes up in nearly every forest plot question.

🎯 Study Design Matching

Rare disease → case-control → OR. New exposure going forward → cohort → RR. Prevalence snapshot → cross-sectional. Best evidence for treatment → RCT or SR/meta-analysis.

🎯 Sensitivity vs Specificity: Which to Use When

Screening test → want high sensitivity (don't want to miss cases → SnNout). Confirmatory test → want high specificity (don't want false positives → SpPin).

🎯 PPV Drops in Low-Prevalence Populations

Even a highly specific test (99%) gives poor PPV in a low-prevalence setting. Most positive results in population screening are false positives. This is why we don't screen everyone for everything.
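
The effect of prevalence on PPV follows directly from Bayes' theorem: PPV = (sensitivity × prevalence) ÷ [(sensitivity × prevalence) + (1 − specificity) × (1 − prevalence)]. The sketch below holds a hypothetical test at 99% sensitivity and 99% specificity and varies only the prevalence. The function name and prevalence figures are illustrative assumptions.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value for a test applied at a given disease prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (99% sensitive, 99% specific) in three settings.
for prevalence in (0.001, 0.01, 0.20):
    ppv = ppv_from_prevalence(0.99, 0.99, prevalence)
    print(f"Prevalence {prevalence:.1%}: PPV = {ppv:.1%}")
```

In this illustration, at a prevalence of 1 in 1,000 fewer than one in ten positive results is a true positive, even though the test itself is excellent.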

🎯 Forest Plot Diamond

If the diamond (pooled estimate) crosses the vertical line of no effect → overall result is NOT statistically significant. If I² > 50% → substantial heterogeneity → pooled result is less reliable.

🎯 Funnel Plot Asymmetry

A gap in the bottom-left of a funnel plot = publication bias: small negative studies were not published. This inflates the apparent effect of a treatment in the meta-analysis.

🎯 Box-and-Whisker: Always Median, Not Mean

The line inside the box is the median. The box = IQR (middle 50%). Dots or circles beyond the whiskers = outliers.

🎯 RRR Sounds More Impressive Than ARR

Pharmaceutical companies love quoting RRR because it sounds bigger. A 50% RRR sounds amazing, until you know the baseline risk was only 2% (→ ARR = 1%, NNT = 100). Always ask: what was the baseline risk?
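
To see how much the baseline risk matters, the sketch below applies the same 50% RRR to several baseline risks. The 2% row reproduces the figures quoted above; the other baseline risks and the function name are hypothetical.

```python
import math


def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Return (ARR, NNT) for a given baseline risk and RRR, both as fractions."""
    arr = baseline_risk * relative_risk_reduction
    # Round before the ceiling so floating-point noise cannot inflate the NNT.
    nnt = math.ceil(round(1 / arr, 9))
    return arr, nnt


for baseline in (0.02, 0.10, 0.20):
    arr, nnt = absolute_benefit(baseline, relative_risk_reduction=0.50)
    print(f"Baseline {baseline:.0%}: RRR 50%, ARR {arr:.0%}, NNT {nnt}")
```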

🎯 Median: Use for Skewed Data

Skewed distributions (income, hospital stay, serum bilirubin) → use median not mean. The mean is pulled by outliers; the median is not.

🎯 Audit vs Research

Audit: measures against an existing standard; no ethics needed; no hypothesis. Research: generates new knowledge; needs ethics approval; has a hypothesis. A key distinction the AKT tests repeatedly.

🎯 Intention to Treat (ITT)

ITT analysis includes all randomised participants regardless of adherence. This gives a conservative estimate of effectiveness that is more realistic for clinical practice. Per-protocol analysis tends to overestimate the effect.

🎯 ABPI Cut-Off

ABPI < 0.9 = peripheral arterial disease. ABPI > 1.3 = non-compressible (calcified) vessels, so the result is unreliable. Compression bandaging is contraindicated if ABPI < 0.8 (check with your compression guidelines).

🎯 Cates Plot NNT

Count the benefit faces (usually yellow or green). NNT = 100 ÷ (number of benefit faces). 5 yellow faces → NNT = 20.

⚠️

Common Mistakes & Trainee Traps

These are the errors that appear repeatedly across AKT marking schemes. Every one of these is a real mark lost by real candidates.

  • Forgetting to convert percentages to decimals before calculating NNT (e.g., ARR = 5% → must use 0.05, not 5, to get NNT = 20, not 0.2)
  • Rounding NNT down rather than up (NNT = 12.5 → answer is 13, not 12)
  • Confusing RRR with ARR and quoting the more impressive-sounding relative figure as the clinical benefit
  • Saying a result is "significant" when the CI just touches 1.0. The CI must not include 1.0 for the result to be significant
  • Stating the box-and-whisker plot middle line is the mean. It is always the median
  • Confusing sensitivity with PPV. Sensitivity is a fixed property of the test; PPV depends on prevalence
  • Thinking a highly specific test in a low-prevalence population will give a reliable positive result. It won't (low PPV)
  • Confusing an OR with an RR. An OR only approximates the RR when the disease is rare
  • Saying a case-control study generates RR. It generates OR, because you start with cases and controls, not an exposed cohort
  • Confusing clinical audit with research, for example claiming an audit needs ethical approval
  • Misinterpreting lead time bias as meaning a screening programme genuinely improves survival
  • Forgetting that I² >50% in a forest plot raises concerns about the validity of the pooled result
  • Using the mean to describe skewed data (e.g. income, hospital stay length). Use the median

🏁 Final Take-Home Points

  1. NNT = 1 ÷ ARR. Always convert percentages to decimals. Always round up. Lower NNT = better treatment.
  2. ARR is clinically honest. RRR sounds impressive but can mislead. Always pair RRR with baseline risk.
  3. Sensitivity and specificity are fixed properties of a test. PPV and NPV change with disease prevalence.
  4. SnNout: sensitive tests rule OUT when negative. SpPin: specific tests rule IN when positive.
  5. Forest plot diamond crosses the line of no effect → result not statistically significant. I² >50% → heterogeneity concerns.
  6. Funnel plot asymmetry → publication bias. Points outside funnel limits in performance monitoring → outlier practices.
  7. CI for a ratio that includes 1.0 → not significant. CI for a difference that includes 0 → not significant.
  8. Case-control → OR. Cohort → RR. Cross-sectional → prevalence. RCT/SR → gold standard for treatment.
  9. Skewed data → use median, not mean. Box plot middle line = median. Dots beyond whiskers = outliers.
  10. Clinical audit measures against standards: no ethics needed, no hypothesis. Research generates new knowledge: ethics required.

Statistics questions in the AKT are among the most reliably learnable marks in the paper. A few hours with this page and a handful of practice questions will pay dividends well beyond their investment.


HOW IT ALL STARTED
WHAT WE'RE ABOUT
WHO ARE WE FOR?

Bradford VTS was created by Dr. Ramesh Mehay, a Programme Director for Bradford GP Training Scheme back in 2001. Over the years, it has seen many permutations. At the time, there were very few resources for GP trainees and their trainers so Bradford decided to create one FOR EVERYONE.

So, we see Bradford VTS as the INDEPENDENT vocational training scheme website providing a wealth of free medical resources for GP trainees, their trainers and TPDs everywhere and anywhere. We also welcome other health professionals, as we know the site is used by both those qualified and in training, such as Associate Physicians, ANPs, Medical & Nursing Students.

Our fundamental belief is to openly and freely share knowledge to help learn and develop with each other. Feel free to use the information, as long as it is not for a commercial purpose.

We have a wealth of downloadable resources and we also welcome copyright-free educational material from all our users to help build our rich resource (send to bradfordvts@gmail.com).

Our sections on (medical) COMMUNICATION SKILLS and (medical) TEACHING & LEARNING are perhaps the best and most comprehensive on the world wide web (see white-on-black menu header section on the homepage).