Evaluation of Teaching
"Because teaching without feedback is a bit like driving blindfolded. You might arrive somewhere – just not necessarily where you intended."
Last updated: April 2026
📥 Downloads
Handouts, evaluation forms, and teaching extras – ready when you are. Perfect for HDR (half-day release) planning, trainer development, and real-world feedback sessions.
🌐 Web Resources
A hand-picked mix of official guidance and real-world GP training resources. Because sometimes the best pearls are not hiding in the official documents.
Evaluation is not just a box to tick. It is the engine of improvement. Without it, a teaching session might feel great but leave zero lasting impact – and nobody knows why.
✅ With evaluation you can...
- Identify what works and keep doing it
- Spot what isn't working and fix it
- Understand your learners' needs better
- Track growth and improvement over time
- Show evidence of quality teaching
- Build confidence in your teaching skills
- Close the feedback loop with your learners
❌ Without evaluation...
- Good habits stay invisible – and die out
- Poor habits repeat, unchallenged
- Learner confusion goes unnoticed
- No data to improve the programme
- Teaching stays stagnant
- You're guessing at impact
What are we actually evaluating?
Evaluation in medical education can target several different things. Being clear about what you're evaluating helps you pick the right tool.
Professor Donald Kirkpatrick developed his famous model of training evaluation in the late 1950s. It remains the most widely used framework for evaluating educational programmes in healthcare and beyond. It has since been updated into the New World Kirkpatrick Model by his family, but the core four levels remain unchanged.
The model asks: how well is your training programme actually delivering on what it set out to do? Kirkpatrick's genius was recognising that all four levels matter – and that levels 3 and 4 are the ones trainers most often ignore.
Higher levels = harder to measure but more meaningful. Most GP evaluations only reach Levels 1–2.
How far do most programmes evaluate?
Research consistently shows that most training programmes stop at Level 1 or 2. Yet the most important levels – Behaviour and Results – are the hardest and most expensive to measure. Here's a practical summary:
| Level | What it measures | Typical method | How hard to do | How often done in GP training |
|---|---|---|---|---|
| 1 – Reaction | Satisfaction / experience | Feedback form, verbal feedback | Easy | Very common ✅ |
| 2 – Learning | Knowledge / skills acquired | Pre/post quiz, observed practice | Moderate | Sometimes ✅ |
| 3 – Behaviour | Practice change after training | WPBA review, trainer observation | Hard | Rarely ⚠️ |
| 4 – Results | Impact on outcomes | Pass rates, safety data, patient outcomes | Very hard | Very rarely ⚠️ |
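If you build forms digitally, one practical trick is to label each question with the Kirkpatrick level it targets, so you can see at a glance how far up the model your evaluation actually reaches. Below is a minimal Python sketch of that idea – the questions and level tags are invented for illustration, not a recommended form:

```python
# A minimal sketch: tag each evaluation question with the Kirkpatrick
# level it targets, then check how far up the model the form reaches.
# The questions below are invented examples, not a recommended form.
from collections import Counter

FORM = [
    ("How enjoyable was today's session?", 1),                         # Level 1 - Reaction
    ("Name one thing you learned that you didn't know before.", 2),    # Level 2 - Learning
    ("What will you do differently in your next clinic?", 3),          # Level 3 - Behaviour (intent)
    ("Has anything changed for your patients since the session?", 4),  # Level 4 - Results (rarely feasible)
]

per_level = Counter(level for _, level in FORM)
print("Questions per level:", dict(sorted(per_level.items())))
print("Highest level targeted:", max(per_level))
```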
James and Wendy Kirkpatrick updated the original model around 2009–2010 to make it more applicable to modern organisational learning. The core four levels remain the same, but the emphasis changes:
- The new model recommends working backwards from Level 4 – start with the results you want, then design evaluation (and training) to achieve them.
- Level 3 is now given extra prominence – creating the right conditions for behaviour transfer is seen as the most critical success factor.
- The model distinguishes between "leading indicators" (early signs behaviour is changing) and "lagging indicators" (final results).
- Manager and supervisor support is highlighted as essential for Level 3 success – in GP terms, this means trainers and TPDs (training programme directors) actively reinforcing the learning after the HDR session.
For most GP trainers, the practical message is: don't just ask whether trainees enjoyed the session – ask whether it changed anything.
Kirkpatrick's model was originally designed for industrial training – not the complex, multi-layered world of medical education. Honest educators should know its limits:
- It assumes levels are linked: In reality, a brilliant reaction (Level 1) does not guarantee learning (Level 2), and learning doesn't automatically produce behaviour change (Level 3).
- It ignores context: The workplace environment matters enormously for Level 3. A trainee may know what to do but be working in a culture that doesn't support it.
- It overlooks the teacher: Kirkpatrick's model focuses on the learner, but says nothing about evaluating the teacher's development.
- Soft outcomes are hard to measure: How do you quantify a trainee becoming more compassionate? The model struggles with complex humanistic competencies.
- Level 4 is often impractical: Measuring whether a session on clinical reasoning ultimately improved patient safety is methodologically very difficult.
This doesn't mean Kirkpatrick is useless – it remains the most practical framework available. The key is to use it thoughtfully and to be realistic about what you can measure.
Most people think "evaluation" means handing out a form at the end of a session. But there are many more creative, engaging, and often more useful ways to evaluate teaching. Don't limit yourself.
Most evaluation forms are too long, too generic, and forgotten the moment they're collected. Here's how to design one that actually gets useful responses – and that you'll actually use.
Two categories of questions to include
Think of your evaluation form in two distinct parts:
1. Quality questions – get feedback on the actual quality of the session and the teaching.
- Clarity – Was the session clear and easy to follow?
- Organisation – Did it flow logically? Was it well-structured?
- Engagement – Did it hold your interest throughout?
- Relevance / Usefulness – Was it useful to your learning and practice?
- Materials quality – Were slides, handouts, and resources appealing and helpful?
2. Improvement questions – get actionable suggestions for making the session better.
- Three things that were most useful – so these can be kept and repeated
- Anything that was not useful or confusing – so it can be removed or improved
- One thing you'll do differently in practice – captures intended behaviour change (Level 3 signal)
- Any other suggestions – open-ended, catches what you didn't think to ask
What makes a good evaluation question?
| ❌ Weak question | Why it fails | ✅ Better version |
|---|---|---|
| "Was the session good?" | Too vague β yes/no gives you nothing useful | "What was the most valuable thing you got from today's session?" |
| "Did you enjoy it?" | Enjoyment β learning. Measures Level 1 incompletely. | "How will you use what you learned in your next clinic?" |
| "Rate the session 1β10" | No context β a 7 could mean many different things | "Rate the session 1β5 for relevance to your current stage of training, and explain your rating." |
| "Any comments?" | Too open β most people write nothing | "What one thing would you change about this session?" |
| 20-question form | Participants give up halfway through | Keep it to 5β7 focused questions maximum |
Research on feedback forms consistently shows diminishing returns after about 5β7 questions. If you can only ask five things, ask these:
- What worked well in this session? (open)
- What one thing would you change? (open)
- How relevant was this to your learning needs? (1–5 scale)
- What is one thing you'll apply in practice after today? (open)
- Any other comments? (optional open)
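If you run sessions regularly, it can help to keep this form as data rather than as a document, so the same five questions can drive a printed handout or a digital form. Here is a minimal Python sketch of that idea – the `Question` structure is purely illustrative:

```python
# A minimal sketch of the five-question form as reusable data,
# so one definition can drive a printed handout or a digital form.
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    kind: str               # "open" or "scale"
    optional: bool = False

FIVE_QUESTION_FORM = [
    Question("What worked well in this session?", "open"),
    Question("What one thing would you change?", "open"),
    Question("How relevant was this to your learning needs?", "scale"),
    Question("What is one thing you'll apply in practice after today?", "open"),
    Question("Any other comments?", "open", optional=True),
]

for i, q in enumerate(FIVE_QUESTION_FORM, start=1):
    scale = " (1-5)" if q.kind == "scale" else ""
    opt = " (optional)" if q.optional else ""
    print(f"Q{i}. {q.text}{scale}{opt}")
```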
The Right Order – Plan Your Evaluation Before Your Session
Many teachers design their session first and bolt on evaluation at the end. This is the wrong order. Here's the right approach:
- Define your learning outcomes first
- Decide how you will know they have been met (your evaluation)
- Only then design the session content to achieve those outcomes
Even experienced teachers make predictable mistakes with evaluation. Recognising them is the first step to avoiding them.
No outcomes defined before the session
This is the single most common mistake. If you haven't defined what success looks like before the session, you can't meaningfully measure it afterwards. "Did they enjoy it?" is not the same as "Did they achieve the learning objectives?" Always set your outcomes first – even if they're simple and informal.
Using the same form for years without redesigning it
Generic forms become invisible. Trainees stop engaging, start ticking boxes, and add nothing useful. If your form looks the same in year three as it did in year one, it's time to redesign it. Involve trainees in the redesign – they'll tell you what they actually want to be asked.
Collecting feedback and never acting on it
This erodes trust. If trainees feel that their feedback disappears into a void, they stop giving honest responses. The most important step after collecting evaluation data is to close the feedback loop – briefly acknowledge what you heard, and say what (if anything) you've changed as a result. Even "We heard your feedback about pacing, and we've adjusted the programme" is enough.
Confusing popularity with effectiveness
A popular session is not always an effective one. Trainees can love a session that is entertaining but teaches nothing durable. Similarly, a challenging session – one that pushes thinking and generates discomfort – might get lower satisfaction scores but produce more genuine learning. Kirkpatrick Level 1 is important, but it should never be your only measure.
Squeezing evaluation into the last 30 seconds
This is the classic "I know this is quick but..." moment – and the resulting data is usually shallow and rushed. If you want useful feedback, build it into the session design. Allow 5–7 minutes at the end. Or do it digitally so people can respond in their own time, with more thought.
Dismissing the outliers
It's tempting to dismiss a very negative response as "one person who just had a bad day." Sometimes that's true. But sometimes a single piece of strongly negative feedback is pointing to something real that the majority haven't articulated. Read outliers carefully. Consider whether they might be identifying something genuine.
The insights below come from recurring patterns shared by GP trainees, trainers, and educators across UK training communities – online forums, deanery discussions, and peer learning groups. Every point here aligns with official RCGP guidance on good educational practice. Nothing here is gossip; it is collected wisdom, translated into clear teaching points.
"The best HDR sessions I've attended didn't just teach me something. They made me want to go back and check something, or try something differently in clinic. The ones that stayed with me were the ones where the teacher actually asked us what we were going to do next."
– A recurring theme from trainee forums
"When I first started using evaluation forms, I expected mostly ticks. What I got was gold. One trainee wrote that she felt the session moved too fast in the second half and she lost the thread. I hadn't noticed. I completely restructured how I pace the second hour after that."
– Shared experience from a UK GP trainer
"We changed our programme by asking trainees to help design it. The sessions they were most passionate about were ones where they'd identified their own learning gap. Ownership changed everything β including engagement with evaluation."
– Recurring insight from UK TPD discussions
What do GP trainees actually want from session evaluation?
Across UK training communities, trainees consistently say the same things when asked what matters most to them about evaluation. Here is what they want – in their own words, translated into a clear picture.
- They want to know their input made a difference – not to be ignored
- Power dynamics are real – honest feedback needs a safe channel
- Nobody wants a 20-question form after an already long session
- Numbers alone can't capture what worked and what didn't
What makes an HDR session memorable? Trainees say...
Across UK GP training communities, the sessions that get the best evaluation scores – and that trainees still talk about months later – share these features. These are not just what trainees say at the time. These are what they remember when reflecting later.
Sessions linked to real clinical scenarios – not abstract theory – get the best feedback. Trainees remember "that session on heartsink patients" far longer than generic communication skills.
Interactive sessions score much higher than lecture-only ones. Trainees want to think, not just listen. Even 10 minutes of small group discussion transforms how a session lands.
When teachers ask "what do you already know?" or "what would you actually do in clinic?" – and then adapt the session accordingly – trainees feel valued. That trust shows up in the evaluation.
Trainees consistently say the sessions they remember best ended with one clear message: "If you forget everything else, remember this." A closing summary is not optional – it is the most important part of the session.
The most common complaint in trainee evaluations? "It felt rushed in the second half." Pacing matters. Build in time to breathe. Don't sacrifice depth for coverage.
The sessions trainees rate highest are ones where the teacher came back at the next session and said "Based on your feedback, I've changed X." This simple act closes the loop and builds trust.
A recurring theme among GP trainers and TPDs who reflect openly on their own practice: we often evaluate whether trainees were happy, but rarely whether they changed anything. The sessions that get 5/5 satisfaction scores are not always the ones that produce measurable behaviour change. The two are related – but they are not the same thing. The most growth-focused educators evaluate both, separately, and at different time points.
The Honest Truth About Evaluation Forms in GP Training
Here is something that experienced GP educators rarely say out loud, but almost all privately acknowledge: trainees are rarely completely candid on evaluation forms. Treating this as a known reality – rather than a flaw to hide – helps you design better evaluation systems.
- If the form has their name on it, they will soften any criticism
- If the TPD is watching them fill it in, they are even less honest
- Forms handed out at the end of a tiring all-day session get the least thoughtful responses
- Generic forms get generic answers – specific questions get specific, useful answers
- If they've seen feedback go nowhere before, they stop engaging entirely
What encourages more honest responses:
- Anonymous digital forms (Microsoft Forms, Google Forms) – completed after the session, at home
- Genuinely anonymous forms with no identifiers – trainees need to believe this
- Short, specific forms with at least one open question
- Following up at the next session: "Here is what we heard and what we're changing"
- Involving trainees in designing the evaluation – they feel ownership and take it seriously
What UK Research on GP Training Tells Us
Studies evaluating GP training programmes in the UK have revealed some consistent and important findings. These align with and reinforce what trainers and trainees say in their communities.
📹 Video-Based Teaching Insights – Applied to GP Training
The core educational principles below are drawn from well-established teaching and learning science, applied here directly to the context of UK GP training sessions and HDR evaluation. These are the insights from skilled educators that translate most clearly to GP practice.
Experienced medical educators consistently highlight that learners remember the last thing they hear most vividly. This is known as the recency effect. Yet most HDR sessions end with logistics – "remember to sign the register," "don't forget your portfolio." The last minute should be reserved for one thing: a clear, memorable take-home message.
Ask yourself: If a trainee could only remember one thing from today – what do I want it to be? Then say that last, clearly, and out loud. This is the single highest-leverage change you can make to any teaching session.
Evaluation link: The most impactful evaluation question you can ask immediately after a session is: "What was the single most useful thing from today?" If trainees struggle to answer this, the closing minute did not do its job.
A common frustration shared by UK GP educators: asking trainees to "reflect on today's session in your portfolio" rarely produces meaningful reflection. Reflection has to be structured and prompted. Open-ended "reflect on this" produces generic output β because people don't know what to reflect on.
Better approach: give trainees a specific prompt. For example: "Describe one moment from today's session where your thinking shifted. What shifted? What would you do differently in clinic as a result?"
This prompt produces real Kirkpatrick Level 2–3 evidence. It can be built into the evaluation form as Question 3 or 4, and the answers will be far richer than a 1–5 satisfaction score.
Research on how the brain processes new information shows that every person has a limited mental bandwidth – cognitive load – for absorbing new material. When a teaching session overloads this capacity, learning stops, regardless of how good the content is.
Signs that a session has overloaded cognitive load:
- Trainees begin to look glazed or stop participating after a certain point
- Evaluation forms describe feeling "overwhelmed" or "confused in the second half"
- Lots of information was delivered but trainees can only recall one or two points
In practice: Less is more. Cover fewer concepts, cover them deeper. Build in pauses. Change activity every 15–20 minutes. Use your evaluation form to ask specifically about pacing – if you never ask, you'll never know.
Most evaluation forms ask backwards-looking questions: "Was the session well-organised?" "Did you enjoy it?" These give you information about the past. But the most valuable evaluation question is forward-looking:
"What will you do differently in your next consultation because of today?"
This question reaches for Kirkpatrick Level 3 intent. It doesn't measure behaviour change (that happens later), but it measures commitment to change – which is a reliable predictor of whether change will actually happen.
Skilled educators from across healthcare agree: this is the single most powerful question you can add to any post-session evaluation. If you're going to add just one new question to your form, make it this one.
This is rarely discussed – but widely observed by UK GP trainers. When one or two trainees are very vocal about enjoying a session, others tend to match their scores upwards. When a small group is disengaged, others often drift downwards. Evaluation reflects not just the session but the group mood on that day.
This is why triangulation matters – looking at evaluation data across multiple sessions and multiple methods, rather than reading too much into one session's feedback. A single bad evaluation can be weather. A pattern across six sessions is climate.
Also: verbal round-the-room feedback is particularly susceptible to this effect. Digital anonymous forms – completed alone – produce more independent responses.
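To make the weather-versus-climate distinction concrete, here is a minimal Python sketch with invented scores: it compares each session's mean relevance rating against the overall average and flags one-off dips for a careful read rather than a redesign:

```python
# A minimal sketch of triangulation across sessions: compare each
# session's mean relevance score (1-5) against the overall average.
# All scores below are invented for illustration.
from statistics import mean

session_scores = {
    "Jan": 4.4, "Feb": 4.5, "Mar": 3.1,
    "Apr": 4.3, "May": 4.4, "Jun": 4.2,
}

overall = mean(session_scores.values())
for session, score in session_scores.items():
    # One large deviation is "weather"; the six-session average is "climate".
    label = (
        "outlier - read carefully, but don't redesign everything"
        if abs(score - overall) > 0.8
        else "in line with the pattern"
    )
    print(f"{session}: {score:.1f}  ({label})")

print(f"Six-session average (the climate): {overall:.1f}")
```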
The GP Training Evaluation Cycle – Making It Sustainable
Good evaluation is not a single event. It is a cycle. Here is how it works in practice – and what the training community says about keeping it alive throughout the year.
The training community agrees: the cycle breaks down most often between Analyse and Change – and when it does, trainees notice within two to three sessions. The remedy is simple: at the start of the next session, spend two minutes saying "Here's what you told us last time, and here's what we've done about it." Two minutes. That's all it takes to keep the cycle alive.
Evaluation is a topic that is often taught but rarely modelled well. The most powerful thing a trainer or TPD can do is demonstrate good evaluation practice consistently.
Common trainee difficulties with this topic
- Trainees often confuse evaluation (measuring a teaching session) with assessment (measuring a learner's performance). Clarify this early.
- Many trainees struggle to understand why Levels 3 and 4 of Kirkpatrick matter – they need a concrete GP-relevant example (e.g. a great session on clinical examination that produces no change in how anyone examines patients).
- IMGs (international medical graduates) may be used to very formal, high-stakes evaluation cultures – help them understand that informal, low-stakes evaluation is just as valid and arguably more useful day-to-day.
Tutorial ideas & discussion prompts
Give trainees a fictional teaching session title (e.g. "Managing Hypertension in Primary Care" or "Breaking Bad News"). Ask them to:
- Define 2–3 learning outcomes for the session
- Decide which Kirkpatrick level(s) they want to evaluate
- Design a 5-question form that evaluates against those outcomes
- Share and compare forms – discuss: what did different trainees prioritise?
This exercise makes the connection between learning outcomes and evaluation design concrete and practical.
Prompt for group discussion: "You run a session. The feedback says participants found it confusing and not well organised. How do you respond?" This surfaces how trainees handle critical feedback – a skill as important in clinical supervision as in teaching evaluation.
Key teaching points: negative feedback is data, not a verdict. Your response to critical feedback is more important than the feedback itself.
Ask trainees to recall a recent HDR or educational session and apply the Kirkpatrick framework retrospectively:
- Level 1: How did you feel about it at the time?
- Level 2: What did you actually learn?
- Level 3: Have you changed anything in practice because of it?
- Level 4: Can you identify any patient-level impact?
This is often an eye-opening exercise – trainees realise that memorable sessions are not always the ones that changed their behaviour, and vice versa.
The most powerful thing a TPD can do is evaluate their own HDR sessions openly and transparently – and share the results with trainees, including what they're going to change. This models evaluation as a normal, non-threatening part of professional practice, not something that only happens to juniors.
Frequently asked questions
How long should an evaluation form be?
As short as possible while still being useful. Research suggests diminishing returns after 5–7 questions. If you have 20 questions, you'll get 20 rushed, low-quality answers. Five good questions beat twenty mediocre ones every time.
Can I use the same form for every session?
Technically yes – and it does allow you to compare sessions over time. But if used rigidly for every session without variation, engagement drops. Consider a core set of 3–4 consistent questions, plus 1–2 session-specific questions that change each time.
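A minimal Python sketch of that core-plus-specific pattern (the session-specific question shown is a hypothetical example):

```python
# A minimal sketch of the "core + session-specific" form pattern:
# a stable core allows comparison over time, while one or two fresh
# questions keep the form from going stale.
CORE_QUESTIONS = [
    "What worked well in this session?",
    "What one thing would you change?",
    "How relevant was this to your learning needs? (1-5)",
]

def build_form(session_specific):
    """Combine the consistent core with questions unique to this session."""
    return CORE_QUESTIONS + list(session_specific)

form = build_form([
    "Was the pacing of the second half right for you?",  # hypothetical example
])
for i, question in enumerate(form, start=1):
    print(f"Q{i}. {question}")
```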
Is verbal feedback as valid as a written form?
Yes – and in some ways more so. Verbal feedback is immediate, can be explored in depth, and allows clarifying questions. Its weakness is that it's harder to record and analyse. In a small HDR group, verbal feedback at the end of a session is often more valuable than a form. The key is to make notes afterwards while it's fresh.
How do I get feedback on a specific part of the session?
This is where evaluation becomes genuinely valuable for your own development. Ask specifically about the areas you found difficult: "Was the pacing right?" / "Was the explanation of X clear?" Being honest about wanting feedback on specific areas often produces more targeted, useful responses.
Why might IMG trainees engage differently with evaluation?
Many IMGs come from educational cultures where evaluation is highly formal, high-stakes, and rare. The UK GP training approach – where evaluation is frequent, informal, low-stakes, and genuinely intended to improve teaching rather than penalise teachers – can feel surprisingly unfamiliar. Reassuring IMGs that evaluation is a collaborative, developmental process (not an inspection or judgement) often unlocks more honest responses.
Need a simple way to remember Kirkpatrick's four levels? Think of them as a staircase – each step takes you deeper into the impact of teaching, and each step is harder to climb.
| Letter | Stands for | One-line prompt |
|---|---|---|
| R | Reaction | "Did they smile?" |
| L | Learning | "Did they grow?" |
| B | Behaviour | "Did they change?" |
| R | Results | "Did it matter?" |
Before you close this page, make sure these ideas are fixed in your mind. They'll serve you every time you teach.
Videos
Although some of these videos talk about teaching at school, the key principles are transferable to teaching in General Practice.
Bill Gates: Teachers Need Real Feedback
Beware of Cognitive Overload