
Building a Virtual Escape Room with AI

Hamish Fernando from the University of Sydney outlines how he used AI to build a virtual escape room. It didn't make it easier; it made it possible! Gain insights into the process so that you can replicate it.


When generative AI tools arrived, I realised they could do more than speed up marking or draft text. They could help me design activities I’d never have had the time to create. A good example is a virtual escape room I trialled in Data, AI and Society in Health.


While I am mainly a physiology teacher, this wasn’t about physiology directly, but about something equally important: healthcare data. Students worked in teams to investigate why NeuroSense, a hospital AI used for neurological diagnosis, was producing catastrophic errors, including missed cerebral aneurysms.


The challenge


To solve the mystery, teams moved through five “departments,” each exposing a different weakness that, when combined, explained the failure:

  • Radiology: 3D MRI scanner calibration and protocol inconsistencies (e.g., equipment repurposed or mismatched acquisition settings) leading to missed findings.

  • Patient Records: Timelines with missing occupational/environmental exposure histories that, once completed, explain atypical biomarker results.

  • IT Systems: Integration errors and inconsistent units across systems (metadata mismatches, schema changes, unit discrepancies) that disrupted analysis.

  • Clinical Staff: Staff interviews revealing how the AI struggled with atypical presentations, prompting clinician workarounds (e.g., resubmitting, exaggerating symptoms).

  • Laboratory: Updated reference ranges not applied consistently after an analyser change—misclassifying subtle biomarker values.


Individually, each issue looked manageable. Together, they showed how small gaps across a system can cascade into serious harm. That’s the point: integration failures are often the most dangerous.


AI produces small, inspectable datasets


For each room, these datasets include:

  • Radiology: Imaging protocol and scanner metadata showing calibration/protocol inconsistencies that correlate with missed findings.

  • Patient Records: Encounter timelines with incomplete exposure histories that, when filled, make sense of biomarker anomalies.

  • IT Systems: Logs/exports showing unit mismatches and metadata conflicts (e.g., schema updates, protocol IDs).

  • Clinical Staff: Short interview excerpts surfacing human-in-the-loop bias and workarounds.

  • Laboratory: Panels with reference ranges carried over from an older analyser (the “smoking gun”).


I always ask the model to insert unit labels, version IDs, and telltale metadata, so clues can be “read” without needing actual images.


Where terms are unfamiliar, I prompt the model to include one-line in-context definitions, so students learn as they use the data.
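To make this concrete, here is a minimal sketch (in Python) of the shape I aim for in a Laboratory artefact. The analyser names, version IDs, analytes, and values are hypothetical illustrations; in practice the model writes the artefact directly from my prompt, but the same structure applies: metadata that carries the clue, explicit units, and a one-line definition for an unfamiliar term.

```python
import csv

# A sketch of the kind of inspectable artefact I aim for in the Laboratory
# room. All analyser names, version IDs, analytes, and values below are
# hypothetical illustrations, not real clinical data.

OUT = "lab_panel_room5.csv"

# Metadata lines carry the clue: the analyser changed, but the
# reference-range version did not.
metadata = [
    "# analyser_model: BioChem-7 (replaced BioChem-5 on 2024-03-12)",
    "# reference_range_version: v2.1 (issued for the older BioChem-5)",
    # One-line in-context definition of an unfamiliar term:
    "# note: S100B is a blood biomarker of neurological injury",
]

# Each row: analyte, value, explicit unit, and the reference range applied.
rows = [
    ("S100B", 0.14, "ug/L", "0.02-0.10 (v2.1)"),   # subtle value, judged by the old range
    ("NSE",   11.8, "ug/L", "0.0-12.5 (v2.1)"),
    ("GFAP",  0.31, "ng/mL", "0.00-0.35 (v2.1)"),
]

with open(OUT, "w", newline="") as f:
    for line in metadata:
        f.write(line + "\n")
    writer = csv.writer(f)
    writer.writerow(["analyte", "value", "unit", "reference_range"])
    writer.writerows(rows)

print(f"Wrote {len(rows)} panel rows to {OUT}")
```

The point is that the clue lives in the metadata, not in any image: the analyser changed, the reference-range version did not, and a student who reads carefully can spot it.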


How AI made it possible


On my own, I couldn’t have produced this level of detail. Each room needed datasets (scanner logs, timelines, lab panels, staff notes) that looked real, with just enough inconsistency to make students think. That would have taken weeks.


With Claude, I generated full simulated tables and text artefacts (units, IDs, anomalies) in two hours. I then used ChatGPT to tighten the narrative flow, check coherence, and make sure everything lined up with the learning outcomes. I set the scaffold (what students should learn and what evidence should exist), and AI built the detail at a speed I simply couldn’t match.


Blueprint (so colleagues can try it)


  1. Pick a failure scenario. Anchor it to one clear outcome (data quality, integration, or process gaps).

  2. Define 3–5 rooms. Each reveals a different failure; together they explain the overall breakdown.

  3. Generate realistic datasets. Ask AI for small, inspectable tables/logs/notes with clear units, version IDs, and metadata; include one or two deliberate anomalies.

  4. Add in-context definitions. Where terms are new, include one-line explanations directly in the artefacts.

  5. Iterate for coherence. Ensure the evidence points to the right conclusion without being obvious; trim anything that creates dead-ends (a simple automated check is sketched after this list).

  6. Keep it collaborative. Design interdependent clues so teams actually need each other’s perspectives.
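For step 5, I find it helps to automate the most mechanical coherence checks before reading the artefacts end-to-end. Below is a minimal sketch, assuming each artefact is a CSV with "#"-prefixed metadata lines (as in the Laboratory sketch above); the file names and planted-clue markers are hypothetical.

```python
import csv
from pathlib import Path

# A minimal coherence check for step 5 of the blueprint, assuming each
# artefact is a CSV with "#"-prefixed metadata lines. File names and
# planted-clue markers below are hypothetical.

ARTEFACTS = {
    "lab_panel_room5.csv": "v2.1",   # the outdated reference-range version must appear
    # "it_logs_room3.csv": "mmol/L", # e.g., the unit-mismatch marker for IT Systems
}

def check(path: Path, marker: str):
    """Return a list of problems found in one artefact."""
    if not path.exists():
        return [f"{path.name}: artefact not generated yet"]
    problems = []
    if marker not in path.read_text():
        problems.append(f"{path.name}: planted clue '{marker}' is missing")
    with path.open() as f:
        rows = [r for r in csv.reader(f) if r and not r[0].startswith("#")]
    header, data = rows[0], rows[1:]
    if "unit" not in header:
        problems.append(f"{path.name}: no explicit 'unit' column")
    else:
        u = header.index("unit")
        problems += [f"{path.name}: data row {i + 1} has no unit"
                     for i, r in enumerate(data) if not r[u].strip()]
    return problems

if __name__ == "__main__":
    issues = [p for name, m in ARTEFACTS.items() for p in check(Path(name), m)]
    print("\n".join(issues) or "All artefacts pass the basic checks.")
```

A check like this catches missing units or a dropped clue early, leaving the careful read-through for the harder question of whether the evidence is too obvious.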


Delivery (how I package it)


I publish the rooms as simple LMS pages with module progression, so teams unlock the next room only after submitting the key finding from the current one. This keeps momentum, reduces spoilers, and structures the collaboration.
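The LMS handles this gating natively through module prerequisites, so no code is required; the sketch below just makes the unlock rule explicit. Room order and key findings are hypothetical simplifications of the real rubric.

```python
# A conceptual sketch of the unlock rule; the LMS implements this natively
# through module prerequisites. Room order and key findings are hypothetical
# simplifications of the real rubric.

ROOM_ORDER = ["Radiology", "Patient Records", "IT Systems",
              "Clinical Staff", "Laboratory"]

# The key finding a team must submit before the next room opens.
KEY_FINDINGS = {
    "Radiology": "protocol mismatch",
    "Patient Records": "missing exposure history",
    "IT Systems": "unit mismatch",
    "Clinical Staff": "clinician workarounds",
    "Laboratory": "outdated reference ranges",
}

def next_unlocked_room(submissions):
    """Return the first room whose key finding has not yet been submitted."""
    for room in ROOM_ORDER:
        if KEY_FINDINGS[room] not in submissions.get(room, "").lower():
            return room  # team is still working here; later rooms stay locked
    return None  # all rooms solved

# Example: a team that has solved the first two rooms is now in IT Systems.
team = {
    "Radiology": "We found a protocol mismatch in the 3D MRI settings",
    "Patient Records": "A missing exposure history explains the biomarkers",
}
print(next_unlocked_room(team))  # -> "IT Systems"
```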


Guardrails and risks I manage


  • Cognitive overload. Evidence in each room is tight and scaffolded; if progress stalls, I release pre-written hints.

  • Over-gamification. The “game” never overrides learning; every clue points back to the outcome I’m assessing.

  • Collaboration over competition. Groupmates need each other’s perspectives; the focus is solving the system, not beating each other.

  • Transparency about AI. I tell students where AI helped (drafting datasets, notes) and where I stepped in (accuracy, alignment, assessment).


Student feedback


Feedback was enthusiastic. Students described the task as rewarding, fun, and unlike anything they had done before. Several noted it pushed them out of their comfort zones, encouraging more creative thinking and closer collaboration than traditional assessments.


The evidence behind it


Research into escape rooms specifically highlights their value in healthcare and engineering education when built around authentic challenges. Dittman et al. (2021) demonstrated that virtual escape rooms are not only feasible with large interprofessional cohorts but also flexible in format, making them a strong option for online or blended teaching. Čubela et al. (2023) showed how combining problem-based learning with gamified, data-driven activities can sharpen students’ ability to handle complex, real-world scenarios: exactly the kind of integration issues highlighted in the NeuroSense crisis. Meta-analyses such as Li et al. (2023) confirm that gamification has its greatest impact in higher education when students actively work with realistic mechanics and feedback rather than passive narratives. At the same time, reviews like Castillo-Parra et al. (2022) remind us that escape rooms succeed only when they are carefully scaffolded: poorly designed versions risk frustration and disengagement. These findings align with my own experience: AI enabled me to generate rich but inspectable datasets, scaffolded in a way that kept the puzzles challenging without overwhelming students.


Closing thought


AI didn’t make this easier; it made it possible. Without it, the richness of the datasets, the coherence of the story, and the speed of production would have been out of reach. Students felt like they were playing a game; underneath, they were learning a difficult truth: in health, tiny cracks in data and systems can cause the biggest failures.


Hamish's previous article on using AI for storytelling is also available.


Dr Hamish Fernando

10 October 2025

School of Biomedical Engineering

University of Sydney

