Innovative Prompting Technique Significantly Enhances AI Accuracy in Healthcare Recommendations

In a groundbreaking study that could reshape the future integration of artificial intelligence (AI) within healthcare, researchers at Technische Universität Berlin have demonstrated a remarkable improvement in the accuracy of Large Language Models (LLMs) when advising patients on medical care-seeking decisions. By embedding human reasoning frameworks directly into the prompts given to these models, the team has successfully bridged a critical gap between machine processing and real-world clinical judgment, signaling a paradigm shift in both AI prompt engineering and medical decision support.

The rapid rise of conversational AI platforms like ChatGPT has ushered in a new era where millions rely on these tools for preliminary health advice. Yet, a persistent challenge remains: LLMs tend to default toward an overly cautious approach, routinely recommending emergency or professional medical intervention even in cases where home care would suffice. This tendency to “over-triage” contributes not only to increased healthcare costs but also to unnecessary anxiety and resource strain.

Addressing this problem, the Berlin team, led by human factors experts Marvin Kopka and Markus A. Feufel, turned to theories rooted in cognitive psychology rather than conventional computer logic. Their focus was on Naturalistic Decision-Making (NDM), a framework that models how human experts make complex, high-stakes decisions under uncertainty—a sharp contrast to purely algorithmic or rule-based reasoning typically used in AI prompt design.

Two key psychological models underpinning their approach were Recognition-Primed Decision-Making (RPD) and Data-Frame Theory. RPD encourages the AI to identify patterns in symptoms that match typical cases and mentally simulate outcomes akin to how experienced clinicians anticipate patient trajectories. Meanwhile, Data-Frame Theory equips the system to construct and continually question situational “frames,” allowing flexibility in interpretation as new clinical data emerges.
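The study's exact prompt wording is not reproduced in this article, but the structure it describes can be sketched. The snippet below is a hypothetical illustration of how RPD steps (pattern matching and mental simulation) and Data-Frame Theory steps (stating, checking, and revising a situational frame) might be assembled into a single care-seeking prompt; all step wording and function names here are the author's assumptions, not the researchers' published template.

```python
# Hypothetical sketch of an NDM-inspired prompt template.
# The step wording below is illustrative, not the study's actual prompt.

RPD_STEPS = [
    "Identify which typical clinical pattern these symptoms most resemble.",
    "Mentally simulate the likely course if the patient stays home versus seeks care.",
]

DATA_FRAME_STEPS = [
    "State the situational frame (working interpretation) you are using.",
    "List observations that fit the frame and any that contradict it.",
    "Revise the frame if the contradictions outweigh the fit.",
]

def build_ndm_prompt(symptom_description: str) -> str:
    """Assemble a care-seeking advice prompt that walks the model
    through expert-style reasoning steps before it answers."""
    steps = RPD_STEPS + DATA_FRAME_STEPS
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        "A patient describes: " + symptom_description + "\n\n"
        "Before recommending self-care, a doctor visit, or emergency care, "
        "reason through these steps:\n" + numbered + "\n\n"
        "Then give a single recommendation with a short justification."
    )

prompt = build_ndm_prompt("mild sore throat and low-grade fever for two days")
```

The design point is that the reasoning scaffold, not the symptom text, carries the improvement: the same template can be reused across models and cases.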

Through extensive testing across ten different versions of ChatGPT, including the cutting-edge GPT-4o and GPT-5 models, the team observed dramatic improvements. With NDM-inspired prompts, the models' overall accuracy rose markedly, particularly in identifying cases suitable for self-care. Under baseline prompting, the models gave correct self-care recommendations in only about 13.4% of instances; this figure jumped to nearly 30% when the AI applied the NDM reasoning framework, a significant gain for patient empowerment and system efficiency.
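The self-care accuracy metric quoted above can be made concrete with a small scoring sketch. The study's vignette set and scoring pipeline are not public in this summary, so the function and toy data below are illustrative assumptions showing how the fraction of correctly handled self-care vignettes (the figure that rose from roughly 13.4% to nearly 30%) might be computed from labeled model outputs.

```python
# Illustrative scoring sketch; vignette labels and data are invented.

def self_care_accuracy(results):
    """results: list of (gold_label, model_label) pairs, where each label is
    'self-care', 'doctor', or 'emergency'. Returns the fraction of gold
    self-care vignettes that the model also labeled self-care."""
    gold_self_care = [(g, m) for g, m in results if g == "self-care"]
    if not gold_self_care:
        return 0.0
    correct = sum(1 for g, m in gold_self_care if m == g)
    return correct / len(gold_self_care)

# Toy data: 3 of 4 self-care vignettes answered correctly.
toy = [
    ("self-care", "self-care"),
    ("self-care", "doctor"),      # over-triage: escalated unnecessarily
    ("self-care", "self-care"),
    ("self-care", "self-care"),
    ("emergency", "emergency"),   # emergency detection unaffected
]
print(self_care_accuracy(toy))  # -> 0.75
```

Note that this metric deliberately conditions on gold self-care cases, so it isolates over-triage: an always-escalate model scores 0 here even if it catches every emergency.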

Perhaps more compellingly, even the simpler ChatGPT variants, which traditionally failed to discern nuanced care pathways, began delivering sophisticated, context-sensitive advice when prompted with a “human reasoning blueprint.” This demonstrates that the secret to unlocking AI’s real-world usefulness may lie not in more data or bigger models alone but in infusing them with decision heuristics modeled on expert cognition.

Safety, a paramount concern in deploying AI for medical guidance, was not compromised. The system retained its robust ability to recognize emergencies, showing that human-inspired reasoning selectively tempers the AI's default caution without dulling critical vigilance. The models became better calibrated: they knew when it was appropriate to reassure and recommend home care, and when to escalate concerns.

Marvin Kopka emphasized the significance of shifting AI evaluation and training toward scenarios reflecting real-world ambiguity and imperfect information. “Too often, AI assessments occur under idealized conditions with perfect data,” Kopka remarked. “But actual clinical environments demand decision-making amid uncertainty. Our work shows that leveraging established cognitive models as prompt strategies enables LLMs to navigate this complexity with improved precision.”

This research not only challenges long-standing assumptions about effective AI design but also charts a path toward personalized medicine enhanced by AI-human synergy. By enabling models to simulate outcomes and critically evaluate their own interpretative frameworks, the technology mirrors how clinicians dynamically adapt to evolving patient presentations. This cognitive flexibility is arguably essential for safe, ethical, and effective AI decision support.

While the results herald a substantial leap forward, the researchers caution that the current advancements are best suited to controlled or semi-controlled environments, where inputs are reasonably consistent. Embedding these reasoning frameworks into the chaotic and heterogeneous landscape of everyday clinical encounters will require further study and refinement. The next frontier lies in validating whether such NDM-inspired prompts can sustain improved performance in diverse, non-standardized settings beyond research environments.

The implications extend beyond just healthcare. The principle of infusing AI with naturalistic human reasoning could revolutionize a wide spectrum of fields where decisions must be made under uncertainty and incomplete information—from emergency response to industrial process control. It suggests a broader reevaluation of prompt engineering, favoring cognitive science insights over rigid computational paradigms.

Marvin Kopka’s work has garnered notable recognition, including the 2025 JMIR Publications Early Career Researcher Award, underscoring the study’s impact and the growing importance of human-centered AI research. As AI systems become ubiquitous, integrating psychological theories into their design could pave the way for machines that not only process information but reason in ways aligned with human experts.

In sum, this study from Technische Universität Berlin exemplifies a crucial step in transforming LLMs from blunt instruments of caution into nuanced advisors that respect both safety and appropriateness. By embracing the complexity of human decision-making strategies, the researchers illuminate a promising future where AI assists clinicians and patients with greater confidence, compassion, and competence.

Subject of Research: Not applicable

Article Title: Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study

News Publication Date: May 11, 2026

Web References:

JMIR Biomedical Engineering
DOI: 10.2196/88053

References: Kopka M, Feufel MA. Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study. JMIR Biomed Eng. 2026;11:e88053. doi:10.2196/88053

Image Credits: Marvin Kopka, Division of Ergonomics, Department of Psychology & Ergonomics (IPA) at Technische Universität Berlin

Keywords

Biomedical engineering, Ergonomics, Medical technology, Medical equipment, Artificial intelligence, Machine learning, Cognitive robotics, Naturalistic Decision-Making, Large Language Models, Clinical decision support, Recognition-Primed Decision-Making, Data-Frame Theory

Tags: AI-driven patient care-seeking decision support, bridging machine processing and clinical expertise, cognitive psychology applied to AI healthcare, conversational AI in preliminary health advice, cost reduction in healthcare through AI, enhancing AI clinical judgment with human factors, improving AI accuracy for medical advice, innovative AI prompting techniques in healthcare, integrating human reasoning into AI prompts, large language models for healthcare recommendations, naturalistic decision-making in AI systems, reducing AI over-triage in medical decisions