
Medical AI Hallucination Rates: Which Model Gets Facts Wrong?

Updated 2026-03-10

Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.


DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.


AI hallucination — when a model generates confident, plausible-sounding statements that are factually wrong — is one of the most dangerous problems in medical AI. A hallucinated drug interaction, a fabricated study, or an incorrect dosage recommendation could cause real patient harm.

This article examines what we know about hallucination rates across medical AI models and how to protect yourself.

What Are Medical AI Hallucinations?

Medical AI hallucinations fall into several categories:

Fabricated Citations

The model cites studies, journals, or authors that do not exist. Example: “According to a 2023 study in The Lancet by Dr. James Morrison…” — where no such study or author exists.

Incorrect Drug Information

Wrong dosages, fabricated drug interactions, or incorrect contraindications. Example: “Metformin is typically started at 2000mg twice daily” (actual starting dose is usually 500mg).

False Statistical Claims

Invented statistics presented with false precision. Example: “87.3% of patients with this condition respond to treatment” — where no such statistic exists.

Confident Misstatements

Incorrect medical facts stated with high confidence. Example: “The appendix is located on the left side of the abdomen” (it is on the right).

Anachronistic Information

Outdated guidelines or recommendations presented as current. Example: Recommending a treatment approach that has been superseded by updated guidelines.

Hallucination Rates by Model

Precise hallucination rates are difficult to measure because they depend on the topic, question format, and evaluation methodology. However, published evaluations and our testing suggest the following relative ranking:

Model | Estimated Medical Hallucination Rate | Hallucination Type
Med-PaLM 2 | Lowest (~2-5% on evaluated queries) | Primarily anachronistic information
Claude 3.5 | Low (~3-7%) | Tends to hedge rather than hallucinate; when wrong, often explicitly uncertain
GPT-4 | Low-Moderate (~5-10%) | Fabricated citations, false-precision statistics
Gemini | Moderate (~8-12%) | Mixed types
Open-source models | High (~15-30%) | All types; less filtering

Critical caveats:

  • These are approximate ranges based on available evaluations, not definitive measurements
  • Rates vary significantly by topic, question difficulty, and evaluation methodology
  • Models are continuously updated; rates may change over time


Why Models Hallucinate About Medicine

Training Data Limitations

Models are trained on internet text that includes both accurate medical sources and inaccurate health misinformation. The model cannot always distinguish between the two.

Statistical Pattern Completion

LLMs generate text by predicting likely next tokens. A plausible-sounding but incorrect medical statement may be statistically likely given the preceding context.
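As a toy illustration (not a real model), the sketch below shows the basic mechanism: every candidate next token gets a score, the scores are converted to probabilities, and one token is drawn. A fluent but wrong continuation can simply be the most probable one. All tokens and scores here are made up for illustration.

```python
import numpy as np

# Toy next-token step: hypothetical candidate tokens and model scores.
vocab = ["500mg", "850mg", "1000mg", "2000mg"]
logits = np.array([2.1, 1.4, 0.9, 1.8])  # made-up scores, not from any real model

probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the scores
next_token = np.random.choice(vocab, p=probs)  # sample one continuation

for tok, p in zip(vocab, probs):
    print(f"{tok}: {p:.2f}")
print("sampled continuation:", next_token)
```

Nothing in this process checks the sampled token against medical reality; plausibility under the training distribution is the only criterion.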

Lack of Grounding

Without access to real-time databases, current guidelines, or verified medical knowledge bases, models rely entirely on patterns learned during training.
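One practical mitigation is to supply the grounding yourself: paste a verified excerpt from a current source into the prompt and instruct the model to answer only from that text. A minimal sketch, using a placeholder excerpt and question, might look like this:

```python
# Minimal "grounding" sketch: answer only from a verified excerpt you provide,
# rather than from the model's training-time memory.
guideline_excerpt = "...paste a verified excerpt from a current guideline here..."
question = "What does this guideline recommend as the starting dose?"

prompt = (
    "Answer ONLY using the guideline text below. If the text does not contain "
    "the answer, say that it does not.\n\n"
    f"GUIDELINE TEXT:\n{guideline_excerpt}\n\n"
    f"QUESTION: {question}"
)
print(prompt)  # send this prompt to whichever chat model you use
```

This does not eliminate hallucination, but it narrows the model's source material to text you have already verified.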

Confidence Calibration

Most models are not well-calibrated in expressing uncertainty. They may present uncertain information with the same confident tone as well-established facts.
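Calibration can be quantified. The hedged sketch below computes a rough expected calibration error (ECE): the gap between how confident a model claims to be and how often it is actually right. The data here is entirely hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Rough ECE: weighted average gap between stated confidence and accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical model that claims 90% confidence but is right only 60% of the time:
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))  # ~0.3
```

A well-calibrated model would have an ECE near zero; a model that sounds sure regardless of accuracy will not.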

How to Detect Medical AI Hallucinations

1. Verify Citations

If the AI cites a specific study, author, or journal:

  • Search PubMed (pubmed.ncbi.nlm.nih.gov) for the study
  • Search Google Scholar for the citation
  • If you cannot find it, it may be fabricated (a minimal lookup sketch follows this list)
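The PubMed lookup in step 1 can also be automated against NCBI's public E-utilities endpoint. The sketch below is illustrative only; the example query mirrors the fabricated citation described earlier, and zero hits is a red flag rather than proof that a citation is fake (titles and author lists are often paraphrased).

```python
import requests

# PubMed search via NCBI E-utilities (esearch), returning the number of hits.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hits(query: str) -> int:
    resp = requests.get(
        ESEARCH,
        params={"db": "pubmed", "term": query, "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

# Hypothetical citation as the AI phrased it:
print(pubmed_hits('Morrison[Author] AND "Lancet"[Journal] AND 2023[PDAT]'))
```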

2. Cross-Reference Claims

Check key claims against trusted sources:

  • Mayo Clinic (mayoclinic.org)
  • UpToDate (uptodate.com)
  • CDC (cdc.gov)
  • WHO (who.int)

3. Be Skeptical of Precision

Statements with very specific numbers (“83.7% of patients…”) without cited sources are often fabricated. Real medical statistics come with confidence intervals and specific study contexts.
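To see why a bare percentage deserves skepticism, the sketch below computes a 95% Wilson score interval for a hypothetical "83.7% response rate" reported from a small study: the plausible range spans roughly 70% to 92%, which is why legitimate sources report an interval and a sample size, not a lone decimal.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical study: 36 responders out of 43 patients (~83.7%)
lo, hi = wilson_interval(36, 43)
print(f"point estimate: {36/43:.1%}, 95% CI: {lo:.1%} to {hi:.1%}")
```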

4. Watch for Outdated Information

Medical guidelines change. If AI recommends a treatment approach, verify it reflects current guidelines, not superseded ones.

5. Ask the Model About Its Confidence

Some models will acknowledge uncertainty when asked directly: “How confident are you in this information?” or “Are there any claims in your response that you are less certain about?”
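One way to make this a habit is a follow-up "uncertainty probe" in the same conversation. The sketch below uses the OpenAI Python SDK purely as one example; it assumes the openai package is installed and an API key is configured, the question is hypothetical, and the same two-turn pattern works with any chat API.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

history = [
    {"role": "user", "content": "What is the usual starting dose of metformin?"},
]
first = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up probe: ask the model to flag its own least certain claims.
history.append({
    "role": "user",
    "content": "Which claims in your previous answer are you least certain about, "
               "and which should I verify with a primary source?",
})
probe = client.chat.completions.create(model="gpt-4o", messages=history)
print(probe.choices[0].message.content)
```

Treat the model's self-assessment as a hint for what to verify first, not as verification itself.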


Model-Specific Hallucination Patterns

GPT-4

Most commonly fabricates citations. When asked to cite studies, GPT-4 may construct plausible-sounding references (correct journal format, reasonable-sounding titles) for studies that do not exist. Factual accuracy of medical content is generally high, but citation reliability is a weakness.

Claude

Tends to avoid hallucination by hedging more aggressively. Rather than stating a possibly wrong fact confidently, Claude is more likely to say “I’m not certain” or “you should verify this.” This reduces hallucination at the cost of occasionally being less helpful.

Gemini

Hallucination patterns are mixed. Sometimes produces highly accurate, well-sourced responses; other times includes incorrect medical facts with the same confident tone.

Med-PaLM 2

Lowest hallucination rate among evaluated models, likely due to medical-specific fine-tuning and guardrails. Most common issue is slightly outdated guidelines rather than fabricated facts.

The Real-World Impact

Medical AI hallucinations have consequences:

  • Patients may delay necessary care based on false reassurance
  • Patients may seek unnecessary emergency care based on fabricated risk information
  • Self-medication based on hallucinated drug information could cause direct harm
  • Trust in medical AI erodes when hallucinations are discovered, reducing the potential benefits of accurate AI health information

Key Takeaways

  • Medical AI hallucinations are a real, documented safety concern across all models.
  • Med-PaLM 2 and Claude show the lowest hallucination rates, through different mechanisms: medical fine-tuning (Med-PaLM 2) and uncertainty acknowledgment (Claude).
  • The most common hallucination type is fabricated citations — always verify any study or source cited by AI.
  • No AI model should be trusted without independent verification of key claims, especially for medical decisions.
  • Asking the model about its own confidence can sometimes reveal uncertainty, but this is not a reliable detection method.

Published on mdtalks.com | Editorial Team | Last updated: 2026-03-10

DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.