
Medical AI Hallucination Rates: Which Model Gets Facts Wrong?

Updated 2026-03-10

Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.


DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.


AI hallucination — when a model generates confident, plausible-sounding statements that are factually wrong — is one of the most dangerous problems in medical AI. A hallucinated drug interaction, a fabricated study, or an incorrect dosage recommendation could cause real patient harm.

This article examines what we know about hallucination rates across medical AI models and how to protect yourself.

What Are Medical AI Hallucinations?

Medical AI hallucinations fall into several categories:

Fabricated Citations

The model cites studies, journals, or authors that do not exist. Example: “According to a 2023 study in The Lancet by Dr. James Morrison…” — where no such study or author exists.

Incorrect Drug Information

Wrong dosages, fabricated drug interactions, or incorrect contraindications. Example: “Metformin is typically started at 2000mg twice daily” (actual starting dose is usually 500mg).

False Statistical Claims

Invented statistics presented with false precision. Example: “87.3% of patients with this condition respond to treatment” — where no such statistic exists.

Confident Misstatements

Incorrect medical facts stated with high confidence. Example: “The appendix is located on the left side of the abdomen” (it is on the right).

Anachronistic Information

Outdated guidelines or recommendations presented as current. Example: Recommending a treatment approach that has been superseded by updated guidelines.

Hallucination Rates by Model

Precise hallucination rates are difficult to measure because they depend on the topic, question format, and evaluation methodology. However, published evaluations and our testing suggest the following relative ranking:

Model | Estimated Medical Hallucination Rate | Hallucination Type
Med-PaLM 2 | Lowest (~2-5% on evaluated queries) | Primarily anachronistic information
Claude 3.5 | Low (~3-7%) | Tends to hedge rather than hallucinate; when wrong, often explicitly uncertain
GPT-4 | Low-Moderate (~5-10%) | Fabricated citations, false-precision statistics
Gemini | Moderate (~8-12%) | Mixed types
Open-source models | High (~15-30%) | All types; less filtering

Critical caveats:

  • These are approximate ranges based on available evaluations, not definitive measurements
  • Rates vary significantly by topic, question difficulty, and evaluation methodology
  • Models are continuously updated; rates may change over time


Why Models Hallucinate About Medicine

Training Data Limitations

Models are trained on internet text that includes both accurate medical sources and inaccurate health misinformation. The model cannot always distinguish between the two.

Statistical Pattern Completion

LLMs generate text by predicting likely next tokens. A plausible-sounding but incorrect medical statement may be statistically likely given the preceding context.
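As a toy illustration (not a real model), the sketch below shows the basic mechanism: every candidate next token gets a score, the scores are converted to probabilities, and one token is drawn. A fluent but wrong continuation can simply be the most probable one. All tokens and scores here are made up for illustration.

```python
import numpy as np

# Toy next-token step: hypothetical candidate tokens and model scores.
vocab = ["500mg", "850mg", "1000mg", "2000mg"]
logits = np.array([2.1, 1.4, 0.9, 1.8])  # made-up scores, not from any real model

probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the scores
next_token = np.random.choice(vocab, p=probs)  # sample one continuation

for tok, p in zip(vocab, probs):
    print(f"{tok}: {p:.2f}")
print("sampled continuation:", next_token)
```

Nothing in this process checks the sampled token against medical reality; plausibility under the training distribution is the only criterion.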

Lack of Grounding

Without access to real-time databases, current guidelines, or verified medical knowledge bases, models rely entirely on patterns learned during training.
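One practical mitigation is to supply the grounding yourself: paste a verified excerpt from a current source into the prompt and instruct the model to answer only from that text. A minimal sketch, using a placeholder excerpt and question, might look like this:

```python
# Minimal "grounding" sketch: answer only from a verified excerpt you provide,
# rather than from the model's training-time memory.
guideline_excerpt = "...paste a verified excerpt from a current guideline here..."
question = "What does this guideline recommend as the starting dose?"

prompt = (
    "Answer ONLY using the guideline text below. If the text does not contain "
    "the answer, say that it does not.\n\n"
    f"GUIDELINE TEXT:\n{guideline_excerpt}\n\n"
    f"QUESTION: {question}"
)
print(prompt)  # send this prompt to whichever chat model you use
```

This does not eliminate hallucination, but it narrows the model's source material to text you have already verified.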

Confidence Calibration

Most models are not well-calibrated in expressing uncertainty. They may present uncertain information with the same confident tone as well-established facts.
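Calibration can be quantified. The hedged sketch below computes a rough expected calibration error (ECE): the gap between how confident a model claims to be and how often it is actually right. The data here is entirely hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Rough ECE: weighted average gap between stated confidence and accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical model that claims 90% confidence but is right only 60% of the time:
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))  # ~0.3
```

A well-calibrated model would have an ECE near zero; a model that sounds sure regardless of accuracy will not.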

How to Detect Medical AI Hallucinations

1. Verify Citations

If the AI cites a specific study, author, or journal:

  • Search PubMed (pubmed.ncbi.nlm.nih.gov) for the study
  • Search Google Scholar for the citation
  • If you cannot find it, it may be fabricated (a minimal lookup sketch follows this list)
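The PubMed lookup in step 1 can also be automated against NCBI's public E-utilities endpoint. The sketch below is illustrative only; the example query mirrors the fabricated citation described earlier, and zero hits is a red flag rather than proof that a citation is fake (titles and author lists are often paraphrased).

```python
import requests

# PubMed search via NCBI E-utilities (esearch), returning the number of hits.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hits(query: str) -> int:
    resp = requests.get(
        ESEARCH,
        params={"db": "pubmed", "term": query, "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

# Hypothetical citation as the AI phrased it:
print(pubmed_hits('Morrison[Author] AND "Lancet"[Journal] AND 2023[PDAT]'))
```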

2. Cross-Reference Claims

Check key claims against trusted sources:

  • Mayo Clinic (mayoclinic.org)
  • UpToDate (uptodate.com)
  • CDC (cdc.gov)
  • WHO (who.int)

3. Be Skeptical of Precision

Statements with very specific numbers (“83.7% of patients…”) without cited sources are often fabricated. Real medical statistics come with confidence intervals and specific study contexts.
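To see why a bare percentage deserves skepticism, the sketch below computes a 95% Wilson score interval for a hypothetical "83.7% response rate" reported from a small study: the plausible range spans roughly 70% to 92%, which is why legitimate sources report an interval and a sample size, not a lone decimal.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical study: 36 responders out of 43 patients (~83.7%)
lo, hi = wilson_interval(36, 43)
print(f"point estimate: {36/43:.1%}, 95% CI: {lo:.1%} to {hi:.1%}")
```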

4. Watch for Outdated Information

Medical guidelines change. If AI recommends a treatment approach, verify it reflects current guidelines, not superseded ones.

5. Ask the Model About Its Confidence

Some models will acknowledge uncertainty when asked directly: “How confident are you in this information?” or “Are there any claims in your response that you are less certain about?”
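One way to make this a habit is a follow-up "uncertainty probe" in the same conversation. The sketch below uses the OpenAI Python SDK purely as one example; it assumes the openai package is installed and an API key is configured, the question is hypothetical, and the same two-turn pattern works with any chat API.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

history = [
    {"role": "user", "content": "What is the usual starting dose of metformin?"},
]
first = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up probe: ask the model to flag its own least certain claims.
history.append({
    "role": "user",
    "content": "Which claims in your previous answer are you least certain about, "
               "and which should I verify with a primary source?",
})
probe = client.chat.completions.create(model="gpt-4o", messages=history)
print(probe.choices[0].message.content)
```

Treat the model's self-assessment as a hint for what to verify first, not as verification itself.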


Model-Specific Hallucination Patterns

GPT-4

Most commonly fabricates citations. When asked to cite studies, GPT-4 may construct plausible-sounding references (correct journal format, reasonable-sounding titles) for studies that do not exist. Factual accuracy of medical content is generally high, but citation reliability is a weakness.

Claude

Tends to avoid hallucination by hedging more aggressively. Rather than stating a possibly wrong fact confidently, Claude is more likely to say “I’m not certain” or “you should verify this.” This reduces hallucination at the cost of occasionally being less helpful.

Gemini

Hallucination patterns are mixed. Sometimes produces highly accurate, well-sourced responses; other times includes incorrect medical facts with the same confident tone.

Med-PaLM 2

Lowest hallucination rate among evaluated models, likely due to medical-specific fine-tuning and guardrails. Most common issue is slightly outdated guidelines rather than fabricated facts.

The Real-World Impact

Medical AI hallucinations have consequences:

  • Patients may delay necessary care based on false reassurance
  • Patients may seek unnecessary emergency care based on fabricated risk information
  • Self-medication based on hallucinated drug information could cause direct harm
  • Trust in medical AI erodes when hallucinations are discovered, reducing the potential benefits of accurate AI health information

Key Takeaways

  • Medical AI hallucinations are a real, documented safety concern across all models.
  • Med-PaLM 2 and Claude show the lowest hallucination rates, through different mechanisms: medical fine-tuning (Med-PaLM 2) and uncertainty acknowledgment (Claude).
  • The most common hallucination type is fabricated citations — always verify any study or source cited by AI.
  • No AI model should be trusted without independent verification of key claims, especially for medical decisions.
  • Asking the model about its own confidence can sometimes reveal uncertainty, but this is not a reliable detection method.

Published on mdtalks.com | Editorial Team | Last updated: 2026-03-10

DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.