AI Answers About Tuberculosis: Model Comparison
Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.
DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.
Tuberculosis (TB) remains one of the world’s deadliest infectious diseases, with approximately 10.6 million new cases and 1.3 million deaths globally each year. In the United States, roughly 8,300 cases are reported annually, disproportionately affecting foreign-born individuals, who account for approximately 72 percent of US TB cases. Latent TB infection affects an estimated one-quarter of the global population, though only about 5 to 10 percent of these individuals will develop active disease. Multidrug-resistant TB (MDR-TB) is a growing concern, with approximately 450,000 new cases globally each year.
We asked four AI models about tuberculosis to evaluate their diagnostic and management guidance.
The Question We Asked
“I’m a 34-year-old man who immigrated from India five years ago. I’ve had a persistent cough for about six weeks, along with night sweats, fatigue, and I’ve lost about 10 pounds without trying. A friend at work was recently diagnosed with TB. I had a BCG vaccine as a child. Should I be concerned about tuberculosis? What tests should I expect?”
Model Responses: Summary Comparison
| Criteria | GPT-4 | Claude 3.5 | Gemini | Med-PaLM 2 |
|---|---|---|---|---|
| Identified TB as likely diagnosis | Yes | Yes | Yes | Yes |
| Addressed BCG vaccine impact on testing | Yes | Yes | Partial | Yes |
| Recommended IGRA over TST | Yes | Yes | No | Yes |
| Discussed chest X-ray | Yes | Yes | Yes | Yes |
| Mentioned sputum testing | Yes | Yes | Yes | Yes |
| Discussed treatment duration | Yes | Yes | Partial | Yes |
| Addressed public health implications | Yes | Yes | Partial | Yes |
| Discussed contact tracing | Yes | Yes | Partial | Yes |
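To make the table easier to compare at a glance, the ratings can be tallied per model. The snippet below is a minimal sketch, not the scoring method used for this article: the numeric weights (Yes = 1, Partial = 0.5, No = 0) and the names `WEIGHTS`, `RATINGS`, and `tally` are assumptions introduced for illustration only.

```python
# Ratings transcribed from the comparison table above, one row per criterion.
MODELS = ["GPT-4", "Claude 3.5", "Gemini", "Med-PaLM 2"]

RATINGS = {
    "Identified TB as likely diagnosis":      ["Yes", "Yes", "Yes", "Yes"],
    "Addressed BCG vaccine impact on testing": ["Yes", "Yes", "Partial", "Yes"],
    "Recommended IGRA over TST":              ["Yes", "Yes", "No", "Yes"],
    "Discussed chest X-ray":                  ["Yes", "Yes", "Yes", "Yes"],
    "Mentioned sputum testing":               ["Yes", "Yes", "Yes", "Yes"],
    "Discussed treatment duration":           ["Yes", "Yes", "Partial", "Yes"],
    "Addressed public health implications":   ["Yes", "Yes", "Partial", "Yes"],
    "Discussed contact tracing":              ["Yes", "Yes", "Partial", "Yes"],
}

# Hypothetical weights: the article does not assign numeric values to ratings.
WEIGHTS = {"Yes": 1.0, "Partial": 0.5, "No": 0.0}


def tally(ratings, models):
    """Sum the weighted ratings for each model across all criteria."""
    totals = {model: 0.0 for model in models}
    for row in ratings.values():
        for model, rating in zip(models, row):
            totals[model] += WEIGHTS[rating]
    return totals


scores = tally(RATINGS, MODELS)
for model, score in scores.items():
    print(f"{model}: {score:.1f} / {len(RATINGS)}")
```

Equal weighting treats every criterion as equally important, which is a simplification. A reviewer who considers the IGRA recommendation more clinically significant than, say, mentioning a chest X-ray would want to weight the rows differently.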
What Each Model Got Right
GPT-4
GPT-4 correctly assessed the high pretest probability for TB given the symptom pattern, endemic country of origin, and known TB contact. The model provided an excellent discussion of TB diagnostics, explaining that an interferon-gamma release assay (IGRA) such as QuantiFERON-TB Gold is preferred over the tuberculin skin test (TST) in BCG-vaccinated individuals, because IGRA results are not affected by prior BCG vaccination. GPT-4 discussed the diagnostic workup, including chest X-ray, sputum smear microscopy, culture, and molecular testing with GeneXpert. The model outlined the standard 6-month treatment regimen: 2 months of RIPE therapy (rifampin, isoniazid, pyrazinamide, and ethambutol) followed by 4 months of isoniazid and rifampin.
Claude 3.5
Claude 3.5 provided the most reassuring and well-structured response, acknowledging the patient’s concern while clearly explaining the diagnostic process. The model correctly stated that active TB is treatable and curable with proper antibiotic therapy. It provided a clear explanation of the difference between latent and active TB, which is important for patient understanding. Claude 3.5 discussed the BCG vaccine’s impact on skin testing and recommended IGRA. The model also addressed isolation precautions during the diagnostic workup and the importance of contact tracing.
Gemini
Gemini correctly identified TB as a strong concern given the clinical picture and recommended immediate medical evaluation. The model discussed chest X-ray and sputum testing and provided a clear explanation of how TB is transmitted through airborne droplets. Gemini was effective at explaining why the combination of chronic cough, night sweats, weight loss, and a known contact strongly suggests TB evaluation is needed.
Med-PaLM 2
Med-PaLM 2 delivered the most clinically comprehensive response, discussing the full diagnostic algorithm from clinical suspicion through microbiological confirmation. The model discussed acid-fast bacilli smear, mycobacterial culture, nucleic acid amplification testing, and drug susceptibility testing. Med-PaLM 2 addressed the treatment regimen in detail including directly observed therapy (DOT) requirements and the rationale for multi-drug combinations to prevent resistance. The model also discussed the public health infrastructure for TB management including mandatory reporting and contact investigation.
What Each Model Got Wrong or Missed
GPT-4
GPT-4 did not adequately discuss the possibility of drug-resistant TB, which is relevant given the patient’s origin from India, a country with significant MDR-TB burden. The model also did not address the patient’s obligations regarding workplace notification and public health processes.
Claude 3.5
Claude 3.5 did not discuss the specific treatment regimen in detail. While that omission is reasonable for an initial consultation, it left the patient without a clear picture of the treatment commitment. For a condition requiring 6 to 9 months of treatment with multiple medications, setting expectations early is valuable. The model also did not mention drug resistance considerations.
Gemini
Gemini did not clearly recommend IGRA over TST for a BCG-vaccinated individual, which is a clinically important distinction. The model also provided limited information about the treatment process, leaving the patient without understanding of what TB treatment involves. Contact tracing and public health reporting were insufficiently discussed.
Med-PaLM 2
Med-PaLM 2 provided extensive clinical detail that may be overwhelming for a worried patient. The model did not adequately address the emotional dimension of a potential TB diagnosis, including stigma concerns and the anxiety of potentially having exposed others. Practical guidance about what to do immediately, such as covering the cough and wearing a mask, was insufficient.
Red Flags All Models Should Mention
All AI models should urgently flag these concerns in the context of possible TB:
- Hemoptysis (coughing up blood), which is common in cavitary TB and requires urgent evaluation
- Symptoms suggesting extrapulmonary TB including bone pain, neck stiffness, or abdominal swelling
- Close contact with infants, elderly, or immunocompromised individuals who are at highest risk for severe TB
- Symptoms of TB meningitis including severe headache, altered mental status, and neck stiffness
- History of prior incomplete TB treatment, which increases MDR-TB risk
- Any immunocompromising condition including HIV, which significantly increases TB risk and severity
When to Trust AI vs. See a Doctor
When AI Information May Be Helpful
AI tools can help individuals from TB-endemic regions recognize symptom patterns that warrant evaluation, overcoming the tendency to attribute chronic cough to other causes. AI can also explain diagnostic tests and treatment expectations, helping patients feel prepared for medical encounters.
When You Must See a Doctor
Suspected active TB requires immediate medical evaluation. TB is a public health emergency that involves mandatory reporting, contact investigation, and often directly observed therapy. Diagnosis requires laboratory confirmation that only a healthcare facility can provide. Treatment involves multiple medications taken for months under medical supervision, with regular monitoring for drug side effects. Self-treatment or delayed treatment increases transmission risk and the chance of drug resistance.
For more on how AI handles infectious disease questions, see whether AI can replace your doctor.
Methodology
We submitted the identical patient scenario to GPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Med-PaLM 2 in March 2026. Each model received the prompt without prior conversation context. Responses were evaluated by an infectious disease specialist against current CDC and WHO TB guidelines. Models were scored on diagnostic accuracy, testing recommendations, treatment knowledge, and public health awareness.
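The collection step described above (one identical prompt, sent to each model with no prior conversation) can be sketched as follows. This is an illustrative outline under stated assumptions, not the harness used for this article: `AskFn`, `run_comparison`, and `make_stub` are hypothetical names, and the stub functions stand in for real vendor API clients, which are not shown here.

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes a prompt and returns a reply.
AskFn = Callable[[str], str]


def run_comparison(prompt: str, models: Dict[str, AskFn]) -> Dict[str, str]:
    """Send the same prompt to every model as a fresh, single-turn request."""
    responses = {}
    for name, ask in models.items():
        # Only the prompt is passed, so no earlier conversation carries over.
        responses[name] = ask(prompt)
    return responses


def make_stub(name: str) -> AskFn:
    """Build a placeholder client; a real one would call the model's API."""
    def ask(prompt: str) -> str:
        return f"[{name}] reply to a {len(prompt)}-character prompt"
    return ask


model_names = ["GPT-4", "Claude 3.5 Sonnet", "Gemini 1.5 Pro", "Med-PaLM 2"]
models = {name: make_stub(name) for name in model_names}

# Shortened stand-in for the full patient scenario quoted earlier.
responses = run_comparison("Should I be concerned about tuberculosis?", models)
for name, reply in responses.items():
    print(name, "->", reply)
```

Keeping each client behind the same single-argument function makes it harder for one model to accidentally receive extra context, such as a system prompt or earlier turns, that the others did not.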
Key Takeaways
- All four models correctly identified TB as a strong diagnostic possibility and recommended urgent medical evaluation.
- Testing recommendations were most accurate from GPT-4, Claude 3.5, and Med-PaLM 2, which correctly recommended IGRA over TST for BCG-vaccinated individuals, while Gemini failed to make this distinction.
- Public health dimensions including contact tracing and mandatory reporting were best addressed by GPT-4 and Med-PaLM 2.
- Drug resistance considerations, particularly relevant given the patient’s origin from a high MDR-TB burden country, were inadequately addressed by all models.
- Suspected TB requires immediate professional medical evaluation and public health involvement, and AI should serve strictly as a tool that prompts patients to seek care.