In 2016, Geoffrey Hinton said we should stop training radiologists because deep learning would outperform them within five years. A decade later, radiologists are still working. AI, however, has become one of their most effective diagnostic tools.
The claim "AI will replace doctors" is media-friendly but scientifically false. The claim "AI is just a tool and won’t change medicine" is equally false. The truth lies in the mechanisms: where AI excels, where it fails, what this means for system design, and what legal obligations apply.
What AI actually can do in medicine
The best-documented results involve perceptual tasks on large datasets. In dermatology, convolutional models classify skin lesions with sensitivity comparable to that of experienced dermatologists. In ophthalmology, retinal imaging systems detect diabetic retinopathy at a level of accuracy that previously required specialists screening hundreds of patients a month. In radiology, AI reduces missed lung abnormalities by 20-40% under high workloads.
These results are real and noteworthy. But they share a common denominator: they address well-defined, repetitive tasks with large training samples and clear labels. Outside this scope, reliability drops.
Sepsis risk prediction from electronic health records, early detection of ICU deterioration, triage from ECG images—these are additional validated applications. What they share: AI processes signals faster and more accurately than humans under pressure, within a narrow task window. It doesn’t replace doctors. It gives them better signals, sooner.
Where AI fails and why it matters
Two weaknesses are structural, not incidental.
First is the black box problem. A neural network classifying skin lesions can’t explain its reasoning. It may learn from artifacts: background color, watermarks, or dataset biases. Studies show models labeled "better than dermatologists" lost their edge when tested on images from different cameras or centers. This is shortcut learning and distribution shift in a domain with near-zero tolerance for error.
Second is the clinical context problem. A patient with dyspnea who works in a mine is different from a nonsmoker with dyspnea at a desk—even if X-rays look identical. AI processes input data. Doctors process patients in their lives. This isn’t a barrier scalable models can overcome.
Additionally, there are systematic biases. If training data comes mostly from one demographic, the model learns that group. A 2024 NEJM study found cardiovascular risk prediction models were systematically inaccurate for women and patients from Sub-Saharan Africa. Deploying such a model without audit is a medical event, not just a technical one.
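A subgroup audit of the kind described above can be sketched in a few lines. This is a minimal illustration, not a validated audit procedure: the group labels, the synthetic records, and the 0.10 gap threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, y_true, y_pred) with binary labels."""
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    # Only groups with at least one positive case get a sensitivity value.
    return {
        group: true_pos[group] / (true_pos[group] + false_neg[group])
        for group in set(true_pos) | set(false_neg)
    }


def flag_gaps(sensitivities, max_gap=0.10):
    """Return groups whose sensitivity trails the best group by more than max_gap."""
    best = max(sensitivities.values())
    return sorted(g for g, s in sensitivities.items() if best - s > max_gap)


# Synthetic records for illustration only (hypothetical groups, no real data).
records = (
    [("group_a", 1, 1)] * 9 + [("group_a", 1, 0)] * 1
    + [("group_b", 1, 1)] * 6 + [("group_b", 1, 0)] * 4
    + [("group_a", 0, 0)] * 10 + [("group_b", 0, 0)] * 10
)
sens = sensitivity_by_group(records)
flagged = flag_gaps(sens)
print(sens, flagged)
```

Sensitivity is only one metric. A fuller audit would likely also compare specificity and calibration per subgroup, and would need enough cases per group for the estimates to be meaningful.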
AI Act: medicine as a high-risk domain
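This isn’t optional. The AI Act (Regulation (EU) 2024/1689) entered into force in August 2024, and its high-risk obligations phase in over the following years. Most medical AI qualifies as high-risk, mainly under Article 6(1) as a medical device or safety component subject to third-party conformity assessment, and in some cases under Annex III (for example, emergency patient triage). High-risk status imposes specific technical and documentation requirements before deployment.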
Key requirements for high-risk medical systems:
| Requirement | What it means in practice |
|---|---|
| Human oversight | Physicians must have the ability to challenge or override AI recommendations |
| Transparency and explainability | AI decisions must be explainable to allow verification |
| Risk management | Documented risk analysis before deployment and after significant changes |
| Audit logs | Every AI-assisted decision is logged—who, when, what the model suggested, what the physician decided |
| Training data | Documentation of data sources, representativeness, and validation procedures |
| Data protection impact assessment (DPIA) | Required under GDPR when the system processes health data at scale or makes automated decisions about people |
Systems failing these requirements cannot be legally deployed in the EU. For medical software providers, this means compliance architecture must be planned from the first line of code, not bolted on before certification. This principle mirrors our approach in every enterprise deployment: compliance is a design decision, not a patch.
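The audit-log row in the table (who, when, what the model suggested, what the physician decided) maps naturally onto a small record type. The sketch below is one possible shape, not a format prescribed by the AI Act. Every field name, the identifiers, and the JSON-lines layout are assumptions made for illustration.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One AI-assisted decision. Field names are illustrative assumptions."""
    timestamp: str           # when the decision was recorded (UTC, ISO 8601)
    clinician_id: str        # who decided
    patient_pseudonym: str   # pseudonym, not a direct identifier
    model_version: str       # which model produced the suggestion
    model_suggestion: str    # what the model suggested
    clinician_decision: str  # what the physician decided
    overridden: bool         # True when the decision differs from the suggestion


def make_record(clinician_id, patient_pseudonym, model_version,
                model_suggestion, clinician_decision):
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        clinician_id=clinician_id,
        patient_pseudonym=patient_pseudonym,
        model_version=model_version,
        model_suggestion=model_suggestion,
        clinician_decision=clinician_decision,
        overridden=(model_suggestion != clinician_decision),
    )


def write_record(stream, record):
    """Append one record as a JSON line to any writable text stream."""
    stream.write(json.dumps(asdict(record)) + "\n")


# Usage with an in-memory stream; a real system would use append-only storage.
log = io.StringIO()
record = make_record("clin-017", "a3f9", "sepsis-model-1.2",
                     "start sepsis protocol", "observe and repeat lactate")
write_record(log, record)
print(log.getvalue())
```

Recording the override flag explicitly makes it cheap to measure later how often clinicians disagree with the model, which is itself a useful monitoring signal.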
Explainability: from buzzword to legal obligation
For years, explainability was an academic topic. The AI Act turned it into a legal requirement for high-risk systems. In medicine, this means concrete architecture.
SHAP and attention maps are the most common post-hoc methods: the model shows which pixels or features influenced the decision. Useful diagnostically, but limited—they show correlation, not causation.
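To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny three-feature model by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library, which relies on approximations for realistic models. The toy risk score, its coefficients, and the baseline patient are invented for the example and have no clinical meaning.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features are switched from their baseline value to their actual value
    one at a time, in every possible order, and the marginal changes are averaged.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            value = model(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orders) for p in phi]


def toy_risk_score(features):
    """Hypothetical score with one interaction term (age x smoker)."""
    age, systolic_bp, smoker = features
    return 0.02 * age + 0.01 * systolic_bp + 0.5 * smoker + 0.01 * age * smoker


x = [60, 140, 1]         # patient being explained (invented values)
baseline = [50, 120, 0]  # reference patient (invented values)
phi = shapley_values(toy_risk_score, x, baseline)
print(phi, sum(phi))
```

The attributions sum to the difference between the patient's score and the baseline score. They describe how this model uses its inputs, not what causes the disease, which is the limitation noted above.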
Inherently explainable models (decision trees, logistic regression with feature selection) are easier to audit but weaker perceptually. In image diagnostics, they can’t replace convolutional networks.
Retrieval-Augmented Generation (RAG) introduces a different explainability model: the system grounds its answers in passages retrieved from a verified knowledge base, rather than relying on model weights alone, and cites its sources. A clinical assistant based on RAG can show which ESC or AHA guidelines a recommendation comes from, a level of explainability pure LLMs can’t match. We describe a similar architecture in enterprise knowledge assistants.
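The citation behavior can be illustrated without any model at all. The sketch below covers only the retrieval and citation step, using naive word overlap in place of a real search index, and it declines to answer when nothing relevant is found. The knowledge base entries are placeholders, not real guideline text, and the function names and thresholds are assumptions.

```python
import re

# Placeholder entries; a real system would index verified guideline documents.
KNOWLEDGE_BASE = [
    {"source": "GUIDELINE-A sec. 2.1 (placeholder)",
     "text": "Placeholder passage about anticoagulation dosing in atrial fibrillation."},
    {"source": "GUIDELINE-B sec. 4.3 (placeholder)",
     "text": "Placeholder passage about blood pressure targets in hypertension."},
]


def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, kb, min_overlap=2, top_k=2):
    """Rank passages by shared words; drop those below min_overlap."""
    query = tokenize(question)
    scored = []
    for doc in kb:
        overlap = len(query & tokenize(doc["text"]))
        if overlap >= min_overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def answer_with_citations(question, kb):
    """Return retrieved passages plus their sources, or no answer at all."""
    hits = retrieve(question, kb)
    if not hits:
        return {"answer": None, "citations": []}
    return {
        "answer": " ".join(doc["text"] for doc in hits),
        "citations": [doc["source"] for doc in hits],
    }


result = answer_with_citations(
    "What are the blood pressure targets in hypertension?", KNOWLEDGE_BASE)
print(result["citations"])
```

In a full pipeline the retrieved passages would be handed to a language model as context. The design point is that every answer carries the sources it was built from, and that an empty retrieval produces a refusal instead of an unsupported answer.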
In designing systems for regulated sectors, we follow this principle: if you can’t explain a model’s decision in domain language, the model shouldn’t make that decision autonomously.
Human-in-the-loop: a mechanism, not philosophy
“Human oversight” sounds like an ethical principle. In system engineering, it’s a concrete pattern, the human gate: a decision point that no action can pass without human confirmation.
In medicine, an NLP assistant might suggest a differential diagnosis with probabilities. The physician decides which tests to order. AI doesn’t write orders autonomously—that’s the gate. In ICU alerting systems, AI might generate a sepsis score. A nurse confirms or rejects it before the protocol starts—that’s the gate. In radiology, AI flags areas for review. The radiologist verifies before reporting—that’s the gate.
This pattern (model recommends, human approves irreversible actions) is the same one we use for enterprise AI agents: every action with external consequences requires confirmation before execution. In medicine, external consequences mean patient health—the gate requirement is absolute.
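A minimal sketch of the gate follows, assuming a simple in-process workflow. The class and function names are hypothetical. The point is structural: the only path to execution runs through an explicit human approval, and anything else raises an error.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


class GateError(RuntimeError):
    """Raised when an action is attempted without human approval."""


@dataclass
class Proposal:
    action: str                      # what the model recommends
    rationale: str                   # why, in terms the reviewer can check
    status: Status = Status.PENDING
    reviewer: Optional[str] = None   # set only by a human review


def review(proposal, reviewer, approve):
    """Record the human decision on a model proposal."""
    proposal.status = Status.APPROVED if approve else Status.REJECTED
    proposal.reviewer = reviewer


def execute(proposal, executed):
    """Carry out the action only if a human has approved it."""
    if proposal.status is not Status.APPROVED:
        raise GateError(f"'{proposal.action}' has status {proposal.status.value}")
    executed.append((proposal.action, proposal.reviewer))


# Usage: the model proposes, a nurse approves, and only then does the action run.
executed = []
proposal = Proposal("start sepsis protocol", "sepsis score above alert threshold")
try:
    execute(proposal, executed)      # blocked: still pending
except GateError as err:
    print("blocked:", err)
review(proposal, reviewer="nurse-042", approve=True)
execute(proposal, executed)
print(executed)
```

Keeping the check inside `execute`, and not in the calling code, means a new caller cannot forget it.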
Data, privacy, and GDPR in clinical systems
Medicine is one of the most challenging domains for AI data processing: health data is a special category under GDPR (known in Poland as RODO), and processing it requires a legal basis under Article 9.
Key practical principles for compliant deployments:
Data minimization. The model gets only what’s necessary for the task. Identifying data is masked or pseudonymized before processing—we detail this in PII anonymization.
Processing location. Health data may require processing within the EU or Poland. Self-hosting LLMs or contracts with EU-based providers eliminate this issue structurally.
Retention and right to erasure. AI decision logs must be retained for accountability but no longer than necessary. Patients have the right to request data deletion and access to automated decisions—architecture must support this technically, not just procedurally.
DPIA is required for large-scale health data processing or automated decisions about patients. It’s not a one-time document: it must be updated with every significant system change.
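Data minimization, pseudonymization, retention, and erasure can each be expressed as a small function. The sketch below assumes a keyed hash (HMAC-SHA256) for pseudonyms, a hypothetical allowlist of fields, and a one-year retention window picked only for the example. Key management and the actual retention period are out of scope and would be set by the deployment's legal basis.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Hypothetical allowlist: only the fields this particular task needs.
ALLOWED_FIELDS = {"age", "heart_rate", "lactate"}


def pseudonymize(patient_id, secret_key):
    """Keyed hash of the identifier; stable for one key, not reversible without it."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record, secret_key):
    """Keep allowlisted fields and replace the identifier with a pseudonym."""
    out = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    out["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return out


def purge_expired(logs, now, retention):
    """Drop log entries older than the retention window."""
    return [entry for entry in logs if now - entry["created_at"] <= retention]


def erase_patient(logs, pseudonym):
    """Remove every entry belonging to one pseudonymized patient."""
    return [entry for entry in logs if entry["pseudonym"] != pseudonym]


# Usage with invented values.
key = b"example-key-do-not-use"
raw = {"patient_id": "PAT-0001", "name": "Jane Doe", "age": 71,
       "heart_rate": 112, "lactate": 3.4}
safe = minimize(raw, key)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
logs = [
    {"pseudonym": safe["pseudonym"], "created_at": now - timedelta(days=1)},
    {"pseudonym": "other", "created_at": now - timedelta(days=800)},
]
kept = purge_expired(logs, now, retention=timedelta(days=365))
after_erasure = erase_patient(kept, safe["pseudonym"])
print(sorted(safe), len(kept), len(after_erasure))
```

Pseudonymized data is still personal data under GDPR (RODO), so this reduces exposure without removing the need for a legal basis. Using a keyed hash instead of a plain hash keeps someone who knows the identifier format from recomputing the pseudonyms.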
FAQ
Are medical AI systems considered high risk under the AI Act?
In the vast majority of cases, yes. AI that is itself a medical device, or a safety component of one, and that requires third-party conformity assessment under the MDR or IVDR is high-risk under Article 6(1) of the AI Act; Annex III adds further healthcare uses such as emergency patient triage. This means requirements for technical documentation, risk management, decision logging, and human oversight before deployment. Always confirm a specific system’s classification with a legal expert.
Can AI make diagnostic errors, and who is responsible?
Yes, AI can and does make errors. Responsibility for clinical decisions lies with the physician who made them. The AI Act and medical law don’t transfer liability to model providers if the physician had the ability to challenge recommendations. That’s why the human-gate pattern is critical: physicians must have tools to verify and the option to reject system suggestions.
How does AI explainability work in clinical practice?
It depends on the architecture. RAG-based systems cite sources (guidelines, publications) for each recommendation. Perceptual systems (imaging, ECG) use attention maps or SHAP to show which data features influenced the result. This isn’t full causality but gives physicians an entry point for verification. Systems without any explainability don’t meet AI Act requirements for high-risk systems.
Can patient data leave the hospital or country?
Yes, if GDPR (RODO) requirements are met: a valid legal basis under Article 9, a data processing agreement with the provider, and, for transfers outside the EU, standard contractual clauses or an adequacy decision. In practice, many hospitals and facilities opt for self-hosting or EU-based providers to eliminate this issue structurally. PII processing must be covered by a DPIA if it involves large-scale data or automated decisions.
Will AI replace doctors in the foreseeable future?
Not in the role they play today. AI will take over, and is already taking over, narrow, repetitive perceptual tasks: screening, anomaly flagging, risk prediction from structured data. It frees up physicians’ time for what AI lacks: clinical context, relationships, decision-making under uncertainty, and responsibility. The change is real and significant, but the direction is specialization and augmentation, not substitution.