AI-assisted archaeology: what algorithms see in the data

Imagine a typical scenario: a research team working with LiDAR data for a hard-to-access region receives from the algorithm a list of dozens of potential settlement locations that are invisible in aerial photographs. Some will be confirmed by field research, while others will turn out to be natural formations or false positives caused by quirks of the training data. The proportions depend on data quality and on the region. The pattern is both impressive and instructive: AI speeds up the selection of candidates, but a human still has to visit the site and verify it.

At Cashcrown, we observe a similar pattern in all fields where we apply algorithms to research data analysis. Speed and scale on the model side, evaluation and decision on the expert side.

What data does AI process in archaeology

The material algorithms work with in archaeology is more diverse than in most scientific fields.

Remote sensing data. LiDAR data, infrared satellite images, drone aerial photography. Computer vision models, especially convolutional networks, detect terrain anomalies and geometric regularities in this data that suggest structures hidden beneath the ground surface or under vegetation.
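The sketch below shows one classical preprocessing step for LiDAR rasters. It is a simplified local relief model: it subtracts a smoothed trend surface from a digital elevation model (DEM), so small mounds and ditches stand out against the general slope. This is not a convolutional network. Its output can be inspected visually or used as an input layer for a neural model. The DEM is assumed to be a gridded NumPy array in metres. The window size and the 0.3 m threshold are hypothetical values chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_filter(dem: np.ndarray, size: int) -> np.ndarray:
    """Moving-average (box) filter with edge padding; size must be odd."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    padded = np.pad(dem, pad, mode="edge")
    # Each output cell becomes the mean of the size x size window centred on it.
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def local_relief(dem: np.ndarray, size: int = 5) -> np.ndarray:
    """Remove the large-scale trend so that only small-scale relief remains."""
    return dem - mean_filter(dem, size)


def anomaly_mask(dem: np.ndarray, size: int = 5, threshold: float = 0.3) -> np.ndarray:
    """Flag cells whose local relief exceeds a (hypothetical) threshold in metres."""
    return np.abs(local_relief(dem, size)) > threshold
```

On a gently sloping surface, the trend is removed almost exactly. A small raised feature, such as a 1 m mound a few cells across, is left as a clear positive residual.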

Ceramic and artifact records. Photographs, 3D scans, dimensions, and material composition. A classifier trained on thousands of described ceramic fragments can assign new finds to a culture, period, and functional group far faster than manual description, which for a large assemblage can take weeks.
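A production classifier would typically work on embeddings of images or 3D scans. The core idea, assigning a new find to the closest known group, can still be sketched with a nearest-centroid classifier on a few hand-measured features. The feature set (rim diameter in cm, wall thickness in mm) and the group labels are hypothetical. The returned distance is only a crude indication of how typical the find is for its assigned group.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def fit_centroids(features, labels):
    """Standardize each feature and compute one mean vector (centroid) per class."""
    n_dims = len(features[0])
    means = [mean(f[d] for f in features) for d in range(n_dims)]
    # Fall back to 1.0 so a constant feature does not cause division by zero.
    stds = [pstdev([f[d] for f in features]) or 1.0 for d in range(n_dims)]

    def scale(x):
        return [(x[d] - means[d]) / stds[d] for d in range(n_dims)]

    groups = defaultdict(list)
    for f, label in zip(features, labels):
        groups[label].append(scale(f))
    centroids = {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}
    return scale, centroids


def predict(scale, centroids, x):
    """Return (nearest class, distance) in the standardized feature space."""
    z = scale(x)
    return min(
        ((label, math.dist(z, c)) for label, c in centroids.items()),
        key=lambda item: item[1],
    )
```

Standardizing first matters because raw measurements in different units would otherwise let the largest-valued feature dominate the distance.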

Environmental and GIS data. Terrain shape, proximity to water sources, soil composition, historical vegetation range maps. Predictive models combine these layers to indicate areas with an increased probability of finds.
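One common way to combine such layers is a logistic model. Each raster is multiplied by a weight, the results are summed with a bias, and the sum is mapped into the 0 to 1 range. The sketch assumes the layers are already aligned NumPy grids of the same shape. The layer names, weights, and bias are hypothetical. In practice they would be fitted, for example by logistic regression on known sites versus background locations.

```python
import numpy as np


def site_probability(layers: dict, weights: dict, bias: float) -> np.ndarray:
    """Weighted sum of aligned raster layers passed through a logistic function."""
    shape = next(iter(layers.values())).shape
    score = np.full(shape, bias, dtype=float)
    for name, raster in layers.items():
        score += weights[name] * raster
    return 1.0 / (1.0 + np.exp(-score))


# Hypothetical weights: farther from water and steeper terrain lower the score.
weights = {"dist_water_km": -1.0, "slope_deg": -0.2}
```

The output is a relative ranking of grid cells. It can be read as a probability only if the model was fitted and calibrated on representative data.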

Texts and inscriptions. Optical character recognition and language models help read partially damaged inscriptions and tablets. Here AI only proposes readings; an epigrapher or classical philologist makes the final decision.

Where the algorithm actually helps

It’s worth separating applications that work repeatably from those that are still experimental.

Application	Maturity	Human Role
Ceramic classification based on images	Mature, production-ready	Sample verification, exception management
LiDAR anomaly detection	Mature, widely used	Field validation before announcing discovery
Site location prediction from GIS data	Proven in limited regions	Research priority selection, excavation decisions
Photogrammetry and 3D reconstruction	Mature	Cultural and chronological interpretation
Reading damaged inscriptions	Experimental	Epigrapher approves or rejects each proposal
Dating based on artifact style	Experimental	Researcher compares with stratigraphic context

Common denominator: the more a task involves pattern recognition in large, homogeneous datasets, the better the model performs. The more it requires understanding cultural context, intent, narrative, or the ethical dimension of a find, the more indispensable the human becomes.

What the pipeline from data to candidate looks like

A typical AI-supported analysis cycle does not replace research methodology; it plugs into it as an accelerating layer.

Input data undergoes preprocessing: normalization of image resolution, georeferencing, filling missing values in environmental data. Then, a feature extraction model converts raw pixels or measurements into numerical representations that can be compared.
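For tabular environmental data, the imputation and normalization steps can be sketched as follows. The sketch assumes a NumPy array with one row per grid cell or find and one column per variable. Median imputation and z-score scaling are just one simple choice, and georeferencing is not shown.

```python
import numpy as np


def impute_median(table: np.ndarray) -> np.ndarray:
    """Return a copy in which each NaN is replaced by the median of its column."""
    filled = np.array(table, dtype=float)  # copy, so the input is left untouched
    medians = np.nanmedian(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled


def standardize(table: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance (z-scores)."""
    mu = table.mean(axis=0)
    sigma = table.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns are centred but not rescaled
    return (table - mu) / sigma
```

In a real pipeline, compute the medians, means, and standard deviations on the training data only. Then reuse them unchanged for new data, so new inputs are scaled the same way as the data the model was trained on.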

The actual algorithm operates on this representation: a classifier for artifacts, a spatial prediction model for site locations, an anomaly detector for imagery. The result is a list of candidates with assigned confidence levels, not a list of facts.
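One minimal way to keep that distinction explicit in code is to give every model output a status that starts as unverified. The `Candidate` structure, its field names, and the 0.5 cutoff below are hypothetical. The confidence is whatever score the model emits. It should be read as a probability only if the model has been calibrated.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    candidate_id: str
    kind: str                  # hypothetical labels, e.g. "site" or "artifact_class"
    confidence: float          # model score in [0, 1]
    status: str = "unverified" # changed only after human review


def shortlist(candidates, min_confidence=0.5):
    """Keep candidates above a cutoff, highest confidence first; nothing is marked as verified."""
    kept = [c for c in candidates if c.confidence >= min_confidence]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)
```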

Confidence level is key here. A good system tells the researcher not only what it proposes but also how far each proposal lies from the training data distribution. A result outside this distribution signals that the model is operating in an area where its calibration is uncertain. In our implementations, such a signal is passed to the user as an annotation, not hidden.
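How that distance is measured depends on the system. One simple, widely used option is the Mahalanobis distance of a new feature vector from the mean of the training features. That distance is then compared against a threshold derived from the training set itself. The sketch assumes features are NumPy arrays. The 99th-percentile threshold and the annotation strings are illustrative assumptions.

```python
import numpy as np


class DistributionMonitor:
    """Flags inputs whose features lie far from the training distribution."""

    def __init__(self, train_features: np.ndarray, quantile: float = 0.99):
        self.mean = train_features.mean(axis=0)
        # The pseudo-inverse tolerates a singular covariance (e.g. redundant features).
        self.inv_cov = np.linalg.pinv(np.cov(train_features, rowvar=False))
        # Only (1 - quantile) of the training points lie farther out than this.
        self.threshold = float(np.quantile(self.distance(train_features), quantile))

    def distance(self, x: np.ndarray) -> np.ndarray:
        """Mahalanobis distance of each row of x from the training mean."""
        diff = np.atleast_2d(x) - self.mean
        squared = np.einsum("ij,jk,ik->i", diff, self.inv_cov, diff)
        return np.sqrt(np.maximum(squared, 0.0))  # clip tiny negative rounding errors

    def annotate(self, x: np.ndarray) -> list:
        """One annotation per input row, to be shown alongside the prediction."""
        return [
            "outside training distribution" if d > self.threshold else "within training distribution"
            for d in self.distance(x)
        ]
```

A single global distance is a coarse signal. It catches inputs unlike anything in the training set, but it does not guarantee that in-distribution predictions are correct.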


Model boundaries and where the archaeologist decides

Hallucinations in archaeology are not just a technical problem. A false-positive site prediction can direct limited research resources to the wrong place. An incorrect artifact classification can become entrenched in the literature and be cited by subsequent systems learning from that same literature.

Several limitations to keep in mind when designing a system:

Training data bias. Models learn from what has already been discovered and described. Less-researched areas, cultures underrepresented in datasets, and artifacts that deviate from well-known types will be classified less accurately. This is not an algorithm error; it is a mirror of the input data.

Lack of contextual reasoning. The model does not know that a specific ceramic configuration in a given region has ritual rather than utilitarian significance. It will not interpret a find in light of connections with neighboring cultures. That is the expert's task.

Sensitivity to data quality. Imagery from different sensors, seasons, and resolutions can yield inconsistent results even for the same area. Preprocessing determines result credibility more than model architecture choice.
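The training data bias described above can be made visible with a per-region audit. The audit counts how many training examples each region contributes. It also measures accuracy on a validation set broken down by the same regions. The sketch below is a minimal version. The input formats and the cutoff of 50 training examples for flagging a region as less reliable are hypothetical.

```python
from collections import Counter


def audit_by_region(train_regions, val_records, min_train=50):
    """Per-region training counts and validation accuracy.

    train_regions: one region label per training example.
    val_records: (region, actual_label, predicted_label) tuples from a validation set.
    """
    train_counts = Counter(train_regions)
    correct, total = Counter(), Counter()
    for region, actual, predicted in val_records:
        total[region] += 1
        correct[region] += int(actual == predicted)

    report = {}
    for region in sorted(set(train_counts) | set(total)):
        report[region] = {
            "train_count": train_counts[region],
            "val_accuracy": correct[region] / total[region] if total[region] else None,
            # Sparse regions are flagged explicitly instead of being silently trusted.
            "less_reliable": train_counts[region] < min_train,
        }
    return report
```

A report like this can go straight into project documentation. Accuracy figures for regions with only a handful of validation examples are themselves unreliable and should be reported with their sample sizes.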

Following the principles we apply in our projects, every model result that impacts excavation decisions or artifact classification in heritage registers undergoes human oversight: verification by an authorized researcher with documented justification. This doesn’t slow down research—it’s the standard without which results aren’t published.
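One way to enforce this rule in software is to store each human decision as a record that cannot be created without a named reviewer and a written justification. Publication is then allowed only after an authorized reviewer has approved the result. The `Review` record and the `publishable` check below are a hypothetical sketch, not a description of a specific system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Review:
    """A human decision on one model result; invalid records cannot be created."""
    candidate_id: str
    reviewer: str
    decision: str        # "approved" or "rejected"
    justification: str
    reviewed_on: date

    def __post_init__(self):
        if self.decision not in {"approved", "rejected"}:
            raise ValueError("decision must be 'approved' or 'rejected'")
        if not self.reviewer.strip():
            raise ValueError("a named reviewer is required")
        if not self.justification.strip():
            raise ValueError("a documented justification is required")


def publishable(candidate_id, reviews, authorized_reviewers):
    """True only if an authorized reviewer has approved this result."""
    return any(
        r.candidate_id == candidate_id
        and r.decision == "approved"
        and r.reviewer in authorized_reviewers
        for r in reviews
    )
```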

Data issues and responsibility for heritage

Archaeological data has a special status. Information about site locations, if made public without proper safeguards, can lead to looting. 3D scans of sacred objects and cultural artifacts of Indigenous communities require separate consent protocols, which no model enforces independently.

In practice, this means several requirements when designing a system:

Datasets with location descriptions are stored with restricted access, separate from the analysis model. The model works with representations, not raw GPS coordinates passed through open APIs.

Auditing training data for bias is part of the project documentation. If training data comes from specific regions or site types, this is stated explicitly, and results for underrepresented areas are marked as less reliable.

Descendant communities have the right to determine what data concerning their heritage may be processed and for what purpose. The AI system does not replace this consultation.
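The requirement that the model work with representations rather than raw coordinates can be met quite simply. Each location can be generalized to a coarse grid cell before it leaves the restricted store. The sketch assumes projected coordinates in metres (for example a UTM zone) and a hypothetical 1 km cell. The exact coordinates stay in the access-controlled dataset. The right cell size depends on how sensitive the sites are.

```python
import math
from typing import Tuple


def generalize_location(easting_m: float, northing_m: float, cell_m: float = 1000.0) -> Tuple[float, float]:
    """Replace an exact projected position with the centre of its grid cell."""
    col = math.floor(easting_m / cell_m)
    row = math.floor(northing_m / cell_m)
    return (col + 0.5) * cell_m, (row + 0.5) * cell_m


def cell_id(easting_m: float, northing_m: float, cell_m: float = 1000.0) -> str:
    """Coarse cell key the analysis model can use instead of coordinates."""
    return f"{math.floor(easting_m / cell_m)}_{math.floor(northing_m / cell_m)}"
```

Generalization reduces, but does not remove, the risk of exposure. A cell that contains only one known site still points to that site, so cell size and access to the cell keys both need review.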

Related issues of model explainability and responsibility for results are discussed further in the context of the black box problem and responsible innovation.

FAQ

Can AI independently confirm an archaeological discovery?

No. The model can propose a location or classification with a certain confidence level, but confirming a discovery requires field validation or laboratory analysis by an authorized researcher. Publication standards and cultural heritage registry requirements demand methodological documentation that the model alone cannot provide.

What types of artifacts does AI classify most effectively?

The best results are achieved for artifacts with large, well-described training datasets: ceramics (shape, ornament, composition), flint tools (flaking technique, type), coins, and seals (iconography, inscriptions). Effectiveness drops for organic artifacts, objects from underrepresented cultures, and finds where micro-stratigraphic context—not just visual features—is decisive.

Are predictive site models suitable for every region?

Models trained on one region do not transfer directly to another with different geology, settlement history, or data availability. Transfer learning allows model adaptation with a limited number of known sites in the new area but requires verification on a local test set before guiding research decisions. Precision on the training set is no guarantee of field effectiveness.
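Verification on a local test set can be sketched as a precision and recall check against sites already known in the new region, for example by comparing grid-cell identifiers. The acceptance thresholds below are hypothetical. Precision computed this way is pessimistic. A predicted cell with no known site counts as a false positive even if it holds an undiscovered one.

```python
def precision_recall(predicted_cells, known_site_cells):
    """Precision and recall of predicted cells against known sites in a local test set."""
    predicted, known = set(predicted_cells), set(known_site_cells)
    hits = len(predicted & known)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(known) if known else 0.0
    return precision, recall


def ready_for_local_use(predicted_cells, known_site_cells, min_precision=0.3, min_recall=0.5):
    """Hypothetical acceptance rule applied before predictions guide fieldwork."""
    precision, recall = precision_recall(predicted_cells, known_site_cells)
    return precision >= min_precision and recall >= min_recall
```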

How does the AI Act affect algorithm use in archaeology?

AI systems used in decision-making about cultural heritage and official registries may be subject to the requirements for high-risk systems or to an assessment of their impact on fundamental rights, including cultural rights. That entails obligations for technical documentation, risk assessment, and auditability. Purely assistive systems that do not generate administrative decisions face lighter requirements, but explainability of results remains good practice regardless of regulation.

How to distinguish a useful AI system from a tool that just shifts the problem?

A useful system reduces expert work time while maintaining or improving classification quality. Warning signs: no confidence level information in results, no training data bias audit, no path for human verification of model proposals. If a system delivers results without limitation annotations, it shifts the verification burden to the researcher without providing tools for that verification. More on this pattern in the article about the role of humans in AI processes.

The pattern we see in archaeology aligns with what we describe in the context of AI as a scientific assistant and scientists working with AI: speed and scale on the algorithm side, evaluation and responsibility on the human side. If you’re planning to implement a similar system for research data analysis in your organization, the readiness assessment tool will help identify gaps before starting development.
