AI safety · Independent evaluation

Detecting dishonesty in AI systems.

Tara Research advances AI safety through rigorous research and the development of evaluation infrastructure. We measure the propensity of AI systems to lie, and we benchmark the effectiveness of safety techniques designed to prevent it, releasing code, leaderboards, and scientific results openly.

Misalignment becomes catastrophic when models hide it. A misaligned model that is honest about its goals can be detected and corrected; one that is dishonest cannot. Robust detection and prevention of dishonesty is therefore foundational to every other oversight mechanism. The success of safety measures such as monitoring and auditing depends on the AI system not strategically misleading its evaluators. Tara exists to provide the field with an impartial, public measure of where we stand and which interventions actually help.

Research

Two parallel tracks, one shared infrastructure.

Tara Honesty Benchmark

An agentic, multi-turn benchmark that measures the propensity of AI systems to lie to the user about their own actions. Models are never instructed to lie; the benchmark measures on-policy behavior.

Tara Methods Leaderboard

A continuously maintained, head-to-head comparison of alignment techniques applied to honesty (prompting, steering, fine-tuning, classifiers, weight pruning), evaluated under one standard protocol on one panel of open-weight models.

Approach

Open by default, impartial by design.

We release scenarios, code, and results publicly so other researchers can reproduce, extend, and challenge them. We hold no stake in any specific alignment technique or in any model provider; our role is to give the field clear metrics that fairly measure how often models lie and which interventions actually work.

Tara Research is a Dutch non-profit foundation (stichting) incorporated in Rotterdam, with a U.S. public-charity equivalency determination from NGOsource. The team operates across the Netherlands and Montreal.