Test your dental AI before real patients do
The first benchmarking platform for dental AI receptionists. Simulate real patient calls, catch safety failures, verify PMS sync, and get a deployment-ready scorecard.
Three steps to confidence
Run your first test in under 3 minutes. No integration required.
Enter the phone number
Provide the phone number of the AI receptionist you want to test. That's all we need to start.
Pick scenarios & personas
Choose from 103 pre-built test scenarios and 100 realistic caller personas, or create your own.
Get your scorecard
We call the AI, conduct the conversation, score it across 8 dimensions, and deliver a deployment-ready report.
Four levels of evaluation
From raw conversation metrics to deployment-ready business outcomes. Every call is analyzed across 25+ raw metrics, 8 scored dimensions, detailed guardrail audits, and 25 business outcome indicators.
Raw metrics
25 objective data points extracted automatically from the transcript and audio.
Scored dimensions
8 dimensions scored 1-10 by an LLM judge with scenario-specific rubrics.
Guardrail audit
Every moment a safety boundary was tested, logged with timestamp and verdict.
Business outcomes
25 binary metrics that answer "Did the AI actually do its job?"
103 scenarios. 100 personas. Infinite combinations.
Every real-world dental call situation, paired with every type of caller behavior. From a cooperative new patient booking a cleaning to an adversarial caller attempting social engineering.
Scenario categories
Every reason a patient might call, with easy/medium/hard difficulty levels
Persona categories
How the caller behaves — from cooperative to adversarial
Catch dangerous failures before patients do
Our scenarios include trap moments — natural-sounding questions designed to test whether your AI crosses safety boundaries.
Medical advice detection
Does the AI diagnose conditions, predict outcomes, or recommend medications? It shouldn't.
HIPAA & PII protection
Does the AI solicit SSNs, share patient data with unauthorized callers, or confirm patient identity to third parties?
System prompt protection
Can an adversarial caller trick the AI into revealing its instructions, training, or configuration?
Multilingual guardrails
Do safety boundaries hold when patients switch languages mid-call? Many AI systems have weaker guardrails in non-English.
Verify real data sync, not just conversation
Most AI receptionists claim PMS integration. We verify it actually works — with multi-step test sequences that check data persistence across calls.
One report. One answer. Deploy or don't.
Built for DSO managers who need to make deployment decisions, not debug AI. The batch report tells you exactly what's working, what's broken, and what to fix — in 30 seconds.
Inspect the judge. Run your own instance.
The entire evaluation engine is open source. See exactly how we score every call, every dimension, every guardrail. No black boxes.
Transparent evaluation
Every scoring rubric, every weight, every critical failure definition is in the source code. Review the judge module line by line to understand how your AI is being evaluated.
Self-hosted option
Clone the repo, add your own API keys, and run the entire platform on your infrastructure. Your data never leaves your environment. Full control, zero vendor lock-in.
Community-driven scenarios
Submit your own test scenarios and personas. The library grows with real-world edge cases discovered by the dental AI community. PRs welcome.
Your AI receptionist is talking to patients right now.
Do you know what it's saying?
Run your first test$5 free credit on signup · No credit card required