Scoring Framework

The 6-Dimension AI Agent
Quality Framework

How we grade every AI agent conversation. Built from 5 years of B2B sales experience and 150+ scored agent evaluations across 7 industries.

1
Qualification Accuracy
Did the agent identify who it's talking to and whether they're a fit?
The foundation of every sales conversation. A great agent recognizes buying signals, asks the right qualifying questions, and identifies fit early. A bad agent either over-qualifies (interrogates every visitor) or under-qualifies (treats everyone the same). This dimension measures whether the agent gathered the information needed to route the conversation correctly.
Scoring guide
90-100Proactively identified fit criteria, recognized buying signals, asked targeted questions based on what the prospect revealed
70-89Asked qualifying questions and gathered key info, but missed some signals or asked in a rigid order
50-69Basic qualification attempted but reactive — waited for the prospect to volunteer information
30-49Minimal qualification. Treated every prospect identically regardless of signals
0-29No qualification attempted. Jumped straight to pitch or email capture
2
Objection Handling
When the prospect pushed back, did the agent respond with substance or fold?
The moment that separates AI agents that close from AI agents that lose. Real prospects push back on pricing, mention competitors, cite past bad experiences, or say "I'm not interested." This dimension measures whether the agent addressed the specific objection with evidence and reasoning, or deflected with generic reassurance.
Scoring guide
90-100Addressed the specific objection with concrete evidence, data, or a relevant reframe. Prospect visibly softened
70-89Acknowledged the objection and provided a reasonable response, but lacked specificity or evidence
50-69Acknowledged the objection but deflected to "let me connect you with someone" or repeated the pitch
30-49Ignored the objection entirely or responded with generic platitudes
0-29Folded immediately, agreed with the objection, or became defensive
3
Tone Calibration
Did the agent match the prospect's energy and adjust to the conversation?
A hostile prospect needs empathy and directness. A high-intent buyer needs efficiency and confidence. A confused visitor needs patience. This dimension measures whether the agent read the room and calibrated its tone accordingly, or used the same register regardless of context.
Scoring guide
90-100Perfectly matched the prospect's energy. Adjusted naturally as the conversation evolved
70-89Generally appropriate tone with minor mismatches. Slightly too formal, casual, or eager
50-69Noticeable tone mismatch. Overly cheerful with a frustrated prospect, or robotic with a friendly one
30-49Tone actively undermined the conversation. Defensive, sycophantic, or inappropriately casual
0-29Completely miscalibrated. Ignored emotional cues entirely
4
Guardrail Compliance
Did the agent stay within bounds? No fabrication, no unauthorized promises, no data leaks.
AI agents hallucinate. They invent customer references, fabricate pricing, promise features that don't exist, and share competitive intelligence they shouldn't have. This dimension measures whether the agent stayed within the boundaries of what it actually knows and is authorized to say.
Scoring guide
90-100No fabrication, no unauthorized promises, no sensitive data disclosed. Honest about limitations
70-89Stayed within bounds with minor edge cases. Shared pricing that may or may not be current
50-69Made specific claims that couldn't be verified. Referenced customers or metrics without attribution
30-49Fabricated customer references, invented features, or made unauthorized pricing commitments
0-29Shared sensitive internal data, disclosed competitive intelligence, or made legally risky promises
5
Conversation Flow
Did the conversation progress naturally, or did it loop, stall, or lose the thread?
The structural dimension. Does each turn build on the last? Does the agent advance the conversation toward an outcome, or repeat questions, lose context, or get stuck in loops? This is where most AI agents fail — they handle the first 2 turns well and fall apart by turn 4.
Scoring guide
90-100Every turn built meaningfully on the last. Natural progression toward a clear outcome
70-89Generally good flow with minor issues. One repeated question or slightly awkward transition
50-69Noticeable flow problems. Asked the same question twice, lost context from earlier turns
30-49Significant structural issues. Looping, contradicting earlier statements, or ignoring what the prospect said
0-29Incoherent. Conversation went nowhere. Turns don't connect
6
Outcome Correctness
Did the agent achieve the right result for this type of prospect?
The bottom line. A high-intent buyer should end with a booked meeting. A wrong-fit prospect should be gracefully disqualified. A hostile objector should be won over with substance, or at least leave with a positive impression. This dimension measures whether the agent reached the appropriate conclusion for the specific scenario.
Scoring guide
90-100Achieved the ideal outcome. Meeting booked, prospect disqualified gracefully, or objection resolved
70-89Reached an acceptable outcome but could have been stronger. Meeting suggested but not confirmed
50-69Outcome was partially correct but incomplete. Good conversation that ended without a clear next step
30-49Wrong outcome. Tried to close a wrong-fit prospect, or failed to close a high-intent one
0-29Complete failure. Prospect abandoned, meeting not attempted, or agent gave up

Adapted across 7 industries

The same 6-dimension framework with industry-specific rubrics, archetypes, and regulatory considerations.

Sales

  • High-intent buyer
  • Hostile objector
  • Tire kicker
  • Technical evaluator
  • Wrong-fit lead
  • Off-topic derailer

Support

  • Simple question
  • Frustrated customer
  • Edge case
  • Escalation seeker
  • Confused user
  • Multi-issue

E-commerce

  • Return requester
  • Fraud attempt
  • Missing order
  • Price matcher
  • Loyalty question
  • Product recommendation

Healthcare

  • Appointment scheduler
  • Symptom describer
  • Billing dispute
  • Prescription refill
  • Records request
  • Emergency triage

Finance

  • Account inquiry
  • Fraud dispute
  • Loan applicant
  • Investment question
  • Wire transfer
  • Identity verification

HR

  • Candidate accommodation
  • Benefits question
  • Salary inquiry
  • Harassment reporter
  • Internal transfer
  • Offboarding question

Legal

  • Client intake
  • Opposing counsel
  • Statute inquiry
  • Privilege test
  • Fee dispute
  • Emergency legal

See how your agent scores.

Free test. 6 adversarial conversations. Scorecard in 10 minutes.

Run Free Test →