
Watchtower — AI Call Quality Monitoring

Watchtower automatically scores every customer call across five quality dimensions, aggregates an org-level health score, and surfaces trends and flagged calls — so you know how your AI agent is actually performing without listening to every recording.

Watchtower is the quality-monitoring layer that runs on every customer call. After each call ends, an AI scorer reads the transcript, rates the call across five dimensions, and contributes to your organization's rolling 30-day health score. You see the result on the Watchtower dashboard, in a weekly digest email, and as quality badges on individual calls.

The point: you don't have to listen to recordings to know whether your agent is doing a good job. Watchtower surfaces problems automatically.

How It Works

When a customer call (a real conversation on your AI agent's number) ends, three things happen:

Engagement check. Was this a real call worth measuring? Robocalls, wrong-number hangups, silent dialer attempts, and very short non-engagements are filtered out. Watchtower only scores calls where there was a genuine exchange. (More on what gets filtered in "What doesn't get scored" below.)
Quality scoring. For real calls, an AI scorer reads the transcript and rates five dimensions (Resolution, Accuracy, Professionalism, Sentiment, Escalation), each on a 1-10 scale. A composite quality score from 0-100 is computed from those five dimensions.
Aggregation. Every four hours, your organization's health score is recomputed from the last 30 days of scored calls. Recent calls are weighted more heavily than older ones (calls in the last 7 days count fully; week 2 calls count 75%; week 3-4 calls count 50%).
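The recency weighting in the aggregation step can be sketched as a weighted average. This is a minimal illustration, not the platform's implementation: it assumes the health score is the weighted mean of per-call composite scores, that "week 3-4" covers everything from day 15 to the end of the 30-day window, and that the result is rounded to a whole number. The function names and the (ended_at, composite) input shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recency_weight(age_days: float) -> float:
    """Weight of a scored call given its age in days (tiers from the article)."""
    if age_days < 0 or age_days > 30:
        return 0.0   # outside the rolling 30-day window
    if age_days <= 7:
        return 1.0   # last 7 days count fully
    if age_days <= 14:
        return 0.75  # week 2
    return 0.5       # weeks 3-4 (assumed to extend to day 30)


def health_score(calls, now):
    """calls: iterable of (ended_at, composite_score) pairs (hypothetical shape).

    Returns the rounded weighted mean, or None if no call is in the window.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for ended_at, composite in calls:
        age_days = (now - ended_at).total_seconds() / 86400
        weight = recency_weight(age_days)
        weighted_sum += weight * composite
        total_weight += weight
    if total_weight == 0:
        return None
    return round(weighted_sum / total_weight)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
sample = [
    (now - timedelta(days=1), 90),   # weight 1.0
    (now - timedelta(days=10), 70),  # weight 0.75
    (now - timedelta(days=20), 60),  # weight 0.5
]
print(health_score(sample, now))  # 77
```

In this sample the newest call pulls the result toward 90: (90 + 52.5 + 30) / 2.25 is about 76.7, which rounds to 77, whereas an unweighted mean would be about 73.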

Your dashboard reflects the latest snapshot. Weekly digests fire on Monday. Alerts fire when something meaningful changes.

The Five Dimensions

Dimension	Weight	What it measures
Resolution	30%	Did the caller get what they needed?
Accuracy	25%	Was the information correct? No hallucinations?
Professionalism	20%	Natural flow, clear communication, smooth handoffs
Sentiment	15%	Emotional arc — how did the caller feel through the call?
Escalation	10%	When escalation was needed, was it handled well?

The composite quality score is the weighted sum of the five dimension scores, expressed on a 0-100 scale. It is computed deterministically in code, not by the AI, so it is reproducible: the same dimension scores always produce the same composite.
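As a rough sketch of the weighted sum, using the weights from the table above: the article does not spell out how the 1-10 dimension scores map onto the 0-100 scale, so this example assumes a simple multiply-by-ten and round-half-up. Under that assumption the lowest possible composite is 10. The names are hypothetical.

```python
# Dimension weights in percent, taken from the table above.
WEIGHTS_PCT = {
    "resolution": 30,
    "accuracy": 25,
    "professionalism": 20,
    "sentiment": 15,
    "escalation": 10,
}


def composite_score(scores: dict) -> int:
    """Weighted sum of 1-10 dimension scores, scaled to a 100-point scale.

    Assumption: scaling is x10 with round-half-up; the article does not
    state the exact scaling or rounding rule.
    """
    for name in WEIGHTS_PCT:
        if not 1 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    # Integer arithmetic keeps the result exactly reproducible.
    total = sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT)
    return (total + 5) // 10  # total is in 100..1000; round half up


example = {
    "resolution": 9,
    "accuracy": 8,
    "professionalism": 7,
    "sentiment": 8,
    "escalation": 10,
}
print(composite_score(example))  # 83
```

Because Resolution and Accuracy together carry 55% of the weight, a weak score in either one moves the composite far more than a weak Escalation score does.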

What the Numbers Mean

Score range	Label	Color
90+	Excellent	Green
75-89	Good	Blue
60-74	Needs attention	Yellow
Below 60	Critical	Red

These thresholds appear consistently across the dashboard, emails, and call list badges.
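The bands in the table translate directly into a lookup. The function name is hypothetical; the cutoffs, labels, and colors are the ones listed above.

```python
def score_label(score: int) -> tuple:
    """Map a 0-100 score to its (label, color) band."""
    if score >= 90:
        return ("Excellent", "Green")
    if score >= 75:
        return ("Good", "Blue")
    if score >= 60:
        return ("Needs attention", "Yellow")
    return ("Critical", "Red")


print(score_label(82))  # ('Good', 'Blue')
```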

Flagged Calls

Any individual call where one or more dimensions scored 3 or below gets automatically flagged. Flags are deterministic — there's no AI-driven "is this concerning?" judgment, just a fixed cutoff.

Flagged calls show up:

On the Watchtower dashboard with a brief reason ("Resolution scored 2 — caller never got an answer to their pricing question")
On the call list with a yellow or red quality badge
In the weekly digest email (top 5 most recent flagged calls)

The flag is a "look at this" prompt, not a final judgment. Sometimes a 2 on Resolution reflects a legitimate transfer to a human, which the AI scored as the agent not resolving the call on its own. The badge is meant to start a quick review, not to alarm.
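The fixed cutoff described in this section amounts to a one-line check per dimension. A minimal sketch, with hypothetical function names and a plain dict of 1-10 dimension scores as input:

```python
FLAG_CUTOFF = 3  # a dimension scoring 3 or below flags the call


def flagged_dimensions(scores: dict) -> list:
    """Return the names of the dimensions at or below the cutoff."""
    return [name for name, value in scores.items() if value <= FLAG_CUTOFF]


def is_flagged(scores: dict) -> bool:
    """A call is flagged when at least one dimension is at or below the cutoff."""
    return len(flagged_dimensions(scores)) > 0


call = {
    "resolution": 2,
    "accuracy": 8,
    "professionalism": 9,
    "sentiment": 6,
    "escalation": 7,
}
print(is_flagged(call), flagged_dimensions(call))  # True ['resolution']
```

Note that a call can be flagged even when its composite looks healthy, since a single low dimension is enough.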

What You See on the Dashboard

Visit /dashboard/watchtower to see:

Health score hero — The current 0-100 composite for your org, with the color label and trend (improving, stable, or declining).
Dimension bars — Each of the five dimension averages over the rolling 30 days, with their individual scores.
Score distribution — A histogram of how many calls fell into each score bucket over the last 30 days.
Flagged calls list — Up to 10 most recent flagged calls with the reason and a link to the call detail.
Scored calls feed — Paginated list of recent scored calls. Each row expands to show the per-dimension rationales the AI provided.

If you have multiple locations, the location selector at the top of the dashboard filters everything to that location's calls.

The main dashboard at /dashboard also shows a compact health-score widget with a "View report" link to the full Watchtower view.

Manual Overrides

Sometimes the AI scorer makes a call you'd disagree with. Common cases:

Marking a call as "non-engagement" when it really was a quick legitimate question
Including a clearly-spam call that slipped past the engagement filter

Both directions are correctable. Open the call detail and look for the "Health score inclusion" section. You'll see the AI's verdict and an option to override it. Four states:

State	What it means
AI included, no override	Default. Call counts toward your health score per the AI's verdict.
AI included, you excluded	You decided this call shouldn't count. AI score is preserved but the call is filtered out of the health-score average.
AI excluded, no override	AI determined the call wasn't real engagement. No effect on health score.
AI excluded, you re-included	You decided this call SHOULD count even though the AI didn't think so.

Overrides are reversible — you can reset to "follow the AI's decision" at any time.
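One way to think about the four states: the AI's verdict is the default, and an override, when present, wins. The sketch below models that with an optional boolean, where None means "follow the AI's decision". The function and parameter names are hypothetical and say nothing about how the platform stores overrides.

```python
from typing import Optional


def counts_toward_health(ai_included: bool, override: Optional[bool]) -> bool:
    """Resolve whether a call counts toward the health score.

    override is None  -> no override, follow the AI's verdict
    override is False -> you excluded the call
    override is True  -> you re-included the call
    """
    return ai_included if override is None else override


# The four states from the table, in order.
print(counts_toward_health(True, None))   # True:  AI included, no override
print(counts_toward_health(True, False))  # False: AI included, you excluded
print(counts_toward_health(False, None))  # False: AI excluded, no override
print(counts_toward_health(False, True))  # True:  AI excluded, you re-included
```

Resetting an override corresponds to setting it back to None, which restores the AI's verdict.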

The dashboard subtitle shows X scored • Y excluded so you can see at a glance how many calls are being filtered.

Weekly Digest

Every Monday morning, Watchtower sends an email summarizing the prior week:

Current health score with the color label
All five dimension averages
Total calls scored
Up to 5 most recent flagged calls
A callout if any dimension averaged below 7

The digest goes to recipients in your Monitoring notification category (Settings → Notifications). If the category has no recipients configured, the digest falls back to your account email.

Real-Time Alerts

Four kinds of alerts fire when something meaningful changes:

Below threshold — When your health score drops below 70 for the first time (or for the first time after a recovery), a one-time alert fires. The email uses neutral copy that explains where the score is and what the typical causes are.

Score drop — When your already-below-threshold score drops by 3 or more points since the last alert, a follow-up alert fires noting the magnitude ("dropped from 68 to 64, -4 pts"). Smaller fluctuations under 3 points are suppressed so the inbox doesn't get noisy.

Recovery — When your health score crosses back above 70 after a prior alert, a recovery email fires once celebrating the return. After that, the next breach reads as a fresh first-alert.

Trend decline — When three consecutive snapshots decline, a trend alert fires (with a 24-hour cooldown to avoid spam).

All four alert types are sent at most once per 24 hours per organization. The weekly digest carries the steady-state status; alerts only fire on state changes.
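The three threshold-based alerts (below threshold, score drop, recovery) behave like a small state machine keyed on the score at the last alert. The sketch below is an illustration under assumptions, not the actual alerting code: it treats a score of exactly 70 as recovered, and it leaves out the trend-decline alert and the 24-hour rate limit. All names are hypothetical.

```python
from typing import Optional, Tuple

THRESHOLD = 70
MIN_DROP = 3  # drops smaller than this are suppressed


def next_alert(score: int, last_alert_score: Optional[int]) -> Tuple[Optional[str], Optional[int]]:
    """Return (alert_kind, new_last_alert_score) for one health-score snapshot.

    last_alert_score is None when no breach is active (never alerted,
    or already recovered).
    """
    if last_alert_score is None:
        if score < THRESHOLD:
            return "below_threshold", score  # first breach, or first after recovery
        return None, None
    if score >= THRESHOLD:  # assumption: exactly 70 counts as recovered
        return "recovery", None  # reset, so the next breach reads as fresh
    if last_alert_score - score >= MIN_DROP:
        return "score_drop", score
    return None, last_alert_score  # small fluctuation, suppressed


state = None
for snapshot in [72, 68, 67, 64, 71, 69]:
    kind, state = next_alert(snapshot, state)
    print(snapshot, kind)
# 72 None
# 68 below_threshold
# 67 None
# 64 score_drop
# 71 recovery
# 69 below_threshold
```

Comparing against the score at the last alert, rather than the previous snapshot, is what lets a slow slide (68, 67, 66, 65) eventually trigger a follow-up once the cumulative drop reaches 3 points.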

What Doesn't Get Scored

Watchtower deliberately filters out calls that don't represent real customer engagement. This protects your health score from being polluted by spam, dialer artifacts, and wrong-number hangups. Filtered out:

Calls under 10 seconds
Calls where the caller never spoke (silent answering-machine dialer attempts)
Calls where the caller spoke fewer than 5 words and no tools fired
Calls flagged by the AI scorer as robocalls, wrong-number hangups, IVR-keypress probes, or pure non-engagement

The filter is conservative — when in doubt, the call IS included. We'd rather measure too much than silently drop a real conversation. If you see a call that was filtered out and shouldn't have been, the manual override is your fix.
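The four filter rules above can be read as a short series of checks. This sketch assumes each call has been reduced to a few facts (duration, caller word count, number of tools fired, and the AI scorer's non-engagement verdict) and that "the caller never spoke" means a caller word count of zero. The field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallFacts:
    duration_seconds: float
    caller_word_count: int
    tools_fired: int
    ai_flagged_non_engagement: bool  # robocall, wrong number, IVR probe, etc.


def is_scored(call: CallFacts) -> bool:
    """Return True when the call passes the engagement filter."""
    if call.duration_seconds < 10:
        return False  # under 10 seconds
    if call.caller_word_count == 0:
        return False  # caller never spoke
    if call.caller_word_count < 5 and call.tools_fired == 0:
        return False  # fewer than 5 caller words and no tools fired
    if call.ai_flagged_non_engagement:
        return False  # AI scorer's non-engagement verdict
    return True  # when in doubt, the call is included


print(is_scored(CallFacts(30, 3, 1, False)))  # True: short, but a tool fired
print(is_scored(CallFacts(30, 3, 0, False)))  # False
```

The default return of True mirrors the conservative stance described above: a call is dropped only when it matches one of the explicit rules.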

What's Not Yet Supported

A few capabilities are deliberately deferred. They're either on a future roadmap or out of V1 scope:

Discovery-call scoring — Watchtower scores customer calls on your AI agent's number, not the discovery calls prospects make to learn about Allison Voice itself. Discovery calls have their own quality controls (the discovery agent is admin-monitored separately).
Per-team-member scores — Health scores are computed at the organization level. There's no breakdown by which team member would have handled the call if it had transferred.
Configurable thresholds — The 90 / 75 / 60 thresholds are fixed across the platform. Subscribers can't currently set custom score targets.
Historical exports — Snapshots are stored indefinitely but there's no UI to download the historical series. The dashboard always shows the latest snapshot. Reach out if you need historical data exported.
Daily or custom-cadence digests — The digest email runs Monday only. There's no per-subscriber control over digest cadence.
Per-call audio review — Watchtower scores from the transcript only. The score doesn't include audio-quality factors (noise, dropout, clarity).

If any of these matter for your operation, ask Allison to file a support ticket and the team will track demand.
