Sales Pilot: Building Real-Time Voice AI for Sales Coaching
by Abdelkader Bekhti, Production AI & Data Architect
Why I Built Sales Pilot
After years of advising companies on AI systems, I wanted to prove something: production-ready AI is possible, but it requires different thinking than most teams apply.
Sales Pilot is my answer: an AI sales coach that works in any industry, analyzing customer conversations as they happen and giving sales teams intelligent suggestions in real time during live calls.
Key metrics:
- Currently in development
- Sub-300ms speech-to-text latency
- Real-time AI coaching during live calls
- Built for 99.9% uptime reliability
Sales Pilot is currently in development, built with the same production-ready approach I've used for enterprise systems.
The Technical Challenge
Building Sales Pilot required solving several hard problems simultaneously:
1. Real-Time Voice Processing
Customer calls happen in real-time. You can't batch process them 5 minutes later—the conversation is already over. The AI needs to analyze speech and provide suggestions while the salesperson is still talking to the customer.
Target latency: Less than 1 second from speech to AI suggestion
2. Conversation Context Understanding
AI can't just transcribe words—it needs to understand conversation flow, detect objections, identify buying signals, and provide contextually appropriate coaching.
Challenge: Maintain conversation state across a 30-minute call with multiple topics and objections.
3. Production Reliability
Sales teams depend on this system during live customer calls. Failures aren't just inconvenient—they cost real revenue.
Requirement: 99.9% uptime, graceful degradation, and immediate error recovery.
4. Multi-Tenant Architecture
Different businesses need customized coaching, separate data isolation, and white-label deployments.
Challenge: Build one system that serves multiple businesses with complete data separation.
Technology Stack: What and Why
After evaluating multiple options, here's the stack I settled on:
Frontend: Next.js + TypeScript
Why Next.js:
- Server-side rendering for fast initial loads
- API routes for backend logic
- Built-in optimizations and routing
- Excellent developer experience
Why TypeScript:
- Type safety prevents entire classes of bugs
- Better IDE support
- Self-documenting code
- Critical for production reliability
Voice Processing: Deepgram
Why Deepgram over Google/AWS:
- 300ms latency (vs 1-2 seconds for competitors)
- WebSocket streaming support
- Better accuracy on sales conversations
- Simpler pricing model
Real numbers:
- Average latency: 280ms
- Accuracy: 92%+ on sales vocabulary
- Cost: ~$0.0043 per minute
Alternatives considered:
- Google Speech-to-Text: Good accuracy, but 1-2s latency killed real-time use case
- AWS Transcribe: Similar latency issues
- OpenAI Whisper: Batch-oriented and too slow for real-time use (10-30s per processing pass)
AI Engine: GPT-4
Why GPT-4:
- Best conversation understanding
- Reliable structured outputs
- Large context window (8k tokens covers most of a 30-minute conversation; longer calls need context windowing)
- Fast response times (500-800ms)
Prompt engineering critical learnings:
- System prompts matter more than I expected
- Few-shot examples dramatically improve quality
- Temperature 0.3 works best for coaching (not too creative, not too rigid)
- Structured output formats (JSON) prevent parsing errors
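Defensive parsing is what makes the JSON output format pay off in practice. Here is a minimal sketch of that idea; the `CoachingSuggestion` shape and the fallback text are illustrative, not the actual implementation:

```typescript
// Parse the model's JSON output defensively, falling back to a safe
// default suggestion instead of crashing mid-call. Shape is illustrative.
interface CoachingSuggestion { type: string; message: string }

const FALLBACK: CoachingSuggestion = {
  type: "generic",
  message: "Acknowledge the customer's point and ask a clarifying question.",
};

function parseSuggestion(raw: string): CoachingSuggestion {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.type === "string" && typeof parsed.message === "string") {
      return { type: parsed.type, message: parsed.message };
    }
  } catch {
    // Malformed JSON: fall through to the fallback below.
  }
  return FALLBACK;
}
```

The key design choice: a bad model response degrades to a generic tip, never to an exception in the middle of a live call.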
Cost management:
- Average cost per call: $0.08-$0.15
- Cached system prompts reduce costs 40%
- Smart context truncation keeps token counts manageable
Real-Time Communication: WebSockets
Why WebSockets over HTTP polling:
- True real-time (no polling delay)
- Lower server load
- Bi-directional communication
- Native browser support
Architecture:
- Next.js API route handles WebSocket upgrade
- Deepgram WebSocket for voice streaming
- Client WebSocket for real-time updates
- Automatic reconnection on failures
Deployment: Vercel
Why Vercel:
- Zero-config Next.js deployment
- Global CDN automatically
- Excellent DX (push to deploy)
- Built-in analytics and monitoring
Production config:
- Edge functions for low latency
- Automatic HTTPS
- Preview deployments for testing
- Environment variable management
Database: PostgreSQL (Supabase)
Why PostgreSQL:
- Strong consistency for business data
- Excellent JSON support for conversation logs
- Row-level security for multi-tenancy
- Battle-tested reliability
Schema design:
- Separate schemas per tenant
- Conversation logs with full JSON
- Indexed for fast querying
- Automatic backups
Architecture Overview
The data flow works as follows:
- User speaks into microphone in browser
- Audio streams to Next.js via WebSocket
- Next.js forwards to Deepgram for transcription
- Transcription sent to GPT-4 with conversation context
- GPT-4 returns coaching suggestions
- Suggestions sent back to client in real-time
- Full conversation logged to PostgreSQL
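One turn of the pipeline above can be reduced to its data shapes: a finalized transcript chunk arrives, the rolling conversation context is updated, and a coaching request payload is produced. This is a sketch only; all names here are illustrative:

```typescript
// A finalized transcript chunk from the speech-to-text stage.
interface TranscriptChunk { speaker: "rep" | "customer"; text: string; isFinal: boolean }

// The payload handed to the AI engine for coaching analysis.
interface CoachingRequest { model: string; temperature: number; transcriptSoFar: string[] }

function buildCoachingRequest(
  context: string[],
  chunk: TranscriptChunk,
): CoachingRequest | null {
  // Interim (non-final) results are skipped; only finalized speech is analyzed.
  if (!chunk.isFinal) return null;
  context.push(`${chunk.speaker}: ${chunk.text}`);
  return { model: "gpt-4", temperature: 0.3, transcriptSoFar: [...context] };
}
```

In the real system the request would go out over the GPT-4 API and the response back to the client over the WebSocket; the point here is that each turn is a pure transformation over accumulated context.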
Latency breakdown:
- Audio capture: ~50ms
- WebSocket transmission: ~30ms
- Deepgram processing: ~280ms
- GPT-4 analysis: ~600ms
- Total end-to-end: ~960ms (under 1 second target)
Production Challenges and Solutions
Challenge 1: WebSocket Reliability
Problem: WebSocket connections drop frequently on mobile networks and unstable WiFi.
Solution: Automatic reconnection with exponential backoff: up to 5 attempts, with the retry delay computed as min(1000 × 2^attempt, 10000) milliseconds.
Result: Connection success rate increased from 85% to 99.2%
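The backoff logic above fits in a small helper. A minimal sketch, with illustrative names:

```typescript
// Exponential backoff for WebSocket reconnection, as described above:
// delay = min(1000 * 2^attempt, 10000) ms, for up to 5 attempts.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 10000;

// Returns the delay before the next retry, or null once attempts are exhausted.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```

The caller schedules the retry with `setTimeout(connect, delay)` and gives up (or falls back to HTTP polling) when the helper returns `null`.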
Challenge 2: Conversation Context Management
Problem: GPT-4 has token limits. A 30-minute conversation exceeds 8k tokens.
Solution: Smart context windowing:
- Keep full system prompt
- Keep last 10 messages for immediate context
- Summarize older messages
- Include key objections/buying signals
Result: Maintain conversation quality while staying under token limits
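The windowing strategy above can be sketched as a pure function. `summarize` is a stand-in here; in the real system that step would itself be an LLM call:

```typescript
// Smart context windowing: keep the system prompt, keep the last 10
// messages verbatim, and replace everything older with a summary.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

const RECENT_WINDOW = 10;

function summarize(messages: ChatMessage[]): string {
  // Illustrative placeholder: a real implementation would summarize with GPT-4.
  return `Summary of ${messages.length} earlier messages.`;
}

function windowContext(systemPrompt: ChatMessage, history: ChatMessage[]): ChatMessage[] {
  if (history.length <= RECENT_WINDOW) return [systemPrompt, ...history];
  const older = history.slice(0, history.length - RECENT_WINDOW);
  const recent = history.slice(-RECENT_WINDOW);
  const summaryMsg: ChatMessage = { role: "system", content: summarize(older) };
  return [systemPrompt, summaryMsg, ...recent];
}
```

Key objections and buying signals would additionally be pinned into the summary message so they survive truncation.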
Challenge 3: Cost Management
Problem: Initial costs were ~$0.50 per call (not sustainable)
Solution:
- Cached GPT-4 prompts (40% cost reduction)
- Debounce transcription processing (30% reduction)
- Smart conversation summarization
- Batch non-critical logs
Result: Reduced to ~$0.10 per call while maintaining quality
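The debouncing mentioned above amounts to a simple rate gate in front of the GPT-4 call. A sketch with an injected clock for clarity (the class name and interval are illustrative):

```typescript
// Debounce gate for transcription processing: only forward a transcript
// to the AI engine if enough time has passed since the last send.
class DebounceGate {
  private lastSentAt = -Infinity;
  constructor(private readonly minIntervalMs: number) {}

  // Returns true if the caller should process this transcript now.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.minIntervalMs) return false;
    this.lastSentAt = nowMs;
    return true;
  }
}
```

In production the call site would look like `if (gate.shouldSend(Date.now())) { /* call GPT-4 */ }`, turning a burst of interim transcripts into one coaching request.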
Challenge 4: Multi-Tenant Data Isolation
Problem: Multiple businesses using same system—data leaks would be catastrophic.
Solution: PostgreSQL row-level security with a tenant_id policy, so isolation is enforced at the database level rather than in application code.
Result: Database-enforced isolation; impossible to leak data across tenants
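On the application side, the discipline is that every query path goes through one helper that pins the tenant. The real enforcement lives in the PostgreSQL policies; this sketch only shows the app-level counterpart, and the column name is illustrative:

```typescript
// Every query must carry the tenant filter; an empty tenant is a hard error,
// never an unscoped query. Actual enforcement is PostgreSQL row-level security.
function scopedFilter(tenantId: string): { tenant_id: string } {
  if (!tenantId) throw new Error("tenantId is required for every query");
  return { tenant_id: tenantId };
}

// Example: with a Supabase-style client the filter would be applied as
//   client.from("conversations").select("*").match(scopedFilter(tenantId))
```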
Challenge 5: Error Handling in Production
Problem: Any error during a live call loses the customer.
Solution: Graceful degradation:
- If Deepgram fails → switch to browser native speech recognition
- If GPT-4 fails → show cached coaching tips
- If WebSocket fails → fall back to HTTP polling
- Always log the conversation, even if AI fails
Result: Zero conversation losses even during service outages
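The fallback chain above can be expressed as a single pure decision over current service health. Names here are illustrative, not the actual implementation:

```typescript
// Graceful degradation: given which services are healthy right now,
// pick the active strategy for each concern.
interface ServiceHealth { deepgram: boolean; gpt4: boolean; websocket: boolean }

function pickStrategies(health: ServiceHealth) {
  return {
    transcription: health.deepgram ? "deepgram" : "browser-speech-api",
    coaching: health.gpt4 ? "gpt4" : "cached-tips",
    transport: health.websocket ? "websocket" : "http-polling",
    // Logging always stays on, even if every AI service is down.
    logging: "always-on",
  };
}
```

Centralizing the decision like this keeps the fallbacks testable: each degradation path is just one input combination rather than scattered try/catch blocks.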
Monitoring and Observability
You can't fix what you can't see. Sales Pilot has comprehensive monitoring:
Key Metrics:
- Latency p50, p95, p99: Track AI response times
- Error rates: By service (Deepgram, GPT-4, DB)
- WebSocket connection success rate
- Cost per conversation
- Transcription accuracy (spot-checked manually)
Alerts:
- Latency greater than 2s for 5 minutes
- Error rate greater than 5% for 3 minutes
- WebSocket success under 95%
- Daily cost exceeds budget
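As a concrete example, the first alert rule ("latency greater than 2s for 5 minutes") can be evaluated as a pure check over a window of samples. The thresholds are the ones listed above; the data shape is illustrative:

```typescript
// Fire the latency alert when every sample in the last 5 minutes
// exceeds the 2-second threshold.
interface Sample { timestampMs: number; latencyMs: number }

function latencyAlertFiring(samples: Sample[], nowMs: number): boolean {
  const WINDOW_MS = 5 * 60 * 1000;
  const THRESHOLD_MS = 2000;
  const windowed = samples.filter((s) => nowMs - s.timestampMs <= WINDOW_MS);
  // Fire only if the window is non-empty and every sample breaches the threshold.
  return windowed.length > 0 && windowed.every((s) => s.latencyMs > THRESHOLD_MS);
}
```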
Tools:
- Vercel Analytics for basic metrics
- Custom CloudWatch dashboards
- Sentry for error tracking
- DataDog for deep investigation
Expected Business Outcomes
For sales teams using Sales Pilot:
- Target 20-30% increase in call conversion rates
- Expected 40% reduction in onboarding time for new sales staff
- Real-time objection handling to improve close rates
- Enable managers to monitor and coach remotely
Technical targets:
- 99.9% uptime goal
- Average response time: under 1000ms
- Zero data loss through robust error handling
- Target $0.10 average cost per call
Lessons Learned
1. Pick Technologies Designed for Your Use Case
I evaluated 6 different speech-to-text services. Deepgram won because it was specifically designed for real-time streaming. Don't use batch tools for real-time problems.
2. Monitor from Day One
I added monitoring before the first customer. Every outage I've had was caught by alerts before customers noticed. You need monitoring more than you need features.
3. Cost Optimization is Continuous
My first version cost 5x more than current version—same functionality. Cost optimization isn't a one-time task, it's ongoing engineering work.
4. Error Handling Makes or Breaks Production
Half my development time went to error handling and edge cases. This isn't wasted time—it's the difference between a demo and production software.
5. Real-Time is Really Hard
Real-time systems are fundamentally more complex than batch systems. Every architectural decision must account for latency, failures, and network issues.
What's Next
Sales Pilot is currently in development, with launch planned for Q1 2026.
Future expansion possibilities include:
- Healthcare (patient intake assistance)
- Legal (deposition analysis and coaching)
- Financial services (compliance monitoring)
The core architecture works for any industry that needs real-time AI coaching.
Get Early Access
Sales Pilot is currently in development. Want to be notified when we launch or interested in early access?
Request Demo - I work with select companies that need production-ready AI sales coaching, not prototypes.
Abdelkader Bekhti is building Sales Pilot to prove that production-ready AI sales coaching is possible. He has 10+ years building enterprise-scale platforms and works with companies in Dubai and globally.