Sales Pilot: Building Real-Time Voice AI for Sales Coaching

by Abdelkader Bekhti, Production AI & Data Architect

Why I Built Sales Pilot

After years of advising companies on AI systems, I wanted to prove something: production-ready AI is possible, but it requires different thinking than most teams apply.

Sales Pilot is my answer: an AI sales coach that provides real-time guidance during live customer calls. It analyzes conversations as they happen and surfaces intelligent suggestions to sales teams.

At a glance:

  • Currently in development
  • Sub-300ms speech-to-text latency
  • Real-time AI coaching during live calls
  • Built for 99.9% uptime reliability

Sales Pilot is currently in development, built with the same production-ready approach I've used for enterprise systems.

The Technical Challenge

Building Sales Pilot required solving several hard problems simultaneously:

1. Real-Time Voice Processing

Customer calls happen in real-time. You can't batch process them 5 minutes later—the conversation is already over. The AI needs to analyze speech and provide suggestions while the salesperson is still talking to the customer.

Target latency: Less than 1 second from speech to AI suggestion

2. Conversation Context Understanding

AI can't just transcribe words—it needs to understand conversation flow, detect objections, identify buying signals, and provide contextually appropriate coaching.

Challenge: Maintain conversation state across a 30-minute call with multiple topics and objections.

3. Production Reliability

Sales teams depend on this system during live customer calls. Failures aren't just inconvenient—they cost real revenue.

Requirement: 99.9% uptime, graceful degradation, and immediate error recovery.

4. Multi-Tenant Architecture

Different businesses need customized coaching, separate data isolation, and white-label deployments.

Challenge: Build one system that serves multiple businesses with complete data separation.

Technology Stack: What and Why

After evaluating multiple options, here's the stack I landed on:

Frontend: Next.js + TypeScript

Why Next.js:

  • Server-side rendering for fast initial loads
  • API routes for backend logic
  • Built-in optimizations and routing
  • Excellent developer experience

Why TypeScript:

  • Type safety prevents entire classes of bugs
  • Better IDE support
  • Self-documenting code
  • Critical for production reliability

Voice Processing: Deepgram

Why Deepgram over Google/AWS:

  • 300ms latency (vs 1-2 seconds for competitors)
  • WebSocket streaming support
  • Better accuracy on sales conversations
  • Simpler pricing model

Real numbers:

  • Average latency: 280ms
  • Accuracy: 92%+ on sales vocabulary
  • Cost: ~$0.0043 per minute

Alternatives considered:

  • Google Speech-to-Text: Good accuracy, but 1-2s latency killed real-time use case
  • AWS Transcribe: Similar latency issues
  • OpenAI Whisper: Too slow for real-time (10-30s for processing)

AI Engine: GPT-4

Why GPT-4:

  • Best conversation understanding
  • Reliable structured outputs
  • Workable context window (8k tokens; a full 30-minute call needs summarization to fit)
  • Fast response times (500-800ms)

Prompt engineering critical learnings:

  • System prompts matter more than I expected
  • Few-shot examples dramatically improve quality
  • Temperature 0.3 works best for coaching (not too creative, not too rigid)
  • Structured output formats (JSON) prevent parsing errors

Cost management:

  • Average cost per call: $0.08-$0.15
  • Cached system prompts reduce costs 40%
  • Smart context truncation keeps token counts manageable

Real-Time Communication: WebSockets

Why WebSockets over HTTP polling:

  • True real-time (no polling delay)
  • Lower server load
  • Bi-directional communication
  • Native browser support

Architecture:

  • Next.js API route handles WebSocket upgrade
  • Deepgram WebSocket for voice streaming
  • Client WebSocket for real-time updates
  • Automatic reconnection on failures

Deployment: Vercel

Why Vercel:

  • Zero-config Next.js deployment
  • Global CDN automatically
  • Excellent DX (push to deploy)
  • Built-in analytics and monitoring

Production config:

  • Edge functions for low latency
  • Automatic HTTPS
  • Preview deployments for testing
  • Environment variable management

Database: PostgreSQL (Supabase)

Why PostgreSQL:

  • Strong consistency for business data
  • Excellent JSON support for conversation logs
  • Row-level security for multi-tenancy
  • Battle-tested reliability

Schema design:

  • Separate schemas per tenant
  • Conversation logs with full JSON
  • Indexed for fast querying
  • Automatic backups

Architecture Overview

The data flow works as follows:

  1. User speaks into microphone in browser
  2. Audio streams to Next.js via WebSocket
  3. Next.js forwards to Deepgram for transcription
  4. Transcription sent to GPT-4 with conversation context
  5. GPT-4 returns coaching suggestions
  6. Suggestions sent back to client in real-time
  7. Full conversation logged to PostgreSQL

Latency breakdown:

  • Audio capture: ~50ms
  • WebSocket transmission: ~30ms
  • Deepgram processing: ~280ms
  • GPT-4 analysis: ~600ms
  • Total end-to-end: ~960ms (under 1 second target)
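The per-chunk flow above can be sketched as a single async handler. This is an illustrative sketch, not the actual code: `transcribe` and `coach` are stand-ins for the Deepgram and GPT-4 calls, injected here so the pipeline shape is clear.

```typescript
interface TranscriptChunk {
  speaker: "rep" | "customer";
  text: string;
}

interface Suggestion {
  tip: string;
  latencyMs: number;
}

// One pass through the pipeline: transcribe an audio chunk, ask the
// coaching model for a tip, and report end-to-end latency.
async function handleAudioChunk(
  audio: Uint8Array,
  transcribe: (audio: Uint8Array) => Promise<TranscriptChunk>,
  coach: (chunk: TranscriptChunk) => Promise<string>,
  now: () => number = Date.now
): Promise<Suggestion> {
  const start = now();
  const chunk = await transcribe(audio);
  const tip = await coach(chunk);
  return { tip, latencyMs: now() - start };
}
```

Injecting the two service calls also makes the pipeline trivial to test with stubs, without a live Deepgram or OpenAI connection.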

Production Challenges and Solutions

Challenge 1: WebSocket Reliability

Problem: WebSocket connections drop frequently on mobile networks and unstable WiFi.

Solution: Automatic reconnection with exponential backoff: up to 5 retry attempts, with the delay before each retry set to min(1000 × 2^attempts, 10000) milliseconds.

Result: Connection success rate increased from 85% to 99.2%
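The backoff schedule can be captured in a small helper. A minimal sketch; the constant names are illustrative, not from the actual codebase:

```typescript
// Exponential backoff for WebSocket reconnection: the delay doubles
// on each attempt, capped at 10 seconds, and we give up after 5 tries.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 10000;

// Returns the delay before the next retry, or null to stop retrying.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```

The cap matters: without it, a flaky mobile connection would back off into delays long enough that the user perceives the session as dead.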

Challenge 2: Conversation Context Management

Problem: GPT-4 has token limits. A 30-minute conversation exceeds 8k tokens.

Solution: Smart context windowing:

  • Keep full system prompt
  • Keep last 10 messages for immediate context
  • Summarize older messages
  • Include key objections/buying signals

Result: Maintain conversation quality while staying under token limits
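The windowing strategy can be sketched as a pure function. This is an assumption-laden sketch: the message shape and the `summarize` callback are placeholders (in practice the summary could come from a cheaper model call), but the keep-recent / summarize-older split mirrors the approach described above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const RECENT_WINDOW = 10; // keep the last 10 messages verbatim

// Collapse older messages into a single summary message so the prompt
// stays under the model's token limit.
function windowContext(
  systemPrompt: ChatMessage,
  history: ChatMessage[],
  summarize: (msgs: ChatMessage[]) => string
): ChatMessage[] {
  if (history.length <= RECENT_WINDOW) {
    return [systemPrompt, ...history];
  }
  const older = history.slice(0, history.length - RECENT_WINDOW);
  const recent = history.slice(-RECENT_WINDOW);
  const summary: ChatMessage = {
    role: "system",
    content: `Earlier in the call: ${summarize(older)}`,
  };
  return [systemPrompt, summary, ...recent];
}
```

Key objections and buying signals would be folded into the summary text, so the model keeps the deal-relevant facts even after the raw transcript is dropped.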

Challenge 3: Cost Management

Problem: Initial costs were ~$0.50 per call (not sustainable)

Solution:

  • Cached GPT-4 prompts (40% cost reduction)
  • Debounce transcription processing (30% reduction)
  • Smart conversation summarization
  • Batch non-critical logs

Result: Reduced to ~$0.10 per call while maintaining quality
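One way to implement the transcription debounce is to gate GPT-4 calls on finalized transcript segments and a minimum interval. A sketch under stated assumptions: streaming STT services like Deepgram emit interim and final results, and the 2-second interval here is illustrative, not the production value.

```typescript
interface TranscriptEvent {
  text: string;
  isFinal: boolean;
  timestampMs: number;
}

const MIN_INTERVAL_MS = 2000; // at most one AI call every 2s (illustrative)

// Decide whether a transcript event should trigger an AI call:
// only finalized segments, and no more often than MIN_INTERVAL_MS.
function shouldProcess(
  event: TranscriptEvent,
  lastProcessedMs: number
): boolean {
  if (!event.isFinal) return false;
  return event.timestampMs - lastProcessedMs >= MIN_INTERVAL_MS;
}
```

Skipping interim results alone eliminates the bulk of redundant model calls, since each utterance can produce many interim events before the final one.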

Challenge 4: Multi-Tenant Data Isolation

Problem: Multiple businesses using same system—data leaks would be catastrophic.

Solution: PostgreSQL row-level security with a tenant_id policy, so isolation is enforced by the database itself rather than by application code.

Result: Database-enforced isolation; an application bug cannot leak data across tenants

Challenge 5: Error Handling in Production

Problem: Any error during a live call loses the customer.

Solution: Graceful degradation:

  • If Deepgram fails → switch to browser native speech recognition
  • If GPT-4 fails → show cached coaching tips
  • If WebSocket fails → fall back to HTTP polling
  • Always log the conversation, even if AI fails

Result: Zero conversation losses even during service outages
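The fallback chain above can be expressed as a generic helper. Kept synchronous here for clarity (the real Deepgram/GPT-4 calls would be async), and the provider names in the usage note are hypothetical:

```typescript
// Try each provider in order, falling back to the next on failure.
// If every provider fails, return a safe default (e.g. cached tips).
function withFallback<T>(
  providers: Array<() => T>,
  fallbackValue: T
): T {
  for (const provider of providers) {
    try {
      return provider();
    } catch {
      // this provider failed; try the next one
    }
  }
  return fallbackValue;
}
```

Usage would look like `withFallback([callGpt4, getCachedTips], defaultTip)`, with conversation logging kept outside the chain so it runs no matter which provider (if any) succeeded.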

Monitoring and Observability

You can't fix what you can't see. Sales Pilot has comprehensive monitoring:

Key Metrics:

  • Latency p50, p95, p99: Track AI response times
  • Error rates: By service (Deepgram, GPT-4, DB)
  • WebSocket connection success rate
  • Cost per conversation
  • Transcription accuracy (spot-checked manually)

Alerts:

  • Latency greater than 2s for 5 minutes
  • Error rate greater than 5% for 3 minutes
  • WebSocket success under 95%
  • Daily cost exceeds budget
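The thresholds above reduce to a simple rule check. A sketch with two simplifications: it evaluates a single metric sample and ignores the "sustained for N minutes" condition a real alerting system would add, and the daily budget figure is an assumption.

```typescript
interface MetricSample {
  latencyMs: number;
  errorRate: number;     // 0..1
  wsSuccessRate: number; // 0..1
  dailyCostUsd: number;
}

const DAILY_BUDGET_USD = 50; // illustrative budget

// Return the names of the alert rules a metric sample violates,
// mirroring the thresholds listed above.
function firedAlerts(m: MetricSample): string[] {
  const alerts: string[] = [];
  if (m.latencyMs > 2000) alerts.push("latency");
  if (m.errorRate > 0.05) alerts.push("error-rate");
  if (m.wsSuccessRate < 0.95) alerts.push("websocket");
  if (m.dailyCostUsd > DAILY_BUDGET_USD) alerts.push("cost");
  return alerts;
}
```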

Tools:

  • Vercel Analytics for basic metrics
  • Custom CloudWatch dashboards
  • Sentry for error tracking
  • DataDog for deep investigation

Expected Business Outcomes

For sales teams using Sales Pilot:

  • Target 20-30% increase in call conversion rates
  • Expected 40% reduction in onboarding time for new sales staff
  • Real-time objection handling to improve close rates
  • Enable managers to monitor and coach remotely

Technical targets:

  • 99.9% uptime goal
  • Average response time: under 1000ms
  • Zero data loss through robust error handling
  • Target $0.10 average cost per call

Lessons Learned

1. Pick Technologies Designed for Your Use Case

I evaluated 6 different speech-to-text services. Deepgram won because it was specifically designed for real-time streaming. Don't use batch tools for real-time problems.

2. Monitor from Day One

I added monitoring before the first customer. Every outage I've had was caught by alerts before customers noticed. You need monitoring more than you need features.

3. Cost Optimization is Continuous

My first version cost 5x more than current version—same functionality. Cost optimization isn't a one-time task, it's ongoing engineering work.

4. Error Handling Makes or Breaks Production

Half my development time went to error handling and edge cases. This isn't wasted time—it's the difference between a demo and production software.

5. Real-Time is Really Hard

Real-time systems are fundamentally more complex than batch systems. Every architectural decision must account for latency, failures, and network issues.

What's Next

Sales Pilot is currently in development, launching Q1 2026.

Future expansion possibilities include:

  • Healthcare (patient intake assistance)
  • Legal (deposition analysis and coaching)
  • Financial services (compliance monitoring)

The core architecture works for any industry that needs real-time AI coaching.

Get Early Access

Sales Pilot is currently in development. Want to be notified when we launch or interested in early access?

Request Demo - I work with select companies that need production-ready AI sales coaching, not prototypes.


Abdelkader Bekhti is building Sales Pilot to prove that production-ready AI sales coaching is possible. He has 10+ years building enterprise-scale platforms and works with companies in Dubai and globally.
