Sales Pilot: Building Real-Time Voice AI for Sales Coaching
by Abdelkader Bekhti, Production AI & Data Architect
Why I Built Sales Pilot
After years of advising companies on AI systems, I wanted to prove something: production-ready AI is possible, but it requires different thinking than most teams apply.
Sales Pilot is my answer: an AI sales coach that works in any industry, analyzing customer conversations as they happen and giving sales teams intelligent suggestions in real time during live calls.
Key metrics:
- Currently in development
- Sub-300ms speech-to-text latency
- Real-time AI coaching during live calls
- Built for 99.9% uptime reliability
Sales Pilot is currently in development, built with the same production-ready approach I've used for enterprise systems.
The Technical Challenge
Building Sales Pilot required solving several hard problems simultaneously:
1. Real-Time Voice Processing
Customer calls happen in real-time. You can't batch process them 5 minutes later—the conversation is already over. The AI needs to analyze speech and provide suggestions while the salesperson is still talking to the customer.
Target latency: Less than 1 second from speech to AI suggestion
2. Conversation Context Understanding
AI can't just transcribe words—it needs to understand conversation flow, detect objections, identify buying signals, and provide contextually appropriate coaching.
Challenge: Maintain conversation state across a 30-minute call with multiple topics and objections.
3. Production Reliability
Sales teams depend on this system during live customer calls. Failures aren't just inconvenient—they cost real revenue.
Requirement: 99.9% uptime, graceful degradation, and immediate error recovery.
4. Multi-Tenant Architecture
Different businesses need customized coaching, separate data isolation, and white-label deployments.
Challenge: Build one system that serves multiple businesses with complete data separation.
Technology Stack: What and Why
After evaluating multiple options, here's the stack I settled on:
Frontend: Next.js + TypeScript
Why Next.js:
- Server-side rendering for fast initial loads
- API routes for backend logic
- Built-in optimizations and routing
- Excellent developer experience
Why TypeScript:
- Type safety prevents entire classes of bugs
- Better IDE support
- Self-documenting code
- Critical for production reliability
Voice Processing: Deepgram
Why Deepgram over Google/AWS:
- 300ms latency (vs 1-2 seconds for competitors)
- WebSocket streaming support
- Better accuracy on sales conversations
- Simpler pricing model
Real numbers:
- Average latency: 280ms
- Accuracy: 92%+ on sales vocabulary
- Cost: ~$0.0043 per minute
Alternatives considered:
- Google Speech-to-Text: Good accuracy, but 1-2s latency killed real-time use case
- AWS Transcribe: Similar latency issues
- OpenAI Whisper: Batch-oriented and too slow for real-time use (10-30s per processing pass)
AI Engine: GPT-4
Why GPT-4:
- Best conversation understanding
- Reliable structured outputs
- Large context window (8k tokens covers most of a 30-minute conversation; longer calls need context windowing)
- Fast response times (500-800ms)
Prompt engineering critical learnings:
- System prompts matter more than I expected
- Few-shot examples dramatically improve quality
- Temperature 0.3 works best for coaching (not too creative, not too rigid)
- Structured output formats (JSON) prevent parsing errors
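Defensive parsing is what makes the JSON output format pay off in practice. Here is a minimal sketch of that idea; the `CoachingSuggestion` shape and the fallback text are illustrative, not the actual implementation:

```typescript
// Parse the model's JSON output defensively, falling back to a safe
// default suggestion instead of crashing mid-call. Shape is illustrative.
interface CoachingSuggestion { type: string; message: string }

const FALLBACK: CoachingSuggestion = {
  type: "generic",
  message: "Acknowledge the customer's point and ask a clarifying question.",
};

function parseSuggestion(raw: string): CoachingSuggestion {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.type === "string" && typeof parsed.message === "string") {
      return { type: parsed.type, message: parsed.message };
    }
  } catch {
    // Malformed JSON: fall through to the fallback below.
  }
  return FALLBACK;
}
```

The key design choice: a bad model response degrades to a generic tip, never to an exception in the middle of a live call.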
Cost management:
- Average cost per call: $0.08-$0.15
- Cached system prompts reduce costs 40%
- Smart context truncation keeps token counts manageable
Real-Time Communication: WebSockets
Why WebSockets over HTTP polling:
- True real-time (no polling delay)
- Lower server load
- Bi-directional communication
- Native browser support
Architecture:
- Next.js API route handles WebSocket upgrade
- Deepgram WebSocket for voice streaming
- Client WebSocket for real-time updates
- Automatic reconnection on failures
Deployment: Vercel
Why Vercel:
- Zero-config Next.js deployment
- Global CDN automatically
- Excellent DX (push to deploy)
- Built-in analytics and monitoring
Production config:
- Edge functions for low latency
- Automatic HTTPS
- Preview deployments for testing
- Environment variable management
Database: PostgreSQL (Supabase)
Why PostgreSQL:
- Strong consistency for business data
- Excellent JSON support for conversation logs
- Row-level security for multi-tenancy
- Battle-tested reliability
Schema design:
- Separate schemas per tenant
- Conversation logs with full JSON
- Indexed for fast querying
- Automatic backups
Architecture Overview
The data flow works as follows:
- User speaks into microphone in browser
- Audio streams to Next.js via WebSocket
- Next.js forwards to Deepgram for transcription
- Transcription sent to GPT-4 with conversation context
- GPT-4 returns coaching suggestions
- Suggestions sent back to client in real-time
- Full conversation logged to PostgreSQL
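One turn of the pipeline above can be reduced to its data shapes: a finalized transcript chunk arrives, the rolling conversation context is updated, and a coaching request payload is produced. This is a sketch only; all names here are illustrative:

```typescript
// A finalized transcript chunk from the speech-to-text stage.
interface TranscriptChunk { speaker: "rep" | "customer"; text: string; isFinal: boolean }

// The payload handed to the AI engine for coaching analysis.
interface CoachingRequest { model: string; temperature: number; transcriptSoFar: string[] }

function buildCoachingRequest(
  context: string[],
  chunk: TranscriptChunk,
): CoachingRequest | null {
  // Interim (non-final) results are skipped; only finalized speech is analyzed.
  if (!chunk.isFinal) return null;
  context.push(`${chunk.speaker}: ${chunk.text}`);
  return { model: "gpt-4", temperature: 0.3, transcriptSoFar: [...context] };
}
```

In the real system the request would go out over the GPT-4 API and the response back to the client over the WebSocket; the point here is that each turn is a pure transformation over accumulated context.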
Latency breakdown:
- Audio capture: ~50ms
- WebSocket transmission: ~30ms
- Deepgram processing: ~280ms
- GPT-4 analysis: ~600ms
- Total end-to-end: ~960ms (under 1 second target)
Production Challenges and Solutions
Challenge 1: WebSocket Reliability
Problem: WebSocket connections drop frequently on mobile networks and unstable WiFi.
Solution: Automatic reconnection with exponential backoff: up to 5 attempts, with the retry delay computed as min(1000 × 2^attempt, 10000) milliseconds.
Result: Connection success rate increased from 85% to 99.2%
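The backoff logic above fits in a small helper. A minimal sketch, with illustrative names:

```typescript
// Exponential backoff for WebSocket reconnection, as described above:
// delay = min(1000 * 2^attempt, 10000) ms, for up to 5 attempts.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 10000;

// Returns the delay before the next retry, or null once attempts are exhausted.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```

The caller schedules the retry with `setTimeout(connect, delay)` and gives up (or falls back to HTTP polling) when the helper returns `null`.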
Challenge 2: Conversation Context Management
Problem: GPT-4 has token limits. A 30-minute conversation exceeds 8k tokens.
Solution: Smart context windowing:
- Keep full system prompt
- Keep last 10 messages for immediate context
- Summarize older messages
- Include key objections/buying signals
Result: Maintain conversation quality while staying under token limits
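The windowing strategy above can be sketched as a pure function. `summarize` is a stand-in here; in the real system that step would itself be an LLM call:

```typescript
// Smart context windowing: keep the system prompt, keep the last 10
// messages verbatim, and replace everything older with a summary.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

const RECENT_WINDOW = 10;

function summarize(messages: ChatMessage[]): string {
  // Illustrative placeholder: a real implementation would summarize with GPT-4.
  return `Summary of ${messages.length} earlier messages.`;
}

function windowContext(systemPrompt: ChatMessage, history: ChatMessage[]): ChatMessage[] {
  if (history.length <= RECENT_WINDOW) return [systemPrompt, ...history];
  const older = history.slice(0, history.length - RECENT_WINDOW);
  const recent = history.slice(-RECENT_WINDOW);
  const summaryMsg: ChatMessage = { role: "system", content: summarize(older) };
  return [systemPrompt, summaryMsg, ...recent];
}
```

Key objections and buying signals would additionally be pinned into the summary message so they survive truncation.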
Challenge 3: Cost Management
Problem: Initial costs were ~$0.50 per call (not sustainable)
Solution:
- Cached GPT-4 prompts (40% cost reduction)
- Debounce transcription processing (30% reduction)
- Smart conversation summarization
- Batch non-critical logs
Result: Reduced to ~$0.10 per call while maintaining quality
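The debouncing mentioned above amounts to a simple rate gate in front of the GPT-4 call. A sketch with an injected clock for clarity (the class name and interval are illustrative):

```typescript
// Debounce gate for transcription processing: only forward a transcript
// to the AI engine if enough time has passed since the last send.
class DebounceGate {
  private lastSentAt = -Infinity;
  constructor(private readonly minIntervalMs: number) {}

  // Returns true if the caller should process this transcript now.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.minIntervalMs) return false;
    this.lastSentAt = nowMs;
    return true;
  }
}
```

In production the call site would look like `if (gate.shouldSend(Date.now())) { /* call GPT-4 */ }`, turning a burst of interim transcripts into one coaching request.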
Challenge 4: Multi-Tenant Data Isolation
Problem: Multiple businesses using same system—data leaks would be catastrophic.
Solution: PostgreSQL row-level security with a tenant_id policy, so isolation is enforced at the database level rather than in application code.
Result: Database-enforced isolation; impossible to leak data across tenants
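On the application side, the discipline is that every query path goes through one helper that pins the tenant. The real enforcement lives in the PostgreSQL policies; this sketch only shows the app-level counterpart, and the column name is illustrative:

```typescript
// Every query must carry the tenant filter; an empty tenant is a hard error,
// never an unscoped query. Actual enforcement is PostgreSQL row-level security.
function scopedFilter(tenantId: string): { tenant_id: string } {
  if (!tenantId) throw new Error("tenantId is required for every query");
  return { tenant_id: tenantId };
}

// Example: with a Supabase-style client the filter would be applied as
//   client.from("conversations").select("*").match(scopedFilter(tenantId))
```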
Challenge 5: Error Handling in Production
Problem: Any error during a live call loses the customer.
Solution: Graceful degradation:
- If Deepgram fails → switch to browser native speech recognition
- If GPT-4 fails → show cached coaching tips
- If WebSocket fails → fall back to HTTP polling
- Always log the conversation, even if AI fails
Result: Zero conversation losses even during service outages
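The fallback chain above can be expressed as a single pure decision over current service health. Names here are illustrative, not the actual implementation:

```typescript
// Graceful degradation: given which services are healthy right now,
// pick the active strategy for each concern.
interface ServiceHealth { deepgram: boolean; gpt4: boolean; websocket: boolean }

function pickStrategies(health: ServiceHealth) {
  return {
    transcription: health.deepgram ? "deepgram" : "browser-speech-api",
    coaching: health.gpt4 ? "gpt4" : "cached-tips",
    transport: health.websocket ? "websocket" : "http-polling",
    // Logging always stays on, even if every AI service is down.
    logging: "always-on",
  };
}
```

Centralizing the decision like this keeps the fallbacks testable: each degradation path is just one input combination rather than scattered try/catch blocks.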
Monitoring and Observability
You can't fix what you can't see. Sales Pilot has comprehensive monitoring:
Key Metrics:
- Latency p50, p95, p99: Track AI response times
- Error rates: By service (Deepgram, GPT-4, DB)
- WebSocket connection success rate
- Cost per conversation
- Transcription accuracy (spot-checked manually)
Alerts:
- Latency greater than 2s for 5 minutes
- Error rate greater than 5% for 3 minutes
- WebSocket success under 95%
- Daily cost exceeds budget
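As a concrete example, the first alert rule ("latency greater than 2s for 5 minutes") can be evaluated as a pure check over a window of samples. The thresholds are the ones listed above; the data shape is illustrative:

```typescript
// Fire the latency alert when every sample in the last 5 minutes
// exceeds the 2-second threshold.
interface Sample { timestampMs: number; latencyMs: number }

function latencyAlertFiring(samples: Sample[], nowMs: number): boolean {
  const WINDOW_MS = 5 * 60 * 1000;
  const THRESHOLD_MS = 2000;
  const windowed = samples.filter((s) => nowMs - s.timestampMs <= WINDOW_MS);
  // Fire only if the window is non-empty and every sample breaches the threshold.
  return windowed.length > 0 && windowed.every((s) => s.latencyMs > THRESHOLD_MS);
}
```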
Tools:
- Vercel Analytics for basic metrics
- Custom CloudWatch dashboards
- Sentry for error tracking
- DataDog for deep investigation
Expected Business Outcomes
For sales teams using Sales Pilot:
- Target 20-30% increase in call conversion rates
- Expected 40% reduction in onboarding time for new sales staff
- Real-time objection handling to improve close rates
- Enable managers to monitor and coach remotely
Technical targets:
- 99.9% uptime goal
- Average response time: under 1000ms
- Zero data loss through robust error handling
- Target $0.10 average cost per call
Lessons Learned
1. Pick Technologies Designed for Your Use Case
I evaluated 6 different speech-to-text services. Deepgram won because it was specifically designed for real-time streaming. Don't use batch tools for real-time problems.
2. Monitor from Day One
I added monitoring before the first customer. Every outage I've had was caught by alerts before customers noticed. You need monitoring more than you need features.
3. Cost Optimization is Continuous
My first version cost 5x more than current version—same functionality. Cost optimization isn't a one-time task, it's ongoing engineering work.
4. Error Handling Makes or Breaks Production
Half my development time went to error handling and edge cases. This isn't wasted time—it's the difference between a demo and production software.
5. Real-Time is Really Hard
Real-time systems are fundamentally more complex than batch systems. Every architectural decision must account for latency, failures, and network issues.
What's Next
Sales Pilot is currently in development, with launch planned for Q1 2026.
Future expansion possibilities include:
- Healthcare (patient intake assistance)
- Legal (deposition analysis and coaching)
- Financial services (compliance monitoring)
The core architecture works for any industry that needs real-time AI coaching.
Get Early Access
Sales Pilot is currently in development. Want to be notified when we launch or interested in early access?
Request Demo - I work with select companies that need production-ready AI sales coaching, not prototypes.
Abdelkader Bekhti is building Sales Pilot to prove that production-ready AI sales coaching is possible. He has 10+ years building enterprise-scale platforms and works with companies in Dubai and globally.