Case Study - Legacy to Cloud: 5TB Migration for a Betting Platform
A comprehensive data migration solution for a high-volume online betting platform, moving 5TB of operational data from legacy MySQL systems to cloud-native architecture with minimal downtime.
- Client
- Online Betting Platform
- Year
- Service
- Data Migration, Cloud Architecture, Legacy Modernization

Executive Summary
In February 2024, I executed a critical data migration for a high-volume online betting platform, moving 5TB of operational data from legacy MySQL systems to a cloud-native architecture. The project achieved minimal data loss (< 0.01%), only 3 minutes of downtime, and 48% performance improvement while establishing a scalable foundation for future growth.
The Challenge: MySQL Bottlenecks in Operational Systems
The betting platform faced critical performance and scalability challenges:
System Bottlenecks
- Database Performance: MySQL struggling with 10M+ daily transactions
- Query Latency: 5-10 second response times during peak hours
- Storage Limitations: 5TB data approaching hardware capacity limits
- Backup Issues: 8+ hour backup windows causing operational constraints
- Scaling Problems: Vertical scaling no longer cost-effective
Business Impact
- User Experience: Slow response times affecting betting experience
- Revenue Loss: System downtime during peak betting hours
- Operational Costs: Expensive hardware maintenance and upgrades
- Competitive Disadvantage: Unable to support new features and markets
- Risk Management: Single points of failure in critical systems
Technical Constraints
- Legacy Architecture: Monolithic MySQL setup with limited flexibility
- Data Volume: 5TB of historical and real-time betting data
- Zero-Downtime Requirement: Critical for 24/7 betting operations
- Data Consistency: Complex relationships across betting, user, and financial data
- Compliance Requirements: Gaming regulations and audit trails
Solution: Comprehensive Migration Strategy
I implemented a comprehensive migration strategy using modern data stack technologies:
Technical Stack
- Debezium: Change Data Capture (CDC) for real-time replication
- Airbyte: Data ingestion and transformation pipeline
- DBT: Data modeling and transformation layer
- BigQuery: Cloud data warehouse for analytics
- Cloud SQL: Managed MySQL for operational data
- Terraform: Infrastructure as Code for deployment
Migration Architecture
The migration strategy followed a phased approach with real-time replication and minimal downtime, ensuring continuous operations throughout the migration process.
Legacy to Cloud Migration Architecture
Legacy Phase
- MySQL bottlenecks
- 5-10s response times
- 5TB data volume
- Hardware limitations
Migration Phase
- Debezium CDC
- Airbyte ingestion
- DBT transformations
- Real-time replication
Cloud Phase
- Cloud SQL operational
- BigQuery analytics
- 23% cost savings
- Scalable architecture
Technical Implementation
Change Data Capture Strategy
Implemented Debezium for real-time MySQL replication with minimal performance impact:
CDC Configuration:
- Snapshot mode: Initial full database capture with "minimal" locking
- Precision handling: Preserved decimal precision and timestamp accuracy
- Schema tracking: Automatic capture and propagation of schema changes
- Kafka integration: Event streaming to Kafka for downstream processing
Key Decisions:
- Chose Debezium over periodic batch extraction for true real-time sync
- Used minimal locking mode to reduce impact on production MySQL
- Implemented precise decimal handling (critical for financial betting data)
- Added schema change tracking for seamless table modifications
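The decisions above map directly onto Debezium's MySQL connector properties. A sketch of what the connector registration payload might look like is below; the hostnames, credentials, connector name, and Kafka addresses are placeholders, not values from the actual project.

```python
import json

# Sketch of a Debezium MySQL connector registration payload.
# Host, credentials, server id, and topic names are illustrative placeholders.
connector_config = {
    "name": "betting-mysql-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.internal",  # placeholder
        "database.port": "3306",
        "database.user": "debezium",            # placeholder
        "database.server.id": "184054",
        "topic.prefix": "betting",
        # Full initial snapshot with minimal table locking,
        # reducing impact on the production database
        "snapshot.mode": "initial",
        "snapshot.locking.mode": "minimal",
        # Preserve exact decimal precision -- critical for financial data
        "decimal.handling.mode": "precise",
        # Capture DDL events so downstream consumers track schema changes
        "include.schema.changes": "true",
        "schema.history.internal.kafka.topic": "schema-changes.betting",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    },
}

# In practice this payload is registered via the Kafka Connect REST API.
payload = json.dumps(connector_config, indent=2)
```

The `snapshot.locking.mode=minimal` and `decimal.handling.mode=precise` settings correspond to the minimal-locking and precision-handling decisions described above.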
Data Pipeline Architecture
Built robust Airbyte pipeline for reliable data movement:
Source Configuration (MySQL):
- CDC replication method for real-time capture
- 5-minute initial waiting period for large table snapshots
- SSL disabled for internal network (VPN protected)
- UTC timezone standardization across all data
Destination Configuration (BigQuery):
- GCS staging for efficient bulk loads
- Incremental sync for change data
- Automatic schema evolution handling
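The source and destination settings above roughly correspond to the following configuration shapes. Exact field names vary by Airbyte connector version, and the project, host, and dataset names here are placeholders, so treat this as illustrative.

```python
# Approximate Airbyte source/destination settings; field names follow the
# MySQL source and BigQuery destination specs but may differ by version.
mysql_source = {
    "host": "mysql.internal",   # placeholder
    "port": 3306,
    "database": "betting",      # placeholder
    "replication_method": {
        "method": "CDC",
        "initial_waiting_seconds": 300,  # 5-minute wait for large table snapshots
    },
    "ssl": False,  # internal network, VPN-protected
}

bigquery_destination = {
    "project_id": "betting-analytics",  # placeholder
    "dataset_id": "raw",
    "loading_method": {"method": "GCS Staging"},  # bulk loads staged via GCS
}
```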
Pipeline Features:
- Automated retry logic for transient failures
- Data validation at each stage
- Monitoring and alerting for pipeline health
- Audit logging for compliance requirements
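The retry behavior for transient failures can be sketched as a generic exponential-backoff wrapper. This is not Airbyte's internal implementation, just the pattern the pipeline relies on; the `flaky_sync` task is a stand-in for a sync attempt.

```python
import time

def run_with_retries(task, max_attempts=4, base_delay=1.0):
    """Retry a task with exponential backoff on transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except ConnectionError:  # treated as transient
            if attempt == max_attempts:
                raise  # retries exhausted -> surface the error for alerting
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated sync that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_sync():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "synced"

result = run_with_retries(flaky_sync, base_delay=0)  # succeeds on the 3rd attempt
```

Permanent failures (the final re-raise) are what the monitoring and alerting layer picks up.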
Data Transformation Layer
Implemented DBT models for analytics-ready data:
Betting Events Model:
- Incremental processing for efficiency (only new/changed records)
- Change type tracking (new vs updated records)
- Business status filtering (active, settled, cancelled bets)
User Metrics Aggregation:
- Total bets and settled bets per user
- Lifetime wagered amount
- Average odds and betting patterns
- Last bet date for activity tracking
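The user-level aggregation computed by the DBT model can be illustrated in plain Python. The record fields (`user_id`, `amount`, `odds`, `status`, `placed_on`) are assumptions for the sketch, not the platform's real schema.

```python
from collections import defaultdict
from datetime import date

# Illustrative bet records; field names are assumptions, not the real schema.
bets = [
    {"user_id": 1, "amount": 50.0, "odds": 2.1, "status": "settled", "placed_on": date(2024, 2, 1)},
    {"user_id": 1, "amount": 20.0, "odds": 1.8, "status": "active",  "placed_on": date(2024, 2, 3)},
    {"user_id": 2, "amount": 10.0, "odds": 3.5, "status": "settled", "placed_on": date(2024, 2, 2)},
]

def user_metrics(bets):
    """Per-user aggregates: bet counts, lifetime wagered, avg odds, last bet date."""
    acc = defaultdict(lambda: {"total_bets": 0, "settled_bets": 0,
                               "wagered": 0.0, "odds_sum": 0.0, "last_bet": None})
    for bet in bets:
        m = acc[bet["user_id"]]
        m["total_bets"] += 1
        m["settled_bets"] += bet["status"] == "settled"
        m["wagered"] += bet["amount"]
        m["odds_sum"] += bet["odds"]
        m["last_bet"] = max(filter(None, [m["last_bet"], bet["placed_on"]]))
    result = {}
    for user_id, m in acc.items():
        m["avg_odds"] = m.pop("odds_sum") / m["total_bets"]
        result[user_id] = m
    return result

metrics = user_metrics(bets)
```

In the actual pipeline this runs as SQL in the warehouse; the Python version only shows the shape of the aggregation.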
Real-time Processing:
- Processing timestamp for freshness tracking
- Optimized joins with user metrics
- Incremental strategy using merge for updates
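In DBT the merge strategy compiles to a warehouse `MERGE` statement; its semantics can be simulated on plain records. The `bet_id` key and `status` field are illustrative assumptions.

```python
def incremental_merge(target, batch, key="bet_id"):
    """Simulate a merge-based incremental load:
    matched keys are updated in place, unmatched keys are inserted."""
    merged = {row[key]: row for row in target}
    for row in batch:
        merged[row[key]] = row  # matched -> update; not matched -> insert
    return list(merged.values())

# Existing table state plus one incremental batch of new/changed records.
target = [{"bet_id": 1, "status": "active"}, {"bet_id": 2, "status": "active"}]
batch  = [{"bet_id": 2, "status": "settled"}, {"bet_id": 3, "status": "active"}]
rows = incremental_merge(target, batch)
```

Because only the batch of new/changed records is scanned, this is what makes the incremental models cheap relative to full rebuilds.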
Migration Phases
Executed a carefully planned 3-week migration:
Phase 1: Historical Data Migration (Weeks 1-2)
- Migrated 3TB of historical data during off-peak hours
- 500M records transferred with zero downtime
- Validated row counts and checksums
Phase 2: CDC Setup and Validation (Week 3)
- Deployed Debezium connectors to production MySQL
- Validated real-time replication accuracy
- Tested failover scenarios 3 times
Phase 3: Production Cutover (Day 21)
- Executed cutover during 3 AM low-traffic window
- 50GB of final sync data
- 1M records in final batch
- Only 3 minutes of read-only mode
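The cutover decision itself can be sketched as a simple gate on replication health: the switch proceeds only once CDC has fully caught up. The thresholds here are illustrative, not the values used on the night.

```python
def ready_for_cutover(replication_lag_s, pending_rows, max_lag_s=5, max_pending=0):
    """Gate the final switch: proceed only once CDC replication has caught up.
    Thresholds are illustrative placeholders."""
    return replication_lag_s <= max_lag_s and pending_rows <= max_pending

# Runbook shape for the 3 AM window: flip MySQL to read-only, poll this gate
# until it passes, then repoint application traffic at Cloud SQL.
```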
Measurable Results
- Data Migrated
- 5TB
- Downtime
- 3 min
- Cost Savings
- 23%
- Daily Transactions
- 10M+
- Uptime
- 99.7%
- Query Response
- < 1.2s
- Operations
- 24/7
- Data Loss
- < 0.01%
Performance Improvements
Before Migration
- Query Response: 5-10 seconds during peak hours
- Backup Time: 8+ hours
- Scaling: Manual vertical scaling required
- Uptime: 99.5% with frequent maintenance windows
- Cost: High hardware and maintenance costs
After Migration
- Query Response: < 1.2 seconds consistently (48% improvement)
- Backup Time: 18 minutes with automated snapshots
- Scaling: Automatic horizontal scaling
- Uptime: 99.7% with minimal-downtime deployments
- Cost: 23% reduction in total infrastructure costs
Migration Benefits
Operational Improvements
- Real-time Processing: Sub-second data replication across systems
- Automated Scaling: Dynamic resource allocation based on demand
- Disaster Recovery: Multi-region backup and failover capabilities
- Monitoring: Comprehensive observability and alerting
- Compliance: Enhanced audit trails and data governance
Business Impact
- User Experience: Faster betting interface and real-time updates
- Revenue Growth: Support for higher transaction volumes
- Market Expansion: Ability to enter new markets and jurisdictions
- Feature Development: Faster deployment of new betting features
- Risk Management: Improved fraud detection and compliance monitoring
Challenges and Solutions
Handling Live Transaction Data During Migration
The biggest challenge: the platform operates 24/7 and cannot afford downtime. Solutions:
- Phase 1 (Weeks 1-2): Historical data migration during off-peak hours
- Phase 2 (Week 3): Set up Debezium CDC for real-time replication
- Phase 3 (Day 21): Synchronized cutover during lowest-traffic window (3 AM)
- Fallback plan tested 3 times before actual cutover
- Result: Only 3 minutes of read-only mode during cutover
Data Validation at Scale
Validating 5TB of data thoroughly, without slowing the migration itself, required a layered approach:
- Implemented automated row count + checksum validation
- Sampled 1% of records for deep comparison (50GB)
- Found 0.008% discrepancy in timestamp precision (MySQL vs BigQuery handling)
- Adjusted CDC configuration to preserve precision
- Result: < 0.01% data loss, all within acceptable tolerance
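The row count + checksum step can be sketched as below. The order-independent XOR-of-hashes trick avoids having to sort 5TB of rows before comparing; a production version would also compare counts per key, since XOR alone cancels out duplicated rows. The sample rows are invented for illustration.

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, XOR the 64-bit digests.
    Caveat: XOR cancels duplicate rows, so pair it with per-key counts."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

def validate(source_rows, target_rows):
    """Row-count plus checksum comparison between source and target."""
    return (len(source_rows) == len(target_rows)
            and table_checksum(source_rows) == table_checksum(target_rows))

# Same data in a different order should still validate.
src = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.0}]
dst = [{"id": 2, "amount": 7.0}, {"id": 1, "amount": 10.5}]
```

Note that `repr`-based hashing is exactly where the timestamp-precision discrepancy mentioned above would surface: if MySQL and BigQuery render the same timestamp differently, the checksums diverge.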
Performance Degradation After Initial Migration
The first week post-migration saw queries running 15% slower than expected. Root causes and fixes:
- Issue: Missing proper indexes on BigQuery tables
- Fix: Analyzed query patterns, added clustering on high-cardinality fields
- Issue: Un-optimized SQL queries designed for MySQL
- Fix: Rewrote 12 critical queries to leverage BigQuery's architecture
- Issue: Over-normalized schema causing excessive joins
- Fix: Created denormalized marts for common analytics patterns
- Result: Achieved 48% improvement vs original MySQL performance
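The clustering fix above came from matching query filter patterns against column cardinality. That selection step can be sketched as follows; the column names and cardinality figures are made up for illustration, and the limit of 4 reflects BigQuery's maximum number of clustering columns.

```python
def pick_clustering_fields(column_cardinality, filter_columns, max_fields=4):
    """Pick high-cardinality columns that actually appear in query filters,
    ordered by cardinality (BigQuery allows at most 4 clustering columns)."""
    candidates = [c for c in filter_columns if c in column_cardinality]
    candidates.sort(key=lambda c: column_cardinality[c], reverse=True)
    return candidates[:max_fields]

# Illustrative stats from query-pattern analysis (not real project numbers).
cardinality = {"user_id": 2_000_000, "bet_status": 4,
               "event_id": 150_000, "market": 40}
filters = ["user_id", "event_id", "bet_status"]
fields = pick_clustering_fields(cardinality, filters)
```

Clustering on high-cardinality filter columns is what lets BigQuery prune blocks instead of scanning whole tables, which is where most of the post-migration query-time recovery came from.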
Implementation Components
This implementation included:
- CDC Configuration
- Data Pipeline Setup
- Validation Procedures
- Rollback Strategies
- Performance Testing
- Monitoring Setup
- Documentation
- Training Materials
Conclusion
The legacy to cloud migration demonstrates that large-scale data migration can be achieved with minimal disruption through careful planning and phased execution. By addressing 24/7 operational requirements and performance optimization challenges, this implementation achieved:
- Minimal Data Loss: < 0.01% data loss within acceptable tolerance
- Minimal Downtime: Only 3 minutes of downtime during migration
- Performance Improvement: 48% faster query performance (after optimization)
- Cost Reduction: 23% reduction in infrastructure costs
- Scalability: Cloud-native architecture supporting 10x growth potential
Ready to modernize your legacy infrastructure? Contact me to discuss your migration challenges and explore strategies for minimizing risk while maximizing performance gains in your cloud migration journey.