When building a payment gateway, every millisecond counts. At Lumen Pay, we recently undertook a major refactoring of our core transaction processing engine. The goal was simple: handle 10x the traffic with the same infrastructure while maintaining data integrity.
The Bottleneck
Our initial profiling using Blackfire.io revealed that synchronous database writes were locking up PHP worker processes. For every transaction, we were waiting for:
- The initial insert into the transactions table (I/O bound)
- A fraud check query against a large dataset (CPU bound)
- An update to the user's balance with row-level locking
- A webhook notification dispatch to the merchant's server (Network bound)
This synchronous approach meant that a single slow external API call (like a fraud check or a slow merchant server) could stall the entire checkout process for the user.
The Solution: Asynchronous Queues
We moved the non-critical path items (webhooks, email receipts, and complex fraud analysis) to a Redis-backed queue. This allowed the initial HTTP request to return a "Processing" status almost immediately.
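Dispatching a webhook this way means defining it as a queued job. A minimal Laravel-style sketch of what such a job could look like (the class internals, retry settings, and the merchant URL accessor are illustrative assumptions, not our production code):

```php
<?php

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

// Hypothetical sketch of a queued webhook job.
class SendWebhook implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;    // retry slow or unreachable merchant servers
    public int $backoff = 30; // seconds between attempts

    public function __construct(private Transaction $transaction) {}

    public function handle(): void
    {
        // Network-bound work now runs on a queue worker,
        // not inside the checkout HTTP request.
        Http::timeout(5)->post(
            $this->transaction->merchant->webhook_url,
            ['status' => $this->transaction->status]
        );
    }
}
```

Because the job is retried independently, a merchant server that takes ten seconds to respond no longer holds a PHP worker hostage during checkout.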
```php
// Old synchronous way
$payment->process(); // Took 800ms
$webhook->send();    // Took 400ms
return response();   // Total: 1.2s
```

```php
// New asynchronous way
$payment->process();                        // Took 800ms
Queue::push(new SendWebhook($transaction)); // Took 5ms
return response();                          // Total: 805ms
```

Database Optimization
We also optimized our SQL queries. Profiling showed that our reporting dashboard was performing full table scans on the `transactions` table. By adding a composite index on (`transaction_date`, `status`) and partitioning the table by month, we cut dashboard query times by over 90%.
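The schema changes above can be sketched as a migration. The composite index uses the standard schema builder; partitioning has no builder helper, so it falls back to raw SQL (the MySQL `RANGE` syntax and the month boundaries below are illustrative assumptions):

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// Composite index so dashboard filters on date + status
// hit an index scan instead of a full table scan.
Schema::table('transactions', function (Blueprint $table) {
    $table->index(['transaction_date', 'status']);
});

// Partitioning requires raw DDL. Assumed MySQL-style RANGE
// partitioning by month; boundaries here are placeholders.
DB::statement("
    ALTER TABLE transactions
    PARTITION BY RANGE (TO_DAYS(transaction_date)) (
        PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
        PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    )
");
```

One caveat worth noting: MySQL requires the partitioning column to be part of every unique key on the table, so the primary key may need to include `transaction_date`.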
We also routed all `SELECT` queries that didn't require immediate consistency to read replicas, reserving the primary database for write-heavy operations.
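In a Laravel-style setup, read/write splitting is configuration rather than application code. A minimal sketch (hostnames and the database name are placeholders):

```php
<?php

// config/database.php (fragment; hosts are placeholders)
'mysql' => [
    'driver' => 'mysql',
    'read' => [
        'host' => ['replica-1.internal', 'replica-2.internal'],
    ],
    'write' => [
        'host' => ['primary.internal'],
    ],
    // 'sticky' routes reads back to the primary within the same
    // request after a write, preserving read-after-write consistency.
    'sticky' => true,
    'database' => 'payments',
],
```

With this in place, the query builder transparently sends `SELECT`s to a replica and everything else to the primary, with no call-site changes.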
"Premature optimization is the root of all evil, but at scale, lack of optimization is death."
Conclusion
By decoupling our services, caching aggressively with Redis, and optimizing our database layer, we've reduced average response times from 1.2s to roughly 800ms. This means a snappier experience for end-users and higher throughput for merchants during peak sales events like Black Friday.