Scaling Coinbase's Payout Infrastructure

By Nishant Gupta

TL;DR: Coinbase processes over a billion payout transactions a year across staking rewards, USDC rewards, and Coinbase One benefits. We’re sharing the evolution of our Payout Framework—from the limitations of synchronous processing to a high-throughput async architecture that handles rewards across dozens of assets with precision.



Why Payouts Are Hard

Payouts are one of the hardest engineering problems at Coinbase.

From the outside, it looks simple. Stake ETH, get rewards. Participate in USDC, earn rewards. Subscribe to Coinbase One, get benefits. But underneath, each of those products has fundamentally different mechanics that all have to be exactly right.

Staking is protocol-driven. We support multiple proof-of-stake networks, each with its own epoch timing, reward distribution rules, and edge cases. Some protocols distribute rewards every few hours, others on multi-day epoch boundaries. We track onchain rewards, figure out each user's proportional share based on their holdings, and distribute the correct amount. Every asset is a little different, and getting the math wrong isn't an option.
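To make the proportional-share step concrete, here is a minimal sketch in Python. The function name `pro_rata_shares`, the eight-decimal precision, and the round-down policy are assumptions chosen for illustration, not details of the production system.

```python
from decimal import Decimal, ROUND_DOWN

def pro_rata_shares(holdings, total_reward, precision=Decimal("0.00000001")):
    """Split total_reward across users in proportion to their holdings.

    Hypothetical helper: each share is rounded down so the sum can never
    exceed the reward pool; the leftover "dust" is returned separately.
    """
    total_held = sum(holdings.values(), Decimal(0))
    if total_held <= 0:
        raise ValueError("total holdings must be positive")
    shares = {
        user: (total_reward * held / total_held).quantize(precision, rounding=ROUND_DOWN)
        for user, held in holdings.items()
    }
    dust = total_reward - sum(shares.values(), Decimal(0))
    return shares, dust

# Illustrative inputs only.
holdings = {"alice": Decimal("32"), "bob": Decimal("8"), "carol": Decimal("10")}
shares, dust = pro_rata_shares(holdings, Decimal("1.5"))

# Three equal holders splitting 1 unit leaves one unit of dust at 8 decimals.
thirds, thirds_dust = pro_rata_shares(
    {"a": Decimal("1"), "b": Decimal("1"), "c": Decimal("1")}, Decimal("1")
)
```

Using `Decimal` rather than floats keeps the arithmetic exact, and tracking the dust explicitly means the shares plus the remainder always add back up to the reward that arrived onchain.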

USDC rewards are based on qualifying platform participation. We're working with platform participation snapshots and rate schedules rather than onchain events. Millions of dollars flow to retail users every month. Different calculation from staking, same bar for accuracy.

Coinbase One layers subscription logic on top. Boosted staking rates, trading fee rebates — the payout system has to know who's a subscriber, apply the right multipliers, and handle the overlap with base rewards.

And all of this sits inside a regulated financial system with jurisdiction-specific rules and compliance holds. The payout system can't just be fast and accurate; it has to be auditable and compliant too.

Three products, three sets of rules, three schedules. All running at a scale where "approximately correct" isn't good enough.

Async Payout Systems

Early payout systems at Coinbase were largely synchronous. Calculate a user's reward, write it to their account, move on to the next one. This is the intuitive approach, and it works when you have a manageable number of users and a handful of assets.

It stops working when you don't.

Synchronous processing creates a tight coupling between calculation and distribution. If the ledger service is slow, everything backs up. If a single asset's reward calculation hits an edge case, it blocks everything behind it. Error handling is tricky: do you skip the failed user and keep going, or halt the whole batch? Either choice has consequences, and in a synchronous model, you have to make that decision in real time with limited information.

The bigger problem is throughput. When you're processing payouts for millions of users across dozens of assets, doing it sequentially means your total processing time scales linearly with the number of users. That's a ceiling you can hit fast.

We moved to a fully asynchronous architecture, and it changed everything about how we think about payouts. Decoupling calculation from distribution lets us validate everything before any money moves. It lets us retry failed distributions without re-running calculations. It lets different assets process at their own pace without blocking each other, and it gives us natural checkpoints where we can inspect, audit, and catch problems early.

The shift wasn't just a performance optimization. It was a fundamentally different model for how payouts should work at scale.

The Payout Framework

We run all of this through a single system with two async phases: determine what users are owed, then pay them.


Phase 1: Accrual

Every payout starts with a calculation. For staking, that's determining each user's share of onchain rewards based on their holdings and the protocol's reward mechanics. For USDC, it's a rewards computation based on qualifying platform participation. For Coinbase One (CB1), it's applying subscriber-tier logic on top.

The design separates the what from the how. Each product team owns the calculation logic for their domain — they know their protocols and products best. The framework handles everything around it: batching millions of computations, scheduling them on the right cadence, validating outputs, and deduplicating to make sure nothing gets counted twice. Every calculation gets a full audit trail where we can trace any payout back to the exact inputs that produced it.
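One way to picture the split between product-owned calculation and framework-owned plumbing is the sketch below. Everything in it (`Accrual`, `RewardCalculator`, `FlatRateCalculator`, `run_accrual`, the in-memory `seen` set) is a hypothetical stand-in written for this example, not the framework's real interface.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Protocol

@dataclass(frozen=True)
class Accrual:
    user_id: str
    asset: str
    period: str       # e.g. an epoch number or a calendar date
    amount: Decimal

class RewardCalculator(Protocol):
    """Product-owned piece: knows the reward math and nothing else."""
    def calculate(self, period: str) -> Iterable[Accrual]: ...

class FlatRateCalculator:
    """Toy product calculator: balance times a per-period rate."""
    def __init__(self, asset, balances, rate):
        self.asset, self.balances, self.rate = asset, balances, rate

    def calculate(self, period):
        for user_id, balance in self.balances.items():
            yield Accrual(user_id, self.asset, period, balance * self.rate)

def run_accrual(calculator: RewardCalculator, period: str, seen: set) -> list:
    """Framework-owned piece: validate and deduplicate calculator output."""
    accepted, batch_keys = [], set()
    for accrual in calculator.calculate(period):
        key = (accrual.user_id, accrual.asset, accrual.period)
        if key in seen or key in batch_keys:
            continue                      # already accrued for this period
        if accrual.amount < 0:
            raise ValueError(f"negative accrual for {key}")
        batch_keys.add(key)
        accepted.append(accrual)
    seen.update(batch_keys)               # commit only once the batch is valid
    return accepted

seen_keys = set()
calc = FlatRateCalculator(
    "USDC", {"u1": Decimal("100"), "u2": Decimal("250")}, Decimal("0.0001")
)
first_run = run_accrual(calc, "2024-01-01", seen_keys)
second_run = run_accrual(calc, "2024-01-01", seen_keys)  # same period again
```

In this sketch, onboarding a new asset means writing another class with a `calculate` method; `run_accrual` does not change. A real system would keep the deduplication keys in durable storage rather than an in-memory set.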

This separation is what makes the system extensible. When we add support for a new staking asset, we write the reward calculation for that protocol's mechanics. Everything downstream (validation, accounting, distribution, monitoring) already exists. What used to take months now takes days.

Phase 2: Distribution

Accrual produces a validated set of payouts. Distribution gets them to users.

This is where the async model really pays off. We're distributing payouts to millions of users, and every single one has to be correct. In a synchronous system, a slow downstream dependency stalls the entire pipeline. In our model, distribution runs independently. Payouts flow through in waves, throttled to respect system capacity but tuned to keep the wait short. If downstream systems slow down, we back off and resume. No state is lost, no payouts are dropped.
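The wave-and-back-off behavior can be sketched roughly as follows. The names `distribute_in_waves` and `DownstreamBusy`, the wave size, and the exponential back-off schedule are all assumptions for illustration; the sketch also assumes `send` is safe to call again for the same payout.

```python
import time

class DownstreamBusy(Exception):
    """Hypothetical error a ledger client raises when it is overloaded."""

def distribute_in_waves(payouts, send, wave_size=2, max_attempts=5,
                        base_delay=0.01, wave_pause=0.0, sleep=time.sleep):
    """Send payouts wave by wave, backing off when downstream is busy.

    Returns (sent, pending). `pending` is non-empty only if a payout ran
    out of attempts, so the caller can resume from exactly that point.
    """
    sent = []
    for start in range(0, len(payouts), wave_size):
        for payout in payouts[start:start + wave_size]:
            for attempt in range(max_attempts):
                try:
                    send(payout)
                except DownstreamBusy:
                    sleep(base_delay * 2 ** attempt)  # exponential back-off
                else:
                    sent.append(payout)
                    break
            else:
                # Attempts exhausted: stop, keeping the unsent tail intact.
                return sent, payouts[len(sent):]
        sleep(wave_pause)  # throttle between waves
    return sent, []

# Demo 1: the downstream is busy exactly once, on the third call.
calls = {"n": 0}
def flaky_send(payout):
    calls["n"] += 1
    if calls["n"] == 3:
        raise DownstreamBusy()

sent, pending = distribute_in_waves(
    ["p1", "p2", "p3", "p4", "p5"], flaky_send, sleep=lambda _: None
)

# Demo 2: the downstream never recovers, so nothing is sent or lost.
def always_busy(payout):
    raise DownstreamBusy()

stalled_sent, stalled_pending = distribute_in_waves(
    ["p1", "p2"], always_busy, max_attempts=2, sleep=lambda _: None
)
```

The second demo is the important one: when the downstream never recovers, the function returns the untouched tail instead of dropping it, which is the "no state is lost" property in miniature.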

Every payout goes through double-entry accounting, where a credit to the user's account always has a matching debit from the reward pool. We enforce this at the framework level. It's not optional and it's not something individual product teams implement themselves.
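As a small illustration of the double-entry invariant, consider the sketch below. The `Ledger` and `Entry` classes, the account naming, and the sign convention (credits positive, debits negative) are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Entry:
    account: str
    amount: Decimal   # assumed convention: positive = credit, negative = debit

class Ledger:
    """Toy in-memory ledger that only accepts balanced entry pairs."""
    def __init__(self):
        self.entries = []

    def post_payout(self, pool_account, user_account, amount):
        if amount <= 0:
            raise ValueError("payout amount must be positive")
        pair = [Entry(pool_account, -amount), Entry(user_account, amount)]
        if sum((e.amount for e in pair), Decimal(0)) != 0:
            raise RuntimeError("unbalanced entry pair")
        self.entries.extend(pair)   # both legs are recorded together

    def balance(self, account):
        return sum(
            (e.amount for e in self.entries if e.account == account), Decimal(0)
        )

ledger = Ledger()
ledger.post_payout("reward_pool:ETH", "user:alice", Decimal("0.5"))
ledger.post_payout("reward_pool:ETH", "user:bob", Decimal("0.25"))
net = sum((e.amount for e in ledger.entries), Decimal(0))
```

Because the only way to write to this ledger is `post_payout`, a caller cannot record a credit without its debit, and the net of all entries stays at zero.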

The whole pipeline is idempotent. If distribution fails midway and we have to retry, nobody gets paid twice and nobody gets skipped. That sounds like a solved problem until you're doing it across billions of records and dozens of assets simultaneously. A lot of our design effort went into making retries safe, and the async model is what makes safe retries practical. Because calculation and distribution are decoupled, we can retry distribution without re-running accrual. That's a property you don't get for free in a synchronous system.
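A common way to get this property is a deterministic payout identifier, sketched below. The key fields (product, asset, user, period), the SHA-256 hashing, and the `IdempotentDistributor` class are assumptions for illustration; the post does not say how identifiers are actually generated.

```python
import hashlib
from decimal import Decimal

def payout_id(product, asset, user_id, period):
    """Derive the same ID every time for the same logical payout."""
    raw = "|".join([product, asset, user_id, period])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

class IdempotentDistributor:
    """Toy distributor; `applied` stands in for a durable store."""
    def __init__(self):
        self.applied = {}   # payout_id -> amount

    def apply(self, product, asset, user_id, period, amount):
        pid = payout_id(product, asset, user_id, period)
        if pid in self.applied:
            return False    # already paid: a retry is a no-op
        self.applied[pid] = amount
        return True

dist = IdempotentDistributor()
first = dist.apply("staking", "ETH", "alice", "epoch-100", Decimal("0.5"))
retry = dist.apply("staking", "ETH", "alice", "epoch-100", Decimal("0.5"))
other = dist.apply("staking", "ETH", "alice", "epoch-101", Decimal("0.5"))
```

In a real store, the check and the write would need to be a single atomic operation (for example, a unique constraint on the identifier) so that two concurrent retries cannot both succeed.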

Concurrency

Because processing is async, payout jobs across different assets and products run concurrently. ADA rewards don't wait for ETH to finish. USDC rewards don't block CB1 benefits. This matters because protocol timings vary: some assets accrue rewards hourly, some daily, and some on epoch boundaries that don't align with anything else.

Some assets deliver first rewards within days, others get consistent next-day delivery. Those timelines are driven by protocol mechanics, but the framework's concurrency model is what lets us meet them without assets queuing behind each other. In the old synchronous world, assets were processed sequentially, so a slow calculation for one asset delayed rewards for everything after it.
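The effect of running per-asset jobs concurrently can be shown with a toy `asyncio` example. The job durations are made up, and `asyncio.sleep` stands in for the real accrual and distribution work; nothing here reflects the actual job scheduler.

```python
import asyncio

async def run_asset_job(asset, duration, finished):
    # Stand-in for one asset's accrual + distribution work.
    await asyncio.sleep(duration)
    finished.append(asset)

async def run_all(jobs):
    finished = []
    # Each asset gets its own task; none waits for another to complete.
    await asyncio.gather(
        *(run_asset_job(asset, duration, finished) for asset, duration in jobs.items())
    )
    return finished

# ETH is submitted first but is the slowest job in this made-up example.
completion_order = asyncio.run(run_all({"ETH": 0.05, "ADA": 0.01, "USDC": 0.02}))
```

Even though ETH is listed first, ADA and USDC finish ahead of it. In a sequential loop, both would have waited behind the slow ETH job.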

Scaling Global Payouts

In financial systems, accuracy isn't a feature, it's the baseline. When managing rewards across diverse protocols like Ethereum and Solana, the primary challenge is not just moving capital, but managing the underlying complexity of disparate epoch timings, reward mechanics, and network finality.

As our volume scaled to over 1 billion annual transactions, our legacy approach reached a breaking point. We needed to shift from a collection of bespoke scripts to a unified, hardened payout framework.

By standardizing our distribution primitives, we transformed our delivery velocity and system reliability.

  • New asset onboarding takes days of engineering work — the protocol-specific calculator is typically the only new code; distribution, accounting, monitoring, and retry infrastructure are inherited from the framework

  • 99.99% of payouts land on the expected date, with the remaining edge cases caught and resolved within 24 hours

  • No known material payout accuracy issues since the system reached full production

  • Concurrent processing means asset payouts run independently — a slow epoch finalization on one chain doesn't delay rewards for any other

Design Decisions

A few choices that shaped how the system works:

Async with checkpoints, not streaming. We considered a streaming architecture but chose async batch processing with explicit checkpoints instead. Streams are great for throughput, but for financial accuracy at our scale, we wanted the ability to pause, inspect, and approve before distribution begins.
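The "pause, inspect, and approve" idea maps naturally onto a small state machine. The states, the transition table, and the `PayoutBatch` class below are hypothetical; they only illustrate how an explicit checkpoint can make it impossible to distribute a batch that has not been validated and approved.

```python
from enum import Enum

class BatchState(Enum):
    ACCRUED = "accrued"
    VALIDATED = "validated"
    APPROVED = "approved"
    DISTRIBUTED = "distributed"

# Allowed transitions; a validated batch can be sent back for re-accrual.
ALLOWED = {
    BatchState.ACCRUED: {BatchState.VALIDATED},
    BatchState.VALIDATED: {BatchState.APPROVED, BatchState.ACCRUED},
    BatchState.APPROVED: {BatchState.DISTRIBUTED},
    BatchState.DISTRIBUTED: set(),
}

class PayoutBatch:
    def __init__(self, batch_id):
        self.batch_id = batch_id
        self.state = BatchState.ACCRUED
        self.history = [BatchState.ACCRUED]   # simple audit trail

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.batch_id}: cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)

batch = PayoutBatch("eth-epoch-1")
try:
    batch.advance(BatchState.DISTRIBUTED)     # tries to skip the checkpoints
    skipped_gate = True
except ValueError:
    skipped_gate = False

for step in (BatchState.VALIDATED, BatchState.APPROVED, BatchState.DISTRIBUTED):
    batch.advance(step)
```

A streaming pipeline has no natural place for the `APPROVED` step; a batch with explicit states does.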

Framework-level accounting, not product-level. When you have multiple products running payouts, you don't want each team implementing their own accounting logic. Double-entry bookkeeping, audit trails, and reconciliation live in the framework. Products focus on their domain — reward calculation, subscription logic, protocol integration — and the framework guarantees financial integrity.

Idempotency as a design constraint, not a feature. Every stage of the pipeline can be safely retried. This wasn't bolted on at the end — it's a core constraint that influenced how we structured state, how we generate payout identifiers, and how we handle partial failures. The async model makes this tractable: because phases are decoupled, we know exactly what needs to be retried and what's already complete.

Product teams own their calculation, framework owns everything else. This boundary is strict. A product team adding a new staking asset writes the reward math and nothing else. They don't touch distribution, accounting, monitoring, or retry logic. This is how we onboard new assets so quickly — the surface area of new code is small, and the hard infrastructure problems are already solved.

What We Learned

Async isn't just faster — it's safer. During one asset's payout cycle, a protocol rate calculation returned anomalous reward values for a subset of users. Because accrual completes and validates before distribution begins, we caught the outlier values at the validation gate, corrected the calculation, and re-ran accrual — without any incorrect payouts reaching users. In a synchronous system, some users would have already received wrong amounts before we detected the issue.
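A validation gate of this kind could be as simple as comparing each user's effective reward rate against the batch median. The function name `find_outliers`, the 10x threshold, and the sample rates are assumptions for illustration, not the checks used in the incident described above.

```python
from decimal import Decimal
from statistics import median

def find_outliers(rates, max_ratio=Decimal("10")):
    """Return users whose reward rate is more than max_ratio away from the median."""
    mid = median(rates.values())
    if mid <= 0:
        raise ValueError("median rate must be positive")
    return sorted(
        user for user, rate in rates.items()
        if rate > mid * max_ratio or rate * max_ratio < mid
    )

def validation_gate(rates):
    """Return (ok, outliers); distribution should proceed only when ok is True."""
    outliers = find_outliers(rates)
    return len(outliers) == 0, outliers

# Made-up effective rates (reward divided by holdings) for one batch.
batch_rates = {
    "u1": Decimal("0.00010"),
    "u2": Decimal("0.00011"),
    "u3": Decimal("0.00009"),
    "u4": Decimal("0.00010"),
    "u5": Decimal("0.05000"),   # anomalous value
}
ok, outliers = validation_gate(batch_rates)
```

Because the gate runs on the full accrual output before any distribution starts, a failed check blocks the whole batch rather than the payouts that happen to come after the bad record.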

Speed is trust. When users stake crypto and see their first reward arrive quickly, that builds confidence in the platform. Payout latency isn't just a technical metric; it shapes how the product feels.

Low marginal cost changes what you can build. When adding a new asset takes days instead of being a major project, you start saying "yes" to things you may have said "no" to. The economics of your roadmap change.

Accuracy is a systems problem, not a people problem. You don't maintain payout accuracy at the billion-transaction scale by asking teams to be careful. You get it by making the framework enforce correctness (double-entry accounting, validation gates, idempotent retries) so that individual product teams can't accidentally get it wrong.

As we expand to more assets and more reward types, the system's extensibility gets tested in new ways. That's the part we're most excited about. The system was designed to scale, and we're still finding out how far it goes.


