Reducing Fraud Loss With an Automated Dynamic Policy

By Jared Coleman, Bhaskar Krishnamachari, Li Liu
Mar 11, 2026

Tl;dr: Coinbase has developed a novel, dynamic control policy that replaces static rules to automatically manage risk, resulting in superior financial loss mitigation and more efficient utilization of constrained resources.

Risk management and fraud detection systems often rely on manually tuned rules and static thresholds to determine the appropriate action for a given transaction or event. While effective, these systems are rigid. At scale, they require constant human intervention to keep pace with shifting market conditions and evolving adversarial patterns.

At Coinbase, we’ve moved beyond static thresholds by developing an automated policy framework based on Reinforcement Learning (RL). This system minimizes expected loss by dynamically learning and applying rules, allowing us to balance risk mitigation with user friction and operational constraints in real time.

The Problem with Static Thresholds

In many risk systems, decisions are made by comparing transaction attributes (like value or a risk score) against pre-set, static thresholds. For example, a transaction over a certain dollar amount might be flagged for additional review, or a high risk score might lead to an automatic rejection.
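
For contrast with the dynamic policy described later, a static rule of this kind might look like the following minimal sketch. The threshold values, the function name static_rule, and the action labels are all hypothetical and chosen only for illustration.

```python
# Hypothetical, hand-picked thresholds (not values from any real system).
VALUE_REVIEW_THRESHOLD_USD = 10_000.0
RISK_BLOCK_THRESHOLD = 0.9


def static_rule(value_usd: float, risk_score: float) -> str:
    """Return an action label using fixed thresholds only."""
    if risk_score >= RISK_BLOCK_THRESHOLD:
        return "block"   # high risk score: automatic rejection
    if value_usd >= VALUE_REVIEW_THRESHOLD_USD:
        return "review"  # large transaction: flag for additional review
    return "allow"
```

Note that nothing in this rule looks at how busy the review queue is, which is the gap the rest of this post addresses.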

The primary issue is that these thresholds are often set arbitrarily and fail to account for two critical factors:

Dynamic Conditions: An "optimal" threshold is a snapshot of a specific threat landscape. As attackers rotate their tactics or market volatility shifts user behavior, these thresholds quickly become suboptimal.
Resource Constraints: The best action for minimizing risk often involves a limited resource (such as human review capacity, specialized system checks, or a complex external verification service) that operates on a strict budget (e.g., a maximum number of transactions per hour).

To solve this, we reframed the decision process as a dynamic optimization problem under constraints.

The Framework: Constrained Policy Optimization

Our approach formalizes the decision-making process into a policy that maps a multi-dimensional state vector to an optimal action while adhering to a strict resource budget.

State and Actions

For each transaction or event, the state is a multi-dimensional value that includes characteristics like the transaction value, a risk score derived from a detection model, and a measure of the current system load (e.g., the rate of transactions currently utilizing the constrained resource).

There are several possible actions the policy can choose from, ranging from allowing a transaction to complete normally to blocking the transaction. Some actions may also be resource-constrained (e.g., a detailed account review that cannot be invoked too often due to budget or capacity limitations).
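
As an illustration, the state and action space could be represented along these lines. This is a sketch only: the field names, the three-action set, and the choice of REVIEW as the constrained action are assumptions, not the production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"    # let the transaction complete normally
    REVIEW = "review"  # hypothetical resource-constrained action
    BLOCK = "block"    # reject the transaction


# Assumption: only REVIEW draws on the limited resource.
CONSTRAINED_ACTIONS = frozenset({Action.REVIEW})


@dataclass(frozen=True)
class State:
    value_usd: float      # transaction value
    risk_score: float     # detection-model output, assumed to lie in [0, 1]
    resource_load: float  # e.g. constrained actions taken in the last hour


def is_constrained(action: Action) -> bool:
    """True if the action consumes the limited resource."""
    return action in CONSTRAINED_ACTIONS
```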

The Loss Function

For each possible action, we model the expected loss. This loss accounts not only for potential fraud and financial risk but also for costs such as:

Direct Loss: Predicted monetary fraud impact.
Indirect Loss: Quantified negative user experience.
Opportunity Cost: Revenue lost from false-positive rejections.
Operational Cost: The literal cost of the intervention.

The goal is to develop a policy such that the actions taken by the policy minimize this total expected loss.
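
To make the idea concrete, here is a toy per-action loss model and the unconstrained choice it implies. Every cost parameter and the functional form are illustrative assumptions; a real loss model would be learned or calibrated rather than hard-coded like this.

```python
def expected_losses(
    value_usd: float,
    risk_score: float,
    friction_cost: float = 2.0,      # assumed cost of a degraded user experience
    review_cost: float = 5.0,        # assumed operational cost of one review
    margin_rate: float = 0.01,       # assumed revenue share lost on a rejection
    review_catch_rate: float = 0.9,  # assumed fraction of fraud a review stops
) -> dict:
    """Toy expected loss per action; risk_score is treated as P(fraud)."""
    p_fraud = risk_score
    p_legit = 1.0 - p_fraud
    return {
        # Direct loss: fraud goes through untouched.
        "allow": p_fraud * value_usd,
        # Operational + indirect cost, plus fraud that slips past the review.
        "review": review_cost + friction_cost
        + p_fraud * (1.0 - review_catch_rate) * value_usd,
        # Opportunity + indirect cost of rejecting a legitimate user.
        "block": p_legit * (margin_rate * value_usd + friction_cost),
    }


def best_action(losses: dict) -> str:
    """Action with the smallest expected loss, ignoring any budget."""
    return min(losses, key=losses.get)
```

Under these made-up parameters, a small low-risk transaction is allowed, a large moderately risky one is sent to review, and a very risky one is blocked. The budget on the review action is not yet taken into account here.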

Proportional-Integral-Derivative (PID) Control for Budget Management

We consider a scenario where the actions include a constrained option (e.g., detailed account review) that is often the most effective at minimizing loss, but it is limited by a budget. Taking this action on every transaction is simply not feasible.

To manage this, we introduce a dynamic budget threshold (h) into the policy calculation. The policy chooses the Constrained Action only if its expected loss is lower than the next-best action's expected loss by at least h.

Crucially, this budget threshold h is not a static constant. Instead, it is a controlled function of the system state, specifically the current utilization of the resource (e.g., the queue size of a detailed account review system).
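
The selection rule with the threshold h can be sketched as follows. The function name choose_action and the dictionary-based interface are hypothetical, and the sketch assumes at least one unconstrained action is always available.

```python
def choose_action(losses: dict, constrained: set, h: float) -> str:
    """Pick a constrained action only if it beats the best unconstrained
    alternative by at least h; otherwise fall back to that alternative."""
    free = {a: loss for a, loss in losses.items() if a not in constrained}
    best_free = min(free, key=free.get)  # assumes `free` is non-empty

    limited = [a for a in losses if a in constrained]
    if limited:
        best_limited = min(limited, key=losses.get)
        # Loss advantage of the constrained action over the alternative.
        advantage = free[best_free] - losses[best_limited]
        if advantage >= h:
            return best_limited
    return best_free
```

For example, with losses {"allow": 250.0, "review": 32.0, "block": 49.4} the review action has an advantage of 17.4 over blocking, so it is chosen when h is 10 but not when h is 20. Raising h therefore makes the policy more selective about spending the constrained resource.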

We use a PID-inspired controller to automatically adjust the budget threshold in real time, so that the system adapts dynamically to fluctuating load.

This dynamic control loop allows the policy to be highly selective when the resource is constrained (high system load) and less selective when the resource is abundant (low system load). This prevents resource queues from becoming unmanageable while ensuring that only the highest-risk, most-critical transactions are sent for the Constrained Action.
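
A textbook discrete PID update gives a feel for how such a controller could drive h. This is a generic sketch, not the controller used in production: the class name, the gains, the target load, and the clamping of h at a minimum value are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThresholdController:
    target_load: float  # desired utilization, e.g. reviews per hour
    kp: float = 1.0     # proportional gain (assumed)
    ki: float = 0.1     # integral gain (assumed)
    kd: float = 0.0     # derivative gain (assumed)
    h_min: float = 0.0  # never demand less than this loss advantage
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, load: float, dt: float = 1.0) -> float:
        """Return the new threshold h given the observed resource load."""
        error = load - self.target_load  # positive when overloaded
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        h = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.h_min, h)
```

With a target of 20 and only the proportional term active, an observed load of 30 yields h = 10, while a load of 10 yields h = 0 after clamping. With a nonzero integral gain, h keeps rising for as long as the overload persists. A practical controller would likely also bound the integral term to limit windup during long idle periods.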

Figure 1: Diagram of the system, where the controller determines a threshold for the constrained resources based on the current state, which the loss model takes into consideration in deciding which action to take for a particular transaction.

High-Level Results and Future Impact

When simulated over historical data, this automated policy has demonstrated significant improvement over the static-threshold baseline, resulting in a considerable reduction in total expected loss.

The key benefit is that the system operates at a near-optimal point by minimizing financial risk while simultaneously maintaining a stable and feasible utilization of constrained resources. This approach is broadly applicable to any system (not just risk and fraud detection) where a high-value action is constrained by a fixed budget or capacity.

Figure 2: Constrained resource usage over time for the proposed strategy versus a baseline strategy that uses a human-tuned thresholding approach to determine which action to take. The baseline policy violates the resource constraint (Action A should be taken at most 20 times per hour) while our policy does not. The data presented here comes from a simulation of an actual use case at Coinbase.

By automating and dynamically optimizing the decision-making process, we move away from human-tuned, static rules toward a sophisticated, adaptive system capable of maximizing loss mitigation under real-world operational constraints.