Caching Authorization Decisions at the API Gateway

Your gateway calls the policy decision point on every request, and at peak the PDP round trip — not the database — is your p99. This page is part of the Policy Enforcement Points in Microservices guide, and it covers how to cache permit/deny verdicts at the enforcement point without letting a revoked permission keep working: what is actually cacheable, how to build a safe cache key, how short the TTL must be, when to invalidate, and where OPA’s partial evaluation beats per-request caching entirely.

Why the latency exists, and why caching is risky

A policy enforcement point (PEP) is on the hot path of every request. When it asks a remote PDP — an OPA sidecar, a Cedar service — for a decision, it pays a network round trip plus evaluation time, typically 1–10ms but spiky under load and during bundle reloads. Multiply that by every north–south request and every east–west hop and the PDP becomes a latency tax on the entire mesh.

Caching the verdict removes that tax for repeated decisions: identical decisions recur constantly (the same subject reading the same resource), so an in-process map lookup eliminates the round trip and a shared Redis lookup replaces it with a cheaper, more predictable one. The danger is staleness. An authorization decision is a point-in-time statement: “given these roles and this policy right now, permit.” The moment an admin revokes a role, deletes a share, or ships a new policy bundle, every cached permit for the affected subject becomes a security hole that stays open until the entry expires, which at worst is one full TTL. Caching authz is therefore a deliberate trade of freshness for latency, and the entire job is bounding that trade.

flowchart LR
    REQ["Request at PEP"]:::client --> LK{"Cache hit\nand fresh?"}:::idp
    LK -->|hit| RET["Return cached\npermit / deny"]:::rs
    LK -->|miss / expired| PDP["PDP\nOPA / Cedar"]:::idp
    PDP --> STO["Store verdict\nkey + short TTL"]:::store
    STO --> RET
    EVT["Permission change\nrole / share / policy"]:::client -.->|publish| INV["Invalidate\nkeys or bump epoch"]:::store
    INV -.-> STO
    classDef client fill:#fff0ee,stroke:#c0392b,stroke-width:2px,color:#1a1614
    classDef idp    fill:#eef0ff,stroke:#2c3e8c,stroke-width:2px,color:#1a1614
    classDef store  fill:#fffbec,stroke:#d4840a,stroke-width:2px,color:#1a1614
    classDef rs     fill:#ebf5fb,stroke:#2980b9,stroke-width:2px,color:#1a1614
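
The lookup path in the diagram can be sketched as a small get-or-fetch wrapper. This is a minimal in-process illustration, not a production cache: `Verdict`, `TtlCache`, `decide`, and `askPdp` are hypothetical names, and the clock is injectable only so that expiry can be exercised without sleeping.

```typescript
type Verdict = "permit" | "deny";

interface Entry {
  verdict: Verdict;
  expiresAt: number; // milliseconds on the injected clock
}

class TtlCache {
  private entries = new Map<string, Entry>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): Verdict | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return hit.verdict;
  }

  set(key: string, verdict: Verdict): void {
    this.entries.set(key, { verdict, expiresAt: this.now() + this.ttlMs });
  }
}

// Return the cached verdict when fresh; otherwise ask the PDP and store the answer.
async function decide(
  cache: TtlCache,
  key: string,
  askPdp: () => Promise<Verdict>,
): Promise<Verdict> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const verdict = await askPdp();
  cache.set(key, verdict);
  return verdict;
}
```

Note that this sketch stores whatever the PDP returns; a PDP error should propagate to the caller rather than being cached as a verdict.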

What is cacheable

Cache a decision only when the verdict is a pure function of inputs the cache key fully captures. In practice:

  • Cacheable: decisions that depend on the subject’s identity and roles, the resource type/id and its owner, the action, and stable context (tenant). These are deterministic for the TTL window.
  • Not cacheable: decisions whose result depends on volatile context not in the key — time-of-day windows, current risk score, rate-limit state, “break-glass” flags, or anything read live from a PIP per request. If you cache these, you cache the wrong answer. Either include the volatile factor in the key (if it is low-cardinality) or skip the cache for that route.

A useful discipline: the PDP can return cache metadata alongside the verdict. If a decision consulted volatile data, have the policy emit cacheable: false and have the PEP honor it.
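
One way the PEP might honor that flag is sketched below. The `PdpResult` shape and the `storeIfCacheable` helper are assumptions for illustration: neither OPA nor Cedar defines a `cacheable` field, so the policy would have to emit it in its own result document. A missing flag is treated as uncacheable so the failure mode is an extra PDP call rather than a stale verdict.

```typescript
// Hypothetical result document emitted by the policy.
interface PdpResult {
  allow: boolean;
  cacheable?: boolean; // absent means "do not cache"
}

// Store the verdict only when the policy explicitly marks it cacheable.
function storeIfCacheable(
  cache: Map<string, boolean>,
  key: string,
  result: PdpResult,
): boolean {
  if (result.cacheable === true) {
    cache.set(key, result.allow);
  }
  return result.allow;
}
```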

Building a safe cache key

The key must contain every input that can change the verdict. The canonical tuple is subject + resource + action + context:

// cache-key.ts
import { createHash } from "node:crypto";

export interface AuthzInput {
  subject: { id: string; roles: string[]; tenant: string };
  resource: { type: string; id?: string };
  action: string;
  context: { region?: string };  // include ONLY decision-relevant, stable context
}

// A monotonic epoch lets us invalidate everything at once (see below).
export function decisionKey(i: AuthzInput, epoch: number): string {
  const canonical = JSON.stringify({
    s: i.subject.id,
    t: i.subject.tenant,
    r: [...i.subject.roles].sort(),     // order-independent
    res: `${i.resource.type}:${i.resource.id ?? "*"}`,
    a: i.action,
    c: i.context.region ?? "",
    e: epoch,
  });
  return createHash("sha256").update(canonical).digest("hex");
}

Two non-obvious rules. Sort roles (and any array) so ["a","b"] and ["b","a"] hash identically. Never put a raw client header in the key — the subject must come from the verified token, or an attacker varies a header to mint a permissive cache entry. The epoch field is a global invalidation lever explained below.
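
The sorting rule can be isolated in a tiny helper. The sketch below assumes a hypothetical `canonicalRoles` function; de-duplicating roles goes one step beyond what `decisionKey` does and is an extra assumption here, useful only if the token can repeat a role.

```typescript
// Sort and de-duplicate so equivalent role sets serialize identically.
function canonicalRoles(roles: readonly string[]): string[] {
  return Array.from(new Set(roles)).sort();
}

const x = JSON.stringify(canonicalRoles(["editor", "admin"]));
const y = JSON.stringify(canonicalRoles(["admin", "editor", "admin"]));
const sameKeyMaterial = x === y; // both serialize as ["admin","editor"]
```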

Short TTLs: the latency-vs-staleness dial

The TTL is the maximum time a revoked permission keeps working through the cache. Pick it by asking “how long can a stale permit survive?” — for most systems that is single-digit seconds, not minutes.

| TTL | Staleness window | When appropriate |
|---|---|---|
| 0 (no cache) | None | Break-glass, financial transfers, anything irreversible |
| 1–5s | Seconds | Default for most authz at a busy gateway |
| 30–60s | Up to a minute | High read volume, low-sensitivity resources, tolerant of slow revocation |
| > 5 min | Minutes | Almost never for authz; only with active invalidation |

Short TTLs alone cap exposure without any invalidation wiring — even if you never publish a revocation event, a 5s TTL means a removed role stops working within 5 seconds. That makes TTL your safety floor and invalidation your fast path.
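
The TTL choices can be encoded as a per-route sensitivity class so the decision is explicit rather than a single global constant. The class names and the specific values below are illustrative assumptions, not recommendations from any library.

```typescript
// Hypothetical sensitivity classes a route could be tagged with.
type Sensitivity = "irreversible" | "standard" | "lowRisk";

// TTL in seconds per class; 0 means "never cache, always call the PDP".
const TTL_SECONDS: Record<Sensitivity, number> = {
  irreversible: 0,
  standard: 5,
  lowRisk: 60,
};

function ttlFor(sensitivity: Sensitivity): number {
  return TTL_SECONDS[sensitivity];
}

function shouldCache(sensitivity: Sensitivity): boolean {
  return ttlFor(sensitivity) > 0;
}
```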

Invalidation on permission change

TTL bounds the worst case; invalidation makes revocation feel instant. Two patterns, often combined:

Targeted invalidation — when a specific grant changes, delete the affected keys. This needs a key index (e.g. a Redis set per subject) because you cannot enumerate hashed keys:

// invalidate.ts  (Redis-backed cache)
import { redis } from "./redis";

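// On write, also index the key under its subject.
// indexTtlS should be the longest decision TTL in use, so the index never expires
// before a key it lists (this matters when permit and deny TTLs differ).
export async function cachePut(
  subjectId: string,
  key: string,
  value: "1" | "0",
  ttlS: number,
  indexTtlS: number = ttlS,
) {
  // Index first: if a later command fails, a dangling index entry is harmless,
  // whereas a stored decision with no index entry could not be invalidated.
  await redis.sadd(`authz:idx:${subjectId}`, key);
  await redis.expire(`authz:idx:${subjectId}`, indexTtlS);
  await redis.set(`authz:dec:${key}`, value, "EX", ttlS);
}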

// When a user's roles/shares change, drop every cached decision for them.
export async function invalidateSubject(subjectId: string) {
  const keys = await redis.smembers(`authz:idx:${subjectId}`);
  if (keys.length) await redis.del(...keys.map((k) => `authz:dec:${k}`));
  await redis.del(`authz:idx:${subjectId}`);
}

Epoch bump (global invalidation) — for a new policy bundle, you cannot enumerate which decisions changed, so increment a global epoch counter. Because epoch is in every key, every existing entry instantly misses and the cache repopulates from the new policy. This is the e: field in decisionKey; read the current epoch from Redis (or push it on bundle activation) and pass it into the key builder.

Wire targeted invalidation to your permission-mutation events (role assigned/removed, share created/revoked) and the epoch bump to PDP bundle activation. Subjects whose grants change get sub-second freshness; everyone else falls back to the TTL floor.
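
The effect of an epoch bump can be seen in a few lines. This sketch uses an in-memory `EpochCounter` as a stand-in for a shared counter (for example a Redis key incremented on bundle activation); all names here are hypothetical, and the key is a plain string rather than a hash to keep the example short.

```typescript
class EpochCounter {
  private value = 0;
  current(): number {
    return this.value;
  }
  bump(): number {
    this.value += 1;
    return this.value;
  }
}

// Prefix the epoch so a bump changes every key at once.
function epochKey(epoch: number, baseKey: string): string {
  return `${epoch}|${baseKey}`;
}

const epochCounter = new EpochCounter();
const decisionCache = new Map<string, boolean>();
const baseKey = "alice|doc:42|read";

decisionCache.set(epochKey(epochCounter.current(), baseKey), true);
const beforeBump = decisionCache.get(epochKey(epochCounter.current(), baseKey)); // hit

epochCounter.bump(); // a new policy bundle was activated
const afterBump = decisionCache.get(epochKey(epochCounter.current(), baseKey)); // miss
```

The old entry is not deleted by the bump; it simply becomes unreachable and lingers until its TTL expires, which is another reason to keep TTLs short.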

Negative caching

Cache deny as well as permit, but treat them asymmetrically. Negative caching protects the PDP from being hammered by a client retrying a forbidden action (or an attacker probing), turning a denial storm into cheap cache hits. Keep the negative TTL short and no longer than the positive TTL: a freshly granted permission should not be blocked for long by a stale deny. When a grant is added, the same invalidateSubject call clears stale denials too, so a new share takes effect immediately rather than waiting out the negative TTL.
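
The asymmetry can be enforced in code rather than left to configuration discipline. In this sketch, `TtlConfig` and `ttlForVerdict` are hypothetical names; the clamp guarantees a deny never outlives a permit even if the two settings are misconfigured.

```typescript
interface TtlConfig {
  permitTtlS: number;
  denyTtlS: number;
}

// Pick the TTL for a verdict, clamping the deny TTL to the permit TTL.
function ttlForVerdict(allow: boolean, cfg: TtlConfig): number {
  return allow ? cfg.permitTtlS : Math.min(cfg.denyTtlS, cfg.permitTtlS);
}
```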

Per-request caching vs OPA partial evaluation

Per-request caching memoizes answers. OPA’s partial evaluation instead specializes the policy: you ask OPA to compile the policy against the parts of input that are known and constant (the policy bundle, role-to-action maps, tenant rules), and it returns a residual program — often a simple set of conditions — that the gateway can evaluate locally with zero PDP calls.

// partial-eval.ts  — compile once, evaluate many locally
const res = await fetch("http://127.0.0.1:8181/v1/compile", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    query: "data.authz.allow == true",
    input: { action: "read", resource: { type: "doc" } }, // known now
    unknowns: ["input.subject", "input.resource.id"],       // resolved per request
  }),
});
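if (!res.ok) throw new Error(`OPA compile failed: HTTP ${res.status}`);
const { result } = await res.json();
// result.queries -> residual conditions (e.g. "subject.tenant == resource.tenant")
// cache the COMPILED residual, re-run it per request against live subject/resource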

Partial evaluation shines when the policy is stable but the data varies per request: you cache the compiled residual (invalidated on bundle epoch bump) and never pay a per-decision round trip, while still evaluating against fresh subject and resource values. It avoids the staleness problem of answer-caching for the data dimension, because only the policy is frozen, not the verdict. Many production OPA deployments combine both: partial evaluation to push policy to the edge, plus a short-TTL answer cache for the truly hot exact-match decisions.
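
Caching the compiled residual per epoch could look like the sketch below. `ResidualCache` is a hypothetical helper, and `compile` is a caller-supplied function that would wrap the compile request; the residual type `R` is left generic because how the gateway represents and evaluates the residual is outside this example.

```typescript
class ResidualCache<R> {
  private entries = new Map<string, Promise<R>>();
  private compile: (action: string, resourceType: string) => Promise<R>;

  constructor(compile: (action: string, resourceType: string) => Promise<R>) {
    this.compile = compile;
  }

  // One compilation per (epoch, action, resource type); concurrent callers share it.
  get(epoch: number, action: string, resourceType: string): Promise<R> {
    const key = JSON.stringify([epoch, action, resourceType]);
    let pending = this.entries.get(key);
    if (!pending) {
      pending = this.compile(action, resourceType);
      this.entries.set(key, pending);
      // Drop failed compilations so the next request retries instead of
      // reusing a rejected promise.
      pending.catch(() => this.entries.delete(key));
    }
    return pending;
  }
}
```

Storing the promise rather than the resolved value means a burst of requests after an epoch bump triggers a single compilation instead of one per request.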

Prevention & monitoring hooks

  • Emit cache metrics: hit ratio, negative-hit ratio, and decision-cache size. A hit ratio near 100% with a long TTL is a staleness risk, not a win.
  • Log every invalidation (subject targeted, or epoch bumped to N) so you can prove how fast a revocation propagated during an incident review.
  • Alert on bundle activation without an epoch bump — that means new policy is live but stale permits are still served.
  • Track time-to-revoke: from “role removed” event to “first cache miss for that subject.” It should be sub-second with targeted invalidation, and never exceed the TTL.
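
The metrics in the list above can be tracked with a few counters. This is a sketch with hypothetical names; in particular, defining the negative-hit ratio as the share of all lookups answered by a cached deny is an assumption, and a real deployment would export these through its existing metrics library.

```typescript
class CacheMetrics {
  private hits = 0;
  private negativeHits = 0;
  private misses = 0;

  recordHit(allow: boolean): void {
    this.hits += 1;
    if (!allow) this.negativeHits += 1;
  }

  recordMiss(): void {
    this.misses += 1;
  }

  // Share of lookups answered from the cache.
  hitRatio(): number {
    const lookups = this.hits + this.misses;
    return lookups === 0 ? 0 : this.hits / lookups;
  }

  // Share of lookups answered by a cached deny.
  negativeHitRatio(): number {
    const lookups = this.hits + this.misses;
    return lookups === 0 ? 0 : this.negativeHits / lookups;
  }
}

// Time-to-revoke: from the "role removed" event to the first cache miss for that subject.
function timeToRevokeMs(roleRemovedAtMs: number, firstMissAtMs: number): number {
  return firstMissAtMs - roleRemovedAtMs;
}

// The TTL is the hard ceiling; anything above it indicates a bug.
function withinTtlBound(elapsedMs: number, ttlMs: number): boolean {
  return elapsedMs <= ttlMs;
}
```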

Frequently Asked Questions

Should I cache decisions in the gateway process or in Redis?

Both, layered. An in-process LRU gives nanosecond hits but is per-instance, so each gateway replica caches independently and an invalidation must fan out to all of them. A shared Redis cache invalidates once for the whole fleet but adds a network hop. A common setup is a very short in-process tier (1–2s) in front of a Redis tier with targeted invalidation, accepting that the in-process layer can be up to its tiny TTL stale even after a Redis invalidation.
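
A two-tier read path might be sketched like this. `SharedCache` is a hypothetical interface standing in for the Redis tier, and `LayeredCache` is an illustrative name; the clock is injectable so the staleness window of the in-process tier can be demonstrated.

```typescript
// Stand-in for the shared tier (for example a Redis-backed lookup).
interface SharedCache {
  get(key: string): Promise<boolean | undefined>;
}

class LayeredCache {
  private l1 = new Map<string, { allow: boolean; expiresAt: number }>();
  private l2: SharedCache;
  private l1TtlMs: number;
  private now: () => number;

  constructor(l2: SharedCache, l1TtlMs: number, now: () => number = Date.now) {
    this.l2 = l2;
    this.l1TtlMs = l1TtlMs;
    this.now = now;
  }

  async get(key: string): Promise<boolean | undefined> {
    const local = this.l1.get(key);
    if (local && local.expiresAt > this.now()) return local.allow;
    this.l1.delete(key); // absent or expired locally

    const shared = await this.l2.get(key);
    if (shared !== undefined) {
      // Copy into the in-process tier with its own, much shorter TTL.
      this.l1.set(key, { allow: shared, expiresAt: this.now() + this.l1TtlMs });
    }
    return shared; // undefined means: ask the PDP
  }
}
```

After the shared entry is invalidated, a replica that already copied it keeps serving the verdict until its in-process TTL runs out, which is exactly the bounded staleness described above.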

How do I cache when the decision depends on time-of-day or risk score?

Don’t answer-cache it. Volatile context that isn’t in the key produces wrong verdicts. Either include the factor in the key only if it’s low-cardinality and quantized (e.g. a businessHours: true/false flag rather than a raw timestamp), or mark the route uncacheable and use partial evaluation so the residual program re-reads the volatile value per request while still skipping the policy round trip.
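
Quantizing a volatile factor into a low-cardinality flag could look like the sketch below. The window (09:00 to 17:00 UTC, Monday to Friday) and the `businessHours` helper are assumptions for illustration; the real window must match whatever the policy itself checks.

```typescript
// Collapse a timestamp into a two-valued flag that is safe to put in the cache key.
function businessHours(at: Date): boolean {
  const day = at.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = at.getUTCHours();
  return day >= 1 && day <= 5 && hour >= 9 && hour < 17;
}
```

Because the flag is part of the key, a verdict cached just before the window closes is not reused afterwards: the key changes when the flag flips, so the next request misses and is re-evaluated.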

Does the cache key need the full token or just the subject id?

Just the decision-relevant claims: subject id, roles, and tenant. Never key on the raw JWT string — two valid tokens for the same user (after a refresh) would miss the cache needlessly, and including a client-controlled value risks cache poisoning. Always derive these fields from the verified token (validated with an explicit algorithm allowlist such as ["RS256"]), never from a request header or body.

What TTL is safe for a permission revocation requirement?

Set the TTL to your maximum tolerable revocation delay and add targeted invalidation for the fast path. If policy says a removed user must lose access “immediately,” a TTL of at most 2s plus event-driven invalidation gets you sub-second revocation in practice, with a hard 2s ceiling if an invalidation event is ever missed. For irreversible actions (payments, deletions), bypass the cache entirely and call the PDP fresh.

Is negative caching safe, or will it lock out newly-granted users?

It’s safe if the negative TTL is short and your grant events trigger invalidation. Cache deny for a second or two to absorb retry/probe storms, and call invalidateSubject whenever you add a role or share so the stale deny is cleared the instant access is granted. Never give a deny a longer TTL than a permit.