Notes

Short technical thoughts on reliability engineering and backend design decisions.


Why idempotency matters

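Retries are normal in real networks: clients, SDKs, and gateways resend requests after timeouts, even when the first attempt actually succeeded. If a create endpoint is not idempotent, users pay with duplicates, such as a second order or a double charge.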
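A minimal sketch of an idempotency-key check, assuming the client sends a unique key with each logical create request. The class name, the in-memory dict (standing in for a durable key store with a TTL), and the `orders` list (standing in for a table) are all hypothetical.

```python
import threading
import uuid


class IdempotentCreateHandler:
    """Replays the stored response when a client retries with the same key."""

    def __init__(self):
        self._responses = {}           # idempotency key -> original response
        self._lock = threading.Lock()  # makes check-and-create atomic in-process
        self.orders = []               # stand-in for the orders table

    def create_order(self, idempotency_key, payload):
        with self._lock:
            if idempotency_key in self._responses:
                # Retry of a request we already handled: no new order.
                return self._responses[idempotency_key]
            order = {"id": str(uuid.uuid4()), **payload}
            self.orders.append(order)
            self._responses[idempotency_key] = order
            return order
```

A retry with the same key returns the first response unchanged, so a timeout on the client side never turns into a second order. Across several service instances, the in-process lock would have to be replaced by something shared, such as a unique constraint on the key column.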

DB constraints are still required

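Application locks reduce contention, but database constraints are still the final safety net for correctness. A lock can expire mid-operation or be skipped by another code path; a UNIQUE or foreign-key constraint is checked on every write, whichever code path issued it.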
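A small sketch using Python's built-in `sqlite3`, assuming a hypothetical `payments` table where each order may be paid at most once. The UNIQUE constraint rejects the duplicate even though no application lock is involved at all.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # The database enforces one payment per order on every insert.
    conn.execute(
        "CREATE TABLE payments ("
        " id INTEGER PRIMARY KEY,"
        " order_id TEXT NOT NULL UNIQUE,"
        " amount INTEGER NOT NULL)"
    )
    return conn


def record_payment(conn, order_id, amount):
    """Return True if inserted, False if this order already has a payment."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
                (order_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # A concurrent or repeated writer got there first; the constraint caught it.
        return False
```

The application can still take a lock to avoid wasted work, but correctness no longer depends on that lock being held.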

Fail-open vs fail-closed is a design choice

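When a dependency is unavailable, fail-open lets the request proceed and fail-closed rejects it. For cache reads, fail-open can preserve availability. For critical keyspaces, such as locks, quotas, or idempotency keys, fail-closed protects integrity.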
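One way to keep this choice explicit is to pass the policy at each call site. The sketch below assumes a check function that raises Python's built-in `ConnectionError` when its backing store is down; `guarded_check`, `FailureMode`, and `DependencyUnavailable` are hypothetical names.

```python
from enum import Enum


class FailureMode(Enum):
    FAIL_OPEN = "open"      # dependency down: let the request through
    FAIL_CLOSED = "closed"  # dependency down: reject the request


class DependencyUnavailable(Exception):
    """Raised when a fail-closed check cannot reach its backing store."""


def guarded_check(check, mode):
    """Run `check`, applying an explicit policy if its backing store is unreachable.

    `check` is a zero-argument callable that returns True (allow) or False (deny)
    and raises ConnectionError when the store cannot be reached.
    """
    try:
        return check()
    except ConnectionError as exc:
        if mode is FailureMode.FAIL_OPEN:
            return True  # favor availability, e.g. a cache-backed read path
        raise DependencyUnavailable("backing store unreachable") from exc
```

A healthy store's answer is always respected. The policy only decides what happens when there is no answer.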

Webhook endpoints are trust boundaries

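Public callbacks must verify the signature, source, and freshness of each request, and check its replay key (such as an event ID) so a captured request cannot be processed twice, all before any state transition.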
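A sketch of the four checks in order, assuming a hypothetical signing scheme: the sender computes HMAC-SHA256 over `"{timestamp}.{event_id}."` followed by the raw body, using a shared secret. Real providers each define their own scheme and headers. The in-memory set of seen event IDs stands in for a shared store with a TTL.

```python
import hashlib
import hmac
import time


class WebhookVerifier:
    def __init__(self, secret, allowed_sources, max_age_seconds=300):
        self._secret = secret                      # shared secret (bytes)
        self._allowed_sources = set(allowed_sources)
        self._max_age = max_age_seconds
        self._seen_event_ids = set()               # stand-in for a TTL'd store

    def verify(self, body, signature_hex, timestamp, event_id, source_ip, now=None):
        now = time.time() if now is None else now
        # 1. Source: only accept callbacks from known sender addresses.
        if source_ip not in self._allowed_sources:
            return False
        # 2. Freshness: reject stale or future-dated deliveries.
        if abs(now - timestamp) > self._max_age:
            return False
        # 3. Signature: HMAC over timestamp, event ID, and body, compared in constant time.
        signed = f"{timestamp}.{event_id}.".encode() + body
        expected = hmac.new(self._secret, signed, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature_hex):
            return False
        # 4. Replay: each authenticated event ID is accepted at most once.
        if event_id in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event_id)
        return True
```

Putting the event ID inside the signed material means an attacker cannot re-send a captured body under a fresh ID. The replay check runs last so unauthenticated requests never consume an event ID.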

Redis failure strategy

A cache outage should degrade performance, not correctness. Every read path needs a reliable fallback to the source of truth.
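A cache-aside sketch in which cache errors are swallowed and the database loader is always available. `InMemoryCache` is a hypothetical stand-in that raises Python's built-in `ConnectionError` while "down". A real Redis client raises its own exception types, and those are what production code would catch.

```python
class InMemoryCache:
    """Stand-in for a Redis client; set available=False to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value, ttl_seconds):
        if not self.available:
            raise ConnectionError("cache unavailable")
        self.data[key] = value  # TTL is ignored in this stand-in


class CacheAsideReader:
    """The cache is an optimization; `load_from_db` stays the source of truth."""

    def __init__(self, cache, load_from_db, ttl_seconds=60):
        self._cache = cache
        self._load = load_from_db
        self._ttl = ttl_seconds

    def get(self, key):
        try:
            cached = self._cache.get(key)
            if cached is not None:
                return cached
        except ConnectionError:
            pass  # cache down: slower, but still correct
        value = self._load(key)
        try:
            self._cache.set(key, value, self._ttl)
        except ConnectionError:
            pass  # a failed cache write must not fail the read
        return value
```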

Versioned cache invalidation scales better

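Version-bump invalidation embeds a namespace version in each cache key, so incrementing that one counter makes every old entry unreachable at once. This avoids expensive wildcard deletes, lets stale entries age out through their TTLs, and reduces cache-stampede pressure.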
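A sketch of the key layout, using a plain dict in place of Redis. The `{namespace}:v{version}:{key}` format and the class name are hypothetical. In Redis the bump would be a single atomic `INCR` on the version key, with a TTL on each data key so orphaned entries expire on their own.

```python
class VersionedCache:
    """Namespace-version invalidation over a simple key-value store."""

    def __init__(self, store):
        self._store = store  # dict standing in for Redis

    def _version(self, namespace):
        return self._store.get(f"{namespace}:version", 0)

    def _key(self, namespace, key):
        # The current namespace version is part of every data key.
        return f"{namespace}:v{self._version(namespace)}:{key}"

    def get(self, namespace, key):
        return self._store.get(self._key(namespace, key))

    def set(self, namespace, key, value):
        self._store[self._key(namespace, key)] = value

    def invalidate(self, namespace):
        # One write instead of scanning and deleting every matching key.
        self._store[f"{namespace}:version"] = self._version(namespace) + 1
```

After `invalidate("products")`, reads miss and repopulate under the new version. The old entries are simply no longer addressed.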

Rate limiting mistakes

Rate limits that are too coarse block healthy traffic; limits that are too weak let request storms bypass safeguards.
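A token-bucket sketch keyed per client rather than globally, so one noisy client drains only its own bucket. Capacity sets the tolerated burst and the refill rate caps sustained throughput. The class name and parameters are illustrative, and this version is single-process and not thread-safe.

```python
import time


class TokenBucketLimiter:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._buckets = {}  # client key -> (tokens, last_refill_time)

    def allow(self, client_key):
        now = self._clock()
        tokens, last = self._buckets.get(client_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1:
            self._buckets[client_key] = (tokens - 1, now)
            return True
        self._buckets[client_key] = (tokens, now)
        return False
```

Too small a capacity rejects normal bursts (too coarse). Too large a capacity or refill rate lets a storm through (too weak). The bucket makes both knobs explicit.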

@Transactional pitfalls

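Incorrect transaction boundaries can commit partial state. Rollback conditions must match real failure cases: by default, Spring's @Transactional rolls back only on unchecked exceptions (RuntimeException and Error), so a checked exception leaves earlier writes committed unless rollbackFor covers it.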
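The following is a language-neutral illustration, not Spring code. It uses Python's `sqlite3`, whose connection context manager rolls back on any exception. Both updates of a hypothetical transfer sit inside one boundary, so a failure after the debit leaves no partial state. `TransferError` plays the role of a business failure that must still trigger rollback.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    return conn


class TransferError(Exception):
    """A business failure; it must still roll back the debit."""


def transfer(conn, src, dst, amount, fail_midway=False):
    # Debit and credit share one transaction: all or nothing.
    with conn:  # commit on normal exit, rollback on any exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise TransferError("simulated failure after the debit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the boundary excluded `TransferError`, as Spring's default does for checked exceptions, the debit would be committed without its credit.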

Queue backpressure is part of reliability

Bounded queues, overflow buffers, and adaptive batch sizing keep ingestion stable during traffic spikes.
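A single-threaded sketch combining all three ideas. The primary queue is bounded. A bounded overflow buffer absorbs spikes. When both are full, `offer` returns False so producers must back off. Batch size doubles under a large backlog and halves when it drains. All names, limits, and the doubling/halving rule are hypothetical; a production overflow buffer would usually be durable rather than in memory.

```python
from collections import deque


class IngestionBuffer:
    def __init__(self, capacity, overflow_capacity, min_batch=10, max_batch=500):
        self._queue = deque()
        self._overflow = deque()
        self.capacity = capacity
        self.overflow_capacity = overflow_capacity
        self.min_batch = min_batch
        self.max_batch = max_batch
        self.batch_size = min_batch

    def offer(self, item):
        """Return False when both buffers are full: the caller must back off."""
        if len(self._queue) < self.capacity:
            self._queue.append(item)
        elif len(self._overflow) < self.overflow_capacity:
            self._overflow.append(item)
        else:
            return False
        return True

    def next_batch(self):
        # Grow batches under heavy backlog, shrink them when traffic is light.
        backlog = len(self._queue) + len(self._overflow)
        if backlog > 2 * self.batch_size:
            self.batch_size = min(self.max_batch, self.batch_size * 2)
        elif backlog < self.batch_size // 2:
            self.batch_size = max(self.min_batch, self.batch_size // 2)
        batch = []
        while len(batch) < self.batch_size and (self._queue or self._overflow):
            source = self._queue if self._queue else self._overflow
            batch.append(source.popleft())
        # Move overflow items forward so arrival order is preserved.
        while self._overflow and len(self._queue) < self.capacity:
            self._queue.append(self._overflow.popleft())
        return batch
```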

Atomic quota reservation prevents race bugs

Atomically reserve quota before a bulk send, then release the unused amount afterward, so concurrent senders cannot oversubscribe the same quota.
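A reserve-then-settle sketch. A lock makes the check-and-reserve step atomic within one process. Across instances, the same step would need an atomic operation in the shared store, for example a Redis Lua script or a conditional SQL `UPDATE ... WHERE used + reserved + ? <= limit`. That part is assumed, not shown. `QuotaLedger` and its fields are hypothetical.

```python
import threading


class QuotaLedger:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0      # confirmed consumption
        self.reserved = 0  # held by in-flight bulk sends
        self._lock = threading.Lock()

    def reserve(self, amount):
        with self._lock:
            if self.used + self.reserved + amount > self.limit:
                return False  # would oversubscribe: reject before sending anything
            self.reserved += amount
            return True

    def settle(self, reserved_amount, actually_sent):
        """Convert the used part of a reservation into consumption; release the rest."""
        if actually_sent > reserved_amount:
            raise ValueError("sent more than was reserved")
        with self._lock:
            self.reserved -= reserved_amount
            self.used += actually_sent
```

Because the check and the increment happen under one lock, ten concurrent requests for 30 units against a limit of 100 yield exactly three successful reservations.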

Restart-safe state is non-negotiable

Long-running services should persist critical runtime state so that recovery after a crash is predictable.
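A sketch of crash-safe checkpointing to a local JSON file. The state is written to a temporary file in the same directory, fsynced, and then swapped in with `os.replace`, so a crash leaves either the old checkpoint or the new one, never a torn file. The file layout and function names are hypothetical. Full durability of the rename on POSIX would also require fsyncing the directory, which is omitted here.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Atomically replace the checkpoint at `path` with `state`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the swap
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path, default):
    """On startup, resume from the last checkpoint or fall back to a known default."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```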

Gateway trust boundaries must be explicit

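If clients can set internal identity headers, microservice auth breaks, because any caller can claim any identity. The gateway must strip client-supplied copies of trusted headers and rewrite them from the verified identity; nothing else should set them.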
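A sketch of the gateway-side rewrite. The header names `X-User-Id` and `X-User-Roles` and the shape of `verified_identity` (the result of token validation, or None for anonymous requests) are hypothetical. Because HTTP header names are case-insensitive, the strip step compares lowercased names.

```python
TRUSTED_HEADERS = {"x-user-id", "x-user-roles"}  # hypothetical internal headers


def rewrite_headers(incoming, verified_identity):
    """Drop client-supplied trusted headers, then set them from the verified identity."""
    outgoing = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in TRUSTED_HEADERS  # strip any spoofed copies
    }
    if verified_identity is not None:
        outgoing["X-User-Id"] = verified_identity["user_id"]
        outgoing["X-User-Roles"] = ",".join(verified_identity["roles"])
    return outgoing
```

A client sending `X-User-Id: admin` gets that header removed, and downstream services only ever see values the gateway derived itself.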

Route-aware rate limiting is safer than one global bucket

Registration and user-order APIs have different risk and traffic profiles; separate policies prevent both abuse gaps and accidental throttling.
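A sketch of per-route policies using fixed-window counters keyed by route, client, and window. The routes and numbers in `ROUTE_POLICIES` are hypothetical and only show the shape: a strict limit for abuse-prone registration, a generous one for routine order traffic. This in-memory version never evicts old windows. A shared store with expiring counters (for example Redis `INCR` plus `EXPIRE`) would be the usual production choice.

```python
import time

# Hypothetical policies: route -> (max requests, window in seconds).
ROUTE_POLICIES = {
    "/register": (5, 3600),  # abuse-prone, low legitimate volume: strict
    "/orders": (120, 60),    # normal user traffic: generous
}
DEFAULT_POLICY = (60, 60)


class RouteAwareLimiter:
    def __init__(self, policies, default, clock=time.time):
        self._policies = policies
        self._default = default
        self._clock = clock
        self._counts = {}  # (route, client, window index) -> request count

    def allow(self, route, client_key):
        limit, window = self._policies.get(route, self._default)
        window_index = int(self._clock() // window)
        key = (route, client_key, window_index)
        count = self._counts.get(key, 0)
        if count >= limit:
            return False
        self._counts[key] = count + 1
        return True
```

Exhausting the registration limit has no effect on the same client's order requests, which is what a single global bucket cannot offer.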

Circuit breakers need clear fallback semantics

When retries are exhausted or a circuit breaker is open, the service should return a deterministic degraded response so upstream clients get predictable behavior.
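A minimal closed/open/half-open breaker whose every failure path returns the same degraded shape. The class, its thresholds, and the `fallback` callable are hypothetical. This single-threaded sketch lets one trial call through after the reset timeout. With concurrent callers, several trials could pass at once unless that step is guarded.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold, reset_timeout, fallback, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.fallback = fallback  # zero-arg callable returning the degraded response
        self._clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self._clock() - self.opened_at < self.reset_timeout:
                return self.fallback()  # open: skip the dependency entirely
            # Otherwise half-open: allow this trial call through.
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self._clock()  # open, or re-open after a failed trial
            return self.fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Callers receive either a real result or, for example, `{"status": "degraded", "items": []}`, never a mix of timeouts, stack traces, and partial data.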