C&A Cricket Net Reservation API

A reliability-focused booking backend designed to remain consistent under concurrent requests, retries, and partial failures using idempotency keys and state-based reservations instead of traditional CRUD transactions.

Java 21, Spring Boot 3.4, PostgreSQL 17, Redis + Redisson, Spring Security (JWT/OAuth2), Resilience4j, Micrometer + Actuator, Prometheus, Docker Compose, Nginx, Swagger/OpenAPI, 2025

What I did

  • Built a production-grade reservation backend for booking physical cricket net slots, including payment workflows.
  • Stack: Java 21 with Spring Boot, PostgreSQL 17, and Redis/Redisson, self-hosted with Docker Compose behind Nginx.
  • Designed for retry-heavy mobile networks and asynchronous payment callbacks, with consistency prioritized over throughput.
  • Security layer includes JWT auth, RBAC, refresh-session rotation, and CSRF/CORS hardening.
  • External integrations include payment gateway callbacks, SMTP email, and Notify.lk SMS notifications.
  • Observability pipeline includes Spring Boot Actuator and Prometheus-ready metrics.
  • Live API docs: https://api.rumalg.me/swagger-ui/index.html. Runtime health: https://api.rumalg.me/actuator/health.
  • Single-engineer system with modular ownership across booking, payment, notifications, security, and admin integrity tooling.

Case Study

System Scope

  • Backend service for booking physical cricket practice nets with payment, authentication, and booking lifecycle management.
  • Primary objective: guarantee data correctness under concurrency, retries, and external callback uncertainty.
  • Designed and deployed as a continuously running service, not a demo-only CRUD application.

Failure Modes Addressed

  • Duplicate booking attempts from double taps, retries, refreshes, and callback replays.
  • Slot contention where two users attempt to book the same net and timeslot simultaneously.
  • Partial-failure paths, such as a successful payment followed by an API timeout, or a server restart during active requests.
  • Dependency outages where Redis is unavailable but correctness must remain intact.

Correctness Model

  • Mandatory idempotency keys turn request retries into replays of the original intent instead of duplicate writes.
  • Layered write protection combines scoped distributed locks, pessimistic DB reads, and PostgreSQL overlap exclusion constraints.
  • Transactions are intentionally narrow and wrap only state-mutation boundaries to avoid long-held locks.
  • The booking lifecycle is modeled as state transitions (PENDING, CONFIRMED, EXPIRED) so that it survives asynchronous flows.
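A minimal sketch of the first and last points above, under stated assumptions: an in-memory map stands in for a unique-keyed database table, and every class and method name here (`BookingState.canTransitionTo`, `Booking`, `IdempotentBookingService`) is hypothetical rather than taken from the project. It only illustrates how a retry with the same idempotency key can replay the earlier result, and how a state check can reject a late or repeated transition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; not the project's actual classes.
enum BookingState {
    PENDING, CONFIRMED, EXPIRED;

    // Assumption for this sketch: only PENDING may move on; the other two are terminal.
    boolean canTransitionTo(BookingState next) {
        return this == PENDING && (next == CONFIRMED || next == EXPIRED);
    }
}

final class Booking {
    final long id;
    final String slotKey;
    private BookingState state = BookingState.PENDING;

    Booking(long id, String slotKey) {
        this.id = id;
        this.slotKey = slotKey;
    }

    synchronized BookingState state() {
        return state;
    }

    // Returns true if the transition was applied, false if it was rejected
    // (for example a payment confirmation that arrives after expiry).
    synchronized boolean transitionTo(BookingState next) {
        if (!state.canTransitionTo(next)) {
            return false;
        }
        state = next;
        return true;
    }
}

final class IdempotentBookingService {
    // Stand-in for a table with a unique constraint on the idempotency key.
    private final Map<String, Booking> byIdempotencyKey = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // A retry carrying the same key gets the original booking back; no second write happens.
    Booking reserve(String idempotencyKey, String slotKey) {
        return byIdempotencyKey.computeIfAbsent(
                idempotencyKey, key -> new Booking(ids.incrementAndGet(), slotKey));
    }
}

class IdempotencyDemo {
    public static void main(String[] args) {
        IdempotentBookingService service = new IdempotentBookingService();
        Booking first = service.reserve("key-123", "net-1/2025-06-01T10:00");
        Booking retry = service.reserve("key-123", "net-1/2025-06-01T10:00");
        System.out.println(first == retry);                             // true: replayed, not duplicated
        System.out.println(first.transitionTo(BookingState.CONFIRMED)); // true
        System.out.println(first.transitionTo(BookingState.EXPIRED));   // false: CONFIRMED is terminal here
    }
}
```

In a real deployment the map would be replaced by a persistent store, and the overlap guarantee itself would still rest on the database constraint rather than on application code.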

Payment and Security Boundary

  • Public payment callbacks are verified using HMAC signature checks, CIDR allowlist validation, timestamp freshness checks, and replay-dedupe keys.
  • Payment creation and confirmation use layered idempotency, with unique-key collision recovery that returns the existing state safely.
  • Refresh-session replay risk is reduced through hashed token storage, rotation on use, and revocation of the whole session family.
  • Role-aware rate limiting uses IP-based keys for public routes and user-based keys for authenticated APIs.
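The callback checks in the first point could look roughly like the sketch below. It is an illustration under assumptions, not the gateway's actual protocol: the signed payload layout (`timestamp.eventId.body`), the hex encoding, the HmacSHA256 algorithm choice, and the `CallbackVerifier` class are all hypothetical, the dedupe set is in memory, and the CIDR allowlist step is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.DateTimeException;
import java.time.Duration;
import java.time.Instant;
import java.util.HexFormat;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical verifier; payload layout and header semantics are assumptions.
final class CallbackVerifier {
    private final byte[] secret;
    private final Duration maxAge;
    // Stand-in for a shared replay-dedupe store.
    private final Set<String> seenEventIds = ConcurrentHashMap.newKeySet();

    CallbackVerifier(byte[] secret, Duration maxAge) {
        this.secret = secret.clone();
        this.maxAge = maxAge;
    }

    // Computes a hex-encoded HMAC-SHA256 over "timestamp.eventId.body".
    static String sign(byte[] secret, String timestamp, String eventId, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            String payload = timestamp + "." + eventId + "." + body;
            return HexFormat.of().formatHex(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Accepts a callback only if it is fresh, correctly signed, and not seen before.
    boolean accept(String timestamp, String eventId, String body, String signatureHex, Instant now) {
        if (timestamp == null || eventId == null || body == null || signatureHex == null) {
            return false;
        }
        Instant sentAt;
        try {
            sentAt = Instant.ofEpochSecond(Long.parseLong(timestamp));
        } catch (NumberFormatException | DateTimeException e) {
            return false;
        }
        if (Duration.between(sentAt, now).abs().compareTo(maxAge) > 0) {
            return false; // stale or far-future timestamp
        }
        String expected = sign(secret, timestamp, eventId, body);
        // Constant-time comparison avoids leaking how many leading characters matched.
        boolean signatureOk = MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                signatureHex.getBytes(StandardCharsets.UTF_8));
        if (!signatureOk) {
            return false;
        }
        // Record the event id only after the signature passes, so unsigned
        // requests cannot poison the dedupe set. add() returns false on replay.
        return seenEventIds.add(eventId);
    }
}

class CallbackVerifierDemo {
    public static void main(String[] args) {
        byte[] key = "demo-secret".getBytes(StandardCharsets.UTF_8);
        CallbackVerifier verifier = new CallbackVerifier(key, Duration.ofMinutes(5));
        Instant now = Instant.ofEpochSecond(1_700_000_000L);
        String sig = CallbackVerifier.sign(key, "1700000000", "evt-1", "{\"status\":\"PAID\"}");
        System.out.println(verifier.accept("1700000000", "evt-1", "{\"status\":\"PAID\"}", sig, now)); // true
        System.out.println(verifier.accept("1700000000", "evt-1", "{\"status\":\"PAID\"}", sig, now)); // false: replay
    }
}
```

Ordering the checks as freshness, then signature, then dedupe keeps the cheap rejections first and means only authenticated events ever consume dedupe storage.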

Resilience and Recovery

  • Redis is treated as a performance accelerator, with degraded-mode fallbacks and fail-closed handling for critical keyspaces.
  • Scoped cache-version invalidation and after-commit invalidation hooks shorten the window in which stale availability data can be served.
  • Notification delivery is hardened with retry-queue processing, dead-letter handling, and scheduled recovery jobs.
  • Audit and integrity services provide before/after traceability, plus restore and validation workflows for destructive admin operations.
  • Recurring booking generation runs under lock-protected scheduler paths with duplicate and conflict checks.
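The split between degraded-mode fallbacks and fail-closed handling in the first point can be sketched as two small wrappers. This is an assumption-laden illustration: `DegradedModeGuard`, `readThrough`, and `allowIfVerified` are hypothetical names, and plain `Supplier` lambdas stand in for the Redis and database calls.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper; suppliers stand in for Redis and database access.
final class DegradedModeGuard {
    private DegradedModeGuard() {}

    // Fail-open: if the cache is down or misses, fall through to the source of truth.
    // The result is slower but still correct.
    static <T> T readThrough(Supplier<Optional<T>> cacheRead, Supplier<T> databaseRead) {
        try {
            Optional<T> cached = cacheRead.get();
            if (cached.isPresent()) {
                return cached.get();
            }
        } catch (RuntimeException cacheOutage) {
            // Degraded mode: ignore the cache failure and read from the database.
        }
        return databaseRead.get();
    }

    // Fail-closed: if a critical check cannot be evaluated, deny the operation.
    static boolean allowIfVerified(Supplier<Boolean> criticalCheck) {
        try {
            return Boolean.TRUE.equals(criticalCheck.get());
        } catch (RuntimeException storeOutage) {
            return false;
        }
    }
}

class DegradedModeDemo {
    public static void main(String[] args) {
        String slots = DegradedModeGuard.<String>readThrough(
                () -> { throw new IllegalStateException("redis unavailable"); },
                () -> "slots-from-postgres");
        System.out.println(slots); // slots-from-postgres

        boolean allowed = DegradedModeGuard.allowIfVerified(
                () -> { throw new IllegalStateException("redis unavailable"); });
        System.out.println(allowed); // false
    }
}
```

Which keyspaces count as critical is a policy decision; the sketch only shows that the two failure behaviours differ in what they return when the dependency throws.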

Platform and Runtime Engineering

  • Spring Security pipeline includes a JWT filter, RBAC boundaries, refresh-session replay defense, and secure cookie/session controls.
  • Anti-abuse controls combine global sharded limiting, DoS threshold limiting, per-endpoint policies, and Retry-After-style throttling semantics.
  • Resilience stack combines Redisson-backed coordination, fail-open cache handling where safe, and fail-closed behavior for critical keyspaces.
  • Operational observability includes Actuator health, Prometheus metrics export, and scheduler-driven maintenance and recovery workflows.
  • Backend remains modular with clear domain boundaries (booking, payment, notifications, user, net, timeslot, security, integrity).
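As a rough illustration of the throttling semantics above, the sketch below uses a single-node fixed-window counter that returns the number of seconds a client should wait, which is the kind of value a Retry-After header would carry. The `FixedWindowLimiter` class, the window algorithm, and the `user:` / `ip:` key prefixes are assumptions for the example; the sharded and DoS-threshold layers are not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-node limiter; a distributed store would replace the map.
final class FixedWindowLimiter {
    private record Window(long startMillis, int count) {}

    private final int limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // User-based key for authenticated calls, IP-based key for public routes.
    static String keyFor(String userId, String clientIp) {
        return userId != null ? "user:" + userId : "ip:" + clientIp;
    }

    // Returns 0 when the request is allowed, otherwise the number of seconds
    // to wait before retrying (rounded up, at least 1).
    long tryAcquire(String key, long nowMillis) {
        Window window = windows.compute(key, (k, current) ->
                (current == null || nowMillis - current.startMillis() >= windowMillis)
                        ? new Window(nowMillis, 1)
                        : new Window(current.startMillis(), current.count() + 1));
        if (window.count() <= limit) {
            return 0;
        }
        long remainingMillis = window.startMillis() + windowMillis - nowMillis;
        return Math.max(1, (remainingMillis + 999) / 1000);
    }
}

class RateLimitDemo {
    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter(2, 10_000);
        String key = FixedWindowLimiter.keyFor(null, "203.0.113.7");
        System.out.println(limiter.tryAcquire(key, 0));      // 0 (allowed)
        System.out.println(limiter.tryAcquire(key, 1_000));  // 0 (allowed)
        System.out.println(limiter.tryAcquire(key, 2_000));  // 8 (retry after 8 seconds)
        System.out.println(limiter.tryAcquire(key, 10_000)); // 0 (new window)
    }
}
```

A caller would typically map a non-zero result to an HTTP 429 response and set the Retry-After header to that value.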

Deployment and Runtime

  • Self-hosted deployment uses Docker Compose with an Nginx reverse proxy and environment-specific runtime profiles.
  • CI/CD runs on a self-hosted GitHub Actions runner (ARM64) with scripted rollout steps.
  • Production endpoints include Swagger UI and Actuator health checks to support operational visibility.

Hardening Roadmap

  • Replace destructive production schema behavior (`spring.jpa.hibernate.ddl-auto=create-drop`, which drops the schema when the application shuts down) with a strict migration-only policy.
  • Move payment callback URL wiring fully to environment-managed secure configuration with startup validation.
  • Expand end-to-end tests for callback security edge cases and for Redis outage behavior on fail-closed paths.
  • Formalize the secret rotation lifecycle and define SLO-driven alert thresholds for p95 latency and queue drain times.

Result

  • Duplicate requests return the same booking result.
  • Concurrent slot races cannot create overlapping confirmed bookings.
  • Payment callbacks remain replay-safe and verification-gated.
  • Redis outages reduce performance but not booking correctness.
  • The backend remains predictable under retry-heavy mobile traffic and partial dependency failures.