Systems

Backend engineering deep dives focused on design quality and correctness under failure.


Cricket Net Booking System

Purpose

  • A production-focused booking platform for correct reservations under concurrent traffic.
  • Built to prevent double bookings, duplicate payments, and incorrect states.

Architecture

  • Spring Boot service with PostgreSQL as the source of truth and Redis for caching, locking, and rate limiting.
  • Layered backend: HTTP controllers -> security/rate-limit layer -> domain services -> persistence.
  • External boundaries include payment gateway callbacks, SMTP, and an SMS notification provider.
  • Hosted in Docker containers behind an Nginx reverse proxy.

Data Safety

  • Transaction boundaries protect booking create/update operations from partial writes.
  • Idempotency keys ensure repeated create/payment callbacks return the same result.
  • Distributed locking and deterministic DB lock ordering protect concurrent slot writes.
  • PostgreSQL overlap exclusion constraints act as the final conflict safety net under race pressure.
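
The idempotency-key guarantee above (a repeated create or payment callback returns the original result rather than a second booking) can be sketched with an in-memory Python stand-in; the class name, dict-backed store, and booking fields are illustrative, not the production Redis/PostgreSQL implementation:

```python
import threading

class IdempotentBookingService:
    """Minimal sketch of the idempotency-key pattern (illustrative names)."""

    def __init__(self):
        self._results = {}            # idempotency_key -> stored result
        self._lock = threading.Lock()
        self._next_id = 0

    def create_booking(self, idempotency_key, slot):
        with self._lock:
            # A retry with the same key gets the original result back
            # instead of creating a second booking.
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            self._next_id += 1
            result = {"booking_id": self._next_id, "slot": slot}
            self._results[idempotency_key] = result
            return result
```

In production the key-to-result mapping would live in Redis or the database inside the same transaction as the booking write, so the check and the insert are atomic.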

Security and Abuse Controls

  • Public payment callbacks are verified through HMAC signature checks, CIDR allowlist validation, and timestamp freshness checks.
  • Replay defense uses Redis dedupe keys to safely ignore repeated callback payloads.
  • Rate limiting is layered: global sharded limiter, DoS threshold limiter, and endpoint-level policies with Retry-After signaling.
  • Refresh-session design uses rotation and replay-aware revocation to reduce token hijack risk.
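
The callback verification above (HMAC signature plus timestamp freshness) can be sketched with the Python standard library; the `timestamp.payload` signing convention and the 300-second freshness window are assumptions for illustration, not the gateway's actual scheme:

```python
import hmac
import hashlib
import time

ALLOWED_SKEW_SECONDS = 300  # illustrative freshness window

def verify_callback(secret: bytes, payload: bytes,
                    timestamp: str, signature: str) -> bool:
    """Reject stale callbacks, then compare an HMAC-SHA256 signature
    in constant time (hmac.compare_digest resists timing attacks)."""
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    if abs(time.time() - ts) > ALLOWED_SKEW_SECONDS:
        return False  # stale or future-dated callback
    expected = hmac.new(secret, timestamp.encode() + b"." + payload,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Freshness checking bounds the replay window; the Redis dedupe keys mentioned above then close it completely within that window.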

Failure Behavior

  • If Redis fails, the system enters a degraded mode: performance drops, but correctness remains protected by the database.
  • Fail-open cache handling preserves availability on cache errors, while critical keyspaces remain fail-closed.
  • If a payment confirmation is retried, the system returns the existing booking instead of creating a duplicate.
  • If any validation step fails, a transaction rollback keeps state consistent.
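
The fail-open vs. fail-closed split can be sketched as a read-through helper; the function names are illustrative, not the service's actual API:

```python
def cached_read(cache_get, db_read, key, *, fail_closed=False):
    """Read-through helper: on a cache error, either fall back to the
    database (fail-open, the default) or propagate the error
    (fail-closed) for keyspaces where skipping the cache check is
    unsafe, e.g. replay-dedupe keys."""
    try:
        value = cache_get(key)
        if value is not None:
            return value
    except ConnectionError:
        if fail_closed:
            raise
        # Degraded mode: slower, but correctness still comes from the DB.
    return db_read(key)
```

A miss or a tolerated cache failure both land on the same database read, which is why a Redis outage degrades latency without degrading correctness.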

Caching and Consistency

  • Read path is cache-first, with Redis primary and fallback behavior during backend instability.
  • Write path performs after-commit invalidation, so the cache is never updated before the durable state change.
  • Scoped version-token invalidation (broad/net/date/net+date) avoids stale cross-node availability views.
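
Scoped version-token invalidation can be sketched by embedding per-scope version counters in the cache key: bumping a scope's version makes every key beneath it miss without scanning or deleting entries. The scope names and key layout below are illustrative, and the net+date scope falls out of bumping either component:

```python
class VersionedCache:
    """Sketch of scoped version-token invalidation (illustrative layout)."""

    def __init__(self):
        self._versions = {}   # scope -> version counter
        self._store = {}      # materialised key -> cached value

    def _v(self, scope):
        return self._versions.get(scope, 0)

    def _key(self, net, date):
        # The key embeds the broad, per-net, and per-date versions it
        # was built under; any bump changes the key, so old entries
        # simply stop being found.
        return (f"avail:v{self._v('broad')}"
                f":n{net}.{self._v(f'net:{net}')}"
                f":d{date}.{self._v(f'date:{date}')}")

    def get(self, net, date):
        return self._store.get(self._key(net, date))

    def put(self, net, date, value):
        self._store[self._key(net, date)] = value

    def invalidate(self, scope):
        """scope is 'broad', 'net:<id>', or 'date:<d>'."""
        self._versions[scope] = self._v(scope) + 1
```

Because the version counters are the only shared state, nodes never need to agree on which concrete keys to purge, which is what prevents stale cross-node availability views.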

Tradeoffs

  • Correctness first: additional locking and validation add write-path latency under contention.
  • Layered defenses increase operational complexity but provide deterministic behavior under retries and concurrency.
  • Current hardening priorities include a stricter production migration policy and callback configuration validation.

Advanced File Filter Bot

Purpose

  • Store, search, and retrieve Telegram files efficiently under constant usage.

Architecture

  • Async Python runtime with PyroFork/Pyrogram and a layered handler -> service -> repository flow.
  • MongoDB serves indexed metadata queries while Telegram remains the underlying media storage.
  • Redis is used for sessions, cache acceleration, and rate-limiting primitives.
  • Hosted in Docker behind Nginx with production runtime monitoring.

Data Safety

  • MongoDB indexing strategy provides fast, deterministic retrieval across large channel datasets.
  • Per-user rate limiting isolates abusive or heavy request patterns.
  • Atomic quota reservation with compensating release prevents oversubscription during bulk send operations.
  • Global merge/sort before pagination preserves correctness in multi-database search responses.
  • Supports multiple MongoDB databases via separate DB URIs for client-specific routing and failover.
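
The merge/sort-before-pagination rule can be sketched with `heapq.merge`, assuming each database already returns its results sorted by a score field (descending order and the field name are assumed conventions). Paginating each database independently and concatenating would put items on the wrong pages; merging first keeps page boundaries globally correct:

```python
import heapq

def paged_search(per_db_results, page, page_size,
                 key=lambda doc: doc["score"]):
    """Merge already-sorted per-database result lists into one globally
    sorted stream, then slice out the requested page lazily."""
    merged = heapq.merge(*per_db_results, key=key, reverse=True)
    start = page * page_size
    out = []
    for i, item in enumerate(merged):
        if i >= start + page_size:
            break          # stop once the page is full
        if i >= start:
            out.append(item)
    return out
```

`heapq.merge` streams the inputs instead of materialising the full union, so the cost scales with how deep the requested page is, not with the total result count.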

Failure Behavior

  • Per-database circuit breaker states (CLOSED/OPEN/HALF_OPEN) isolate failing pools without collapsing all writes.
  • FloodWait and transient Telegram RPC failures are handled with adaptive retry scheduling and bounded concurrency.
  • Bounded queue + overflow queue + dynamic batch sizing protect the indexing pipeline during traffic spikes.
  • Broadcast state and maintenance counters are persisted so restarts recover safely.
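
The CLOSED/OPEN/HALF_OPEN state machine can be sketched as follows; the threshold and timeout values are illustrative, and the clock is injectable purely to make the sketch testable:

```python
import time

class CircuitBreaker:
    """Minimal per-database circuit breaker (illustrative parameters)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Should the next request be attempted against this database?"""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"   # let a probe request through
                return True
            return False                   # still cooling down
        return True

    def record_success(self):
        self.state, self.failures = "CLOSED", 0

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", self.clock()
```

Keeping one breaker instance per database pool is what lets a single failing MongoDB URI trip OPEN while the others keep serving.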

Operations Surface

  • Operational endpoints: /health, /metrics, /performance.
  • Admin runtime controls include cache/database/performance visibility and broadcast lifecycle actions.
  • Maintenance jobs and structured task cleanup reduce long-running runtime drift and orphaned background work.

Tradeoffs

  • System chooses metadata indexing over external file storage, reducing storage/legal overhead but requiring robust reference reconstruction.
  • Cross-database correctness and failover resilience add implementation complexity compared to single-DB bots.
  • Built as a client project with configurable deployment behavior and operational controls.