System Scope
- Telegram-based indexing and retrieval system that makes large-channel media searchable for end users.
- Acts as middleware on top of Telegram, where external search APIs are not available.
- Designed as an operational backend system, not merely a command-bot implementation.
Platform Constraints
- Telegram channels can contain hundreds of thousands of posts with inconsistent native media search.
- Rate limits and FloodWait behavior make naive crawling and send patterns unstable.
- Messages may be deleted and file references can expire, so the index must self-heal.
- Bot restarts are expected; progress and state must be restart-safe.
Core Architecture
- Incremental indexing tracks the last processed message ID, shifting ingestion from O(n) rescans to O(delta) updates.
- MongoDB stores indexed metadata; Telegram remains the source file storage system.
- Repository layer uses indexed fields and global merge/sort pagination for correctness across multiple databases.
- Redis handles sessions, caching, and rate limiting, with versioned cache invalidation for low-cost global resets.
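The incremental-indexing idea above can be sketched in a few lines. The checkpoint dict and message shape here are illustrative stand-ins for the real MongoDB documents, not the project's actual schema; only the checkpointing pattern itself is what the section describes.

```python
# Minimal sketch of incremental indexing: only messages newer than the
# persisted checkpoint are processed, so a rerun costs O(delta) instead
# of O(n). Telegram message IDs increase monotonically per channel, so a
# single integer checkpoint is enough to define the delta.

def index_incrementally(checkpoint: dict, messages: list[dict]) -> list[dict]:
    """Index only messages above the saved last_message_id checkpoint."""
    last_id = checkpoint.get("last_message_id", 0)
    new_messages = [m for m in messages if m["id"] > last_id]
    for m in new_messages:
        pass  # persist metadata for m (MongoDB in the real system)
    if new_messages:
        checkpoint["last_message_id"] = max(m["id"] for m in new_messages)
    return new_messages
```

Because the checkpoint advances only after the batch is handled, a crash mid-batch simply replays the same delta on restart, which is why the duplicate-handling paths below must be idempotent.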
Correctness and Concurrency Decisions
- Send-all uses atomic quota reservation plus a compensating release to prevent concurrent oversubscription.
- Duplicate indexing and duplicate user-creation paths are handled idempotently to avoid restart-time conflicts.
- Daily usage counter reset state is persisted so it survives restarts without counter drift.
- Poster-to-file mapping uses message ordering to reconstruct logical media groupings not exposed directly by Telegram.
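The reserve-then-compensate pattern can be sketched with an in-memory counter guarded by an `asyncio.Lock`; the real system reserves against its datastore, and `QuotaManager` with its limits is a hypothetical name used only for illustration.

```python
# Hedged sketch of atomic quota reservation with a compensating release.
# Checking and incrementing inside one lock makes the reservation atomic,
# so concurrent send-all tasks cannot jointly oversubscribe the limit.
import asyncio

class QuotaManager:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.used = 0
        self._lock = asyncio.Lock()

    async def reserve(self, amount: int) -> bool:
        """Atomically reserve quota; reject if it would oversubscribe."""
        async with self._lock:
            if self.used + amount > self.daily_limit:
                return False
            self.used += amount
            return True

    async def release(self, amount: int) -> None:
        """Compensating release when a reserved send fails partway."""
        async with self._lock:
            self.used = max(0, self.used - amount)
```

A failed send releases exactly what it reserved, so the counter converges back to the true usage instead of drifting upward.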
Scalability and Throughput
- Batch duplicate checks and bulk-save indexing paths remove N+1 overhead in channel ingestion.
- Bounded queue + overflow queue + dynamic batch sizing absorb spikes without collapsing ingestion workers.
- Per-domain semaphores isolate Telegram API and database workloads to reduce contention.
- Adaptive retry scheduling for FloodWait and transient RPC errors protects long-run throughput.
Fault Tolerance and Recovery
- Multi-database manager uses per-database circuit breaker states (CLOSED, OPEN, HALF_OPEN) with recovery probing.
- Smart write selection can route across multiple MongoDB URIs to isolate partial outages.
- Broadcast runtime state is persisted and recovered on startup to avoid orphaned active sessions.
- Structured handler/task cleanup prevents background task leaks during shutdown and restart cycles.
- Secure updater workflow includes backup, validation checks, and rollback support for safer runtime upgrades.
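The CLOSED/OPEN/HALF_OPEN state machine named above can be sketched as follows. The failure threshold and recovery timeout are illustrative defaults, and the real manager wraps MongoDB clients rather than tracking state in a bare class.

```python
# Minimal per-database circuit breaker sketch. CLOSED passes traffic,
# OPEN rejects it, and after a recovery timeout one probe request is
# allowed through in HALF_OPEN to test whether the database recovered.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            # After the timeout, let one probe through (HALF_OPEN).
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"
                return True
            return False
        return True  # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = time.monotonic()
```

A successful probe closes the breaker; a failed probe reopens it immediately, so an unhealthy database only ever receives one request per recovery window.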
Technology and Operational Controls
- Core runtime: Python async bot architecture with PyroFork/Pyrogram integration and aiohttp operational endpoints.
- Data layer: MongoDB with index-driven search and optional multi-database routing/failover.
- Coordination layer: Redis-backed rate limiting, session state, and versioned cache invalidation.
- Delivery safety: Telegram API wrapper with FloodWait-aware retries and semaphore-based concurrency control.
- Operations surface: admin commands for cache, database stats, performance checks, and broadcast lifecycle control.
- Tooling quality gate: Ruff, mypy, and pytest-oriented project setup with a Docker Compose deployment workflow.
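The versioned cache invalidation mentioned in the coordination layer can be sketched with a plain dict standing in for Redis; `VersionedCache` and its key scheme are illustrative, not the project's actual key layout.

```python
# Sketch of versioned cache invalidation: every key embeds a global
# version number, so bumping the version "invalidates" all cached
# entries in O(1) without scanning or deleting keys. In Redis, stale
# entries under old versions are simply left to expire via TTLs.

class VersionedCache:
    def __init__(self):
        self.version = 1
        self._store: dict = {}

    def _key(self, key: str) -> str:
        return f"v{self.version}:{key}"

    def set(self, key: str, value) -> None:
        self._store[self._key(key)] = value

    def get(self, key: str, default=None):
        return self._store.get(self._key(key), default)

    def invalidate_all(self) -> None:
        # Old entries become unreachable the moment the version bumps.
        self.version += 1
```

This is what makes a global reset "low-cost": one counter increment instead of a mass `DEL` across potentially millions of keys.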
Production Readiness Signals
- Incremental indexing supports large-channel operation without full-history rescans.
- Queue backpressure controls and dynamic batch sizing keep ingestion stable during traffic spikes.
- Atomic quota reservation preserves correctness for bulk-send flows under concurrency.
- Restart-safe state recovery ensures broadcasts, counters, and indexing can resume predictably.
- Designed and operated as a long-running production-style client system, not a demo-only bot.
Outcome
- Transforms Telegram channels into searchable archives without copying media files to external storage.
- Maintains predictable behavior across retries, restarts, FloodWait, and partial database failures.
- Delivers fast user search and retrieval while preserving platform-safe operational behavior.