Degraded performance on our dashboard and API
Resolved
Dec 03 at 03:59pm GMT
Root Cause Analysis – 2 December 2025
1. Summary
On 2 December 2025 (16:23–20:10 CET), our dashboard and public API experienced severe degradation affecting all customers. API endpoints and agent loading were intermittently unavailable, and call logs were inaccessible throughout the incident.
2. Root Cause
A PostgreSQL migration was unintentionally executed through the Cloud SQL managed connection pool due to a misconfigured secret. This created long-running, lock-heavy transactions. At the same time, the “make call” API held PostgreSQL connections open while waiting on slow MongoDB operations caused by a failed index rebuild. These factors exhausted the database connection pool and blocked new requests.
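The pool-exhaustion mechanism described above can be illustrated with a toy model. This is a minimal sketch and not the service's actual code: `BoundedPool`, the two handler functions, the pool size, and all timings are assumptions made for illustration. A semaphore stands in for the PostgreSQL connection pool and `time.sleep` stands in for the slow MongoDB operation.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable


class BoundedPool:
    """Toy stand-in for a fixed-size database connection pool (hypothetical)."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    @contextmanager
    def connection(self, timeout: float):
        # Fail fast instead of waiting forever when every slot is taken.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("connection pool exhausted")
        try:
            yield object()  # placeholder for a real connection object
        finally:
            self._slots.release()


def slow_external_call(seconds: float) -> None:
    """Hypothetical slow dependency, e.g. a degraded document store."""
    time.sleep(seconds)


def handler_holding_connection(pool: BoundedPool, slow_seconds: float) -> None:
    # Anti-pattern: the connection stays checked out during the slow call.
    with pool.connection(timeout=0.1):
        slow_external_call(slow_seconds)


def handler_short_lived(pool: BoundedPool, slow_seconds: float) -> None:
    # The connection is returned to the pool before the slow call starts.
    with pool.connection(timeout=0.1):
        pass  # only the quick database work would happen here
    slow_external_call(slow_seconds)


def count_failures(
    handler: Callable[[BoundedPool, float], None],
    requests: int = 8,
    pool_size: int = 2,
    slow_seconds: float = 0.5,
) -> int:
    """Run `requests` concurrent handlers and count pool-exhaustion errors."""
    pool = BoundedPool(pool_size)
    failures = []

    def run() -> None:
        try:
            handler(pool, slow_seconds)
        except TimeoutError:
            failures.append(1)

    threads = [threading.Thread(target=run) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures)
```

In this model, `count_failures(handler_holding_connection)` reports failures as soon as concurrent requests outnumber pool slots, while `count_failures(handler_short_lived)` does not, because each connection is held only for the quick database step.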
3. Impact & Resolution
Impact:
- API endpoints unavailable
- Agent loading unavailable
- Call logs inaccessible, including historical data
Resolution:
- Terminated stuck DB sessions
- Repaired the corrupted MongoDB index
- Created a clean calls collection
- Redeployed the BFF with corrected configuration
- Fixed readiness checks
- Restored API stability
- Migrated historical call data afterwards
4. Preventive Actions
- Enforce direct DB connections for all migrations
- Add transaction/connection timeouts and pool saturation alerts
- Refactor internal endpoints to use short-lived transactions
- Move MongoDB index management out of app startup
- Standardize readiness probes to ensure only fully healthy pods receive traffic
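Two of the actions above, pool saturation alerts and stricter readiness probes, can be sketched as small pure functions. This is an illustration under stated assumptions, not the implemented fix: `PoolStats`, `should_alert`, `is_ready`, the 80% threshold, and the dependency names are all hypothetical. For the timeout item, PostgreSQL itself provides the `statement_timeout`, `lock_timeout`, and `idle_in_transaction_session_timeout` settings; suitable values would depend on the workload.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PoolStats:
    """Snapshot of a connection pool (hypothetical shape)."""

    in_use: int    # connections currently checked out
    max_size: int  # configured pool limit
    waiting: int   # requests queued for a connection


def saturation(stats: PoolStats) -> float:
    """Fraction of the pool that is currently checked out."""
    return stats.in_use / stats.max_size


def should_alert(stats: PoolStats, threshold: float = 0.8) -> bool:
    # Alert when the pool is nearly full or when any request is already queued.
    # The 0.8 threshold is an assumed example value.
    return saturation(stats) >= threshold or stats.waiting > 0


def is_ready(checks: Mapping[str, bool]) -> bool:
    # A pod reports ready only if it ran at least one check and all passed,
    # so a pod with a failing dependency receives no traffic.
    return bool(checks) and all(checks.values())
```

For example, `should_alert(PoolStats(in_use=9, max_size=10, waiting=0))` is true, and `is_ready({"postgres": True, "mongodb": False})` is false, so a pod with an unhealthy MongoDB dependency would be kept out of rotation.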
Updated
Dec 02 at 07:07pm GMT
All issues are now resolved. We apologize for the inconvenience.
Updated
Dec 02 at 04:32pm GMT
Our platform and API are now working correctly.
Call history is not showing right now. We are working on it: all data is stored safely, nothing has been lost, and call history will be visible again shortly.
Created
Dec 02 at 03:23pm GMT
Our dashboard and API are currently experiencing degraded performance.
We are investigating the issue and will have a fix shortly.