1. Purpose
This Disaster Recovery (DR) Policy defines how Superdash prepares for, responds to, and recovers from events that disrupt the availability or integrity of the systems that power our AI voice agent platform. Because our customers rely on us to answer and place calls on their behalf, prolonged downtime or data loss directly affects their operations and their customers. This policy exists to make sure we can restore critical services quickly, in a defined order, with minimal data loss. It complements, and should be read alongside, our information security, incident response, and business continuity practices.
2. Scope
This policy applies to all production systems, infrastructure, data, and third-party services that are required to deliver the Superdash voice agent platform, and to all employees, contractors, and vendors involved in operating or recovering those systems. In scope:
- Cloud infrastructure hosting our application, APIs, and orchestration layer.
- Telephony and connectivity (SIP trunks, carriers, and voice-call routing).
- The AI processing pipeline (speech-to-text, language model, and text-to-speech components and the providers behind them).
- Databases and data stores, including call logs, transcripts, recordings, configuration, and customer account data.
- Internal tooling required to operate or restore the above (monitoring, deployment, secrets management).
3. Definitions
- Disaster — any unplanned event that causes, or threatens to cause, an extended interruption to critical services or significant data loss.
- RTO (Recovery Time Objective) — the maximum acceptable time to restore a system after a disruption.
- RPO (Recovery Point Objective) — the maximum acceptable amount of data loss, measured as the time between the last usable backup and the disruption.
- Failover — switching operations to a redundant or standby system, region, or vendor.
- Failback — returning operations to the primary environment once it is restored.
- DR Coordinator — the person responsible for activating and managing the recovery effort.
4. Objectives and principles
Our recovery efforts are guided by the following priorities, in order: protect people and data, restore critical customer-facing services, preserve data integrity, and communicate honestly throughout.
- Prioritize by impact. Systems are recovered in the order set out in Section 7, not first-come-first-served.
- Prefer automated, tested recovery over manual, ad-hoc fixes.
- Assume failure is normal. We design for redundancy so that the loss of any single component, region, or vendor does not take down the whole platform.
- No recovery action may compromise security or data-protection obligations (see Section 13).
5. Roles and responsibilities
The following roles are activated during a declared disaster. One person may hold more than one role.
- DR Coordinator — Vivek: declares a disaster, activates this plan, coordinates the response, and authorizes failover and failback.
- Incident Commander — Suraj: runs the technical response and directs recovery work on the ground.
- Infrastructure / Platform lead — Abhinav Pujahari: executes infrastructure failover, restores services, and validates system health.
- Data / Database lead — Abhinav Pujahari: restores data from backups and verifies integrity and RPO compliance.
- Communications lead — Akhil Thomas: manages internal, customer, vendor, and (where required) regulatory communications.
- Executive sponsor — Akhil Thomas: approves major decisions, external commitments, and resource allocation.
6. Risk scenarios
This policy is designed to address, at minimum, the following categories of disruption:
- Loss of a cloud region or availability zone, or a major cloud provider outage.
- Failure or outage of a critical third-party dependency (telephony carrier, speech-to-text, text-to-speech, or language-model provider).
- Data corruption, accidental deletion, or failed migration.
- Cybersecurity incidents, including ransomware, intrusion, or denial-of-service attacks.
- Loss of access to systems due to compromised or lost credentials.
- Physical events affecting an office or key facility, or loss of key personnel.
7. System criticality and recovery targets
Systems are grouped into tiers. The following are the committed recovery targets for each tier.
| Tier | Examples | Target RTO | Target RPO |
|---|---|---|---|
| Tier 1 — Critical (real-time) | Live call handling, telephony routing, the AI processing pipeline | 1 hour | Near-zero |
| Tier 2 — Essential | Customer dashboard, configuration APIs, authentication | 4 hours | 15 minutes |
| Tier 3 — Important | Call recordings, transcripts, analytics, reporting | 24 hours | 1 hour |
| Tier 4 — Deferrable | Internal tooling, non-customer-facing batch jobs | 72 hours | 24 hours |
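To make these targets checkable during recovery validation, they can be encoded as data and compared against what actually happened. The sketch below is illustrative only: the `TierTarget` type and `meets_targets` helper are hypothetical, and because the policy does not quantify "near-zero", Tier 1's RPO is represented by an assumed five-second tolerance.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumption: the policy says "near-zero" without a number; 5 seconds is a placeholder.
NEAR_ZERO = timedelta(seconds=5)

@dataclass(frozen=True)
class TierTarget:
    name: str
    rto: timedelta  # maximum acceptable time to restore
    rpo: timedelta  # maximum acceptable window of lost data

# Values transcribed from the Section 7 table.
TIER_TARGETS = {
    1: TierTarget("Critical (real-time)", timedelta(hours=1), NEAR_ZERO),
    2: TierTarget("Essential", timedelta(hours=4), timedelta(minutes=15)),
    3: TierTarget("Important", timedelta(hours=24), timedelta(hours=1)),
    4: TierTarget("Deferrable", timedelta(hours=72), timedelta(hours=24)),
}

def meets_targets(tier: int, downtime: timedelta, data_loss: timedelta) -> tuple[bool, bool]:
    """Return (rto_met, rpo_met) for an observed disruption of a system in `tier`."""
    target = TIER_TARGETS[tier]
    return downtime <= target.rto, data_loss <= target.rpo
```

For example, a Tier 2 outage restored in three hours with 20 minutes of lost data meets its RTO but misses its RPO: `meets_targets(2, timedelta(hours=3), timedelta(minutes=20))` returns `(True, False)`.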
8. Backup strategy
- Datastore and schedule: the primary datastore (MongoDB) is backed up on weekly and monthly cycles.
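- RPO alignment: a periodic backup can lose up to one full cycle of data, so the weekly and monthly cycles alone (up to seven days of loss) do not meet any RPO target in Section 7, not even the 24-hour Tier 4 target. Meeting the targets requires scheduled backups at least as frequent as each tier's RPO (hourly for Tier 3, daily for Tier 4), plus continuous replication or point-in-time recovery for the near-zero and 15-minute RPO targets of Tier 1 and Tier 2.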
- Coverage: databases, configuration, call recordings and transcripts, and any state required to rebuild the environment.
- Redundancy: backups are stored in a separate location or region from primary data, so a single regional failure cannot destroy both.
- Encryption: backups are encrypted at rest and in transit.
- Retention: backups are retained for 3 months, consistent with the data retention approach in our AI Transparency & Disclosure Policy and applicable law.
- Restore testing: backups are periodically restored to a test environment to confirm they are usable — an untested backup is not considered a valid backup (see Section 14).
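Because a scheduled backup can be up to one full interval old when a failure occurs, the worst-case data loss of a schedule is bounded by its most frequent cycle. The sketch below checks a schedule against the RPO column in Section 7. The helper names and the 31-day figure for a monthly cycle are assumptions, and it presumes every run succeeds and restores cleanly. Tier 1 is left out because a near-zero RPO cannot be met by snapshots at all.

```python
from datetime import timedelta

# Cycles named in this section; 31 days is the worst case for a calendar month.
CURRENT_SCHEDULE = {
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=31),
}

# RPO targets from Section 7 (Tier 1's near-zero target needs replication, not snapshots).
RPO_BY_TIER = {
    2: timedelta(minutes=15),
    3: timedelta(hours=1),
    4: timedelta(hours=24),
}

def worst_case_loss(schedule: dict[str, timedelta]) -> timedelta:
    """Upper bound on lost data: a failure just before the next run of the most
    frequent cycle loses at most one interval of that cycle."""
    return min(schedule.values())

def tiers_covered(schedule: dict[str, timedelta], rpo_by_tier: dict[int, timedelta]) -> list[int]:
    """Tiers whose RPO is satisfied by the schedule's worst-case loss."""
    loss = worst_case_loss(schedule)
    return sorted(tier for tier, rpo in rpo_by_tier.items() if loss <= rpo)
```

With only the weekly and monthly cycles, `tiers_covered(CURRENT_SCHEDULE, RPO_BY_TIER)` returns an empty list; adding a daily job brings Tier 4 into range, and an hourly job covers Tiers 3 and 4.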
9. Infrastructure resilience and redundancy
To reduce the likelihood and impact of disasters, Superdash maintains, where feasible:
- Deployment across multiple availability zones, and the ability to fail over to a secondary region.
- Redundant telephony connectivity, with the ability to reroute calls if a primary carrier or SIP trunk fails.
- Fallback paths or alternate providers for critical AI pipeline components where practical, so that the failure of one speech or language-model provider does not halt all call handling.
- Infrastructure defined as code, so environments can be rebuilt reproducibly from version control.
- Monitoring and alerting on the health of all Tier 1 and Tier 2 systems.
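Monitoring of Tier 1 systems is also what drives the declaration step in Section 10, which fires on any one of three triggers: a Tier 1 service unavailable for more than 30 minutes, confirmed data loss, or an active security breach. A minimal sketch of that rule follows; `ServiceStatus` is a hypothetical summary of monitoring state, and how data loss or a breach is confirmed is left to the responders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Threshold from the declaration criteria in Section 10.
TIER1_OUTAGE_LIMIT = timedelta(minutes=30)

@dataclass
class ServiceStatus:
    """Hypothetical summary of one service's state, built from monitoring alerts."""
    name: str
    tier: int
    down_since: Optional[datetime] = None  # None while the service is healthy

def declaration_triggers(
    statuses: list[ServiceStatus],
    now: datetime,
    data_loss_confirmed: bool,
    breach_active: bool,
) -> list[str]:
    """Return every declaration trigger that is met; any non-empty result means declare."""
    triggers = []
    for s in statuses:
        if s.tier == 1 and s.down_since is not None and now - s.down_since > TIER1_OUTAGE_LIMIT:
            triggers.append(f"Tier 1 service '{s.name}' down for more than 30 minutes")
    if data_loss_confirmed:
        triggers.append("data loss confirmed")
    if breach_active:
        triggers.append("security breach active")
    return triggers
```

Returning the list of reasons, rather than a bare yes/no, gives the DR Coordinator the justification to record in the incident log. A Tier 2 outage alone produces no trigger here and stays a normal incident.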
10. Disaster recovery process
1. Detection. Monitoring, alerts, or staff reports surface a potential disruption. Any team member can raise a suspected disaster.
2. Assessment. The on-call engineer and Incident Commander assess scope, affected tiers, and likely cause, and estimate impact against the RTO/RPO targets.
3. Declaration. If the disruption exceeds normal incident thresholds, the DR Coordinator formally declares a disaster and activates this plan. Declaration criteria: a Tier 1 service is unavailable for more than 30 minutes, OR data loss is confirmed, OR a security breach is active. Any one of these triggers declaration.
4. Notification. The Communications lead notifies internal responders, affected customers, and relevant vendors, following Section 11.
5. Recovery and failover. Responders execute the recovery runbooks: fail over to redundant infrastructure or regions, reroute telephony, and restore data from backups in tier order.
6. Validation. Before declaring recovery, responders verify that restored systems are functioning correctly, data integrity is intact, and RPO targets were met.
7. Failback. Once the primary environment is confirmed stable, operations are returned to it in a controlled manner.
8. Post-incident review. Within 5 business days of resolution, the team conducts a blameless review documenting timeline, root cause, what worked, and corrective actions, and updates this plan and the runbooks accordingly.
Detailed step-by-step recovery runbooks for each Tier 1 and Tier 2 system are maintained separately and referenced by this policy.
11. Communication plan
- Internal: responders coordinate through a dedicated Google Chat space; the DR Coordinator owns the single source of truth on status.
- Customers: affected customers are notified promptly with what is known, expected impact, and updates at a stated cadence, via our status page and support@trysuperdash.com, which is monitored at all times. We commit to honesty over reassurance.
- Vendors: affected third-party providers are engaged for support and status.
- Regulators / authorities: where an incident involves a personal-data breach or other reportable event, notifications are made within the timeframes required by applicable law (including, where relevant, India’s Digital Personal Data Protection Act, 2023), coordinated with the data protection contact.
- Records: all communications and decisions are logged for the post-incident review.
12. Third-party dependencies
Because our platform depends on external providers (cloud, telephony carriers, and AI pipeline services), we:
- maintain an up-to-date inventory of critical vendors, their role, and their criticality tier;
- review the availability commitments and DR posture of critical vendors;
- identify and, where feasible, pre-arrange fallback providers or routes for the most critical dependencies; and
- include recovery dependencies on these vendors in our testing.
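One way to keep the vendor inventory actionable is to store each dependency with its tier and an ordered list of pre-arranged fallbacks, then flag critical roles that have none. This is a hypothetical sketch: the vendor identifiers, the `healthy` set (assumed to come from monitoring), and the data layout are placeholders, not Superdash's actual providers.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Dependency:
    role: str      # what the vendor does for the platform
    tier: int      # criticality tier from Section 7
    primary: str   # hypothetical vendor identifier
    fallbacks: list[str] = field(default_factory=list)  # pre-arranged alternates, in order

# Placeholder inventory; real vendor names are not given in this policy.
INVENTORY = [
    Dependency("telephony", 1, "carrier-a", ["carrier-b"]),
    Dependency("speech-to-text", 1, "stt-a", ["stt-b"]),
    Dependency("language-model", 1, "llm-a"),
    Dependency("text-to-speech", 1, "tts-a", ["tts-b"]),
]

def missing_fallbacks(inventory: list[Dependency], max_tier: int = 1) -> list[str]:
    """Roles at `max_tier` or more critical that have no pre-arranged fallback."""
    return [d.role for d in inventory if d.tier <= max_tier and not d.fallbacks]

def pick_provider(dep: Dependency, healthy: set[str]) -> Optional[str]:
    """First healthy vendor, trying the primary before fallbacks in order."""
    for vendor in [dep.primary, *dep.fallbacks]:
        if vendor in healthy:
            return vendor
    return None
```

In this placeholder data, `missing_fallbacks(INVENTORY)` returns `['language-model']`, the kind of gap the fallback bullet above asks us to close, while `pick_provider` returns the first healthy vendor in priority order, or `None` if every option is down.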
13. Data integrity and security during recovery
Recovery must never become a route to a second incident. During any DR event:
- Access to backups and recovery tooling is limited to authorized responders.
- Restored systems are verified for integrity before being returned to production.
- Encryption and access controls remain in force throughout; emergency access is logged and reviewed afterward.
- If the disaster is, or may be, a security breach, recovery is coordinated with incident response so that compromised components are not simply restored intact.
14. Testing and maintenance
- Tabletop exercises: conducted at least annually to walk responders through scenarios.
- Technical drills: failover and backup-restore tests conducted at least semi-annually in a controlled environment.
- Results: every test produces findings and corrective actions, tracked to completion.
- Maintenance: runbooks, contact lists, vendor inventory, and recovery targets are reviewed after every drill, after every declared disaster, and on any significant architecture change.
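The testing cadence above can be tracked mechanically so that a lapsed drill is noticed before it matters. A sketch, assuming "at least annually" and "at least semi-annually" are read as maximum gaps of 365 and 182 days; the activity names and the `overdue` helper are hypothetical.

```python
from datetime import date, timedelta

# Assumed maximum gaps for "at least annually" and "at least semi-annually".
MAX_GAP = {
    "tabletop exercise": timedelta(days=365),
    "technical drill": timedelta(days=182),
}

def overdue(last_run: dict[str, date], today: date) -> list[str]:
    """List activities whose last run is older than the allowed gap, or that never ran."""
    due = []
    for activity, gap in MAX_GAP.items():
        last = last_run.get(activity)
        if last is None or today - last > gap:
            due.append(activity)
    return due
```

For instance, on 30 June 2025 a tabletop exercise last held on 15 January 2025 is still current, but a technical drill last run on 1 November 2024 (241 days earlier) is reported as overdue.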
15. Plan availability
This policy and its supporting runbooks and contact lists are stored so that they remain accessible during a disaster, including in a location that does not depend on the systems they are meant to recover. Current custodian: Akhil Thomas, +91 93805 17603.
16. Training and awareness
All engineering and operations staff are familiarized with this policy and their role in it during onboarding and at each review. Named DR responders receive role-specific preparation and participate in drills.
17. Review and version control
This policy is reviewed at least annually and after any declared disaster or material change to the platform. The policy owner named in Section 18 is responsible for maintaining it. Changes are tracked with version, date, and author.
18. Approval
| Role | Name | | |
|---|---|---|---|
| Policy owner | Abhinav Pujahari | | |
| Executive sponsor | Akhil Thomas | | |