Disaster Recovery Policy - Catalyst Superdash

Spanrr Technologies Private Limited (“Superdash”)
Effective date: February 1, 2024 · Version: 1.0 · Owner: Abhinav Pujahari, CTO · Classification: Internal

1. Purpose

This Disaster Recovery (DR) Policy defines how Superdash prepares for, responds to, and recovers from events that disrupt the availability or integrity of the systems that power our AI voice agent platform. Because our customers rely on us to answer and place calls on their behalf, prolonged downtime or data loss directly affects their operations and their customers. This policy exists to make sure we can restore critical services quickly, in a defined order, with minimal data loss. It complements — and should be read alongside — our information security, incident response, and business continuity practices.

2. Scope

This policy applies to all production systems, infrastructure, data, and third-party services that are required to deliver the Superdash voice agent platform, and to all employees, contractors, and vendors involved in operating or recovering those systems. In scope:

Cloud infrastructure hosting our application, APIs, and orchestration layer.
Telephony and connectivity (SIP trunks, carriers, and voice-call routing).
The AI processing pipeline (speech-to-text, language model, and text-to-speech components and the providers behind them).
Databases and data stores, including call logs, transcripts, recordings, configuration, and customer account data.
Internal tooling required to operate or restore the above (monitoring, deployment, secrets management).

Out of scope: individual employee workstations (covered by IT policy) and customer-side systems and configurations.

3. Definitions

Disaster — any unplanned event that causes, or threatens to cause, an extended interruption to critical services or significant data loss.
RTO (Recovery Time Objective) — the maximum acceptable time to restore a system after a disruption.
RPO (Recovery Point Objective) — the maximum acceptable amount of data loss, measured as the time between the last usable backup and the disruption.
Failover — switching operations to a redundant or standby system, region, or vendor.
Failback — returning operations to the primary environment once it is restored.
DR Coordinator — the person responsible for activating and managing the recovery effort.

4. Objectives and principles

Our recovery efforts are guided by the following priorities, in order: protect people and data, restore critical customer-facing services, preserve data integrity, and communicate honestly throughout.

Prioritize by impact. Systems are recovered in the order set out in Section 7, not first-come-first-served.
Prefer automated, tested recovery over manual, ad-hoc fixes.
Assume failure is normal. We design for redundancy so that the loss of any single component, region, or vendor does not take down the whole platform.
No recovery action may compromise security or data-protection obligations (see Section 13).

5. Roles and responsibilities

The following roles are activated during a declared disaster. One person may hold more than one role.

DR Coordinator — Vivek: declares a disaster, activates this plan, coordinates the response, and authorizes failover and failback.
Incident Commander — Suraj: runs the technical response and directs recovery work on the ground.
Infrastructure / Platform lead — Abhinav Pujahari: executes infrastructure failover, restores services, and validates system health.
Data / Database lead — Abhinav Pujahari: restores data from backups and verifies integrity and RPO compliance.
Communications lead — Akhil Thomas: manages internal, customer, vendor, and (where required) regulatory communications.
Executive sponsor — Akhil Thomas: approves major decisions, external commitments, and resource allocation.

Primary and after-hours escalation contact: Akhil Thomas — thomas@trysuperdash.com, +91 93805 17603.

6. Risk scenarios

This policy is designed to address, at minimum, the following categories of disruption:

Loss of a cloud region or availability zone, or a major cloud provider outage.
Failure or outage of a critical third-party dependency (telephony carrier, speech-to-text, text-to-speech, or language-model provider).
Data corruption, accidental deletion, or failed migration.
Cybersecurity incidents, including ransomware, intrusion, or denial-of-service attacks.
Loss of access to systems due to compromised or lost credentials.
Physical events affecting an office or key facility, or loss of key personnel.

A detailed risk assessment (likelihood and impact per scenario) is maintained separately and reviewed at least annually.

7. System criticality and recovery targets

Systems are grouped into tiers. The following are the committed recovery targets for each tier.

Tier	Examples	Target RTO	Target RPO
Tier 1 — Critical (real-time)	Live call handling, telephony routing, the AI processing pipeline	1 hour	Near-zero
Tier 2 — Essential	Customer dashboard, configuration APIs, authentication	4 hours	15 minutes
Tier 3 — Important	Call recordings, transcripts, analytics, reporting	24 hours	1 hour
Tier 4 — Deferrable	Internal tooling, non-customer-facing batch jobs	72 hours	24 hours

Live, in-progress calls cannot generally be recovered once dropped; the Tier 1 objective is to restore the ability to handle new calls as fast as possible and to fail over to redundant capacity before customer impact becomes severe.
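To make the "recover by tier, not first-come-first-served" rule from Section 4 concrete, the Python sketch below encodes the tier table above and orders a set of affected systems for recovery. It is illustrative only: the `RecoveryTarget` structure, the function names, and the system names in the usage example are hypothetical, and the Tier 1 "near-zero" RPO is modelled as zero for simplicity. Because RTO is defined relative to the disruption (Section 3), the deadline is computed from the disruption start, not from the declaration time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    rto: timedelta  # maximum time to restore after the disruption
    rpo: timedelta  # maximum acceptable data loss


# Section 7 targets; "near-zero" RPO for Tier 1 is modelled as zero here (assumption).
TIER_TARGETS = {
    1: RecoveryTarget(rto=timedelta(hours=1), rpo=timedelta(0)),
    2: RecoveryTarget(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    3: RecoveryTarget(rto=timedelta(hours=24), rpo=timedelta(hours=1)),
    4: RecoveryTarget(rto=timedelta(hours=72), rpo=timedelta(hours=24)),
}


def recovery_order(affected: dict[str, int]) -> list[str]:
    """Sort affected systems by tier (Tier 1 first), using the name as a tie-breaker."""
    return sorted(affected, key=lambda name: (affected[name], name))


def restore_deadline(disruption_start: datetime, tier: int) -> datetime:
    """Latest time by which a system in `tier` should be restored to meet its RTO."""
    return disruption_start + TIER_TARGETS[tier].rto


if __name__ == "__main__":
    # Hypothetical system names, each mapped to its tier.
    affected = {"analytics": 3, "telephony-routing": 1, "dashboard": 2, "batch-jobs": 4}
    print(recovery_order(affected))
    print(restore_deadline(datetime(2024, 2, 1, 10, 0), tier=2))
```

Sorting on a (tier, name) pair keeps the order deterministic when several systems share a tier; within a tier, real sequencing would follow the runbooks.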

8. Backup strategy

Datastore and schedule: the primary datastore (MongoDB) is backed up on weekly and monthly cycles.
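RPO alignment: a weekly or monthly snapshot on its own can lose up to one full backup interval of data (about seven days for the weekly cycle), which is longer than every RPO target in Section 7, including the 1-hour Tier 3 and 24-hour Tier 4 targets. The scheduled snapshots therefore serve as a longer-term recovery layer; meeting the RPO targets for each tier additionally requires continuous replication, point-in-time recovery, or more frequent backups for the datasets in that tier.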
Coverage: databases, configuration, call recordings and transcripts, and any state required to rebuild the environment.
Redundancy: backups are stored in a separate location or region from primary data, so a single regional failure cannot destroy both.
Encryption: backups are encrypted at rest and in transit.
Retention: backups are retained for 3 months, consistent with the data retention approach in our AI Transparency & Disclosure Policy and applicable law.
Restore testing: backups are periodically restored to a test environment to confirm they are usable — an untested backup is not considered a valid backup (see Section 14).
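Because RPO measures data loss back to the last usable backup (Section 3), the worst case for any fixed schedule is roughly one full interval between backups: a disruption just before the next backup completes loses everything written since the previous one. The Python sketch below applies that rule to the Section 7 targets. It assumes every backup completes instantly and is restorable, and it models the Tier 1 "near-zero" RPO as zero; the function names are hypothetical, and restore testing still decides whether a target is actually met.

```python
from datetime import timedelta

# Section 7 RPO targets; "near-zero" for Tier 1 is modelled as zero (assumption).
TIER_RPO = {
    1: timedelta(0),
    2: timedelta(minutes=15),
    3: timedelta(hours=1),
    4: timedelta(hours=24),
}


def meets_rpo(backup_interval: timedelta, tier: int) -> bool:
    """True if the worst-case loss for this schedule fits the tier's RPO.

    Worst case is taken as one full interval between usable backups.
    """
    return backup_interval <= TIER_RPO[tier]


def tiers_met(backup_interval: timedelta) -> list[int]:
    """List the tiers whose RPO target this backup interval satisfies."""
    return [tier for tier in sorted(TIER_RPO) if meets_rpo(backup_interval, tier)]


if __name__ == "__main__":
    for label, interval in [
        ("monthly", timedelta(days=30)),
        ("weekly", timedelta(weeks=1)),
        ("hourly", timedelta(hours=1)),
        ("every 5 minutes", timedelta(minutes=5)),
    ]:
        print(f"{label}: tiers met = {tiers_met(interval)}")
```

The same check can be run against the effective interval of continuous replication (its replication lag) to confirm which tiers it covers.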

9. Infrastructure resilience and redundancy

To reduce the likelihood and impact of disasters, Superdash maintains, where feasible:

Deployment across multiple availability zones, and the ability to fail over to a secondary region.
Redundant telephony connectivity, with the ability to reroute calls if a primary carrier or SIP trunk fails.
Fallback paths or alternate providers for critical AI pipeline components where practical, so that the failure of one speech or language-model provider does not halt all call handling.
Infrastructure defined as code, so environments can be rebuilt reproducibly from version control.
Monitoring and alerting on the health of all Tier 1 and Tier 2 systems.

Where full redundancy is not yet in place for a dependency, that gap is recorded as a known risk with a remediation plan.

10. Disaster recovery process

1. Detection. Monitoring, alerts, or staff reports surface a potential disruption. Any team member can raise a suspected disaster.
2. Assessment. The on-call engineer and Incident Commander assess scope, affected tiers, and likely cause, and estimate impact against the RTO/RPO targets.
3. Declaration. If the disruption exceeds normal incident thresholds, the DR Coordinator formally declares a disaster and activates this plan. Declaration criteria: a Tier 1 service is unavailable for more than 30 minutes, OR data loss is confirmed, OR a security breach is active. Any one of these triggers declaration.
4. Notification. The Communications lead notifies internal responders, affected customers, and relevant vendors, following Section 11.
5. Recovery and failover. Responders execute the recovery runbooks: fail over to redundant infrastructure or regions, reroute telephony, and restore data from backups in tier order.
6. Validation. Before declaring recovery, responders verify that restored systems are functioning correctly, data integrity is intact, and RPO targets were met.
7. Failback. Once the primary environment is confirmed stable, operations are returned to it in a controlled manner.
8. Post-incident review. Within 5 business days of resolution, the team conducts a blameless review documenting timeline, root cause, what worked, and corrective actions, and updates this plan and the runbooks accordingly.

Detailed step-by-step recovery runbooks for each Tier 1 and Tier 2 system are maintained separately and referenced by this policy.
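The declaration criteria in step 3 are a plain OR of three conditions, with a strict "more than 30 minutes" threshold for Tier 1 outages. The Python sketch below encodes them as a check that an on-call engineer or an alerting hook might run. `IncidentStatus` and `meets_declaration_criteria` are hypothetical names, and the function only flags that the criteria are met; the formal declaration remains the DR Coordinator's decision.

```python
from dataclasses import dataclass
from datetime import timedelta

# Step 3: a Tier 1 outage must last MORE than this to trigger declaration.
TIER1_OUTAGE_THRESHOLD = timedelta(minutes=30)


@dataclass
class IncidentStatus:
    tier1_outage: timedelta  # how long a Tier 1 service has been unavailable (zero if up)
    data_loss_confirmed: bool
    security_breach_active: bool


def meets_declaration_criteria(status: IncidentStatus) -> bool:
    """True if any one declaration criterion is met; each is sufficient on its own."""
    return (
        status.tier1_outage > TIER1_OUTAGE_THRESHOLD
        or status.data_loss_confirmed
        or status.security_breach_active
    )


if __name__ == "__main__":
    status = IncidentStatus(
        tier1_outage=timedelta(minutes=45),
        data_loss_confirmed=False,
        security_breach_active=False,
    )
    print(meets_declaration_criteria(status))
```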

11. Communication plan

Internal: responders coordinate through a dedicated Google Chat space; the DR Coordinator owns the single source of truth on status.
Customers: affected customers are notified promptly with what is known, expected impact, and updates at a stated cadence, via our status page and support@trysuperdash.com, which is monitored at all times. We commit to honesty over reassurance.
Vendors: affected third-party providers are engaged for support and status.
Regulators / authorities: where an incident involves a personal-data breach or other reportable event, notifications are made within the timeframes required by applicable law (including, where relevant, India’s Digital Personal Data Protection Act, 2023), coordinated with the data protection contact.
Records: all communications and decisions are logged for the post-incident review.

12. Third-party dependencies

Because our platform depends on external providers (cloud, telephony carriers, and AI pipeline services), we:

maintain an up-to-date inventory of critical vendors, their role, and their criticality tier;
review the availability commitments and DR posture of critical vendors;
identify and, where feasible, pre-arrange fallback providers or routes for the most critical dependencies; and
include recovery dependencies on these vendors in our testing.

13. Data integrity and security during recovery

Recovery must never become a route to a second incident. During any DR event:

Access to backups and recovery tooling is limited to authorized responders.
Restored systems are verified for integrity before being returned to production.
Encryption and access controls remain in force throughout; emergency access is logged and reviewed afterward.
If the disaster is, or may be, a security breach, recovery is coordinated with incident response so that compromised components are not simply restored intact.

14. Testing and maintenance

Tabletop exercises: conducted at least annually to walk responders through scenarios.
Technical drills: failover and backup-restore tests conducted at least semi-annually in a controlled environment.
Results: every test produces findings and corrective actions, tracked to completion.
Maintenance: runbooks, contact lists, vendor inventory, and recovery targets are reviewed after every drill, after every declared disaster, and on any significant architecture change.

An untested recovery procedure is treated as a known risk, not a control.

15. Plan availability

This policy and its supporting runbooks and contact lists are stored so that they remain accessible during a disaster — including in a location that does not depend on the systems they are meant to recover. Current custodian: Akhil Thomas — +91 93805 17603.

16. Training and awareness

All engineering and operations staff are familiarized with this policy and their role in it during onboarding and at each review. Named DR responders receive role-specific preparation and participate in drills.

17. Review and version control

This policy is reviewed at least annually and after any declared disaster or material change to the platform. The owner named above is responsible for maintaining it. Changes are tracked with version, date, and author.

18. Approval

Role	Name
Policy owner	Abhinav Pujahari
Executive sponsor	Akhil Thomas
