CAM System Disaster Recovery Policy and Procedures

Revised 2023

What is a disaster?

Before defining the steps to be taken, we must first define the specific scenarios we mean to address, because drastic actions should not be taken for temporary situations. This policy therefore defines three scenarios and the steps that would be taken in each: a temporary service disruption for one or more specific CAM services, corruption or substantial loss of data, and a long-term loss of regional services.

Scenario 1 – Temporary Service Disruption

Description:

This scenario is defined as one or more CAM services being unavailable, due to either a disruption to specific AWS services in the region or an issue with the CAM software itself. The expectation in this scenario is that services are unavailable either for a known amount of time, or for an unknown amount of time but with restoration expected within 8 hours. The expected time frame would be based upon communication with AWS or the development team, as applicable.

Steps Taken:

  1. Upon determination that there is a service disruption, the Litera Support team will be notified immediately. The uptime.prosperoware.com site will be checked to confirm that the disruption is reported there.

  2. Determine the full extent of the issue. The DevOps team will work to determine the full impact of the outage, specifically which services or features of CAM are impacted, and the expected time to restoration of services.

  3. Within 30 minutes of an ongoing outage, a notice will be posted in the Announcements section of the support.litera.com site, providing the details known at the time.

  4. The situation will be continuously monitored until resolved. This would include communication with AWS and developers as needed, as well as monitoring the statuses and logs of the services themselves.

  5. Once the situation is resolved, an update saying so will be posted on support.litera.com. The update will either include the identified cause, resolution, and future prevention details, or be followed by a subsequent posting with those details as they become available.
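The 30-minute notice window in step 3 can be tracked with a small helper. The following is a minimal sketch only: the function names are hypothetical, and it assumes the window is measured from the time the outage began, which this policy does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# The 30-minute window comes from step 3 above; everything else here is
# an illustrative assumption, not part of the CAM tooling.
NOTICE_WINDOW = timedelta(minutes=30)


def notice_deadline(outage_start: datetime) -> datetime:
    """Latest time by which the first support notice should be posted."""
    return outage_start + NOTICE_WINDOW


def notice_overdue(outage_start: datetime, now: datetime, notice_posted: bool) -> bool:
    """True when the window has passed and no notice has been posted yet."""
    return not notice_posted and now > notice_deadline(outage_start)


if __name__ == "__main__":
    start = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(notice_deadline(start).isoformat())
```

Using timezone-aware timestamps avoids ambiguity when the people handling the outage are spread across regions.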

Scenario 2 – Corruption or Substantial Loss of Data

Description:

In this scenario data for one or more of our tenants has either been lost or corrupted. CAM Services themselves remain unaffected but the impacted tenants may be without necessary data or function as a result of the data loss.

Steps Taken:

  1. Upon identification of the issue, the Litera Support team will be notified immediately, as will any team members directly involved with the tenants in question (e.g., Services, Customer Success).

  2. A direct communication would be drafted and sent to the registered customer contacts of the tenants involved. If the issue is systemic and extends to an entire region or multiple regions, a support notice would also be posted on support.litera.com. Customers would be advised to delay further usage of the product until a full assessment can be completed, in case full restoration is required.

  3. DevOps would work towards the full assessment of the issue, identifying the source and total impact.

  4. Upon completion of the assessment, the restoration plan would be finalized and communicated to the impacted customers. If a full restore is required, the customer would be advised to discontinue use of the product until the restore is completed. If the issue can be resolved in a more targeted fashion and is not impacting the use of the system, customers would be advised that they may continue normal operations while recovery proceeds.

  5. Data recovery would begin. A full restoration would overwrite all existing data with a recent backup copy. Targeted restoration could take a few forms but generally would involve restoring a secondary copy temporarily and merging data sets.

  6. Upon completion and verification of the restoration process, an update will be provided to the customer and relevant Litera parties. In the event of a systemic issue, an update would also be posted to the support.litera.com site. The update will either include the identified cause, resolution, and future prevention details, or be followed by a subsequent posting with those details as they become available.
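The targeted restoration described in step 5 (restoring a secondary copy and merging data sets) can be illustrated with a simple merge. This is a sketch under stated assumptions, not the actual CAM recovery procedure: it assumes records are keyed by an identifier, that the assessment in step 3 produced the set of damaged or lost keys, and that all names below are hypothetical.

```python
from typing import Dict, Set

Record = Dict[str, str]


def merge_targeted_restore(
    live: Dict[str, Record],
    restored: Dict[str, Record],
    damaged_keys: Set[str],
) -> Dict[str, Record]:
    """Return live data with damaged or lost records taken from the restored copy.

    Records not listed in damaged_keys keep their live values, so changes made
    after the backup was taken are preserved. The inputs are not modified.
    """
    merged = dict(live)
    for key in damaged_keys:
        # Only replace a record when the backup actually contains it.
        if key in restored:
            merged[key] = restored[key]
    return merged


if __name__ == "__main__":
    live = {"a": {"v": "1"}, "b": {"v": "corrupt"}}
    backup = {"a": {"v": "0"}, "b": {"v": "2"}, "c": {"v": "3"}}
    print(merge_targeted_restore(live, backup, {"b", "c"}))
```

A full restoration, by contrast, would simply replace the live data set with the backup copy, which is why customers are asked to stop using the product while it runs.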

Scenario 3 – Long-Term Loss of Regional Services

Description:

In this scenario, CAM services are unavailable in at least one region. Communication from AWS indicates that the issue is expected to extend for a significant period (over 8 hours), or no communication is provided from AWS regarding the outage for the same period. In this worst-case scenario, the steps taken here would initially match the first scenario until it has been determined to be a long-term loss situation. Once confirmed, we would resort to transferring all data and assets of the impacted regions to alternate locations temporarily.
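The distinction between the three scenarios can be summarized as a small decision function. This is an illustrative sketch: the 8-hour threshold comes from this policy, while the function, its parameters, and the choice to give service outages precedence over data loss when both occur are assumptions made for the example.

```python
from enum import Enum
from typing import Optional

LONG_TERM_THRESHOLD_HOURS = 8  # threshold stated in this policy


class Scenario(Enum):
    TEMPORARY_DISRUPTION = 1
    DATA_LOSS_OR_CORRUPTION = 2
    LONG_TERM_REGIONAL_LOSS = 3


def classify(
    services_unavailable: bool,
    data_lost_or_corrupted: bool,
    expected_outage_hours: Optional[float] = None,
    hours_without_aws_update: float = 0.0,
) -> Optional[Scenario]:
    """Map the known facts about an incident to one of the three scenarios.

    Returns None when there is neither an outage nor data loss.
    """
    if services_unavailable:
        # Long-term: AWS expects more than 8 hours, or AWS has been silent
        # about the outage for more than 8 hours.
        expected_long = (
            expected_outage_hours is not None
            and expected_outage_hours > LONG_TERM_THRESHOLD_HOURS
        )
        silent_long = hours_without_aws_update > LONG_TERM_THRESHOLD_HOURS
        if expected_long or silent_long:
            return Scenario.LONG_TERM_REGIONAL_LOSS
        return Scenario.TEMPORARY_DISRUPTION
    if data_lost_or_corrupted:
        return Scenario.DATA_LOSS_OR_CORRUPTION
    return None
```

An incident that starts as a temporary disruption would be reclassified as more information arrives, which matches the way the Scenario 3 steps initially mirror those of Scenario 1.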

Steps Taken:

  1. Upon determination that there is a service disruption, the Litera Support team will be notified immediately. The uptime.prosperoware.com site will be checked to confirm that the disruption is reported there.

  2. Determine the full extent of the issue. The DevOps team will work to determine the full impact of the outage, specifically which services or features of CAM are impacted, and the expected time to restoration of services.

  3. Within 30 minutes of an ongoing outage, a notice will be posted in the Announcements section of the support.litera.com site, providing the details known at the time.

  4. The situation will be continuously monitored. It is during this stage that we would likely identify this as a long-term loss scenario, either through direct communication from AWS or through a lack of communication from AWS. The delay in making this determination is directly related to the amount of time it would take to restore services in a different region and the cost and effort involved in ultimately restoring normal functionality.

  5. Once the outage is identified as a long-term scenario, additional updates will be posted to the support.litera.com site indicating what is known at the time.

  6. DevOps will begin the process of restoring data and infrastructure to a new region. DNS records for domains would be updated so the same addresses would still apply.

  7. Once all systems have been restored, notifications will be posted on support.litera.com.

  8. When the issue impacting the region has been resolved, a maintenance window would be established for the restoration of the systems back to the primary location. Data from the new region would need to be synced back to the original. The sync method would vary depending on the individual services, but a maintenance window would be needed to ensure that all data is synced properly with no conflicts and that DNS records have been restored to the correct targets.

  9. Once the situation is resolved, an update saying so will be posted on support.litera.com. The update will either include the identified cause, resolution, and future prevention details, or be followed by a subsequent posting with those details as they become available.
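The DNS updates in steps 6 and 8 amount to repointing an existing record at a different target and later pointing it back. The sketch below builds a change description in the shape accepted by the `ChangeBatch` parameter of the boto3 Route 53 client's `change_resource_record_sets` call. It is an assumption that DNS for these domains is hosted in Route 53 and that the records are CNAMEs; this policy does not say so, and the host names shown are placeholders.

```python
from typing import Any, Dict


def build_dns_change_batch(record_name: str, new_target: str, ttl: int = 60) -> Dict[str, Any]:
    """Describe an UPSERT that points record_name at new_target.

    UPSERT replaces the record if it exists, so the public address stays
    the same while the target behind it changes.
    """
    return {
        "Comment": f"Repoint {record_name} to {new_target}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",  # assumed record type
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_target}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder names; real record names and targets are not given in this policy.
    failover = build_dns_change_batch("cam.example.com.", "cam-dr.example.net.")
    failback = build_dns_change_batch("cam.example.com.", "cam-primary.example.net.")
    print(failover["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
    print(failback["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```

A short TTL keeps the switch to the new region, and the later switch back during the maintenance window, from being delayed by cached lookups.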

© 2024 Litera