Disaster Recovery Server in AWS

We have a 3-node Hadoop cluster and two Ruby servers running on AWS.

We need to create DR servers for these servers in AWS.

Can anyone suggest the best way to create the DR servers?

The "best way" to design and implement a disaster recovery plan will depend largely on your recovery time and recovery point objectives:

A recovery point objective, or “RPO”, is defined by business continuity planning. It is the maximum targeted period in which data might be lost from an IT service due to a major incident. The RPO gives systems designers a limit to work to. For instance, if the RPO is set to four hours, then in practice, off-site mirrored backups must be continuously maintained – a daily off-site backup on tape will not suffice. Care must be taken to avoid two common mistakes around the use and definition of RPO. Firstly, business continuity staff use business impact analysis to determine RPO for each service – RPO is not determined by the existent backup regime. Secondly, when any level of preparation of off-site data is required, rather than at the time the backups are offsited, the period during which data is lost very often starts near the time of the beginning of the work to prepare backups which are eventually offsited.
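To make the RPO point concrete, here is a rough sketch of how you might check a backup schedule against an RPO. The model is my own simplification, not something from the quoted definition: it assumes the worst-case loss window is the interval between backup starts plus the time it takes a backup to land off-site. The function names and the sample figures are made up for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, offsite_delay):
    """Worst-case window of lost data under a simple model (an assumption).

    A backup started at time t only holds data up to t, and only becomes
    usable off-site at t + offsite_delay. If the incident hits just before
    the *next* backup lands off-site, everything written since t is lost.
    """
    return backup_interval + offsite_delay


def meets_rpo(backup_interval, offsite_delay, rpo):
    """True if the schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(backup_interval, offsite_delay) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=4)  # hypothetical target agreed with the business

    # Daily backup that takes two hours to reach the off-site copy.
    print(meets_rpo(timedelta(hours=24), timedelta(hours=2), rpo))

    # Hourly snapshot replicated off-site within 30 minutes.
    print(meets_rpo(timedelta(hours=1), timedelta(minutes=30), rpo))
```

With a four-hour RPO the daily schedule fails the check and the hourly one passes, which matches the quote's remark that a daily off-site backup will not suffice.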

AWS certainly provides all of the services you need to develop a disaster recovery plan:

Businesses are using the AWS cloud to enable faster disaster recovery of their critical IT systems without incurring the infrastructure expense of a second physical site. The AWS cloud supports many popular disaster recovery (DR) architectures from “pilot light” environments that are ready to scale up at a moment’s notice to “hot standby” environments that enable rapid failover. With data centers in 12 regions around the world, AWS provides a set of cloud-based disaster recovery services that enable rapid recovery of your IT infrastructure and data.

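Start by defining these values with management, weighing the cost of maintaining disaster recovery infrastructure against the cost of downtime, then move on to analyzing the services AWS provides to implement the plan: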

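  • Place services in multiple Availability Zones
  • Back up critical data sets to S3
  • Run "warm" servers in another region for recovery
  • Write the plan down, and test it quarterly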
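As a starting point for the S3 backup item, something like the following could work. Treat it as a sketch under assumptions: the bucket name, dataset name, archive path and key layout are all hypothetical, and it assumes the third-party `boto3` package is installed with AWS credentials already configured. It uses boto3's `upload_file` call on an S3 client and nothing Hadoop-specific, so you would still need to produce the archive yourself (for example, an export of the data you cannot afford to lose).

```python
from datetime import datetime, timezone


def backup_key(dataset, when):
    """Build a date-partitioned S3 object key (layout is an assumption)."""
    return (
        f"backups/{dataset}/{when:%Y/%m/%d}/"
        f"{dataset}-{when:%Y%m%dT%H%M%SZ}.tar.gz"
    )


def upload_backup(archive_path, bucket, dataset, when=None):
    """Upload a local archive to S3 and return the key it was stored under."""
    import boto3  # third-party; imported here so backup_key works without it

    when = when or datetime.now(timezone.utc)
    key = backup_key(dataset, when)
    boto3.client("s3").upload_file(archive_path, bucket, key)
    return key


if __name__ == "__main__":
    # Only shows the key that would be used; no AWS call is made here.
    print(backup_key("hadoop-export", datetime.now(timezone.utc)))
    # Hypothetical invocation (bucket and path are placeholders):
    # upload_backup("/tmp/hadoop-export.tar.gz", "my-dr-backups", "hadoop-export")
```

Pointing the upload at a bucket in a different region from your servers keeps the copy available if the primary region has problems, and running it on a schedule shorter than your RPO is what makes the backup meaningful.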