

Disaster Recovery Scenarios with AWS



Enterprises have different levels of tolerance for business interruptions and therefore a wide variety of disaster recovery preferences, ranging from solutions that tolerate a few hours of downtime to seamless failover. A smart DR plan offers several strategies that meet the recovery needs of most enterprises by using combinations of AWS services.


Method          RTO                             Cost
Cold            Low (RTO >= 1 business day)     Lowest
Pilot Light     Moderate (RTO < 4 hours)        Moderate
Warm Standby    Aggressive (RTO < 1 hour)       High
Multi-Site      Near zero (no interruptions)    Highest

AWS enables you to cost-effectively operate each of these DR strategies. It’s important to note that these are just examples of possible approaches, and variations and combinations of these are possible. If your application is already running on AWS, then multiple regions can be employed and the same DR strategies will still apply.

Enterprise-level disaster recovery is primarily measured in terms of Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is a measure of the maximum amount of time within which operations are expected to be resumed after a disaster. RPO is a measure, in terms of time, of the maximum amount of data that can be lost as a result of a disaster.
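As a rough illustration, the strategy table above can be read as a lookup from the RTO your business requires to the cheapest strategy that satisfies it. The thresholds in this sketch are the example values from the table, not AWS-prescribed limits:

```python
# Illustrative mapping of a required RTO (in hours) to the DR strategies
# from the table above. The thresholds are the example figures given
# there, not hard AWS limits.

def choose_dr_strategy(required_rto_hours: float) -> str:
    """Return the cheapest strategy whose example RTO meets the requirement."""
    if required_rto_hours >= 24:      # can tolerate >= 1 business day
        return "Cold (Backup and Restore)"
    if required_rto_hours >= 4:       # Pilot Light targets RTO < 4 hours
        return "Pilot Light"
    if required_rto_hours >= 1:       # Warm Standby targets RTO < 1 hour
        return "Warm Standby"
    return "Multi-Site"               # near-zero interruption

print(choose_dr_strategy(48))   # Cold (Backup and Restore)
print(choose_dr_strategy(6))    # Pilot Light
print(choose_dr_strategy(2))    # Warm Standby
print(choose_dr_strategy(0.1))  # Multi-Site
```

In practice the choice also weighs RPO and cost, but the RTO ladder above is the usual starting point.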

Cold / Backup and Restore

The Backup and Restore scenario is an entry-level form of disaster recovery on AWS, and a natural starting point if you don’t yet have a DR plan. In most traditional environments, data is backed up to non-volatile media such as tape and sent off-site regularly. If you use this method, it can take a long time to restore your system in the event of a disruption or disaster.

Amazon S3 is an ideal destination for backup data that might be needed quickly to perform a restore. Transferring data to and from Amazon S3 is typically done through the network, so it is accessible from any location, and there are many commercial and open-source backup solutions that integrate with Amazon S3. You can use AWS Import/Export to transfer very large data sets by shipping storage devices directly to AWS. For longer-term data storage where retrieval times of several hours are adequate, there is Amazon Glacier, which has the same durability model as Amazon S3 and is a low-cost alternative starting from $0.01/GB per month. Amazon Glacier and Amazon S3 can be used in conjunction to produce a tiered backup solution.

AWS Storage Gateway enables snapshots of your on-premises data volumes to be transparently copied into Amazon S3 for backup. You can subsequently create local volumes or Amazon EBS volumes from these snapshots. Gateway-cached volumes allow you to store your primary data in Amazon S3 but keep your frequently accessed data local for low-latency access; here too, you can snapshot the data volumes to obtain highly durable backups. In the event of DR, you can restore the cached volumes either to a second site running a storage gateway or to Amazon EC2.

For systems already running on AWS, you can also back up into Amazon S3: snapshots of Amazon EBS volumes, Amazon RDS databases, and Amazon Redshift data warehouses can all be stored there.
Alternatively, you can copy files directly into Amazon S3, or you can choose to create backup files and copy those to Amazon S3. There are many backup solutions that store data directly in Amazon S3, and these can be used from Amazon EC2 systems as well.
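The tiered S3/Glacier approach described above is typically implemented as an S3 bucket lifecycle configuration. A minimal sketch, in which the rule ID, key prefix, transition age, and expiration window are all illustrative choices rather than recommendations:

```json
{
  "Rules": [
    {
      "ID": "tiered-backup",
      "Filter": { "Prefix": "backups/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

A configuration like this could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket your-backup-bucket --lifecycle-configuration file://lifecycle.json`, after which objects under `backups/` would move to Glacier after 30 days and be deleted after a year.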

The following figure shows data backup options to Amazon S3, from either on-site infrastructure or from AWS.

Cold method

Of course, the backup of your data is only half of the story. If disaster strikes, you’ll need to recover your data quickly and reliably. You should ensure that your systems are configured to retain and secure your data, and you should test your data recovery processes. 

Key steps for backup and restore method:

  • Select an appropriate tool or method to back up your data into AWS.
  • Ensure that you have an appropriate retention policy for this data.
  • Ensure that appropriate security measures are in place for this data, including encryption and access policies.
  • Regularly test the recovery of this data and the restoration of your system.

The Backup and Restore plan is suitable for lower-tier business-critical applications. It is also an extremely cost-effective scenario, and the one most often used when we simply need backup storage. If we use a compression and de-duplication tool, we can further decrease our expenses here. For this scenario, the RTO will be as long as it takes to bring up infrastructure and restore the system from backups, and the RPO will be the time since the last backup.
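Those two objectives reduce to simple arithmetic. The following sketch (with made-up timestamps and durations) shows how the RPO falls out of the backup schedule and the RTO out of the provision-plus-restore time:

```python
# A rough sketch of backup-and-restore recovery objectives: RPO is the
# time elapsed since the last completed backup, and RTO is dominated by
# the time to provision infrastructure plus the time to restore data.
# All timestamps and durations below are invented for illustration.
from datetime import datetime, timedelta

def backup_restore_objectives(last_backup: datetime,
                              disaster: datetime,
                              provision_time: timedelta,
                              restore_time: timedelta):
    rpo = disaster - last_backup          # data lost since last backup
    rto = provision_time + restore_time   # time until service resumes
    return rpo, rto

rpo, rto = backup_restore_objectives(
    last_backup=datetime(2017, 5, 1, 0, 0),   # nightly backup at midnight
    disaster=datetime(2017, 5, 1, 9, 30),     # failure at 09:30
    provision_time=timedelta(hours=2),
    restore_time=timedelta(hours=6),
)
print(rpo)  # 9:30:00 -> nine and a half hours of data lost
print(rto)  # 8:00:00 -> eight hours until service resumes
```

Shrinking the RPO means backing up more often; shrinking the RTO means moving toward the warmer strategies described next.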


Pilot Light

The term “Pilot Light” is often used to describe a DR scenario in which a minimal version of an environment is always running in the cloud. This scenario is similar to a Backup and Restore scenario. For example, with AWS you can maintain a pilot light by configuring and running the most critical core elements of your system in AWS. When the time comes for recovery, you can rapidly provision a full-scale production environment around this critical core. Infrastructure elements for the pilot light itself typically include your database servers, which would be replicating data to Amazon EC2 or Amazon RDS. Depending on the system, there might be other critical data outside of the database that needs to be replicated to AWS. This is the critical core of the system (the pilot light) around which all other infrastructure pieces in AWS can quickly be provisioned to restore the complete system.

To provision the remainder of the infrastructure to restore business-critical services, you would typically have some preconfigured servers bundled as Amazon Machine Images (AMIs), which are ready to be started up at a moment’s notice. When starting recovery, instances from these AMIs come up quickly with their pre-defined role (for example, Web or App Server) within the deployment around the pilot light. From a networking point of view, you have two main options for provisioning:

  • Use Elastic IP addresses, which can be pre-allocated and identified in the preparation phase for DR, and associate them with your instances. Note that for MAC address-based software licensing, you can use elastic network interfaces (ENIs), which have a MAC address that can also be pre-allocated to provision licenses against. You can associate these with your instances, just as you would with Elastic IP addresses.
  • Use Elastic Load Balancing (ELB) to distribute traffic to multiple instances. You would then update your DNS records to point at your Amazon EC2 instance or point to your load balancer using a CNAME. We recommend this option for traditional web-based applications.

For less critical systems, you can ensure that you have any installation packages and configuration information available in AWS, for example, in the form of an Amazon EBS snapshot. This will speed up the application server setup, because you can quickly create multiple volumes in multiple Availability Zones to attach to Amazon EC2 instances. You can then install and configure accordingly, for example, by using the backup-and-restore method. The pilot light method gives you a quicker recovery time than the backup-and-restore method because the core pieces of the system are already running and are continually kept up to date. AWS enables you to automate the provisioning and configuration of the infrastructure resources, which can be a significant benefit to save time and help protect against human errors. However, you will still need to perform some installation and configuration tasks to recover the applications fully.

Preparation phase

The following figure shows the preparation phase, in which you need to have your regularly changing data replicated to the pilot light, the small core around which the full environment will be started in the recovery phase. Your less frequently updated data, such as operating systems and applications, can be periodically updated and stored as AMIs.

Light method1

Key steps for preparation:

  1. Set up Amazon EC2 instances to replicate or mirror data.
  2. Ensure that you have all supporting custom software packages available in AWS.
  3. Create and maintain AMIs of key servers where fast recovery is required.
  4. Regularly run these servers, test them, and apply any software updates and configuration changes.
  5. Consider automating the provisioning of AWS resources.

Recovery phase

To recover the remainder of the environment around the pilot light, you can start your systems from the AMIs within minutes on the appropriate instance types. For your dynamic data servers, you can resize them to handle production volumes as needed, or add capacity accordingly. Horizontal scaling is often the most cost-effective and scalable approach to add capacity to a system. For example, you can add more web servers at peak times. However, you can also choose larger Amazon EC2 instance types, and thus scale vertically for more resource-intensive applications. From a networking perspective, any required DNS updates can be done in parallel.

After recovery, you should ensure that redundancy is restored as quickly as possible. A failure of your DR environment shortly after your production environment fails is unlikely, but you should be aware of this risk. Continue to take regular backups of your system, and consider additional redundancy at the data layer. The following figure shows the recovery phase of the pilot light scenario.

Light method2

Key steps for recovery:

  1. Start your application Amazon EC2 instances from your custom AMIs.
  2. Resize existing database/data store instances to process the increased traffic.
  3. Add additional database/data store instances to give the DR site resilience in the data tier; if you are using Amazon RDS, turn on Multi-AZ to improve resilience.
  4. Change DNS to point at the Amazon EC2 servers.
  5. Install and configure any non-AMI based systems, ideally in an automated way.

Warm Standby

A Warm Standby scenario is an expansion of the Pilot Light scenario in which some services are always up and running. Disaster recovery in a warm configuration gives customers a near-zero-downtime solution, with close to a 100% uptime SLA. As we develop a DR plan, we need to identify the crucial parts of our on-premise infrastructure and then duplicate them in AWS. In most cases, we’re talking about web and app servers running on a minimum-sized fleet.

By identifying your business-critical systems, you can fully duplicate these systems on AWS and have them always on. These servers can run as a minimum-sized fleet of Amazon EC2 instances of the smallest sizes possible. This solution is not scaled to take a full production load, but it is fully functional. It can be used for non-production work, such as testing, quality assurance, and internal use. Once a disaster occurs, the infrastructure on AWS takes over the traffic, scaling up and converting into a fully functional production environment with minimal RPO and RTO. In AWS, this can be done by adding more instances to the load balancer and by resizing the small-capacity servers to run on larger Amazon EC2 instance types. As stated in the preceding section, horizontal scaling is preferred over vertical scaling.

Preparation phase

The following figure shows the preparation phase for a warm standby solution, in which an on-site solution and an AWS solution run side-by-side.

Warm Standby method1

Key steps for preparation:

  1. Set up Amazon EC2 instances to replicate or mirror data.
  2. Create and maintain AMIs.
  3. Run your application using a minimal footprint of Amazon EC2 instances or AWS infrastructure.
  4. Patch and update software and configuration files in line with your live environment.

Recovery phase

In the case of failure of the production system, the standby environment will be scaled up for production load, and DNS records will be changed to route all traffic to AWS.

Warm Standby method2

Key steps for recovery:

  1. Increase the size of the Amazon EC2 fleets in service with the load balancer (horizontal scaling).
  2. Start applications on larger Amazon EC2 instance types as needed (vertical scaling).
  3. Either manually change the DNS records, or use Amazon Route 53 automated health checks so that all traffic is routed to the AWS environment.
  4. Consider using Auto Scaling to right-size the fleet or accommodate the increased load.
  5. Add resilience or scale up your database.
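As a back-of-the-envelope illustration of step 1 (horizontal scaling), the following sketch estimates how far the minimal warm standby fleet must grow to absorb production traffic. The request rates, per-instance capacity, and headroom figure are invented numbers, not AWS guidance:

```python
import math

# Back-of-the-envelope horizontal scaling for the recovery phase: how many
# instances the standby fleet must grow to in order to absorb production
# traffic, with some spare headroom. All numbers are illustrative.

def instances_needed(production_rps: float, rps_per_instance: float,
                     headroom: float = 0.2) -> int:
    """Instances required to serve production load plus headroom."""
    return math.ceil(production_rps * (1 + headroom) / rps_per_instance)

standby_fleet = 2                                   # minimal warm footprint
target_fleet = instances_needed(production_rps=900, rps_per_instance=150)
print(f"scale fleet from {standby_fleet} to {target_fleet} instances")
# -> scale fleet from 2 to 8 instances
```

With Auto Scaling (step 4), a calculation like this becomes a scaling policy rather than a manual step.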


Multi-Site

The Multi-Site scenario is a solution for an infrastructure that runs completely on AWS as well as in an “on-premise” data center. The data replication method that you employ will be determined by the recovery point that you choose. In addition to recovery point options, there are various replication methods, such as synchronous and asynchronous replication.

You can use a DNS service that supports weighted routing, such as Amazon Route 53, to route production traffic to different sites that deliver the same application or service. A proportion of traffic will go to your infrastructure in AWS, and the remainder will go to your on-site infrastructure. In an on-site disaster situation, you can adjust the DNS weighting and send all traffic to the AWS servers. The capacity of the AWS service can be rapidly increased to handle the full production load. You can use Amazon EC2 Auto Scaling to automate this process. You might need some application logic to detect the failure of the primary database services and cut over to the parallel database services running in AWS. The cost of this scenario is determined by how much production traffic is handled by AWS during normal operation. In the recovery phase, you pay only for what you use for the duration that the DR environment is required at full scale. You can further reduce cost by purchasing Amazon EC2 Reserved Instances for your “always on” AWS servers.
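The weighted routing behaviour described above can be modelled in a few lines. This toy simulation is only a sketch: the site names and weights are illustrative, and real Route 53 weighting is applied per DNS query by the service itself:

```python
import random

# A toy model of weighted DNS routing (in the style of Route 53): each
# resolver query lands on a site with probability proportional to its
# weight. Setting the on-site weight to 0 models the DR cutover to AWS.

def route(weights: dict, rng: random.Random) -> str:
    sites, w = zip(*weights.items())
    return rng.choices(sites, weights=w, k=1)[0]

rng = random.Random(42)
normal = {"on-site": 80, "aws": 20}       # normal operation: 80/20 split
failover = {"on-site": 0, "aws": 100}     # disaster: all traffic to AWS

sample = [route(normal, rng) for _ in range(1000)]
print(sample.count("aws"))                # roughly 200 of 1000 queries

# after the cutover, every query resolves to AWS
assert all(route(failover, rng) == "aws" for _ in range(100))
```

In production the weight change is a Route 53 record update (optionally driven by health checks), and the AWS fleet is scaled out at the same time.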

Preparation phase

The following figure shows how you can use the weighted routing policy of the Amazon Route 53 DNS service to route a portion of your traffic to the AWS site. The application on AWS might access data sources in the on-site production system. Data is replicated or mirrored to the AWS infrastructure.

Multi site method1

Key steps for preparation:

  1. Set up your AWS environment to duplicate your production environment.
  2. Set up DNS weighting, or similar traffic routing technology, to distribute incoming requests to both sites.
  3. Configure automated failover to re-route traffic away from the affected site.

Recovery phase

The following figure shows the change in traffic routing in the event of an on-site disaster. Traffic is cut over to the AWS infrastructure by updating DNS, and all traffic and supporting data queries are supported by the AWS infrastructure.

Multi site method2

Key steps for recovery:

  1. Either manually or by using DNS failover, change the DNS weighting so that all requests are sent to the AWS site.
  2. Have application logic for failover to use the local AWS database servers for all queries.
  3. Consider using Auto Scaling to automatically right-size the AWS fleet. You can further increase the availability of your multi-site solution by designing Multi-AZ architectures.

AWS Production to an AWS DR Solution Using Multiple AWS Regions

Applications deployed on AWS have multi-site capability by means of multiple Availability Zones. Availability Zones are distinct locations that are engineered to be insulated from each other. They provide inexpensive, low-latency network connectivity within the same region. Some applications might have an additional requirement to deploy their components using multiple regions; this can be a business or regulatory requirement. Any of the preceding scenarios in this article can be deployed using separate AWS regions.

The advantages for both production and DR scenarios include the following:

  • You don’t need to negotiate contracts with another provider in another region.
  • You can use the same underlying AWS technologies across regions.
  • You can use the same tools or APIs.


AWS disaster recovery plan


First of all, what is Disaster Recovery?

Disaster recovery (DR) is about preparing for and recovering from a disaster. Any event that has a negative impact on a company’s business continuity or finances could be termed a disaster. This includes hardware or software failure, a network outage, a power outage, physical damage to a building like fire or flooding, human error, or some other significant event.

In any case, it is crucial to have a tested disaster recovery plan ready. A disaster recovery plan will ensure that our application stays online no matter the circumstances. Ideally, it ensures that users will experience zero, or at worst, minimal issues while using your application.

Let’s take a closer look at some of the important terminology associated with disaster recovery:

Business Continuity. All of our applications require Business Continuity. Business Continuity ensures that an organization’s critical business functions continue to operate or recover quickly despite serious incidents.

Recovery time objective (RTO) — The time it takes after a disruption to restore a business process to its service level, as defined by the operational level agreement (OLA). For example, if a disaster occurs at 12:00 PM (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 PM.

Recovery point objective (RPO) — The acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 PM (noon) and the RPO is one hour, the system should recover all data that was in the system before 11:00 AM. Data loss will span only one hour, between 11:00 AM and 12:00 PM (noon).
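The two worked examples above reduce to simple date arithmetic:

```python
from datetime import datetime, timedelta

# The RTO and RPO examples above, restated as arithmetic: RTO sets the
# deadline for restoring the business process, RPO sets the oldest point
# in time to which recovered data may reach back.

disaster = datetime(2017, 5, 1, 12, 0)                # noon

rto = timedelta(hours=8)
restore_deadline = disaster + rto
print(restore_deadline.strftime("%I:%M %p"))          # 08:00 PM

rpo = timedelta(hours=1)
oldest_acceptable_backup = disaster - rpo
print(oldest_acceptable_backup.strftime("%I:%M %p"))  # 11:00 AM
```

Everything after `oldest_acceptable_backup` is, by definition, acceptable data loss; everything before `restore_deadline` is acceptable downtime.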

Traditional Disaster Recovery plan (on-premise)

A traditional on-premise disaster recovery plan often includes a fully duplicated infrastructure that is physically separate from the infrastructure hosting production. In this case, an additional financial investment is required to cover hardware expenses as well as maintenance and testing. When it comes to on-premise data centers, physical access to the infrastructure is also often overlooked.

These are the security requirements for an on-premise data center disaster recovery infrastructure:

  • Facilities to house the infrastructure, including power and cooling.
  • Security to ensure the physical protection of assets.
  • Suitable capacity to scale the environment.
  • Support for repairing, replacing, and refreshing the infrastructure.
  • Contractual agreements with an internet service provider (ISP) to provide internet connectivity that can sustain bandwidth utilization for the environment under a full load.
  • Network infrastructure such as firewalls, routers, switches, and load balancers.
  • Enough server capacity to run all mission-critical services. This includes storage appliances for the supporting data, and servers to run applications and backend services such as user authentication, Domain Name System (DNS), Dynamic Host Configuration Protocol (DHCP), monitoring, and alerting.

Obviously, this kind of disaster recovery plan requires large investments in building disaster recovery sites or data centers (CAPEX). In addition, storage, backup, archival and retrieval tools, and processes (OPEX) are also expensive. And, all of these processes, especially installing new equipment, take time.

An on-premise disaster recovery plan can be challenging to document, test, and verify, especially if you have multiple clients on a single infrastructure. In this scenario, all clients on this infrastructure will experience problems with performance even if only one client’s data is corrupted.

Disaster Recovery plan on AWS

There are many advantages to implementing a disaster recovery plan on AWS. Financially, we only need to invest a small amount in advance (CAPEX), and we won’t have to worry about the physical expenses for resources (for example, hardware delivery) that we would have in an “on-premise” data center.

AWS enables high flexibility, as we don’t need to perform a failover of the entire site in case only one part of our application isn’t working properly. Scaling is fast and easy. Most importantly, AWS allows a “pay as you use” (OPEX) model, so we don’t have to spend a lot in advance. Also, AWS services allow us to fully automate our disaster recovery plan. This results in much easier testing, maintenance, and documentation of the DR plan itself.

This table shows the AWS service equivalents to an infrastructure inside an on-premise data center.

On-premise data center infrastructure    AWS infrastructure
DNS                                      Route 53
Load balancers                           ELB / appliance
Web/app servers                          EC2 / Auto Scaling
Database servers                         RDS
AD/authentication                        AD failover nodes
Data centers                             Availability Zones
Disaster recovery                        Multi-region

Now that you have seen the differences between DR on-premise and DR on AWS, let’s point out some tips that you should take into consideration when you develop your DR plan:

1. Backups Do Not Equal DR

Disaster Recovery is not only doing backups, but rather, it is the process, policies, and procedures that you put in place to prepare for recovery or business continuity in the event of a crisis. In other words, simply backing up your data won’t be of much help unless you have a process in place to quickly retrieve and put it to use.

2. Prioritize: Downtime Costs Vs. Backup/Recovery Costs

As with any successful plan, an AWS disaster recovery strategy must be tailored to meet your company’s specific needs. As such, choices will have to be made between the amount of money spent on backup and restoration of data versus the amount of money that might be lost during downtime. If your company can withstand a lengthy outage without hemorrhaging cash, a slower, less expensive backup and recovery option might make sense. But if you run a business that cannot afford even the slightest amount of downtime, then more expensive methods, such as an AWS-based duplicate production environment, might be required.

3.  Determine Your RTO/RPO

A company typically decides on an acceptable RTO and RPO based on the financial impact to the business when systems are unavailable. The company determines financial impact by considering many factors, such as the loss of business and damage to its reputation due to downtime and the lack of systems availability. IT organizations then plan solutions to provide cost-effective system recovery based on the RPO within the timeline and the service level established by the RTO.

4. Choose The Right Backup Strategy

As mentioned above, regular backups are only one part of an effective AWS disaster recovery plan. Nonetheless, they are an extremely important component. That’s why choosing the right backup and recovery plan for your business is vital. Even though you’ve already settled on a cloud-based solution, you will have to choose between various backup options, such as using Amazon Machine Images (AMIs) or EBS snapshots.

5. Identify Mission-Critical Applications And Know Your AWS DR Options

After determining your company’s RTO, RPO, and preferred backup strategy, it’s time to choose which type of AWS disaster recovery method is right for you. And depending on which option you ultimately choose, it may also be necessary to identify and prioritize mission-critical applications. Some of the most common methods include:

  • Backup and Restore: a simple, cost-effective method that utilizes services such as Amazon S3 to backup and restore data.
  • Pilot Light: This method keeps critical applications and data at the ready so that it can be quickly fired up should disaster strike.
  • Warm Standby: This method keeps a duplicate version of your business’ core elements running at all times, resulting in a nearly seamless transition with very little downtime.
  • Multi-Site Solution: Also known as a Hot Standby, this configuration leaves almost nothing to chance by fully replicating your data/applications between two or more active locations and splitting traffic/usage between them. In the event of a disaster, traffic is simply routed to the unaffected location, resulting in no downtime.

6. Implement Cross-Region Backups

As with traditional methods of backup and recovery, geographic diversification of your data is essential for your AWS disaster recovery plan. If a natural disaster or man-made catastrophe brings down your primary production environment, having a backup stored in the same building, or even the same region, makes little sense. Luckily, the global reach of AWS makes geographic diversification very easy to implement. If your primary AWS services are knocked offline, you can rest assured that your DR plan can be implemented using backup data that’s been safely stored a world away (literally).

7. Test And Retest Your Plan

Sometimes even the best-laid plans go awfully wrong. Even the most detail-oriented AWS disaster recovery plan has the potential to fail when put into actual practice. That’s why it’s important to constantly test and retest your plan for flaws. And thanks to AWS’ ability to create a duplicate environment, you can test your plan using real-world scenarios without jeopardizing your actual production environment.


Migrate Your Own VMs into AWS Cloud


Recently we received a task to move our virtual infrastructure into the AWS cloud. During this process I faced a couple of challenges, and I thought I’d share them with you.
First, let’s start with the prerequisites and limitations:

  • Operating systems that can be imported into EC2 (Windows): Windows Server 2012 R2 (Standard), Windows Server 2012 (Standard, Datacenter), Windows Server 2008 R2 (Standard, Datacenter, Enterprise), Windows Server 2008 (Standard, Datacenter, Enterprise), Windows Server 2003 R2 (Standard, Datacenter, Enterprise), Windows Server 2003 (Standard, Datacenter, Enterprise) with Service Pack 1 (SP1) or later.
  • Operating systems that can be imported into EC2 (Linux/Unix, 64-bit): Red Hat Enterprise Linux (RHEL) 5.1-5.10 and 6.1-6.5, CentOS 5.1-5.10 and 6.1-6.5, Ubuntu 12.04, 12.10, 13.04, and 13.10, Debian 6.0.0-6.0.8 and 7.0.0-7.2.0 (RHEL 6.0 is unsupported because it lacks the drivers required to run on Amazon EC2).
  • Supported image formats: RAW, VHD, and VMDK (you can only import VMDK files into Amazon EC2 that were created through the OVF export process in VMware).
  • Define an S3 bucket (in a region close to you, to speed up uploads). This will be used to upload the images for conversion to AMIs.
  • Define roles and policies in AWS. In particular:
  • vmimport service role and a policy attached to it, precisely as explained in this AWS doc.
  • If you’re logged on as an AWS Identity and Access Management (IAM) user, you’ll need the following permissions in your IAM policy to use VM Import/Export (the action lists below follow the policy given in the AWS VM Import/Export documentation):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListAllMyBuckets"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetBucketLocation", "s3:GetObject", "s3:PutObject"],
          "Resource": ["arn:aws:s3:::exported-vm", "arn:aws:s3:::exported-vm/*"]
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:CancelConversionTask", "ec2:CancelExportTask",
            "ec2:CreateImage", "ec2:CreateInstanceExportTask",
            "ec2:CreateTags", "ec2:DeleteTags",
            "ec2:DescribeConversionTasks", "ec2:DescribeExportTasks",
            "ec2:DescribeInstanceAttribute", "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances", "ec2:DescribeTags",
            "ec2:ImportInstance", "ec2:ImportVolume",
            "ec2:StartInstances", "ec2:StopInstances",
            "ec2:TerminateInstances"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:ImportImage", "ec2:ImportSnapshot",
            "ec2:DescribeImportImageTasks", "ec2:DescribeImportSnapshotTasks",
            "ec2:CancelImportTask"
          ],
          "Resource": "*"
        }
      ]
    }
  • Fast upstream bandwidth, as you will be uploading the image to S3!
  • Disk image cannot exceed 1 TB.
  • Make sure that you have at least 250 MB of available disk space for installing drivers and other software.
  • Multiple network interfaces are not currently supported.
  • IPv6 is not supported.
  • To use your own Microsoft licenses, set LicenseType to BYOL; your BYOL instances will be priced at the prevailing AWS EC2 Linux instance pricing, provided that you run them on a Dedicated Instance.

In order to initiate and manage the migration (import), you’ll need to install the AWS CLI tools on the machine where the source images reside. You can refer to the AWS documentation for installing the CLI tools.
Migrating virtual machines: prepare your VM

  • Uninstall the VMWare Tools from your VMWare VM.
  • Disconnect any CD-ROM drives (virtual or physical).
  • Set your network to DHCP instead of a static IP address. If you want to assign a static private IP address, be sure to use a non-reserved private IP address in your VPC subnet.
  • Shut down your VM before exporting it.
  • On Windows, enable Remote Desktop (RDP) for remote access, and on Linux enable SSH server access.
  • Allow RDP and SSH access through your host firewall if you have one.
  • Use secure passwords for all your user accounts, and disable auto-logon on your Windows VM.
  • Make sure that your Linux VM uses GRUB (GRUB legacy) or GRUB 2 as its boot loader.
  • Make sure that your Linux VM uses one of the following root file systems: EXT2, EXT3, EXT4, Btrfs, JFS, or XFS.
  • Export your VM from its virtual environment (VMware or Microsoft Hyper-V).

Okay, now we are going to import our OVA files using the AWS CLI. I’ve already created an S3 bucket named “exported-vm” in my AWS account and uploaded the OVA file to it:

aws s3 mb s3://exported-vm --region us-west-1
aws s3 cp c:\myfolder\RHEL_6.5.ova s3://exported-vm/RHEL_6.5.ova

Let’s open a terminal/cmd console (or whatever console you use with the AWS CLI) and type the following command to import the OVA file and convert it into an AMI image.
Here’s my example of the command:

aws ec2 import-image --cli-input-json "{ \"Description\": \"RHEL OVA\", \"DiskContainers\": [ { \"Description\": \"First CLI task\", \"UserBucket\": { \"S3Bucket\": \"exported-vm\", \"S3Key\" : \"RHEL_6.5.ova\" } } ]}"

Example response:

    {
        "Status": "active",
        "Description": "RHEL OVA",
        "Progress": "2",
        "SnapshotDetails": [
            {
                "UserBucket": {
                    "S3Bucket": "exported-vm",
                    "S3Key": "RHEL_6.5.ova"
                },
                "DiskImageSize": 0.0
            }
        ],
        "StatusMessage": "pending",
        "ImportTaskId": "import-ami-ffqfkywt"
    }

Run the aws ec2 describe-import-image-tasks command to check status of importing:

aws ec2 describe-import-image-tasks --import-task-ids import-ami-ffqfkywt

Example response:

    {
        "ImportImageTasks": [
            {
                "Status": "active",
                "Description": "RHEL OVA",
                "Progress": "28",
                "SnapshotDetails": [
                    {
                        "UserBucket": {
                            "S3Bucket": "exported-vm",
                            "S3Key": "RHEL_6.5.ova"
                        },
                        "DiskImageSize": 15508464640.0,
                        "Format": "VMDK"
                    }
                ],
                "StatusMessage": "converting",
                "ImportTaskId": "import-ami-ffqfkywt"
            }
        ]
    }

After the conversion completes, you can find the new AMI in the AWS Console under EC2 / AMIs and use it to create new EC2 instances. Note the instance ID from the VM import status, right-click the instance, select Instance State, and then click Start.
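If you script the polling step, the describe-import-image-tasks output can be parsed like any other JSON. A minimal sketch follows; the field names match the example responses above, but the helper function itself is ours:

```python
import json

# Parse the JSON that `aws ec2 describe-import-image-tasks` prints and
# pull out the task status and progress percentage. Completed tasks may
# omit "Progress", so we default it to 100.

def import_progress(response_json: str):
    task = json.loads(response_json)["ImportImageTasks"][0]
    return task["Status"], int(task.get("Progress", "100"))

# Trimmed-down copy of the example response shown above.
example = """{
  "ImportImageTasks": [
    {"Status": "active", "Progress": "28",
     "StatusMessage": "converting",
     "ImportTaskId": "import-ami-ffqfkywt"}
  ]
}"""
status, pct = import_progress(example)
print(status, pct)  # active 28
```

A small loop around this helper (sleep, re-run the CLI command, re-parse) is enough to wait for the conversion to finish instead of polling by hand.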







How to Backup And Recover Your AWS EC2 Windows Instances


If you’re in the situation of using Windows instances on AWS than you need some sure means of backing up and recovering properly the instances when needed. Even if the AWS documentation is straight forwards on how to backup/recover an instance if you are a Windows user, however, the EC2 instance recovery process may require a bit more effort.
AWS offers two back-up methods. You can either create images or you can take snapshots of volumes. We have come to the conclusion that snapshots are more suitable for cloud backup since you can ensure their consistency. With Linux, you can take a snapshot of each volume of an instance, then create new AMIs from those snapshots. As a result, instances can be launched from the newly created AMIs but if you have a software licensed linked to the MAC address than you need to come up with another method of recovering your AWS instances. AWS doesn’t allow changing the MAC address of your instances.
Below, I will share three ways Windows EC2 instances can be backed up and recovered using AWS. At the end of the day, it is up to you to decide which approach best suits your needs. One thing to keep in mind with all of the approaches below is that nothing is perfect: it is always best to test a procedure before incorporating it into your backup/recovery process.

Recover from an AMI – If you don’t want to bother with snapshots, you can keep back-up and recovery simple by creating an AMI of your entire instance. This can be done as often as you like, be it once a day, once a week, or at any other frequency.
The downside to this approach is that you cannot ensure consistency unless you reboot the instance. Because the Windows instance needs to reboot in order to create a consistent AMI, attempting to do this frequently will result in significant downtime. For the most part, this can be done once a week, generally on weekends; many production systems, however, find this approach unacceptable, especially if done frequently. Conversely, if you decide not to reboot the instance, there is no way of knowing whether the AMI is consistent. The only way to know for sure that the image is consistent is when Windows shuts down properly.

Recover and Attach – Create an AMI of your instance every once in a while (e.g. once a week), including the root device, which is the disk of the Windows C: drive. In most cases, an application’s data is stored on other instance volumes (not the root device) and is backed up separately. Compared to the data on those volumes, data on the C: drive does not change very frequently. By using your AMI as a foundation, you can launch a new instance and attach the most recently updated data volumes. This is generally a successful approach if changes to the operating system are infrequent; even if Windows had performed an update, it will simply update itself again when the new instance is started. Nevertheless, be sure to test the approach a few times to make sure you end up with a working server. By providing an AMI to start an instance and choosing the data you need from the backup, the instance will be launched with all of its data volumes in a single mouse click.

Recover the Root Device – After launching an instance from an AMI, you can then stop the instance and switch some or all of the volumes, including the root device. That way, you can ensure that you have a new instance with the most recent copy of the C: drive.
There are two ways you can go about carrying out this approach:

Start an Official AWS AMI: For example, if you have a Windows 2012 server, you would use an official AWS AMI, launch an instance, stop it, and switch all of the volumes. While this works in most cases, sometimes everything looks good to go, yet when you start the instance again it doesn’t work. Therefore, you must be very careful about which AMI you use for this approach.

Take snapshots of all the volumes you are interested in, create new volumes of the same type and size, and use the existing EC2 instance that you want to recover (or launch a new instance from the existing one); at the end, switch the volumes with the newly created volumes, respecting the device naming convention (/dev/sda1 or /dev/xvdb).
After this you should have a recovered EC2 instance. The advantage of this method is that it lets you keep the same MAC address for the instance you need to recover.
This is the most powerful approach in terms of the final outcome. However, while it provides the exact server from the most recent back-up, it is a bit more complicated, and again, needs to be tested.
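The volume switch itself is easy to get wrong if a device name is missed, so it can help to compute the full switch plan before detaching anything. A small illustrative sketch (the function name and volume IDs are hypothetical; the actual switching is done via detach/attach in the console or CLI):

```python
def build_switch_plan(current, replacements):
    """Map each device name (e.g. /dev/sda1, /dev/xvdb) on the instance to
    the (old volume, new volume) pair that should be switched.

    `current` maps device name -> currently attached volume id;
    `replacements` maps device name -> volume id created from the snapshot.
    Raises before anything is touched if a device has no replacement,
    so the instance is never left half-switched.
    """
    plan = {}
    for device, old_volume in current.items():
        if device not in replacements:
            raise ValueError(f"no replacement volume for {device}")
        plan[device] = (old_volume, replacements[device])
    return plan

# Hypothetical volume ids, respecting the device naming convention
plan = build_switch_plan(
    {"/dev/sda1": "vol-old-root", "/dev/xvdb": "vol-old-data"},
    {"/dev/sda1": "vol-new-root", "/dev/xvdb": "vol-new-data"},
)
print(plan["/dev/sda1"])  # -> ('vol-old-root', 'vol-new-root')
```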

As presented, there is more than one way to achieve a successful data recovery in Windows. The trick is finding the method that satisfies your requirements.

Amazon CloudWatch Monitoring Scripts


How standard monitoring works in EC2

When it comes to monitoring EC2 instances, we have to keep in mind that an instance in the cloud is not an actual single computer, but a virtual machine running alongside some siblings on a bigger host, which runs the virtualization solution, or hypervisor. Specifically, AWS uses a customized version of Xen Hypervisor.
CloudWatch relies on the information provided by this hypervisor, which can only see the most hardware-sided part of the instance’s status: CPU usage (but not load), total memory size (but not memory usage), the number of I/O operations on the hard disks (but not their partition layout and space usage), and network traffic (but not the processes generating it).
While this can be seen as a shortcoming on the hypervisor’s part, it’s actually very convenient in terms of security and performance, otherwise the hypervisor would be an all-seeing eye, with more powers than the root user itself.

How to monitor key elements of an EC2 instance

By default, CloudWatch only sees what the hypervisor is able to see. Luckily, CloudWatch accepts inputs from sources other than the hypervisor. This is what enables CloudWatch to monitor RDS instance details (such as replica lag) or the depth of an SQS queue, and it’s available to the end user under the label “Custom metrics”.
We will install a script that periodically sends our custom metrics to CloudWatch. Depending on our setup, we will have to take one of the two following approaches:
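To make the “custom metrics” idea concrete: each sample such a script sends boils down to one PutMetricData call in the System/Linux namespace. A rough sketch of the payload such a call carries (the helper function and instance ID are made up; with boto3, the resulting dict could be passed straight to put_metric_data):

```python
def build_metric_payload(instance_id, metric_name, value, unit="Percent"):
    """Build the parameters for a CloudWatch PutMetricData call, in the
    same System/Linux namespace the monitoring scripts use."""
    return {
        "Namespace": "System/Linux",
        "MetricData": [
            {
                "MetricName": metric_name,
                # The instance id dimension is what lets CloudWatch group
                # samples per machine in the console.
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Value": value,
                "Unit": unit,
            }
        ],
    }

payload = build_metric_payload("i-0123456789abcdef0", "MemoryUtilization", 18.94)
# With boto3 this would be sent as:
#   boto3.client("cloudwatch").put_metric_data(**payload)
```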

Using IAM user-based permissions

Creating an IAM user

If we can’t use EC2 Instance Roles, then we need to create an IAM user with the right permissions. After you create the user, attach the following policy to it:

{"Version": "2012-10-17","Statement": [{
"Sid": "Stmt1449681555000",
"Effect": "Allow",
"Action": ["cloudwatch:PutMetricData"],"Resource": ["*"]}]}

Please remember to write down the user’s Access and Secret Keys.

Installing and configuring the script


You must perform additional steps on some versions of Linux.

Amazon Linux AMI

Log on to your Amazon Linux AMI instance and install the following package:

sudo yum install perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https

Red Hat Enterprise Linux

To install the scripts for the first time:
Log on to your Red Hat Enterprise Linux instance and Install the following package:

sudo yum install perl-App-cpanminus.noarch
sudo cpanm -i Sys::Syslog DateTime LWP::Protocol::https

SUSE Linux Enterprise Server

To install the scripts for the first time, log on to your SUSE Linux Enterprise Server instance and install the following packages:

sudo zypper install perl-DateTime
sudo zypper install -y "perl(LWP::Protocol::https)"

Ubuntu Server

To install the scripts for the first time, log on to your Ubuntu Server instance and install the following packages:

sudo apt-get update
sudo apt-get install unzip
sudo apt-get install libwww-perl libdatetime-perl

Getting Started

The following steps show you how to download, uncompress, and configure the Amazon CloudWatch Monitoring Scripts on an EC2 Linux instance.

To download, install, and configure the script:

Open a command prompt, move to a folder where you want to store the scripts, and then type the following: 

curl -O
unzip
cd aws-scripts-mon

The package contains these files:

  • CloudWatchClient.pm—Shared Perl module that simplifies calling Amazon CloudWatch from other scripts.
  •—Collects system metrics on an Amazon EC2 instance (memory, swap, disk space utilization) and sends them to Amazon CloudWatch.
  •—Queries Amazon CloudWatch and displays the most recent utilization statistics for the EC2 instance on which this script is executed.
  • awscreds.template—File template for AWS credentials that stores your access key ID and secret access key.
  • LICENSE.txt—Text file containing the Apache 2.0 license.
  • NOTICE.txt—copyright notice.

If you aren’t using an IAM role, update the awscreds.template file that you downloaded earlier with the Access and Secret Keys of the user you created with the specific rights.

The content of this file should use the following format:

AWSAccessKeyId=YourAccessKeyID
AWSSecretKey=YourSecretAccessKey

Using the Scripts

The first script,, collects memory, swap, and disk space utilization data on the current system. It then makes a remote call to Amazon CloudWatch to report the collected data as custom metrics.


The following examples assume that you have already updated the awscreds.conf file with valid AWS credentials. If you are not using the awscreds.conf file, provide credentials using the --aws-access-key-id and --aws-secret-key arguments.
To perform a simple test run without posting data to CloudWatch, run the following command:

$ ./ --mem-util --verify --verbose
MemoryUtilization: 18.9431700959895 (Percent)
No credential methods are specified. Trying default IAM role.
ERROR: No IAM role is associated with this EC2 instance.
For more information, run './ --help'

If no IAM role is associated with the instance, pass the credentials file explicitly:

sudo aws-scripts-mon/ --mem-util --swap-util --disk-space-util --disk-path=/ --aws-credential-file=path/to/file/aws.creds
Successfully reported metrics to CloudWatch. Reference Id: 70320792-b2e5-11e5-afc5-b72f2d5df436

To collect all available memory metrics and send them to CloudWatch run the following command:

./ --mem-util --mem-used --mem-avail

To set a cron schedule for metrics reported to CloudWatch start editing the crontab using the following command:

crontab -e

Add the following command to report memory and disk space utilization to CloudWatch every five minutes:

*/5 * * * * ~/aws-scripts-mon/ --mem-util --disk-space-util --disk-path=/ --aws-credential-file=path/to/file/aws.creds --from-cron

If the script encounters an error, it will write the error message to the system log.
The second script,, queries CloudWatch for statistics on memory, swap, and disk space metrics within a time interval given as a number of most recent hours. This data is provided for the Amazon EC2 instance on which the script is executed.


To get utilization statistics for the last 12 hours run the following command:

./ --recent-hours=12

The returned response will be similar to the following example output:

Instance metric statistics for the last 12 hours.
CPU Utilization
Average: 1.06%, Minimum: 0.00%, Maximum: 15.22%
Memory Utilization
Average: 6.84%, Minimum: 6.82%, Maximum: 6.89%
Swap Utilization
Average: N/A, Minimum: N/A, Maximum: N/A
Disk Space Utilization on /dev/xvda1 mounted as /
Average: 9.69%, Minimum: 9.69%, Maximum: 9.69%
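The Average/Minimum/Maximum lines above are plain aggregations over the datapoints CloudWatch returns for the requested window. A small sketch of that reduction (the helper name and sample values are mine; the real script obtains the datapoints from CloudWatch):

```python
def summarize(datapoints):
    """Aggregate a list of metric datapoints the way the report above
    presents them: average, minimum, maximum; N/A when there is no data,
    as for Swap Utilization above."""
    if not datapoints:
        return "Average: N/A, Minimum: N/A, Maximum: N/A"
    avg = sum(datapoints) / len(datapoints)
    return (f"Average: {avg:.2f}%, Minimum: {min(datapoints):.2f}%, "
            f"Maximum: {max(datapoints):.2f}%")

print(summarize([6.82, 6.84, 6.89]))  # memory-utilization style samples
print(summarize([]))                  # no datapoints, so all N/A
```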

Viewing Your Custom Metrics in the AWS Management Console

If the script runs successfully, you can use the AWS Management Console to view your posted custom metrics in the Amazon CloudWatch console.
To view custom metrics:

  1. Execute, as described earlier.
  2. Sign in to the AWS Management Console and open the CloudWatch console at
  3. Click View Metrics.
  4. In the Viewing list, your custom metrics posted by the script are displayed with the prefix System/Linux.

Linux Metric