Amazon EMR Migration Guide | Part 2
Following the response to our post Amazon EMR – A Migration Plan, we decided to go deeper into the topic. Now that you have started your journey to Amazon EMR, the next steps in the migration process are gathering requirements, optimization, and security. This post covers each in turn.
A list of metrics from your existing cluster helps with cost estimation, architecture planning, and instance type selection. Capture each of the following to drive the decision-making process during migration:
* Aggregate number of physical CPUs
* CPU clock speed and core counts
* Aggregate memory size
* Amount of HDFS storage (without replication)
* Aggregate maximum network throughput
* At least one week of utilization graphs for the resources used above
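The metrics above can be rolled up from a per-node inventory. The following is a minimal sketch, not a prescribed tool: the `Node` fields, the `summarize` helper, and the sample figures are all hypothetical, and the HDFS figure assumes a uniform replication factor (3 is the HDFS default).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One server in the existing cluster (all fields are illustrative)."""
    physical_cpus: int      # CPU sockets
    cores_per_cpu: int
    clock_ghz: float
    memory_gib: int
    hdfs_used_tib: float    # raw HDFS usage, replicas included
    network_gbps: float     # maximum NIC throughput


def summarize(nodes, replication_factor=3):
    """Aggregate per-node figures into the cluster-level metrics listed above."""
    raw_hdfs = sum(n.hdfs_used_tib for n in nodes)
    return {
        "physical_cpus": sum(n.physical_cpus for n in nodes),
        "total_cores": sum(n.physical_cpus * n.cores_per_cpu for n in nodes),
        "min_clock_ghz": min(n.clock_ghz for n in nodes),
        "memory_gib": sum(n.memory_gib for n in nodes),
        # Dividing raw usage by the replication factor gives the logical data size.
        "hdfs_tib_without_replication": raw_hdfs / replication_factor,
        "network_gbps": sum(n.network_gbps for n in nodes),
    }


# Made-up inventory: ten identical nodes.
inventory = [Node(2, 16, 2.4, 256, 30.0, 10.0) for _ in range(10)]
summary = summarize(inventory)
```

The utilization graphs still need to come from your monitoring system; this sketch only covers the static capacity figures.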
Now we will cover optimization from the cost, storage, and compute aspects. Amazon EMR bills per second of cluster use, with a one-minute minimum, and provides several features to help lower costs. To make the best use of those features, consider both the workload type and the instance type.
Storage optimization is equally important. Optimizing your storage improves the performance of your jobs, which lets you use less hardware and run clusters for a shorter period. Here are some strategies to help you optimize your cluster storage:
* Partition Data
* Optimize File Size
* Compress the Dataset
* Optimize File Formats
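Three of these strategies can be illustrated with the Python standard library alone. This is a rough sketch under stated assumptions: the bucket name, the helper names, and the 128 MiB target file size are illustrative choices, not recommendations from this guide. File format conversion is left out because it needs a third-party library.

```python
import gzip
import math
from datetime import date

TARGET_FILE_BYTES = 128 * 1024 * 1024  # assumed target file size, not from the guide


def partition_prefix(base, day):
    """Build a Hive-style key prefix so queries can skip unrelated dates."""
    return f"{base}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def planned_file_count(partition_bytes, target_bytes=TARGET_FILE_BYTES):
    """Number of output files that keeps each file near the target size."""
    return max(1, math.ceil(partition_bytes / target_bytes))


def compression_ratio(payload):
    """Compressed size divided by original size for a sample payload."""
    return len(gzip.compress(payload)) / len(payload)


# Hypothetical bucket and data, used only to exercise the helpers.
prefix = partition_prefix("s3://example-bucket/events", date(2020, 3, 7))
files = planned_file_count(10 * 1024**3)  # a 10 GiB partition
ratio = compression_ratio(b"2020-03-07,click,ok\n" * 10_000)
```

Measuring the compression ratio on a sample of your own data is more reliable than the synthetic payload used here, which is highly repetitive and compresses unusually well.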
While cost and storage optimization are important, it is just as important to understand compute optimization. Here are some of the features and purchasing options that help you optimize the compute layer of your Amazon EMR cluster, which runs on Amazon EC2:
* Spot Instances
* Reserved Instances
* Instance Fleets
* Amazon EMR Auto Scaling
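As an illustration of how Spot Instances and instance fleets fit together, the sketch below builds a task fleet definition as a plain dictionary. The field names follow the instance fleet shape of the EMR `RunJobFlow` request as we understand it, so check them against the current API reference before use. The `task_fleet` helper, the capacities, the instance types, and the 20-minute timeout are hypothetical. Reserved Instances are a billing arrangement rather than a cluster setting, so they do not appear here.

```python
def task_fleet(on_demand_units, spot_units, instance_types):
    """Return an instance-fleet definition mixing On-Demand and Spot capacity."""
    return {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        # Several instance types give EMR more Spot pools to draw from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # Assumed values: wait 20 minutes for Spot, then fall back.
                "TimeoutDurationMinutes": 20,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


# Example values only; size the fleet from your own workload.
fleet = task_fleet(2, 6, ["m5.xlarge", "m5a.xlarge", "m4.xlarge"])
```

Keeping a small On-Demand baseline and placing the rest on Spot is one common pattern for interruptible task nodes; whether it suits your workload depends on how well your jobs tolerate losing capacity.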
Several factors go into estimating the cost of an Amazon EMR cluster, including EC2 instances (the compute layer), EBS volumes, and Amazon S3 storage. Because Amazon EMR is billed per second, running a large cluster for a short duration costs about the same as running a small cluster for a longer duration, provided the total instance time is the same.
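The large-versus-small comparison is simple arithmetic, sketched below. Every rate here is made up for illustration, so substitute current prices for your Region and instance types. The `cluster_cost` helper is hypothetical, and Amazon S3 storage is omitted because it does not depend on cluster size.

```python
HOURS_PER_MONTH = 730  # common approximation for prorating monthly prices


def cluster_cost(nodes, hours, ec2_hourly, emr_hourly, ebs_gb_per_node, ebs_gb_month):
    """Estimate the cost of one cluster run from per-node rates."""
    # Each node pays the EC2 price plus the EMR price for the time it runs.
    compute = nodes * hours * (ec2_hourly + emr_hourly)
    # EBS is priced per GB-month, so prorate it to the hours actually used.
    ebs = nodes * ebs_gb_per_node * ebs_gb_month * hours / HOURS_PER_MONTH
    return compute + ebs


# Illustrative rates only, not real AWS prices.
RATES = dict(ec2_hourly=0.20, emr_hourly=0.05, ebs_gb_per_node=100, ebs_gb_month=0.10)

large_short = cluster_cost(nodes=40, hours=1, **RATES)  # 40 instance-hours
small_long = cluster_cost(nodes=10, hours=4, **RATES)   # 40 instance-hours
```

Both runs consume 40 instance-hours, so the two estimates match. This holds only if the job scales roughly linearly with node count; a job that parallelizes poorly will consume more total instance time on the larger cluster.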
Once your optimization plan is in place, securing your resources on Amazon EMR is the next step. Amazon EMR has a comprehensive range of tools and methods to secure your data processing in the AWS Cloud. Some best practices are:
* Design early with security in mind
* Ensure that the supporting department is involved early in security architecture
* Understand the risks
* Obtain security exceptions
* Use different security setups for different use cases
Once you have worked through these next steps of the migration process (gathering requirements, optimization, and security), you will be on your way to taking full advantage of Amazon EMR. Talking with a cloud service company that is dedicated to helping organizations navigate platforms such as Amazon EMR can be critical to the success of your project. Contact Cloud Rush today for a complimentary assessment for your organization.