Amazon AWS has changed the IaaS game for startups and growth companies. As clients start piloting cloud initiatives, it is best to avoid common pitfalls, and there are several practices whose importance we must acknowledge when adopting and implementing AWS services. Below we have compiled a top ten list of best practices we have learned during our AWS deployments. They are a good first step in that effort, and we look forward to expanding the list in the future. Readers are more than welcome to comment, suggest modifications, and add the practices they follow during their own rollouts and implementations.
1. Choose your VPC infrastructure carefully
a. As you move into the AWS environment, often the first thing firms want to do is create a VPC. The question is what type of architecture is appropriate for you, and the answer is driven by the intended use case. Internet-based organizations will often opt for a VPC containing public and private subnets. This type of VPC keeps the services that don’t require direct access to the internet in a segregated private subnet, while public-facing services are located in the public subnet.
b. Existing enterprise firms intending to burst into the cloud often don’t intend to run public-facing services and may opt instead for a private-only subnet with a secure tunnel between their enterprise network and the public cloud.
c. The architecture with the greatest flexibility tends to be the public/private subnet layout: the public subnet is available if necessary and can remain unpopulated if not needed.
2. CIDR Block Selection is driven by two realities: the address space a VPC is created with is fixed and can’t be changed afterwards, and that space has to coexist with the networks you already run.
a. Ensure your VPC address space doesn’t overlap with your corporate space. For internet-based firms this is less of an issue; for enterprise firms, however, getting it right up front avoids having to rebuild the environment after a significant investment in time and resources.
b. While AWS allows VPC CIDR blocks as large as /16, don’t waste space. Having several very large subnets reduces your flexibility as needs arise to support multi-AZ services or other eventual requirements. On the other hand, don’t overly constrain your environment.
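Both CIDR checks can be done before anything is created in AWS. The sketch below uses only Python's standard ipaddress module; the corporate ranges and the VPC block are hypothetical values chosen for illustration, and the helper names are ours rather than part of any AWS tooling.

```python
import ipaddress

# Hypothetical address ranges, used only for illustration.
CORPORATE_RANGES = ["10.0.0.0/16", "192.168.0.0/20"]
VPC_CIDR = "10.1.0.0/16"

# AWS reserves five addresses in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def overlaps_corporate(vpc_cidr, corporate_ranges):
    """Return the corporate ranges that overlap the proposed VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [r for r in corporate_ranges
            if vpc.overlaps(ipaddress.ip_network(r))]


def carve_subnets(vpc_cidr, new_prefix, count):
    """Return the first `count` subnets of size /new_prefix inside the VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=new_prefix))
    if count > len(subnets):
        raise ValueError("VPC block is too small for the requested subnets")
    return subnets[:count]


if __name__ == "__main__":
    conflicts = overlaps_corporate(VPC_CIDR, CORPORATE_RANGES)
    print("Overlapping corporate ranges:", conflicts or "none")
    for subnet in carve_subnets(VPC_CIDR, new_prefix=24, count=4):
        usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
        print(subnet, "usable addresses:", usable)
```

Carving /24 subnets out of a /16 leaves most of the block unallocated, which keeps room for additional availability zones later without committing the whole range on day one.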
3. Environment Isolation is an important consideration: the same isolation that’s present in the physical environment should be present in the cloud environment. The cloud gives firms more flexibility in how this isolation is implemented. To control cost, development, integration/test, and production environments can be registered under separate accounts, ensuring accurate billing and cost controls.
4. Security: Use security groups to isolate services and management rules
a. Separate public traffic from private using subnets
b. Create an SSH gateway as the entry point for SSH communication
c. Create a VPN tunnel for management access
5. VPC/Enterprise Integration
When integrating AWS into an existing enterprise environment, planning up front will minimize rework. Often the key consideration is assigning IP ranges to AWS that do not overlap with the corporate space, so that routing between the corporate LAN and AWS is seamless.
6. IAM: The primary AWS account should be treated like the traditional root user in Linux systems.
a. Create a separate account specifically for administration, with the minimum permissions necessary to perform the task at hand.
b. Create separate accounts for each user or service to ensure audit traceability.
c. Assign permissions to groups and add users to those groups; this minimizes duplication of effort.
d. All users should use strong passwords; refer to your corporate password policy. Cloud passwords should be as secure as those protecting the corporate infrastructure.
e. Don’t share credentials. Sharing wrecks audit traceability and violates corporate policy; use roles to assign permissions instead.
f. Rotate credentials on a regular basis, the same way corporate credentials are rotated.
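To make "minimum permissions necessary" concrete, here is a minimal sketch of a least-privilege IAM policy document built as a plain Python dictionary, plus a small check that flags wildcard actions. The read-only S3 use case, the bucket name, and both function names are assumptions made for illustration; the JSON layout follows the standard IAM policy format.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build a policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


def has_wildcard_action(policy):
    """Return True if any Allow statement grants '*' or 'service:*'."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM accepts a single string or a list
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
    return False


if __name__ == "__main__":
    # "example-reports" is a hypothetical bucket name.
    policy = read_only_bucket_policy("example-reports")
    print(json.dumps(policy, indent=2))
    print("Grants wildcard actions:", has_wildcard_action(policy))
```

A document like this would typically be attached to a group rather than to individual users, in line with point c.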
7. Disaster Recovery: Remember, Disaster Recovery is designed to get you back up after a failure. Traditional enterprise environments are often limited by the question: do we need to build a duplicate system for recovery, or can we perform a data recovery through vendor SLAs? In the cloud, enterprise customers are able (at a lower cost) to duplicate their environments across availability zones, regions, and vendors to create an exceptionally resilient service offering.
a. Do deploy across multiple availability zones.
b. Do use an automated CM tool to maintain your configurations.
c. Do create snapshots of your volumes.
d. Do run drills to ensure everything operates as it was architected.
e. Do run drills more than once.
f. Do ensure everyone relevant is aware of the procedure.
g. Do architect your environment with sufficient redundancy to minimize the need for recovery efforts.
8. Security Groups: Security Groups, like traditional firewalls, should be set with the least permission necessary to accomplish the mission.
a. Decide on a security group methodology, for example creating groups that manage access to services, i.e. allow access to ports 80 and 443 for nodes that will be web servers.
b. Create an SSH gateway/bastion node and group:
i. Only allow external SSH access to the SSH Gateway
ii. Only allow local SSH access from within a security group or from the gateway node; this minimizes an attacker’s ability to traverse the networks between zones.
c. Define an enterprise naming convention and stick to it.
d. Utilize the ability to assign multiple security groups to a single asset (up to 5 groups per asset).
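The bastion rule in point b is easy to audit once the rules are written down as data. The sketch below is a simplified model and not the AWS API: the IngressRule class, the group names, and the 203.0.113.0/24 office range are hypothetical. It reports any SSH rule that would let traffic bypass the gateway.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"
SSH_PORT = 22


@dataclass(frozen=True)
class IngressRule:
    group: str   # security group the rule belongs to
    port: int    # destination port being opened
    source: str  # a CIDR block or the name of another security group


# Hypothetical rule set for a small web/database deployment.
RULES = [
    IngressRule("bastion_prod_usw1_sg", SSH_PORT, "203.0.113.0/24"),
    IngressRule("ws_prod_usw1_sg", 80, ANYWHERE),
    IngressRule("ws_prod_usw1_sg", 443, ANYWHERE),
    IngressRule("ws_prod_usw1_sg", SSH_PORT, "bastion_prod_usw1_sg"),
    IngressRule("db_prod_usw1_sg", SSH_PORT, "bastion_prod_usw1_sg"),
]


def ssh_violations(rules, bastion_group):
    """Return SSH rules that accept traffic from outside the bastion or own group."""
    violations = []
    for rule in rules:
        if rule.port != SSH_PORT:
            continue
        if rule.group == bastion_group:
            continue  # external SSH is expected to terminate at the bastion
        if rule.source not in (bastion_group, rule.group):
            violations.append(rule)
    return violations


if __name__ == "__main__":
    for rule in ssh_violations(RULES, "bastion_prod_usw1_sg"):
        print("SSH bypasses the bastion:", rule)
```

Running a check like this against an export of your real rules is one way to keep the "only the gateway accepts external SSH" policy from eroding over time.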
9. Naming Conventions: AWS assets should be named so that they can be readily identified. We suggest standardizing on the purpose of the instance, its environment, its region, and its sequence number. An example might be hadoop_dn_prod_usw1_001, which represents a production Hadoop data node in the US-West 1 region.
a. When naming an AMI, it’s good practice to include the creation date as part of its name, along with a complete description. This will minimize confusion when selecting an AMI for new instances.
b. When naming security groups and key pairs, continue to use the convention of purpose, environment, region, function. For example, db_prod_usw1_key or ws_prod_usw1_sg.
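A naming convention only helps if it is applied consistently, so it can be worth generating and validating names in code. The following is a minimal sketch for instance names of the form purpose_environment_region_sequence; the allowed environment codes (dev, test, prod), the region pattern, and the three-digit sequence are assumptions, not part of any AWS requirement.

```python
import re

# Assumed format: <purpose>_<environment>_<region>_<sequence>
NAME_PATTERN = re.compile(
    r"^(?P<purpose>[a-z0-9]+(?:_[a-z0-9]+)*)_"
    r"(?P<environment>dev|test|prod)_"
    r"(?P<region>[a-z]{2,4}\d)_"
    r"(?P<sequence>\d{3})$"
)


def build_name(purpose, environment, region, sequence):
    """Assemble an instance name and verify it matches the convention."""
    name = f"{purpose}_{environment}_{region}_{sequence:03d}"
    parse_name(name)  # raises ValueError if a component is malformed
    return name


def parse_name(name):
    """Split a conforming name into its components, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    name = build_name("hadoop_dn", "prod", "usw1", 1)
    print(name)
    print(parse_name(name))
```

The same idea extends to security groups and key pairs by swapping the sequence number for a function suffix such as sg or key.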
10. Elastic Load Balancing: Use multiple availability zones to balance traffic in the event of an environment failure (this can still happen within the cloud)
a. Use Route53 to balance traffic between regions; this is not supported by ELB.
b. Use ELBs for more than just web traffic. Most TCP-based protocols can be supported by the ELB; ensure your service can support it.
c. ELB connections time out when idle (60 seconds by default); ensure your application touches the socket before then to prevent the session from timing out.
These are my top 10. Let me know what you think I missed.