In the first part of the Cloud Strategy blog we discussed the organizational changes required to adopt a cloud operating model effectively. In the second part, we dive into different phases of migration.
Before we dive into the phases, there is an important question to ask about the workloads targeted for migration: are there ways to adopt a more efficient, cloud-like operating model on-premises?
Legacy workloads are more or less in a steady state: they are not undergoing rapid incremental or transformational change, are mostly in maintenance mode, and probably don’t need much support from the cloud ecosystem. In a cloud-like on-premises model, developers and operations teams use a self-service portal to deploy new workloads and rely on automation for patches, updates and change control. If this is a possible outcome, hybrid cloud becomes the most efficient and economical approach for the legacy workloads. (This is not to say that new workloads will not fit in this model; they are simply not part of the scope of migration.)
By maintaining an efficient on-premises data center and supplementing it with public cloud, customers can realize the economic benefits of running their own data center and also benefit from the public cloud ecosystem. Of the many companies that pulled back from the cloud to save money, Dropbox is a popular and well-documented example. More on this in part 3.
Phases of Migration
Let’s dive back into the different phases. This is not an original idea and is documented across various articles and by cloud providers, but it has my own spin based on what has worked in my experience.
Phase 1: Which Application First?
Too often, customers end up choosing an application that is either too vast or too easy to migrate. The challenge with migrating a complex, multi-dimensional application is that it is likely to miss its deadlines or success criteria. Picking an easy application doesn’t deliver the learning and decision-making experience that is a byproduct of this exercise.
Ideally, one would want to choose something that maximizes learning and minimizes risk. Any application that directly impacts a line of business’s revenue should be immediately excluded. A homegrown back-office application, say a non-critical ticketing system, may be a good example of a low-risk target. Moving an application to a SaaS version on any cloud is not considered a migration. It is a replacement.
Intrinsic and Extrinsic Factors?
Consider intrinsic factors first: is there an application that, for whatever reason, is already very unstable? It is a great candidate! Do consider latency requirements and compliance requirements. Data storage for this application should ideally grow at a low or moderate rate. This lets you fine-tune your life cycle and archival policies in public cloud without consuming your IT budget.
Are there extrinsic factors, such as a data center evacuation, impacting an app? In such a scenario you are most likely going to take downtime anyway, so why not coordinate a migration to cloud at the same time?
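The selection criteria above can be turned into a rough first-pass filter over an application inventory. The sketch below is only an illustration: the `App` fields, the function name and the 20% storage growth threshold are all assumptions of mine, and it treats strict latency or compliance requirements as disqualifying, which is a simplification of "do consider" them.

```python
from dataclasses import dataclass


@dataclass
class App:
    # All fields are hypothetical inventory attributes, not a standard schema.
    name: str
    revenue_impacting: bool
    latency_sensitive: bool
    has_compliance_constraints: bool
    annual_storage_growth_pct: float


def first_migration_candidates(apps, max_growth_pct=20.0):
    """Return names of apps that pass a simple low-risk screen.

    max_growth_pct is an arbitrary illustrative threshold for
    "low or moderate" storage growth; tune it to your environment.
    """
    candidates = []
    for app in apps:
        if app.revenue_impacting:
            continue  # never start with a revenue-impacting application
        if app.latency_sensitive or app.has_compliance_constraints:
            continue  # simplification: set these aside for a later wave
        if app.annual_storage_growth_pct > max_growth_pct:
            continue  # fast-growing data makes cloud storage costs hard to predict
        candidates.append(app.name)
    return candidates
```

A filter like this only narrows the list. Picking among the survivors is still a judgment call about which one maximizes learning.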
Phase 2: Planning – it pays to be wrong!
Confused? Start with the premise that everything will go wrong, execute every step incrementally, and either prove your premise wrong or repeat until you are wrong, i.e. until you have the perfect migration.
This is probably the most important and arduous phase. The amount of effort you put into the planning stages will directly impact the outcome. A key element here is good old project planning. Do not commit to overworking your team, and do not commit to underworking it. Commit to a consistent workload every week and achieve that exact amount every week. (This is a great philosophy I first read in Great By Choice – by Jim Collins.) This ensures teams have predictable workloads and can stay motivated. Too often, leaders commit their team to a migration plan that pushes most of the work to the very end.
For more specific migration-related planning, here are some things to consider. Map dependencies for the applications that are to be migrated. Which other applications or resources do these applications talk to? What is the average flow rate, and what is the hourly and 24-hour bandwidth consumption between applications that communicate? These metrics will help identify which applications need to be grouped together to move into the public cloud. Group the VMs and create priorities for the different move groups. This step likely requires tools that specifically track these metrics.
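As a rough illustration of the grouping step, the sketch below clusters applications that exchange a lot of traffic into the same move group using a small union-find. The input format (tuples of two app names and GB per day), the function name and the threshold are assumptions for the example. In practice, the numbers would come from whatever dependency-mapping tool you use.

```python
from collections import defaultdict


def build_move_groups(flows, min_gb_per_day):
    """Group apps that exchange at least min_gb_per_day of traffic.

    flows: iterable of (app_a, app_b, gb_per_day) tuples (hypothetical format).
    Returns a list of sorted app-name lists, largest group first.
    """
    parent = {}  # union-find parent pointers; each app starts as its own root

    def find(app):
        parent.setdefault(app, app)
        while parent[app] != app:
            parent[app] = parent[parent[app]]  # path halving
            app = parent[app]
        return app

    for app_a, app_b, gb_per_day in flows:
        root_a, root_b = find(app_a), find(app_b)  # registers both apps
        if gb_per_day >= min_gb_per_day and root_a != root_b:
            parent[root_a] = root_b  # chatty pair: merge their groups

    groups = defaultdict(list)
    for app in parent:
        groups[find(app)].append(app)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

For example, with flows of 120 GB/day between a ticketing app and its database and only 0.5 GB/day between ticketing and LDAP, a 10 GB/day threshold keeps the first pair together and leaves LDAP as its own group. Whether a low-volume but latency-critical dependency should still move together is something this sketch deliberately ignores.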
Creating a migration schedule with the input from planning sessions is very important. The migration schedule decides the order in which application groups will move. Upon successful completion of a group move, a sanity check is a must before moving on to the next priority group.
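The schedule-plus-sanity-check idea can be sketched as a simple loop that refuses to move on after a failed check. The `migrate` and `sanity_check` callables are placeholders you would supply yourself (hypothetical names, not part of any tool), and the priority numbers are assumed to come out of the planning sessions.

```python
def run_schedule(move_groups, migrate, sanity_check):
    """Migrate groups in priority order, stopping at the first failed check.

    move_groups: list of (priority, group_name) pairs; lower number moves first.
    migrate: callable taking a group name and performing the move (placeholder).
    sanity_check: callable taking a group name and returning True if healthy.
    Returns (completed_group_names, failed_group_name_or_None).
    """
    completed = []
    for _priority, name in sorted(move_groups):
        migrate(name)
        if not sanity_check(name):
            # Do not touch lower-priority groups until this one is understood.
            return completed, name
        completed.append(name)
    return completed, None
```

Returning the failed group, rather than raising, leaves it to the caller to decide whether to retry, roll back, or re-plan.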
Lastly, VM resizing must be considered. On-premises workloads are almost always over-provisioned. Use the right tools to identify how you can right-size over-provisioned resources. This step can also be done after the migration, with cloud-native tools.
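To make right-sizing concrete, here is a minimal sketch of the arithmetic, assuming you already have 95th-percentile CPU and memory utilization for a VM. The function name, the choice of p95 and the 30% headroom are illustrative assumptions, not a recommendation from any particular tool.

```python
import math


def right_size(provisioned_vcpus, provisioned_mem_gb,
               p95_cpu_pct, p95_mem_pct, headroom=0.3):
    """Suggest a smaller (vCPU, memory GB) size from observed utilization.

    Takes the p95 usage, adds a safety headroom (30% by default, an
    arbitrary illustrative value), rounds up, and never suggests more
    than what is currently provisioned.
    """
    needed_vcpus = provisioned_vcpus * p95_cpu_pct / 100 * (1 + headroom)
    needed_mem_gb = provisioned_mem_gb * p95_mem_pct / 100 * (1 + headroom)
    vcpus = min(provisioned_vcpus, max(1, math.ceil(needed_vcpus)))
    mem_gb = min(provisioned_mem_gb, max(1, math.ceil(needed_mem_gb)))
    return vcpus, mem_gb
```

A VM provisioned with 16 vCPUs and 64 GB that peaks at 20% CPU and 35% memory would come out at 5 vCPUs and 30 GB here. You would then round that up to the nearest instance size your cloud provider actually offers.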
Phase 3: Migration ‘Day’
The third phase is when the actual migration is carried out. Assuming your direct connect or VPN tunnels are already set up, firewall ports are configured, and so on, you start migrating workloads. Various cloud-native tools are available to get a data file such as a VMDK over to the cloud. Storage vendors also provide ways to sync or back up your VMDK to public cloud, but this requires you to consume their service in the cloud and hence should be the least preferred option. For instance, in the case of AWS, the storage vendor will likely use S3 to store the data rather than an EBS-like mountable volume.
Once the group migration is completed, verify that it was successful. If the migration broke at some point, retreat to the last known good spot and retrace your steps. In most cases, the data file is likely to have changed.
Exit Stage Left: To exit or disappear in a quiet, non-dramatic fashion, making way for more interesting events.
This is an often overlooked, transparent phase that spans the first three phases. Having an exit strategy doesn’t mean you are abandoning the migration; it simply means retreating to the previous logical step and re-evaluating your options. It is the option to take when multiple failures have been experienced during the migration and it is not worthwhile to debug them. Something important was likely overlooked during the planning phase or the app selection phase. Remember, it is acceptable to choose this option and suffer the intermediate setback. The migration is a marathon after all, not a sprint. Stay calm and use the well-documented exit runbook that was previously defined. Repeat Phases 1 and 2 with the new information. This may also be a good point to bring in a partner who has done these migrations before.
Phase 4: Are we there yet?
Workloads are in the cloud, the various lines of business are not complaining, nothing major is broken, and it feels like the migration is complete. Celebrate the team that reached this important milestone, but now is not the time to rest.
After a brief breather, start deploying and refining the monitoring and management tools that were previously defined in the Cloud Framework. Verify that they are set to accurately trigger and classify alerts. Even if you did right-size your environment before migration, review workload optimization. You likely have access to many more resource optimization tools in public cloud. Start identifying candidates for automation. It is important not to automate everything initially. Automation is great for reducing errors introduced by manual provisioning, although it is only as good as the engineer who builds it.
Automation, especially with incident response, can be tricky. It is hard to automate unless you have considered all the causes and the possible outcomes and the necessary action. There have been outages caused by false alarms, because someone relied on automation entirely.
As you start ramping up adoption of cloud-native services, try to use managed services. This offloads patching, maintenance and planning for DR events to the cloud provider. For example, Aurora in AWS is a good choice in place of a self-managed SQL database.
Lastly, continuously refine SIEM (Security Information and Event Management) workflows. This should be a pinned agenda item in all cloud council meetings. All new information should be reviewed to see if it impacts existing SIEM workflows.
Congratulations! You have reached an important milestone. Celebratory lunches/team events are in order. It is important to quantify whether the success criteria were met, to document the lessons learned from the process, and to factor them into the next migration. Share this knowledge with the entire organization to educate and uplift the team.
Too often, leaders set unrealistic goals for milestones. There is very little rationale behind the deadlines, and the result is often a messy, outage-prone, late and costly migration. Over time, these different phases will start to amalgamate into a familiar workflow with minimal process variation.
Here are some external references that are good reads.
https://cloud.google.com/solutions/migration-to-gcp-getting-started – Detailed overview, the concepts and approach are largely applicable to any cloud provider.
https://cloud.google.com/files/Lift-and-Shift-onto-Google-Cloud.pdf – Great read on slightly different phases of migration, but there is a significant number of common tasks that need to be considered for any cloud migration.
In the next part of this series, we will delve into how to create an effective Hybrid Cloud.
PS: Hat-tip Prabhu Barathi @prabhu_b for reviewing my work and providing me valuable feedback.