Digitalization and the need to adapt rapidly to changing market demand have caused a rise in the requirements and expectations that are placed on businesses. Many companies find it challenging to accommodate and adapt to these trends by using existing infrastructure and processes.
At the same time, IT departments find themselves under scrutiny and pressure to improve product performance, improve cost-effectiveness, and meet user demands, making it difficult to justify additional investments to extend and modernize systems and tools.
A hybrid cloud strategy provides a pragmatic solution.
- By using the public cloud, you can extend the capacity and capabilities of your organization without any upfront investment.
- By adding more than one cloud (multi/hybrid) to your existing infrastructure, you preserve your existing investment and increase agility, resilience, security, flexibility, and scalability.
A hybrid cloud strategy gives you the flexibility to modernize applications and processes incrementally as your resources allow. In practice, however, organizations often move to multiple clouds without implementing the management, governance, and automation measures needed to realize cost savings. Another source of overspending is the misconception that you pay only for what you use; in reality, you pay for what you provision. If an organization provisions more capacity than it needs, or fails to de-provision resources once it has finished using them, it continues to pay for everything that has been provisioned, whether or not it is actually used.
There are many scenarios in which organizations can overspend in the cloud. We strongly recommend having a senior DevOps engineer or a managed services provider closely monitor your cloud environments and identify places where efficiency can be improved.
We will go over the eight most common scenarios and the best practices organizations can adopt to reduce cloud spend, regardless of whether they use AWS, Azure, or Google Cloud.
1. Delete unattached disk storage
When a virtual machine (VM) is launched, disk storage is usually attached to act as the local block storage for the application. When you terminate the VM, the disk storage isn’t deleted by default; this is a safety precaution against data loss. Because it is not deleted, however, it remains active and continues to incur a full-price charge even though it is no longer used. This applies to both Google Cloud and Azure.
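Cloud SDKs and CLIs can list disks along with their attachment state; the cleanup logic itself is simple. Below is a minimal Python sketch that works on plain inventory records rather than a real provider API (the `Disk` record and its field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Disk:
    name: str
    size_gb: int
    attached_to: Optional[str]  # VM name, or None if the disk is unattached

def find_unattached(disks):
    """Return disks not attached to any VM: candidates for review and deletion."""
    return [d for d in disks if d.attached_to is None]

inventory = [
    Disk("web-os-disk", 128, "web-vm-01"),
    Disk("old-data-disk", 512, None),   # left behind after its VM was terminated
    Disk("scratch-disk", 256, None),
]

for disk in find_unattached(inventory):
    print(f"{disk.name}: {disk.size_gb} GB, unattached")
```

In practice you would feed this from your provider’s disk inventory and delete only after confirming a snapshot or backup exists.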
2. Delete aged snapshots and images
Many teams use snapshots and images to create point-in-time recovery points in case of data loss or disaster. Storing these points isn’t costly on its own, but their number can quickly get out of control if it isn’t monitored. It is especially easy to lose track when users configure settings that automatically schedule snapshots and images on an hourly or daily basis without also scheduling the deletion of older ones.
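One way to keep automated snapshots under control is a retention sweep that runs alongside the snapshot schedule. A minimal Python sketch, using a hypothetical 30-day retention window:

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # assumption: adjust to your recovery-point requirements

def aged_snapshots(snapshots, now):
    """snapshots: list of (name, created_at) pairs.
    Returns the names of snapshots older than the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [name for name, created_at in snapshots if created_at < cutoff]

now = datetime(2024, 6, 30)
snaps = [
    ("db-snap-2024-01-01", datetime(2024, 1, 1)),   # far past retention
    ("db-snap-2024-06-25", datetime(2024, 6, 25)),  # still within retention
]
print(aged_snapshots(snaps, now))  # → ['db-snap-2024-01-01']
```

The returned names would then be handed to your provider’s delete call, ideally from the same automation that creates the snapshots.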
3. Terminate Zombie Assets
Zombie assets are infrastructure components that are running in your cloud environment but are not being used for any purpose. They can come in many shapes and sizes like:
- Storage volumes
- Aged snapshots
- Compute infrastructure
- Disassociated IPs
- VMs that are no longer in use but were never turned off
- Zombie VMs: VMs that failed during launch or deprovisioning
- Idle Load Balancers
- Idle SQL Databases
As an example, suppose a team creates a daily process that loads an anonymized production database into a cloud database, so engineers can test and verify against realistic data in a safe environment. This boosts engineering velocity, but if no one plans for cleanup, each day a new database VM is created with attached resources and then abandoned, leaving behind a large number of zombie resources.
Regardless of the type of asset and why it was created, you will be charged as long as it is in a running state. Zombie assets must be isolated, evaluated, and immediately terminated if they no longer serve a purpose.
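A common defense against zombie assets like the test databases above is an expiry tag set at creation time, plus a daily sweep that flags anything past its date. A minimal sketch (the `expires` tag name is an assumption, not a provider convention):

```python
from datetime import datetime

def expired_resources(resources, now):
    """Return names of resources whose 'expires' tag is in the past."""
    expired = []
    for res in resources:
        expires = res.get("tags", {}).get("expires")
        if expires and datetime.fromisoformat(expires) <= now:
            expired.append(res["name"])
    return expired

resources = [
    {"name": "test-db-0601", "tags": {"expires": "2024-06-02"}},  # overdue
    {"name": "test-db-0629", "tags": {"expires": "2024-07-01"}},  # still valid
    {"name": "prod-db", "tags": {}},                              # no expiry: kept
]
print(expired_resources(resources, datetime(2024, 6, 30)))  # → ['test-db-0601']
```

Untagged resources are left alone here; a stricter policy could flag them for review instead.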
4. Stay up to date on VM generations
Every so often, cloud providers release the next generation of VMs, or new versions of existing generations, with better price-for-performance or additional functionality. These releases usually bring performance improvements that may enable you to run fewer VMs and reduce costs.
Microsoft retired Azure Service Manager (ASM) and completely replaced it with Azure Resource Manager (ARM). Any Azure customers still using classic (ASM) assets should migrate to ARM to avoid potential business impact.
It’s important to note that you can’t change the generation of a VM after it’s created. If you need to switch generations, you must delete the VM and recreate it in the new generation. However, with Azure SQL databases and SQL managed instances, you can select the hardware generation at creation time or change it later.
5. Rightsize infrastructure
Rightsizing is an optimization initiative that has a direct impact on performance and costs. It’s common for engineers to create new VMs that are substantially larger than necessary to either give them some extra headroom, or because they don’t know the performance requirements of the new VM, but without rightsizing resources, costs will begin to increase.
To rightsize, you can:
- Downsize: recommended for underutilized resources that achieve the same core performance even with a smaller allocation.
- Terminate: recommended for zombie resources, which are assets that are running in your account but are not in use.
- Upsize: recommended if your workloads are consistently under high utilization.
It’s important to consider CPU, memory, disk, and network in/out utilization, and to review trended metrics over time. Always use data to guide your decisions around reducing the size of the VM without hurting the performance of the app.
For example, if memory, network, or disk utilization is above 50% of the provisioned capacity, downsizing a VM to half its current capacity will likely hurt workload performance. In that situation, change the VM family from General Purpose to Compute Optimized or Memory Optimized, or deploy the workload in a VM Scale Set, which not only helps reduce spending but also increases application resiliency.
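The decision logic above can be sketched as a simple rule over trended utilization metrics. The 50% and 80% thresholds here are illustrative assumptions; tune them against your own performance data:

```python
def rightsize_action(metrics, running=True):
    """metrics: utilization fractions (cpu, memory, disk, network)
    averaged over a trend window, e.g. the last 30 days."""
    if not running:
        return "terminate"      # zombie resource: provisioned but unused
    peak = max(metrics.values())
    if peak > 0.80:
        return "upsize"         # consistently high utilization
    if peak > 0.50:
        return "keep"           # halving capacity would likely hurt performance
    return "downsize"           # underutilized: a smaller size should suffice

print(rightsize_action({"cpu": 0.15, "memory": 0.30, "disk": 0.10}))  # → downsize
print(rightsize_action({"cpu": 0.90, "memory": 0.60, "disk": 0.20}))  # → upsize
```

A real implementation would act on the dominant metric (e.g. switch family when only memory is hot) rather than a single peak value, but the shape of the decision is the same.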
Disk storage can also be rightsized. Factor in capacity, IOPS, and throughput to select the right disk from the Standard SSD, Standard HDD, Premium SSD, or Ultra Disk offerings.
- Standard SSD
A cost-effective option optimized for workloads that need consistent performance at lower IOPS. Good for web servers, lightweight apps, and dev/test workloads.
- Standard HDD
Delivers reliable, low-cost disk support for VMs running latency-insensitive workloads. Suitable for backups and non-critical, infrequently accessed workloads.
- Premium SSD
Delivers high-performance, low-latency disk support for VMs with IO-intensive workloads. Suitable for production and performance-sensitive workloads.
- Ultra Disks
Deliver high throughput, high IOPS, and consistently low-latency disk storage. Suitable for data-intensive workloads such as SAP HANA, top-tier databases, and transaction-heavy workloads.
6. Buy reservations
This is an extremely cost-effective technique that can be applied to more than 15 different Azure, Google Cloud, and AWS services, including select VMs, storage, and database services.
With reserved VM instances, you make a one- or three-year commitment to a predetermined amount of VM usage. In return, you get a discount on compute costs compared to pay-as-you-go pricing. Another advantage is that you don’t have to pay upfront for the committed period; there is an option to pay monthly, and if your business situation changes and you no longer need the reservation, there are options to refund outstanding prepayments.
As a rule of thumb, the size of the reservation should be based on the total amount of compute used by the existing or soon-to-be-deployed database within a specific region and using the same performance tier and hardware generation.
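The savings are straightforward to estimate once you know your pay-as-you-go rate and the reservation discount. The $0.10/hour rate and 40% discount below are hypothetical figures for illustration, not quoted prices:

```python
def reservation_savings(hourly_rate, discount, months, hours_per_month=730):
    """Compare pay-as-you-go cost against a reserved rate over the term."""
    payg = hourly_rate * hours_per_month * months
    reserved = payg * (1 - discount)
    return payg, reserved, payg - reserved

# Hypothetical VM on a 3-year reservation with a 40% discount.
payg, reserved, saved = reservation_savings(0.10, discount=0.40, months=36)
print(f"pay-as-you-go ${payg:,.0f} vs reserved ${reserved:,.0f}: save ${saved:,.0f}")
```

The same arithmetic, run against your actual rates, tells you quickly whether a workload is stable enough to be worth committing.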
7. Stop and start VMs on a schedule
Providers bill for a VM as long as it is running; once it’s in a stopped (deallocated) state, there is no compute charge for that VM. As an extreme but illustrative example from VMware: if your VMs run 24/7, your cloud provider will bill you roughly 672 to 744 hours per VM per month. If instead you schedule a VM to shut off between 5pm and 9am on weekdays, and all day on weekends and holidays, you would save around 488-592 VM-hours per month. This breakdown is extreme and not entirely realistic: with flexible work schedules, we can’t simply power down every VM outside normal working hours. Outside of production, however, you’ll likely find many VMs that don’t need to run 24/7/365. The most cost-efficient environments dynamically stop and start VMs on a set schedule, and each cluster of VMs can be treated differently.
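The arithmetic behind those numbers is worth sketching. Assuming a 30-day month with 22 weekdays, an 8-hour weekday schedule works out like this:

```python
def scheduled_hours(hours_per_weekday, weekdays, hours_per_weekend_day, weekend_days):
    """Total monthly running hours for a VM on a stop/start schedule."""
    return hours_per_weekday * weekdays + hours_per_weekend_day * weekend_days

always_on = 24 * 30                        # 720 hours in a 30-day month
scheduled = scheduled_hours(8, 22, 0, 8)   # 9am-5pm weekdays only → 176 hours
print(f"saved per VM: {always_on - scheduled} VM-hours/month")  # → 544
```

Even a looser schedule (say, 12 hours on weekdays) still eliminates hundreds of billable hours per VM per month.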
8. Move object data to lower-cost tiers
Cloud providers offer several tiers of storage at different price points and performance levels. The best cost management practice is to move data between tiers depending on its usage. You can also adjust two things when it comes to storage: redundancy (how many copies are stored across how many locations) and access tier (how often the data is accessed). You can mix and match both of these options to create the right solution for your business.
For example:
- Cold locally redundant storage (LRS) is ideal for longer-term storage, backups, and recovery
- Cold geo-redundant storage (GRS) is ideal for archival
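Choosing the mix comes down to two questions: how often is the data read, and does it need to survive a regional outage? A minimal sketch with illustrative thresholds (the cut-off values are assumptions, not provider pricing rules):

```python
def pick_storage(days_between_accesses, needs_geo_redundancy):
    """Map access frequency and durability needs to a tier/redundancy pair."""
    if days_between_accesses <= 7:
        tier = "hot"
    elif days_between_accesses <= 90:
        tier = "cool"
    else:
        tier = "archive"
    redundancy = "GRS" if needs_geo_redundancy else "LRS"
    return f"{tier}/{redundancy}"

print(pick_storage(2, False))    # frequently read app data → hot/LRS
print(pick_storage(365, True))   # compliance archive, geo-redundant → archive/GRS
```

Lifecycle management rules in most providers can apply this kind of mapping automatically, moving objects to colder tiers as they age.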
It’s important to remember that these best practices are not meant to be one-time activities, but ongoing processes. Because of the dynamic and ever-changing nature of the cloud, cost optimization activities should ideally take place continuously.
Is cloud security on your to-do list? Check out this checklist to help you get started.