Start from the basics
Why should you choose object storage? There are several reasons, but in most cases end users adopt object stores for one of two:
1. Your application is written to take advantage of object storage. It may seem obvious, but more and more customers now plan object storage adoption in advance. They write applications with the S3 protocol in mind, or they adopt solutions designed to work with object storage backends (such as backup or highly distributed file services).
2. You have reached the limit of traditional solutions. You have tried everything, and the future looks bleak if you don't change the way you operate your infrastructure. Traditional storage, including scale-out file system storage, is fine up to a certain point, but beyond it complexity increases exponentially and management costs increase with it. This means that your $/GB rises, making your business less competitive.
If you fall into one of these categories, you are probably thinking about, or have already adopted, an object store. But it’s not easy to determine which object store is right for you.
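To make the first point concrete, here is a minimal sketch of what "written with the S3 protocol in mind" looks like: the application talks to buckets and keys through put/get/list verbs instead of file system paths. The class below is a hypothetical in-memory stand-in, not a real client; with an SDK such as boto3 the calls have the same shape.

```python
# Hypothetical in-memory stand-in for an S3-style object store,
# for illustration only. Real applications would use an SDK such
# as boto3, but the bucket/key data model and verbs are the same.

class ObjectStoreStub:
    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, body):
        # Objects are immutable blobs addressed by bucket + key,
        # not files in a directory tree.
        self._buckets[bucket][key] = bytes(body)

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def list_objects(self, bucket, prefix=""):
        # "Folders" are just key prefixes in an object store.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


store = ObjectStoreStub()
store.create_bucket("backups")
store.put_object("backups", "2024/01/db.dump", b"...")
store.put_object("backups", "2024/02/db.dump", b"...")
print(store.list_objects("backups", prefix="2024/"))
```

An application built this way is storage-agnostic: pointing it at a different S3-compatible backend is a configuration change, not a rewrite.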
Understand your workload
Object stores are ideal for certain workloads, and the number and types of use cases where they shine grow almost daily. At first they were used for cold and active data archives; now they handle active backup workloads, big data and HPC, file services, and back-end storage for IoT, machine learning, and AI applications. Any type of unstructured data that needs to be stored and accessed in large quantities and at high throughput is a good candidate.
If you are considering object storage just for archiving, then any object store will probably work for you. But if you plan to make the object store a key component of your infrastructure, you should consider buying an object store that can consistently provide high performance. It should be optimized for flash memory, and should be capable of managing multiple media tiers in the same cluster.
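One way a store that "manages multiple media tiers in the same cluster" can decide placement is by access frequency and object size. The policy below is a hypothetical sketch with invented thresholds and tier names, not any vendor's actual algorithm.

```python
# Hypothetical tier-placement policy for a cluster mixing flash and HDD.
# Thresholds are illustrative; a real object store would also weigh
# object age, tier occupancy, and migration cost.

def choose_tier(reads_per_day: float, object_size_bytes: int) -> str:
    if reads_per_day >= 10 and object_size_bytes < 64 * 1024**2:
        return "flash"    # hot, small objects benefit most from SSDs
    if reads_per_day >= 1:
        return "hdd"      # warm data on spinning disks
    return "archive"      # cold data on the cheapest tier

print(choose_tier(50, 4 * 1024**2))   # hot 4 MiB object -> flash
print(choose_tier(0.01, 1024**3))     # cold 1 GiB object -> archive
```

The point of buying a performance-capable store up front is that policies like this can evolve with your workload instead of forcing a migration to a new system.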
Be ready for changes
Scale-out is the best architectural design for large-capacity storage systems, and object storage is no exception. Expanding your infrastructure by adding nodes provides more capacity and parallelism, and hence better performance, in a roughly linear fashion.
Several factors matter in scale-out designs. Expansion granularity, scalability, and load balancing, and how they are implemented, should be the first characteristics to evaluate in a new scale-out storage system. The unlimited scalability we expect from a storage platform should not come with a minimum upgrade size, and new resources should not take long to become available.
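The linear-scaling claim is easy to show with back-of-the-envelope arithmetic: if each node contributes roughly the same capacity and throughput, cluster totals grow proportionally with node count. The per-node figures below are hypothetical, chosen only to illustrate the model.

```python
# Back-of-the-envelope scale-out model (hypothetical per-node figures).
NODE_CAPACITY_TB = 100       # usable capacity per node
NODE_THROUGHPUT_GBPS = 2     # aggregate throughput per node

def cluster_totals(nodes: int) -> tuple:
    # In an ideal scale-out design, both capacity and throughput
    # grow linearly with the number of nodes.
    return nodes * NODE_CAPACITY_TB, nodes * NODE_THROUGHPUT_GBPS

for n in (4, 8, 16):
    cap, tput = cluster_totals(n)
    print(f"{n} nodes -> {cap} TB usable, {tput} GB/s aggregate")
```

Expansion granularity is what makes this model usable in practice: if the smallest upgrade step is a full rack, you lose the ability to grow in proportion to demand.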
In recent years we have witnessed some important changes in the storage industry. We have seen strong adoption of software-defined solutions, but there is also an increasing interest in open source. Both of these allow end users to avoid lock-ins and increase their bargaining power when it comes to negotiating with vendors.
Software-defined solutions separate hardware from software, introducing many financial, technical, and operational benefits. Standard x86 servers are available from many vendors and changing from one to another is very simple. The market is highly competitive, allowing end users to get good prices. Configurations are usually more customizable and can be changed more easily through hardware upgrades.
Open source ensures that users can take advantage of all the innovation that comes from this world. Most large organizations have adopted open source solutions, and cloud services like Amazon AWS, Google Cloud Platform, and Microsoft Azure run their infrastructures on open source operating systems and software. It is not just that the software is inspected and developed by a large community, and hence more secure and potentially more reliable; the software itself is usually free, and customers pay only for support.
Moving costs from CAPEX (software investment) to OPEX (support only) is a huge advantage, but free software also means that customers can potentially choose another support provider if they want.
$/GB is a key metric. When it is time to buy a new storage infrastructure, you look at TCA and TCO, and you now compare them with cloud storage costs: if you plan to store huge amounts of data, you’d love to have someone else do it for you, provided the cost is similar.
The public cloud is quite inexpensive for storing data that you’ll never retrieve, but transaction and egress costs could easily undermine your business (here is an example). On-premises object storage must be cheaper than the cloud for storing data and, at the same time, its cost should be much more predictable.
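The egress trap is easy to show with arithmetic. All prices below are made-up placeholders, not real cloud or vendor list prices; the point is only that a retrieval-heavy workload flips the comparison.

```python
# Hypothetical monthly cost model. Prices are illustrative
# placeholders, NOT real cloud or vendor list prices.

def cloud_monthly_cost(stored_tb, egress_tb,
                       storage_per_gb=0.01, egress_per_gb=0.09):
    # Cloud bills storage plus whatever you read back out (egress).
    return (stored_tb * 1024 * storage_per_gb
            + egress_tb * 1024 * egress_per_gb)

def onprem_monthly_cost(stored_tb, per_gb_month=0.015):
    # Flat, predictable $/GB: no transaction or egress fees.
    return stored_tb * 1024 * per_gb_month

# 1 PB stored; compare a cold archive with a read-heavy workload.
print(round(cloud_monthly_cost(1024, 10)))    # archive: egress is negligible
print(round(cloud_monthly_cost(1024, 500)))   # active: egress dominates
print(round(onprem_monthly_cost(1024)))       # same either way
```

With these placeholder prices, the cloud wins for the archive and loses badly for the active workload, while the on-premises figure does not change: that is what "predictable" means here.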
If the capacity you need is relatively small, not all object stores are efficient enough to justify themselves. In most cases you need a large cluster (1PB or more) to reach a level of efficiency that makes sense. Fortunately, not all object stores share the same architecture, and with some solutions (you can guess which one I have in mind) a practical deployment can start with very small nodes built out of recycled hardware or small VMs, without giving up advanced data protection or data footprint optimization mechanisms.
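The efficiency point can be quantified. With triple replication, only a third of raw capacity is usable; k+m erasure coding stores k data fragments plus m parity fragments, for k/(k+m) efficiency, but it needs at least k+m nodes, which is one reason small clusters often cannot reach that efficiency. A quick sketch (the 8+3 scheme is just an example):

```python
# Storage efficiency: replication vs. k+m erasure coding.

def replication_efficiency(copies: int) -> float:
    # n full copies of every object -> 1/n of raw capacity is usable.
    return 1 / copies

def erasure_efficiency(k: int, m: int) -> float:
    # k data fragments + m parity fragments; survives m fragment losses.
    return k / (k + m)

print(f"3x replication: {replication_efficiency(3):.0%} usable")
print(f"8+3 erasure coding: {erasure_efficiency(8, 3):.0%} usable")
# An 8+3 scheme needs at least 11 nodes before this efficiency
# is even available -- hence the 1PB-class cluster rule of thumb.
```

Architectures that support small erasure-coding schemes (or small fragments per node) are what make the tiny-node, recycled-hardware starting point viable.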
Object storage is becoming more and more common thanks to the increasing number of applications and workloads that can take advantage of it, and more end users want to better understand its characteristics and whether it makes sense to adopt this technology in their infrastructure.