Object stores present a very different set of characteristics from more traditional solutions like file and block storage, and the concepts behind their design are not widely understood.
Most end users don't choose object storage because they are designing their applications to take advantage of it, but because the storage solutions they adopted years ago have become unsustainable in terms of cost and manageability. For this reason, object stores are rarely deployed in greenfield scenarios.
To begin with, an object store has to adapt to requirements that come more from the need to be compatible with legacy applications and existing infrastructures than anything else; this is particularly true in enterprise environments. Over time, however, it will store a huge volume of data and interact with an increasing number of applications. With this in mind, whether you are an ISP or an end user, it is clear that choosing the right object storage platform today is a fundamental step toward building a modern IT infrastructure that can stay competitive over time while evolving quickly to meet changing business needs.
In this article I'd like to highlight 5 key aspects that you should consider when looking for an object store (plus a bonus tip for preparing for the future evolution of the infrastructure). While most object stores seem to offer similar characteristics, you can only discover the true TCO and sustainability of the solution after looking at the details.
1. True Scalability
Practically all object storage platforms can scale to petabytes and beyond. So scalability shouldn't be a problem, right?
In theory. But in practice, things are not that simple. Traditional object stores carry a lot of architectural constraints, and even if they can theoretically scale to large capacities, the process of expanding the cluster can be very painful. Data rebalancing hurts performance, or it takes a long time before new resources become available to users. It is also common for new, more powerful hardware to have no positive impact, because data is distributed evenly across all nodes without considering their characteristics.
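To see why expansion can hurt, here is a minimal consistent-hashing sketch in Python. It is illustrative only, not the placement algorithm of any particular product: adding a single node forces a slice of existing objects to migrate to a new owner, and less careful designs (a naive rehash of every key, for example) move far more.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: each object belongs to the first
    node clockwise from the object's hash position."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"object-{i}" for i in range(10_000)]

before = HashRing([f"node-{i}" for i in range(8)])
after = HashRing([f"node-{i}" for i in range(9)])  # expand by one node

# Count how many existing objects change owner; in expectation only
# ~1/9 of them move, while a full rehash would move almost all of them.
moved = sum(1 for k in keys if before.owner(k) != after.owner(k))
print(f"{moved / len(keys):.1%} of objects moved")
```

Even in this best case, every moved object is data that has to cross the network before the new node pulls its weight, which is exactly the rebalancing pain described above.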
2. Load Balancing and Performance Consistency
Many traditional object stores still use static load-balancing mechanisms and Chord-based algorithms (which is also why the cluster topology is often presented as a ring). The combination of these two characteristics leads to poor and inconsistent performance over time, limiting the number of potential use cases.
The number of hops required to reach data stored in the system can increase with the number of nodes, making the infrastructure less responsive in large-capacity scenarios, while the absence of a dynamic load balancing mechanism (capable of always writing and reading data from the nearest and most available location) diminishes the impact of new, more powerful hardware that is added to the cluster.
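The hop-count growth is easy to model. The sketch below is a back-of-the-envelope calculation, not any product's routing code: in a Chord-style overlay, each finger-table hop roughly halves the remaining clockwise distance on the ID ring, so a lookup costs on the order of log2(N) hops, and the cost keeps climbing as nodes are added.

```python
def finger_hops(n_nodes: int, id_bits: int = 32) -> int:
    """Hops for a Chord-style lookup: each finger-table hop halves the
    remaining clockwise distance, and routing stops once the distance
    fits inside a single node's average arc of the ring."""
    space = 2 ** id_bits
    arc = space / n_nodes    # average slice of the ID space per node
    dist = space / 2         # expected starting distance to the target
    hops = 0
    while dist > arc:
        dist /= 2            # one finger-table hop
        hops += 1
    return hops

for n in (16, 1024, 65_536):
    print(f"{n:>6} nodes -> {finger_hops(n)} hops")
# 16 nodes -> 3 hops, 1024 -> 9, 65536 -> 15: cost grows with log2(N)
```

Each extra hop is another network round trip added to every read and write, which is why large Chord-based clusters feel less responsive than small ones.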
This last aspect is fundamental to keeping the cluster efficient and automatically tuned when its nodes span several hardware generations. For example, three or four years from now, the hardware used for the initial implementation will likely no longer be available for purchase. And even if it is, newer hardware will be cheaper, more efficient, and offer a better $/GB than what is available today… which would you prefer to buy? And wouldn't you want to take full advantage of it?
3. $/GB is Tactical, Advanced Features are Strategic
$/GB (both TCA and TCO) is the first parameter you look at when evaluating this type of storage, and rightly so. The pain point that object storage should solve is storing ever-growing volumes of data at a fraction of the cost of traditional storage, especially at scale. But since you are building an infrastructure that, given its characteristics and size, will last for many years, why not look beyond $/GB?
Some object stores provide additional capabilities, such as advanced indexing and search features, while others can take advantage of serverless computing frameworks to offload complex and repetitive operations directly to the storage infrastructure. The latter can be done by installing the two platforms on the same cluster nodes (to take advantage of unused resources) or on public cloud VMs/containers powered up only when needed.
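As a sketch of the indexing idea, here is a hypothetical event handler; the event shape, function names, and in-memory index are all invented for illustration and don't match any specific product's API. A function triggered on object creation maintains a searchable side index, so clients query metadata instead of scanning the whole bucket.

```python
INDEX = {}  # stand-in for a real search index

def on_object_created(event: dict) -> None:
    """Hypothetical trigger fired by the store when an object lands:
    extract searchable metadata into a side index."""
    obj = event["object"]
    INDEX[obj["key"]] = {
        "bucket": event["bucket"],
        "size": obj["size"],
        "content_type": obj.get("content_type", "application/octet-stream"),
    }

def search(predicate) -> list:
    """Answer queries from the index instead of listing every object."""
    return [key for key, meta in INDEX.items() if predicate(meta)]

on_object_created({"bucket": "media",
                   "object": {"key": "clip.mp4", "size": 7_340_032,
                              "content_type": "video/mp4"}})
on_object_created({"bucket": "media",
                   "object": {"key": "notes.txt", "size": 2_048,
                              "content_type": "text/plain"}})
print(search(lambda m: m["content_type"].startswith("video/")))  # ['clip.mp4']
```

The point of running this next to the storage is data locality: the metadata extraction happens where the object already lives, instead of hauling every object out to an external indexing farm.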
4. Open Source
Open source helps avoid lock-in and eases the adoption of new technology; this is true not just for object storage, but across the entire IT industry. Everybody wants to adopt disruptive new technology, but most of it comes from small startups or is developed as side projects by large vendors.
Open source helps organizations of all sizes adopt new technology without worrying too much about the long-term future of the project behind it (which could disappear or be acquired). Essentially, there is always an escape route which comes from forking the project or finding someone else who can support it.
At the same time, because the code is open source, the software license is free and the customer pays for a support subscription. This is a huge advantage from a financial point of view, because it moves infrastructure costs from CAPEX to OPEX and introduces a much more predictable pay-as-you-go model where you pay only for what you actually need at any given moment.
5. Ecosystem

An efficient object store is good for many non-latency-sensitive applications, but it is best seen as an infrastructure backend, with applications and clients accessing it through APIs, such as S3, or file protocols like NFS and SMB.
The ecosystem of solutions around the object store is fundamental to making its resources available to as many applications and workloads as possible. Native connectors, file gateways, application integrations, and so on ensure that the investment in the object store benefits a large part of the infrastructure, speeding up ROI.
Bonus Tip: All-Flash Ready
Even though hard disk drives still win on $/GB for capacity-driven workloads, flash storage is quickly becoming an option for a large number of use cases. When you factor in capacity density, power consumption, and speed, there are already use cases for which the TCO of an all-flash object store beats that of HDDs.
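A rough TCO comparison is easy to run yourself. Every figure below is an illustrative assumption rather than vendor pricing, and the model deliberately ignores density, cooling, and drive-replacement cycles, all factors that tend to tip the balance further toward flash:

```python
def five_year_tco_per_tb(price_per_tb: float, watts_per_tb: float,
                         kwh_price: float = 0.15) -> float:
    """Acquisition cost plus five years of 24/7 power, per usable TB."""
    hours = 5 * 365 * 24
    return price_per_tb + watts_per_tb / 1000 * hours * kwh_price

# Illustrative inputs only -- replace with real quotes.
hdd = five_year_tco_per_tb(price_per_tb=25.0, watts_per_tb=0.9)
flash = five_year_tco_per_tb(price_per_tb=60.0, watts_per_tb=0.25)
print(f"HDD:   ${hdd:.0f}/TB over 5 years")    # $31
print(f"Flash: ${flash:.0f}/TB over 5 years")  # $62
```

With these particular numbers HDDs still win, but the gap is much narrower than raw $/GB suggests; add rack space, cooling, and mid-life drive replacements to the model and the crossover arrives quickly for some workloads.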
Unfortunately, many object stores were developed before the advent of affordable flash storage, and they are not designed to take advantage of its characteristics, making all-flash configurations inefficient and practically unfeasible.
In the near future, with the introduction of QLC 3D NAND at large scale, the price per GB of flash will decrease even further. If your object store isn't ready, the entire infrastructure won't be able to take advantage of the increased performance and responsiveness of the new media, limiting the number of applications and use cases it can handle.
Object storage projects are long-term by nature. The biggest mistake you could make is to look only at the base characteristics of products without considering how to manage growth, performance, and additional workloads over time.
OpenIO is designed to be an evolutionary platform: everything is architected to be easy to use and manage at any scale and with hardware of different types, giving customers the flexibility to change cluster characteristics on the fly and respond swiftly to changing business needs.