Initial research into the concept of Object Storage dates back to 1996, and it was a Belgian company, FilePool (since acquired by EMC), that built the very first implementation of object storage, for archiving purposes. Between 1999 and 2019, hundreds of millions of dollars of venture capital were invested in solutions based on this technology. All of the storage industry’s major players devoted engineering time to it, followed later by cloud providers who also sought to jump on this promising bandwagon. With Object Storage now becoming more widespread, it is worth looking back to see which promises have been kept, and which have been forgotten along the way.
Wrestling with a wealth of data storage choices
The storage market is struggling to consolidate. It is fragmented between historical storage specialists, newcomers and startups developing innovative solutions, and cloud providers. While data now has a higher value than ever before, companies face an excess of choice. Public or private clouds, solutions tied to appliances or strict hardware constraints, software-defined storage, open source or proprietary, complex business models: honestly, it’s a struggle to navigate the maze of possibilities.
In this context, one useful way to compare the different solutions is to return to the original promises of Object Storage and see how each solution measures up. The IT world produces a steady stream of innovations: some are real technological breakthroughs, others are half-innovations buried under marketing promises. Which is Object Storage?
Just how infinite is that “infinite scalability”?
Object Storage has been so successful because it seemed to be the universal solution to many of the challenges of data storage. These challenges had historically been given less attention than those of computation, but the exponential growth in data production is changing the game and shifting the focus. Object Storage is ideal for storing unstructured data and static content. It also simplifies data security and improves data availability, offering multi-site architectures by design, efficient data replication, and versioning features that reduce the need for backup. So far, lots of promises are being kept.
But it is in the flagship promise of infinite scalability that we find our first big disappointment. Being able to increase the capacity of a storage platform in proportion to its needs, without creating a new cluster that "silos" the data when you really need to aggregate it, is a considerable step forward. But those who manage large volumes of data (we are talking dozens of petabytes or even exabytes) are beginning to hit certain limits. When the cluster grows through the addition of new machines, the platform usually has to rebalance the data to spread the load evenly across all servers.
This process may be painless on a small or sparsely populated platform, but on larger systems rebalancing can take several days, weeks, or even months, and its impact on the cluster's performance and data availability during that period can be significant. Imagine the damage to an organization that responds to an increase in usage by adding more capacity, only to find that its reward is decreased performance!
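The scale of the problem can be sketched with a toy placement model. The Python below is an illustrative assumption, not how any particular product (OpenIO included) places data: it compares naive "hash modulo server count" placement with a consistent-hashing ring (a common alternative technique), counts how many objects change server when an eleventh server joins a ten-server cluster, and converts that fraction into a rebalancing time using hypothetical figures (a 20 PB cluster, rebalancing throttled to 2 GB/s).

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_placement(key: str, nodes: list[str]) -> str:
    # Naive placement: server index = hash(key) mod number of servers.
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Each server owns `vnodes` points on the ring to smooth the load.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def place(self, key: str) -> str:
        # A key belongs to the first ring point after its hash (wrapping around).
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, before, after) -> float:
    # Share of keys whose server changes between two placement functions.
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"object-{i}" for i in range(20_000)]
old_nodes = [f"server-{i}" for i in range(10)]
new_nodes = old_nodes + ["server-10"]  # grow from 10 to 11 servers

mod_moved = moved_fraction(
    keys,
    lambda k: modulo_placement(k, old_nodes),
    lambda k: modulo_placement(k, new_nodes),
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(keys, old_ring.place, new_ring.place)

# Hypothetical cluster: 20 PB stored, rebalancing throttled to 2 GB/s.
CLUSTER_BYTES = 20e15
REBALANCE_BYTES_PER_S = 2e9

for label, fraction in (("modulo", mod_moved), ("consistent hashing", ring_moved)):
    days = CLUSTER_BYTES * fraction / REBALANCE_BYTES_PER_S / 86_400
    print(f"{label:18}: {fraction:5.1%} of objects move, ~{days:.0f} days of rebalancing")
```

Naive modulo placement relocates roughly 10 objects out of 11, while the ring relocates only the share claimed by the new server, about 1 in 11. At the assumed rate, even that good case means on the order of ten days of background traffic, and the naive case more than three months.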
For this reason, it is not uncommon for companies to give up on scaling an existing Object Storage cluster and simply build another, even if that means reluctantly creating new data silos that will be difficult to sync and connect. Such a move is heresy at the very moment when AI algorithms are finally opening up real opportunities to exploit "Big Data".
Agility without flexibility hits profitability
Agility is a concept implicitly associated with scalability, but it is worth considering in its own right. Agility refers to the ability to start with a low-capacity storage cluster and then expand it to track the life cycle of a project. We are talking here about growth by a factor of 100 or even 1,000.
The problem is, not all Object Storage solutions allow you to "start small". As a result, organizations in the market for new storage platforms who want the option to scale in the future are often forced to invest up front in multiple, robust machines and petabytes of storage capacity, even though they may not need them immediately.
Pinpointing the truth about performance
Performance is most crucial when storing databases or virtual machine file systems, but it’s important to consider the responsiveness of a system, no matter what type of data is to be stored. Unfortunately, comparing different types of storage can be difficult, because the performance of a platform is best understood in three dimensions, and the importance of each will vary according to the use case.
There is, of course, the most obvious dimension: capacity. Next comes achievable bandwidth, expressed in megabits, gigabits or terabits per second. Bandwidth matters when the Object Storage system serves as a front end for video, or holds data that will be explored by machine learning or artificial intelligence algorithms. Finally, a third dimension must be taken into account: latency, the time taken to access the first byte. For e-mail storage, with many small files that must be displayed to the user almost instantly, latency is an essential criterion.
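A back-of-the-envelope model makes the trade-off between latency and bandwidth concrete. The sketch below assumes that reading an object costs a fixed time to first byte plus its size divided by the bandwidth (converting bytes to bits, since bandwidth is quoted in bits per second); the two storage profiles and all their figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    name: str
    latency_s: float      # time to first byte, in seconds (assumed figure)
    bandwidth_bps: float  # sustained throughput, in bits per second (assumed figure)

    def read_time(self, size_bytes: float) -> float:
        # Simplified model: time to first byte + transfer time (bytes -> bits).
        return self.latency_s + size_bytes * 8 / self.bandwidth_bps


# Two hypothetical platforms: one tuned for latency, one for bandwidth.
low_latency = StorageProfile("low-latency", latency_s=0.002, bandwidth_bps=1e9)
high_bandwidth = StorageProfile("high-bandwidth", latency_s=0.050, bandwidth_bps=40e9)

workloads = {"e-mail message (50 KB)": 50_000, "video file (4 GB)": 4_000_000_000}

for label, size in workloads.items():
    for profile in (low_latency, high_bandwidth):
        print(f"{label:24} on {profile.name:14}: {profile.read_time(size):8.4f} s")
```

With these numbers, the low-latency profile returns the 50 KB message about 20 times faster, while the high-bandwidth profile delivers the 4 GB video in 0.85 seconds instead of about 32: which system is "faster" depends entirely on the workload.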
Contrary to marketing promises, it’s rare for any one solution to excel in all three dimensions of performance at once. The bad news for those in the market for storage is that by the time you realize the truth of this statement (and discover where your chosen system falls short), it's often already too late!
To demonstrate the performance and scalability of OpenIO technology, we deployed OpenIO on a cluster of 350 servers provided by Criteo Labs last September. This benchmark allowed us to pass the terabit-per-second mark for writes. Read more about the #TbpsChallenge.
Whatever happened to being hardware agnostic?
Object Storage is, by nature, software-defined storage (SDS). One of the fundamental ideas behind its development was to sever the tight bond between hardware and software layers that was such a drawback of NAS- and SAN-based storage solutions. Separating them should significantly reduce costs, because it allows organizations to use cheaper, more standard (“commodity”) servers. But is that what we actually see happening in the Object Storage market?
In fact, we’re seeing a partial return to the appliance model, with specific hardware sold alongside the control software layer. This shift is largely for the convenience of solution developers: if the hardware is predefined, there’s no need to design the software to accommodate and manage a range of different servers, each with its own technical characteristics, power and capacity. This reduces the complexity of both development and deployment.
But the impact of this shift away from the original principles of Object Storage isn’t felt only in the cost of appliances. It also slows down how quickly organizations can benefit from advances in hardware. The average storage platform is designed to have a lifespan of 5 to 7 years, during which we can expect at least 2 or 3 generations of equipment, each improving on its predecessors. To benefit from these advances quickly, you need storage software that can run on a mix of hardware types, rather than having to migrate everything with each evolution of server technology.
Cost reduction (TCO): there’s still a long way to go!
At first glance, any Object Storage solution should lower data storage costs compared to previous technologies. But as data usage trends upwards, organizations can find themselves hit hard by so-called “pay-as-you-go” business models, where cost is a function of both the volume of data stored and the rate at which it is accessed (billing based on bandwidth consumption and request volume). At the same time, migrating data becomes more difficult every day, because the cost of eventually repatriating it grows with every terabyte stored.
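To see why such bills grow with usage, here is a minimal cost model. The function names and every unit price in it are hypothetical placeholders, not any provider's actual tariff; the point is only the structure of the bill (stored volume, plus egress bandwidth, plus request counts) and the one-off egress charge for repatriating the data.

```python
# All unit prices below are hypothetical placeholders, not a real tariff.
PRICE_STORAGE_PER_GB_MONTH = 0.02
PRICE_EGRESS_PER_GB = 0.08
PRICE_PER_1K_GET = 0.0004
PRICE_PER_1K_PUT = 0.005


def monthly_bill(stored_gb: float, egress_gb: float, gets: int, puts: int) -> float:
    """Pay-as-you-go bill: stored volume + bandwidth consumed + request volume."""
    storage = stored_gb * PRICE_STORAGE_PER_GB_MONTH
    bandwidth = egress_gb * PRICE_EGRESS_PER_GB
    requests = gets / 1000 * PRICE_PER_1K_GET + puts / 1000 * PRICE_PER_1K_PUT
    return storage + bandwidth + requests


def repatriation_cost(stored_gb: float) -> float:
    """One-off egress charge for downloading everything to leave the provider."""
    return stored_gb * PRICE_EGRESS_PER_GB


STORED_GB = 500_000  # 500 TB, identical in both scenarios
quiet = monthly_bill(STORED_GB, egress_gb=5_000, gets=10_000_000, puts=1_000_000)
busy = monthly_bill(STORED_GB, egress_gb=250_000, gets=1_000_000_000, puts=50_000_000)

print(f"quiet month: ${quiet:,.0f}")
print(f"busy month:  ${busy:,.0f}")
print(f"leaving:     ${repatriation_cost(STORED_GB):,.0f} in egress alone")
```

For the same 500 TB stored, the quiet month costs about $10,400 and the busy one about $30,650, and leaving the provider would cost a further $40,000 in egress alone, a figure that rises with every terabyte added.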
In short, the current dominance of the public cloud storage model is attractive for CFOs who enjoy seeing Opex replacing Capex, but less pleasant for CTOs who find themselves trapped by their growing data.
So, has Object Storage failed to live up to expectations?
Let’s not throw the baby out with the bathwater: Object Storage is a real technological breakthrough, despite some disappointments caused by implementation choices, technological limitations and commercial strategies. But it’s important to remember that if you compared all the possible storage options on a Kiviat (spider web) diagram, few Object Storage technologies would satisfactorily cover the entire spectrum of criteria mentioned above.
Each player in the field has made their own decisions, sometimes irreversibly turning their back on some of the promises of Object Storage in the process. For those, like us, who are trying to remain true to the founding principles, the Object Storage project is not yet complete. There is still room for improvement in several areas if we are to see more widespread adoption of Object Storage. These include ergonomics, ease of deployment and management, and backward compatibility with applications written for file systems.
Technological fashions may come and go, but the challenges of data storage are not about to disappear. Neither virtualization nor containerization - the two major trends in IT - will solve everything. More than 20 years after its invention, the concept of Object Storage remains full of promise and possibilities.