Beyond Data Life Cycle Management in Object Storage
Object storage is already considered one of the cheapest forms of data storage on the market. With the increasing number of use cases for object storage, efficiency and flexibility of data placement become fundamental to ensuring that data access speed matches your needs regarding cost and workloads.
OpenIO has been working on the flexibility and efficiency of SDS since day one, and we think that customers should have no constraints when it comes to data placement. This is why we developed features that allow end users to choose the best protection mechanism, media, or cloud provider to store data. Also, thanks to user-defined policies, data can quickly be moved and accessed transparently no matter where it is placed.
It’s more than just “tiering”
I recently wrote a document calling this set of features “dynamic tiering”, and at other times I’ve referred to it as “hybrid tiering,” but neither of these terms adequately describes what OpenIO SDS can do.
First of all, there is the concept of pools. If you are already familiar with some of the design principles behind OpenIO SDS, you already know that we can use heterogeneous nodes in the cluster. This means that a node could be all disk, all flash, or a mix of different drive types. To manage load balancing and performance, we compute a quality score for each node in real time and choose the best resource in the cluster for each operation. This helps us add flexibility when it’s time for an expansion, but also makes it possible to manage several different workloads at the same time.
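To make the idea of a per-node quality score more concrete, here is a minimal sketch in Python. It is not OpenIO SDS code: the `NodeStats` fields, the geometric-mean formula, and the 0 to 100 scale are all assumptions chosen for illustration. It only shows the general pattern of scoring heterogeneous nodes and picking the best one for an operation.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical real-time metrics for one node, each in the range 0.0..1.0."""
    name: str
    cpu_idle: float    # fraction of CPU currently idle
    io_idle: float     # fraction of I/O capacity currently idle
    free_space: float  # fraction of storage capacity still free


def quality_score(stats: NodeStats) -> int:
    """Return an illustrative score from 0 to 100 (assumed formula).

    A node that is exhausted on any dimension scores 0, so it is never chosen.
    """
    if min(stats.cpu_idle, stats.io_idle, stats.free_space) <= 0:
        return 0
    # The geometric mean penalises a node that is weak on any single metric.
    mean = (stats.cpu_idle * stats.io_idle * stats.free_space) ** (1 / 3)
    return round(mean * 100)


def pick_best(nodes: list[NodeStats]) -> NodeStats:
    """Choose the highest-scoring node among those with a non-zero score."""
    candidates = [n for n in nodes if quality_score(n) > 0]
    if not candidates:
        raise RuntimeError("no node is currently able to accept the operation")
    return max(candidates, key=quality_score)


nodes = [
    NodeStats("hdd-1", cpu_idle=0.9, io_idle=0.2, free_space=0.8),
    NodeStats("ssd-1", cpu_idle=0.7, io_idle=0.9, free_space=0.5),
    NodeStats("full-1", cpu_idle=1.0, io_idle=1.0, free_space=0.0),
]
best = pick_best(nodes)
```

In this sketch the busy disk node and the full node both lose out to the flash node, which is the kind of decision a real-time score makes possible when nodes are not identical.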
Storage resources can be organized in separate pools across the cluster. This approach has several benefits, including better multitenancy, but in this particular case it can be used to specify where data is to be written. You can set a policy for almost everything: the size of an object, specific metadata fields, type of data protection applied, account, bucket/container, and so on. Again, flexibility is our mantra!
For example, you can set a policy to write files smaller than 32KB on flash, or another that checks a particular metadata tag to determine the best location, or you can configure a particular account or container to store objects only with a particular Erasure Coding scheme. Options are limited only by your imagination… and cluster resources, but the latter is not a software limitation. 🙂
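The examples above can be sketched as an ordered list of rules where the first match wins. This is an illustration only, not the OpenIO SDS policy syntax: the pool names, the `tier` metadata tag, the `backup` account, and the protection labels such as `ec-6+3` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ObjectInfo:
    """What the policy engine knows about an object at write time."""
    account: str
    container: str
    size: int  # bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Rule:
    matches: Callable[[ObjectInfo], bool]
    pool: str        # hypothetical storage pool name
    protection: str  # hypothetical data protection label


# Rules are evaluated in order; the first one that matches decides placement.
RULES = [
    # Small objects go to flash.
    Rule(lambda o: o.size < 32 * 1024, "flash", "3-copies"),
    # A metadata tag can pick the location explicitly.
    Rule(lambda o: o.metadata.get("tier") == "archive", "capacity", "ec-6+3"),
    # A whole account can be pinned to one erasure coding scheme.
    Rule(lambda o: o.account == "backup", "capacity", "ec-6+3"),
]
DEFAULT = Rule(lambda o: True, "default", "3-copies")


def place(obj: ObjectInfo) -> tuple[str, str]:
    """Return the (pool, protection) pair for an incoming object."""
    for rule in RULES:
        if rule.matches(obj):
            return rule.pool, rule.protection
    return DEFAULT.pool, DEFAULT.protection


small = place(ObjectInfo("web", "thumbs", size=4 * 1024))
tagged = place(ObjectInfo("web", "logs", size=10**6, metadata={"tier": "archive"}))
pinned = place(ObjectInfo("backup", "daily", size=10**9))
other = place(ObjectInfo("web", "videos", size=10**8))
```

Ordering matters in a design like this: a 4KB object in the `backup` account would land on flash here, because the size rule is checked first.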
Additional policies can be set to move data at a later time. Again, policies can include every object characteristic (including access time, size, metadata, and so on). Once a data policy is in place, internal processes called data movers enforce it, doing just what their name suggests. For example, one of our customers uses OpenIO SDS for email storage, and it is configured to store all new emails on SSDs (5% of overall capacity). After a week, the system checks the size of the emails and compresses the larger ones before moving all of them to a SATA-only tier. This automated operation saves a lot of capacity and improves overall efficiency.
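A data mover for the email example could look roughly like the sketch below. It is a simplified model, not the actual OpenIO SDS data mover: the `StoredObject` class and the pool names are hypothetical, and the 64KB compression threshold is an assumption, since the size that counts as a "larger" email is not stated here.

```python
import zlib
from dataclasses import dataclass

WEEK = 7 * 24 * 3600        # age after which emails leave the SSD tier, in seconds
COMPRESS_ABOVE = 64 * 1024  # assumed threshold; the real value is not given


@dataclass
class StoredObject:
    name: str
    data: bytes
    pool: str          # "ssd" or "sata" in this sketch
    created_at: float  # Unix timestamp
    compressed: bool = False


def run_mover(objects: list[StoredObject], now: float) -> int:
    """Move week-old objects from SSD to SATA, compressing the large ones first.

    Returns the number of objects moved.
    """
    moved = 0
    for obj in objects:
        if obj.pool != "ssd" or now - obj.created_at < WEEK:
            continue  # already tiered, or still too recent
        if len(obj.data) > COMPRESS_ABOVE and not obj.compressed:
            obj.data = zlib.compress(obj.data)
            obj.compressed = True
        obj.pool = "sata"
        moved += 1
    return moved


now = 10 * 24 * 3600.0
old_big = StoredObject("mail-1", b"x" * 100_000, "ssd", created_at=0.0)
old_small = StoredObject("mail-2", b"hello", "ssd", created_at=0.0)
fresh = StoredObject("mail-3", b"hello", "ssd", created_at=now - 3600)
moved = run_mover([old_big, old_small, fresh], now)
```

After the run, both week-old emails sit on the SATA tier, only the large one has been compressed, and the fresh email stays on SSD.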
But there’s more
Tiering across different storage pools in the same cluster is easy; what about doing it across different clusters, such as on the public cloud, or even on tapes?
With OpenIO SDS you can do that. A specific feature (which, again, for lack of imagination I call hybrid tiering) allows data to be moved to a secondary location while maintaining metadata locally. There are several use cases for such a feature. You can decide to build two different clusters, one for performance workloads and a second configured for capacity; in other cases you can decide to reuse a slower cluster from the competition that you already have (because usually they are slower). But there are other options, such as public clouds like Backblaze B2 or Amazon S3, or even tapes!
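The principle of moving data while keeping metadata local can be illustrated with a toy model. Everything in it is hypothetical: `HybridStore` is not an OpenIO SDS class, and the `remote` dictionary simply stands in for a second cluster, a public cloud bucket, or a tape library.

```python
class HybridStore:
    """Toy object store whose metadata always stays in the local directory."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict] = {}  # always kept locally
        self.local: dict[str, bytes] = {}    # primary cluster
        self.remote: dict[str, bytes] = {}   # stand-in for the secondary location

    def put(self, name: str, data: bytes) -> None:
        self.local[name] = data
        self.metadata[name] = {"location": "local", "size": len(data)}

    def tier_out(self, name: str) -> None:
        """Move the payload to the secondary location; metadata stays put."""
        entry = self.metadata[name]
        if entry["location"] == "remote":
            return  # nothing to do
        # Write the remote copy first, then flip the pointer, then free local space.
        self.remote[name] = self.local[name]
        entry["location"] = "remote"
        del self.local[name]

    def get(self, name: str) -> bytes:
        """Readers use the same call wherever the data currently lives."""
        entry = self.metadata[name]
        backend = self.local if entry["location"] == "local" else self.remote
        return backend[name]


store = HybridStore()
store.put("report.pdf", b"quarterly numbers")
store.tier_out("report.pdf")
payload = store.get("report.pdf")
```

Because the lookup always goes through local metadata, listing objects or checking their size never has to touch the secondary location, and the application reads the object the same way before and after it is moved.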
Not all of them have better $/GB, but sometimes you need more capacity quickly to manage seasonal spikes. You just need flexibility about where you can place your data. This is always a Good Thing.
Why stop at tiering? We can do much more than that. We are releasing a new software version right now but, at the same time, we are working on version 17.10 (OpenIO SDS has a 6-month release cycle).
I don’t want to give too many spoilers, but in the fall you can expect to get much more than just tiering from us. Most enterprise customers want options, and we want to make OpenIO SDS a data platform that can stretch across multiple clouds or, if that is easier to picture, that can extend your on-premises installation to the public cloud. Public object stores will become storage pools too. What does this really mean? Quite a lot, actually!
Again, there are many use cases for this. First of all, for smaller customers, with only a single site, it will be possible to deploy a copy of their data on a public object store (not just tiering; an exact copy of what is stored locally). Then there are customers who want to keep control over the full stack; they’ll be able to get the same identical software running on the cloud, which is great when you want to start quickly and are still not sure about the future of the application. But, above all, you’ll be able to extend the cluster between your on-premises installation and the cloud with applications that will be transparently enabled to access data dynamically, always using the shortest or the most convenient path thanks to Conscience technology.
At the end of the day, with OpenIO SDS you’ll have a complete embedded data lifecycle management platform integrated in the object store, without needing to buy additional products.
Data life cycle management for object storage is important. If the object store can deliver more performance, but you want to maintain control over cost, policy-driven tiering and dynamic data placement are the keys. Even more so, if you can choose between on-premises storage and the public cloud (or both), you will have even more options and opportunities to save money, as well as provide better service.
Once again, OpenIO SDS gives you options, hence flexibility, because its core is designed from the ground up to work this way. Its lightweight architecture, based on a distributed directory structure instead of the traditional hash table, along with Conscience technology, makes the difference in terms of flexibility, while Grid for Apps, our event-driven serverless compute framework, takes efficiency to the next level.