Storage Services vs. Data Services
I started working in the storage industry a long time ago, and I was fortunate to witness the evolution of storage systems from the first RAID systems to the present, with all the software-defined, cloud-integrated, and all-flash stuff that has become common. This is an amazing evolution, which has seen storage becoming smarter and more feature-rich.
One of the great innovations introduced in the late 1990s and early 2000s was data services. The ability to take snapshots, make virtual copies and clones of data, and, building on those, run advanced replication created a lot of opportunities for simplifying the work of sysadmins and improving storage management.
But that was it; calling them data services was wrong. They were simply storage services. Data services were not possible in the past, for many reasons. But now, things are changing pretty quickly.
If you’re in the market for a new storage array, there is a long list of features you take for granted, right? And storage services are among them. Unlimited snapshots, data replication, and even storage analytics are common now. But these services are not meant to work with data; they are designed to operate on its containers. All the intelligence and smartness are left to the upper layers of the systems.
Most of these services were born when block was the primary access method. File access added something on top, but even when the container became the file, these services never touched the content.
First attempt at data services
In recent years, a growing number of storage startups began looking more at the content than the container. The first thing they examined was analytics.
Analyzing data while it is stored or accessed is not a bad idea, and it can give you a lot of insights, not only on workloads. It can be very useful in many other use cases, including auditing, security, compliance, and more. For example, scanning files while they are saved can give you a lot of information about their content and make them easier to search, and it can also show you in real time whether you are storing something that is forbidden or should be encrypted (e.g., credit card numbers).
This is a step forward but, again, it’s still not enough if you are piling up huge amounts of data and want to automate the largest possible number of processes.
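To make the credit card example a bit more concrete, here is a minimal sketch of what a content scan at save time could look like. It is only an illustration, not how any particular product does it: the function names are made up for this post, and it assumes a simple approach of matching 13 to 19 digit sequences and then filtering them with the Luhn checksum that payment card numbers use.

```python
import re
from typing import List

# Candidate card numbers: 13 to 19 digits, optionally separated by single
# spaces or hyphens. This pattern is an assumption made for the sketch.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit sequences in text that look like valid card numbers."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "Order paid with 4111 1111 1111 1111, ref 1234 5678 9012 3456."
    print(find_card_numbers(sample))
```

A scan like this only flags candidates. What happens next (block the write, encrypt the object, raise an alert) is a policy decision that sits on top of it.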
It’s time for real data services
Large cloud providers and some innovative storage startups – including OpenIO, the company I work for – are now talking about serverless computing, and there are a couple of reasons for this. The first is that practically all these data services are based on object storage, because of the rich metadata capabilities, and the second is that most of them are triggered by events, which are much easier to generate and intercept at an object level.
The mechanism is quite simple. Any operation you do on an object generates an event, and events are passed to applications installed on the storage system. The application that receives the event also receives information on the object itself and can perform operations on it. This approach is simple and offloads a lot of activity directly to the storage system, which becomes the ultimate hyperconverged system, without the complexity of any orchestration tool, hypervisor, container, operating system, and so on. There are just applications and data.
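The mechanism can be modeled in a few lines. The sketch below is a toy, in-memory stand-in written for this post. It is not the OpenIO SDS or Grid for Apps API, and every name in it (ToyObjectStore, Event, the "object.put" event kind) is hypothetical. It only shows the shape of the idea: an operation on an object emits an event, and a registered application receives it together with information about the object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str   # hypothetical event kind, e.g. "object.put"
    name: str   # name of the object the operation touched


class ToyObjectStore:
    """In-memory toy model of an event-emitting object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.metadata: Dict[str, Dict[str, str]] = {}
        self.handlers: Dict[str, List[Callable]] = {}

    def on(self, kind: str, handler: Callable) -> None:
        """Register an application to run for a given event kind."""
        self.handlers.setdefault(kind, []).append(handler)

    def put(self, name: str, data: bytes, **metadata: str) -> None:
        """Store an object, then emit an event describing the operation."""
        self.objects[name] = data
        self.metadata[name] = dict(metadata)
        self._emit(Event("object.put", name))

    def _emit(self, event: Event) -> None:
        for handler in self.handlers.get(event.kind, []):
            handler(self, event)


def tag_size(store: ToyObjectStore, event: Event) -> None:
    """Example application: record the object's size in its metadata."""
    store.metadata[event.name]["size"] = str(len(store.objects[event.name]))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.on("object.put", tag_size)
    store.put("report.txt", b"hello", owner="alice")
    print(store.metadata["report.txt"])
```

In this toy the handler runs inline with the write. A real system would more likely queue the event and run the application asynchronously, so that the write path is not slowed down.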
We are currently working with one of our customers to simplify the management of a large video archive. By adopting OpenIO SDS (our object storage platform) and Grid for Apps (our serverless framework), we created a smart video repository that saved the customer a huge amount of work, reduced their infrastructure, automated procedures, and, eventually, simplified access to these videos.
The process is straightforward. The customer, because of processes already in place, uses OIO-FS (our file system gateway) to interact with the object store, and writes files directly to an NFS shared volume.
When a file is ingested and saved to the object store, an event is created and an application starts in order to perform a couple of operations:
- Several additional files are generated in different sizes, and the customer’s logo is superimposed in the bottom-right of each frame;
- For each new video, metadata is updated with all the necessary information coming from the additional data.
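As a rough sketch of the two steps above, the code below plans the renditions for one ingested video and the metadata that would be attached to each. It rests on several assumptions that do not come from the customer project: the list of output heights, the file naming scheme, the metadata keys, and the use of an ffmpeg command line with its scale and overlay filters to resize the video and place the logo in the bottom-right corner. The command is only built, never executed.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical output sizes; the real ones depend on the customer's needs.
RENDITION_HEIGHTS = [1080, 720, 480]


def rendition_name(source: str, height: int) -> str:
    """Derive an output object name such as clip_720p.mp4 (assumed scheme)."""
    path = PurePosixPath(source)
    return str(path.with_name(f"{path.stem}_{height}p{path.suffix}"))


def ffmpeg_command(source: str, logo: str, height: int, margin: int = 10) -> List[str]:
    """Build an ffmpeg command that resizes the video and overlays the logo.

    scale=-2:H keeps the aspect ratio; overlay=W-w-m:H-h-m places the logo
    m pixels from the bottom-right corner of each frame.
    """
    filter_graph = (
        f"[0:v]scale=-2:{height}[scaled];"
        f"[scaled][1:v]overlay=W-w-{margin}:H-h-{margin}"
    )
    return [
        "ffmpeg", "-i", source, "-i", logo,
        "-filter_complex", filter_graph,
        rendition_name(source, height),
    ]


def plan_renditions(source: str, logo: str) -> Dict[str, dict]:
    """Return, per output object, the command to run and metadata to store."""
    plan = {}
    for height in RENDITION_HEIGHTS:
        plan[rendition_name(source, height)] = {
            "command": ffmpeg_command(source, logo, height),
            "metadata": {          # hypothetical metadata keys
                "source": source,
                "height": str(height),
                "watermarked": "true",
            },
        }
    return plan


if __name__ == "__main__":
    for name, job in plan_renditions("archive/clip.mp4", "logo.png").items():
        print(name, job["metadata"])
```

An application triggered by the ingest event would run each planned command, write the result back to the object store, and then set the metadata on the new object.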
This process is asynchronous and doesn’t impact system performance; only unused resources are leveraged for encoding the new videos. At the same time, these new files become accessible through standard interfaces like S3, Swift, and direct HTTP GET requests, allowing customers to use the object store to serve videos directly to the internet and to any device without any additional components.
By combining object storage and the right serverless framework, the number of applications you can run directly on the storage system, while offloading the rest of the infrastructure, is practically endless. Some of these applications are more interesting than others at the moment (such as data analytics, index and search, video, etc.), but the future holds much more than that. Deep learning, AI, and Industrial IoT are just around the corner, and with OpenIO technology they will be available to anyone at affordable prices.
As happened before with other types of storage systems, some features are becoming common: scale-out, S3 compatibility, erasure coding, geo-replication, software-defined storage on commodity hardware, and more. Would you buy a storage platform today without these characteristics? OpenIO has all of them. In fact, that’s not even our differentiator. Grid for Apps, the unmatched flexibility we achieve thanks to our architecture design, and Conscience technology are what make our solution ready for the challenges of today and tomorrow.