By Guillaume Delaporte

A technical introduction to Grid for Apps

Understanding Event-Driven Storage

This is the first in a series of articles about the event-driven framework that is part of our SDS object storage solution. This framework lets users process data at scale; we call it Grid for Apps.

I recommend that you first read this article to understand what we describe below: Run Applications Directly on the Storage Infrastructure.

There are many use cases for Grid for Apps. Today, this technology is used for real-time video transcoding and watermarking, metadata enrichment, image recognition and manipulation, pattern recognition in images and data files, and more. But if you think of the future and the quantity of data we expect to produce, the number of use cases is even bigger, with applications in fields like industrial IoT, artificial intelligence, and big data; the only limit is your imagination.

Let’s give it a try

Let’s start with a very simple use case: adding a new metadata field to an object right after its upload. We will tackle more complex use cases in the coming weeks.

To deploy an OpenIO SDS cluster, we will use the Docker container that we provide as a quick and easy way to try the software. But the same steps apply when running OpenIO Grid for Apps on a very large platform with hundreds of nodes and billions of objects.

Retrieve the OpenIO SDS Docker container:

# docker pull openio/sds

Start your new OpenIO SDS environment:

# docker run -ti openio/sds

You should now be at the prompt with an OpenIO SDS instance up and running.

Next, we will configure the trigger, so that every time you add a new object, the data is processed and a new metadata field is added.

Add the following content to the file /etc/oio/sds/OPENIO/oio-event-agent-0/oio-event-handlers.conf:

[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[handler:storage.container.new]
pipeline = account_update

[handler:storage.container.deleted]
pipeline = account_update

[handler:storage.container.state]
pipeline = account_update

[handler:storage.chunk.new]
pipeline = volume_index

[handler:storage.chunk.deleted]
pipeline = volume_index

[filter:content_cleaner]
use = egg:oio#content_cleaner

[filter:account_update]
use = egg:oio#account_update

[filter:volume_index]
use = egg:oio#volume_index

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://127.0.0.1:6014

As you can see in the configuration file, there are many event types that can be handled (such as storage.container.new, storage.content.deleted, etc.), but for this tutorial we will focus on the storage.content.new event.

According to the configuration file, each time we put new content in the object store ([handler:storage.content.new]), we will use the pipeline "process" (pipeline = process).

The pipeline “process” will then take the event and put it in the tube oio-process in the local beanstalk instance, as described at the end of the configuration file:

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://127.0.0.1:6014
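For reference, the events pushed to the oio-process tube are JSON documents. The payload below is a hypothetical, simplified sketch containing only the fields that the add-metadata.py script further down actually reads; a real event carries additional fields (event type, timestamp, chunk information, and so on).

```python
import json

# Hypothetical sample event, limited to the fields our consumer script uses;
# the real payload emitted by OpenIO SDS contains more information.
sample_event = '''
{
    "event": "storage.content.new",
    "url": {
        "ns": "OPENIO",
        "account": "myaccount",
        "user": "mycontainer",
        "path": "fstab"
    }
}
'''

# This is exactly how the consumer script extracts the object coordinates
meta = json.loads(sample_event)
url = meta["url"]
print(url["ns"], url["account"], url["user"], url["path"])
# → OPENIO myaccount mycontainer fstab
```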

Then, restart the OpenIO event agent to apply the changes:

# gridinit_cmd restart @oio-event-agent

Your event-driven system is now up and running. The next step is to write a small script that consumes the events stored in beanstalkd and processes the corresponding objects.

Let’s create a script called add-metadata.py with the following content:

#!/usr/bin/env python
import json
from oio.api import object_storage
from oio.event.beanstalk import Beanstalk, ResponseError

# Connect to beanstalkd and watch the tube oio-process for events
b = Beanstalk.from_url("beanstalk://127.0.0.1:6014")
b.watch("oio-process")

# Wait for events
while True:
    try:
        # Reserve the event when it appears
        event_id, data = b.reserve()
    except ResponseError:
        # Or keep waiting for the next one
        continue
    # Retrieve the information from the event (namespace, bucket, object name, ...)
    meta = json.loads(data)
    url = meta["url"]
    # Initiate a connection with the OpenIO cluster
    s = object_storage.ObjectStorageAPI(url["ns"], "127.0.0.1:6006")
    # Add the metadata to the object
    s.object_update(url["account"], url["user"], url["path"], {"uploaded": "true"})
    # Delete the event
    b.delete(event_id)

Finally, launch it in the background:

# python add-metadata.py &

Please note that the script is written in Python, but you can write it in any other language.

How does it work?

It’s time to add a new object to see if it works. Using the OpenIO CLI, let’s upload the new object /etc/fstab to the container mycontainer in the account myaccount:

# openio --oio-ns OPENIO --oio-account myaccount object create mycontainer /etc/fstab

And check that the new metadata was properly set:

# openio --oio-ns OPENIO --oio-account myaccount object show mycontainer fstab

With the following result:

+---------------+----------------------------------+
| Field         | Value                            |
+---------------+----------------------------------+
| account       | myaccount                        |
| container     | mycontainer                      |
| ctime         | 1493721260                       |
| hash          | FB2B5EC6E6BC56CF7D02BE2B3D4AA5BA |
| id            | 64A81915884E0500529252884202F1CA |
| meta.uploaded | true                             |
| mime-type     | application/octet-stream         |
| object        | fstab                            |
| policy        | SINGLE                           |
| size          | 313                              |
| version       | 1493721260075114                 |
+---------------+----------------------------------+

You can see the new metadata field on the object: meta.uploaded | true.

Join us on May 24

As I mentioned above, this is the first of a series of articles that will demonstrate our Grid for Apps technology with some interesting use cases (image recognition and manipulation, pattern recognition, content indexation, and more).

We are also planning a webinar for May 24, and we’ll give you a glimpse of what you can expect from Grid for Apps in the near future. This will be the chance for you to ask all your questions about how this technology works and how you can implement it in your environment.

Here’s the link to register for the webinar.

Want to know more about OpenIO SDS?

OpenIO SDS is available for testing in four different flavors: Linux packages, the Docker image, a simple ready-to-go virtualized 3-node cluster, and Raspberry Pi.

Stay in touch with us and our community through Twitter, our Slack community channel, GitHub, our blog RSS feed, and our web forum to receive the latest news, get support, and chat with other users.

What are you waiting for? Sign up and join us on May 24 for the "Run Applications Directly on the Storage Infrastructure" webinar!