Guillaume Delaporte

Detect patterns in pictures at scale using Tensorflow and OpenIO Grid for Apps

Last week, we held a webinar explaining how the event-driven processing system integrated into our object store works. The presentation had two goals: to introduce the technology and show how it works under the hood, and to run a live demo with one of the most successful open source machine learning libraries, Tensorflow. If you missed the webinar, you can watch the recording: webinar video.

If you are familiar with OpenIO and this blog, you may already be aware of the many tasks you can perform with our event-driven storage system, like in the article where I explain How to recognize faces in pictures within our object storage solution.

In the webinar, we built a more complex workflow. Any time a picture is uploaded using the S3, Swift, or OpenIO APIs, we process it with Tensorflow to detect its content, enrich the object’s metadata with the result, and then push that metadata to Elasticsearch, so that we can search across all the objects at any time.
For the metadata indexing with Elasticsearch, I strongly recommend reading the article Simple Metadata Indexing through Grid for Apps, where we detail how it works.
Because a snippet of code is often worth a thousand words, here is the workflow that we will implement:

Let’s do it!

As in our previous article, we will use our Docker container image to easily spawn an OpenIO SDS environment. We will also deploy Elasticsearch from its Docker image.
Retrieve the latest Elasticsearch Docker image (5.4.0 at the time of writing):

# docker pull docker.elastic.co/elasticsearch/elasticsearch:5.4.0

And start an Elasticsearch instance:

# docker run -d -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" docker.elastic.co/elasticsearch/elasticsearch:5.4.0

Retrieve the OpenIO SDS Docker image:

# docker pull openio/sds

Start your new OpenIO SDS environment:

# docker run -ti openio/sds

You should now be at the prompt with an OpenIO SDS instance up and running.

Next, we will configure the trigger so that each time you add a new object, an event is pushed to a beanstalk tube (oio-process). Our script will later read this tube, enrich the object’s metadata, and push it to Elasticsearch. Add the following content to the file /etc/oio/sds/OPENIO/oio-event-agent-0/oio-event-handlers.conf:

[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[handler:storage.container.new]
pipeline = account_update

[handler:storage.container.deleted]
pipeline = account_update

[handler:storage.container.state]
pipeline = account_update

[handler:storage.chunk.new]
pipeline = volume_index

[handler:storage.chunk.deleted]
pipeline = volume_index

[filter:content_cleaner]
use = egg:oio#content_cleaner

[filter:account_update]
use = egg:oio#account_update

[filter:volume_index]
use = egg:oio#volume_index

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://127.0.0.1:6014

(If you want to learn more about this configuration file, please refer to our previous article.)
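
To make the routing in this file easier to see, here is a minimal Python sketch that parses a trimmed copy of the configuration above and lists the filters an event type goes through. It only uses the standard configparser module; the helper name describe_route is hypothetical, and the sketch assumes a pipeline is a space-separated list of filter names.

```python
import configparser

# Trimmed copy of the handler configuration shown above.
CONF = """
[handler:storage.content.new]
pipeline = process

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://127.0.0.1:6014
"""


def describe_route(conf_text, event_type):
    """Return [(filter name, filter options)] for one event type.

    Hypothetical helper: it assumes the pipeline value is a
    space-separated list of filter names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    filters = parser["handler:" + event_type]["pipeline"].split()
    return [(name, dict(parser["filter:" + name])) for name in filters]


route = describe_route(CONF, "storage.content.new")
for name, options in route:
    print(name, options)
```

Here, a new object triggers the process filter, which notifies the oio-process tube on the local beanstalk queue.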

Next, restart the OpenIO event agent to apply the change:

# gridinit_cmd restart @oio-event-agent

Your event-driven system is now up and running.

Before writing the script, we will first need to install some modules:

# yum install python-elasticsearch.noarch numpy python-pillow-qt python-pip

Then install Tensorflow:

# pip install tensorflow

We can now write the script; let’s call it detect-pattern.py.

As the script is pretty long, it was uploaded to GitHub. Download it using curl:

# curl https://raw.githubusercontent.com/open-io/oio-sds-utils/master/g4a-tensorflow.py -o detect-pattern.py

You will have to set the address of your Elasticsearch instance in the script. In my case, the IP address of my machine was 192.168.99.1, so the line reads:

es = Elasticsearch(['http://elastic:changeme@192.168.99.1:9200'])

Change it according to your environment.

Finally, launch it in the background:

# python detect-pattern.py &

Please note that the script is written in Python, but you can write it in any other language.
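
To give an idea of what such a script has to do for each event, here is a minimal sketch. It is not the code of the script downloaded above. The three callables (classify, set_properties, index_document) are hypothetical stand-ins for the Tensorflow, OpenIO, and Elasticsearch calls, and the event layout (a JSON document with an "event" type and a "url" holding the account, the container under "user", and the object name under "path") is an assumption.

```python
import json


def handle_event(raw_event, classify, set_properties, index_document):
    """Process one event read from the tube (all callables are hypothetical).

    classify(account, container, obj) -> (label, confidence)
    set_properties(account, container, obj, props) -> None
    index_document(account, container, obj, props) -> None
    """
    event = json.loads(raw_event)
    if event.get("event") != "storage.content.new":
        return None  # ignore every other event type
    url = event["url"]  # assumed layout, see the paragraph above
    account, container, obj = url["account"], url["user"], url["path"]
    label, confidence = classify(account, container, obj)
    props = {"autocategory": label, "autocategoryconfidence": str(confidence)}
    set_properties(account, container, obj, props)   # enrich the object
    index_document(account, container, obj, props)   # make it searchable
    return props


# Usage with stub callables instead of real services.
calls = []
raw = json.dumps({
    "event": "storage.content.new",
    "url": {"account": "myaccount", "user": "mycontainer",
            "path": "Volcano-OpenIO.jpg"},
})
result = handle_event(
    raw,
    classify=lambda a, c, o: ("volcano", 0.909392),
    set_properties=lambda a, c, o, p: calls.append(("set", o, p)),
    index_document=lambda a, c, o, p: calls.append(("index", o, p)),
)
print(result)
```

Passing the three steps as callables keeps the sketch independent of any particular client library; a real script would plug in its own implementations and run this function in a loop over the jobs of the oio-process tube.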

How does it work?

It’s time to add a new picture to see if it works. First of all, let’s download this picture:

# curl http://www.openio.io/wp-content/uploads/2017/06/Volcano-OpenIO.jpg -o /tmp/Volcano-OpenIO.jpg

Using the OpenIO CLI, let’s upload it to the container mycontainer in the account myaccount.

# openio --oio-ns OPENIO --oio-account myaccount object create mycontainer /tmp/Volcano-OpenIO.jpg

Well done! You just uploaded it and, in the background, its metadata was enriched with its category and indexed in Elasticsearch. Let’s check the metadata belonging to this new object:

# openio --oio-ns OPENIO --oio-account myaccount object show mycontainer Volcano-OpenIO.jpg

This returns:

+-----------------------------+----------------------------------+
| Field                       | Value                            |
+-----------------------------+----------------------------------+
| account                     | myaccount                        |
| container                   | mycontainer                      |
| ctime                       | 1496767740                       |
| hash                        | F8849240637AA145C2C8241D8D102262 |
| id                          | 2AF172654D510500DACF2D101400EA3B |
| meta.autocategory           | volcano                          |
| meta.autocategoryconfidence | 0.909392                         |
| mime-type                   | application/octet-stream         |
| object                      | Volcano-OpenIO.jpg               |
| policy                      | SINGLE                           |
| size                        | 6222                             |
| version                     | 1496767739916618                 |
+-----------------------------+----------------------------------+
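
The two meta fields above look like the top prediction of an image classifier: a label and its score. As a minimal sketch, assuming the classifier returns one score per known label, picking that top prediction could look like this. The helper name and the label list are hypothetical, and all scores except the volcano value from the output above are made up for illustration.

```python
def top_prediction(scores, labels):
    """Return the (label, score) pair with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Illustrative values only: just the "volcano" score comes from the output above.
labels = ["alp", "volcano", "geyser"]
scores = [0.051, 0.909392, 0.012]

category, confidence = top_prediction(scores, labels)
print(category, confidence)
```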

To conclude, let’s ask Elasticsearch to find all the objects whose autocategory property matches volcano (adapt the address to your own Elasticsearch instance):

# curl -XPOST 'http://elastic:changeme@192.168.99.1:9200/myaccount/mycontainer/_search?pretty' -d '{"query":{"multi_match":{"query":"volcano","fields":["properties.autocategory"]}}}'

Elasticsearch returns our newly uploaded file, since it matches the request “query”: “volcano”.
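
If you prefer to build this request from a program, here is a minimal Python sketch that produces the same path and query body as the curl command above. The helper name build_search is hypothetical. The account is used as the index and the container as the type, as in the URL above.

```python
import json


def build_search(account, container, category):
    """Return the (path, body) of a search on the autocategory property."""
    path = "/{0}/{1}/_search".format(account, container)
    body = {
        "query": {
            "multi_match": {
                "query": category,
                "fields": ["properties.autocategory"],
            }
        }
    }
    return path, body


path, body = build_search("myaccount", "mycontainer", "volcano")
print(path)
print(json.dumps(body))
```

The printed body can be sent with any HTTP client to the Elasticsearch address used in your environment.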

Want to know more?

OpenIO SDS is available for testing in four different flavors: Linux packages, the Docker image, a simple ready-to-go virtualized 3-node cluster and Raspberry Pi.

Stay in touch with us and our community through Twitter, our Slack community channel, GitHub, blog RSS feed and our web forum, to receive the latest info, support, and to chat with other users.
