OpenIO sponsored and participated in a workshop organized by The HDF Group at the European Synchrotron Radiation Facility (ESRF) in Grenoble, France. The program started with two days of presentations, followed by two days of hands-on sessions. These sessions gave us a great opportunity to test new features, make applications functional, and tweak performance. From the crowd in the room to the people presenting on stage, I quickly came to realize that the landscape of HDF5 users spans the entire high performance computing industry.
The HDF Group is a non-profit organization that was founded over 35 years ago. They organized this workshop at the ESRF to bring together the community built around HDF5, the group's flagship file format and API for managing structured datasets.
HDF5 is everywhere: in the HPC world as a file-based structure for records, in research fields to store the output of sensors, in IIoT for device metrology, in financial applications to persist captures of trade order flows, and in geospatial sciences for similar purposes.
OpenIO’s background has roots in the Cloud and the world of Telcos, so we were something of an alien in this environment, which seems to evolve at a slower rhythm than what we are used to. From the very first presentations, it became clear that the ideal of stability for HDF5, driven by the need for both backward and forward compatibility (you must always be able to open HDF5 archives, no matter how old, as is the case at NASA), imposes a slower pace, because every decision must be fully vetted.
HPC: the most demanding use cases
A majority of the presentations featured scientific research facilities facing HPC use cases. Two things really stood out for me:
- I don’t think OpenIO should try to compete in the quadrant of hot storage, no matter the bandwidth or the low latency we achieve. This niche requires mutable datasets that can handle heavy concurrent read and write workloads on the same files. The need for a “live visualization” of an experiment is also particularly important: experiments are so expensive to run that you want to stop them ASAP if they don’t progress as expected.
- The data access pattern is very typical: an initial output bursts out of the sensors and requires a huge amount of bandwidth. The need for bandwidth is so enormous that the tiers that are usually hot, the parallel file systems, are no longer fast enough, and a dedicated, very hot “burst” tier is used. A first orchestration moves the most recent data onto a parallel FS for initial processing and validation (at that stage it is read and written at the same time). Then a second orchestration moves the data onto a cold tier for archiving purposes.
Why such orchestrations? Of course, because the hot tiers are too expensive for all datasets. Because of a need to optimize the TCO of the platform. Because budgets are finite, even in the field of scientific research. You can find many thoughts about this in another blog article.
This unavoidably leads to bipolar platforms: on one side there is the cold storage for immutable archived datasets, and on the other side the hot storage for mutable datasets. In the middle you can find either an orchestrator between independent tiers, or a hierarchy that places the cold tier as secondary below the (primary) hot tier.
But the gap is huge between cold immutable tape and even the warmer shared FS. And you know what? Object Storage was developed for this purpose! In the context of HPC, OpenIO is the warm tier that will make your archives visible online, giving you maximum parallelism on requests for immutable data.
HDF Kita / OpenIO: a tight integration
The chasm of mutability is crossed thanks to HDF Kita, which exposes a connector for collections of HDF5 files stored in the cloud. The Kita architecture provides a distributed cache of objects that can be read and written, and it persists them as sets of immutable slices, one object per slice. Amazon S3 is the protocol used to plug object stores into Kita, as it has become the de facto standard. This is a good point for OpenIO, as we cover the S3 API.
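To make the "one object per slice" idea concrete, here is a minimal sketch in Python. It is not Kita's actual code: the key layout, the function names, and the in-memory dictionary standing in for an S3 bucket are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def split_into_slices(data: bytes, slice_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, payload) pairs covering data in fixed-size slices."""
    for index, offset in enumerate(range(0, len(data), slice_size)):
        yield index, data[offset:offset + slice_size]


def slice_key(dataset_id: str, index: int) -> str:
    # Hypothetical key layout; not the naming scheme Kita actually uses.
    return f"{dataset_id}/slice-{index:08d}"


def store_dataset(store: Dict[str, bytes], dataset_id: str,
                  data: bytes, slice_size: int) -> List[str]:
    """Write each slice as its own object and return the object keys."""
    keys = []
    for index, payload in split_into_slices(data, slice_size):
        key = slice_key(dataset_id, index)
        store[key] = payload  # stands in for an S3 PUT of one immutable object
        keys.append(key)
    return keys


def read_range(store: Dict[str, bytes], dataset_id: str,
               start: int, length: int, slice_size: int) -> bytes:
    """Read `length` bytes (length > 0) by fetching only the slices that overlap."""
    first = start // slice_size
    last = (start + length - 1) // slice_size
    blob = b"".join(store[slice_key(dataset_id, i)] for i in range(first, last + 1))
    skip = start - first * slice_size
    return blob[skip:skip + length]
```

Because every slice is a separate object, a partial read only touches the objects it overlaps, and those objects can be fetched in parallel from an object store.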
How it started
Working alongside HDF, we first tried a simple juxtaposition of both architectures, configuring an OpenIO S3 load balancer as the S3 endpoint in Kita. And it actually worked like this! It worked … but not efficiently enough; it merely validated the concept. In fact, the data was traveling through many load balancers and gateways on its way from the client to the storage at rest. We had introduced a significant amplification factor or, in other words, we were consuming too much bandwidth. Also, we were using two distinct platforms, each with its own burden (Kubernetes for Kita and Ansible on bare metal for OpenIO), so it was obvious that optimization was needed.
First optimization: one platform
As a first step, HDF agreed to deploy Kita services on the same host as OpenIO. This had the benefit of co-locating the Kita services responsible for contacting an S3 endpoint with the OpenIO S3 gateways. Both are stateless, so we could remove a layer of load balancers. At this point we already considered the integration tight, with no “OpenIO ingress”.
A benchmark campaign showed that we were still too bandwidth-greedy, and the ratio between the HDF5 bandwidth at the ingress and the total bandwidth usage observed was way too low. The culprit was obvious: we still had a load-balancer proxy at the ingress of the platform!
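The ratio mentioned above can be sketched with a deliberately simplified model. The assumption here (ours, not a measurement from the benchmark) is that every network hop the payload crosses re-sends the full payload, so total traffic grows linearly with the number of hops. The function names and the figures in the usage note are purely illustrative.

```python
def total_bandwidth(ingress_gbps: float, network_hops: int) -> float:
    """Total traffic on the platform if each hop re-sends the full payload."""
    if network_hops < 1:
        raise ValueError("the payload crosses at least one network hop")
    return ingress_gbps * network_hops


def efficiency(ingress_gbps: float, network_hops: int) -> float:
    """Useful (ingress) bandwidth divided by total bandwidth consumed."""
    return ingress_gbps / total_bandwidth(ingress_gbps, network_hops)
```

With these made-up numbers, `efficiency(2.0, 4)` gives 0.25 and `efficiency(2.0, 2)` gives 0.5: under this model, removing proxy hops is the only way to raise the ratio.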
Second optimization: redirection-based load-balancing
This was definitely a job for the built-in load-balancing mechanism of OpenIO's ConsciousGrid™ technology. We registered the Kita Service Nodes in the ConsciousGrid, then we applied our redirection-based load balancing, and the magic happened! We had saved precious GB/s at the cost of an added latency so small that it went unnoticed.
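The general principle of redirection-based load balancing can be sketched as follows. This is not ConsciousGrid code: the `Redirector` class, the `upload` helper, and the round-robin choice are hypothetical stand-ins, and an HTTP 307 redirect is assumed as the redirection mechanism. The point is that the balancer only answers "go there", so the payload never flows through it.

```python
import itertools
from typing import Dict, List, Tuple


class Redirector:
    """Answers every request with a redirect; it never touches the payload."""

    def __init__(self, nodes: List[str]) -> None:
        # Round-robin is a stand-in for whatever selection the real grid applies.
        self._cycle = itertools.cycle(nodes)
        self.bytes_carried = 0  # stays at 0: payloads bypass the redirector

    def handle(self, path: str) -> Tuple[int, str]:
        node = next(self._cycle)
        return 307, f"http://{node}{path}"


def upload(redirector: Redirector, received: Dict[str, int],
           path: str, payload: bytes) -> str:
    """Ask where to go, then send the payload straight to the chosen node."""
    status, location = redirector.handle(path)
    if status != 307:
        raise RuntimeError(f"expected a redirect, got {status}")
    node = location[len("http://"):].split("/", 1)[0]
    # Stands in for the direct client-to-node transfer.
    received[node] = received.get(node, 0) + len(payload)
    return node
```

The client pays one extra round trip for the redirect, which is the small latency cost, while the bandwidth that a proxy would have consumed by relaying the payload is saved entirely.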
So, this is the current state of the OpenIO-Kita integration. A pragmatic approach led us to a tight and efficient integration.
There are still more avenues for improvement:
- We could consider using an SDK instead of the S3 gateway.
- We will continue working on optimizing both OpenIO and Kita independently of each other.
Closing the Circle
Our mission has always been clear to us, and the HDF workshop at the ESRF helped us to validate it against new use cases. We know the TCO problem is a general one, and that we have an answer to help solve it.