Object Storage for HPC
Universities, scientific research institutions, and anyone running HPC workloads need tools that can handle the most demanding use cases. High Performance Computing depends on several types of storage, and high-performance storage solutions are a key element.
In most workflows, when raw data is captured from a field experiment, a ‘burst’ or ‘scratch’ buffer absorbs the extreme ingest bandwidth involved.
The data is then gathered for an initial computation stage and written to a parallel file system (such as the Lustre® file system). That technology is often the only one that combines rich semantics (POSIX), high concurrency, and high throughput.
For safekeeping, once the data has been processed it is sent to an archive (usually tape) by an orchestrator such as iRODS. Once archived, the data is immutable and secure, but it also becomes very difficult to retrieve.
The typical latency of a tape library or cold array makes archived data impractical to exploit further, which poses a real problem for scientists who want to run additional analysis on it. Doing so would require restoring the dataset onto the parallel file system, which in turn would require extra capacity on that expensive tier.
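The tiering trade-off described above can be sketched as a simple data-placement policy: active data stays on the parallel file system, finished-but-reanalysable data goes to a warm tier, and untouched data goes to tape. The thresholds and tier names below are illustrative assumptions, not values from the article:

```python
def choose_tier(days_since_last_access: int, still_mutable: bool) -> str:
    """Illustrative routing policy for a hot/warm/cold storage hierarchy.

    hot  : parallel file system (e.g. Lustre) for active computation
    warm : object store (e.g. OpenIO) for reanalysis of finished datasets
    cold : tape archive for long-term, rarely accessed data

    The 7-day and 365-day cut-offs are made-up examples.
    """
    if still_mutable or days_since_last_access < 7:
        return "hot"    # active runs stay on the parallel FS
    if days_since_last_access < 365:
        return "warm"   # finished but still analysed: object store
    return "cold"       # untouched for a year: send to tape


if __name__ == "__main__":
    print(choose_tier(2, True))     # hot
    print(choose_tier(90, False))   # warm
    print(choose_tier(400, False))  # cold
```

In a real deployment this decision would be driven by file-system scan data or an orchestrator such as iRODS rather than hard-coded thresholds.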
Object storage as warm tier for HPC
By placing a high performance object store like OpenIO in between the very hot and the cold storage tiers, researchers get the best of both worlds.
Their data becomes easily accessible for additional computations, and it no longer incurs the high costs of hot tier storage.
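Because OpenIO exposes an S3-compatible API, pushing a finished dataset from the hot tier into the warm tier can look like an ordinary S3 upload. The endpoint URL, bucket name, and key layout below are hypothetical placeholders, not part of any real deployment:

```python
import os


def warm_tier_key(project: str, run_id: str, path: str) -> str:
    """Build a predictable object key so a run can be re-fetched later.

    The <project>/<run_id>/<filename> layout is an illustrative
    convention, not an OpenIO requirement.
    """
    return f"{project}/{run_id}/{os.path.basename(path)}"


def upload_to_warm_tier(path: str, bucket: str, project: str, run_id: str,
                        endpoint: str = "http://openio.example.org:6007") -> None:
    """Copy one result file from the hot tier to the warm object store.

    Endpoint and credentials are placeholders; real values come from
    your own S3 gateway configuration.
    """
    # Imported lazily so the key helper above works without the SDK installed.
    # OpenIO speaks the S3 protocol, so the standard AWS SDK (boto3) works.
    import boto3
    s3 = boto3.client("s3", endpoint_url=endpoint)
    s3.upload_file(path, bucket, warm_tier_key(project, run_id, path))
```

A researcher could then list or fetch archived runs through any S3 tool, without a slow restore from tape onto the parallel file system.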
The ICM (Paris Brain Institute) uses a hot storage system based on Lustre for its MRI (Magnetic Resonance Imaging) data. This solution is very fast, but its cost per GB makes it inefficient for long-term retention.
Before working with OpenIO, the ICM sent its data sets directly to tape archives after doing the initial calculations on the parallel file system. But this was less than ideal, as researchers could not quickly or easily retrieve that data when they needed it for further analysis.
The ICM complemented its on-prem storage with an OpenIO warm storage pool to optimize the analysis of medical images for research purposes.
Beyond serving as a warm tier for HPC, OpenIO's high performance also makes it well suited to feeding machine learning and AI workloads.
“Object storage is the standard today because of its linear performance and its low cost per gigabyte. When we needed a warm storage brick, we chose OpenIO because it is an open source solution that is also highly scalable.”
The gap is huge between cold, immutable tape and even a warmer shared file system. In an HPC context, OpenIO is the warm tier that makes your archives visible online, giving you maximum parallelism on read requests for immutable data.