The Performance of Storage Systems: 3 Criteria to Take into Account
Many storage solutions claim to offer the best performance on the market. But it is easier to exaggerate one's own performance than to compare different storage systems objectively. And for good reason: storage performance has at least three dimensions, and the weight of each depends on the use case. Performance must also be weighed against the Total Cost of Ownership (TCO) of each solution, which is difficult to evaluate, all the more so once we add the human cost of migrating from one storage system to another.
Few organisations can afford to store their data without constraints, apart from some leading industries and research centres. What's more, the highest price does not always guarantee the best performance. Limitations appear quickly when the usage differs too much from what the system was originally designed for. Storage system performance is a fuzzy concept, and companies need to recognise this in order to see through the overly simplistic marketing that can bias their choices.
Let's start with the most obvious dimension: storage capacity. There are two aspects to consider: the volume of data that can be stored in a single system, and the ability to expand the platform's capacity to meet growing needs, known as scale-out. More specifically, it's a question of starting "small" and growing the storage cluster little by little instead of all at once, without migrating data, while keeping administration easy and performance linear.
With scalability as a key capacity criterion, Object Storage naturally appears to be the ideal solution, because in theory it has no limit. In practice, though, implementations of Object Storage do not always live up to the original promise of infinite scalability. Often, when we increase the cluster size by adding new nodes, the platform has to move data to balance the load between the servers. Such a rebalancing operation, which can take several days or weeks, even entire months, significantly affects cluster performance and therefore data availability.
The software-defined object storage technology developed by OpenIO does not require data redistribution during scaling. This is possible because we developed a technology that places data intelligently on the various cluster nodes, according to their status at the time of the write. Most other Object Storage solutions, for simplicity, distribute the data in a purely algorithmic way, regardless of the platform status. Despite the limits of certain object stores, Object Storage remains the approach that scales best when compared with block and file storage systems.
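To make the rebalancing problem concrete, here is a minimal sketch of purely algorithmic placement. It assumes the simplest possible scheme, a hash of the object key modulo the number of nodes. This is an illustrative assumption only: it is not the algorithm of OpenIO or of any particular product, and real object stores typically use more refined schemes (consistent hashing, for instance) that move less data. The function names `node_for` and `fraction_moved` are hypothetical.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Map an object key to a node index with hash-modulo placement (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Share of keys whose target node changes when the cluster is resized."""
    moved = sum(
        1 for key in keys
        if node_for(key, old_nodes) != node_for(key, new_nodes)
    )
    return moved / len(keys)


# Hypothetical workload: 10,000 objects on a cluster growing from 10 to 11 nodes.
keys = [f"object-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_nodes=10, new_nodes=11)
print(f"{moved:.0%} of objects change node when growing from 10 to 11 nodes")
```

With this naive scheme, the large majority of objects map to a different node after a single node is added, which is why placement that ignores the existing state of the cluster tends to trigger heavy data movement.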
Next, we need to consider the achievable bandwidth, measured in megabits, gigabits or terabits per second. Bandwidth is important when Object Storage is used for primary data storage, for example to store video or datasets that will be explored by machine learning or artificial intelligence algorithms. Object Storage is also a good fit for these purposes because of its distributed design, which allows data retrieval to be parallelised (provided sufficient CPU resources are available to avoid limiting performance). Moreover, the bandwidth actually achieved depends on how well the data read path is optimised, as well as on how the OS and the caching mechanisms are managed.
The choice of storage media is also important: for example, one could use SSDs in an Object Storage cluster to speed up the delivery of frequently used data, and at the same time store other types of files on less expensive media. This is what we call data tiering. Data tiering reduces costs and improves performance by positioning data on the storage device that best meets the needs of the user, taking into account the life cycle of a project, the age of the data, how frequently it's accessed, and so on.
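A tiering decision of this kind can be sketched as a small policy function. The tier names, the thresholds and the `ObjectStats` fields below are all hypothetical and chosen purely for illustration; an actual platform would expose its own storage policies.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    age_days: int          # time since the object was written
    reads_per_day: float   # recent access frequency


def choose_tier(
    stats: ObjectStats,
    hot_reads_per_day: float = 10.0,  # assumed threshold for "frequently used"
    cold_after_days: int = 90,        # assumed age after which idle data is archived
) -> str:
    """Pick a storage tier from access frequency first, then from age."""
    if stats.reads_per_day >= hot_reads_per_day:
        return "ssd"      # frequently read data goes to fast media
    if stats.age_days >= cold_after_days:
        return "archive"  # old and rarely read data goes to the cheapest media
    return "hdd"          # everything else stays on standard disks


print(choose_tier(ObjectStats(age_days=2, reads_per_day=40.0)))
print(choose_tier(ObjectStats(age_days=400, reads_per_day=0.1)))
```

Checking access frequency before age is a design choice in this sketch: an old object that becomes popular again is still served from fast media.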
Tiering relies on storage policies and can also involve choosing between different data protection levels and methods. For best performance, replication is sometimes preferred over erasure coding, which requires more computation time but in return consumes less raw space: for a level of protection equivalent to triple replication, the storage overhead of erasure coding can go down to 1.25 times the size of the data, compared with 3 times for replication.
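The 1.25 figure can be checked with a short calculation. The sketch below assumes an erasure coding layout of 8 data fragments plus 2 parity fragments, which, like triple replication, survives the loss of any two fragments or copies. The 8+2 layout is an assumption chosen because it yields 1.25; the article does not state which layout it has in mind.

```python
def replication_overhead(copies: int) -> float:
    """Raw space consumed per unit of user data with N full copies."""
    return float(copies)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw space consumed per unit of user data with a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(user_data_tb: float, overhead: float) -> float:
    """Raw capacity needed to store a given amount of user data."""
    return user_data_tb * overhead


# Both schemes below tolerate the loss of two copies or fragments.
triple = replication_overhead(3)         # 3.0
ec_8_2 = erasure_coding_overhead(8, 2)   # 1.25 (assumed 8+2 layout)

print(raw_capacity_tb(100, triple))  # raw TB needed for 100 TB with 3 copies
print(raw_capacity_tb(100, ec_8_2))  # raw TB needed for 100 TB with 8+2
```

Under these assumptions, 100 TB of user data needs 300 TB of raw capacity with triple replication but only 125 TB with the 8+2 code, at the price of the extra encoding and decoding work mentioned above.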
Finally, let's talk about latency, which is probably the least understood aspect of performance. Latency is the time needed to access the first byte of a stored file, otherwise known as the Time to First Byte (TTFB). This criterion is dominant for small files. It becomes insignificant as the size of the file grows, since the time needed to transfer the file across the network far exceeds the access time to the first byte.
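A rough model makes this trade-off visible: total access time is approximately the TTFB plus the file size divided by the available bandwidth. The figures used below (a 10 ms TTFB and a 1 Gbit/s link, i.e. 125,000,000 bytes per second) are assumptions for illustration, not measurements of any system.

```python
def total_time_s(size_bytes: int, ttfb_s: float, bandwidth_bytes_per_s: float) -> float:
    """Approximate access time: time to first byte plus transfer time."""
    return ttfb_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes: int, ttfb_s: float, bandwidth_bytes_per_s: float) -> float:
    """Fraction of the total access time that is spent waiting for the first byte."""
    return ttfb_s / total_time_s(size_bytes, ttfb_s, bandwidth_bytes_per_s)


TTFB_S = 0.010            # assumed: 10 ms time to first byte
BANDWIDTH = 125_000_000   # assumed: 1 Gbit/s expressed in bytes per second

small = latency_share(10_000, TTFB_S, BANDWIDTH)          # 10 KB object
large = latency_share(1_000_000_000, TTFB_S, BANDWIDTH)   # 1 GB object

print(f"10 KB object: latency is {small:.1%} of the access time")
print(f"1 GB object:  latency is {large:.1%} of the access time")
```

With these assumed figures, latency accounts for almost all of the access time of the 10 KB object and for a negligible fraction of the access time of the 1 GB object.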
By the same logic, when we store databases, or large files that are accessed in small blocks and partially modified (intensive reads and writes or very frequent I/O), latency is very important, as it has a direct impact on the execution speed of operations.
Latency is even more of a challenge in the context of distributed databases. Latency gains, even when measured in microseconds, are then directly perceptible in the applications. Storing emails, which means many small files that must be displayed to the user almost instantly, is another use for which latency is of paramount importance. This performance dimension is closely linked to the intrinsic performance of the storage media, to disk organisation (type of RAID, disk bays…) and to the software features on the data access path (data compression, deduplication, tiering, thin provisioning, etc.). Here, block storage and file systems, combined with hardware technologies such as all-flash arrays, SSD and NVMe, appear to be the most suitable solutions.
The very concept of Object Storage, which fragments files and distributes them across the servers of a cluster, implies an irreducible computation time to reassemble the segments of a file before it can be used. Its latency is expressed in milliseconds, compared with microseconds for latency-orientated block storage systems. And sometimes this can make all the difference.
Leaving aside bandwidth, for which Object Storage has a slight advantage, there is a clear divide between latency-orientated and capacity-orientated solutions. The former are constrained by a scale-up model, while the latter are designed for scale-out, that is, horizontal scalability. In other words, by design, latency-orientated solutions will never be suited to massive data storage: the cost per GB would be too high. Furthermore, their technical limits are reached quickly, which forces organisations to run multiple storage systems in parallel to meet growing storage needs, even though tomorrow's uses such as Big Data, Machine Learning and artificial intelligence involve breaking down businesses' data silos to create "data lakes".
So as we've seen, it's pointless to claim the best performance on the market while criticising competing solutions, especially when they don't rely on the same type of storage (block, file or object storage). The performance of a solution is the result of technical choices, justified by the ambition to serve certain uses rather than others. And this works well! A case in point is storage for virtual machines: a smart strategy is to combine block storage (for when the machine is active) and object storage (to store images when the machine is in sleep mode).