Data Repatriation: What, When, Why, and How

The public cloud adds a lot of flexibility to your IT, but it can be expensive, especially when it comes to data storage. This is why more and more customers are opting for hybrid infrastructures.
High-Perf Object Storage

One of the most interesting trends we have witnessed at OpenIO in recent years is data repatriation and the hybridization of cloud storage. In this regard there are at least two types of customers:

  1. Large companies that bought into the public cloud concept and started moving more and more new projects to the cloud, in some cases refactoring applications along the way.
  2. Startups, for whom the cloud was the only way to go, since they did not have any legacy constraints and could architect their IT with the latest technology.

In both cases, but for different reasons, when the amount of data they stored stabilized, they discovered that the cloud was too expensive. Once you no longer need the best thing about the public cloud, its convenience and flexibility, you can get almost the same level of service on premises, but at a lower price.

What is Data Repatriation?

The process of moving data back to a self-managed infrastructure, whether it is in your datacenter or on an external provider’s dedicated servers, is called data repatriation. The concept is very simple and straightforward, but there are a few things to consider before moving data back from the cloud.

Why do you want to repatriate your data?

The main reason to repatriate your data is cost. In a previous article we discussed how one of our customers saved €400K in the first year after migrating from Amazon S3 to OpenIO on dedicated servers. And this was just the first year; the savings are increasing year after year!

The main reason for these savings is egress costs. Amazon, like the vast majority of cloud service providers, does not charge for uploading data to its service, but you do pay for stored capacity and for data transfers. External data transfers (egress) out of the cloud provider’s infrastructure are particularly expensive, and they can easily make up the biggest part of your monthly bill.

And if you use data intensively, not just for archiving purposes, the cost skyrockets quickly and it can end up undermining your IT budget. When you think about the increasing number of workloads that you can run on your object store, this could really hurt your wallet and limit your ability to work with your data.
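
To make the shape of such a bill concrete, here is a minimal sketch of a monthly cost estimate. The function name and every rate in it are illustrative placeholders, not actual prices from any provider; substitute the figures from your own invoice.

```python
def monthly_cloud_cost(stored_gb, egress_gb, storage_rate_per_gb, egress_rate_per_gb):
    """Estimate a monthly object storage bill: capacity plus outbound transfer.

    Uploads (ingress) are assumed to be free, as described above.
    All rates are hypothetical inputs supplied by the caller.
    """
    storage_cost = stored_gb * storage_rate_per_gb
    egress_cost = egress_gb * egress_rate_per_gb
    return {
        "storage": storage_cost,
        "egress": egress_cost,
        "total": storage_cost + egress_cost,
    }


# Placeholder numbers: 100 TB stored, and the same volume read out each month.
bill = monthly_cloud_cost(
    stored_gb=100_000,
    egress_gb=100_000,
    storage_rate_per_gb=0.02,  # hypothetical rate
    egress_rate_per_gb=0.08,   # hypothetical rate
)
print(bill)
```

With rates shaped like these, the egress line grows with every read of the data, while the storage line grows only with the data itself. That is why intensively used data sets are the first candidates for repatriation.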

When is the best time to repatriate data?

The public cloud shows the best of its potential in the initial phase of every project. Quick provisioning and instant scaling make a huge difference when compared to traditional approaches. And if your application grows quickly and has unpredictable behavior, the public cloud is still preferable.

As soon as resource utilization trends become clearer, a cost/benefit analysis should be run to understand if, when, and what should be repatriated. Not all resources are used in the same way, and you may decide to repatriate only data while keeping compute power in the cloud. A clear example is an application with strong seasonality, where data grows at a steady pace while compute utilization varies over the course of the year. In this case, you could store data locally and access it from the cloud at a fraction of the cost of doing everything within the cloud. Today, most service providers offer high-speed dedicated connectivity to their clients, which limits latency issues. The same is true for big data lakes: a huge object store on your premises can be accessed concurrently from internal and external resources.
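
A first pass at that cost/benefit analysis can be as simple as a break-even calculation. The sketch below is a deliberate simplification: it treats the cloud bill and the on-premises running cost as flat monthly figures and ignores data growth, staffing, and hardware refresh. The function and parameter names are hypothetical, and the numbers are placeholders.

```python
import math


def break_even_months(cloud_monthly, onprem_monthly, onprem_upfront):
    """Return the number of whole months after which repatriation pays off.

    Returns None when on-premises running costs are not lower than the
    cloud bill, since the upfront investment would never be recovered.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(onprem_upfront / monthly_saving)


# Placeholder figures only; replace them with your own numbers.
months = break_even_months(
    cloud_monthly=10_000,
    onprem_monthly=4_000,
    onprem_upfront=60_000,
)
print(months)
```

Running the same calculation separately for storage and for compute is one way to see whether only the data, or the whole workload, is worth moving.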

You should always keep an eye on your expenses. Start to investigate private cloud storage options as soon as your public cloud bill exceeds what you expected.

How do you repatriate data?

Virtually all object stores are now S3 compatible but, contrary to what happens when you want to upload data, public cloud providers offer no tools of their own to help you move your data to another store. They use this as a form of customer lock-in. But, again, because everything is S3 compatible, the data can still be moved with third-party tools.

When you buy a commercial solution, you should always check to see if you can work with multiple service providers in the backend, and if the applications take care of data movements between clouds; if you built your own application, you should always consider this type of scenario.

There are several tools available that allow you to copy data between object stores. Most of them are open source. The only problem could be the time needed to perform this kind of data migration. Cold archives are easier to move than frequently accessed repositories, and, in the latter case, you might need a smarter approach. For example, if you have a serverless computing platform like OpenIO GridForApps, you can skip the initial migration and point your application directly at OpenIO. You can easily write a function that checks the old object store for objects that are not present in the new one. Then, if an object is found in the old object store, you can return the object and copy it to the new one. This is easy, transparent, and has a minimal cost, because you only pay for migrating the object when you need it.
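
The migrate-on-first-access idea described above can be sketched in a few lines. This is a plain Python illustration, not GridForApps code or any OpenIO API: `InMemoryStore` is a hypothetical stand-in for an S3-compatible bucket, and `MigratingStore` is a hypothetical wrapper name. In a real deployment the `get` and `put` calls would go through whichever S3 client you already use.

```python
class InMemoryStore:
    """Stand-in for an S3-compatible bucket (assumption for this sketch)."""

    def __init__(self, objects=None):
        self._objects = dict(objects or {})

    def get(self, key):
        # Returns None when the key is absent.
        return self._objects.get(key)

    def put(self, key, data):
        self._objects[key] = data


class MigratingStore:
    """Read-through wrapper: serve from the new store, fall back to the old one."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store

    def get(self, key):
        data = self.new.get(key)
        if data is not None:
            return data
        data = self.old.get(key)
        if data is None:
            raise KeyError(key)
        # Copy on first access so later reads never touch the old store.
        self.new.put(key, data)
        return data

    def put(self, key, data):
        # New writes go only to the new store.
        self.new.put(key, data)


old = InMemoryStore({"report.pdf": b"cold data"})
new = InMemoryStore()
store = MigratingStore(old, new)

first_read = store.get("report.pdf")   # fetched from the old store, then copied
second_read = store.get("report.pdf")  # now served by the new store
```

Each object crosses the wire from the old provider at most once, and only if somebody actually asks for it, which is what keeps the egress bill for the migration small. Objects that are never read stay behind and can be copied in bulk later or simply left to expire.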

In addition to these existing tools, our R&D team is actively working on a cloud migration tool that will migrate data from an S3-compatible cloud to an OpenIO-powered cluster. It will be available later this year, further simplifying the process of data repatriation.

Closing the circle

OpenIO is an object store that can be installed practically anywhere, on large nodes as well as on small devices or virtual machines. It’s open source and its friendly licensing model helps simplify the data repatriation process while unique features such as GridForApps can help minimize service disruption during migration.

Data repatriation is a very common topic now, as organizations replace their "cloud first" strategy with more mature hybrid and multi-cloud approaches. The flexibility and cost-effectiveness of OpenIO make it a good candidate to consider when powering these new infrastructures.
