Data Augmentation Explained, and its Benefits at the Edge
If you look at modern compute infrastructure and computing models, you can see how object storage, serverless, and even scale-out architectures can solve most of the challenges posed by edge computing and its integration with the cloud. Even more so, being able to deploy the same technology at both ends makes everything less complex and more manageable at scale.
What is Data Augmentation?
Data is a key asset for any organization today, and increasing its value is becoming a form of investment in the future. In fact, augmented data is easier to find and therefore easier to reuse. In this respect, data is like money: you can stash cash under your mattress and watch it depreciate and become less and less usable over time, or you can create more wealth by investing it consciously.
Automate Data Augmentation
The idea of data augmentation is not new, but in the past it was less relevant for businesses and more complicated to implement. Things are changing quickly, though: we have moved from gigabytes to terabytes and petabytes, and most of the data we now create is unstructured. Without the necessary information, a system to query this type of data, or at least a smart indexing system, all the information we create and store has a limited lifespan and its value decreases rapidly over time.
To be effective, data augmentation should happen during the ingestion process, exactly while you are storing the data. A process should run every time new data is added to the system, whether in real time or queued for asynchronous execution, as long as it happens. This is why I think every modern storage system capable of managing files or objects should have embedded serverless capabilities. Each new event generated at the storage layer could be intercepted, and a function (a small piece of code) could analyze the content or query external sources to augment and enrich the data. All of this would happen seamlessly and transparently to end users and applications.
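As a minimal sketch of what such an event-triggered function might look like: the event shape and field names below are assumptions for illustration (most object stores emit something similar when a new object is written), not any specific platform's API.

```python
import json

def augment_on_put(event):
    """Enrich a newly stored object with descriptive metadata.

    `event` is a hypothetical storage-layer notification fired when an
    object is written; a real function might also run content analysis
    or query an external source here.
    """
    key = event["object_key"]
    content_type = event["content_type"]

    metadata = {
        "source_key": key,
        "content_type": content_type,
        "tags": [],
    }
    # Derive simple tags from the content type; a richer pipeline could
    # extract people, locations, copyright info, and so on.
    if content_type.startswith("video/"):
        metadata["tags"].append("video")
    elif content_type.startswith("image/"):
        metadata["tags"].append("image")

    # In a real system this metadata would be written back alongside
    # the object; here we just return it.
    return metadata

event = {"object_key": "cam01/frame-0001.jpg", "content_type": "image/jpeg"}
print(json.dumps(augment_on_put(event)))
```

The point of the sketch is the shape of the mechanism: the storage layer raises the event, the function enriches the object, and neither the user nor the application has to do anything.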
Data Augmentation at the Edge
If it works at the core, in the cloud, data augmentation is even more effective at the edge!
When data is generated at the edge, there is no reason to move it to the cloud before augmenting it: doing so is expensive and anything but efficient. Furthermore, knowing the value of data before moving it to the cloud makes it possible to decide whether it is worth spending bandwidth, or even cloud resources, on it at all.
Again, storage + serverless is the key at the edge too. When new data is created, it can be analyzed immediately and relevant information added on the fly. Think about a smart surveillance camera, for example. The video stream could be analyzed while stored locally, and the system could decide what to do with it depending on the content. If your camera is placed in a city square, you might want to record everything locally but send to the cloud only footage containing people and vehicles, optimizing bandwidth and cloud storage while sending out content augmented with the information that makes it searchable.
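The camera scenario above can be sketched as a simple triage step: record everything locally, but forward only the frames whose detected labels are interesting, attaching those labels as metadata so the uploaded footage is searchable. The label set and function names are hypothetical; a real detector (and its label vocabulary) would come from whatever vision model runs on the device.

```python
# Labels worth the bandwidth and cloud storage (assumed vocabulary).
INTERESTING = {"person", "car", "truck", "bus"}

def should_upload(detected_labels):
    """Decide whether a frame is worth sending to the cloud."""
    return bool(INTERESTING & set(detected_labels))

def triage(frames):
    """Split frames into local-only IDs and upload records.

    `frames` is a list of (frame_id, detected_labels) pairs, as a
    stand-in for the output of an on-device detection model.
    """
    to_upload, local_only = [], []
    for frame_id, labels in frames:
        if should_upload(labels):
            # Augment the frame with its labels so it is searchable
            # once it lands in cloud storage.
            to_upload.append({
                "frame": frame_id,
                "labels": sorted(INTERESTING & set(labels)),
            })
        else:
            local_only.append(frame_id)
    return to_upload, local_only

frames = [
    ("f1", ["tree", "bench"]),
    ("f2", ["person", "car"]),
    ("f3", ["pigeon"]),
]
uploads, local = triage(frames)
print(len(uploads), len(local))  # prints: 1 2
```

Only one of the three frames crosses the wire, and it arrives already carrying the metadata that makes it findable later.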
Storage + serverless is a strong enabler for data augmentation. By offloading this process to the infrastructure, no matter if in the cloud or at the edge, data increases its value and even if we don’t need it today, chances are that it will be easier to find when it’s needed tomorrow.
Data augmentation simplifies and improves big data analytics, and it supports many cutting-edge applications such as machine learning. It also makes cloud-edge integration more efficient, especially if the same technology can be deployed at both ends.
Looking at other applications, data augmentation can be beneficial for many other use cases. Take the media & entertainment industry, for example. Video ingested into the storage system can be analyzed, indexed, and resampled, and information about copyright and content can be added to metadata fields, making the video self-descriptive!
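A self-descriptive asset might carry something like the record below. Every field name here is an assumption made for illustration, not any particular system's metadata schema.

```python
def describe_asset(filename, analysis):
    """Build a self-descriptive metadata record for an ingested video.

    `analysis` stands in for the output of the ingestion pipeline
    (content analysis, copyright lookup, and so on).
    """
    return {
        "filename": filename,
        "copyright": analysis.get("copyright", "unknown"),
        "scenes": analysis.get("scenes", []),
        "duration_s": analysis.get("duration_s"),
    }

meta = describe_asset(
    "clip42.mp4",
    {"copyright": "ACME Studios", "scenes": ["intro", "interview"], "duration_s": 98},
)
print(meta["copyright"])  # prints: ACME Studios
```

With records like this attached at ingestion time, the asset can be searched by rights holder or content without ever opening the video itself.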