The Fast Data paradigm is one of the latest narratives to come out of the Big Data domain, spurred on by the success of IoT and the growing demand for actionable insights at low latency. As a proponent of in-memory computing and in-memory database management systems over recent years, I’ve long sought to make the Big Data and IoT communities aware that Fast Data was on the near horizon and that simply storing data and performing batch analytics was not enough. I proposed that IoT segments, such as but not limited to the Smart Grid, would demand better data processing as well as different types of analytics, including graph analysis.
The fact is, companies have a lot of data that they simply do not know how to process effectively or how to derive value from. IoT promises to keep collecting more data, more frequently, bringing with it an increasing demand for effective processing and analytics, whether the data is Big, Micro, or Dark. The IoT domain and the growing mobility of devices have also magnified the importance of advanced data processing and analytical capabilities, such as the integration of machine learning and graph-based analytics to discover hidden relationships and the unknown unknowns. Companies are now beginning to look to Fast Data solutions to effect change and realize the value of their data. There are, of course, numerous open-source and commercial solutions on the market that can handle one or more of these Fast Data needs, and one in particular, Apache Spark, is able to handle all of them.
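To make the graph-analytics point concrete, here is a minimal sketch using Spark’s GraphX library. The device names, the links between them, and the choice of PageRank are my own illustrative assumptions rather than anything prescribed in the white paper; the aim is simply to show how structural importance, and thus hidden relationships, can be surfaced from connected device data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Illustrative sketch: model smart-grid devices and their observed links as a
// graph, then run PageRank to surface the most structurally important nodes.
object DeviceGraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("device-graph-sketch")
      .master("local[*]")              // local mode for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Vertices: (id, device name) -- hypothetical data
    val devices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "substation-A"), (2L, "meter-17"), (3L, "meter-42"), (4L, "transformer-9")
    ))

    // Edges: observed communication links between devices
    val links: RDD[Edge[Int]] = sc.parallelize(Seq(
      Edge(2L, 1L, 1), Edge(3L, 1L, 1), Edge(3L, 4L, 1), Edge(4L, 1L, 1)
    ))

    val graph = Graph(devices, links)

    // PageRank highlights devices that many others depend on -- one simple way
    // to expose relationships that are not obvious from the raw event stream.
    val ranks = graph.pageRank(tol = 0.001).vertices
    ranks.join(devices)
      .sortBy(_._2._1, ascending = false)
      .collect()
      .foreach { case (_, (rank, name)) => println(f"$name%-15s $rank%.4f") }

    spark.stop()
  }
}
```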
The concept of Fast Data is not new, although the term has only become lingua franca in recent years. Any data engineering professional who has worked in the field over the last few decades will tell you that data was fast before it was “big”, and that the community sought to conquer performance limitations, prior to Big Data, through practices such as scaling up servers, partitioning data on single nodes, and data warehousing. However, the genesis of the modern Fast Data paradigm can be traced to Big Data concepts such as the three V’s (Volume, Velocity, and Variety) as well as the distributed data processing and horizontal scale-out architectures popularized by NoSQL and NewSQL technologies. It is important to realize that Fast Data is more than supporting high-frequency data ingestion, chasing performance gains by scaling out compute and storage across a distributed cluster, or writing targeted queries. The focus of Fast Data is real-time data processing, deriving actionable insights quickly, and delivering those results immediately, all while leveraging increasingly complex analytics.
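As a rough illustration of that real-time focus, the following Spark Structured Streaming sketch aggregates a stream of readings over short event-time windows and emits results every few seconds. The rate source, the window and watermark sizes, and the trigger interval are all illustrative assumptions on my part; a production pipeline would typically read from an ingest layer such as Kafka or MQTT instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

// Illustrative sketch: continuously aggregate sensor-style readings over short
// event-time windows so insights are available seconds after the data arrives.
object FastDataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fast-data-sketch")
      .master("local[*]")              // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    // The built-in rate source generates (timestamp, value) rows; swap in a
    // real source (e.g. Kafka) for an actual deployment.
    val readings = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "100")
      .load()

    // Derive an insight as the data flows: average value per 10-second window.
    val rollingAvg = readings
      .withWatermark("timestamp", "30 seconds")
      .groupBy(window($"timestamp", "10 seconds"))
      .agg(avg($"value").as("avg_value"))

    // Emit updated results continuously rather than waiting for a batch job.
    val query = rollingAvg.writeStream
      .outputMode("update")
      .format("console")
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()

    query.awaitTermination()
  }
}
```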
In my latest white paper, “IoT & the Pervasive Nature of Fast Data & Apache Spark”, I will explore the rise of Fast Data and the concepts behind it. I will then review the state of the open-source market and Apache Spark’s ability to support the Fast Data paradigm, and examine the Lambda Architecture. I will conclude with a brief look towards the future of Fast Data as the IoT market trends towards the Fog Computing paradigm in an attempt to further reduce the latency of data processing and the delivery of analytics.