Skip to content

Key Concepts

Concept Description
Streaming Data A continuous flow of data generated by one or more sources. A data stream is often made of discrete data bundles that are transmitted one after the other. These data bundles are often referred to as events or messages.
Stream Processing A paradigm for processing streaming data. Stream processing lets users process continuous streams of data on the fly and derive results faster in near-real-time. This contrasts with conventional processing methods that require data to be stored prior to processing.
Streaming Data Integration The process that makes streaming data available for downstream destinations. Each consumer may expect data in different formats, protocols, or mediums (DB, file, Network, etc.). These requirements are addressed via Streaming Data Integration that involves converting data into different formats and publishing data in different protocols through different mediums. The data data is often processed using Stream Processing and other techniques to add value before integrating data with its consumers.
Streaming Integration In contrast to Streaming Data Integration that only focuses on making streaming data available for downstream, Streaming Integration involves integrating streaming data as well as trigger action based on data streams. The action can be a single request to a service or a complex enterprise integration flow.
Data Transformation This refers to converting data from one format to another (e.g, JSON to XML) or altering the original structure of data.
Data Cleansing This refers to filtering out corrupted, inaccurate or irrelevant data from a data stream based on one or more conditions. Data cleansing also involves modifying/replacing content to hide/remove unwanted data parts from a message (e.g., obscuring).
Data Correlation This refers to combining different data items based on relationships between data. In the context of streaming data, it is often required to correlate data across multiple streams received from different sources. The relationship can be expressed in terms of a boolean condition, sequence or a pattern of occurrences of data items.
Data Enrichment The process of adding value to raw data by enhancing and improving data by adding more information that is more consumable for users.
Data Summarization This refers to aggregating data to produce a summarized representation of raw data. In the context of Stream Processing, such aggregations can be done based on temporal criteria in real-time with moving time windows.
Real-time ETL This refers to performing data extraction, transformation, and loading in real-time. In contrast to traditional ETL that uses batch processing, Real-time ETL extracts data as and when it is available, transforms it on the fly, and integrates it.
Change Data Capture A technique or a process that makes it possible to capture changes in the data in real-time. This enables real-time ETL with databases because it allows users to receive real-time notifications when data in the database tables are changing.