09 Dec 2025
DeltaStream for Databricks: True Real-Time Streaming with Apache Flink, Now Inside the Databricks UI
For years, Databricks has been the center of gravity for data engineering and batch processing. Its unified Lakehouse approach revolutionized how organizations handle data at scale. However, when it comes to real-time data processing, many data engineers hit an architectural ceiling.
Databricks Structured Streaming, while powerful, is fundamentally built on a micro-batch architecture. For many use cases, processing data in small batches every few seconds is acceptable. But for true real-time scenarios such as fraud detection, live IoT monitoring, or dynamic pricing, sub-second latency isn’t just a "nice to have"; it’s a requirement.
Furthermore, working with streaming sources in Databricks often feels like operating in the dark. There is little to no visibility into what is actually happening in your Kafka topics, Kinesis streams, or CDC logs until you build a pipeline to land that data into a Delta table.
Today, that changes.
We are excited to announce DeltaStream for Databricks, a native in-UI integration that brings true, continuous Apache Flink streaming and real-time materialized views directly into the Databricks notebook experience. Databricks users can now explore, inspect, transform, and continuously process high-velocity Kafka/Kinesis streams and CDC data sources without ever leaving the Databricks UI.
This integration brings the power of Apache Flink, the gold standard for true stream processing, along with real-time materialized views powered by ClickHouse, right to where Databricks users already work.
This video shows how easy it is to build an end-to-end streaming pipeline that brings the power of Apache Flink into the Databricks UI:
Streaming & CDC Visibility, Right Inside Databricks
Previously, if a Databricks user wanted to know what was in a Kafka topic, they had to write a blind ingestion job, wait for a micro-batch to trigger, land the data, and then query it.
With the new DeltaStream integration, the "black box" of streaming sources is opened. Right from a Databricks Notebook, users can now use DeltaStream commands to instantly explore and inspect streaming data stores. You can see Kafka topics, Kinesis streams, and CDC tables before you write a single line of ingestion logic. This unprecedented visibility drastically accelerates development and debugging cycles.
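The inspect-before-ingest idea can be sketched in plain Python. The `peek` helper below is purely illustrative (in practice you would run DeltaStream's commands from the notebook; this is not DeltaStream's API): it samples the first few records of a stream so you can see the data's shape before writing any ingestion logic.

```python
import itertools
import json

def peek(records, n=5):
    """Return the first n records of a stream without consuming the rest.

    `records` can be any iterable -- in practice it would wrap a Kafka or
    Kinesis consumer; here it is abstract so the sketch stays self-contained.
    """
    return list(itertools.islice(records, n))

# Simulated Kafka topic: an iterator of JSON-encoded events.
topic = iter(
    json.dumps({"order_id": i, "amount": 10.0 * i}) for i in range(1_000)
)

sample = peek(topic, n=3)
for raw in sample:
    print(json.loads(raw))
```

The point is the workflow, not the helper: look at a handful of live records first, then write the pipeline.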
From Micro-Batch to True Continuous Streaming
Structured Streaming is fundamentally micro-batch. Even with low-latency configurations, workloads run as small batches, not as a continuously executing dataflow. This introduces:
- seconds (or more) of latency
- limited support for advanced event-time semantics
- operational complexity in managing state
- no ability to inspect or interact with raw streaming sources inside the UI
DeltaStream eliminates these limitations. It uses Apache Flink as its core compute engine, powering true stream processing with continuous operators and sub-second latency. Pipelines run as long-lived streaming jobs, not periodic micro-batches.
Databricks users can now take advantage of:
- Continuous event-time processing
- Stateful stream operations at scale
- Exactly-once semantics
- True low-latency outputs, not micro-batch approximations
All directly from their existing Databricks workflows.
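Event-time semantics are the key difference from wall-clock micro-batching. The toy operator below (pure Python, not Flink's API) sketches what a continuous engine does internally: assign each event to a tumbling window by its event timestamp, update per-window state on every event, and emit a window only once the watermark passes its end, which is how out-of-order events are handled correctly.

```python
from collections import defaultdict

class TumblingSum:
    """Toy continuous operator: per-window running sums, closed by watermark."""

    def __init__(self, window=10):
        self.window = window                 # window size in event-time seconds
        self.state = defaultdict(float)      # window_start -> running sum
        self.emitted = {}                    # finalized windows

    def on_event(self, event_time, value):
        start = (event_time // self.window) * self.window
        self.state[start] += value           # state updated continuously, per event

    def on_watermark(self, watermark):
        # Emit (and drop state for) every window that ended at/before the watermark.
        for start in sorted(self.state):
            if start + self.window <= watermark:
                self.emitted[start] = self.state.pop(start)

op = TumblingSum()
for t, v in [(1, 2.0), (4, 3.0), (12, 1.0), (7, 5.0)]:  # note out-of-order event at t=7
    op.on_event(t, v)
op.on_watermark(15)  # window [0, 10) is complete; [10, 20) stays open
print(op.emitted)    # -> {0: 10.0}
```

A micro-batch engine approximates this by re-running the aggregation on a schedule; a continuous engine maintains the state per event and releases results the moment the watermark allows.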
Continuously Updated Iceberg Tables (No More Staleness)
A major challenge with current Lakehouse streaming is data freshness. Even with aggressive micro-batching, data in your Lakehouse tables is always slightly stale.
The DeltaStream integration solves this by allowing you to sink the results of your Flink-powered pipelines directly into Lakehouse systems like Apache Iceberg. Because DeltaStream operates in true real-time, these Iceberg tables are continuously updated as new data is ingested.
Furthermore, DeltaStream automatically handles the maintenance and compaction of these tables.
The result? Databricks users can query these Iceberg tables with the confidence that the data is not stale by minutes or even seconds. It is truly up-to-date.
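To see why automatic compaction matters, consider that continuous commits produce many tiny data files, which are slow to scan. The greedy sketch below is only conceptual (it is not Iceberg's or DeltaStream's actual maintenance algorithm): it packs small files into roughly target-sized groups, the way a background compactor turns a stream of tiny commits into a few read-efficient files.

```python
def compact(file_sizes, target=128):
    """Greedy bin-fill compaction: pack small files into ~target-sized (MB) groups.

    Illustrative only -- shows why streaming sinks need background compaction,
    not how any particular table format implements it.
    """
    groups, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# A minute of frequent commits leaves many tiny files...
small_files = [4] * 60            # sixty 4 MB files
compacted = compact(small_files)  # ...compacted into a couple of large ones
print(len(small_files), "->", len(compacted), "files")
```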
Real-Time Analytics with ClickHouse-Powered Materialized Views
For use cases requiring extremely fast queries, DeltaStream offers real-time materialized views built on ClickHouse:
- Sub-second analytical queries
- Perfect for monitoring, anomaly detection, dashboards, and alerting
- Fully maintained in real time by Flink pipelines
This gives Databricks users a way to combine long-term lakehouse storage (Iceberg) with ultra-fast operational analytics.
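Conceptually, a streaming materialized view is an aggregate updated once per event rather than recomputed per query. The plain-Python sketch below stands in for the Flink-maintained ClickHouse view (the class and its methods are invented for illustration): writes fold each event into per-key state, so a "query" is just a lookup.

```python
from collections import defaultdict

class RunningAvgView:
    """Toy materialized view: per-key running average, maintained per event."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def apply(self, key, value):
        # Called once per incoming event by the (hypothetical) stream pipeline.
        self.count[key] += 1
        self.total[key] += value

    def query(self, key):
        # Reads are O(1): the aggregate is already maintained.
        return self.total[key] / self.count[key]

view = RunningAvgView()
events = [("sensor-a", 20.0), ("sensor-b", 5.0), ("sensor-a", 22.0)]
for key, value in events:
    view.apply(key, value)
print(view.query("sensor-a"))  # -> 21.0
```

This is why such views suit dashboards and alerting: the expensive work happens on ingest, once per event, and query latency stays flat no matter how much history has streamed through.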
The Best of Both Worlds: Databricks + DeltaStream (Flink + ClickHouse)
With this new integration, Databricks users gain a powerful combination:
- Databricks for batch, ML, analytics, notebooks, governance
- DeltaStream for real-time ingestion, true continuous stream processing, and sub-second materialized views
This is the modern real-time data stack, now directly accessible inside the Databricks environment.
The era of compromising on latency to stay within the Lakehouse ecosystem is over. By integrating DeltaStream's Flink and ClickHouse-powered capabilities directly into Databricks, we are empowering data teams to build truly real-time applications without leaving their preferred environment. It’s time to move beyond micro-batches and embrace the speed of now.
Are you using Databricks? Connect with DeltaStream and bring your streaming data to life inside Databricks.
This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.