09 Dec 2025

DeltaStream for Databricks: True Real-Time Streaming with Apache Flink, Now Inside the Databricks UI

For years, Databricks has been the center of gravity for data engineering and batch processing. Its unified Lakehouse approach revolutionized how organizations handle data at scale. However, when it comes to real-time data processing, many data engineers hit an architectural ceiling.

Databricks Structured Streaming, while powerful, is fundamentally built on a micro-batch architecture. For many use cases, processing data in small batches every few seconds is acceptable. But for true real-time scenarios such as fraud detection, live IoT monitoring, or dynamic pricing, sub-second latency isn’t just a “nice to have”; it’s a requirement.

Furthermore, working with streaming sources in Databricks often feels like operating in the dark. There is little to no visibility into what is actually happening in your Kafka topics, Kinesis streams, or CDC logs until you build a pipeline to land that data into a Delta table.

Today, that changes.

We are excited to announce DeltaStream for Databricks, a native in-UI integration that brings true, continuous Apache Flink streaming and real-time materialized views directly into the Databricks notebook experience. Databricks users can now explore, inspect, transform, and continuously process high-velocity Kafka/Kinesis streams and CDC data sources without ever leaving the Databricks UI.

This integration brings the power of Apache Flink, the gold standard for true stream processing, along with real-time materialized views powered by ClickHouse, right to where Databricks users already work.

This video shows how easy it is to build an end-to-end streaming pipeline that brings the power of Apache Flink into the Databricks UI:

Streaming & CDC Visibility, Right Inside Databricks

Previously, if a Databricks user wanted to know what was in a Kafka topic, they had to write a blind ingestion job, wait for a micro-batch to trigger, land the data, and then query it.

With the new DeltaStream integration, the “black box” of streaming sources is opened. Right from a Databricks Notebook, users can now use DeltaStream commands to instantly explore and inspect streaming data stores. You can see Kafka topics, Kinesis streams, and CDC tables before you write a single line of ingestion logic. This unprecedented visibility drastically accelerates development and debugging cycles.

From Micro-Batch to True Continuous Streaming

Structured Streaming is fundamentally micro-batch. Even with low-latency configurations, workloads run as small batches, not as a continuously executing dataflow. This introduces:

  • seconds (or more) of latency
  • limited support for advanced event-time semantics
  • operational complexity in managing state
  • no ability to inspect or interact with raw streaming sources inside the UI

DeltaStream eliminates these limitations by using Apache Flink as its core compute engine, which provides true stream processing with continuous operators and sub-second latency. Pipelines run as long-lived streaming jobs, not periodic micro-batches.
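To make the latency difference concrete, here is a minimal back-of-the-envelope sketch in Python. It is a toy model, not a benchmark of Structured Streaming or Flink: the arrival times, the 2-second trigger interval, and the fixed per-event processing time are all made-up assumptions chosen only to show where the extra wait comes from.

```python
import math


def micro_batch_latencies(arrival_times, trigger_interval, processing_time):
    """Latency per event when events are only picked up at fixed trigger times.

    Assumption: triggers fire at 0, trigger_interval, 2 * trigger_interval, ...
    and every event costs the same processing_time once its batch starts.
    """
    latencies = []
    for t in arrival_times:
        # The event waits for the first trigger boundary at or after its arrival.
        next_trigger = math.ceil(t / trigger_interval) * trigger_interval
        latencies.append((next_trigger - t) + processing_time)
    return latencies


def continuous_latencies(arrival_times, processing_time):
    """Latency per event when each event is processed as soon as it arrives."""
    return [processing_time for _ in arrival_times]


# Hypothetical arrival times in seconds.
arrivals = [0.25, 0.5, 1.75, 2.0, 3.5]

mb = micro_batch_latencies(arrivals, trigger_interval=2.0, processing_time=0.125)
ct = continuous_latencies(arrivals, processing_time=0.125)

print("micro-batch worst case:", max(mb))
print("continuous worst case: ", max(ct))
```

In this model the micro-batch latency is dominated by how long an event waits for the next trigger, which is why shrinking the trigger interval helps but never removes the wait entirely.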

Databricks users can now take advantage of:

  • Continuous event-time processing
  • Stateful stream operations at scale
  • Exactly-once semantics
  • True low-latency outputs, not micro-batch approximations

All directly from their existing Databricks workflows.
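For readers less familiar with event-time processing, the idea can be sketched in plain Python. This is a conceptual illustration only, not Flink's or DeltaStream's API: the function name, the window size, and the fixed out-of-orderness bound are assumptions made for the example. Events are grouped by the time they occurred rather than the time they arrived, and a watermark decides when a window is complete.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness):
    """Count events per key in event-time tumbling windows.

    events: iterable of (event_time, key) tuples in ARRIVAL order.
    The watermark trails the largest event time seen by allowed_lateness;
    a window is emitted once the watermark reaches the window's end.
    Returns (emitted, dropped).
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # window_start -> key -> count
    emitted, dropped = [], []
    watermark = float("-inf")

    for event_time, key in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            # The window this event belongs to has already been emitted.
            dropped.append((event_time, key))
        else:
            open_windows[window_start][key] += 1

        watermark = max(watermark, event_time - allowed_lateness)

        # Emit every window whose end is at or before the watermark.
        closable = sorted(w for w in open_windows if w + window_size <= watermark)
        for start in closable:
            for k, count in sorted(open_windows.pop(start).items()):
                emitted.append((start, k, count))

    return emitted, dropped


# Hypothetical events; (3, "a") arrives out of order but is still counted,
# while (8, "b") arrives after its window has closed and is dropped.
events = [(1, "a"), (4, "a"), (12, "b"), (3, "a"), (25, "a"), (8, "b")]
emitted, dropped = tumbling_window_counts(events, window_size=10, allowed_lateness=5)
print(emitted)
print(dropped)
```

The per-window counters are the "state" in stateful stream processing: they live across events for as long as the window is open, which is what a long-running streaming job has to manage reliably.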

Continuously Updated Iceberg Tables (No More Staleness)

A major challenge with current Lakehouse streaming is data freshness. Even with aggressive micro-batching, data in your Lakehouse tables is always slightly stale.

The DeltaStream integration solves this by allowing you to sink the results of your Flink-powered pipelines directly into Lakehouse systems like Apache Iceberg. Because DeltaStream operates in true real-time, these Iceberg tables are continuously updated as new data is ingested.

Furthermore, DeltaStream automatically handles the maintenance and compaction of these tables.

The result? Databricks users can query these Iceberg tables with the confidence that the data is not stale by minutes or even seconds. It is truly up-to-date.

Real-Time Analytics with ClickHouse-Powered Materialized Views

For use cases requiring extremely fast queries, DeltaStream offers real-time materialized views built on ClickHouse:

  • Sub-second analytical queries
  • Perfect for monitoring, anomaly detection, dashboards, and alerting
  • Fully maintained in real time by Flink pipelines

This gives Databricks users a way to combine long-term lakehouse storage (Iceberg) with ultra-fast operational analytics.
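The reason a continuously maintained view can answer queries quickly is that the aggregation work happens as each event arrives, not at query time. The sketch below illustrates that idea in plain Python. It is not how ClickHouse or DeltaStream implement materialized views; the class name and the sensor keys are hypothetical.

```python
from collections import defaultdict


class RunningAverageView:
    """Toy materialized view: per-key average, updated on every event."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def apply(self, key, value):
        # Called by the streaming pipeline for each incoming event.
        self._count[key] += 1
        self._total[key] += value

    def query(self, key):
        # Reads precomputed state; no scan over raw events is needed.
        if self._count.get(key, 0) == 0:
            return None
        return self._total[key] / self._count[key]


view = RunningAverageView()
for key, value in [("sensor-1", 20.0), ("sensor-1", 22.0), ("sensor-2", 5.0)]:
    view.apply(key, value)

print(view.query("sensor-1"))
print(view.query("sensor-3"))
```

Each query is a constant-time lookup regardless of how many events have been processed, which is the property that makes this pattern a good fit for dashboards and alerting.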

With this new integration, Databricks users gain a powerful combination:

  • Databricks for batch, ML, analytics, notebooks, governance
  • DeltaStream for real-time ingestion, true continuous stream processing, and sub-second real-time materialized views

This is the modern real-time data stack, now directly accessible inside the Databricks environment.

The era of compromising on latency to stay within the Lakehouse ecosystem is over. By integrating DeltaStream’s Flink- and ClickHouse-powered capabilities directly into Databricks, we are empowering data teams to build truly real-time applications without leaving their preferred environment. It’s time to move beyond micro-batches and embrace the speed of now.

Are you using Databricks? Connect with DeltaStream and bring your streaming data to life inside Databricks.

This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.

06 May 2025

Noisy Neighbors Begone! DeltaStream’s Solution to Stream Processing Workload Isolation

When dealing with real-time stream processing, stability and predictability are critical. Many developers have experienced the frustration of seeing important dashboards slow down or critical pipelines stall because of resource contention in shared environments.

Typically, stream processing jobs run on shared clusters. While convenient initially, shared clusters suffer from the “noisy neighbor” issue—a single inefficient or resource-intensive query can degrade performance across all jobs on the cluster. This unpredictability is a real operational pain, and can even lead to full outages.

One common workaround is deploying a separate cluster per application. While this solves the isolation issue, it introduces its own headaches—like increased operational complexity, difficulty scaling, and tedious cluster management.

DeltaStream’s Serverless Revelation: Workload Isolation Without Orchestration Agony

DeltaStream tackles this differently. Our serverless architecture automatically creates an isolated Flink cluster for each streaming query. Instead of multiple queries competing in a shared environment, each one gets dedicated resources in its own isolated space. If one query becomes problematic, it won’t affect any others.
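A toy model helps show why isolation matters. The Python sketch below is an illustration under stated assumptions, not a description of DeltaStream's scheduler: it assumes a shared cluster divides its capacity proportionally to demand, and that an isolated setup gives each query a fixed slice of its own. The query names and numbers are made up.

```python
def shared_cluster_fulfilment(demands, capacity):
    """Fraction of each query's demand that is met on one shared cluster.

    Assumption: when total demand exceeds capacity, every query is
    throttled by the same proportional factor.
    """
    total = sum(demands.values())
    scale = min(1.0, capacity / total) if total > 0 else 1.0
    return {query: scale for query in demands}


def isolated_fulfilment(demands, per_query_capacity):
    """Fraction of demand met when each query has its own dedicated capacity."""
    return {
        query: min(1.0, per_query_capacity / demand) if demand > 0 else 1.0
        for query, demand in demands.items()
    }


# Hypothetical demand in CPU cores; "backfill" is the noisy neighbor.
demands = {"dashboard": 4.0, "alerts": 4.0, "backfill": 16.0}

shared = shared_cluster_fulfilment(demands, capacity=12.0)
isolated = isolated_fulfilment(demands, per_query_capacity=4.0)

print("shared:  ", shared)
print("isolated:", isolated)
```

With the same 12 cores in total, the shared model starves the dashboard and alerts queries to half of what they need, while the isolated model confines the shortfall to the query that caused it. A platform that also scales resources per query could then grow the noisy query's allocation without touching the others.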

DeltaStream’s Key Advantages:

  • Complete Isolation: Each query runs independently. A problematic query doesn’t impact the performance or reliability of others.
  • Predictable Performance: Dedicated resources for every query mean your SLAs stay consistent and reliable.
  • Automatic Scaling: DeltaStream dynamically allocates CPU and memory resources per query, based on real-time needs—no manual tuning required.
  • Simplified Operations: No need to manage multiple clusters yourself. DeltaStream handles provisioning, scaling, and orchestration behind the scenes.

As the table illustrates, while traditional shared clusters suffer from the “noisy neighbor” problem and managing multiple clusters introduces operational overhead, DeltaStream offers a superior solution with its inherent query-level isolation and simplified serverless experience.

Embrace Stability and Focus on Your Data

DeltaStream’s approach significantly reduces the complexity and unpredictability associated with traditional stream processing setups. By providing query-level isolation and a fully-managed, serverless environment, developers and data engineers can concentrate on delivering real-time insights instead of managing infrastructure.

No more noisy neighbors, just reliable, predictable streaming analytics.
