23 Jun 2025

Min Read

Stream Smarter, Spend Less: Shift-Left with DeltaStream and Cut Snowflake Costs by 75%

Snowflake is one of the most powerful cloud data platforms available today. But as organizations increasingly rely on it to power business intelligence, AI, and real-time applications, many are discovering a costly tradeoff. Using Snowflake as the primary engine for all ELT (Extract, Load, Transform) processing can quickly balloon cloud spend.

Every Dynamic Table refresh, every intermediate table, and every downstream aggregation adds up—especially when data volumes or update frequencies grow.

This is why a growing number of modern data teams are shifting left to simplify streaming ETL and slash compute and storage bills.


Why Shift Left?

Most pipelines today follow an ELT model: extract raw data, land it in Snowflake, then transform it using tools like Dynamic Tables. While convenient, this pattern introduces hidden inefficiencies. You pay to store intermediate layers—often called Bronze and Silver tables—and rack up warehouse credits with every scheduled refresh. And because those transformations are tied to batch-based triggers, you’re often stuck waiting minutes (or longer) for updated insights.

DeltaStream offers a better way. By shifting left, teams can move from ELT to ETL, transforming data before it hits the warehouse. DeltaStream processes raw files as they land—cleaning, enriching, and aggregating them in motion. With just SQL, teams can build real-time pipelines that send only the final, analytics-ready results—Gold tables—into Snowflake. The result? A leaner, faster, and far more cost-effective architecture.

A Real-World Benchmark Using NYC Taxi Data


To prove the difference, we ran a 24-hour benchmark using NYC Yellow Taxi trip data. We tested two different pipeline strategies: one with Snowflake doing all the transformation work (ELT), and another where DeltaStream handled real-time transformations before the data reached Snowflake (ETL). In the Snowflake path, data landed in raw form and passed through multiple Dynamic Tables to get cleaned and enriched. Each table refreshed every minute, consuming compute resources even when no new data arrived. The result was a full transformation pipeline inside Snowflake—functional, but costly.

In the DeltaStream path, we ingested raw data directly from S3 into Kafka using a simple SQL statement. DeltaStream then joined and enriched the data in real time, skipping intermediate Silver tables altogether. Only the final Gold aggregates were streamed into Snowflake.

Ingest Once, Stream Forever


Both pipelines began by ingesting the same set of raw JSONL files dropped into an S3 bucket. 

  s3://aurora-demo-deltastream-e2e-s3-bucket/yellow-taxi/
  ├── yellow_taxi_2023-01.jsonl
  ├── yellow_taxi_2023-02.jsonl
  ├── ...
  └── taxi_zone_lookup.jsonl

Using DeltaStream, the ingestion process was automatic and serverless. New files were picked up as they landed, schemas were versioned and validated, and no custom Spark jobs or manual scripts were needed. In contrast to traditional batch jobs, the system was truly event-driven: ingest once, and the stream keeps flowing. Once data was ingested, we landed it into a Snowflake Bronze table using Snowpipe Streaming. 

  CREATE TABLE yt_2023_bronze
  WITH (
    'store' = 'snow_bench',
    'snowflake.db.name' = 'DEMO_DB',
    'snowflake.schema.name' = 'SNOW_BENCH'
  )
  AS SELECT * FROM yellow_taxi_2023;
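
For context, the yellow_taxi_2023 relation referenced in the statement above is the DeltaStream Stream defined over the ingested taxi records. A minimal sketch of how such a Stream might be declared is shown below; the store name, topic name, and abbreviated column list are illustrative assumptions rather than the exact definitions used in the benchmark.

  -- Sketch only: store, topic, and column names are assumptions, not the benchmark DDL
  CREATE STREAM yellow_taxi_2023 (
    VendorID BIGINT,
    tpep_pickup_datetime TIMESTAMP,
    tpep_dropoff_datetime TIMESTAMP,
    PULocationID BIGINT,
    DOLocationID BIGINT,
    trip_distance DOUBLE,
    fare_amount DOUBLE
  ) WITH (
    'store' = 'kafka_store',        -- hypothetical Kafka Store name
    'topic' = 'yellow_taxi_2023',   -- hypothetical topic fed from the S3 files
    'value.format' = 'json'
  );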

This stage was identical in both setups and created a consistent starting point. From there, however, the approaches diverged dramatically.

The Snowflake ELT Path: Functional but Expensive


Inside Snowflake, we used two Dynamic Tables to transform and enrich the data. One table cleaned the raw trip data and calculated basic metrics like trip speed. Another joined it with lookup tables to add zone and borough information. 

  • SILVER_TAXI_CLEAN: cleans up trips, calculates mph
  • SILVER_TAXI_ENRICHED: adds zone and borough names

Because Dynamic Tables refresh on a fixed schedule, these transformations ran every 60 seconds, regardless of whether new data had arrived.
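
For illustration, a Silver-layer Dynamic Table along these lines could look like the following. This is a sketch, not the benchmark's verbatim DDL: the warehouse name and the exact cleaning and speed expressions are assumptions.

  -- Illustrative sketch; warehouse name and expressions are assumptions
  CREATE OR REPLACE DYNAMIC TABLE SILVER_TAXI_CLEAN
    TARGET_LAG = '1 minute'
    WAREHOUSE  = TRANSFORM_WH
  AS
  SELECT
    *,
    trip_distance
      / NULLIF(DATEDIFF('second', tpep_pickup_datetime, tpep_dropoff_datetime), 0)
      * 3600 AS trip_mph
  FROM YT_2023_BRONZE
  WHERE trip_distance > 0
    AND tpep_dropoff_datetime > tpep_pickup_datetime;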

Next, we built three additional Dynamic Tables for analytics: one for 15-minute zone aggregates, one for hourly borough stats, and one for daily top zones. While this delivered useful business insights, the cost of constantly running these transformations was substantial. 
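
As a sketch, the 15-minute zone aggregate might be expressed as another Dynamic Table like the one below; the table, column, and warehouse names are assumptions rather than the exact benchmark objects.

  -- Illustrative sketch of a Gold-level aggregate
  CREATE OR REPLACE DYNAMIC TABLE GOLD_ZONE_15MIN
    TARGET_LAG = '1 minute'
    WAREHOUSE  = TRANSFORM_WH
  AS
  SELECT
    TIME_SLICE(tpep_pickup_datetime, 15, 'MINUTE') AS window_start,
    pu_zone,
    COUNT(*)         AS trips,
    AVG(fare_amount) AS avg_fare
  FROM SILVER_TAXI_ENRICHED
  GROUP BY 1, 2;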

The cost profile:

  • Warehouse Usage for Dynamic Table Refreshes: $17.72/day
  • Snowpipe Streaming: $0.16

Compute was always on, and storage requirements grew with every new version of the Silver and Gold tables.

The DeltaStream ETL Path: Leaner, Faster, Cheaper


In the DeltaStream path, there were no intermediate Silver tables to maintain and no refresh schedules to manage. As soon as a new file landed in S3, DeltaStream loaded it into Kafka, ran the transformation, and streamed only the final Gold aggregates into Snowflake.

  CREATE STREAM SILVER_TAXI_ENRICHED
  AS
  SELECT *
  FROM yellow_taxi_2023 AS c
  JOIN yt_lookup_cl AS pu ON c.PULocationID = pu.LocationID
  ...
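
The final step writes only the Gold aggregate into Snowflake. A hedged sketch of what that statement might look like is shown below; the windowing syntax, aggregate columns, and table name are illustrative assumptions rather than the exact benchmark query.

  -- Sketch only: window syntax, columns, and table name are assumptions
  CREATE TABLE GOLD_ZONE_15MIN
  WITH (
    'store' = 'snow_bench',
    'snowflake.db.name' = 'DEMO_DB',
    'snowflake.schema.name' = 'SNOW_BENCH'
  ) AS
  SELECT
    window_start,
    pu_zone,
    COUNT(*)         AS trips,
    AVG(fare_amount) AS avg_fare
  FROM TUMBLE(SILVER_TAXI_ENRICHED, SIZE 15 MINUTES)
  GROUP BY window_start, pu_zone;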

The cost profile:

  • Gold Table Storage: 0.64 MB
  • Warehouse Usage: $0.17 (initial load only)
  • Snowpipe Streaming: $0.40

This approach not only simplified the architecture, but eliminated unnecessary compute and storage costs.

Key Results: 75% Cost Savings!

Over 24 hours, we tracked compute and storage costs across both pipelines:

  • Snowflake storage: Silver tables = 371 MB vs Gold tables = 0.64 MB
  • Warehouse usage: $17.72/day just for refreshes vs $0.17 for the initial Gold table creation
  • Snowpipe Streaming usage: $0.16 for ELT vs $0.40 for ETL

Key Difference: 1-Minute Dynamic Table Refresh vs. Real-Time Updates

DeltaStream vs Snowflake

The bottom line: the Snowflake ELT path consumed 4x–10x more compute resources than the DeltaStream ETL path, amounting to a cost savings of more than 75%.


Bonus Benefit: DeltaStream Simplifies Streaming with SQL


Perhaps the most surprising outcome? Simplicity. With DeltaStream, you don’t need to learn Java or wrestle with Flink SDKs. You write SQL, just like in Snowflake. There’s no need to manage watermarks, orchestrate batch windows, or worry about how stream processing frameworks handle state. DeltaStream takes care of all of that—giving you clean, governed, and real-time data with far less operational burden.

You get all the power of streaming ETL—without the learning curve.

The Takeaway: Shift Left and Save Big


This benchmark confirms what many data teams are already realizing: doing everything inside Snowflake might be simple, but it’s not always efficient. With DeltaStream, you can reduce compute and storage costs, shrink latency, and streamline your architecture—all while using familiar SQL.

Shift left. Get fresher data. Cut your Snowflake bill. Stream smarter—with DeltaStream.


Want to see how much we can save you on your Snowflake bill?


Contact us for a complimentary stream assessment with DeltaStream CEO Hojjat Jafarpour.
We’ll evaluate your current architecture, identify quick wins, and deliver a custom action plan to reduce costs, simplify pipelines, and accelerate time to insight.

04 Jun 2025

Min Read

DeltaStream Fusion is Now Generally Available: Unify All Your Analytics in a Single Platform!

We are thrilled to announce the General Availability (GA) of DeltaStream Fusion, our Unified Analytics Platform! With DeltaStream Fusion, enterprises can simplify data infrastructure, lower compute costs, and accelerate insights from all data—from the fastest real-time streams to the deepest historical batches.

From Fragmentation to Fusion

Today’s businesses require real-time intelligence, but traditional fragmented analytics architectures make this challenging and costly. Separate tools for streaming, batch, and real-time analytics create operational complexity, redundant infrastructure costs, data duplication, and governance issues.

DeltaStream Fusion eliminates these silos, offering a single, powerful, serverless platform that seamlessly integrates:

  • Streaming Analytics: Leverage Apache Flink to instantly process, transform, and analyze data.
  • Batch Processing: Utilize Apache Spark for scalable, historical data analysis.
  • Real-Time Querying: Deliver blazing-fast query performance with ClickHouse to power live dashboards and applications.

Integrations

DeltaStream Fusion natively integrates with many solutions such as Apache Iceberg, Postgres and Apache Kafka, ensuring a consistent, performant lakehouse experience.

What are the benefits of Unified Analytics?

Data teams can use DeltaStream Fusion to:

Accelerate Time to Insights: Detect anomalies, personalize experiences, and respond to business events in milliseconds, not hours.

Simplify Data Stack: Consolidate tools, reduce maintenance overhead, and free up valuable engineering resources.

Empower Teams: Provide a unified SQL interface and powerful capabilities, enabling data engineers, analysts, and scientists to collaborate seamlessly.

Innovate with Confidence: Build advanced applications like real-time fraud detection, predictive IoT maintenance, and dynamic customer analytics.

DeltaStream Fusion in Action: Live at Snowflake and Databricks Summits!

Are you attending Snowflake Summit? Maybe you’re headed to the Databricks Data + AI Summit? Don’t miss your chance to see DeltaStream Fusion’s powerful capabilities firsthand! Our team is on-site, ready to demonstrate how you can unify your real-time and batch data pipelines and accelerate your analytics.

Get a live demo of DeltaStream Fusion and speak directly with our experts. Schedule a dedicated meeting to discuss your specific data challenges and how Fusion can help by heading to our contact us page.

We’re excited to show you how DeltaStream Fusion seamlessly complements your Snowflake environment to deliver a truly unified data experience.

Ready to transform your data strategy?

Request a Personalized Demo. The future of analytics is unified, and DeltaStream Fusion is leading the way. Discover how simple, powerful, and cost-effective unified analytics can be.

We are excited for you to discover the full potential of DeltaStream Fusion and look forward to the amazing innovations you will build!

04 Apr 2025

Min Read

Introducing DeltaStream Fusion: The Unified Analytics Platform

Since introducing DeltaStream, our mission has been to build a comprehensive stream processing platform that is easy to use and easy to operate. I’m excited to introduce DeltaStream Fusion—a Unified Analytics Platform—bringing together streaming, real-time, and batch analytics into a single integrated solution. DeltaStream Fusion enables users to build high-performance streaming pipelines, create real-time materialized views directly from streaming data, and perform complex batch analytics on lakehouse data—all within one platform.

With these capabilities, organizations can seamlessly handle diverse workloads, from real-time data ingestion for training applications and IoT analytics, to real-time dashboards and sophisticated batch analyses, without having to manually stitch together different platforms or create silos.

Why We Built DeltaStream Fusion

DeltaStream began as a managed, serverless platform built around Apache Flink to process, govern and share streaming data. Typically, data is ingested into streaming storage systems and made available to downstream consumers, including data lakehouses—a common destination for streaming data. Often, data moves through multiple specialized systems: streamed data is processed by streaming platforms, stored in lakehouses, then queried by separate batch analytics engines. 

Consider the common example of clickstream analytics: weblog pageview events are enriched and aggregated via a streaming pipeline, then stored and analyzed separately within a lakehouse.

We see similar patterns of fragmentation in real-time analytics, where streaming data is stored in a real-time analytics database to power live dashboards or user-facing analytics.

A fragmented analytics landscape creates many inefficiencies. Managing separate analytics stacks for streaming, batch, and real-time workloads leads to: 

  • Operational complexity, where each tool requires specialized knowledge, unique deployment methods, and dedicated infrastructure.
  • Redundant infrastructure costs, as organizations deploy multiple tools that often overlap in functionality.
  • Data duplication and synchronization issues, especially when trying to maintain consistency across disparate systems.
  • Governance and compliance challenges, as teams must enforce security and policy standards in multiple places, increasing the risk of errors or non-compliance.

A unified analytics platform capable of supporting all analytics workloads would address these challenges. This is the main motivation behind DeltaStream’s Fusion platform.

The Unified Analytics Advantage

DeltaStream Fusion brings real-time, batch, and interactive analytics together in one seamless platform—so users can go from raw data to insights without jumping between tools.

With Fusion, teams can:

  • Build real-time streaming pipelines to prep data on the fly
  • Write that data to a lakehouse for long-term storage and deeper analysis
  • Instantly query and process both streaming and batch data—all within the same platform

Fusion also makes it easy to create real-time materialized views from streaming data, so you can deliver up-to-the-second insights to dashboards, applications, or end users—without ever leaving DeltaStream.
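
For example, a real-time materialized view over a stream can be expressed in SQL along these lines; this is a minimal sketch with hypothetical relation and view names, not an excerpt from Fusion itself.

  -- Minimal sketch; pageviews and pageviews_per_page are hypothetical names
  CREATE MATERIALIZED VIEW pageviews_per_page AS
  SELECT pageid, COUNT(*) AS views
  FROM pageviews
  GROUP BY pageid;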

This unified architecture simplifies even the most complex analytics workflows. There’s no need to stitch together multiple systems or manage infrastructure. As a cloud-native, serverless solution, Fusion automatically chooses the best engine for the job:

  • Apache Flink for streaming
  • Apache Spark for batch
  • ClickHouse for low-latency queries

The diagram above shows how clickstream analytics flows through Fusion: streaming and lakehouse data are connected, processed, and queried—all in one place. Thanks to built-in real-time analytics capabilities, Fusion can deliver sub-second latency insights to power rich, responsive user experiences.

What You Can Do with DeltaStream Fusion

DeltaStream Fusion unlocks powerful analytics use cases across industries in one unified platform, without the complexity of managing separate tools or stitching together workflows.

Organizations can now build:

  • Real-time fraud detection systems that adapt as threats emerge
  • Predictive maintenance pipelines for IoT fleets based on live telemetry
  • Customer 360° analytics to personalize experiences with up-to-the-second insights
  • Financial analytics that blend real-time risk scoring with deep historical trend analysis

Moreover, businesses can deploy interactive dashboards powered by real-time materialized views, enabling instant insights for operational decisions. By integrating streaming, batch, and real-time querying together in one cohesive platform, DeltaStream Fusion gives data teams the agility to iterate faster, uncover deeper insights, and move from raw data to action in record time.

A Shift-Left in Analytics

Fusion also enables a critical shift in how organizations approach analytics—away from traditional batch-oriented platforms like Snowflake, Databricks, and Redshift, and toward continuous streaming as the default mode to process data. 

This shift brings major advantages:

  • Lower infrastructure and compute costs
  • Faster access to insights, reducing time-to-decision
  • Fewer data pipelines and systems to manage, cutting operational overhead

With native support for Apache Iceberg and seamless integration with platforms like Snowflake and Databricks, Fusion lets you unify batch and streaming into a single, governed pipeline—reducing duplication, maintaining consistency, and improving overall cloud efficiency.

Join the Early Access Program

We’re thrilled to open early access to DeltaStream Fusion, our next-generation Unified Analytics Platform. By breaking down silos and eliminating the complexity of hybrid stacks, Fusion enables organizations to build more agile, real-time data systems that drive innovation and competitive edge.

Spots are limited—sign up now to reserve your place and unlock the next generation of analytics.

06 Jan 2025

Min Read

DeltaStream: Looking Back at 2024

2024 was a pivotal year for DeltaStream, marked by unprecedented growth and technological breakthroughs. Our journey has been defined by relentless innovation, strategic advancements, and an unwavering commitment to building the best stream processing platform for data teams. Central to this mission has been our vision to help organizations shift left in their analytics—transforming data as it streams in rather than relying on traditional Extract-Load-Transform (ELT) workflows.

As we reflect on the past year, we’re excited to share the milestones that have not only shaped our trajectory but have also reinforced our core mission of making stream processing more accessible, intuitive, and powerful. The following highlights capture the essence of our transformative year:

Raising our Series A

In September, we secured $15M in Series A funding from New Enterprise Associates (NEA), Galaxy Interactive, and Sanabil Investments. This brings our total raised to $25M and will accelerate DeltaStream’s vision of providing a serverless stream processing platform to manage, secure, and process all streaming data. We are excited to have NEA, Galaxy Interactive, and Sanabil Investments as partners on this journey!

Expanding Expertise

This year, we welcomed two additions to our team: a Developer Advocate and a Head of Product Marketing. These two additions will help us bring the best product possible to streaming data teams. We are also currently expanding our engineering team; you can apply here and join our team!

Enhancing our User Interface

This year, we made a series of updates to our user interface (UI) based on user feedback. These changes are tailored to streamline real-time data processing and management, making monitoring, managing, and interacting with the platform easier. We partnered with Greydient Lab to bring our users a robust yet simplified platform to better manage data streams. Take a look at a detailed account of everything we improved upon in our UI.

Engineered for Ease

2024 saw the release of many exciting capabilities by DeltaStream. We launched our API v2, which includes Go and TypeScript drivers for the DeltaStream API. Additionally, we added a Terraform provider and self-serve private link support for MSK, along with an ever-improving SQL syntax. This coming year we have many more improvements on our roadmap that we look forward to sharing.

Enriching Integrations

Our goal has always been to find ways to serve more users, wherever their data may be. This year we joined the Confluent Partner Program and added the Confluent Kafka store to our platform. Additionally, thanks to user feedback we also added Postgres as a data store. We plan to continually add more data stores to our system so you can bring all your data into motion, wherever it may be.  

Widening our Availability

We’ve made it even easier to start stream processing by opening up our platform to an instant 14-day free trial. It’s important to us that users can easily get into our platform and start writing queries in minutes. In keeping with ease of accessibility, we are also available in the AWS Marketplace, making it simple for AWS customers to purchase and start using DeltaStream.

Made Updates to our Open Source Contributions

Recognizing the power of collaboration, we open-sourced our Snowflake connector for Apache Flink in 2023. This connector facilitates native integration between other data sources and Snowflake. Open sourcing this connector aligns with our vision of providing a unified view over all data and making stream processing possible for any product use case. We also made updates and improvements to the connector in 2024, in keeping with our commitment to the Apache Flink community.

Bundle Queries with Applications

To make stream processing simpler, more efficient, and more cost-effective, we wanted the capability of combining multiple statements together. We did this by creating a new feature called Applications, which simplifies workloads, reduces the load on stores, and reduces overall execution cost and latency.

Engaging the Community

We actively participated in various conferences and events, including Current, the Databricks Data + AI Summit, AWS re:Invent, and numerous data industry gatherings. We hosted a special networking event at Current with our partners Redpanda, ClickHouse, and Conduktor. This year we also began conducting monthly webinars and showcased DeltaStream to the broader community. Thank you to everyone we met and connected with this year!

Looking Ahead

Looking ahead to 2025, our vision remains clear and ambitious. We are dedicated to pushing the boundaries of stream processing and making sophisticated data technologies accessible to teams of all sizes and complexity.

To everyone who has been part of our journey—our customers who trust us with their most critical data needs, our partners who challenge us to innovate, and our community who inspire us every day—we extend our deepest gratitude. Together, we are not just transforming data; we are creating a more responsive, intelligent, and data-driven future.

19 Nov 2024

Min Read

Introducing DeltaStream’s New User Interface for Enhanced Stream Processing

We are excited to announce a series of updates to our user interface (UI) for DeltaStream, designed to improve usability, efficiency, and functionality. These changes are tailored to streamline real-time data processing and management, making monitoring, managing, and interacting with the platform easier for users. We partnered with Greydient Lab to bring our vision of a complete and simplified platform to life.  Here’s a look at what’s new:

Enhanced Dashboard for Real-Time Insights

We’ve revamped the dashboard to give users an at-a-glance overview of their ongoing work. You can now easily check the number of queries running, the status of your data stores, and other key metrics without diving deep into different sections. This enhancement allows for faster decision-making and better system management.

New Query Status Bar for Easier Error Tracking

To help users manage their streaming queries, we’ve added a query status bar at the top of the navigation. This feature makes it easy to quickly check for errors or issues, ensuring that problems can be resolved before they impact your data pipelines.

Detailed Activity Logs for Admins

For security and user management, we’ve introduced activity logs specifically for Security Admin and User Admin roles. This feature provides a comprehensive view of actions taken within the organization, giving admins greater control and visibility over their environment.

Centralized Resources Page

We’ve created a new Resources page where the most important objects are gathered for easier management. This consolidation allows users to access and manage key resources quickly without navigating through multiple menus or screens.

Integration Management Page

The new Integration page simplifies external integration management. Whether connecting to third-party data sources or adding external tools, you now have a centralized location to handle all your integrations.

Enhanced Workspace for Streamlined Workflows

The workspace has been redesigned to include all essential sections—file explorer, SQL editor, result, and history—on a single page. This allows users to work more efficiently, with everything they need in one view.

  • Customizable Workspace: You can toggle on or off specific sections like the file explorer, SQL editor, or result pane to focus on the parts of the workspace that matter most at any given time.
  • File Explorer Improvements: The file explorer now enables users to directly check each data store or database without navigating away from the workspace, reducing time spent moving between pages.

Revamped Query Page

Our new Query page now includes overview information and detailed query metrics. This gives users more profound insights into their queries, helping to optimize performance and better understand the behavior of their data pipelines.

Conclusion

These UI updates make real-time stream processing more intuitive, secure, and efficient. We believe these changes will help users streamline their workflows, reduce errors, and better manage their data streams. Stay tuned as we continue to improve the platform and provide you with the best tools for real-time data processing. Try it for yourself – sign up for a free 14-day trial of DeltaStream.

06 Nov 2024

Min Read

Open Sourcing our Snowflake Connector for Apache Flink

November 2024 Updates:

At DeltaStream, our mission is to bring a serverless and unified view of all streams to make stream processing possible for any product use case. By using Apache Flink as our underlying processing engine, we can leverage its rich connector ecosystem to connect to many different data systems, breaking down the barriers of siloed data. As we mentioned in our Building Upon Apache Flink for Better Stream Processing article, for DeltaStream, using Apache Flink is about more than adopting robust software with a good track record: Flink has allowed us to iterate faster on improvements and issues that arise from solving the latest and greatest data engineering challenges. However, one connector that was missing until today was the Snowflake connector.

Today, in our efforts to make solving data challenges possible, we are open sourcing our Apache Flink sink connector built for writing data to Snowflake. This connector has already provided DeltaStream with native integration between other sources of data and Snowflake. This also aligns well with our vision of providing a unified view over all data, and we want to open this project up for public use and contribution so that others in the Flink community can benefit from this connector as well.

The open-source repository will be open for contributions, suggestions, or discussions. In this article, we touch on some of the highlights of this new Flink connector.

Utilizing the Snowflake Sink

The connector uses the latest Flink Sink<InputT> and SinkWriter<InputT> interfaces to build the Snowflake sink and to write data to a configurable Snowflake table, respectively:

Diagram 1: Each SnowflakeSinkWriter inserts rows into the Snowflake table using its own dedicated ingest channel

The Snowflake sink connector can be configured with a parallelism of more than 1, where each task relies on the order of data it receives from its upstream operator. For example, the following shows how data can be written with parallelism of 3:

  // Schematic: sinkTo() takes the SnowflakeSink; each parallel subtask gets its own SnowflakeSinkWriter
  DataStream<InputT>.sinkTo(SnowflakeSink<InputT>).setParallelism(3);

Diagram 1 shows the flow of data between TaskManager(s) and the destination Snowflake table. The diagram is heavily simplified to focus on the concrete SnowflakeSinkWriter<InputT>, and it shows that each sink task connects to its Snowflake table using a dedicated SnowflakeStreamingIngestChannel from Snowpipe Streaming APIs.

The SnowflakeSink<InputT> is also shipped with a generic SnowflakeRowSerializationSchema<T> interface that allows each implementation of the sink to provide its own concrete serialization to a Snowflake row of Map<String, Object> based on a given use case.

Write Records At Least Once

The first version of the Snowflake sink can write data into Snowflake tables with the delivery guarantee of NONE or AT_LEAST_ONCE, using AT_LEAST_ONCE by default. Supporting EXACTLY_ONCE semantics is a goal for a future version of this connector.

The sink writes data into its destination table after buffering records for a fixed time interval. This buffering time interval is also bounded by Flink’s checkpointing interval, which is configured as part of the StreamExecutionEnvironment. In other words, if Flink’s checkpointing interval and buffering time are configured to be different values, then records are flushed as fast as the shorter interval:

  StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
  env.enableCheckpointing(100L);                 // checkpoint (and flush) at least every 100 ms
  SnowflakeSink<Map<String, Object>> sf_sink = SnowflakeSink.<Row>builder()
      .bufferTimeMillis(1000L)                   // buffer records for up to 1 second before flushing
      .build(jobId);
  env.fromSequence(1, 10).map(new SfRowMapFunction()).sinkTo(sf_sink);
  env.execute();

In this example, the checkpointing interval is set to 100 milliseconds, and the buffering interval is configured as 1 second.  This tells the Flink job to flush the records at least every 100 milliseconds, i.e., on every checkpoint.

Read more about Snowpipe Streaming best practices in the Snowflake documentation.

We are very excited about the opportunity to contribute our Snowflake connector to the Flink community. We’re hoping this connector will add more value to the rich connector ecosystem of Flink that’s powering many data application use cases. If you want to check out the connector for yourself, head over to the GitHub repository. Or if you want to learn more about DeltaStream’s integration with Snowflake, read our Snowflake integration blog.

The Road to Raising DeltaStream’s Series A

DeltaStream has secured $15M Series A funding

Today, I’m excited to announce DeltaStream has secured $15M in Series A funding from New Enterprise Associates (NEA), Galaxy Interactive, and Sanabil Investments. This brings our total raised to $25M and will accelerate DeltaStream’s vision of providing a serverless stream processing platform to manage, secure, and process all streaming data.

The Beginnings of Our Story

I joined Confluent in 2016 because I was given the opportunity to build a SQL layer on top of Kafka so users could build streaming applications in SQL. ksqlDB was the product we created, and it was one of the first SQL processing layers on top of Apache Kafka. While ksqlDB was a significant first step, it had limitations, including being too tightly coupled with Kafka, only working with a single Kafka cluster, and creating a lot of network traffic on the Kafka cluster.

The need for a next-generation stream processing platform was obvious, and it had to be a completely new platform, so it made sense to start from scratch. Starting DeltaStream was the beginning of a journey to revolutionize the way organizations manage and process streaming data. The challenges we set out to solve were:

  1. Ease of use: writing SQL queries is all a user should have to worry about
  2. Have a single data layer to process/analyze all streaming and batch data, including, for example, data from Kafka, Kinesis, Postgres, and Snowflake
  3. Standardize and authorize access to all data
  4. Enable high-scale and resiliency
  5. Flexible deployment models

Building DeltaStream

At DeltaStream, Apache Flink is our processing/computing engine. Apache Flink has emerged as the gold standard platform for stream processing with proven capabilities and a large and vibrant community. It’s a foundational piece of our platform, but there’s much more. Here is how we solved the challenges outlined above:

Ease of use

We have abstracted away the complexities of running Apache Flink and made it serverless. Users don’t have to think about infrastructure and can instead focus on writing queries. DeltaStream handles all the operations, including fault tolerance and elasticity.

Single Data Layer

DeltaStream can read across many modern streaming stores, databases, and data lakes. We then organize this data into a logical hierarchy, making it easy to analyze and process the underlying data. 

Standardize Access

We built a governance layer to manage access through fine-grained permissions across all data rather than across disparate data stores. For example, you would manage access to data in your Kafka clusters and Kinesis streams all within DeltaStream.

Enable High Scale and Resiliency

Each DeltaStream query is run in isolation, eliminating the “noisy neighbor” problem. Queries can be scaled up/down independently.

Flexible Deployment Models

In addition to our cloud service, we provide BYOC for companies that want more control of their data. This is essential for highly regulated industries and companies with strict data security policies. 

Also, with DeltaStream, we wanted to go beyond Flink and provide a full suite of analytics by enabling users to build real-time materialized views with sub-second latency. 

What’s next for DeltaStream

We’re just getting started. Here are a few things we’re planning:

  • Increase the number of stores we can read from and write to. This includes platforms such as Apache Iceberg and ClickHouse.
  • Increase the number of clients/adapters we support, including dbt and Python.
  • Multi-Cloud
  • Leverage AI to enable users with no SQL knowledge to interact with DeltaStream

If you are a streaming and real-time data enthusiast and would like to help build the future of streaming data, please reach out to us; we are hiring for engineering and GTM roles!

If you want to experience how DeltaStream enables users to maximize the value of their streaming data, try it for yourself by heading to deltastream.io.

Finally, I would like to thank our customers, community, team, and partners—including our investors—for their unwavering support. Together, we are making stream processing a reality for organizations of all sizes.

17 Sep 2024

Min Read

DeltaStream Raises $15M in Series A Funding to Deliver the Future of Real-Time Stream Processing in Cloud

DeltaStream, Inc., the serverless stream processing platform, today announced it has raised $15M in Series A funding from New Enterprise Associates (NEA), Galaxy Interactive and Sanabil Investments. The funding will accelerate the company’s vision of enabling a complete serverless stream processing platform to manage, secure and process all streaming data, irrespective of data source or pipeline.

Event streaming platforms such as Apache Kafka have become essential in today’s data-driven industries, with many sectors adopting real-time data streaming. As AI advances, real-time data is becoming even more critical for applications.

DeltaStream, founded by Hojjat Jafarpour, CEO and creator of ksqlDB, helps organizations build real-time streaming applications and pipelines with SQL in minutes. The platform leverages the power of Apache Flink© while simplifying its complexity, making it accessible to businesses of all sizes–from startups to large enterprises. DeltaStream is available as a fully managed service or a bring-your-own-cloud (BYOC) option.

“Streaming data is essential for modern apps, but it’s been difficult and costly to get value from it. DeltaStream’s serverless platform simplifies infrastructure management, allowing users to focus on building their applications,” said Hojjat Jafarpour. “To fully benefit from real-time data, customers need an organized view across all data stores, role-based access control, and secure sharing of real-time data. What Databricks and Snowflake did for stored data, DeltaStream does for streaming data.”

DeltaStream also integrates with Databricks and Snowflake, letting customers create real-time pipelines that quickly move data from streaming platforms like Apache Kafka to these systems.

“We’re seeing rapid adoption of streaming data and the rise of platforms such as Apache Flink©,” NEA partner and DeltaStream board member Aaron Jacobson explained. “However, one of the main challenges of using such systems has been their operational complexity. With DeltaStream, users have the power of Apache Flink© without having to deal with its complexity, resulting in significantly more cost-effective and accelerated time to market for real-time data applications.”

“DeltaStream is leading the charge in real-time streaming, where speed, low latency, and intelligent decision-making are critical for businesses to maintain a competitive edge,” said Jeff Brown, Partner at Galaxy Interactive. “Their enterprise-grade, secure, and scalable solution simplifies complex stream processing, allowing teams to focus on deriving insights rather than managing infrastructure.”

About DeltaStream:

DeltaStream’s innovative stream processing platform harnesses the power of Apache Flink© to make processing real-time data simple. Furthermore, the platform provides governance, organization, and secure sharing capabilities for streaming data across all streaming storage platforms, including Apache Kafka, Apache Pulsar, AWS Kinesis, and many more. In addition to its SaaS offering, DeltaStream’s platform is also available as a private SaaS (also known as Bring Your Own Cloud) deployment to address the needs of regulated industries with high data privacy and security requirements. DeltaStream’s platform seamlessly integrates with both the Databricks and Snowflake platforms, enabling customers to build real-time data pipelines that make data available in Databricks and Snowflake seconds after it is available in streaming platforms such as Apache Kafka.
https://www.deltastream.io/

DeltaStream is exhibiting this week at the Current Conference in Austin, Texas. 

14 May 2024

Min Read

DeltaStream Joins the Connect with Confluent Partner Program

We’re excited to share that DeltaStream has joined the Connect with Confluent technology partner program.

Why this partnership matters

Confluent is a leader in streaming data technology, used by many industry professionals. This collaboration enables organizations to process and organize their Confluent Cloud data streams easily and efficiently from within DeltaStream. This breaks down silos and opens up powerful insights into your streaming data, the way it should be.

Build real-time streaming applications with DeltaStream

DeltaStream is a fully managed stream processing platform that enables users to deploy streaming applications in minutes, using simple SQL statements. By integrating with Confluent Cloud and other streaming storage systems, DeltaStream users can easily process and organize their streaming data in Confluent or wherever else their data may live. Powered by Apache Flink, DeltaStream users can get the processing capabilities of Flink without any of the overhead it comes with.

Unified view over multiple streaming stores

DeltaStream enables you to have a single view into all your streaming data across all your streaming stores. Whether you are using one Kafka cluster, multiple Kafka clusters, or multiple platforms like Kinesis and Confluent, DeltaStream provides a unified view of the streaming data and you can write queries on these streams regardless of where they are stored.
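
As a simple illustration, a single query can join Relations whose underlying topics live in different Stores. In the sketch below, the relation names are hypothetical: pageviews is assumed to be a Stream backed by a Confluent Cloud topic and users_log a Changelog backed by a Kinesis stream.

  -- Hypothetical Relations defined on topics in two different Stores
  CREATE STREAM enriched_pageviews AS
  SELECT v.userid, v.pageid, u.gender
  FROM pageviews v
  JOIN users_log u ON u.userid = v.userid;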

Break down silos with secure sharing

With namespacing, storage abstraction, and role-based access control, DeltaStream breaks down silos for your streaming data and enables you to share streaming data securely across multiple teams in your organization. With all your Confluent data connected to DeltaStream, data governance becomes easy and manageable.

How to configure the Confluent connector

While we have always supported integration with Kafka and continue to do so, we have now simplified the process for integrating with Confluent Cloud by adding a specific “Confluent” Store type. To configure access to Confluent Cloud within DeltaStream, users can simply choose “Confluent” as the Store type while defining their Store. Once the Store is defined, users will be able to share, process, and govern their Confluent Cloud and other streaming data within DeltaStream.

To learn how to create a Confluent Cloud Store, either follow this tutorial or watch the video below.

Getting Started

To get started with DeltaStream, schedule a demo with us or sign up for a free trial. You can also learn more about the latest features and use cases on our blogs page.

09 Jan 2024

Min Read

Bundle Queries with Applications by DeltaStream

Stream processing has turned into an essential part of modern data management solutions. It provides real-time insights which enable organizations to make informed decisions in a timely manner. Stream processing workloads are often complex to write and expensive to run. This is due to the high volumes of data that are constantly flowing into these workloads and the need for the results to be produced with minimal delay.

In the data streaming world, it’s common to think about your stream processing workload as pipelines. For instance, you may have a stream processing job ingest from one data stream, process the data, then write the results to another data stream. Then, another query will ingest the results from the first query, do further processing, and write another set of results to a third stream. Depending on the use case, this pattern of reading, processing, and writing continues until eventually you end up with the desired set of data. However, these intermediate streams may not be needed for anything other than being ingested by the next query in the pipeline. Reading, storing, and writing these intermediate results costs money in the form of network I/O and storage. For SQL-based stream processing platforms, one solution is to write nested queries or queries containing common table expressions (CTEs), but for multi-step pipelines, queries written in this way can become overly complex and hard to reason through. Furthermore, it may not even be possible to use nested queries or CTEs to represent some use cases, in which case materializing the results and running multiple stream processing jobs is necessary.

To make stream processing simpler, more efficient, and more cost effective, we wanted to have the capability of combining multiple statements together. We did this by creating a new feature called Applications, which has the following benefits:

  • Simplify the workload: Users can simplify a complex computation logic by dividing it into several statements without additional costs. This helps break down a complicated workload into multiple steps to improve readability of the computation logic and reusability of results. Smaller, distinct statements will also facilitate debugging by isolating the processing steps.
  • Reduce the load on streaming Stores: An Application in DeltaStream optimizes I/O operations on streaming Stores in two ways. First, the Application will only read from a unique source Relation’s topic once. This reduces the read operations overhead when multiple queries in the Application consume records from the same Relation. Second, users can eliminate the reads/writes from intermediate queries by specifying “Virtual Relations” in the Application. “Virtual Streams” and “Virtual Changelogs” are similar to regular Streams and Changelogs in DeltaStream, but they are not backed by any physical streaming Store. Instead, Virtual Relations are for intermediate results and other statements in the Application are free to read/write to them.
  • Reduce overall execution cost and latency: All statements in an Application run within a single runtime job. This not only reduces the overall execution cost by minimizing the total number of jobs needed for a workload, but also enhances resource utilization. Packing several statements together facilitates efficient resource sharing and lowers scheduling overhead for the shared resources. Additionally, the optimized I/O operations on streaming Stores (as previously mentioned) along with less network traffic from/to those Stores contribute to the overall cost and latency reduction.

Simplifying Data Workloads with Applications

Let’s go over an example to show how Applications can help users write a workload in a simpler and more efficient manner.

Assume we are managing an online retail store and we are interested in extracting insights on how users visit different pages on our website. There are two Kafka Stores/clusters, one in the US East region and one in the US West to store page views in each region. Registered users’ information is also stored separately in the US West Store. We have the following Relations defined on topics from these Stores:

  • “pageviews_east” and “pageviews_west” are two Streams defined on the topics in the US East and US West stores, respectively
  • “users_log” is a Changelog defined on the users’ information topic in the US West Store, using the “userid” column as the primary key

You can find more details about Stores, Streams and Changelogs in DeltaStream and how to create them here.

Our online advertisement team is curious to find out which product pages are popular as users browse the website. A page is popular if it is visited by at least 3 different female users from California in a short duration. Using the three Relations we defined above, we’ll introduce a solution to find popular pages without using an Application, then compare that with an approach that uses an Application.

“No Application” Solution

One way to find popular pages is by writing 4 separate queries: 

  • Query 1 & Query 2: Combine pageview records from the “pageviews_west” and “pageviews_east” Streams into a single relation. The resulting Stream is called “combined_pageviews”.
  • Query 3: Join “combined_pageviews” records with records from “users_log” to enrich each pageviews record with its user’s latest information. The resulting Stream is called “enriched_pageviews”.
  • Query 4: Group records in “enriched_pageviews” by their “pageid” column, aggregate their views, and find those pages that meet our popular page criteria.    

Figure 1 shows how the data flows between the different Relations (shown as rounded boxes) and the queries (shown as colored boxes). Each query results in a separate runtime job and requires its own dedicated resources to run. The dashed arrows between the Relations and the Stores indicate read and write operations against the Kafka topics backing each of the Relations. Moreover, given that each Relation is backed by a topic, all records are written into a persistent Store, including records in the “combined_pageviews” and “enriched_pageviews” Streams.
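
For concreteness, the four standalone queries could be written roughly as follows. This is a sketch that mirrors the Application statements shown in the next section, using physical Streams instead of Virtual Relations; store-specific parameters are omitted for brevity.

  -- Query 1: create the combined Stream from the US East pageviews
  CREATE STREAM combined_pageviews AS
  SELECT * FROM pageviews_east;

  -- Query 2: append the US West pageviews into the same Stream
  INSERT INTO combined_pageviews
  SELECT * FROM pageviews_west;

  -- Query 3: enrich each pageview with the user's latest information
  CREATE STREAM enriched_pageviews AS
  SELECT v.userid, v.pageid,
         u.gender, u.contactinfo->`state` AS user_state
  FROM combined_pageviews v JOIN users_log u
    ON u.userid = v.userid;

  -- Query 4: find popular pages (same logic as statement 4 in the Application)
  CREATE CHANGELOG popular_pages AS
  SELECT pageid, count(DISTINCT userid) AS cnt
  FROM enriched_pageviews
  WHERE (UNIX_TIMESTAMP() - (rowtime()/1000) < 30)
    AND gender = 'FEMALE' AND user_state = 'CA'
  GROUP BY pageid
  HAVING count(DISTINCT userid) > 2;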

Solution with Application

Ideally, we are interested in reducing the cost of running our workload without modifying its computation logic. In the “No Application” solution above, while the records for “combined_pageviews” and “enriched_pageviews” are persisted, we don’t really need them outside the Application context. They are intermediate results, only computed to prepare the data for finding popular pages. “Virtual Relations” can help us skip materializing intermediate query results.

We can write an Application with Virtual Relations to find popular pages, as shown below. This Application creates a single long-running job and helps reduce costs in two ways:

  1. By using Virtual Relations, we can avoid the extra cost of writing intermediate results into a Store and reading them again. This will reduce the network traffic and the read/write load on our streaming Stores.
  2. By packing several queries into a single runtime job, we can use our available resources more efficiently. This will reduce the number of streaming jobs we end up creating in order to run our computation logic.

Here is the Application code for our example:

  BEGIN APPLICATION popular_pages_app

  -- statement 1
  CREATE VIRTUAL STREAM virtual.public.combined_pageviews AS
  SELECT *
  FROM pageviews_east;

  -- statement 2
  INSERT INTO virtual.public.combined_pageviews
  SELECT *
  FROM pageviews_west;

  -- statement 3
  CREATE VIRTUAL STREAM virtual.public.enriched_pageviews AS
  SELECT v.userid, v.pageid,
         u.gender, u.contactinfo->`state` AS user_state
  FROM virtual.public.combined_pageviews v JOIN users_log u
    ON u.userid = v.userid;

  -- statement 4
  CREATE CHANGELOG popular_pages WITH ('store'='us_west_kafka', 'topic.partitions'=1, 'topic.replicas'=3) AS
  SELECT pageid, count(DISTINCT userid) AS cnt
  FROM virtual.public.enriched_pageviews
  WHERE (UNIX_TIMESTAMP() - (rowtime()/1000) < 30) AND
        gender = 'FEMALE' AND user_state = 'CA'
  GROUP BY pageid
  HAVING count(DISTINCT userid) > 2;

  END APPLICATION;

There are 4 statements in this Application. They look similar to the 4 queries used in the “no Application” solution above. However, the “combined_pageviews” and “enriched_pageviews” Streams are defined as Virtual Relations in the Application.

Figure 2 illustrates how the data flows between the different statements and Relations in the Application. Compared to Figure 1, note that the “combined_pageviews” and “enriched_pageviews” Virtual Streams (shown in white boxes) do not have dashed arrows leading to a Kafka topic in the storage layer. This is because Virtual Relations are not backed by physical storage, and thus reads and writes to the Store for these Virtual Relations are eliminated, reducing I/O and storage costs. In addition, the 4 queries in Figure 1 generate 4 separate streaming jobs, whereas all processing happens within a single runtime job in the solution using an Application.

Platform for Efficient Stream Processing

In this blog, we compared two solutions for a stream processing use case, one with an Application and one without. We showed how Applications and Virtual Relations can be used to run workloads more efficiently, resulting in reduced costs. The SQL syntax for Applications helps users simplify their complex computation logic by breaking it into several statements, and allows reusing intermediate results at no additional cost. Stay tuned for more content on Applications in the future, where we’ll dive more deeply into their individual benefits and include more detailed examples.

DeltaStream is easy to use, easy to operate, and scales automatically. If you are ready to try a modern stream processing solution, you can reach out to our team to schedule a demo or start your free trial.
