Imagine you are a financial institution receiving a stream of credit card transactions that your customers make anytime, anywhere. You need to process these transactions, detect whether any of them is fraudulent and, if so, block it. Timeliness is essential: you cannot rely on nightly or even hourly jobs to detect fraud. By the time a periodic batch job detects a fraudulent transaction, the transaction has been approved and your institution has suffered the loss. With stream processing, however, you can detect and respond to such fraud with sub-second latency and prevent substantial financial loss for your institution. This is just one example of how access to fresh, low-latency data can provide a huge competitive advantage to enterprises. You can see similar use cases in areas like banking, Internet of Things, retail, IT, gaming, health care, manufacturing and many more. Customers and users demand low-latency services, and organizations that provide them gain a significant competitive advantage.
Stream processing, then and now
When I joined Confluent and started the ksqlDB (formerly known as KSQL) project, streaming storage platforms such as Apache Kafka were mainly used by tech-savvy Silicon Valley companies. Most data was at rest, and people were reluctant to deal with the complexity of introducing streaming into their architecture. Fast forward six years to 2022, and event streaming systems such as Apache Kafka are one of the main components of modern data infrastructure. Many enterprises have adopted, or are in the process of adopting, event streaming platforms as the central nervous system of their data infrastructure. Furthermore, the availability of such systems as managed services in the cloud has made adoption even more compelling. Confluent Cloud, AWS Kinesis, Azure Event Hubs and GCP Pub/Sub are just a few of the streaming storage services available in the cloud.
The adoption of streaming storage services has also led to many applications being built on top of these platforms. Such applications process and react to events with sub-second latency, which in turn has produced enormous financial gains for enterprises. An online retail business can increase its revenue by analyzing customer behavior in real time and recommending the right products while the user is shopping online. Such analysis cannot be done in batch mode, since by the time the result is available the customer has left the online store. Recommendation applications, along with streaming pipelines, anomaly detection, customer 360, clickstream analysis, inventory logistics and log aggregation, are just a few of the applications built on top of platforms like Apache Kafka. However, building real-time streaming applications has been a challenging endeavor that requires highly skilled teams of developers with expertise in distributed systems and data management. Delivery guarantees, fault tolerance, elasticity and security are just a few of the many challenges that put real-time streaming applications out of reach for many organizations. Furthermore, even if organizations overcome these challenges and build real-time applications, operating them 24/7 in a reliable, secure and scalable way is a huge burden on their teams.
DeltaStream is a serverless stream processing platform to manage, secure and process all your streams in the cloud. We built DeltaStream to take the complexity out of building and operating scalable, secure and reliable real-time streaming applications and to make it as easy, fast and accessible as possible. To achieve this goal, we are bringing the tried and true benefits of relational data management to the streaming world. Relational databases have been used successfully to manage and process data at rest for the last few decades and have played a crucial role in democratizing access to data in organizations. In addition to processing capabilities, these systems also provide ways to organize and secure data in a familiar way. With DeltaStream, we bring not just familiar processing capabilities to the streaming world, but also similar ways to manage and secure streaming data, a capability that sets DeltaStream apart.
The following are some of the principles that DeltaStream is built upon and that make it a unique service:
- DeltaStream is serverless: this means that as a developer, data engineer or anyone who interacts with real-time streaming data, you don’t have to provision, scale or maintain servers or clusters to build and run your real-time applications. There is no need to decide how big a deployment should be or how many tasks to allocate to your applications. DeltaStream takes care of all those complexities so you can focus on building the core products that bring value to you and your customers instead of worrying about managing and operating distributed stream processing infrastructure. Indeed, there is no notion of a cluster or deployment in DeltaStream, and you can assume unlimited resources are available for your applications while you are building them. You only pay for what you use, and DeltaStream seamlessly scales your applications up or down as needed and recovers them from any failures.
- Embrace the SQL database model: for the past few decades, SQL databases have proven to be a great way to manage and process data. The simplicity and ubiquity of SQL have made it easy to query and access data. However, many real-time streaming systems either do not use these capabilities at all or use them only partially, for expressing processing logic with SQL, and ignore other capabilities such as managing and securing access to the data. DeltaStream brings all the capabilities you know and are used to from SQL databases for data at rest to the streaming world.
- SQL: DeltaStream enables users to easily build real-time applications and pipelines with a familiar SQL dialect. Everything from simple stateless processing, such as filtering and projections, to complex stateful processing, such as joins and aggregations, can be done in DeltaStream with a few lines of SQL. DeltaStream seamlessly provides the desired delivery guarantees (exactly once or at least once), as well as automatic checkpointing and savepointing for elasticity and fault tolerance.
- Organizing your data in motion: Similar to SQL databases, DeltaStream enables you to organize your streaming data in databases and schemas. A database is a logical grouping of schemas and a schema is a logical grouping of objects such as streams, tables and views. This is the basis of namespacing in DeltaStream and enables our users to organize their data much more effectively compared to a flat namespace.
- Securing your data in motion: securing your data is one of the foundational features of DeltaStream. In addition to common security practices such as data security, authentication and authorization, DeltaStream provides the familiar Role-Based Access Control (RBAC) model from SQL databases, which enables users to control who can access data and what operations they can perform on it. Users can define roles and grant or revoke privileges the same way they do in other SQL databases. The combination of DeltaStream’s namespacing and security capabilities provides a powerful tool for users to secure their data in motion.
- Separation of Compute and Storage: the DeltaStream architecture separates compute and storage, resulting in well-known benefits such as elasticity, cost efficiency and high availability. Additionally, DeltaStream’s model of providing the compute layer on top of users’ streaming storage systems, such as Apache Kafka or AWS Kinesis, eliminates the need for data duplication and doesn’t add unnecessary latency to real-time applications and pipelines. DeltaStream is also agnostic to the underlying storage service and can read from and write to data-in-motion storage services such as Apache Kafka or AWS Kinesis and data-at-rest storage services such as AWS S3 or data lakes. Such flexibility lets DeltaStream provide an abstraction layer on top of many storage services, where users can read data from one or more services, perform the desired computation and write the results to one or more storage services seamlessly.
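To make the distinction between stateless and stateful processing in the SQL bullet above more concrete, here is a minimal Python sketch. It is not DeltaStream code or DeltaStream SQL: the `Transaction` type, the function names and the sample data are all hypothetical and chosen only for illustration. A filter looks at each event on its own, while a windowed count has to remember state across events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical event shape, used only for this illustration.
    card_id: str
    amount: float
    timestamp: int  # event time in seconds


def large_transactions(
    events: Iterable[Transaction], threshold: float
) -> Iterator[Transaction]:
    """Stateless filter: each event is judged on its own, no memory needed."""
    for event in events:
        if event.amount > threshold:
            yield event


def counts_per_window(
    events: Iterable[Transaction], window_seconds: int
) -> Dict[Tuple[str, int], int]:
    """Stateful aggregation: count events per card in tumbling windows."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for event in events:
        # Align the event to the start of its fixed-size window.
        window_start = event.timestamp - event.timestamp % window_seconds
        counts[(event.card_id, window_start)] += 1
    return dict(counts)


events = [
    Transaction("card-1", 25.0, 0),
    Transaction("card-1", 900.0, 10),
    Transaction("card-2", 40.0, 15),
    Transaction("card-1", 950.0, 70),
]

flagged = list(large_transactions(events, threshold=500.0))
per_window = counts_per_window(events, window_seconds=60)
```

In a real stream processor the counts would have to survive failures and rescaling, which is where the checkpointing and delivery guarantees mentioned above come in; this in-memory sketch deliberately ignores those concerns.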
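The namespacing and RBAC ideas described in the bullets above can also be sketched in a few lines of Python. This is a toy model under stated assumptions, not DeltaStream's implementation or API: the `QualifiedName` and `AccessControl` classes, the role name and the privilege strings are all hypothetical. It shows how a database.schema.object name identifies a stream, and how granting or revoking a privilege for a role changes what that role may do.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class QualifiedName:
    """Hypothetical three-part name: database, schema, then object."""
    database: str
    schema: str
    name: str

    def __str__(self) -> str:
        return f"{self.database}.{self.schema}.{self.name}"


@dataclass
class AccessControl:
    """Toy RBAC store mapping (role, object) to a set of privileges."""
    grants: Dict[Tuple[str, QualifiedName], Set[str]] = field(default_factory=dict)

    def grant(self, role: str, obj: QualifiedName, privilege: str) -> None:
        self.grants.setdefault((role, obj), set()).add(privilege)

    def revoke(self, role: str, obj: QualifiedName, privilege: str) -> None:
        # Revoking a privilege that was never granted is a no-op.
        self.grants.get((role, obj), set()).discard(privilege)

    def is_allowed(self, role: str, obj: QualifiedName, privilege: str) -> bool:
        return privilege in self.grants.get((role, obj), set())


# Illustrative names only; none of these come from DeltaStream.
transactions = QualifiedName("finance", "payments", "transactions")
acl = AccessControl()

acl.grant("analyst", transactions, "SELECT")
can_read = acl.is_allowed("analyst", transactions, "SELECT")
can_write = acl.is_allowed("analyst", transactions, "INSERT")

acl.revoke("analyst", transactions, "SELECT")
can_read_after_revoke = acl.is_allowed("analyst", transactions, "SELECT")
```

Because privileges are keyed on the fully qualified name, two streams with the same short name in different schemas are controlled independently, which is the practical benefit of combining namespacing with RBAC.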
As a cloud service, DeltaStream provides a REST API with GraphQL. There are three ways of using it today.
- Web App: the DeltaStream web app is a browser-based application through which users can interact with the service.
- Command Line Interface (CLI): users can also use the DeltaStream CLI application to interact with the service from their terminal. The CLI provides all the functionality that the Web App provides.
- Direct access to the REST API: users can access the service directly through the provided REST API. This enables users to integrate DeltaStream into their applications or their CI/CD pipelines.
Currently, DeltaStream is available on AWS, and we plan to offer it on GCP and Azure soon.
Today, we are excited to announce that we have raised a $10M seed round led by New Enterprise Associates (NEA). This funding will allow us to speed ahead in building DeltaStream. We are also announcing the availability of DeltaStream’s Private Beta. If you are on AWS and use any event streaming service such as Confluent Cloud, Redpanda, AWS MSK or AWS Kinesis, please consider joining our Private Beta program to try DeltaStream and help shape our product roadmap. The Private Beta is free; we only ask for your feedback on using DeltaStream. To join our Private Beta program, please fill out the form here and tell us about your use cases.
We are also expanding our team and hiring for different roles. If you are passionate about building streaming systems, drop us a line and let’s chat.