05 Aug 2025
Building a Real-Time Generative AI Pipeline for Sepsis Detection with DeltaStream
Table of contents
- The Use Case: Early Sepsis Detection
- The End-to-End Architecture
- Part 1: Simulating the Real World (Java/Python & Kafka)
- Part 2: The Real-Time Brain (DeltaStream)
- Step 1: Create a Stream on the Kafka Topic
- Step 2: Calculate Clinical Features
- Step 3: Construct the LLM Prompt
- Step 4: Call the LLM and Store the Response
- Part 3: Closing the Loop - Taking Action

In modern hospitals, nurses and doctors are inundated with data from countless monitors and systems. Yet, one of the most dangerous conditions, sepsis, often develops from subtle, overlapping changes in a patient's vital signs that can be easily missed during routine, periodic checks. The challenge isn't a lack of data, but a lack of real-time, contextual insight. What if we could build a system that continuously watches for these developing patterns, acting as a vigilant digital guardian for every patient?
This is now possible. Using medical-grade wearable sensors, we can capture a continuous, high-fidelity stream of patient data. But raw data alone is not enough; a stream of heart rate numbers without context is just noise. To be effective, we need a way to process this data in real-time, understand its trajectory, and infer a patient's condition as it evolves.
In this post, we will design a complete, end-to-end inference pipeline for the early detection of sepsis in post-surgical patients. We'll show how to use Kafka to stream the data and DeltaStream to process it, call a Generative AI model for clinical inference, and serve the results—all within a single, powerful real-time platform.
The Use Case: Early Sepsis Detection
Sepsis is a life-threatening condition where the body's response to an infection damages its own tissues. The early signs are often a combination of factors—such as an elevated heart rate, high respiratory rate, and abnormal temperature—that, in isolation, might not seem alarming. A wearable vitals sensor continuously tracks the key vitals relevant to sepsis detection:
- Heart Rate (HR) at rest
- Respiratory Rate (RR) at rest
- Skin Temperature
- Activity Level
Our goal is to build a system that analyzes the continuous stream of these vitals to flag patients at high risk of developing sepsis before a full-blown crisis occurs. This automated approach helps clinicians spot danger earlier by continuously evaluating criteria similar to clinical standards like the qSOFA (Quick Sequential Organ Failure Assessment) score.
The End-to-End Architecture
Our inference pipeline will consist of four stages, creating a highly efficient, low-latency flow from data generation to an actionable alert, with DeltaStream at the center.
- Data Generation (Sensor -> Kafka): A simulator acts as the sensor's gateway, sending a constant stream of patient vital signs into a Kafka topic. This represents the raw data feed from the hospital floor.
- Real-Time Feature Engineering (DeltaStream): DeltaStream ingests the raw vitals from Kafka. It uses continuously running SQL queries to transform the noisy, high-volume stream into clinically relevant features and trends, such as moving averages and standard deviations.
- In-Stream Inference (DeltaStream UDF): A User-Defined Function (UDF) within DeltaStream calls a healthcare-focused LLM API directly. This passes the real-time features to the AI to get a clinical assessment as the data is being processed, eliminating the need for a separate microservice.
- Alerting Sink: The final results, including the LLM's risk score and justification, are written from DeltaStream to a downstream sink (e.g., another Kafka topic, a webhook, or a monitoring dashboard) for immediate action by clinical staff.
Part 1: Simulating the Real World (Java/Python & Kafka)
First, our system needs data. We'll create a synthetic data generator that mimics a fleet of wearable sensors sending data for multiple patients. Using realistic, synthetic data is a critical first step for safely developing and validating the pipeline's logic before it's used in a clinical setting. Each message is a JSON object representing a single reading. You can find the runnable data generator code at https://github.com/deltastreaminc/examples/sepsis_detection.
Kafka Topic: bio_vitals
Sample JSON Message:
{ "patient_id": "PID-1003", "event_timestamp": "2025-07-27 06:53:31.132182Z", "heart_rate_bpm": 76, "respiratory_rate_rpm": 16, "skin_temp_c": 36.96, "activity_level": "REST" }
A simple script would continuously generate these messages with slight variations and push them into the bio_vitals Kafka topic. For at-risk patients, the script would simulate gradually worsening vital signs.
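As a reference point, here is a minimal sketch of such a generator in Python using the kafka-python client. The broker address, patient IDs, and drift logic are illustrative assumptions; the full, runnable generator is in the repository linked above.

# Minimal sketch of a vitals simulator (illustrative; see the linked repo for the full generator).
# Assumes a local Kafka broker and the kafka-python package (pip install kafka-python).
import json
import random
import time
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

PATIENTS = ["PID-1001", "PID-1002", "PID-1003"]
AT_RISK = {"PID-1003"}  # patients whose vitals slowly drift toward sepsis-like values
drift = {pid: 0.0 for pid in PATIENTS}

while True:
    for pid in PATIENTS:
        if pid in AT_RISK:
            drift[pid] += 0.05  # gradually worsen vitals for at-risk patients
        reading = {
            "patient_id": pid,
            "event_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f") + "Z",
            "heart_rate_bpm": int(random.gauss(75 + drift[pid] * 10, 3)),
            "respiratory_rate_rpm": int(random.gauss(16 + drift[pid] * 2, 1)),
            "skin_temp_c": round(random.gauss(36.9 + drift[pid] * 0.3, 0.1), 2),
            "activity_level": "REST",
        }
        producer.send("bio_vitals", reading)
    producer.flush()
    time.sleep(5)  # one reading per patient every 5 seconds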
Part 2: The Real-Time Brain (DeltaStream)
This is where we turn raw data into a final, actionable verdict. We'll use DeltaStream's powerful SQL interface to process the bio_vitals stream and call our LLM. This is done by building a chain of CHANGELOG streams, where each step refines the data, making the pipeline modular and easy to debug.
Log into your DeltaStream workspace and run the following DDL.
Step 1: Create a Stream on the Kafka Topic
First, we tell DeltaStream about our Kafka source. This CREATE STREAM statement creates a live, queryable "window" into the data flowing through Kafka.
-- Create a Stream to represent the raw vital signs flowing through Kafka.
CREATE STREAM bio_vitals_stream (
  patient_id VARCHAR,
  event_timestamp TIMESTAMP_LTZ(6),
  heart_rate_bpm INT,
  respiratory_rate_rpm INT,
  skin_temp_c DECIMAL(5, 2),
  activity_level VARCHAR
) WITH (
  'topic' = 'bio_vitals',
  'value.format' = 'json',
  'timestamp' = 'event_timestamp'
);
Step 2: Calculate Clinical Features
Next, we create our first CHANGELOG stream. It consumes the raw vitals and calculates key clinical features over a 15-minute tumbling window.
-- Create a changelog stream to hold the calculated features for each patient window.
CREATE CHANGELOG vital_features AS
SELECT
  patient_id,
  window_start,
  window_end,
  AVG(heart_rate_bpm) AS avg_hr_15m,
  STDDEV(heart_rate_bpm) AS stddev_hr_15m,
  AVG(respiratory_rate_rpm) AS avg_rr_15m,
  MAX(skin_temp_c) AS max_temp_c_15m
FROM TUMBLE(bio_vitals_stream, SIZE 15 MINUTE)
GROUP BY patient_id, window_start, window_end;
Step 3: Construct the LLM Prompt
Using the features from the previous step, we create another CHANGELOG stream to construct the precise text prompt that will be sent to the Gemini LLM.
-- Create a changelog stream that transforms the features into a text prompt.
CREATE CHANGELOG llm_input AS
SELECT
  patient_id,
  window_start,
  window_end,
  -- Create a detailed prompt string to send to the Gemini UDF
  'Analyze the following patient vitals for sepsis risk and respond with a JSON object containing "risk_score" and "justification". Vitals: '
    || 'avg_hr_15m=' || CAST(avg_hr_15m AS VARCHAR) || ', '
    || 'stddev_hr_15m=' || CAST(stddev_hr_15m AS VARCHAR) || ', '
    || 'avg_rr_15m=' || CAST(avg_rr_15m AS VARCHAR) || ', '
    || 'max_temp_c_15m=' || CAST(max_temp_c_15m AS VARCHAR)
    || '. Make sure the returned JSON starts with { and ends with } and is parsable without requiring any changes.' AS prompt
FROM vital_features;
Step 4: Call the LLM and Store the Response
Now, we call our pre-defined call_gemini_explainer(prompt) UDF and store its raw JSON response in our final CHANGELOG stream.
-- Create the final changelog stream that calls the LLM and stores the raw response.
CREATE CHANGELOG patient_sepsis_alerts AS
SELECT
  patient_id,
  window_start,
  window_end,
  call_gemini_explainer(prompt) AS json_response
FROM llm_input;
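This post assumes call_gemini_explainer has already been registered as a UDF in your DeltaStream workspace. The packaging and registration details depend on your setup, but the core logic the UDF wraps is a single call to the Gemini API. Here is a rough Python sketch of that logic; the endpoint, model name, and response handling are illustrative assumptions, not the exact UDF implementation.

# Rough sketch of the logic a call_gemini_explainer UDF could wrap (illustrative only).
# The endpoint, model name, and error handling here are assumptions for this sketch.
import os

import requests

GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-flash:generateContent"
)

def call_gemini_explainer(prompt: str) -> str:
    """Send the prompt to Gemini and return the raw JSON string it produces."""
    resp = requests.post(
        GEMINI_URL,
        params={"key": os.environ["GEMINI_API_KEY"]},
        json={"contents": [{"parts": [{"text": prompt}]}]},
        timeout=30,
    )
    resp.raise_for_status()
    # The prompt instructs the model to return a bare JSON object such as:
    # {"risk_score": 0.82, "justification": "Elevated HR and RR with rising temperature."}
    return resp.json()["candidates"][0]["content"]["parts"][0]["text"]

Because this call runs inside the streaming job itself, there is no separate inference microservice to deploy or scale.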
Part 3: Closing the Loop - Taking Action
The patient_sepsis_alerts changelog now contains the final output of our pipeline. We can query this stream and parse the JSON response on the fly to drive our alerting systems.
-- Query the final changelog to parse the LLM response and create actionable alerts.
SELECT
  patient_id,
  window_end,
  -- Parse the 'risk_score' field from the JSON string returned by the UDF
  JSON_VALUE(json_response, '$.risk_score') AS sepsis_risk_score,
  -- Parse the 'justification' field from the JSON string
  JSON_VALUE(json_response, '$.justification') AS justification
FROM patient_sepsis_alerts;
This final query can be configured to sink its results directly to an alerting system (a consumer sketch follows the list below):
- Push to Kafka: Send high-risk alerts to a dedicated sepsis_alerts Kafka topic for integration with a hospital's Electronic Health Record (EHR) systems.
- Webhook: Trigger a webhook to send a real-time message to a clinical communication platform like Slack or Microsoft Teams.
- Dashboard: Serve as the direct source for a real-time monitoring dashboard for charge nurses and physicians.
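To make the first two options concrete, here is a hypothetical consumer sketch in Python: it assumes the parsed alerts have been sunk to a sepsis_alerts Kafka topic and forwards high-risk results to a webhook. The topic name, risk threshold, and webhook URL are all illustrative assumptions.

# Illustrative consumer for a downstream sepsis_alerts topic; the topic name, risk
# threshold, and webhook URL are assumptions for this sketch, not part of the pipeline above.
import json

import requests
from kafka import KafkaConsumer

WEBHOOK_URL = "https://hooks.example.com/clinical-alerts"  # e.g. a Slack/Teams incoming webhook
RISK_THRESHOLD = 0.7

consumer = KafkaConsumer(
    "sepsis_alerts",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    alert = message.value
    if float(alert["sepsis_risk_score"]) >= RISK_THRESHOLD:
        # Forward the high-risk alert to the clinical communication platform
        requests.post(
            WEBHOOK_URL,
            json={
                "text": (
                    f"Sepsis risk {alert['sepsis_risk_score']} for patient "
                    f"{alert['patient_id']}: {alert['justification']}"
                )
            },
            timeout=10,
        )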
By embedding the LLM call within DeltaStream, we have built a highly efficient, scalable, and elegant real-time inference pipeline. We've eliminated intermediate services, reduced latency, and simplified the overall architecture, creating a powerful system that bridges the gap between raw sensor data and life-saving clinical action.
The runnable DeltaStream SQL code for this use case is available at https://github.com/deltastreaminc/examples/sepsis_detection.
This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.