25 Nov 2025
Real-Time Context vs Real-Time Inference: Two Essential Patterns for Modern GenAI (and How DeltaStream Powers Both)
Over the past year, GenAI systems have evolved from simple chatbots into agentic systems that take actions, interact with tools, and process live information. As these systems mature, one thing has become clear:
Real-time streaming data is the backbone of intelligent agents and automated pipelines.
DeltaStream sits at the center of this shift, providing the streaming-native engine that makes live context and live inference both possible at scale.
Across security, fintech, SaaS, IoT, and AI-driven automation, we’ve found that teams often build one of two patterns:
- Real-Time Context for Agents
- Real-Time Inference Pipelines
These patterns look similar on the surface, but they serve different roles and are optimized for different problems. Both can (and often should) coexist.
This post will explain the differences, the architectures, and when to use each with realistic, modern GenAI use cases.
Pattern #1: Real-Time Context for Agents
“Give an LLM or Agent the freshest, most accurate picture of what’s happening right now.”
Modern AI agents, whether built with OpenAI Agent Builder, LangChain, or internal frameworks, rely heavily on tools. These tools query external systems: databases, APIs, vector stores, and now, streaming state.
But agents are stateless by design. They don’t store long-term memory or real-time state internally. This is where DeltaStream becomes the Real-Time Context Engine.
How it works
- Ingest streaming and CDC sources: Kafka, Kinesis, Pulsar, Debezium/Postgres CDC, event logs, telemetry
- Use SQL to join, aggregate, and compute live state:
- Tumbling/Hopping windows
- Real-time counters
- Rolling risk scores
- Feature aggregates
- Pattern detection
- Expose the results as Materialized Views (MVs): continuously updated tables that reflect the current state of the world. With its built-in MCP server, DeltaStream makes this seamless (see the linked demo video for an example).
- Agents query these MVs via tool calls (MCP, REST, SQL)
The agent receives structured, trustworthy, fresh context.
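As a sketch, the steps above might look like the following DeltaStream-style streaming SQL. All names (`auth_events`, `failed_logins_5m`, the column names) and the exact windowing syntax are illustrative assumptions, not verbatim DeltaStream DDL:

```sql
-- Illustrative only: maintain a continuously updated view of per-user
-- failed-login counts over a 5-minute tumbling window.
CREATE MATERIALIZED VIEW failed_logins_5m AS
SELECT
  user_id,
  window_start,
  COUNT(*) AS failed_attempts
FROM TUMBLE(auth_events, SIZE 5 MINUTES)  -- auth_events: a Kafka-backed stream
WHERE outcome = 'FAILURE'
GROUP BY user_id, window_start;
```

Because the view updates continuously as events arrive, any agent that queries it sees the current state rather than a stale snapshot.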
What it enables
Agents can now:
- Explain what is happening (root cause, anomalies, spikes)
- Make informed decisions
- Use live features for recommendations or prioritization
- Provide narrative reasoning supported by real-time facts
Real-world use cases
1. SOC Copilot / Security Analyst Assistants
DeltaStream maintains:
- Failed-login counts (5-minute window)
- IDS alert counts
- Network exfil signals
- Per-user/Per-IP risk scores
Agent can answer:
“User alice shows high risk due to a cluster of failed logins and a critical IDS alert in the last 3 minutes.”
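To ground that answer, the agent's tool call could be as simple as a query against a risk-score MV like the ones listed above. The view and column names here are hypothetical:

```sql
-- Illustrative tool call an agent might issue (via MCP, REST, or SQL)
-- to fetch the freshest risk picture for a single user.
SELECT user_id, failed_attempts, risk_score, window_start
FROM user_risk_mv
WHERE user_id = 'alice'
ORDER BY window_start DESC
LIMIT 1;
```

The LLM never computes the risk score itself; it narrates structured, pre-aggregated facts returned by the view.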
2. Customer Support Copilot
DeltaStream maintains:
- Recent support tickets
- Churn signals
- Billing events
- Product usage spikes
Agent can respond:
“This customer is a VIP, opened two high-priority tickets in the last hour, and shows churn indicators.”
3. RevOps / Sales Agent
DeltaStream maintains:
- Product usage
- Engagement
- Expansion indicators
- Contract milestones
Agent can say:
“Account X has a 30% increase in usage this week, making it a great target for expansion outreach.”
When to use Real-Time Context
Use this pattern when:
- A human or agent asks questions
- You need explainability
- You need consolidated state built from many signals
- Data must be fresh but pre-aggregated and prepared
- Responses depend on the "latest situation"
Pattern #2: Real-Time Inference Pipelines
“Score, classify, enrich, or route every event as it flows through the stream.”
This pattern is event-driven, not question-driven. Instead of waiting for an agent to ask for context, every incoming event can trigger inference or transformation.
DeltaStream supports this via:
- Streaming SQL
- Stateful processing
- Joins/windowing
- Real-time feature computation
- UDFs and the call_model function (LLMs + ML models)
- Sinking results back to Kafka, databases, warehouses, or Delta Lake
How it works
- Events arrive (transactions, logs, content, telemetry).
- DeltaStream computes features (windowed counts, anomaly flags).
- If needed, DeltaStream calls a model (LLM or ML) using call_model.
- Output is routed to downstream systems.
There is no agent in the loop; the system is fully automated.
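A minimal sketch of such a pipeline, again in illustrative DeltaStream-style SQL (the stream names, the model name, and the exact call_model signature are assumptions based on the call_model('model', features) shape used later in this post):

```sql
-- Illustrative only: score each enriched event with call_model and
-- sink the results to a stream backed by a Kafka topic.
INSERT INTO scored_events
SELECT
  event_id,
  event_time,
  call_model('risk-classifier', features) AS risk_label
FROM enriched_events;
```

This statement runs continuously: every event that lands in `enriched_events` is scored and emitted, with no request/response loop and no agent involved.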
Real-world use cases
1. Real-Time Fraud Detection
- Stream: transactions, device info, login activity.
- DeltaStream computes velocity + abnormal patterns.
- Once fraud is suspected, call_model('fraud-llm', features) classifies the risk
- Output -> fraud_decisions Kafka topic
2. Content Moderation
- Stream: chat messages, forum posts, uploads.
- DeltaStream: extract text + metadata.
- LLM predicts:
- Toxicity
- Policy violation
- Category
- Output: “allow / block / escalate”.
3. Real-Time Ticket Routing
- Stream: incoming support tickets/emails.
- LLM extracts:
- Intent
- Urgency
- Product area
- Route to correct queue instantly.
4. Recommendation Scoring
- Stream: pageviews, clicks, interactions.
- call_model generates embeddings or category predictions.
- The downstream recommendation system uses the output immediately.
When to use Real-Time Inference Pipelines
Use this pattern when:
- Each event requires automated scoring
- There is no human-in-the-loop
- You need high throughput (100k+ events/sec)
- Latency matters
- Results are consumed by systems, not agents
- Workloads are repetitive and predictable
Be mindful when invoking LLMs in these scenarios, as frequent calls can quickly drive up costs. Ideally, LLM inference should occur only when truly necessary, not for every incoming event. DeltaStream pipelines can enforce this by filtering and aggregating events so that LLMs are triggered selectively rather than on a per-event basis.
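One way to enforce that selectivity is to gate the model call behind cheap streaming predicates, so only events that clear a SQL filter ever pay for inference. A hypothetical sketch (all names and thresholds are illustrative):

```sql
-- Illustrative only: cheap filters run first; call_model is invoked
-- only for the small fraction of events that look suspicious.
INSERT INTO fraud_decisions
SELECT
  txn_id,
  call_model('fraud-llm', features) AS decision
FROM txn_features
WHERE velocity_5m > 10      -- streaming aggregate computed upstream
  AND amount > 1000;        -- only high-value outliers reach the LLM
```

If, say, 1% of events pass the filter, LLM cost drops by roughly two orders of magnitude while the pipeline still scores every event that matters.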
Real-Time Context vs Real-Time Inference: Key Differences

| | Real-Time Context | Real-Time Inference Pipelines |
| --- | --- | --- |
| Trigger | Question-driven: an agent or human asks | Event-driven: every incoming event |
| Consumer | Agents and humans, via tool calls | Downstream systems |
| Output | Continuously updated Materialized Views | Scored, classified, or enriched events |
| Optimized for | Freshness, explainability, consolidated state | Throughput, latency, full automation |
Which Should You Use?
Choose Real-Time Context if:
- You’re building agentic systems (security copilot, support copilot, SRE copilot).
- Humans will ask exploratory questions.
- The LLM needs live, explainable, consolidated state.
- Multiple agents or apps will reuse the same real-time view.
Choose Real-Time Inference Pipelines if:
- You need fully automated, per-event decisions.
- You’re doing real-time scoring/classification/routing.
- Latency and throughput are critical.
- Downstream systems rely on enriched/processed events.
Use both if:
- You score events in real time and
- Agents need to understand / explain / analyze the results.
Example:
Fraud pipeline scores events → Fraud Copilot uses those results to conduct investigations. This pipeline + context pairing is becoming the standard architecture for GenAI in production.
Final Thoughts
As LLM-powered agents and real-time automation become central to modern AI systems, teams need infrastructure that can:
- Ingest and correlate streaming data
- Build real-time state
- Run real-time inference
- Expose live results to agents and downstream services
DeltaStream is uniquely positioned here, combining a streaming SQL engine, real-time MVs, continuous computation, and LLM/ML inference in one platform.
Whether you're building an autonomous fraud engine, a real-time SOC copilot, or a hyper-personalized AI experience, DeltaStream gives you both patterns, and lets you mix them effortlessly.
If you’d like a follow-up deep dive with architectures, sample SQL, or demo notebooks, we’re happy to share.
Let’s build the future of real-time GenAI together.
This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.