25 Nov 2025
Real-Time Context vs Real-Time Inference: Two Essential Patterns for Modern GenAI (and How DeltaStream Powers Both)
Over the past year, GenAI systems have evolved from simple chatbots into agentic systems that take actions, interact with tools, and process live information. As these systems mature, one thing has become clear:
Real-time streaming data is the backbone of intelligent agents and automated pipelines.
DeltaStream sits at the center of this shift, providing the streaming-native engine that makes live context and live inference both possible at scale.
Across security, fintech, SaaS, IoT, and AI-driven automation, we’ve found that teams often build one of two patterns:
- Real-Time Context for Agents
- Real-Time Inference Pipelines
These patterns look similar on the surface, but they serve different roles and are optimized for different problems. Both can (and often should) coexist.
This post will explain the differences, the architectures, and when to use each with realistic, modern GenAI use cases.
Pattern #1: Real-Time Context for Agents
“Give an LLM or Agent the freshest, most accurate picture of what’s happening right now.”
Modern AI agents, whether built with OpenAI Agent Builder, LangChain, or internal frameworks, rely heavily on tools. These tools query external systems: databases, APIs, vector stores, and now, streaming state.
But agents are stateless by design. They don’t store long-term memory or real-time state internally. This is where DeltaStream becomes the Real-Time Context Engine.
How it works
- Ingest streaming and CDC sources: Kafka, Kinesis, Pulsar, Debezium/Postgres CDC, event logs, telemetry
- Use SQL to join, aggregate, and compute live state:
- Tumbling/Hopping windows
- Real-time counters
- Rolling risk scores
- Feature aggregates
- Pattern detection
- Expose the results as Materialized Views (MVs): continuously updated tables that reflect the current state of the world. With its built-in MCP server, DeltaStream makes this seamless (see the linked demo video for an example).
- Agents query these MVs via tool calls (MCP, REST, SQL)
The agent receives structured, trustworthy, fresh context.
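As a sketch, the steps above might look like the following DeltaStream-style streaming SQL. All names (`auth_events`, `failed_logins_5m`, the column names) and the exact windowing syntax are illustrative assumptions, not verbatim DeltaStream DDL:

```sql
-- Illustrative only: maintain a continuously updated view of per-user
-- failed-login counts over a 5-minute tumbling window.
CREATE MATERIALIZED VIEW failed_logins_5m AS
SELECT
  user_id,
  window_start,
  COUNT(*) AS failed_attempts
FROM TUMBLE(auth_events, SIZE 5 MINUTES)  -- auth_events: a Kafka-backed stream
WHERE outcome = 'FAILURE'
GROUP BY user_id, window_start;
```

Because the view updates continuously as events arrive, any agent that queries it sees the current state rather than a stale snapshot.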
What it enables
Agents can now:
- Explain what is happening (root cause, anomalies, spikes)
- Make informed decisions
- Use live features for recommendations or prioritization
- Provide narrative reasoning supported by real-time facts
Real-world use cases
1. SOC Copilot / Security Analyst Assistants
DeltaStream maintains:
- Failed-login counts (5-minute window)
- IDS alert counts
- Network exfil signals
- Per-user/Per-IP risk scores
Agent can answer:
“User alice shows high risk due to a cluster of failed logins and a critical IDS alert in the last 3 minutes.”
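To ground that answer, the agent's tool call could be as simple as a query against a risk-score MV like the ones listed above. The view and column names here are hypothetical:

```sql
-- Illustrative tool call an agent might issue (via MCP, REST, or SQL)
-- to fetch the freshest risk picture for a single user.
SELECT user_id, failed_attempts, risk_score, window_start
FROM user_risk_mv
WHERE user_id = 'alice'
ORDER BY window_start DESC
LIMIT 1;
```

The LLM never computes the risk score itself; it narrates structured, pre-aggregated facts returned by the view.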
2. Customer Support Copilot
DeltaStream maintains:
- Recent support tickets
- Churn signals
- Billing events
- Product usage spikes
Agent can respond:
“This customer is a VIP, opened two high-priority tickets in the last hour, and shows churn indicators.”
3. RevOps / Sales Agent
DeltaStream maintains:
- Product usage
- Engagement
- Expansion indicators
- Contract milestones
Agent can say:
“Account X has a 30% increase in usage this week, making it a great target for expansion outreach.”
When to use Real-Time Context
Use this pattern when:
- A human or agent asks questions
- You need explainability
- You need consolidated state built from many signals
- Data must be fresh but pre-aggregated and prepared
- Responses depend on the "latest situation"
Pattern #2: Real-Time Inference Pipelines
“Score, classify, enrich, or route every event as it flows through the stream.”
This pattern is event-driven, not question-driven. Instead of waiting for an agent to ask for context, every incoming event can trigger inference or transformation.
DeltaStream supports this via:
- Streaming SQL
- Stateful processing
- Joins/windowing
- Real-time feature computation
- UDFs and the call_model function (LLMs + ML models)
- Sinking results back to Kafka, databases, warehouses, or Delta Lake
How it works
- Events arrive (transactions, logs, content, telemetry).
- DeltaStream computes features (windowed counts, anomaly flags).
- If needed, DeltaStream calls a model (LLM or ML) using call_model.
- Output is routed to downstream systems.
There is no agent in the loop; the system is fully automated.
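A minimal sketch of such a pipeline, again in illustrative DeltaStream-style SQL (the stream names, the model name, and the exact call_model signature are assumptions based on the call_model('model', features) shape used later in this post):

```sql
-- Illustrative only: score each enriched event with call_model and
-- sink the results to a stream backed by a Kafka topic.
INSERT INTO scored_events
SELECT
  event_id,
  event_time,
  call_model('risk-classifier', features) AS risk_label
FROM enriched_events;
```

This statement runs continuously: every event that lands in `enriched_events` is scored and emitted, with no request/response loop and no agent involved.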
Real-world use cases
1. Real-Time Fraud Detection
- Stream: transactions, device info, login activity.
- DeltaStream computes velocity + abnormal patterns.
- Once fraud is suspected, call_model('fraud-llm', features) classifies the risk
- Output -> fraud_decisions Kafka topic
2. Content Moderation
- Stream: chat messages, forum posts, uploads.
- DeltaStream: extract text + metadata.
- LLM predicts:
- Toxicity
- Policy violation
- Category
- Output: “allow / block / escalate”.
3. Real-Time Ticket Routing
- Stream: incoming support tickets/emails.
- LLM extracts:
- Intent
- Urgency
- Product area
- Route to correct queue instantly.
4. Recommendation Scoring
- Stream: pageviews, clicks, interactions.
- call_model generates embeddings or category predictions.
- The downstream recommendation system uses the output immediately.
When to use Real-Time Inference Pipelines
Use this pattern when:
- Each event requires automated scoring
- There is no human-in-the-loop
- You need high throughput (100k+ events/sec)
- Latency matters
- Results are consumed by systems, not agents
- Workloads are repetitive and predictable
Be mindful when invoking LLMs in these scenarios, as frequent calls can quickly drive up costs. Ideally, LLM inference should occur only when truly necessary, not for every incoming event. DeltaStream pipelines can enforce this by filtering and aggregating events so that LLMs are triggered selectively rather than on a per-event basis.
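One way to enforce that selectivity is to gate the model call behind cheap streaming predicates, so only events that clear a SQL filter ever pay for inference. A hypothetical sketch (all names and thresholds are illustrative):

```sql
-- Illustrative only: cheap filters run first; call_model is invoked
-- only for the small fraction of events that look suspicious.
INSERT INTO fraud_decisions
SELECT
  txn_id,
  call_model('fraud-llm', features) AS decision
FROM txn_features
WHERE velocity_5m > 10      -- streaming aggregate computed upstream
  AND amount > 1000;        -- only high-value outliers reach the LLM
```

If, say, 1% of events pass the filter, LLM cost drops by roughly two orders of magnitude while the pipeline still scores every event that matters.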
Real-Time Context vs Real-Time Inference: Key Differences

| | Real-Time Context | Real-Time Inference Pipelines |
| --- | --- | --- |
| Trigger | Question-driven: an agent or human asks | Event-driven: every incoming event |
| Consumer | Agents and humans, via tool calls | Downstream systems |
| Output | Continuously updated Materialized Views | Scored, classified, or enriched events |
| Optimized for | Freshness, explainability, consolidated state | Throughput, latency, full automation |
Which Should You Use?
Choose Real-Time Context if:
- You’re building agentic systems (security copilot, support copilot, SRE copilot).
- Humans will ask exploratory questions.
- The LLM needs live, explainable, consolidated state.
- Multiple agents or apps will reuse the same real-time view.
Choose Real-Time Inference Pipelines if:
- You need fully automated, per-event decisions.
- You’re doing real-time scoring/classification/routing.
- Latency and throughput are critical.
- Downstream systems rely on enriched/processed events.
Use both if:
- You score events in real time and
- Agents need to understand / explain / analyze the results.
Example:
Fraud pipeline scores events → Fraud Copilot uses those results to conduct investigations. This pipeline + context pairing is becoming the standard architecture for GenAI in production.
Final Thoughts
As LLM-powered agents and real-time automation become central to modern AI systems, teams need infrastructure that can:
- Ingest and correlate streaming data
- Build real-time state
- Run real-time inference
- Expose live results to agents and downstream services
DeltaStream is uniquely positioned here, combining a streaming SQL engine, real-time MVs, continuous computation, and LLM/ML inference in one platform.
Whether you're building an autonomous fraud engine, a real-time SOC copilot, or a hyper-personalized AI experience, DeltaStream gives you both patterns, and lets you mix them effortlessly.
If you’d like a follow-up deep dive with architectures, sample SQL, or demo notebooks, we’re happy to share.
Let’s build the future of real-time GenAI together.
This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.