Over the past year, GenAI systems have evolved from simple chatbots into agentic systems that take actions, interact with tools, and process live information. As these systems mature, one thing has become clear:

Real-time streaming data is the backbone of intelligent agents and automated pipelines.

DeltaStream sits at the center of this shift, providing the streaming-native engine that makes live context and live inference both possible at scale.

Across security, fintech, SaaS, IoT, and AI-driven automation, we’ve seen teams converge on one of two patterns:

  1. Real-Time Context for Agents
  2. Real-Time Inference Pipelines

These patterns look similar on the surface, but they serve different roles and are optimized for different problems. Both can (and often should) coexist.

This post explains the differences, the architectures, and when to use each, with realistic, modern GenAI use cases.

Pattern #1: Real-Time Context for Agents

“Give an LLM or Agent the freshest, most accurate picture of what’s happening right now.”

Modern AI agents, whether built with OpenAI Agent Builder, LangChain, or internal frameworks, rely heavily on tools. These tools query external systems: databases, APIs, vector stores, and now, streaming state.

But agents are stateless by design. They don’t store long-term memory or real-time state internally. This is where DeltaStream becomes the Real-Time Context Engine.

How it works

  1. Ingest streaming + CDC sources
    Kafka, Kinesis, Pulsar, Debezium/Postgres CDC, event logs, telemetry.
  2. Use SQL to join, aggregate, and compute live state
    • Tumbling/Hopping windows
    • Real-time counters
    • Rolling risk scores
    • Feature aggregates
    • Pattern detection
  3. Expose the result as Materialized Views (MVs)
    These are continuously updated tables that reflect the current state of the world. DeltaStream’s built-in MCP server makes them directly accessible to agents.
  4. Agents query these MVs via tool calls (MCP, REST, SQL)
    The agent receives structured, trustworthy, fresh context.
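As a sketch of steps 2 and 3, here is what a live-state Materialized View might look like in DeltaStream-style streaming SQL. The stream name, columns, and window syntax are illustrative assumptions for this post, not exact DeltaStream syntax:

```sql
-- Hypothetical source: an auth_events stream defined over a Kafka topic.
-- Continuously count failed logins per user over a 5-minute tumbling
-- window and expose the result as a Materialized View agents can query.
CREATE MATERIALIZED VIEW failed_logins_5m AS
SELECT
  user_id,
  window_start,
  COUNT(*) AS failed_attempts
FROM TUMBLE(auth_events, SIZE 5 MINUTES)
WHERE event_type = 'login_failed'
GROUP BY user_id, window_start;
```

Because the view is maintained continuously, an agent’s tool call always sees the latest window without recomputing anything at question time.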

What it enables

Agents can now:

  • Explain what is happening (root cause, anomalies, spikes)
  • Make informed decisions
  • Use live features for recommendations or prioritization
  • Provide narrative reasoning supported by real-time facts

Real-world use cases

1. SOC Copilot / Security Analyst Assistants

DeltaStream maintains:

  • failed-logins (5m window)
  • IDS alert counts
  • Network exfil signals
  • Per-user/Per-IP risk scores

Agent can answer:

"User alice shows high risk due to a cluster of failed logins and a critical IDS alert in the last 3 minutes."

2. Customer Support Copilot

DeltaStream maintains:

  • Recent support tickets
  • Churn signals
  • Billing events
  • Product usage spikes

Agent can respond:

“This customer is a VIP, opened two high-priority tickets in the last hour, and shows churn indicators.”

3. RevOps / Sales Agent

DeltaStream maintains:

  • Product usage
  • Engagement
  • Expansion indicators
  • Contract milestones

Agent can say:

“Account X has a 30% increase in usage this week, making it a great target for expansion outreach.”

When to use Real-Time Context

Use this pattern when:

  • A human or agent asks questions
  • You need explainability
  • You need consolidated state built from many signals
  • Data must be fresh but pre-aggregated and prepared
  • Responses depend on the "latest situation"

Pattern #2: Real-Time Inference Pipelines

“Score, classify, enrich, or route every event as it flows through the stream.”

This pattern is event-driven, not question-driven. Instead of waiting for an agent to ask for context, every incoming event can trigger inference or transformation.

DeltaStream supports this via:

  • Streaming SQL
  • Stateful processing
  • Joins/windowing
  • Real-time feature computation
  • UDFs and the call_model function (LLMs + ML models)
  • Sinking results back to Kafka, databases, warehouses, or Delta Lake

How it works

  1. Events arrive (transactions, logs, content, telemetry).
  2. DeltaStream computes features (windowed counts, anomaly flags).
  3. If needed, DeltaStream calls a model (LLM or ML) using call_model.
  4. Output is routed to downstream systems.

No agent is involved; the system is fully automated.
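The four steps above can be sketched in streaming SQL. Everything here, the stream names, the feature columns, and the exact shape of the call_model invocation, is an illustrative assumption rather than verbatim DeltaStream syntax:

```sql
-- Hypothetical end-to-end pipeline: ingest transactions carrying
-- precomputed windowed features, score suspicious events with
-- call_model, and sink the decisions back to a Kafka-backed stream.
CREATE STREAM fraud_decisions AS
SELECT
  t.txn_id,
  t.user_id,
  t.amount,
  call_model('fraud-llm', t.amount, t.txn_count_5m) AS risk_label
FROM txn_with_features t   -- upstream stream with windowed counts
WHERE t.txn_count_5m > 10; -- only score high-velocity activity
```

Each qualifying event is scored as it arrives and routed downstream, with no human or agent in the loop.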

Real-world use cases

1. Real-Time Fraud Detection

  • Stream: transactions, device info, login activity.
  • DeltaStream computes velocity + abnormal patterns.
  • When suspicious patterns are detected, call_model('fraud-llm', features) classifies the risk.
  • Output -> fraud_decisions Kafka topic.

2. Content Moderation

  • Stream: chat messages, forum posts, uploads.
  • DeltaStream: extract text + metadata.
  • LLM predicts:
    • Toxicity
    • Policy violation
    • Category
  • Output: “allow / block / escalate”.

3. Real-Time Ticket Routing

  • Stream: incoming support tickets/emails.
  • LLM extracts:
    • Intent
    • Urgency
    • Product area
  • Route to correct queue instantly.

4. Recommendation Scoring

  • Stream: pageviews, clicks, interactions.
  • call_model generates embeddings or category predictions.
  • Downstream recommendation system uses output immediately.

When to use Real-Time Inference Pipelines

Use this pattern when:

  • Each event requires automated scoring
  • There is no human-in-the-loop
  • You need high throughput (100k+ events/sec)
  • Latency matters
  • Results are consumed by systems, not agents
  • Workloads are repetitive and predictable

Be mindful when invoking LLMs in these scenarios, as frequent calls can quickly drive up costs. Ideally, LLM inference should occur only when truly necessary, not for every incoming event. DeltaStream pipelines can enforce this by filtering and aggregating events so that LLMs are triggered selectively rather than on a per-event basis.
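One sketch of that idea: aggregate and filter first, so the model fires once per user per window that crosses a threshold, rather than once per raw event. All names, window sizes, and thresholds below are illustrative assumptions:

```sql
-- Step 1: collapse raw events into per-user, per-minute summaries.
CREATE STREAM user_window_summaries AS
SELECT user_id,
       window_start,
       COUNT(*)    AS events,
       SUM(amount) AS total
FROM TUMBLE(transactions, SIZE 1 MINUTE)
GROUP BY user_id, window_start;

-- Step 2: invoke the LLM only for windows that look anomalous.
CREATE STREAM flagged_users AS
SELECT user_id,
       call_model('risk-llm', events, total) AS verdict
FROM user_window_summaries
WHERE events > 100 OR total > 10000; -- gate keeps LLM calls rare
```

If 1% of windows cross the gate, LLM spend drops by roughly two orders of magnitude compared to per-event scoring.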

Real-Time Context vs Real-Time Inference: Key Differences

  • Trigger: an agent or human asking a question vs. every incoming event
  • Consumer: LLMs and agents via tool calls vs. downstream systems
  • Primary output: continuously updated Materialized Views vs. enriched, scored, or routed events
  • Optimized for: freshness and explainability vs. throughput and latency

Which Should You Use?

Choose Real-Time Context if:

  • You’re building agentic systems (security copilot, support copilot, SRE copilot).
  • Humans will ask exploratory questions.
  • The LLM needs live, explainable, consolidated state.
  • Multiple agents or apps will reuse the same real-time view.

Choose Real-Time Inference Pipelines if:

  • You need fully automated, per-event decisions.
  • You’re doing real-time scoring/classification/routing.
  • Latency and throughput are critical.
  • Downstream systems rely on enriched/processed events.

Use both if:

  • You score events in real time and
  • Agents need to understand / explain / analyze the results.

Example:
Fraud pipeline scores events → Fraud Copilot uses those results to conduct investigations. This pipeline-plus-context pairing is quickly becoming a standard architecture for GenAI in production.
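Concretely, the pairing can be as simple as building a Materialized View over the pipeline’s output so the copilot has queryable, always-current investigation context. A sketch with illustrative names, where fraud_decisions is the scoring pipeline’s output topic mentioned earlier:

```sql
-- Context layer on top of the inference layer: summarize scored
-- events per user so the Fraud Copilot can query open cases live.
CREATE MATERIALIZED VIEW open_fraud_cases AS
SELECT user_id,
       COUNT(*)        AS flagged_txns,
       MAX(event_time) AS last_flagged
FROM fraud_decisions        -- output of the scoring pipeline
WHERE risk_label <> 'low'
GROUP BY user_id;
```

The same stream thus serves two consumers: downstream systems act on each scored event, while agents reason over the aggregate view.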

Final Thoughts

As LLM-powered agents and real-time automation become central to modern AI systems, teams need infrastructure that can:

  • Ingest and correlate streaming data
  • Build real-time state
  • Run real-time inference
  • Expose live results to agents and downstream services

DeltaStream is uniquely positioned here, combining a streaming SQL engine, real-time MVs, continuous computation, and LLM/ML inference in one platform.

Whether you're building an autonomous fraud engine, a real-time SOC copilot, or a hyper-personalized AI experience, DeltaStream gives you both patterns, and lets you mix them effortlessly.

If you’d like a follow-up deep dive with architectures, sample SQL, or demo notebooks, we’re happy to share.

Let’s build the future of real-time GenAI together.

This blog was written by the author with assistance from AI to help with outlining, drafting, or editing.