Memory for AutoGen Agents

Microsoft AutoGen makes multi-agent conversation easy. But when the session ends, the conversation history vanishes. 0Latency gives your AutoGen agents persistent memory — they remember past collaborations, decisions, and outcomes across sessions.

The Problem: Powerful Conversations, Zero Persistence

AutoGen's strength is multi-agent collaboration. A GroupChat with a coder, a reviewer, and a planner can produce remarkable results. The agents debate, iterate, and converge on solutions. But here's the gap: all of that collaboration is ephemeral.

When the Python process exits, the conversation history — and everything the agents learned during it — is gone. The next session starts cold. The coder doesn't remember the architecture decisions from yesterday. The reviewer doesn't remember which code patterns were approved. The planner doesn't remember what was tried and failed.

AutoGen does provide a Teachability capability that stores facts in a local vector database. But it's limited: the store lives on a single machine, is scoped to individual agents rather than shared across the team, and doesn't travel with your agents between environments.

For production deployments — especially multi-agent systems that run frequently — you need memory that's persistent, shared, intelligent, and portable. That's what 0Latency provides.

The Solution: Hook, Extract, Recall

The integration pattern for AutoGen has three components:

  1. Recall at session init — before the GroupChat starts, pull in relevant memories and inject them into agent system messages.
  2. Extract after conversation — when the chat completes (or at checkpoints), extract learnings from the conversation history.
  3. Optionally, hook into message callbacks — for long-running conversations, extract incrementally as messages flow.
Session start → recall() → inject into system messages → GroupChat runs → extract() → memories stored

Full Example: AutoGen GroupChat with Persistent Memory

pip install zero-latency-sdk pyautogen
import autogen
from zero_latency import Memory

# Initialize — that's all the setup you need
mem = Memory("your-api-key")

# Recall context from previous sessions — sub-100ms
past_context = mem.recall("software architecture decisions and coding standards")

# LLM config (AutoGen expects a config_list)
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "your-key"}]}

# Create agents with memory-enriched system messages
coder = autogen.AssistantAgent(
    name="Coder",
    system_message=f"""You are a senior software engineer. Write clean, 
tested code. Follow established patterns.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

reviewer = autogen.AssistantAgent(
    name="Reviewer",
    system_message=f"""You review code for correctness, security, and 
maintainability. Reference past decisions when relevant.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

planner = autogen.AssistantAgent(
    name="Planner",
    system_message=f"""You break down tasks into actionable steps. 
Consider past approaches and what worked.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    # use_docker=False executes generated code locally; set it to True
    # (AutoGen's default) if Docker is available on the machine
    code_execution_config={"work_dir": "workspace", "use_docker": False},
)

# Set up GroupChat
groupchat = autogen.GroupChat(
    agents=[user_proxy, coder, reviewer, planner],
    messages=[],
    max_round=20,
)
manager = autogen.GroupChatManager(
    groupchat=groupchat, 
    llm_config=llm_config
)

# Run the conversation
user_proxy.initiate_chat(
    manager,
    message="Build a REST API for user authentication with JWT tokens."
)

# After conversation: extract learnings
conversation_text = "\n".join(
    f"{msg.get('name', 'unknown')}: {msg['content']}"
    for msg in groupchat.messages
    if msg.get('content')  # skip tool calls and empty messages
)
mem.add(conversation_text)

print("Memories stored — extraction happens in the background.")

The next time this script runs — whether tomorrow, next week, or on a different machine — recall() will return the decisions, patterns, and learnings from this conversation. The coder remembers which auth library was chosen. The reviewer remembers which edge cases were flagged. The planner remembers the task breakdown that worked.
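One practical wrinkle: on the very first run, recall() has nothing to return, and injecting an empty "Context from previous sessions" section is just prompt noise. A minimal guard, assuming past_context.text is an ordinary string (the helper name below is ours, not part of the SDK):

```python
def build_system_message(role_prompt: str, past_text: str) -> str:
    """Append recalled context to a role prompt, skipping the context
    section entirely when no memories exist yet (i.e. the first session)."""
    if not past_text or not past_text.strip():
        return role_prompt
    return f"{role_prompt}\n\nContext from previous sessions:\n{past_text}"


# First session: recall returned nothing, so the prompt stays clean
first_run = build_system_message("You are a senior software engineer.", "")

# Later session: recall returned real context, so it gets appended
later_run = build_system_message(
    "You are a senior software engineer.",
    "Coder recommended FastAPI with Pydantic models.",
)
```

Pass the result of this helper as the system_message for each agent instead of the raw f-string shown above.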

Advanced: Incremental Extraction with Callbacks

For long-running GroupChat sessions, you may want to extract memories as the conversation progresses rather than waiting until the end. AutoGen's agent registration allows you to hook into the message flow:

# Message buffer for incremental extraction
message_buffer = []

def on_message(agent_name, message):
    """Accumulate messages and extract every 10 exchanges."""
    if isinstance(message, str) and message.strip():
        message_buffer.append(f"{agent_name}: {message}")

    if len(message_buffer) >= 10:
        mem.add("\n".join(message_buffer))
        message_buffer.clear()

# Register the hook on every agent. Note two details: the hook fires on
# the *receiving* agent, so messages are tagged with who received them;
# and binding `agent=agent` as a default argument avoids Python's
# late-binding closure pitfall, which would otherwise make every hook
# report the last agent in the loop. The hook must return the message
# unchanged so processing continues normally.
for agent in [coder, reviewer, planner]:
    agent.register_hook(
        hookable_method="process_last_received_message",
        hook=lambda msg, agent=agent: (on_message(agent.name, msg), msg)[1],
    )

When to use incremental vs. batch extraction: For conversations under 20 rounds, batch extraction at the end is simpler and sufficient. For long-running sessions (50+ rounds) or conversations that might be interrupted, incremental extraction ensures you don't lose in-progress learnings.
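One gap in the buffered approach above: if the chat ends mid-batch, anything still sitting in the buffer is lost. A small batcher with an explicit flush() closes that gap. MemoryStub here is a stand-in for the real zero_latency client, used only to keep the sketch self-contained and runnable:

```python
class MessageBatcher:
    """Buffer messages and push them to memory in fixed-size batches."""

    def __init__(self, memory, batch_size=10):
        self.memory = memory
        self.batch_size = batch_size
        self.buffer = []

    def add(self, sender_name, content):
        if isinstance(content, str) and content.strip():
            self.buffer.append(f"{sender_name}: {content}")
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Store whatever is buffered, including a trailing partial batch."""
        if self.buffer:
            self.memory.add("\n".join(self.buffer))
            self.buffer.clear()


class MemoryStub:
    """Stand-in for zero_latency.Memory, so the sketch runs without a key."""

    def __init__(self):
        self.stored = []

    def add(self, text):
        self.stored.append(text)


batcher = MessageBatcher(MemoryStub(), batch_size=3)
for i in range(7):
    batcher.add("Coder", f"message {i}")
batcher.flush()  # in real use, call this after initiate_chat() returns
```

In the hook-based setup, you would call batcher.add(...) inside the hook body and batcher.flush() once initiate_chat() returns, so the final partial batch is stored too.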

REST API Alternative

If you prefer not to add a Python dependency, 0Latency's REST API works directly. The entire API is two endpoints:

# Extract memories from conversation
curl -X POST https://api.0latency.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "dev-team-chat",
    "content": "Coder: I recommend using FastAPI with Pydantic models...\nReviewer: Agreed, but add input validation on all endpoints..."
  }'

# Recall relevant memories
curl -X POST https://api.0latency.ai/v1/recall \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "dev-team-chat",
    "query": "API framework and architecture decisions",
    "max_tokens": 2000
  }'

This makes 0Latency accessible from any language or framework that can make HTTP requests — not just Python, not just AutoGen.
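As an illustration, the same two calls can be built from Python's standard library alone, with no SDK dependency. The endpoint URLs and field names are copied from the curl commands above; the helper only constructs the request, leaving the actual send to the caller:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.0latency.ai/v1"

def build_request(endpoint, payload):
    """Build a POST request for one of the two 0Latency endpoints.
    Pass the result to urllib.request.urlopen() to actually send it."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

extract_req = build_request("extract", {
    "agent_id": "dev-team-chat",
    "content": "Coder: I recommend using FastAPI with Pydantic models...",
})
recall_req = build_request("recall", {
    "agent_id": "dev-team-chat",
    "query": "API framework and architecture decisions",
    "max_tokens": 2000,
})
```

The same two-request shape translates directly to fetch in JavaScript, net/http in Go, or any other HTTP client.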

What Your Agents Gain

Real-world impact: Teams using persistent memory report that their AutoGen agents reach useful output quality 3-4x faster on repeated tasks compared to cold-start sessions. The first run is the same — every subsequent run is faster and better.

Pricing

Free tier: 1,000 memories, 10,000 recall queries/month. For a development team running a few GroupChat sessions per day, this covers normal usage. Pro ($29/month) removes all limits — built for production multi-agent systems with high-frequency execution.

Give your AutoGen agents a memory

Agents that remember past collaborations, avoid repeated mistakes, and build on previous work. Two API calls.

Try 0Latency Free →

Other Integrations