Memory for AutoGen Agents

Microsoft AutoGen makes multi-agent conversation easy. But when the session ends, the conversation history vanishes. 0Latency gives your AutoGen agents persistent memory — they remember past collaborations, decisions, and outcomes across sessions.

The Problem: Powerful Conversations, Zero Persistence

AutoGen's strength is multi-agent collaboration. A GroupChat with a coder, a reviewer, and a planner can produce remarkable results. The agents debate, iterate, and converge on solutions. But here's the gap: all of that collaboration is ephemeral.

When the Python process exits, the conversation history — and everything the agents learned during it — is gone. The next session starts cold. The coder doesn't remember the architecture decisions from yesterday. The reviewer doesn't remember which code patterns were approved. The planner doesn't remember what was tried and failed.

AutoGen does provide a Teachability capability that stores facts in a local vector database. But it's limited: the store lives on a single machine, is scoped to individual agents rather than shared across the team, and doesn't travel with your agents between environments.

For production deployments — especially multi-agent systems that run frequently — you need memory that's persistent, shared, intelligent, and portable. That's what 0Latency provides.

The Solution: Hook, Extract, Recall

The integration pattern for AutoGen has three components:

  1. Recall at session init — before the GroupChat starts, pull in relevant memories and inject them into agent system messages.
  2. Extract after conversation — when the chat completes (or at checkpoints), extract learnings from the conversation history.
  3. Optionally, hook into message callbacks — for long-running conversations, extract incrementally as messages flow.
Session start → recall() → inject into system messages → GroupChat runs → extract() → memories stored

Full Example: AutoGen GroupChat with Persistent Memory

pip install zero-latency-sdk pyautogen
import autogen
from zero_latency import Memory

# Initialize — that's all the setup you need
mem = Memory("your-api-key")

# Recall context from previous sessions — sub-100ms
past_context = mem.recall("software architecture decisions and coding standards")

# LLM config (AutoGen expects a config_list)
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "your-key"}]}

# Create agents with memory-enriched system messages
coder = autogen.AssistantAgent(
    name="Coder",
    system_message=f"""You are a senior software engineer. Write clean, 
tested code. Follow established patterns.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

reviewer = autogen.AssistantAgent(
    name="Reviewer",
    system_message=f"""You review code for correctness, security, and 
maintainability. Reference past decisions when relevant.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

planner = autogen.AssistantAgent(
    name="Planner",
    system_message=f"""You break down tasks into actionable steps. 
Consider past approaches and what worked.

Context from previous sessions:
{past_context.text}""",
    llm_config=llm_config,
)

user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    # use_docker=False executes generated code locally; set it to True
    # (AutoGen's default) if Docker is available on the machine
    code_execution_config={"work_dir": "workspace", "use_docker": False},
)

# Set up GroupChat
groupchat = autogen.GroupChat(
    agents=[user_proxy, coder, reviewer, planner],
    messages=[],
    max_round=20,
)
manager = autogen.GroupChatManager(
    groupchat=groupchat, 
    llm_config=llm_config
)

# Run the conversation
user_proxy.initiate_chat(
    manager,
    message="Build a REST API for user authentication with JWT tokens."
)

# After conversation: extract learnings
conversation_text = "\n".join(
    f"{msg.get('name', 'unknown')}: {msg['content']}"
    for msg in groupchat.messages
    if msg.get('content')  # skip tool calls and empty messages
)
mem.add(conversation_text)

print("Memories stored — extraction happens in the background.")

The next time this script runs — whether tomorrow, next week, or on a different machine — recall() will return the decisions, patterns, and learnings from this conversation. The coder remembers which auth library was chosen. The reviewer remembers which edge cases were flagged. The planner remembers the task breakdown that worked.
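One practical wrinkle: on the very first run, recall() has nothing to return, and injecting an empty "Context from previous sessions" section is just prompt noise. A minimal guard, assuming past_context.text is an ordinary string (the helper name below is ours, not part of the SDK):

```python
def build_system_message(role_prompt: str, past_text: str) -> str:
    """Append recalled context to a role prompt, skipping the context
    section entirely when no memories exist yet (i.e. the first session)."""
    if not past_text or not past_text.strip():
        return role_prompt
    return f"{role_prompt}\n\nContext from previous sessions:\n{past_text}"


# First session: recall returned nothing, so the prompt stays clean
first_run = build_system_message("You are a senior software engineer.", "")

# Later session: recall returned real context, so it gets appended
later_run = build_system_message(
    "You are a senior software engineer.",
    "Coder recommended FastAPI with Pydantic models.",
)
```

Pass the result of this helper as the system_message for each agent instead of the raw f-string shown above.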

Advanced: Incremental Extraction with Callbacks

For long-running GroupChat sessions, you may want to extract memories as the conversation progresses rather than waiting until the end. AutoGen's agent registration allows you to hook into the message flow:

# Message buffer for incremental extraction
message_buffer = []

def on_message(agent_name, message):
    """Accumulate messages and extract every 10 exchanges."""
    if isinstance(message, str) and message.strip():
        message_buffer.append(f"{agent_name}: {message}")

    if len(message_buffer) >= 10:
        mem.add("\n".join(message_buffer))
        message_buffer.clear()

# Register the hook on every agent. Note two details: the hook fires on
# the *receiving* agent, so messages are tagged with who received them;
# and binding `agent=agent` as a default argument avoids Python's
# late-binding closure pitfall, which would otherwise make every hook
# report the last agent in the loop. The hook must return the message
# unchanged so processing continues normally.
for agent in [coder, reviewer, planner]:
    agent.register_hook(
        hookable_method="process_last_received_message",
        hook=lambda msg, agent=agent: (on_message(agent.name, msg), msg)[1],
    )

When to use incremental vs. batch extraction: For conversations under 20 rounds, batch extraction at the end is simpler and sufficient. For long-running sessions (50+ rounds) or conversations that might be interrupted, incremental extraction ensures you don't lose in-progress learnings.
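One gap in the buffered approach above: if the chat ends mid-batch, anything still sitting in the buffer is lost. A small batcher with an explicit flush() closes that gap. MemoryStub here is a stand-in for the real zero_latency client, used only to keep the sketch self-contained and runnable:

```python
class MessageBatcher:
    """Buffer messages and push them to memory in fixed-size batches."""

    def __init__(self, memory, batch_size=10):
        self.memory = memory
        self.batch_size = batch_size
        self.buffer = []

    def add(self, sender_name, content):
        if isinstance(content, str) and content.strip():
            self.buffer.append(f"{sender_name}: {content}")
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Store whatever is buffered, including a trailing partial batch."""
        if self.buffer:
            self.memory.add("\n".join(self.buffer))
            self.buffer.clear()


class MemoryStub:
    """Stand-in for zero_latency.Memory, so the sketch runs without a key."""

    def __init__(self):
        self.stored = []

    def add(self, text):
        self.stored.append(text)


batcher = MessageBatcher(MemoryStub(), batch_size=3)
for i in range(7):
    batcher.add("Coder", f"message {i}")
batcher.flush()  # in real use, call this after initiate_chat() returns
```

In the hook-based setup, you would call batcher.add(...) inside the hook body and batcher.flush() once initiate_chat() returns, so the final partial batch is stored too.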

REST API Alternative

If you prefer not to add a Python dependency, 0Latency's REST API works directly. The entire API is two endpoints:

# Extract memories from conversation
curl -X POST https://api.0latency.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "dev-team-chat",
    "content": "Coder: I recommend using FastAPI with Pydantic models...\nReviewer: Agreed, but add input validation on all endpoints..."
  }'

# Recall relevant memories
curl -X POST https://api.0latency.ai/v1/recall \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "dev-team-chat",
    "query": "API framework and architecture decisions",
    "max_tokens": 2000
  }'

This makes 0Latency accessible from any language or framework that can make HTTP requests — not just Python, not just AutoGen.
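As an illustration, the same two calls can be built from Python's standard library alone, with no SDK dependency. The endpoint URLs and field names are copied from the curl commands above; the helper only constructs the request, leaving the actual send to the caller:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.0latency.ai/v1"

def build_request(endpoint, payload):
    """Build a POST request for one of the two 0Latency endpoints.
    Pass the result to urllib.request.urlopen() to actually send it."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

extract_req = build_request("extract", {
    "agent_id": "dev-team-chat",
    "content": "Coder: I recommend using FastAPI with Pydantic models...",
})
recall_req = build_request("recall", {
    "agent_id": "dev-team-chat",
    "query": "API framework and architecture decisions",
    "max_tokens": 2000,
})
```

The same two-request shape translates directly to fetch in JavaScript, net/http in Go, or any other HTTP client.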

What Your Agents Gain

Real-world impact: Teams using persistent memory report that their AutoGen agents reach useful output quality 3-4x faster on repeated tasks compared to cold-start sessions. The first run is the same — every subsequent run is faster and better.

Pricing

Free tier: 1,000 memories, 10,000 recall queries/month. For a development team running a few GroupChat sessions per day, this covers normal usage. Pro ($29/month) removes all limits — built for production multi-agent systems with high-frequency execution.

Give your AutoGen agents a memory

Agents that remember past collaborations, avoid repeated mistakes, and build on previous work. Two API calls.

Try 0Latency Free →

Other Integrations