Building Production AI Agents: Architecture & Best Practices

Production AI agents need durable workflows, tool sandboxing, memory, and model fallback. Here's the architecture we've converged on across multiple products at Shahriar Labs.

The four layers

┌─────────────────────────────────────┐
│  Orchestration  (Temporal / queues) │
├─────────────────────────────────────┤
│  Agent loop     (LLM + tool calls)  │
├─────────────────────────────────────┤
│  Tool layer     (sandboxed, typed)  │
├─────────────────────────────────────┤
│  Memory / context  (knowledge graph)│
└─────────────────────────────────────┘

1. Orchestration — durable workflows

LLM calls fail. Networks time out. Tasks run longer than your serverless function limit. We use Temporal.io for any agent task that takes more than 10 seconds or involves more than 2 LLM calls. Temporal makes workflows durable: if the worker dies mid-task, another worker replays the workflow's recorded event history and resumes after the last completed activity, so finished steps are not re-run.

import (
    "time"

    "go.temporal.io/sdk/workflow"
)

// QuantumSketch video generation (runs 2-8 minutes)
func VideoGenerationWorkflow(ctx workflow.Context, req VideoRequest) error {
    // Each activity must finish within 10 minutes, retries included.
    ao := workflow.ActivityOptions{ScheduleToCloseTimeout: 10 * time.Minute}
    ctx = workflow.WithActivityOptions(ctx, ao)

    var storyboard Storyboard
    if err := workflow.ExecuteActivity(ctx, GenerateStoryboardActivity, req).Get(ctx, &storyboard); err != nil {
        return err
    }

    // Render beats one at a time; a failed render aborts the workflow.
    var rendered []VideoChunk
    for _, beat := range storyboard.Beats {
        var chunk VideoChunk
        if err := workflow.ExecuteActivity(ctx, RenderManimActivity, beat).Get(ctx, &chunk); err != nil {
            return err
        }
        rendered = append(rendered, chunk)
    }

    return workflow.ExecuteActivity(ctx, MergeAndPublishActivity, rendered).Get(ctx, nil)
}

2. Model fallback

Never hard-code a single model. We route through a fallback chain: if the primary fails (rate limit, context overflow), the request goes to the secondary, and if that fails too, to a free-tier model:

var modelChain = []string{
    "anthropic/claude-sonnet-4-6",
    "google/gemini-2.5-flash",
    "deepseek/deepseek-r1:free",  // free tier via openrouter
}

The openrouter-free-infer skill handles this automatically — see it on GitHub.

3. Tool sandboxing

Every tool the agent can call runs in a subprocess with:

Filesystem: read-only except an explicit scratch dir
Network: allowlisted domains only
Timeout: hard 30s per call
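
Two of these constraints can be sketched in plain Go: the hard timeout and the domain allowlist. This is a partial illustration under stated assumptions, not our sandbox. The names runTool, hostAllowed, and allowedHosts are hypothetical, and the allowlist here is only an application-level check. Enforcing a read-only filesystem or blocking network access at the OS level needs a mechanism outside this snippet (containers, namespaces, or similar).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/url"
	"os/exec"
	"time"
)

// toolTimeout mirrors the 30s per-call limit from the list above.
const toolTimeout = 30 * time.Second

// allowedHosts is a hypothetical allowlist; real entries depend on the tools.
var allowedHosts = map[string]bool{
	"api.github.com": true,
	"pkg.go.dev":     true,
}

// hostAllowed reports whether rawURL is an http(s) URL with an allowlisted host.
func hostAllowed(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return false
	}
	return allowedHosts[u.Hostname()]
}

// runTool runs one tool call as a subprocess in scratchDir and kills it
// when the timeout expires.
func runTool(ctx context.Context, timeout time.Duration, scratchDir, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Dir = scratchDir        // working directory only; not a filesystem jail
	cmd.WaitDelay = time.Second // do not wait forever on inherited pipes after the kill

	out, err := cmd.Output()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return out, fmt.Errorf("tool %q exceeded %s: %w", name, timeout, context.DeadlineExceeded)
	}
	return out, err
}

func main() {
	fmt.Println(hostAllowed("https://api.github.com/repos"))
	fmt.Println(hostAllowed("https://example.org/"))
	// A tool call would look like:
	//   out, err := runTool(context.Background(), toolTimeout, scratchDir, "mytool", "--flag")
}
```

Matching on the exact hostname, rather than a substring or suffix, avoids lookalike domains such as api.github.com.evil.example slipping through.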

4. Memory — knowledge graph

Flat RAG (vector search over docs) loses entity relationships. We use Context-Heavy, our multi-tenant knowledge-graph API, to store structured facts about codebases, users, and systems. Graph traversal runs as recursive CTEs on PostgreSQL, with pgvector providing the vector search.
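
To make the traversal idea concrete, here is a sketch of a depth-bounded neighborhood query. The schema is an assumption: a single edges(src_id, dst_id) table with bigint IDs, which is not Context-Heavy's actual schema, and tenant scoping is left out. The query text is held in a Go constant, and the neighborhood function computes the same result in memory so the example runs without a database.

```go
package main

import (
	"fmt"
	"sort"
)

// neighborhoodQuery returns every node reachable from $1 within $2 hops,
// with its shortest hop count. Table and column names are hypothetical.
// UNION (not UNION ALL) removes duplicate rows, and the depth bound
// guarantees termination on cyclic graphs.
const neighborhoodQuery = `
WITH RECURSIVE reachable(id, depth) AS (
    SELECT $1::bigint, 0
  UNION
    SELECT e.dst_id, r.depth + 1
    FROM edges e
    JOIN reachable r ON e.src_id = r.id
    WHERE r.depth < $2
)
SELECT id, MIN(depth) AS depth
FROM reachable
GROUP BY id
ORDER BY depth, id;`

// Edge is a directed link between two entities.
type Edge struct{ Src, Dst int64 }

// Hit is one reachable node and its shortest distance from the start.
type Hit struct {
	ID    int64
	Depth int
}

// neighborhood is an in-memory breadth-first equivalent of neighborhoodQuery.
func neighborhood(edges []Edge, start int64, maxDepth int) []Hit {
	adj := map[int64][]int64{}
	for _, e := range edges {
		adj[e.Src] = append(adj[e.Src], e.Dst)
	}
	depth := map[int64]int{start: 0}
	frontier := []int64{start}
	for d := 0; d < maxDepth && len(frontier) > 0; d++ {
		var next []int64
		for _, id := range frontier {
			for _, nb := range adj[id] {
				if _, seen := depth[nb]; !seen {
					depth[nb] = d + 1
					next = append(next, nb)
				}
			}
		}
		frontier = next
	}
	hits := make([]Hit, 0, len(depth))
	for id, d := range depth {
		hits = append(hits, Hit{ID: id, Depth: d})
	}
	// Same ordering as the SQL: by depth, then by id.
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Depth != hits[j].Depth {
			return hits[i].Depth < hits[j].Depth
		}
		return hits[i].ID < hits[j].ID
	})
	return hits
}

func main() {
	edges := []Edge{{1, 2}, {2, 3}, {3, 1}, {3, 4}}
	fmt.Println(neighborhood(edges, 1, 2))
}
```

The depth bound matters in practice: knowledge graphs usually contain cycles, and an unbounded walk over a dense graph can return most of the tenant's data.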