> Work in progress. Mostly just for me to unpack the different architectural patterns and common abstractions needed for agentic systems.

Recently I've seen the term "Agent" used to describe systems that I'd usually define as workflows. I think most of the confusion stems from the different types of **agentic systems** that exist - namely, Workflows and Agents. [Anthropic](https://www.anthropic.com/engineering/building-effective-agents) summarises it best:

- Workflows: systems where LLMs and tools are orchestrated through pre-defined code paths.
- Agents: systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

That is - agents make their own decisions. Workflows use LLMs and tools, but walk down a defined path of execution.

### Don't start with an agent

A good signal that you're doing something wrong is if the first line of code you cut is written with the intent of building an agent. Ideally, you start by implementing the shortest viable path. This might mean directly calling a chat completion API with all the context. Then perhaps you need to split the work up into a few steps. The key part is that even when you split the work up, the order of the steps is still deterministic, and followed in order because you've written the code that way. This is a **workflow**.

Finally, maybe you realise that the problem you're trying to solve needs far better task performance, and has a level of ambiguity across individual invocations that would benefit from LLM reasoning and non-deterministic execution paths. This is the point where you consider calling your code an agent, and allow the LLM to be the orchestrator.

### Why do we need frameworks?

I still struggle to see the need for agentic frameworks. If you're writing code anyway, it makes a ton of sense to just call LLM APIs directly. Abstractions like memory, tool parsing and chaining tool calls can all be implemented in a few LOC.

### Patterns

Cribbed (again) from Anthropic.

#### Augmented LLM

The simplest component. An LLM enhanced with tools, context retrieval and memory.

![[Pasted image 20250604170848.png]]

Tools in this scenario can be accessed through MCP, or by writing function calls. Most API providers allow you to specify tools that the LLM has access to. This diagram is probably more complete if the context retrieval and memory boxes are fleshed out.
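To make the tools box concrete, here's a minimal sketch of the tool-calling loop in TypeScript. `llm.complete` and `runTool` are hypothetical stand-ins for whatever provider SDK and tool executor you're actually using - the shape of the loop is the point, not the API.

```
// Hypothetical client + tool executor - swap in your provider's SDK.
declare const llm: { complete(req: { messages: Message[]; tools: Tool[] }): Promise<Reply> };
declare function runTool(call: ToolCall): Promise<unknown>;

interface Message { role: "user" | "assistant" | "tool"; content: string }
interface Tool { name: string; description: string; parameters: object }
interface ToolCall { name: string; arguments: object }
interface Reply { text: string; toolCall?: ToolCall }

const tools: Tool[] = [{
  name: "lookup_product",
  description: "Look up a product in the store inventory by name",
  parameters: { type: "object", properties: { name: { type: "string" } } },
}];

async function augmentedCall(userPrompt: string): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userPrompt }];
  let reply = await llm.complete({ messages, tools });

  // The LLM decides whether it needs a tool; our code executes the call
  // and feeds the result back until the model produces a final answer.
  while (reply.toolCall) {
    const result = await runTool(reply.toolCall);
    messages.push({ role: "tool", content: JSON.stringify(result) });
    reply = await llm.complete({ messages, tools });
  }
  return reply.text;
}
```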
#### Memory

Memory is essentially the ability to provide the LLM with previous prompts and responses as context. This is, in effect, state management for LLMs, which can take three forms:

- Stuffing the entire history into the next message
- Trimming old messages (reducing the amount of information the model has to deal with)
- Compression techniques (summarising the history with another LLM call before passing it forward to the next message)

For agentic systems, it becomes important to be able to "checkpoint" memory each time it changes (realistically, only if you want to time travel through execution). For non-deterministic execution paths (the type you'll see in agents, not workflows), you can imagine that it becomes doubly important that each _node_ in an execution is able to **consume** memory appropriate to its task, and **produce** memory alongside its output.

![[Pasted image 20250604170529.png]]

I think the idea of a memory sequencer makes a lot of sense.

```
type MemoryType = LongContext | Summary | TruncatedContext

class MemorySequencer {
  checkpointIdx: number
  storage: DB | InMemoryStore

  // Append new entries (prompts, responses, tool results) and write a checkpoint.
  appendMemory(entries: Entry[], llm: LLM): void

  // Fetch memory as of a checkpoint, shaped for the consuming node.
  fetchMemory(checkpointIdx: number, as: MemoryType): Memory<MemoryType>
}
```

#### Retrieval

Retrieval (at the point of usage) refers to querying for some context (usually based on a similarity match against an input prompt). But retrieval is deeper than that. I suppose retrieval can be defined as "any information that the LLM asks for or might need to fulfil the task requested by the user". i.e. with tool calling we can make the LLM ask for things, but in the absence of tool calling, _we_ need to orchestrate retrieval.

```
memory = initNewLlmMemory()

user_prompt = "how many flavours of applesauce do you have?"
memory.append({ role: "User", message: user_prompt })

// Orchestrated retrieval: query for context up front, before the LLM call.
context = vectorstore.query(products, user_prompt, similarity_algo)
memory.append({
  role: "System",
  message: `The following information may be relevant to the user query\n${context}`
})

response = llm.execute(
  "Help the user find the products we have in the store",
  memory.toMessages(),
  { tools: [storeLookupTool()] }
)

// response: We have 17 types of AppleSauce: ...
```

Something like that makes sense. None of this really requires a third-party framework for "agents". Also - `vectorstore.query` might also just be `serpapi.search()` - you can have this as an explicit context retrieval step OR you can just provide that tool to the LLM, with a good description of how to use it. Mandated retrieval is more workflow-esque; giving the LLM the choice is more agentic.

#### Prompt Chaining

Decompose a task into a sequence of steps, where each LLM call processes the output of the previous step. Best suited to simple tasks that can be decomposed into clear steps with fixed goals. I've found this especially useful when you have a task that spans different input and output modalities, e.g. analysing social media videos. `gemini` is able to analyse video content, but that's not something `gpt-4o` can do. Similarly, `gpt-image-1` outperforms most other models on image generation.

![[Pasted image 20250604170121.png]]

Importantly, as we break up our work more, <u>durable execution</u> becomes more valuable for two reasons (see the sketch after this list):

- Prompt 1 might have produced a _really_ great output - if we fail after Prompt 2 and fall back to human intervention, we want to be able to resume after Prompt 1.
- The execution of Prompt 1 might be the _most expensive_ call in our chain - losing its result directly impacts our execution cost.
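A sketch of what that chain might look like, with a naive checkpoint between the steps. `gemini.analyseVideo`, `gpt.complete` and `checkpoints` are hypothetical helpers, not real SDK calls - the point is the shape of the chain and where the durable state lives.

```
// Hypothetical model clients and a checkpoint store - swap in real SDKs.
declare const gemini: { analyseVideo(url: string, prompt: string): Promise<string> };
declare const gpt: { complete(prompt: string): Promise<string> };
declare const checkpoints: {
  get(key: string): Promise<string | undefined>;
  save(key: string, value: string): Promise<void>;
};

async function summariseVideo(videoUrl: string): Promise<string> {
  // Step 1: the expensive, modality-specific call. Checkpoint its output so
  // a failure in step 2 doesn't lose it (or force us to pay for it again).
  let analysis = await checkpoints.get(`analysis:${videoUrl}`);
  if (!analysis) {
    analysis = await gemini.analyseVideo(videoUrl, "Describe the key moments");
    await checkpoints.save(`analysis:${videoUrl}`, analysis);
  }

  // Step 2: processes the output of step 1. On retry, we resume from here.
  return gpt.complete(`Write a short social post summarising this video:\n${analysis}`);
}
```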
#### Routing

Classifying an input, and then directing it to a specialised follow-up task. This enables us to separate concerns and build specialised prompts for different use cases, increasing the semantic value of our prompts (instead of having a single prompt that tries to optimise for `n` use cases).

Also useful for cost management! If there's a category of tasks deemed simple, we can route them to cheaper LLMs and reserve the more expensive ones for complex tasks. Outside of LLMs, routing can help us push queries into the _correct_ agentic subsystems. We don't want a query about last week's sales hitting the system that knows how to execute bank transfers, for instance.

![[Pasted image 20250604164954.png]]

We can just view this as an LLM-powered inbound sequencer.

#### Parallelization

Doesn't need a diagram, but in essence can mean two things:

- Partitioning: A task can be split into several independent subtasks that run in parallel.
- Voting: Send the same task to different agentic systems (or the same system `n` times) to get different outcomes - have a judge compare them to pick the best output.

#### Orchestrator-Synthesizer

A non-deterministic version of routing + parallelization.

![[anthropicgensynth.png]]
(Stolen from Anthropic)

An orchestrator model breaks a task down into subtasks and decides how to assign them to workers. It might assign tasks in parallel, or require workers to "sequence-up" to produce an output. At the end, the same model (or a different one) synthesizes the output.

#### Evaluator-Optimiser

One LLM generates a response and another evaluates the output and provides feedback in a loop (a minimal sketch of the loop is at the end of these notes).

![[anthropicgeneval.png]]
(Stolen from Anthropic)

Pitfalls here are the echo-chamber risk, and conversational collapse edge cases when the loop gets too tight.

### Structured Output

TODO

### Multimodal

#### Input

TODO

#### Output

<u>Storage</u>

TODO
- r2 as a retrieval source
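As promised above, a minimal sketch of the evaluator-optimiser loop. `llm.complete` is a hypothetical single-string completion helper; the round cap is what bounds the echo-chamber and collapse risk.

```
// Hypothetical completion helper - swap in your provider's SDK.
declare const llm: { complete(prompt: string): Promise<string> };

async function generateWithCritique(task: string, maxRounds = 3): Promise<string> {
  let draft = await llm.complete(`Task: ${task}`);

  for (let round = 0; round < maxRounds; round++) {
    // Evaluator: a second call (or a second model) judges the draft.
    const verdict = await llm.complete(
      `Evaluate this response to the task "${task}".\n` +
      `Reply with exactly "ACCEPT" if it's good, otherwise give concrete feedback.\n\n${draft}`
    );
    if (verdict.trim().startsWith("ACCEPT")) break;

    // Optimiser: feed the critique back to the generator and revise.
    draft = await llm.complete(
      `Task: ${task}\n\nPrevious attempt:\n${draft}\n\nFeedback:\n${verdict}\n\nRevise the attempt.`
    );
  }
  return draft;
}
```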