[{"content":"Building an AI Agent Platform in Go with Clean Architecture\nMost AI agent stacks today are built in Python.\nThat makes sense. Python has mature frameworks such as LangChain, LangGraph, and LlamaIndex, and the surrounding ecosystem is optimized for experimentation. But in my case, I ended up building an AI agent platform in Go, with almost everything implemented from scratch.\nThis article explains why I made that choice, how the initial implementation broke down as the system grew, and how I redesigned it around Clean Architecture. I will also cover the thread-based chat model, database design, persistence strategy, and the practical trade-offs I ran into while moving from “it works” to something that can actually live inside a product.\nWhy build an AI agent platform in Go?\nThe first reason was practical rather than ideological: the backend of the product was already written in Go.\nFor an MVP, introducing a separate Python service would have increased deployment, review, and operational complexity. It was cheaper to keep everything inside the existing Go backend and move fast there first.\nOnce I started building, I also realized that Go has several properties that are surprisingly well suited to AI agent infrastructure:\ngoroutines make concurrent execution simple\nSSE, WebSocket, and gRPC are lightweight to implement\ndeployment is easy because a single binary is often enough\ninterfaces make abstraction boundaries explicit\ntestability is high when dependencies are cleanly separated\nGo does not have the same quantity of agent tooling as Python, so the implementation cost is somewhat higher. But the upside is that you get much tighter control over architecture, persistence, observability, and long-term maintainability.\nWhy not just use LangGraph or LlamaIndex?\nPart of the reason is timing: I had already started building the MVP directly in Go.\nBut there was also a structural reason. 
In a previous product, I built an AI agent system on top of LangGraph Platform and found that some aspects of product integration were more constrained than expected. For example:\ninternal state tends to become a black box\npersistence and thread management often need to follow framework conventions\nmodels and tools are coupled to framework-specific abstractions\npartial adoption is harder than it looks\nexport, logging, and auditability can become awkward\nFrameworks are useful, especially when the goal is fast prototyping. But when AI becomes a long-lived feature inside an existing backend, I care a lot about being able to control how messages are stored, how tools are executed, how streaming is handled, and how everything can later be exported or audited.\nSo instead of treating the agent framework as the center of the system, I wanted the product architecture to remain the center, and the agent layer to stay thin and replaceable.\nWhy Clean Architecture fits AI agents well\nAI agent systems change constantly.\nThe model changes. The prompt format changes. Tools are added and removed. Memory strategies evolve. Streaming requirements shift from SSE to WebSocket or gRPC. Retrieval pipelines get replaced. Planner-executor patterns come and go.\nThat means the real challenge is not “how to call an LLM API.” The real challenge is how to isolate change.\nThis is exactly where Clean Architecture helps.\nSource: The Clean Architecture – Robert C. Martin (Uncle Bob)\nIn my design, the core agent concepts are separated into abstractions such as:\nModel: OpenAI, Anthropic, local LLMs\nMemory: in-memory, PostgreSQL, Redis\nTool: external API calls, DB lookup, computation, search\nAgent: ReAct-style execution, workflow-based agents, planner-executor\nStreaming: SSE, WebSocket, gRPC\nBy keeping these concepts behind interfaces, I can change the implementation without breaking the application layer. 
That means:\nswitching from OpenAI to another provider does not force large refactors\nadding or removing tools stays localized\nchanging memory from in-memory to persistent storage does not require rewriting agent logic\nchanging the transport from SSE to WebSocket does not require redesigning the agent itself\nFor AI systems, that flexibility matters more than almost anything else.\nBefore: a monolithic /api/ai/chat\nThe first version was intentionally simple.\nThere was a single endpoint, /api/ai/chat, and it did almost everything:\nbind the request\nbuild the prompt\nregister tools\ncall OpenAI\nstream the response back via SSE\nAt the MVP stage, this was the right choice. It minimized code and made the whole behavior easy to trace. One file gave you the entire flow.\nConceptually, the structure looked like this:\nPOST /api/ai/chat\n    ↓\n[Presentation]\n  ChatHandler\n    ├─ request binding\n    ├─ prompt building\n    ├─ tool registration\n    └─ agent execution\n    ↓\n[Application]\n  ReactAgent\n    ├─ OpenAI call\n    ├─ tool execution\n    ├─ state handling\n    └─ streaming\n    ↓\n[Domain]\n  prompt/context helpers\n    ↓\n[Infrastructure]\n  OpenAI client\n  PostgreSQL repository\nThis version was good enough to validate the feature.\nBut it quickly ran into structural problems.\nProblems with the first version\n1. The application layer was effectively tied to OpenAI\nThe OpenAI SDK was directly embedded into application logic. That meant changing the model provider was not just a matter of replacing one implementation. It affected multiple layers:\napplication logic\nagent behavior\nresponse handling\nerror handling\nInstead of “the system supports OpenAI,” the actual state was “OpenAI is baked into the system.”\n2. Tools were tightly coupled to concrete implementations\nThe tool registry and the individual tools were too close to application logic. Some tools directly touched repositories. Some depended on model-specific data shapes. 
That made it harder to:\nunit test tools independently\nreuse tools across agents\nevolve toward other tool protocols such as MCP\n3. Streaming was hardcoded around SSE\nBecause the system was designed around SSE from the beginning, changing the output transport later would require touching a lot of code from the handler down into the agent execution flow.\nEven before there was a concrete need for WebSocket or gRPC, I could already see the architectural problem: streaming details had leaked too far inward.\n4. Testing was painful\nBecause the application layer depended directly on the OpenAI SDK, tool implementations, and memory implementations, mocking one thing tended to drag the others with it.\nIn practice, that pushed the code toward integration-heavy testing, even when what I really wanted was much simpler: mock Model, Memory, and Tool dependencies and verify only the agent behavior.\nThat was the point where I decided the agent system needed a real architectural boundary.\nThe redesign: move the agent core into pkg/ai\nThe key idea of the redesign was simple:\nPut the core AI abstractions into a dedicated package and keep providers, persistence, and transport details outside of it.\nThe resulting structure looked roughly like this:\npkg/ai/\n  agents/      // Agent abstraction and ReactAgent\n  memory/      // Memory abstraction\n  models/      // Model abstraction\n  openai/      // OpenAI implementation\n  prompts/     // prompt abstractions\n  streaming/   // StreamEvent / StreamWriter abstractions\n  tools/       // Tool / ToolRegistry abstractions\n  types.go     // Message / ToolCall / TokenUsage\napplication/\n  ai/tools/    // app-specific tools\n  usecases/    // AgentRunUsecase / ThreadUsecase / ThreadChatUsecase\ninfra/\n  ai/memory/     // PostgreSQL memory\n  ai/tools/      // default tool registry\n  ai/streaming/  // SSE writer, etc.\n
presentation/\n  handlers/ai/ // HTTP handlers\nThis is essentially a thin, self-owned agent layer inside a Go backend.\nIt plays a role similar to a lightweight internal version of LangChain, but built to fit the product’s own architecture instead of forcing the product to fit a framework.\nModel abstraction\nThe first step was to hide provider-specific behavior behind a Model interface.\n// Model is the provider-agnostic interface the agent calls.\ntype Model interface {\n\t// Generate returns one complete response for the given messages.\n\tGenerate(\n\t\tctx context.Context,\n\t\tmessages []ai.Message,\n\t\topts ...ai.ModelOption,\n\t) (*Response, error)\n}\n// StreamingModel extends Model with incremental output.\ntype StreamingModel interface {\n\tModel\n\t// GenerateStream returns a channel of internal stream events.\n\tGenerateStream(\n\t\tctx context.Context,\n\t\tmessages []ai.Message,\n\t\topts ...ai.ModelOption,\n\t) (\u003c-chan streaming.StreamEvent, error)\n}\nThis separates two concerns cleanly:\nthe application and agent only know they are calling a model\nprovider-specific code lives elsewhere\nThe OpenAI-specific implementation is isolated inside pkg/ai/openai, which is responsible for:\nconverting internal messages into OpenAI request messages\nconverting tool definitions into the provider’s function-calling format\nconverting streaming output into internal stream events\nThat means the rest of the system does not need to know anything about the OpenAI SDK.\nMemory abstraction\nConversation history is handled through a Memory interface.\n// Memory loads and stores conversation history for the agent.\ntype Memory interface {\n\tLoadHistory(ctx context.Context, opts ...ai.MemoryLoadOption) ([]ai.Message, error)\n\tSave(ctx context.Context, msg ai.Message) (*ai.StoredMessageInfo, error)\n\tClear(ctx context.Context) error\n}\nAt first, I used an in-memory implementation. Later I replaced it with PostgreSQL-backed persistence.\nThe important point is that the agent does not know whether messages are being stored in memory, in Postgres, or somewhere else. 
It just loads and saves messages through the interface.\nThat makes memory strategy a replaceable concern rather than a structural dependency.\nTool abstraction\nTools represent the agent’s interface to the outside world.\n// Tool is a single capability the model can request by name.\ntype Tool interface {\n\tName() string\n\tDescription() string\n\tJSONSchema() map[string]any\n\tCall(ctx context.Context, args json.RawMessage) (any, error)\n}\n// ToolRegistry holds the tools available to an agent run.\ntype ToolRegistry interface {\n\tRegister(tool Tool) error\n\tGet(name string) (Tool, error)\n\tList() []Tool\n\tExecute(ctx context.Context, name string, args json.RawMessage) (any, error)\n}\nThis lets the system split responsibilities cleanly:\npkg/ai/tools defines the abstraction\ninfrastructure provides the registry implementation\napplication code provides product-specific tools\nThat separation turned out to be important. Product tools often need domain repositories or business rules, but the agent core should not know about those directly.\nThe agent itself: a ReAct-style execution loop\nThe agent depends only on abstractions:\nModel\nMemory\nToolRegistry\nStreamWriter\nThat means the agent logic can focus only on the execution pattern.\nIn my case, I implemented a ReAct-style loop:\nload history from memory\ninject a dynamic system prompt\nadd the user message\ncall the model\nif the model requests tool calls, execute them and append the tool results to history\ncall the model again\nif a final answer is returned, persist it and end the run\nThe agent does not know about HTTP. It does not know about SSE specifically. It does not know how PostgreSQL works. 
It only knows how to orchestrate message flow, tool calls, and stopping conditions.\nThat boundary was the most valuable part of the redesign.\nMoving from single-shot chat to thread-based chat\nA major change in the second phase was to stop treating chat as a one-off request and start modeling it as a real thread.\nI introduced three core domain entities:\nAgent: the agent definition and configuration\nThread: a conversation session\nMessage: an individual item in the thread\nThis is a much better fit for product reality than a stateless /chat endpoint.\nAgent\nThe Agent entity stores configuration such as:\nname\ndescription\nmode\nenabled tools\nmodel\ntemperature\nmax tokens\ntimeout\nmetadata\nThread\nThe Thread entity stores:\nownership and tenancy\nassociated agent\ntitle\nstatus\nmetadata\nlast message timestamp\nMessage\nThe Message entity stores:\nthread ID\nrole (user, assistant, system, tool)\nstructured content\nmessage order\ntool calls\ntool call IDs\ntoken usage\ntimestamps\nThis model makes it possible to build a product-level chat system rather than a demo endpoint.\nDatabase design\nThe database schema was intentionally simple:\nagents\nthreads\nmessages\nOne design choice that mattered a lot was storing a message_index on messages.\nThat gave several advantages:\nordering is explicit instead of depending on timestamps\nloading the latest N messages is easy\nfuture features such as insertion or editing are more manageable\nThis sounds like a small detail, but in chat systems, stable ordering becomes more important as soon as you care about persistence, debugging, or replay.\nReplacing in-memory history with PostgreSQL memory\nOnce the thread model existed, the next step was to back memory with PostgreSQL.\nThe Postgres implementation does two main things:\nLoading history\nIt retrieves messages belonging to a thread, ordered by message_index, and converts them into internal ai.Message objects.\nOne detail I intentionally added was excluding system messages from persistence-based 
reloads. System prompts are dynamically generated each run, so I did not want to treat them like ordinary stored history.\nSaving messages\nWhen saving, the implementation:\nassigns the next message_index\nhandles tool call IDs\ndetermines message visibility\nThe visibility concept turned out to be useful. For example:\nuser-visible messages can be shown in the end-user UI\ninternal tool outputs and intermediate assistant messages can remain hidden from the user but still be available for debugging or auditing\nThat gives a much cleaner separation between what the agent actually does and what the product UI should expose.\nUse cases and handlers\nOnce the abstractions existed, the application layer became much simpler.\nThe AgentRunUsecase is responsible for orchestration:\nresolve app-specific context\nconstruct the tool registry\nregister the tools relevant to that use case\nbuild the model using a model factory\nbuild the agent\nexecute it\nIn other words, the use case chooses which parts to assemble, but it does not contain the actual agent logic.\nThe presentation layer is thinner still. A handler only needs to:\nvalidate the request\nperform authentication\ncreate a stream writer\ndelegate to the appropriate use case\nThat means HTTP concerns remain HTTP concerns, and AI concerns remain AI concerns.\nThis separation is exactly what I wanted from the beginning, but could only get after introducing explicit abstractions.\nPractical pitfalls during implementation\nThe redesign improved the architecture, but it did not eliminate implementation traps.\nHere are two of the most important ones.\n1. 
User messages were accidentally saved twice\nAt one point, both the use case and the agent were saving the same user message:\nthe use case saved it before execution\nthe agent saved it again after loading history and appending the user input\nThe fix was straightforward once I understood the responsibility boundary:\nAll agent-managed messages (user, assistant, tool, and system-related internal flow) should be saved on the agent side through memory. The use case should not duplicate that responsibility.\n2. Streaming responses were not being persisted\nWhen streaming token deltas directly to the client, you do not automatically get a final assistant message to store.\nSo the agent had to explicitly accumulate the streamed text:\ncollect every text delta into a buffer\nkeep streaming those deltas to the client\nonce the stream finishes, if the result is a final answer rather than a tool call, save the collected text as one assistant message\nWithout this step, the UI would show the answer live, but the database would not actually contain the final response.\nThat kind of issue is easy to miss when the transport and the persistence logic are not clearly separated.\nWhat became easier after the redesign?\nThe biggest improvement was not elegance. It was local change.\nAfter the redesign:\nchanging model providers became much more contained\nadding or removing tools no longer contaminated unrelated parts of the codebase\nchanging memory implementation did not require rewriting agent logic\nstreaming transport became a replaceable output concern\nagent behavior became testable without HTTP\nuse-case level orchestration became clearer\nIn practice, this changed the development loop.\nInstead of experimenting directly inside handlers or endpoint logic, I could iterate freely inside pkg/ai, and only later wire the results into application and presentation layers.\nThat made both experimentation and stabilization easier.\nWhat are the remaining trade-offs?\n
I do not want to oversell this approach.\nBuilding a thin, self-owned agent layer in Go is not the fastest way to get an AI feature on the screen. If you only want a demo, or if the system will never become a core product feature, using an existing framework can absolutely be the better choice.\nThe trade-off is simple:\nframework-first gives faster short-term development\narchitecture-first gives better long-term control\nIf your AI functionality needs to live inside an existing Go backend, be persisted cleanly, be audited, be observable, and evolve over time, then the architecture-first route becomes much more attractive.\nFuture extensions\nThe design also leaves room for further growth.\nRAG\nRAG can be added in at least two ways:\nas tools, such as SearchDocumentsTool\nas a retrieval step inside memory loading\nBecause both ToolRegistry and Memory are abstracted, either approach can be introduced later without destabilizing the rest of the agent layer.\nMulti-provider models\nA ModelFactory interface makes it straightforward to support multiple providers:\nOpenAI\nAnthropic\nlocal LLMs\nThis can later expand into per-agent model selection or even dynamic model routing.\nMCP and external tool protocols\nBecause tools are already abstracted, an MCP-backed tool layer can be integrated by:\nimplementing MCP-aware tools\nor synchronizing remote tool definitions into the registry\nFrom the agent’s point of view, they are still just tools.\nConclusion\nThe most important lesson from this project is that building AI agents for products is much more about system design than about LLM API calls.\nThe hard parts are:\nisolating provider dependencies\nstructuring tool execution\ndesigning memory correctly\ndeciding what gets persisted\nhandling streaming without leaking transport details everywhere\nkeeping the system testable while the AI stack keeps changing\nFor my use case, Go and Clean Architecture turned out to be a very good combination for solving those problems.\nPython still 
dominates the AI tooling ecosystem, and for good reasons. But if your product backend is already in Go, and you want an AI agent system that behaves like a real product subsystem instead of a framework-shaped add-on, building a thin internal agent layer in Go is a very practical option.\nIt is slower at first.\nBut it gives you control where it matters.\n","permalink":"https://blog.yusukeikoma.com/posts/ai-agent-clean-architecture/","summary":"Why I built an AI agent platform in Go instead of using LangChain, and how Clean Architecture made model, memory, tool, and streaming concerns independently replaceable.","title":"AI Agent Clean Architecture"},{"content":"Hi, I’m Yusuke Ikoma – a CS undergrad at the University of Tokyo (currently on leave) and a freelance software / AI engineer.\nI started this blog as a place to organize and share my thoughts on topics I’m studying and working on. Expect posts on:\nMachine Learning \u0026 AI – Papers, experiments, and concepts I’m exploring\nSystems \u0026 Infrastructure – Cloud, distributed systems, and DevOps\nSoftware Engineering – Architecture, tools, and lessons from real projects\nWriting helps me think more clearly. If any of these topics interest you, I hope you’ll find something useful here.\nThanks for stopping by!\n","permalink":"https://blog.yusukeikoma.com/posts/hello-world/","summary":"Welcome to my blog – a space for sharing notes on CS, ML, and engineering.","title":"Hello World"},{"content":"Yusuke Ikoma\nI’m a junior at the Department of Electrical and Electronic Engineering, Faculty of Engineering, the University of Tokyo, currently on a leave of absence (Apr 2026 – Mar 2027, planned). 
During this time, I’m working as a freelance Software Engineer / AI Engineer, taking on contract projects while exploring various areas of engineering.\nInterests\nMachine Learning / AI – Deep learning, model training and optimization, applied ML\nSystems \u0026 Infrastructure – Distributed systems, cloud architecture, MLOps\nGPU / HPC – High-performance computing, parallel architectures\nSoftware Engineering – Backend, full-stack development, system design\nSkills\nLanguages: Python, TypeScript, Go, Lisp\nML / AI: TensorFlow, PyTorch\nInfrastructure: Docker, AWS, GCP\nWeb / App: React, Next.js, NestJS, Supabase\nContact\nGitHub: yusukeikoma\nX: @IYusuke1205\nLinkedIn: Yusuke Ikoma\n","permalink":"https://blog.yusukeikoma.com/about/","summary":"About me","title":"About"}]