Week 4 at AI_devs: Building AI Agent Tools and Interfaces
Welcome to my summary of Week 4 at AI_devs. Following our exploration of data organization in Week 3, this week focused on building tools and interfaces for AI agents. We learned how to create modular, reusable components that enable AI agents to interact with external services and perform complex tasks autonomously.
AI_devs summaries:
- Week 1 at AI_devs: From LLMs to Intelligent Agents
- Week 2 at AI_devs: Exploring Multimodal AI
- Week 3 at AI_devs: Mastering Data Organization and Retrieval
- Week 4 at AI_devs: Building AI Agent Tools and Interfaces (current post)
Introduction
This week, we explored how to design and implement tools and interfaces that empower AI agents to operate autonomously. The lessons focused on five key areas:
- Building modular, reusable tools
- Document and web content processing
- External API integrations
- Managing task queues and async operations
- Designing scalable AI agent infrastructure
Here’s what I learned throughout the week.
Day 1: Building Tools for AI Agents
In our previous lessons, we learned how to create basic integrations with LLM models. This week takes us to the next level - building a toolkit that lets LLMs work independently to complete assigned tasks.
Our code no longer does most of the work - instead, we’re creating tools that let LLMs make their own decisions. Each tool needs:
- A clear, unique name that helps the model choose the right tool for the job
- A brief description of what it can and can’t do
- Simple instructions in the form of prompts
- Clear input/output structures so it can work with other tools
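To make that shape concrete, here’s a minimal Go sketch of what such a tool definition could look like (the struct, interface, and field names are my own illustration, not code from the course):

```go
package tools

// ToolDefinition is a hypothetical shape for describing a tool to an LLM.
// The model sees the name, description, and instructions; the schemas tell
// it (and us) what JSON the tool accepts and returns.
type ToolDefinition struct {
	Name         string // clear, unique name, e.g. "todo_add_task"
	Description  string // what the tool can and can't do
	Instructions string // prompt fragment explaining how to call it
	InputSchema  string // JSON Schema for the tool's input
	OutputSchema string // JSON Schema for the tool's output
}

// Tool executes against raw JSON input and returns raw JSON output,
// so tools can be chained regardless of their internals.
type Tool interface {
	Definition() ToolDefinition
	Execute(inputJSON []byte) (outputJSON []byte, err error)
}
```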
These tools work much like Linux apps - each has a specific purpose, but they can be combined to create more complex workflows. For example, we can build tools for:
- Managing tasks and projects
- Handling calendars and emails
- Translating documents
- Creating tests or audio content
- Searching the internet
- Sending notifications through Slack or SMS
These tools can work independently or together. An AI agent can handle complex commands like “Every morning, check these websites, summarize them, and email me the summary” by breaking them down into individual tool operations. The AI agent itself determines what steps are needed to complete the task - this logic isn’t handled by our code. We simply provide the set of tools.
In this lesson we focused on building a todo list manager as our first tool. It can:
- Get project lists
- Fetch task lists
- Add, change, and remove tasks
- Watch for task updates
Each tool becomes a mini-app that understands natural language. With automatic prompt testing, we can easily make changes and improvements. These tools form a network, handling tasks for us and other AI agents.
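Continuing the sketch above, the todo manager could be one tool that dispatches on an action field in its JSON input (the action names and envelope are hypothetical):

```go
package tools

import (
	"encoding/json"
	"fmt"
)

// todoInput is a hypothetical input envelope: one tool, several actions.
type todoInput struct {
	Action string `json:"action"` // e.g. "list_projects", "list_tasks", "add_task"
	Task   string `json:"task,omitempty"`
}

// TodoTool sketches the todo list manager as a single tool that
// dispatches on the requested action.
type TodoTool struct{}

func (t *TodoTool) Execute(inputJSON []byte) ([]byte, error) {
	var in todoInput
	if err := json.Unmarshal(inputJSON, &in); err != nil {
		return nil, fmt.Errorf("invalid tool input: %w", err)
	}
	switch in.Action {
	case "list_projects", "list_tasks":
		// placeholder: fetch from the todo service
	case "add_task", "update_task", "remove_task":
		// placeholder: apply in.Task to the todo service
	default:
		return nil, fmt.Errorf("unknown action %q", in.Action)
	}
	return json.Marshal(map[string]string{"status": "ok"})
}
```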
Practical Task: Image Analysis and Repair Assistant
Our daily challenge involved building an image processing system. We received a set of photos, many damaged or imperfect, along with an API offering repair tools: REPAIR, DARKEN, and BRIGHTEN. We built a system to:
- Download and analyze each photo
- Decide if it needed fixing
- Apply repairs through the API
- Generate descriptions for any people in the images
The solution combined vision models for analysis, decision-making for repairs, and natural language processing for descriptions, creating a workflow where each tool handled its specialized part of the process.
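In outline, the per-photo loop might look like the Go sketch below (analyzePhoto, callRepairAPI, and describePeople are hypothetical stubs for the vision-model call, the course’s repair API, and the captioning step):

```go
package photos

// Hypothetical stubs: the vision-model call, the repair API
// (REPAIR / DARKEN / BRIGHTEN), and the captioning step.
func analyzePhoto(url string) (string, error)       { return "OK", nil }
func callRepairAPI(cmd, url string) (string, error) { return url, nil }
func describePeople(url string) error               { return nil }

// repairLoop sketches the per-photo workflow: analyze, decide, repair,
// re-check, then describe any people in the final image.
func repairLoop(photoURLs []string) error {
	for _, url := range photoURLs {
		decision, err := analyzePhoto(url) // "REPAIR", "DARKEN", "BRIGHTEN", or "OK"
		if err != nil {
			return err
		}
		for decision != "OK" {
			if url, err = callRepairAPI(decision, url); err != nil {
				return err
			}
			if decision, err = analyzePhoto(url); err != nil {
				return err
			}
		}
		if err := describePeople(url); err != nil {
			return err
		}
	}
	return nil
}
```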
Day 2: Building an Advanced Document Processing System
Our earlier lessons covered various aspects of document processing. Today we combined all these concepts to create a unified system that works with multiple data sources.
We built a simple yet powerful interface that lets AI agents perform common document operations:
- Loading documents from various sources
- Creating summaries
- Answering questions about content
- Translating between languages
- Extracting specific information
The system handles tasks like:
- “Go to https://… and list all mentioned tools”
- “Download this DOCX file and create a summary”
- “Translate this document from Polish to English”
- “Answer questions a, b, c using files x, y, z”
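Sketched as a Go interface (the names are mine, illustrating the shape rather than the course’s actual API), those operations might look like:

```go
package docs

import "context"

// Document is a hypothetical normalized representation: whatever the
// source (URL, DOCX, PDF), it is loaded into plain text plus metadata.
type Document struct {
	Source string
	Text   string
}

// DocumentService groups the operations an agent can request.
type DocumentService interface {
	Load(ctx context.Context, source string) (*Document, error)
	Summarize(ctx context.Context, doc *Document) (string, error)
	Answer(ctx context.Context, doc *Document, question string) (string, error)
	Translate(ctx context.Context, doc *Document, from, to string) (*Document, error)
	Extract(ctx context.Context, doc *Document, what string) (string, error)
}
```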
The lesson reinforced our previous knowledge about document formatting, database storage, and data retrieval while showing how to apply these concepts in practice.
Practical Task: Data Classification System
Today’s challenge focused on building a classification system. We received three data sets:
- correct - examples of properly formatted data
- incorrect - examples of improperly formatted data
- verify - data requiring classification
Using Few-Shot Prompting, I created a system to classify entries in the ‘verify’ set as either correct or incorrect. The approach used known examples to teach the model the difference between properly and improperly formatted data, enabling accurate classification of new cases.
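The few-shot prompt itself can be assembled mechanically from the two labeled sets; a minimal Go sketch (the prompt wording is my own):

```go
package classify

import (
	"fmt"
	"strings"
)

// buildFewShotPrompt teaches the model by example: known-good and
// known-bad entries first, then the entry to classify.
func buildFewShotPrompt(correct, incorrect []string, entry string) string {
	var b strings.Builder
	b.WriteString("Classify the final entry as CORRECT or INCORRECT.\n\n")
	for _, ex := range correct {
		fmt.Fprintf(&b, "Entry: %s\nLabel: CORRECT\n\n", ex)
	}
	for _, ex := range incorrect {
		fmt.Fprintf(&b, "Entry: %s\nLabel: INCORRECT\n\n", ex)
	}
	fmt.Fprintf(&b, "Entry: %s\nLabel:", entry)
	return b.String()
}
```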
Day 3: Advanced Web Content Processing
Building on our document processing work from Day 2, we explored more sophisticated ways to handle web content. Instead of just downloading web pages as documents, we built systems that can actively navigate and interact with web content.
Our web processing logic works in two ways:
- Full search mode: generating search queries and deciding which pages to download
- Direct mode: fetching content from specific URLs
Practical Task: Web Navigation Agent
Today’s challenge involved building an AI agent that could search for information on a specially prepared website. The agent needed to:
- Download page content
- Check if the page contained the answer
- Decide which page to visit next if needed
For implementation, I used:
- github.com/go-rod/rod for web page interaction
- github.com/JohannesKaufmann/html-to-markdown/v2/converter to convert HTML to Markdown (smaller and more LLM-friendly)
The agent used a prompt that returned two possible actions:
- ANSWER: when the required information was found
- NAVIGATE_PAGE: suggesting the next page to visit
The agent successfully navigated through the website, analyzing content and making decisions at each step until finding the required information.
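Boiled down, the navigation loop looked roughly like the sketch below. The rod and html-to-markdown calls use the libraries named above; the askModel helper standing in for the LLM call is hypothetical:

```go
package main

import (
	"fmt"
	"strings"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
	"github.com/go-rod/rod"
)

// askModel is a hypothetical stub for the LLM call: given the page as
// Markdown and the question, it returns "ANSWER: ..." or "NAVIGATE_PAGE: <url>".
func askModel(markdown, question string) (string, error) {
	return "ANSWER: example", nil
}

func navigate(startURL, question string) (string, error) {
	browser := rod.New().MustConnect()
	defer browser.MustClose()

	url := startURL
	for step := 0; step < 10; step++ { // safety limit on page visits
		page := browser.MustPage(url)
		html, err := page.HTML()
		if err != nil {
			return "", err
		}
		// Markdown is smaller and easier for the model to read than raw HTML.
		md, err := htmltomarkdown.ConvertString(html)
		if err != nil {
			return "", err
		}
		action, err := askModel(md, question)
		if err != nil {
			return "", err
		}
		if answer, found := strings.CutPrefix(action, "ANSWER: "); found {
			return answer, nil
		}
		url = strings.TrimPrefix(action, "NAVIGATE_PAGE: ")
	}
	return "", fmt.Errorf("no answer found within the step limit")
}

func main() {
	answer, err := navigate("https://example.com", "Which tools are mentioned?")
	if err != nil {
		panic(err)
	}
	fmt.Println(answer)
}
```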
Day 4: Integrating with External Services
After exploring web content processing, this lesson focused on external API integration. The lesson covered several example tools:
- Google Maps for route directions and location information
- Spotify for music search and playback control
- Resend for email communication
- Voice message system using macOS’s ‘say’ command
The discussion centered on handling irreversible actions. When a tool can send emails or post messages, mistakes can’t be undone. The lesson explored programming safeguards to either catch errors or prevent them entirely.
Key points from the lesson:
- Always limit model permissions to the absolute minimum
- Include human verification for critical operations
- Consider using deterministic code instead of LLMs for tasks requiring 100% accuracy
- Design clear interfaces between models and external APIs
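One simple safeguard is a confirmation gate in front of every irreversible tool; a minimal sketch (the email send is left as a placeholder):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// confirm asks a human to approve an irreversible action before it runs.
func confirm(action string) bool {
	fmt.Printf("Agent wants to: %s\nProceed? [y/N]: ", action)
	reply, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	return strings.TrimSpace(strings.ToLower(reply)) == "y"
}

func main() {
	if confirm("send email to client@example.com") {
		// sendEmail(...) // placeholder: the irreversible call goes here
		fmt.Println("sent")
	} else {
		fmt.Println("cancelled")
	}
}
```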
Practical Task: Grid Navigation System
Today’s challenge involved building a natural language navigation API. The system received travel descriptions like “move one square right, then all the way down” and needed to:
- Parse natural language directions
- Track position on a special grid
- Return information about the final location
I built the system using prompts to interpret the navigation commands, with position tracking handled deterministically in code.
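The deterministic half, position tracking, is plain code; only parsing the free-form description needs the LLM. A sketch assuming the model has been prompted to emit simple moves such as RIGHT or DOWN_MAX (the move vocabulary and grid size are my assumptions):

```go
package main

import "fmt"

const gridSize = 4 // assumed size; the real task's grid may differ

type position struct{ row, col int }

// apply executes one parsed move, staying inside the grid.
func (p *position) apply(move string) {
	switch move {
	case "UP":
		if p.row > 0 {
			p.row--
		}
	case "DOWN":
		if p.row < gridSize-1 {
			p.row++
		}
	case "LEFT":
		if p.col > 0 {
			p.col--
		}
	case "RIGHT":
		if p.col < gridSize-1 {
			p.col++
		}
	case "DOWN_MAX": // "all the way down"
		p.row = gridSize - 1
	case "RIGHT_MAX": // "all the way right"
		p.col = gridSize - 1
	}
}

func main() {
	// Moves as the LLM might emit them for
	// "move one square right, then all the way down".
	p := position{}
	for _, move := range []string{"RIGHT", "DOWN_MAX"} {
		p.apply(move)
	}
	fmt.Printf("final cell: row %d, col %d\n", p.row, p.col)
}
```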
Day 5: Building Scalable AI Agent Infrastructure
The final day focused on organizing data and managing complex AI agent operations. The lesson covered a database schema design that includes:
- User conversation history
- Message management with document links
- Task tracking with action lists
- Tool and document action relationships
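Interpreted loosely as Go types (a sketch of the relationships, not the course’s exact schema):

```go
package schema

// Conversation groups a user's message history.
type Conversation struct {
	ID       int64
	UserID   int64
	Messages []Message
}

// Message links back to any documents it references.
type Message struct {
	ID          int64
	Content     string
	DocumentIDs []int64
}

// Task tracks the ordered actions the agent planned for it.
type Task struct {
	ID      int64
	Actions []Action
}

// Action ties a task step to the tool that performs it and the
// documents it reads or produces.
type Action struct {
	ID          int64
	ToolName    string
	DocumentIDs []int64
}
```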
The discussion emphasized two key aspects of AI agent systems:
Model Responsibility: The lesson explored how to properly divide work between LLMs and code. LLMs should only handle tasks that can’t be done programmatically, while everything else should be managed by code.
Request Management: Every API has its limits, which can interrupt task execution. For language models, these include:
- Query count limits
- Token limits per minute/day
- Input/output token limits
- Budget constraints
- API availability issues
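A client-side rate limiter keeps an agent inside those limits instead of failing mid-task; here’s a minimal sketch using golang.org/x/time/rate (the requests-per-second figures are made up):

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// Assume the API allows ~2 requests per second with small bursts.
	limiter := rate.NewLimiter(rate.Limit(2), 4)
	ctx := context.Background()

	for i := 0; i < 10; i++ {
		// Wait blocks until the next request is allowed, so long task
		// chains slow down instead of hitting the provider's limit.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("cancelled:", err)
			return
		}
		fmt.Println("LLM request", i) // placeholder for the real API call
	}
}
```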
Practical Task: Large Document Analysis System
Today’s challenge involved analyzing a large PDF document to answer specific questions. The main challenges were:
- The document was too large to process in one LLM request
- It contained both text and images, requiring special processing
I built a solution using:
- marker (github.com/VikParuchuri/marker) to convert the PDF to Markdown
- GPT-4o to generate descriptions for extracted images
- A chunking system to process the document in manageable pieces
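The chunking step itself can stay deterministic; a sketch that splits the Markdown on paragraph boundaries up to a size budget (the budget value is arbitrary):

```go
package main

import (
	"fmt"
	"strings"
)

// chunk splits markdown into pieces of at most maxLen characters,
// cutting only on blank lines so paragraphs stay intact.
func chunk(markdown string, maxLen int) []string {
	var chunks []string
	var current strings.Builder
	for _, para := range strings.Split(markdown, "\n\n") {
		if current.Len() > 0 && current.Len()+len(para) > maxLen {
			chunks = append(chunks, current.String())
			current.Reset()
		}
		if current.Len() > 0 {
			current.WriteString("\n\n")
		}
		current.WriteString(para)
	}
	if current.Len() > 0 {
		chunks = append(chunks, current.String())
	}
	return chunks
}

func main() {
	pieces := chunk("First paragraph.\n\nSecond paragraph.\n\nThird.", 25)
	fmt.Println(len(pieces), "chunks")
}
```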
Wrapping Up
This week was all about empowering AI agents to work autonomously by designing modular tools and interfaces. We explored building reusable components for tasks like document and web content processing, external service integration, and managing asynchronous operations.
By focusing on dividing responsibilities between code and AI models, we ensured efficient and reliable task execution. From integrating APIs like Google Maps and Spotify to handling complex operations on large documents, this week demonstrated the power of thoughtful design in AI systems.
Thanks for reading! Stay tuned for next week’s insights and challenges.
Do you like this post? Share it with your friends!
You can also subscribe to my RSS channel for future posts.