Welcome to my summary of Week 4 at AI_devs. Following our exploration of data organization in Week 3, this week focused on building tools and interfaces for AI agents. We learned how to create modular, reusable components that enable AI agents to interact with external services and perform complex tasks autonomously.

Introduction

This week, we explored how to design and implement tools and interfaces that empower AI agents to operate autonomously. The lessons focused on five key areas:

  • Building modular, reusable tools
  • Document and web content processing
  • External API integrations
  • Managing task queues and async operations
  • Designing scalable AI agent infrastructure

Here’s what I learned throughout the week.

Day 1: Building Tools for AI Agents

In our previous lessons, we learned how to create basic integrations with LLMs. This week takes us to the next level - building a toolkit that lets an LLM work independently to complete assigned tasks.

Our code no longer does most of the work - instead, we’re creating tools that let LLMs make their own decisions. Each tool needs (see the sketch after this list):

  • A clear, unique name that helps the model choose the right tool for the job
  • A brief description of what it can and can’t do
  • Simple instructions in the form of prompts
  • Clear input/output structures so it can work with other tools
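Here’s a minimal sketch of what such a contract could look like in Go. The interface and its names are my own illustration, not code from the course:

```go
package tools

import "context"

// Tool is an illustrative contract for an agent tool.
type Tool interface {
	// Name is a short, unique identifier the model uses to pick a tool.
	Name() string
	// Description tells the model what the tool can and cannot do.
	Description() string
	// Execute accepts structured input and returns structured output,
	// so the result of one tool can feed directly into another.
	Execute(ctx context.Context, input map[string]any) (map[string]any, error)
}
```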

These tools work much like Linux apps - each has a specific purpose, but they can be combined to create more complex workflows. For example, we can build tools for:

  • Managing tasks and projects
  • Handling calendars and emails
  • Translating documents
  • Creating tests or audio content
  • Searching the internet
  • Sending notifications through Slack or SMS

These tools can work independently or together. An AI agent can handle complex commands like “Every morning, check these websites, summarize them, and email me the summary” by breaking them down into individual tool operations. The AI agent itself determines what steps are needed to complete the task - this logic isn’t handled by our code. We simply provide the set of tools.
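To make that concrete, here’s a rough sketch of the decision loop, extending the tools package from the sketch above. planNextStep is a hypothetical stand-in for the LLM call that picks the next action:

```go
package tools

import (
	"context"
	"errors"
	"fmt"
)

// Step is the structured decision we ask the model for at each turn.
type Step struct {
	Done     bool           // the model considers the task complete
	ToolName string         // which tool to run next
	Input    map[string]any // arguments for that tool
}

// planNextStep is a hypothetical stand-in for an LLM call that sees the
// task, the tool descriptions, and results so far, and returns a Step.
func planNextStep(ctx context.Context, state string, tools map[string]Tool) (Step, error) {
	return Step{Done: true}, nil // stubbed out in this sketch
}

// runAgent shows the division of labor: the model decides, our code
// merely executes the chosen tool and feeds the result back.
func runAgent(ctx context.Context, task string, tools map[string]Tool) error {
	state := task
	for i := 0; i < 10; i++ { // hard step limit as a safety net
		step, err := planNextStep(ctx, state, tools)
		if err != nil {
			return err
		}
		if step.Done {
			return nil
		}
		tool, ok := tools[step.ToolName]
		if !ok {
			return fmt.Errorf("model picked unknown tool %q", step.ToolName)
		}
		out, err := tool.Execute(ctx, step.Input)
		if err != nil {
			return err
		}
		state = fmt.Sprintf("%s\nresult of %s: %v", state, step.ToolName, out)
	}
	return errors.New("step limit reached before the task finished")
}
```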

In this lesson we focused on building a todo list manager as our first tool. It can:

  • Get project lists
  • Fetch task lists
  • Add, change, and remove tasks
  • Watch for task updates

Each tool becomes a mini-app that understands natural language. With automatic prompt testing, we can easily make changes and improvements. These tools form a network, handling tasks for us and other AI agents.

Practical Task: Image Analysis and Repair Assistant

Our daily challenge involved building an image processing system. We received a set of photos, many damaged or imperfect, along with an API offering repair tools: REPAIR, DARKEN, and BRIGHTEN. We built a system to:

  1. Download and analyze each photo
  2. Decide if it needed fixing
  3. Apply repairs through the API
  4. Generate descriptions for any people in the images

The solution combined vision models for analysis, decision-making for repairs, and natural language processing for descriptions, creating a workflow where each tool handled its specialized part of the process.
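As a rough illustration of the decision step, here’s how the vision model’s verdict can be mapped onto the API’s commands. askVisionModel is a hypothetical helper, not the actual task code:

```go
package photos

import (
	"context"
	"strings"
)

// RepairAction mirrors the commands the task's API accepted.
type RepairAction string

const (
	Repair   RepairAction = "REPAIR"
	Darken   RepairAction = "DARKEN"
	Brighten RepairAction = "BRIGHTEN"
	None     RepairAction = "NONE" // the photo is fine as-is
)

// askVisionModel is a hypothetical stand-in for the vision-model call;
// a real implementation would send the photo and prompt to the model.
func askVisionModel(ctx context.Context, photoURL, prompt string) (string, error) {
	return "NONE", nil // stubbed out in this sketch
}

// decideAction has the model judge one photo and maps its one-word
// verdict onto a repair command.
func decideAction(ctx context.Context, photoURL string) (RepairAction, error) {
	verdict, err := askVisionModel(ctx, photoURL,
		"Is this photo damaged, too dark, too bright, or fine? "+
			"Answer with exactly one word: REPAIR, DARKEN, BRIGHTEN or NONE.")
	if err != nil {
		return None, err
	}
	return RepairAction(strings.ToUpper(strings.TrimSpace(verdict))), nil
}
```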

Day 2: Building an Advanced Document Processing System

Our earlier lessons covered various aspects of document processing. Today we combined all these concepts to create a unified system that works with multiple data sources.

We built a simple yet powerful interface that lets AI agents perform common document operations:

  • Loading documents from various sources
  • Creating summaries
  • Answering questions about content
  • Translating between languages
  • Extracting specific information

The system handles tasks like:

  • “Go to https://… and list all mentioned tools”
  • “Download this DOCX file and create a summary”
  • “Translate this document from Polish to English”
  • “Answer questions a, b, c using files x, y, z”
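A sketch of such an interface in Go - the method set mirrors the operations above, but the names and the Document type are my own, not the course’s actual API:

```go
package docs

import "context"

// Document is a loaded piece of content, whatever its original source.
type Document struct {
	Source  string // URL, file path, and so on
	Content string // normalized text
}

// DocumentService mirrors the operations the lesson exposes to the agent.
type DocumentService interface {
	Load(ctx context.Context, source string) (Document, error)
	Summarize(ctx context.Context, doc Document) (string, error)
	Answer(ctx context.Context, doc Document, question string) (string, error)
	Translate(ctx context.Context, doc Document, from, to string) (string, error)
	Extract(ctx context.Context, doc Document, instruction string) (string, error)
}
```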

The lesson reinforced our previous knowledge about document formatting, database storage, and data retrieval while showing how to apply these concepts in practice.

Practical Task: Data Classification System

Today’s challenge focused on building a classification system. We received three data sets:

  • correct - examples of properly formatted data
  • incorrect - examples of improperly formatted data
  • verify - data requiring classification

Using Few-Shot Prompting, I created a system to classify entries in the ‘verify’ set as either correct or incorrect. The approach used known examples to teach the model the difference between properly and improperly formatted data, enabling accurate classification of new cases.
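In Go, assembling such a few-shot prompt can be as simple as this sketch (the function and labels are my own illustration):

```go
package classify

import (
	"fmt"
	"strings"
)

// buildFewShotPrompt shows the model labeled examples from both known
// sets, then asks it to label a new entry the same way.
func buildFewShotPrompt(correct, incorrect []string, candidate string) string {
	var b strings.Builder
	b.WriteString("Classify each entry as CORRECT or INCORRECT.\n\n")
	for _, ex := range correct {
		fmt.Fprintf(&b, "Entry: %s\nLabel: CORRECT\n\n", ex)
	}
	for _, ex := range incorrect {
		fmt.Fprintf(&b, "Entry: %s\nLabel: INCORRECT\n\n", ex)
	}
	fmt.Fprintf(&b, "Entry: %s\nLabel:", candidate) // the model completes this line
	return b.String()
}
```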


Day 3: Advanced Web Content Processing

Building on our document processing work from Day 2, we explored more sophisticated ways to handle web content. Instead of just downloading web pages as documents, we built systems that can actively navigate and interact with web content.

Our web processing logic works in two ways:

  • Full search mode: generating search queries and deciding which pages to download
  • Direct mode: fetching content from specific URLs
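A sketch of the dispatch between the two modes; the heuristic here is my own simplification:

```go
package web

import "net/url"

// chooseMode decides between the two modes: a well-formed absolute URL
// goes straight to direct mode, anything else through search first.
func chooseMode(input string) string {
	if u, err := url.ParseRequestURI(input); err == nil && u.Scheme != "" && u.Host != "" {
		return "direct" // fetch exactly this page
	}
	return "search" // generate queries, then decide which pages to download
}
```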

Practical Task: Web Navigation Agent

Today’s challenge involved building an AI agent that could search for information on a specially prepared website. The agent needed to:

  • Download page content
  • Check if the page contained the answer
  • Decide which page to visit next if needed

For implementation, I used:

  • github.com/go-rod/rod for web page interaction
  • github.com/JohannesKaufmann/html-to-markdown/v2/converter to convert HTML to Markdown (smaller and more LLM-friendly)
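Roughly, the fetching step looked like this - a trimmed sketch using the library’s top-level v2 helper rather than the full converter configuration:

```go
package main

import (
	"fmt"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
	"github.com/go-rod/rod"
)

// fetchAsMarkdown loads a page in a headless browser and converts the
// rendered HTML to Markdown, which is smaller and easier for an LLM to read.
func fetchAsMarkdown(url string) (string, error) {
	browser := rod.New().MustConnect()
	defer browser.MustClose()

	page := browser.MustPage(url).MustWaitLoad()
	html, err := page.HTML()
	if err != nil {
		return "", err
	}
	return htmltomarkdown.ConvertString(html)
}

func main() {
	md, err := fetchAsMarkdown("https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(md)
}
```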

The agent used a prompt that returned two possible actions:

  • ANSWER: when the required information was found
  • NAVIGATE_PAGE: suggesting the next page to visit
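Concretely, the prompt asked the model to reply with a small JSON object along these lines (the field names are illustrative):

```go
package agent

// Decision is the structured reply the prompt asked the model for.
type Decision struct {
	Action  string `json:"action"`   // "ANSWER" or "NAVIGATE_PAGE"
	Answer  string `json:"answer"`   // set when Action == "ANSWER"
	NextURL string `json:"next_url"` // set when Action == "NAVIGATE_PAGE"
}
```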

AI Agent log

The agent successfully navigated through the website, analyzing content and making decisions at each step until finding the required information.

Day 4: Integrating with External Services

After exploring web content processing, this lesson focused on integrating with external APIs, walking through several example tools:

  • Google Maps for route directions and location information
  • Spotify for music search and playback control
  • Resend for email communication
  • Voice message system using macOS’s ‘say’ command

The discussion centered on handling irreversible actions. When a tool can send emails or post messages, mistakes can’t be undone. The lesson explored programming safeguards to either catch errors or prevent them entirely.

Key points from the lesson:

  • Always limit model permissions to the absolute minimum
  • Include human verification for critical operations
  • Consider using deterministic code instead of LLMs for tasks requiring 100% accuracy
  • Design clear interfaces between models and external APIs
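As one example of the second point, here’s a sketch of a human-in-the-loop gate placed before any irreversible action (my own illustration):

```go
package safeguards

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// confirmAction gates an irreversible operation (sending an email,
// posting to Slack) behind explicit human approval.
func confirmAction(description string) (bool, error) {
	fmt.Printf("Agent wants to: %s\nProceed? [y/N]: ", description)
	line, err := bufio.NewReader(os.Stdin).ReadString('\n')
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(strings.ToLower(line)) == "y", nil
}
```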

Practical Task: Grid Navigation System

Today’s challenge involved building a natural language navigation API. The system received travel descriptions like “move one square right, then all the way down” and needed to:

  • Parse natural language directions
  • Track position on a special grid
  • Return information about the final location

I built the system using prompts to interpret navigation commands while maintaining position tracking.
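The split follows Day 4’s advice: let the LLM parse the description into (direction, steps) pairs, and let deterministic code do the actual movement. A minimal sketch of that deterministic half, assuming a square grid:

```go
package grid

// Pos is a position on the grid (0,0 = top-left).
type Pos struct{ Row, Col int }

// applyMove moves the marker and keeps it inside an n×n grid.
func applyMove(p Pos, dir string, steps, size int) Pos {
	clamp := func(v int) int {
		switch {
		case v < 0:
			return 0
		case v >= size:
			return size - 1
		}
		return v
	}
	switch dir {
	case "up":
		p.Row = clamp(p.Row - steps)
	case "down":
		p.Row = clamp(p.Row + steps)
	case "left":
		p.Col = clamp(p.Col - steps)
	case "right":
		p.Col = clamp(p.Col + steps)
	}
	return p
}
```

A phrase like “all the way down” simply becomes a down move with a large step count, which the clamp turns into the bottom edge.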

Day 5: Building Scalable AI Agent Infrastructure

The final day focused on organizing data and managing complex AI agent operations. The lesson covered a database schema design that includes:

  • User conversation history
  • Message management with document links
  • Task tracking with action lists
  • Tool and document action relationships

The discussion emphasized two key aspects of AI agent systems:

Model Responsibility: The lesson explored how to properly divide work between LLMs and code. LLMs should only handle tasks that can’t be done programmatically, while everything else should be managed by code.

Request Management: Every API has its limits, which can interrupt task execution. For language models, these include:

  • Query count limits
  • Token limits per minute/day
  • Input/output token limits
  • Budget constraints
  • API availability issues
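One standard defense is wrapping every model call in retries with exponential backoff, so a temporary rate limit slows the task down instead of killing it. A generic sketch (my own, not from the lesson):

```go
package llm

import (
	"context"
	"time"
)

// withRetry retries a failed call with exponentially growing delays.
func withRetry[T any](ctx context.Context, attempts int, call func() (T, error)) (T, error) {
	var zero T
	var err error
	delay := time.Second
	for i := 0; i < attempts; i++ {
		var v T
		if v, err = call(); err == nil {
			return v, nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2 // back off harder each time
		case <-ctx.Done():
			return zero, ctx.Err()
		}
	}
	return zero, err // last error after exhausting attempts
}
```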

Practical Task: Large Document Analysis System

Today’s challenge involved analyzing a large PDF document to answer specific questions. The main challenges were:

  • The document was too large to process in one LLM request
  • It contained both text and images, requiring special processing

I built a solution using:

  • marker (github.com/VikParuchuri/marker) to convert PDF to Markdown
  • GPT-4o to generate descriptions for extracted images
  • A chunking system to process the document in manageable pieces
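The chunking step itself can stay deliberately simple. A sketch with character-based windows and overlap, so an answer spanning a chunk boundary isn’t lost (parameters are illustrative):

```go
package chunks

// chunkText splits text into pieces of at most size runes, where each
// chunk repeats the last overlap runes of the previous one.
// size must exceed overlap.
func chunkText(text string, size, overlap int) []string {
	runes := []rune(text) // avoid splitting multi-byte characters
	var chunks []string
	for start := 0; start < len(runes); start += size - overlap {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}
```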

Wrapping Up

This week was all about empowering AI agents to work autonomously by designing modular tools and interfaces. We explored building reusable components for tasks like document and web content processing, external service integration, and managing asynchronous operations.

By focusing on dividing responsibilities between code and AI models, we ensured efficient and reliable task execution. From integrating APIs like Google Maps and Spotify to handling complex operations on large documents, this week demonstrated the power of thoughtful design in AI systems.

Thanks for reading! Stay tuned for next week’s insights and challenges.

Do you like this post? Share it with your friends!

You can also subscribe to my RSS channel for future posts.