AgentChain: Chain Together Models to Perform Complex Tasks

AgentChain uses Large Language Models (LLMs) to plan and orchestrate multiple agents or large models (LMs) that together accomplish sophisticated tasks. AgentChain is fully multimodal: it accepts text, image, audio, and tabular data as input and output.

  • 🧠 LLMs as the brain: AgentChain uses state-of-the-art Large Language Models to plan and make decisions from natural language inputs. This suits applications such as executing tasks from natural language instructions, data understanding, and data generation.
  • 🌟 Fully Multimodal IO: AgentChain accepts input and produces output in several modalities, such as text, image, audio, or video (coming soon). This suits applications such as computer vision, speech recognition, and converting from one modality to another.
  • 🤝 Orchestrate Versatile Agents: AgentChain can orchestrate multiple agents to perform complex tasks. Because tools are composable and hierarchically structured, AgentChain can choose intelligently which tools to use, and when, for a given task. This suits projects that require complex combinations of tools.
  • 🔧 Customizable for Ad-hoc Needs: AgentChain can be customized to fit specific project requirements by adding new agents that extend its capabilities (a distributed architecture is coming soon).

Agents in AgentChain

AgentChain aims to solve general problems by orchestrating multiple agents, each of which handles a sub-problem. These agents are organized into groups, each with its own set of capabilities and functionalities.
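
To make the grouping idea concrete, here is a minimal sketch of how agents might be registered in groups and selected for a task. Everything in it is an assumption made for illustration: the `Agent` and `AgentGroup` classes, the `plan` and `execute` functions, and the toy `Echo` and `Upper` agents are hypothetical and are not AgentChain's actual API. In particular, the keyword-overlap planner is only a stand-in for the LLM that would make this choice in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """A named capability; `run` maps a text request to a text result."""
    name: str
    description: str
    run: Callable[[str], str]


@dataclass
class AgentGroup:
    """A group of related agents, mirroring the grouping described above."""
    name: str
    agents: Dict[str, Agent] = field(default_factory=dict)

    def add(self, agent: Agent) -> None:
        self.agents[agent.name] = agent


def plan(task: str, groups: List[AgentGroup]) -> List[Agent]:
    """Placeholder planner: pick agents whose description shares a word with the task.

    A real orchestrator would ask an LLM to choose; keyword overlap is an assumption.
    """
    words = set(task.lower().split())
    chosen = []
    for group in groups:
        for agent in group.agents.values():
            if words & set(agent.description.lower().split()):
                chosen.append(agent)
    return chosen


def execute(task: str, groups: List[AgentGroup]) -> Dict[str, str]:
    """Run every selected agent on the task and collect results by agent name."""
    return {agent.name: agent.run(task) for agent in plan(task, groups)}


# Hypothetical toy agents, used only to exercise the sketch.
tools = AgentGroup("ToolsAgents")
tools.add(Agent("Echo", "repeat text", lambda t: t))
tools.add(Agent("Upper", "uppercase shout", lambda t: t.upper()))

results = execute("please repeat this", [tools])
print(results)
```

Keeping the planner behind a single `plan` function means the selection logic could later be swapped for an LLM call without touching the agents themselves.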

SearchAgents

The SearchAgents group is responsible for gathering information from various sources, including search engines, online databases, and APIs. The agents in this group retrieve up-to-date world knowledge and information. Some examples of agents in this group include the Google Search API, Bing API, Wikipedia API, and SerpAPI.

CommsAgents

The CommsAgents group is responsible for handling communication between different parties, such as sending emails, making phone calls, or messaging via various platforms. The agents in this group can integrate with a wide range of platforms. Some examples of agents in this group include TwilioCaller, TwilioEmailWriter, TwilioMessenger and Slack.

ToolsAgents

The ToolsAgents group is responsible for performing various computational tasks, such as performing calculations, running scripts, or executing commands. The agents in this group can work with a wide range of programming languages and tools. Some examples of agents in this group include Math, Python REPL, and Terminal.
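
As an illustration of what a calculation tool could look like, the sketch below evaluates plain arithmetic without calling `eval`. It is an assumption about how such a tool might be written, not the implementation of AgentChain's Math agent; the name `math_tool` is hypothetical. It walks the parsed expression with Python's standard `ast` module and accepts only numbers and the four basic operators.

```python
import ast
import operator

# Operators this sketch is willing to evaluate; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node):
    """Recursively evaluate a parsed arithmetic expression node."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def math_tool(expression: str):
    """Hypothetical tool entry point: evaluate an arithmetic string."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(math_tool("2 * (3 + 4)"))
```

Rejecting everything outside a small whitelist matters here because the expression text would come from an LLM or a user, so function calls and attribute access should never reach the interpreter.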

MultiModalAgents

The MultiModalAgents group is responsible for handling input and output from various modalities, such as text, image, audio, or video (coming soon). The agents in this group can process and understand different modalities. Some examples of agents in this group include OpenAI Whisper, Blip2, Coqui, and StableDiffusion.

ImageAgents

The ImageAgents group is responsible for processing and manipulating images, such as enhancing image quality, object detection, or image recognition. The agents in this group can perform complex operations on images. Some examples of agents in this group include Upscaler, ControlNet and YOLO.

DBAgents

The DBAgents group is responsible for adding data to and fetching data from your database, such as retrieving metrics or aggregations. The agents in this group interact with databases and enrich other agents with your database information. Some examples of agents in this group include SQL, MongoDB, ElasticSearch, Qdrant and Notion.
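
The following sketch shows how a database tool could fetch an aggregation for other agents to use. It is illustrative only: `sql_tool`, the `orders` table, and its rows are hypothetical, and it uses Python's built-in `sqlite3` module with an in-memory database rather than any connector AgentChain actually ships. As a deliberate simplification, this sketch allows only `SELECT` statements, so it covers the fetching side and not the adding side.

```python
import sqlite3
from typing import Any, List, Tuple


def sql_tool(
    conn: sqlite3.Connection, query: str, params: Tuple[Any, ...] = ()
) -> List[tuple]:
    """Run a read-only query and return all rows (hypothetical tool function)."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    return conn.execute(query, params).fetchall()


# Made-up sample data in an in-memory database, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 10.0), ("ada", 5.0), ("bob", 7.5)],
)

rows = sql_tool(
    conn,
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer",
)
print(rows)
```

Passing values through `params` instead of formatting them into the query string keeps generated queries from injecting SQL through user-supplied values.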

Get started

  1. Clone the AgentChain repo
  2. Install requirements: pip install -r requirements.txt
  3. Download model checkpoints: bash download.sh
  4. Depending on which agents you need, export the corresponding environment variables:
OPENAI_API_KEY={YOUR_OPENAI_API_KEY} # mandatory since the LLM is central in this application
SERPAPI_API_KEY={YOUR_SERPAPI_API_KEY}  # make sure to include a serp API key in case you need the agent to be able to search the web

# These environment variables are needed in case you want the agent to be able to make phone calls
AWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}
AWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}
TWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}
TWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}
AWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME} # make sure to create an S3 bucket with public access
  5. Install the ffmpeg library (needed for Whisper): sudo apt update && sudo apt install ffmpeg (Ubuntu command)
  6. Run the main script: python main.py
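
Before running the main script, it can help to check which of the environment variables from step 4 are actually set. The sketch below is one possible way to do that. The variable names come from the steps above, but the `check_environment` function and the capability labels ("core (LLM)", "web search", "phone calls") are assumptions made for this example and are not part of AgentChain.

```python
import os
from typing import Dict, List, Mapping

# Variable names are taken from the setup steps; the grouping labels are assumptions.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = {
    "web search": ["SERPAPI_API_KEY"],
    "phone calls": [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "TWILIO_ACCOUNT_SID",
        "TWILIO_AUTH_TOKEN",
        "AWS_S3_BUCKET_NAME",
    ],
}


def check_environment(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return the missing variable names per capability (empty list means ready)."""
    report = {"core (LLM)": [name for name in REQUIRED if not env.get(name)]}
    for capability, names in OPTIONAL.items():
        report[capability] = [name for name in names if not env.get(name)]
    return report


# Print a short readiness report for the current shell environment.
for capability, missing in check_environment(os.environ).items():
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"{capability}: {status}")
```

Taking the environment as a parameter rather than reading `os.environ` inside the function makes the check easy to exercise with a plain dictionary.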

Visit the repo now!

Article originally posted here by Jina AI. Reposted with permission. Check out the original blog for examples with videos!

