ConfigUrable Generalist Agent (CUGA)
An autonomous agent capable of performing web actions with intelligent planning and task decomposition
What is CUGA?
CUGA is an autonomous agent that combines large language models with task planning and execution. It performs both browser automation and API interactions, which makes it suited to complex workflow automation.
Key Features
- Autonomous Operation: Self-planning and task decomposition
- Web Automation: Browser-based task execution
- API Integration: Seamless API interaction capabilities
- Intelligent Planning: LLM-powered decision making
- Workflow Persistence: Save and reuse successful workflows
- Experiment Tracking: Comprehensive monitoring and analytics
🔗 Important Links
- cuga-website - Official website
- Enablement-session - Training materials
Use Cases
CUGA excels in scenarios requiring:
- Complex Workflow Automation: Multi-step processes that require decision making
- API Orchestration: Coordinating multiple API calls with intelligent error handling
- Web Scraping & Automation: Browser-based data collection and form filling
- Business Process Automation: Repetitive tasks that benefit from AI-powered optimization
- Research & Development: Experimental workflows that require adaptive planning
Technology Stack
- Python 3.12: Core runtime environment
- UV: Modern Python package management
- FastAPI: High-performance web framework
- Selenium/Playwright: Browser automation capabilities
- OpenAI/LiteLLM: LLM integration for intelligent decision making
- Docker: Containerized deployment and evaluation
Quick Start
Get started with CUGA in minutes:
# Clone & Setup
git clone git@github.com:cuga-project/cuga-agent.git
cd cuga-agent
uv venv --python=3.12 && source .venv/bin/activate
uv sync
# Configure
cp .env.example .env
# Add your OPENAI_API_KEY to .env
# Test Code Sandbox
uv run test_sandbox
# Run Demo
cuga start demo
Documentation Structure
This documentation is organized into four main sections:
1. Getting Started
- Introduction - Overview and key concepts
- Quick Start - Get up and running quickly
- Installation - Complete setup instructions
- Configuration - Model setup, environment, and settings
2. Usage
- Demo Mode - Learn CUGA with a pre-configured demo
- Control Commands - Master CUGA CLI
- API Integration - Add your own APIs and tools
- Save & Reuse - Workflow persistence and optimization
- Execution Modes - Fast, accurate, and custom modes
3. Evaluation
- AppWorld Evaluation - Test with AppWorld benchmark
- WebArena Evaluation - Test with WebArena benchmark
- WxO Tools Evaluation - Integrate with Watson Orchestrate
- Docker Parallel Evaluation - Scale evaluation with containers
- Experiment Tracking - Monitor and analyze results
4. Development
- Testing - Comprehensive testing strategies
- Troubleshooting - Debug common issues
- Debugging - Advanced debugging techniques
- API Reference - Complete API documentation
Prerequisites
Before getting started, ensure you have:
| Tool | Purpose | Installation |
|---|---|---|
| UV | Python project manager | Install Guide |
| Rancher Desktop | Container management | Download |
| OpenAI API Key | LLM access | Add to .env file |
CUGA Team: Use the ETE LiteLLM API key
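The prerequisites in the table can be checked before the first run. The following is a minimal sketch, not part of CUGA: the function names `env_has_key` and `check_prerequisites` are hypothetical, and it assumes that Rancher Desktop exposes a `docker` CLI on the PATH, which depends on the container engine you selected.

```python
import shutil
from pathlib import Path


def env_has_key(env_path: Path, key: str = "OPENAI_API_KEY") -> bool:
    """Return True if env_path defines a non-empty value for key."""
    if not env_path.is_file():
        return False
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip().removeprefix("export ").strip()
        if name == key and value.strip().strip("'\""):
            return True
    return False


def check_prerequisites(project_dir: Path) -> dict[str, bool]:
    """Report which prerequisites appear to be satisfied (hypothetical helper)."""
    return {
        "uv on PATH": shutil.which("uv") is not None,
        # Assumption: the container runtime provides a `docker` command.
        "docker CLI on PATH": shutil.which("docker") is not None,
        "OPENAI_API_KEY in .env": env_has_key(project_dir / ".env"),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites(Path.cwd()).items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it from the repository root so that `.env` is resolved against the project directory.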
Demo Mode
CUGA comes with a pre-configured demo featuring the Digital Sales API:
# Start the demo
cuga start demo
# Try this example query:
"get my top account by revenue from digital sales"
Demo Pages
We've created demo HTML pages for testing different CUGA modes:
- Hybrid Demo: Test browser + API integration
- Browser Demo: Test pure web automation
- API Demo: Test API-only operations
Download and open these pages in your browser to test CUGA's capabilities in each mode.
Experiment Tracking
Monitor your CUGA experiments with built-in analytics:
# View experiment dashboard
cuga exp
# Start dashboard for specific experiment
# Click "Start Dashboard" in the interface
Execution Modes
CUGA offers multiple execution modes optimized for different use cases:
| Mode | Speed | Accuracy | Use Case |
|---|---|---|---|
| Fast | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Development, testing |
| Accurate | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Production, critical tasks |
| Custom | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Tailored workflows |
| Save & Reuse | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Repeated workflows |
Evaluation Benchmarks
Test CUGA with industry-standard benchmarks:
- AppWorld: Real-world web application testing
- WebArena: Web automation and navigation testing
- WxO Tools: Watson Orchestrate integration testing
Configuration
Model Providers
CUGA supports multiple LLM providers:
- ETE LiteLLM (Recommended for IBM teams) - Free, high-performance
- OpenAI - Popular choice with excellent model quality
- IBM WatsonX - Enterprise-grade AI platform
- Azure OpenAI - Microsoft's managed service
Environment Setup
# Switch between providers
export AGENT_SETTING_CONFIG="settings.litellm.toml" # ETE (default)
export AGENT_SETTING_CONFIG="settings.openai.toml" # OpenAI
export AGENT_SETTING_CONFIG="settings.watsonx.toml" # WatsonX
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| Port conflicts | Check that ports 8000, 8005, and 8080 are free |
| Docker errors | Ensure Rancher Desktop is running |
| API key errors | Verify that the .env file contains the correct OPENAI_API_KEY |
| Module not found | Run uv sync to install dependencies |
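The port check in the table can be scripted. This is a minimal sketch using only the Python standard library; `port_is_free` and `busy_ports` are hypothetical helper names, and the check assumes the services listen on the local loopback interface.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports: tuple[int, ...] = (8000, 8005, 8080)) -> list[int]:
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports in use:", ", ".join(map(str, taken)))
    else:
        print("Ports 8000, 8005, and 8080 are free")
```

A port reported as in use must be released by stopping the process that holds it before starting CUGA.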
Debug Commands
# Check service status
cuga status
# View logs
cuga logs --tail
# Check configuration
cuga config show
# Run diagnostics
cuga diagnose
Testing
Run the comprehensive test suite:
# Run all tests
uv run pytest -v
# Run specific test categories
uv run pytest tests/unit/ # Unit tests
uv run pytest tests/integration/ # Integration tests
uv run pytest tests/system/ # System tests
# Run with coverage
uv run pytest --cov=cuga --cov-report=html
Performance Monitoring
Monitor CUGA performance and health:
# Real-time monitoring
cuga monitor
# Performance metrics
cuga stats
# Resource usage
cuga stats --resources
Continuous Integration
Automate testing and evaluation:
# Daily evaluation
0 2 * * * cd /path/to/cuga && uv run appworld_eval --eval_key daily_test
# Weekly comprehensive evaluation
0 3 * * 0 cd /path/to/cuga && uv run appworld_eval --eval_key weekly_full
Resources
Contributing
We welcome contributions from the IBM community:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
This project is proprietary to IBM Corporation. All rights reserved.
