# Enhanced Field Detection and Filling Workflow

## Overview

This implementation provides a workflow that lets LiveKit agents handle webpage fields they cannot locate, using MCP (Model Context Protocol) to detect and fill them. When a field cannot be found using standard methods, the system tries several detection strategies in turn and, once the field is filled, executes any actions that were specified.

## Key Features

### 1. Multi-Strategy Field Detection
The workflow employs five detection strategies in order of preference:

1. **Cached Fields** (Confidence: 0.9)
   - Uses pre-detected and cached field information
   - Fastest and most reliable method
   - Automatically refreshes cache if empty

2. **Enhanced Detection** (Confidence: 0.8)
   - Uses intelligent selector generation based on field names
   - Supports multiple field name variations and patterns
   - Handles common field types (email, password, username, etc.)

3. **Label Analysis** (Confidence: 0.7)
   - Analyzes HTML labels and their associations with form fields
   - Supports `for` attribute relationships
   - Context-aware field matching

4. **Content Analysis** (Confidence: 0.6)
   - Analyzes page content for field-related keywords
   - Matches form elements based on proximity to keywords
   - Handles dynamic content and non-standard field naming

5. **Fallback Patterns** (Confidence: 0.3)
   - Last resort using common CSS selectors
   - Targets any visible input fields
   - Provides basic functionality when all else fails
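
The ordering above can be sketched as a simple fallback chain: try each strategy in turn and return the first selector found, tagged with that strategy's confidence. This is a minimal sketch rather than the project's actual code. The confidence values come from the list above, while `detect_field`, `STRATEGIES`, and the stand-in strategy functions are hypothetical names introduced only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A strategy takes a field name and returns a CSS selector, or None if it finds nothing.
Strategy = Callable[[str], Awaitable[Optional[str]]]


async def no_match(field_name: str) -> Optional[str]:
    """Hypothetical stand-in for a strategy that finds nothing."""
    return None


async def enhanced_detection(field_name: str) -> Optional[str]:
    """Hypothetical stand-in: only knows how to build a selector for 'email'."""
    return f"input[name='{field_name}']" if field_name == "email" else None


async def fallback_patterns(field_name: str) -> Optional[str]:
    """Hypothetical stand-in: always returns a generic input selector."""
    return "input:not([type='hidden'])"


# (method name, confidence, strategy), in order of preference.
STRATEGIES: list[tuple[str, float, Strategy]] = [
    ("cached_fields", 0.9, no_match),
    ("enhanced_detection", 0.8, enhanced_detection),
    ("label_analysis", 0.7, no_match),
    ("content_analysis", 0.6, no_match),
    ("fallback_patterns", 0.3, fallback_patterns),
]


async def detect_field(field_name: str) -> Optional[dict]:
    """Return the first match found, trying strategies in order of preference."""
    for method, confidence, strategy in STRATEGIES:
        selector = await strategy(field_name)
        if selector is not None:
            return {"selector": selector, "method": method, "confidence": confidence}
    return None
```

With these stand-ins, `asyncio.run(detect_field("email"))` is resolved by the enhanced-detection stub, while any other field name falls through to the fallback pattern with its 0.3 confidence.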

### 2. Automatic Action Execution
After successful field filling, the workflow can execute a series of actions:

- **submit**: Submit a form (with optional form selector)
- **click**: Click on any element using CSS selector
- **navigate**: Navigate to a new URL
- **wait**: Pause execution for specified time
- **keyboard**: Send keyboard input (Enter, Tab, etc.)

### 3. Comprehensive Error Handling
- Detailed error reporting for each detection strategy
- Graceful fallback between strategies
- Action-level error handling with optional/required flags
- Execution time tracking and performance metrics
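
One way to picture action-level error handling is a loop that records every failure but stops only when a failed action is marked as required. The sketch below is illustrative, not the actual implementation: `run_actions` and the `handlers` mapping are hypothetical, and treating `required` as `False` when it is absent is an assumption.

```python
import asyncio
import time


async def run_actions(actions: list, handlers: dict) -> dict:
    """Run actions in order; abort only when a required action fails.

    `handlers` maps an action type to an async callable that takes the target
    (hypothetical interface, for illustration only).
    """
    results, errors = [], []
    started = time.perf_counter()
    for action in actions:
        kind = action.get("type")
        try:
            handler = handlers[kind]  # unknown types raise KeyError, handled below
            await asyncio.sleep(action.get("delay", 0))
            await handler(action.get("target"))
            results.append({"type": kind, "success": True})
        except Exception as exc:
            results.append({"type": kind, "success": False})
            errors.append(f"{kind}: {exc!r}")
            # Assumption: actions are optional unless flagged as required.
            if action.get("required", False):
                break
    return {
        "actions_executed": results,
        "errors": errors,
        "execution_time": time.perf_counter() - started,
    }
```

A failed optional action is logged and the loop continues, so later actions still run. A failed required action ends the loop, and the actions after it never execute.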

## Implementation Details

### Core Method: `execute_field_workflow`

```python
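from typing import Optional


async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: Optional[list] = None,  # None means no follow-up actions
    max_retries: int = 3,
) -> dict:
    ...  # method body omitted; only the signature is shown here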
```

**Parameters:**
- `field_name`: Name or identifier of the field to find
- `field_value`: Value to fill in the field
- `actions`: List of actions to execute after successful field filling
- `max_retries`: Maximum number of detection attempts

**Returns:**
A dictionary containing:
- `success`: Overall workflow success status
- `field_filled`: Whether the field was successfully filled
- `actions_executed`: List of executed actions with results
- `detection_method`: Which strategy successfully found the field
- `errors`: List of any errors encountered
- `execution_time`: Total workflow execution time
- `field_selector`: CSS selector used to fill the field
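
A caller might turn the returned dictionary into a one-line status message along these lines. This is a sketch under assumptions: `summarize_result` is a hypothetical helper, each entry in `actions_executed` is assumed to carry a boolean `success` key, and `errors` is assumed to be a list of strings.

```python
def summarize_result(result: dict) -> str:
    """Build a short status line from a workflow result dictionary."""
    if result.get("success"):
        return (
            f"filled via {result.get('detection_method')} "
            f"({result.get('field_selector')}) "
            f"in {result.get('execution_time', 0.0):.2f}s"
        )
    if result.get("field_filled"):
        # The field was filled, so the failure came from a follow-up action.
        failed = [a for a in result.get("actions_executed", []) if not a.get("success")]
        return f"field filled, but {len(failed)} action(s) failed"
    return "field not filled: " + "; ".join(result.get("errors", []))
```

Checking `field_filled` separately from `success` distinguishes a detection failure from a failure in one of the follow-up actions.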

### Action Format

Actions are specified as a list of dictionaries:

```python
actions = [
    {
        "type": "submit",           # Action type
        "target": "form",           # Target selector/value (optional for submit)
        "delay": 0.5,              # Delay before action (optional)
        "required": True           # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
```
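
Before handing a list like this to the workflow, it can help to validate it and fill in defaults. The following is a sketch only: `normalize_actions` is a hypothetical helper, the allowed types are the five listed under Key Features, and the defaults (`delay` of 0, `required` of `False`) are assumptions, not documented behavior.

```python
from typing import Optional

ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}
TARGET_OPTIONAL = {"submit"}  # the target is documented as optional for submit


def normalize_actions(actions: Optional[list]) -> list:
    """Validate action dictionaries and fill in assumed defaults."""
    normalized = []
    for index, action in enumerate(actions or []):
        kind = action.get("type")
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"action {index}: unknown type {kind!r}")
        if kind not in TARGET_OPTIONAL and not action.get("target"):
            raise ValueError(f"action {index}: {kind!r} needs a target")
        normalized.append({
            "type": kind,
            "target": action.get("target", ""),
            "delay": float(action.get("delay", 0.0)),       # assumed default
            "required": bool(action.get("required", False)),  # assumed default
        })
    return normalized
```

Rejecting malformed actions up front means a typo in an action type surfaces before the field is filled rather than partway through the workflow.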

## Usage Examples

### 1. Simple Search Workflow

```python
# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)
```

### 2. Login Form Workflow

```python
# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)
```

### 3. Complex Multi-Step Workflow

```python
# Fill message field, wait, then click submit button
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)
```

## LiveKit Agent Integration

The workflow is integrated into the LiveKit agent as a function tool:

```python
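from livekit.agents import RunContext, function_tool


@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = "",  # JSON-encoded list of actions; empty means none
):
    ...  # tool body omitted; only the signature is shown here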
```

**Usage in LiveKit Agent:**
- `field_name`: Natural language field identifier
- `field_value`: Value to fill
- `actions`: JSON string of actions to execute

**Example Agent Commands:**
```
"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
```
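
Because the tool receives `actions` as a JSON string, it has to be decoded into a list of dictionaries before the workflow can use it. A minimal sketch of that step follows. `parse_actions` is a hypothetical helper, and wrapping a single JSON object into a one-element list is an assumption made for convenience, not documented behavior.

```python
import json


def parse_actions(actions: str) -> list:
    """Decode the JSON `actions` string into a list of action dictionaries."""
    if not actions.strip():
        return []  # an empty string means "no follow-up actions"
    parsed = json.loads(actions)  # raises json.JSONDecodeError on malformed input
    if isinstance(parsed, dict):
        parsed = [parsed]  # assumption: tolerate a single object
    if not isinstance(parsed, list) or not all(isinstance(a, dict) for a in parsed):
        raise ValueError("actions must be a JSON array of objects")
    return parsed
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and a wrongly shaped payload.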

## Error Handling and Reliability

### Retry Mechanism
- Configurable retry attempts (default: 3)
- Progressive strategy fallback
- Intelligent delay between retries
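
The retry loop could look roughly like the sketch below. The document does not say how the delay between retries is computed, so the exponential backoff here is an assumption. `detect_with_retries`, `base_delay`, and the `detect` callable are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def detect_with_retries(
    detect: Callable[[str], Awaitable[Optional[str]]],
    field_name: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> tuple:
    """Call `detect` up to `max_retries` times, backing off between attempts.

    Returns (selector_or_None, list_of_error_messages).
    """
    errors = []
    for attempt in range(1, max_retries + 1):
        try:
            found = await detect(field_name)
            if found is not None:
                return found, errors
            errors.append(f"attempt {attempt}: field not found")
        except Exception as exc:
            errors.append(f"attempt {attempt}: {exc!r}")
        if attempt < max_retries:
            # Assumed policy: double the wait after each failed attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return None, errors
```

Returning the accumulated errors alongside the result keeps earlier failed attempts visible even when a later attempt succeeds.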

### Error Reporting
- Strategy-level error tracking
- Action-level success/failure reporting
- Detailed error messages for debugging

### Performance Monitoring
- Execution time tracking
- Strategy performance metrics
- Confidence scoring for detection methods

## Testing

Use the provided test script to validate functionality:

```bash
python test_field_workflow.py
```

The test script includes scenarios for:
- Google search workflow
- Login form handling
- Contact form submission
- JSON action format validation

## Configuration

The workflow uses the existing MCP Chrome client configuration:

```python
chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}
```

## Benefits

1. **Robust Field Detection**: Multiple fallback strategies raise the chance of finding a field when the standard lookup fails
2. **Automated Workflows**: Complete automation from field detection to action execution
3. **Error Resilience**: Comprehensive error handling and recovery mechanisms
4. **Performance-Optimized**: Cached fields are tried first, and the remaining strategies run in order of confidence
5. **Easy Integration**: Simple API that works with existing LiveKit agent infrastructure
6. **Detailed Reporting**: Comprehensive execution results for debugging and monitoring

This implementation aims to make web automation tasks more reliable by combining layered field detection with automated execution of follow-up actions.