broswer-automation/agent-livekit/ENHANCED_FIELD_WORKFLOW.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00


# Enhanced Field Detection and Filling Workflow
## Overview
This workflow lets LiveKit agents find and fill webpage fields through MCP (Model Context Protocol) even when a standard lookup fails. When a field cannot be found using standard methods, the system tries a sequence of detection strategies, fills the first match, and then executes any actions specified to run after the field is populated.
## Key Features
### 1. Multi-Strategy Field Detection
The workflow employs five detection strategies in order of preference:
1. **Cached Fields** (Confidence: 0.9)
   - Uses pre-detected and cached field information
   - Fastest and most reliable method
   - Automatically refreshes cache if empty
2. **Enhanced Detection** (Confidence: 0.8)
   - Uses intelligent selector generation based on field names
   - Supports multiple field name variations and patterns
   - Handles common field types (email, password, username, etc.)
3. **Label Analysis** (Confidence: 0.7)
   - Analyzes HTML labels and their associations with form fields
   - Supports `for` attribute relationships
   - Context-aware field matching
4. **Content Analysis** (Confidence: 0.6)
   - Analyzes page content for field-related keywords
   - Matches form elements based on proximity to keywords
   - Handles dynamic content and non-standard field naming
5. **Fallback Patterns** (Confidence: 0.3)
   - Last resort using common CSS selectors
   - Targets any visible input fields
   - Provides basic functionality when all else fails
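The ordered fallback described above can be sketched as a simple cascade. This is a minimal illustration, not the project's actual code: the `Detection` dataclass, the `detect_field` helper, and the three stand-in strategy coroutines are hypothetical names chosen for this example, and only the confidence values come from the list above.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple


@dataclass
class Detection:
    selector: str      # CSS selector that matched the field
    method: str        # name of the strategy that produced it
    confidence: float  # score attached to that strategy


# Hypothetical strategy shape: take a field name, return a selector or None.
Strategy = Callable[[str], Awaitable[Optional[str]]]


async def detect_field(
    field_name: str,
    strategies: List[Tuple[str, float, Strategy]],
) -> Tuple[Optional[Detection], List[str]]:
    """Try each strategy in order and stop at the first selector found."""
    errors: List[str] = []
    for method, confidence, strategy in strategies:
        try:
            selector = await strategy(field_name)
        except Exception as exc:  # a failing strategy must not abort the cascade
            errors.append(f"{method}: {exc}")
            continue
        if selector:
            return Detection(selector, method, confidence), errors
        errors.append(f"{method}: no match")
    return None, errors


# Stand-in strategies so the sketch runs without a browser.
async def cached_fields(name: str) -> Optional[str]:
    return None  # simulate an empty cache


async def enhanced_detection(name: str) -> Optional[str]:
    raise RuntimeError("selector generation failed")


async def label_analysis(name: str) -> Optional[str]:
    return f"input#{name}"


STRATEGIES = [
    ("cached_fields", 0.9, cached_fields),
    ("enhanced_detection", 0.8, enhanced_detection),
    ("label_analysis", 0.7, label_analysis),
]

detection, errors = asyncio.run(detect_field("email", STRATEGIES))
print(detection, errors)
```

Collecting one error string per skipped strategy keeps the cascade debuggable: the caller can see both which strategy won and why the earlier ones did not.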
### 2. Automatic Action Execution
After successful field filling, the workflow can execute a series of actions:
- **submit**: Submit a form (with optional form selector)
- **click**: Click on any element using CSS selector
- **navigate**: Navigate to a new URL
- **wait**: Pause execution for specified time
- **keyboard**: Send keyboard input (Enter, Tab, etc.)
### 3. Comprehensive Error Handling
- Detailed error reporting for each detection strategy
- Graceful fallback between strategies
- Action-level error handling with optional/required flags
- Execution time tracking and performance metrics
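One way to picture action-level handling with optional and required flags is the loop below. It is a sketch under stated assumptions: `run_actions`, the handler signature, and the stub handlers are hypothetical, and treating a missing `required` flag as `False` is an assumption, not documented behavior.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical handler shape: receives the action dict, raises on failure.
ActionHandler = Callable[[dict], Awaitable[None]]


async def run_actions(
    actions: List[dict],
    handlers: Dict[str, ActionHandler],
) -> Tuple[List[dict], List[str]]:
    """Run actions in order; stop early only when a required action fails."""
    executed: List[dict] = []
    errors: List[str] = []
    for action in actions:
        kind = action.get("type")
        required = action.get("required", False)  # assumed default
        try:
            if kind not in handlers:
                raise ValueError(f"unknown action type: {kind!r}")
            await asyncio.sleep(action.get("delay", 0))  # optional pre-action delay
            await handlers[kind](action)
            executed.append({"type": kind, "success": True})
        except Exception as exc:
            executed.append({"type": kind, "success": False, "error": str(exc)})
            errors.append(f"{kind}: {exc}")
            if required:
                break  # a required action failed, so skip the remaining actions
    return executed, errors


# Stub handlers so the sketch runs without a browser.
async def click_stub(action: dict) -> None:
    raise RuntimeError("element not found")


async def keyboard_stub(action: dict) -> None:
    return None


HANDLERS = {"click": click_stub, "keyboard": keyboard_stub}

executed, errors = asyncio.run(
    run_actions(
        [{"type": "click", "target": "#go"}, {"type": "keyboard", "target": "Enter"}],
        HANDLERS,
    )
)
print(executed, errors)
```

In this run the optional click fails, is recorded, and the keyboard action still executes; marking the click as `"required": True` would end the loop after the first entry.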
## Implementation Details
### Core Method: `execute_field_workflow`
```python
async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: list = None,
    max_retries: int = 3
) -> dict:
```
**Parameters:**
- `field_name`: Name or identifier of the field to find
- `field_value`: Value to fill in the field
- `actions`: Optional list of actions to execute after successful field filling (default: `None`)
- `max_retries`: Maximum number of detection attempts (default: 3)
**Returns:**
A dictionary containing:
- `success`: Overall workflow success status
- `field_filled`: Whether the field was successfully filled
- `actions_executed`: List of executed actions with results
- `detection_method`: Which strategy successfully found the field
- `errors`: List of any errors encountered
- `execution_time`: Total workflow execution time
- `field_selector`: CSS selector used to fill the field
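A caller might consume the returned dictionary along these lines. The key names come from the list above; the value types in `WorkflowResult` and the `summarize` helper are assumptions made for illustration.

```python
from typing import List, Optional, TypedDict


class WorkflowResult(TypedDict):
    # Key names follow the documented return value; the types are assumed.
    success: bool
    field_filled: bool
    actions_executed: List[dict]
    detection_method: Optional[str]
    errors: List[str]
    execution_time: float
    field_selector: Optional[str]


def summarize(result: WorkflowResult) -> str:
    """Turn a workflow result into a one-line status message."""
    if result["success"]:
        return (
            f"filled {result['field_selector']} via {result['detection_method']} "
            f"in {result['execution_time']:.2f}s"
        )
    if result["field_filled"]:
        return "field filled, but an action failed: " + "; ".join(result["errors"])
    return "field not found: " + "; ".join(result["errors"])


example: WorkflowResult = {
    "success": True,
    "field_filled": True,
    "actions_executed": [{"type": "keyboard", "success": True}],
    "detection_method": "label_analysis",
    "errors": [],
    "execution_time": 0.4321,
    "field_selector": "input#email",
}
print(summarize(example))
```

Checking `field_filled` separately from `success` distinguishes a detection failure from a failure in a later action.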
### Action Format
Actions are specified as a list of dictionaries:
```python
actions = [
    {
        "type": "submit",   # Action type
        "target": "form",   # Target selector/value (optional for submit)
        "delay": 0.5,       # Delay before action (optional)
        "required": True    # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
```
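Before handing an action list to the workflow, it can help to check its shape. The validator below is a hypothetical helper, not part of the documented API; it assumes the five action types listed earlier are the only valid ones and that every type except `submit` needs a `target`.

```python
from typing import List

# The five action types named in this document.
ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}


def validate_actions(actions: list) -> List[str]:
    """Return a list of problems; an empty list means the actions look usable."""
    problems: List[str] = []
    for index, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {index}: expected a dict")
            continue
        kind = action.get("type")
        if kind not in ALLOWED_TYPES:
            problems.append(f"action {index}: unknown type {kind!r}")
        elif kind != "submit" and not action.get("target"):
            # Assumption: only submit may omit its target.
            problems.append(f"action {index}: {kind!r} needs a target")
    return problems


print(validate_actions([{"type": "submit"}, {"type": "keyboard", "target": "Enter"}]))
print(validate_actions([{"type": "click"}, {"type": "hover", "target": "a"}]))
```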
## Usage Examples
### 1. Simple Search Workflow
```python
# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)
```
### 2. Login Form Workflow
```python
# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)
```
### 3. Complex Multi-Step Workflow
```python
# Fill message field, wait, then click submit button
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)
```
## LiveKit Agent Integration
The workflow is integrated into the LiveKit agent as a function tool:
```python
@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = ""
):
```
**Usage in LiveKit Agent:**
- `field_name`: Natural language field identifier
- `field_value`: Value to fill
- `actions`: JSON string of actions to execute (defaults to an empty string)
**Example Agent Commands:**
```
"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
```
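Because the function tool receives `actions` as a JSON string, it has to be decoded into the list-of-dictionaries format before use. A minimal sketch of that step follows; `parse_actions` is a hypothetical helper, and treating an empty string as "no actions" and a single JSON object as a one-item list are assumptions.

```python
import json
from typing import List


def parse_actions(actions: str) -> List[dict]:
    """Decode the JSON string passed to the function tool into a list of actions."""
    if not actions.strip():
        return []  # assumed meaning of the empty-string default
    parsed = json.loads(actions)  # raises json.JSONDecodeError on malformed input
    if isinstance(parsed, dict):
        parsed = [parsed]  # tolerate a single action object
    if not isinstance(parsed, list):
        raise ValueError("actions must be a JSON array of objects")
    return parsed


print(parse_actions('[{"type": "keyboard", "target": "Enter"}]'))
print(parse_actions(""))
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and the wrong top-level type.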
## Error Handling and Reliability
### Retry Mechanism
- Configurable retry attempts (default: 3)
- Progressive strategy fallback
- Intelligent delay between retries
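A retry loop of this kind can be sketched as follows. The document does not specify how the delay between retries is chosen, so the exponential backoff here is an assumption, and `with_retries`, `base_delay`, and the `flaky_detection` stand-in are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def with_retries(
    attempt: Callable[[], Awaitable[Optional[str]]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Tuple[Optional[str], int]:
    """Call attempt() up to max_retries times; return (result, attempts_used)."""
    for attempt_number in range(1, max_retries + 1):
        try:
            result = await attempt()
            if result is not None:
                return result, attempt_number
        except Exception:
            pass  # treat an exception like a miss and try again
        if attempt_number < max_retries:
            # Assumed policy: double the wait after each failed attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt_number - 1))
    return None, max_retries


# Stand-in detection that only succeeds on its third call.
calls = {"count": 0}


async def flaky_detection() -> Optional[str]:
    calls["count"] += 1
    return "input#email" if calls["count"] >= 3 else None


selector, attempts = asyncio.run(with_retries(flaky_detection, base_delay=0.01))
print(selector, attempts)
```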
### Error Reporting
- Strategy-level error tracking
- Action-level success/failure reporting
- Detailed error messages for debugging
### Performance Monitoring
- Execution time tracking
- Strategy performance metrics
- Confidence scoring for detection methods
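Execution-time tracking and per-strategy metrics could be collected with something like the sketch below. `StrategyMetrics` and `timed` are hypothetical helpers for illustration; the document does not describe how the real implementation stores its metrics.

```python
import time
from typing import Callable, Dict, Tuple


class StrategyMetrics:
    """Accumulates attempts, successes, and elapsed time per strategy."""

    def __init__(self) -> None:
        self._stats: Dict[str, Dict[str, float]] = {}

    def record(self, strategy: str, success: bool, elapsed: float) -> None:
        stats = self._stats.setdefault(
            strategy, {"attempts": 0, "successes": 0, "total_time": 0.0}
        )
        stats["attempts"] += 1
        stats["successes"] += int(success)
        stats["total_time"] += elapsed

    def success_rate(self, strategy: str) -> float:
        stats = self._stats.get(strategy)
        return stats["successes"] / stats["attempts"] if stats else 0.0

    def average_time(self, strategy: str) -> float:
        stats = self._stats.get(strategy)
        return stats["total_time"] / stats["attempts"] if stats else 0.0


def timed(func: Callable[[], object]) -> Tuple[object, float]:
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


metrics = StrategyMetrics()
result, elapsed = timed(lambda: "input#email")  # stand-in for a detection call
metrics.record("label_analysis", success=result is not None, elapsed=elapsed)
print(metrics.success_rate("label_analysis"))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and intended for measuring short durations.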
## Testing
Use the provided test script to validate functionality:
```bash
python test_field_workflow.py
```
The test script includes scenarios for:
- Google search workflow
- Login form handling
- Contact form submission
- JSON action format validation
## Configuration
The workflow uses the existing MCP Chrome client configuration:
```python
chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}
```
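A quick sanity check on this dictionary can catch typos before the client starts. The checker below is a hypothetical helper: it assumes all four keys shown above must be present, that `mcp_server_url` is an http(s) URL, and that `mcp_server_args` is a list. None of these rules are confirmed by the client itself.

```python
from typing import List
from urllib.parse import urlparse

# Keys taken from the configuration example above.
EXPECTED_KEYS = (
    "mcp_server_type",
    "mcp_server_url",
    "mcp_server_command",
    "mcp_server_args",
)


def check_chrome_config(config: dict) -> List[str]:
    """Return a list of problems found in the config; empty means it looks fine."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in config]
    url = urlparse(str(config.get("mcp_server_url", "")))
    if url.scheme not in ("http", "https") or not url.hostname:
        problems.append("mcp_server_url must be an http(s) URL")
    if not isinstance(config.get("mcp_server_args", []), list):
        problems.append("mcp_server_args must be a list")
    return problems


sample_config = {
    "mcp_server_type": "chrome_extension",
    "mcp_server_url": "http://localhost:3000",
    "mcp_server_command": "",
    "mcp_server_args": [],
}
print(check_chrome_config(sample_config))
```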
## Benefits
1. **Robust Field Detection**: Five fallback strategies let a field still be found when the preferred method misses
2. **Automated Workflows**: Complete automation from field detection to action execution
3. **Error Resilience**: Comprehensive error handling and recovery mechanisms
4. **Performance Optimized**: Cached field data is tried first, and the remaining strategies run in descending order of confidence
5. **Easy Integration**: Simple API that works with existing LiveKit agent infrastructure
6. **Detailed Reporting**: Comprehensive execution results for debugging and monitoring

This implementation aims to make web automation tasks more reliable by combining layered field detection with automated execution of follow-up actions.