# Enhanced Field Detection and Filling Workflow

## Overview

This implementation provides a workflow that lets LiveKit agents handle missing webpage fields, using MCP (Model Context Protocol) for automatic field detection and filling. When a field cannot be found using standard methods, the system tries multiple detection strategies in turn and, once the field has been filled, executes the specified follow-up actions.

## Key Features

### 1. Multi-Strategy Field Detection

The workflow employs five detection strategies in order of preference:

1. **Cached Fields** (Confidence: 0.9)
   - Uses pre-detected and cached field information
   - Fastest and most reliable method
   - Automatically refreshes cache if empty

2. **Enhanced Detection** (Confidence: 0.8)
   - Uses intelligent selector generation based on field names
   - Supports multiple field name variations and patterns
   - Handles common field types (email, password, username, etc.)

3. **Label Analysis** (Confidence: 0.7)
   - Analyzes HTML labels and their associations with form fields
   - Supports `for` attribute relationships
   - Context-aware field matching

4. **Content Analysis** (Confidence: 0.6)
   - Analyzes page content for field-related keywords
   - Matches form elements based on proximity to keywords
   - Handles dynamic content and non-standard field naming

5. **Fallback Patterns** (Confidence: 0.3)
   - Last resort using common CSS selectors
   - Targets any visible input fields
   - Provides basic functionality when all else fails

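The ordered fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `Strategy`, `find_field`, and the detector callables are hypothetical names, and each detector is assumed to return a CSS selector string or `None`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical detector signature: field name in, CSS selector (or None) out.
Detector = Callable[[str], Awaitable[Optional[str]]]


@dataclass
class Strategy:
    name: str
    confidence: float
    detect: Detector


async def find_field(
    field_name: str, strategies: list[Strategy]
) -> tuple[Optional[dict], list[str]]:
    """Try strategies from highest to lowest confidence; collect per-strategy errors."""
    errors: list[str] = []
    for strategy in sorted(strategies, key=lambda s: s.confidence, reverse=True):
        try:
            selector = await strategy.detect(field_name)
        except Exception as exc:
            # A failing strategy is recorded and the next one is tried.
            errors.append(f"{strategy.name}: {exc}")
            continue
        if selector:
            match = {
                "selector": selector,
                "method": strategy.name,
                "confidence": strategy.confidence,
            }
            return match, errors
        errors.append(f"{strategy.name}: no match")
    return None, errors
```

Sorting by confidence keeps the cheap, reliable strategies first, and the collected error list mirrors the strategy-level error reporting the workflow describes.
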
### 2. Automatic Action Execution

After successful field filling, the workflow can execute a series of actions:

- **submit**: Submit a form (with optional form selector)
- **click**: Click on any element using CSS selector
- **navigate**: Navigate to a new URL
- **wait**: Pause execution for specified time
- **keyboard**: Send keyboard input (Enter, Tab, etc.)

### 3. Comprehensive Error Handling

- Detailed error reporting for each detection strategy
- Graceful fallback between strategies
- Action-level error handling with optional/required flags
- Execution time tracking and performance metrics

## Implementation Details

### Core Method: `execute_field_workflow`

```python
async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: list = None,
    max_retries: int = 3
) -> dict:
    ...  # signature only; body omitted
```

**Parameters:**

- `field_name`: Name or identifier of the field to find
- `field_value`: Value to fill in the field
- `actions`: List of actions to execute after successful field filling
- `max_retries`: Maximum number of detection attempts (default: 3)

**Returns:**

A dictionary containing:

- `success`: Overall workflow success status
- `field_filled`: Whether the field was successfully filled
- `actions_executed`: List of executed actions with results
- `detection_method`: Which strategy successfully found the field
- `errors`: List of any errors encountered
- `execution_time`: Total workflow execution time
- `field_selector`: CSS selector used to fill the field

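A caller might condense the returned dictionary into a log line along these lines. This is a sketch only: `summarize_result` is a hypothetical helper, the sample values are invented, and it assumes `execution_time` is reported in seconds and that `errors` holds printable items.

```python
def summarize_result(result: dict) -> str:
    """Build a one-line summary from the documented result keys."""
    if result.get("success"):
        actions = result.get("actions_executed", [])
        return (
            f"filled {result.get('field_selector')} via "
            f"{result.get('detection_method')} in "
            f"{result.get('execution_time', 0):.2f}s, "  # assumes seconds
            f"{len(actions)} action(s) run"
        )
    errors = result.get("errors") or ["unknown error"]
    return "workflow failed: " + "; ".join(str(e) for e in errors)


# Sample values are invented for illustration only.
sample = {
    "success": True,
    "field_filled": True,
    "actions_executed": [{"type": "keyboard", "success": True}],
    "detection_method": "label_analysis",
    "errors": [],
    "execution_time": 1.25,
    "field_selector": "input#search",
}
print(summarize_result(sample))
```
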
### Action Format

Actions are specified as a list of dictionaries:

```python
actions = [
    {
        "type": "submit",     # Action type
        "target": "form",     # Target selector/value (optional for submit)
        "delay": 0.5,         # Delay before action (optional)
        "required": True      # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
```

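To make the `delay` and `required` semantics concrete, here is one way an action list in this format could be processed. It is a sketch under stated assumptions: `run_actions` and the `handlers` mapping are hypothetical, `delay` is taken to be in seconds, and `required` is assumed to default to `False` when omitted; the real workflow may differ on each point.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical handler signature: receives the action's target string.
Handler = Callable[[str], Awaitable[None]]


async def run_actions(actions: list[dict], handlers: dict[str, Handler]) -> list[dict]:
    """Run actions in order; stop early only when a required action fails."""
    results: list[dict] = []
    for action in actions:
        kind = action.get("type", "")
        target = str(action.get("target", ""))
        delay = float(action.get("delay", 0))
        required = bool(action.get("required", False))  # assumed default

        if delay > 0:
            await asyncio.sleep(delay)  # pause before the action runs

        try:
            handler = handlers.get(kind)
            if handler is None:
                raise ValueError(f"unsupported action type: {kind!r}")
            await handler(target)
            results.append({"type": kind, "success": True, "error": None})
        except Exception as exc:
            results.append({"type": kind, "success": False, "error": str(exc)})
            if required:
                break  # a failed required action halts the remaining actions
    return results
```

Optional actions that fail are recorded and skipped, so the returned list doubles as the per-action report.
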
## Usage Examples

### 1. Simple Search Workflow

```python
# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)
```

### 2. Login Form Workflow

```python
# Fill email field, wait one second, then submit the login form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)
```

### 3. Complex Multi-Step Workflow

```python
# Fill message field, wait, click the submit button, wait again,
# then navigate to the success page
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)
```

## LiveKit Agent Integration

The workflow is integrated into the LiveKit agent as a function tool:

```python
from livekit.agents import RunContext, function_tool


@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = ""
):
    ...  # signature only; body omitted
```

**Usage in LiveKit Agent:**

- `field_name`: Natural language field identifier
- `field_value`: Value to fill
- `actions`: JSON string of actions to execute

**Example Agent Commands:**

```
"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
```

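Because the tool receives `actions` as a JSON string, it has to be decoded and checked before use. The sketch below shows one plausible approach; `parse_actions` and `ALLOWED_TYPES` are hypothetical names, and treating an empty string as "no actions" is an assumption based on the `""` default in the signature above.

```python
import json

# The five action types listed in this document.
ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}


def parse_actions(actions: str) -> list[dict]:
    """Decode a JSON action list; raise ValueError on malformed input."""
    if not actions.strip():
        return []  # assumed: empty string means no follow-up actions
    parsed = json.loads(actions)  # JSONDecodeError is a ValueError subclass
    if not isinstance(parsed, list):
        raise ValueError("actions must be a JSON array")
    for item in parsed:
        if not isinstance(item, dict) or item.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"invalid action: {item!r}")
    return parsed
```

Rejecting unknown action types up front means a malformed tool call fails before any field is touched.
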
## Error Handling and Reliability

### Retry Mechanism

- Configurable retry attempts (default: 3)
- Progressive strategy fallback
- Intelligent delay between retries

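The retry behaviour might look roughly like the sketch below. The document does not say how the delay between retries is chosen, so the exponential backoff here is purely an assumption, and `detect_with_retries`, `detect`, and `base_delay` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def detect_with_retries(
    detect: Callable[[], Awaitable[Optional[str]]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Call `detect` up to `max_retries` times, backing off between attempts."""
    for attempt in range(max_retries):
        selector = await detect()
        if selector is not None:
            return selector
        if attempt < max_retries - 1:
            # Assumed policy: the delay doubles each time (0.5s, 1s, 2s, ...).
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None
```

A growing delay gives dynamically rendered pages more time to finish loading before the next attempt.
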
### Error Reporting

- Strategy-level error tracking
- Action-level success/failure reporting
- Detailed error messages for debugging

### Performance Monitoring

- Execution time tracking
- Strategy performance metrics
- Confidence scoring for detection methods

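Execution time tracking of this kind can be done with a small context manager. This is an illustrative sketch, not the workflow's actual instrumentation: `timed` and the `metrics` dictionary are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(metrics: dict, key: str):
    """Add the elapsed wall-clock seconds of the wrapped block to metrics[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate so repeated runs of the same strategy add up.
        metrics[key] = metrics.get(key, 0.0) + (time.perf_counter() - start)


metrics: dict = {}
with timed(metrics, "label_analysis"):
    time.sleep(0.01)  # stand-in for a detection strategy
```

The `finally` clause records the time even when the wrapped strategy raises, so failed attempts still show up in the metrics.
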
## Testing

Use the provided test script to validate functionality:

```bash
python test_field_workflow.py
```

The test script includes scenarios for:

- Google search workflow
- Login form handling
- Contact form submission
- JSON action format validation

## Configuration

The workflow uses the existing MCP Chrome client configuration:

```python
chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}
```

## Benefits

1. **Robust Field Detection**: Multiple fallback strategies raise the chance that a field is found
2. **Automated Workflows**: Complete automation from field detection to action execution
3. **Error Resilience**: Comprehensive error handling and recovery mechanisms
4. **Performance Optimized**: Caching and confidence-based strategy ordering
5. **Easy Integration**: Simple API that works with existing LiveKit agent infrastructure
6. **Detailed Reporting**: Comprehensive execution results for debugging and monitoring

By combining multi-strategy field detection with automated follow-up actions, this implementation is designed to make web automation tasks more reliable.