Enhanced Field Detection and Filling Workflow
Overview
This implementation provides an advanced workflow for LiveKit agents to handle missing webpage fields using MCP (Model Context Protocol) for automatic field detection and filling. When a field cannot be found using standard methods, the system automatically employs multiple detection strategies and executes specified actions after successful field population.
Key Features
1. Multi-Strategy Field Detection
The workflow employs five detection strategies in order of preference:
- Cached Fields (Confidence: 0.9)
  - Uses pre-detected and cached field information
  - Fastest and most reliable method
  - Automatically refreshes cache if empty
- Enhanced Detection (Confidence: 0.8)
  - Uses intelligent selector generation based on field names
  - Supports multiple field name variations and patterns
  - Handles common field types (email, password, username, etc.)
- Label Analysis (Confidence: 0.7)
  - Analyzes HTML labels and their associations with form fields
  - Supports "for" attribute relationships
  - Context-aware field matching
- Content Analysis (Confidence: 0.6)
  - Analyzes page content for field-related keywords
  - Matches form elements based on proximity to keywords
  - Handles dynamic content and non-standard field naming
- Fallback Patterns (Confidence: 0.3)
  - Last resort using common CSS selectors
  - Targets any visible input fields
  - Provides basic functionality when all else fails
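The ordered fallback described above can be sketched as a loop over (name, confidence, strategy) entries that stops at the first strategy returning a selector. This is a minimal illustration, not the project's implementation: `Detection`, `detect_field`, the two stub strategies, and the strategy names are all hypothetical; only the confidence values come from the list above.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple


@dataclass
class Detection:
    selector: str      # CSS selector that located the field
    method: str        # name of the strategy that produced it
    confidence: float  # confidence score attached to that strategy


# A strategy takes a field name and returns a CSS selector, or None on a miss.
Strategy = Callable[[str], Awaitable[Optional[str]]]


async def detect_field(
    field_name: str,
    strategies: List[Tuple[str, float, Strategy]],
) -> Tuple[Optional[Detection], List[str]]:
    """Try each strategy in order and return the first hit plus per-strategy errors."""
    errors: List[str] = []
    for name, confidence, strategy in strategies:
        try:
            selector = await strategy(field_name)
        except Exception as exc:
            # A failing strategy is recorded and the next one is tried.
            errors.append(f"{name}: {exc}")
            continue
        if selector:
            return Detection(selector, name, confidence), errors
        errors.append(f"{name}: no match")
    return None, errors


# Hypothetical stand-ins for two of the five strategies.
async def cached_fields(field_name: str) -> Optional[str]:
    return None  # simulate an empty cache


async def enhanced_detection(field_name: str) -> Optional[str]:
    return f"input[name='{field_name}']"


STRATEGIES = [
    ("cached_fields", 0.9, cached_fields),
    ("enhanced_detection", 0.8, enhanced_detection),
]
```

Calling `asyncio.run(detect_field("email", STRATEGIES))` with these stubs falls through the empty cache and returns the selector produced by the second strategy, along with a one-entry error list for the miss.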
2. Automatic Action Execution
After successful field filling, the workflow can execute a series of actions:
- submit: Submit a form (with optional form selector)
- click: Click on any element using CSS selector
- navigate: Navigate to a new URL
- wait: Pause execution for specified time
- keyboard: Send keyboard input (Enter, Tab, etc.)
3. Comprehensive Error Handling
- Detailed error reporting for each detection strategy
- Graceful fallback between strategies
- Action-level error handling with optional/required flags
- Execution time tracking and performance metrics
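One way to read the optional/required flag is sketched below: every action is attempted in order, failures are recorded, and only a failure of an action marked required stops the remaining actions. The names `run_actions` and `demo_perform` are hypothetical, and the default of `required=False` is an assumption rather than documented behavior.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Action = Dict[str, object]


async def run_actions(
    actions: List[Action],
    perform: Callable[[Action], Awaitable[None]],
) -> Tuple[List[dict], List[str]]:
    """Run actions in order; stop early only when a required action fails."""
    results: List[dict] = []
    errors: List[str] = []
    for action in actions:
        delay = float(action.get("delay", 0))
        if delay > 0:
            await asyncio.sleep(delay)  # optional pause before the action
        try:
            await perform(action)
            results.append({"type": action["type"], "success": True})
        except Exception as exc:
            results.append({"type": action["type"], "success": False, "error": str(exc)})
            errors.append(f"{action['type']}: {exc}")
            if action.get("required", False):  # assumed default: not required
                break
    return results, errors


# Hypothetical executor used only for demonstration: every click fails.
async def demo_perform(action: Action) -> None:
    if action["type"] == "click":
        raise RuntimeError("element not found")
```

With `demo_perform`, an optional failing click is followed by the next action, while the same click marked `"required": True` ends the run after one result.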
Implementation Details
Core Method: execute_field_workflow
async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: list | None = None,
    max_retries: int = 3
) -> dict:
    ...  # body omitted; only the signature is shown here
Parameters:
- field_name: Name or identifier of the field to find
- field_value: Value to fill in the field
- actions: List of actions to execute after successful field filling
- max_retries: Maximum number of detection attempts
Returns: A dictionary containing:
- success: Overall workflow success status
- field_filled: Whether the field was successfully filled
- actions_executed: List of executed actions with results
- detection_method: Which strategy successfully found the field
- errors: List of any errors encountered
- execution_time: Total workflow execution time
- field_selector: CSS selector used to fill the field
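For callers, it can help to treat the returned dictionary as a typed structure. The sketch below mirrors the documented keys in a `TypedDict` and derives a one-line summary. The value types, the `summarize_result` helper, and the sample `detection_method` value are assumptions made for illustration.

```python
from typing import List, Optional, TypedDict


class WorkflowResult(TypedDict):
    success: bool
    field_filled: bool
    actions_executed: List[dict]
    detection_method: Optional[str]  # assumed to be None when no strategy matched
    errors: List[str]
    execution_time: float            # assumed to be seconds
    field_selector: Optional[str]


def summarize_result(result: WorkflowResult) -> str:
    """Turn a workflow result into a short status line for logs."""
    if not result["field_filled"]:
        return "field not filled: " + "; ".join(result["errors"])
    status = "ok" if result["success"] else "filled, but an action failed"
    return f"{status} via {result['detection_method']} ({result['field_selector']})"
```

Checking `field_filled` before `success` separates a detection failure from a failure in one of the follow-up actions.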
Action Format
Actions are specified as a list of dictionaries:
actions = [
    {
        "type": "submit",    # Action type
        "target": "form",    # Target selector/value (optional for submit)
        "delay": 0.5,        # Delay before action (optional)
        "required": True     # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
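Because actions are plain dictionaries, it may be worth validating them before the workflow runs. The sketch below checks the action type against the five documented types and fills in the optional keys. The helper name `normalize_action` and the defaults (`delay` of 0.0, `required` of False) are assumptions, not documented behavior.

```python
ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}


def normalize_action(action: dict) -> dict:
    """Validate one action dict and fill in assumed defaults for optional keys."""
    action_type = action.get("type")
    if action_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    target = action.get("target")
    # Per the format above, the target is optional only for submit.
    if target is None and action_type != "submit":
        raise ValueError(f"action {action_type!r} needs a target")
    return {
        "type": action_type,
        "target": target,
        "delay": float(action.get("delay", 0.0)),       # assumed default
        "required": bool(action.get("required", False)),  # assumed default
    }
```

Rejecting malformed actions up front keeps a typo such as a misspelled type from surfacing only after the field has already been filled.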
Usage Examples
1. Simple Search Workflow
# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)
2. Login Form Workflow
# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)
3. Complex Multi-Step Workflow
# Fill message field, wait, then click submit button
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)
LiveKit Agent Integration
The workflow is integrated into the LiveKit agent as a function tool:
@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = ""
):
    ...  # body omitted; only the signature is shown here
Usage in LiveKit Agent:
- field_name: Natural language field identifier
- field_value: Value to fill
- actions: JSON string of actions to execute
Example Agent Commands:
"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')
"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
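Since the agent-facing tool receives actions as a JSON string (empty by default), the string has to be decoded before it reaches the workflow. A minimal sketch of that step follows. The helper name `parse_actions` is hypothetical, and treating an empty string as "no actions" is an assumption based on the default value in the signature.

```python
import json
from typing import List


def parse_actions(actions: str) -> List[dict]:
    """Decode the JSON actions string passed to the function tool."""
    if not actions.strip():
        return []  # assumed: an empty string means no follow-up actions
    parsed = json.loads(actions)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(parsed, list) or not all(isinstance(a, dict) for a in parsed):
        raise ValueError("actions must be a JSON array of objects")
    return parsed
```

For example, `parse_actions('[{"type": "keyboard", "target": "Enter"}]')` yields a one-element list that can be passed on as the `actions` argument.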
Error Handling and Reliability
Retry Mechanism
- Configurable retry attempts (default: 3)
- Progressive strategy fallback
- Intelligent delay between retries
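The retry behavior can be pictured with a small helper like the one below. How the delay between retries is actually chosen is not specified here, so exponential backoff is used purely as an assumption. `retry_async`, `base_delay`, and the doubling schedule are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call operation up to max_retries times, doubling the delay after each failure."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error: Exception = RuntimeError("operation was never attempted")
    for attempt in range(max_retries):
        try:
            return await operation()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries - 1:
                # Assumed schedule: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_error
```

The final failure is re-raised unchanged, so the caller can record it in the workflow's error list.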
Error Reporting
- Strategy-level error tracking
- Action-level success/failure reporting
- Detailed error messages for debugging
Performance Monitoring
- Execution time tracking
- Strategy performance metrics
- Confidence scoring for detection methods
Testing
Use the provided test script to validate functionality:
python test_field_workflow.py
The test script includes scenarios for:
- Google search workflow
- Login form handling
- Contact form submission
- JSON action format validation
Configuration
The workflow uses the existing MCP Chrome client configuration:
chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}
Benefits
- Robust Field Detection: Multiple fallback strategies ensure high success rates
- Automated Workflows: Complete automation from field detection to action execution
- Error Resilience: Comprehensive error handling and recovery mechanisms
- Performance Optimized: Intelligent caching and strategy ordering
- Easy Integration: Simple API that works with existing LiveKit agent infrastructure
- Detailed Reporting: Comprehensive execution results for debugging and monitoring
By combining layered field detection with automated post-fill actions, this implementation is intended to make web automation tasks more reliable when standard field lookup fails.