# Enhanced Field Detection and Filling Workflow

## Overview

This implementation provides an advanced workflow for LiveKit agents to handle missing webpage fields, using MCP (Model Context Protocol) for automatic field detection and filling. When a field cannot be found using standard methods, the system tries multiple detection strategies in turn and, once the field has been populated, executes the specified actions.

## Key Features

### 1. Multi-Strategy Field Detection

The workflow employs five detection strategies in order of preference:

1. **Cached Fields** (Confidence: 0.9)
   - Uses pre-detected and cached field information
   - Fastest and most reliable method
   - Automatically refreshes the cache if it is empty

2. **Enhanced Detection** (Confidence: 0.8)
   - Uses intelligent selector generation based on field names
   - Supports multiple field name variations and patterns
   - Handles common field types (email, password, username, etc.)

3. **Label Analysis** (Confidence: 0.7)
   - Analyzes HTML labels and their associations with form fields
   - Supports `for` attribute relationships
   - Context-aware field matching

4. **Content Analysis** (Confidence: 0.6)
   - Analyzes page content for field-related keywords
   - Matches form elements based on proximity to keywords
   - Handles dynamic content and non-standard field naming

5. **Fallback Patterns** (Confidence: 0.3)
   - Last resort using common CSS selectors
   - Targets any visible input fields
   - Provides basic functionality when all else fails

### 2. Automatic Action Execution

After successful field filling, the workflow can execute a series of actions:

- **submit**: Submit a form (with optional form selector)
- **click**: Click on any element using a CSS selector
- **navigate**: Navigate to a new URL
- **wait**: Pause execution for a specified time
- **keyboard**: Send keyboard input (Enter, Tab, etc.)

### 3. Comprehensive Error Handling

- Detailed error reporting for each detection strategy
- Graceful fallback between strategies
- Action-level error handling with optional/required flags
- Execution time tracking and performance metrics

## Implementation Details

### Core Method: `execute_field_workflow`

```python
from typing import Optional

async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: Optional[list] = None,
    max_retries: int = 3
) -> dict:
    ...  # signature only; body omitted
```

**Parameters:**

- `field_name`: Name or identifier of the field to find
- `field_value`: Value to fill in the field
- `actions`: List of actions to execute after successful field filling
- `max_retries`: Maximum number of detection attempts

**Returns:**

A dictionary containing:

- `success`: Overall workflow success status
- `field_filled`: Whether the field was successfully filled
- `actions_executed`: List of executed actions with results
- `detection_method`: Which strategy successfully found the field
- `errors`: List of any errors encountered
- `execution_time`: Total workflow execution time
- `field_selector`: CSS selector used to fill the field

### Action Format

Actions are specified as a list of dictionaries:

```python
actions = [
    {
        "type": "submit",      # Action type
        "target": "form",      # Target selector/value (optional for submit)
        "delay": 0.5,          # Delay before action (optional)
        "required": True       # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
```

## Usage Examples

### 1. Simple Search Workflow

```python
# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)
```

### 2. Login Form Workflow

```python
# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)
```

### 3. Complex Multi-Step Workflow

```python
# Fill message field, wait, then click submit button
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)
```

## LiveKit Agent Integration

The workflow is integrated into the LiveKit agent as a function tool:

```python
from livekit.agents import RunContext, function_tool

@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = ""
):
    ...  # signature only; body omitted
```

**Usage in LiveKit Agent:**

- `field_name`: Natural language field identifier
- `field_value`: Value to fill
- `actions`: JSON string of actions to execute

**Example Agent Commands:**

```
"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
```

## Error Handling and Reliability

### Retry Mechanism

- Configurable retry attempts (default: 3)
- Progressive strategy fallback
- Intelligent delay between retries

### Error Reporting

- Strategy-level error tracking
- Action-level success/failure reporting
- Detailed error messages for debugging

### Performance Monitoring

- Execution time tracking
- Strategy performance metrics
- Confidence scoring for detection methods

## Testing

Use the provided test script to validate functionality:

```bash
python test_field_workflow.py
```

The test script includes scenarios for:

- Google search workflow
- Login form handling
- Contact form submission
- JSON action format validation

## Configuration

The workflow uses the existing MCP Chrome client configuration:

```python
chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}
```

## Benefits

1. **Robust Field Detection**: Multiple fallback strategies raise the chance of locating a field
2. **Automated Workflows**: Complete automation from field detection to action execution
3. **Error Resilience**: Comprehensive error handling and recovery mechanisms
4. **Performance Optimized**: Intelligent caching and strategy ordering
5. **Easy Integration**: Simple API that works with existing LiveKit agent infrastructure
6. **Detailed Reporting**: Comprehensive execution results for debugging and monitoring

This implementation is intended to improve the reliability of web automation tasks by combining multi-strategy field detection with automated workflow execution.
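
To illustrate the multi-strategy detection described under Key Features, here is a minimal sketch of ordered fallback with retries. It is not the project's actual code: `detect_field`, the `Detection` dataclass, the `retry_delay` parameter, and the three stub strategies are hypothetical names introduced for this example. Only the strategy names and confidence values come from this document.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# A strategy takes a field name and returns a CSS selector, or None if nothing matched.
Strategy = Callable[[str], Awaitable[Optional[str]]]


@dataclass
class Detection:
    selector: str
    method: str
    confidence: float


async def detect_field(
    field_name: str,
    strategies: list[tuple[str, float, Strategy]],
    max_retries: int = 3,
    retry_delay: float = 0.0,  # hypothetical knob; the real delay policy is not documented
) -> tuple[Optional[Detection], list[str]]:
    """Try strategies in descending confidence order, for up to max_retries passes."""
    errors: list[str] = []
    ordered = sorted(strategies, key=lambda s: s[1], reverse=True)
    for attempt in range(1, max_retries + 1):
        for name, confidence, strategy in ordered:
            try:
                selector = await strategy(field_name)
            except Exception as exc:
                # Record the failure and fall through to the next strategy.
                errors.append(f"attempt {attempt}: {name}: {exc}")
                continue
            if selector:
                return Detection(selector, name, confidence), errors
        if attempt < max_retries:
            await asyncio.sleep(retry_delay)
    return None, errors


# Hypothetical stand-ins for three of the five strategies.
async def cached_fields(field_name: str) -> Optional[str]:
    return None  # simulate an empty cache


async def enhanced_detection(field_name: str) -> Optional[str]:
    raise RuntimeError("selector generation failed")


async def label_analysis(field_name: str) -> Optional[str]:
    return f"input#{field_name}"


STRATEGIES = [
    ("label_analysis", 0.7, label_analysis),
    ("cached_fields", 0.9, cached_fields),
    ("enhanced_detection", 0.8, enhanced_detection),
]

detection, errors = asyncio.run(detect_field("email", STRATEGIES))
print(detection)
print(errors)
```

In this sketch a strategy that returns `None` is skipped silently, while one that raises is logged to `errors`. This mirrors the `detection_method` and `errors` fields of the workflow result, though the real implementation may report failures differently.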
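
Because the agent tool receives `actions` as a JSON string while the client method takes a list, some parsing and validation step has to sit between them. The following is one possible sketch of that step, not the project's implementation. `parse_actions` is a hypothetical helper, and the defaults it applies (`delay` of 0, `required` of `False`, a target being mandatory for every type except `submit`) are assumptions that this document does not confirm.

```python
import json
from typing import Any

# Action types listed under "Automatic Action Execution".
ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}
# Assumption: only submit may omit its target.
TARGET_OPTIONAL = {"submit"}


def parse_actions(raw: str) -> list[dict[str, Any]]:
    """Parse a JSON string of actions into normalized dicts, or raise ValueError."""
    if not raw.strip():
        return []  # an empty string means "no follow-up actions"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"actions is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("actions must be a JSON array")

    actions: list[dict[str, Any]] = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"action {index} must be a JSON object")
        kind = item.get("type")
        if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
            raise ValueError(f"action {index} has unknown type {kind!r}")
        if kind not in TARGET_OPTIONAL and not item.get("target"):
            raise ValueError(f"action {index} ({kind}) needs a target")
        actions.append({
            "type": kind,
            "target": item.get("target"),
            "delay": float(item.get("delay", 0)),          # assumed default
            "required": bool(item.get("required", False)),  # assumed default
        })
    return actions


parsed = parse_actions('[{"type": "keyboard", "target": "Enter"}, {"type": "submit"}]')
print(parsed)
```

Rejecting malformed input before any browser interaction keeps a bad action list from leaving a form half-filled.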