broswer-automation/agent-livekit/ENHANCED_FIELD_WORKFLOW.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00

Enhanced Field Detection and Filling Workflow

Overview

This implementation gives LiveKit agents a workflow for handling webpage fields that standard lookups fail to find. It uses MCP (Model Context Protocol) to detect and fill fields automatically: when a field cannot be found using standard methods, the system tries several detection strategies in turn and, once the field has been filled, executes any actions specified for it.

Key Features

1. Multi-Strategy Field Detection

The workflow employs five detection strategies in order of preference:

  1. Cached Fields (Confidence: 0.9)

    • Uses pre-detected and cached field information
    • Fastest and most reliable method
    • Automatically refreshes cache if empty
  2. Enhanced Detection (Confidence: 0.8)

    • Uses intelligent selector generation based on field names
    • Supports multiple field name variations and patterns
    • Handles common field types (email, password, username, etc.)
  3. Label Analysis (Confidence: 0.7)

    • Analyzes HTML labels and their associations with form fields
    • Supports the "for" attribute that links a label to its input
    • Context-aware field matching
  4. Content Analysis (Confidence: 0.6)

    • Analyzes page content for field-related keywords
    • Matches form elements based on proximity to keywords
    • Handles dynamic content and non-standard field naming
  5. Fallback Patterns (Confidence: 0.3)

    • Last resort using common CSS selectors
    • Targets any visible input fields
    • Provides basic functionality when all else fails
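
The ordered fallback described above can be pictured as a simple chain: try each strategy in turn and stop at the first one that yields a selector. The following is only a minimal sketch under stated assumptions. The helper detect_field, the toy strategy functions, and the selectors they return are hypothetical stand-ins, not the project's actual code; only the strategy names and confidence values come from the list above.

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape: a strategy takes a field name and returns a CSS selector or None.
Strategy = Callable[[str], Optional[str]]


def detect_field(
    field_name: str, strategies: List[Tuple[str, float, Strategy]]
) -> Optional[dict]:
    """Try each (name, confidence, strategy) in order and return the first hit."""
    for name, confidence, strategy in strategies:
        selector = strategy(field_name)
        if selector:
            return {
                "detection_method": name,
                "confidence": confidence,
                "field_selector": selector,
            }
    return None  # every strategy missed


# Toy stand-ins for three of the five strategies (hypothetical behaviour).
def cached_fields(field_name: str) -> Optional[str]:
    return None  # simulate an empty cache


def enhanced_detection(field_name: str) -> Optional[str]:
    known = {"email": "input[type='email']"}
    return known.get(field_name)


def fallback_patterns(field_name: str) -> Optional[str]:
    return "input:not([type='hidden'])"  # any non-hidden input


STRATEGIES = [
    ("cached_fields", 0.9, cached_fields),
    ("enhanced_detection", 0.8, enhanced_detection),
    ("fallback_patterns", 0.3, fallback_patterns),
]
```

With these stand-ins, detect_field("email", STRATEGIES) is resolved by the enhanced detection step, while an unknown name such as "phone" falls through to the low-confidence fallback pattern.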

2. Automatic Action Execution

After successful field filling, the workflow can execute a series of actions:

  • submit: Submit a form (with optional form selector)
  • click: Click on any element using CSS selector
  • navigate: Navigate to a new URL
  • wait: Pause execution for specified time
  • keyboard: Send keyboard input (Enter, Tab, etc.)

3. Comprehensive Error Handling

  • Detailed error reporting for each detection strategy
  • Graceful fallback between strategies
  • Action-level error handling with optional/required flags
  • Execution time tracking and performance metrics
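
To illustrate how the optional/required flag might interact with action execution, here is a hedged sketch. The function run_actions and the handlers mapping are hypothetical, and treating a missing "required" key as False is an assumption, since the default is not stated in this document.

```python
import asyncio


async def run_actions(actions, handlers):
    """Run actions in order; stop early when a required action fails.

    `handlers` maps an action type to an async callable taking the target.
    Both the mapping and this function are illustrative assumptions.
    """
    results, errors = [], []
    for action in actions:
        kind = action.get("type")
        delay = action.get("delay", 0)
        if delay:
            await asyncio.sleep(delay)  # optional pause before the action
        try:
            if kind not in handlers:
                raise ValueError(f"unsupported action type: {kind!r}")
            await handlers[kind](action.get("target"))
            results.append({"type": kind, "success": True})
        except Exception as exc:
            results.append({"type": kind, "success": False})
            errors.append(f"{kind}: {exc}")
            # Assumption: actions are optional unless marked required.
            if action.get("required", False):
                break
    return {"actions_executed": results, "errors": errors}
```

A failed optional action is recorded and the loop continues; a failed required action ends the loop, so later actions never run.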

Implementation Details

Core Method: execute_field_workflow

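from typing import Optional

async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: Optional[list] = None,  # None means "no follow-up actions"
    max_retries: int = 3,
) -> dict:
    ...  # method body omitted; only the signature is shown here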

Parameters:

  • field_name: Name or identifier of the field to find
  • field_value: Value to fill in the field
  • actions: List of actions to execute after successful field filling
  • max_retries: Maximum number of detection attempts

Returns: A dictionary containing:

  • success: Overall workflow success status
  • field_filled: Whether the field was successfully filled
  • actions_executed: List of executed actions with results
  • detection_method: Which strategy successfully found the field
  • errors: List of any errors encountered
  • execution_time: Total workflow execution time
  • field_selector: CSS selector used to fill the field
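
A caller might consume the returned dictionary along these lines. This is a sketch: the key names are taken from the list above, but the value types in WorkflowResult and the summarize helper are assumptions made for illustration.

```python
from typing import List, Optional, TypedDict


class WorkflowResult(TypedDict):
    # Key names follow the documented return value; the types are assumed.
    success: bool
    field_filled: bool
    actions_executed: List[dict]
    detection_method: Optional[str]
    errors: List[str]
    execution_time: float
    field_selector: Optional[str]


def summarize(result: WorkflowResult) -> str:
    """Build a one-line, human-readable summary of a workflow result."""
    if not result["field_filled"]:
        return "field not found: " + "; ".join(result["errors"])
    status = "ok" if result["success"] else "filled, but an action failed"
    return (
        f"{status} via {result['detection_method']} "
        f"in {result['execution_time']:.2f}s"
    )
```

Checking field_filled before success separates "the field was never located" from "the field was filled but a later action failed", which usually call for different handling.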

Action Format

Actions are specified as a list of dictionaries:

actions = [
    {
        "type": "submit",           # Action type
        "target": "form",           # Target selector/value (optional for submit)
        "delay": 0.5,              # Delay before action (optional)
        "required": True           # Whether action failure should stop workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True
    },
    {
        "type": "keyboard",
        "target": "Enter"
    }
]
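
Before handing a list like this to the workflow, it can help to check it against the documented action types. The validator below is a hypothetical helper, not part of the described API. It assumes that only submit may omit its target and that a wait target is a number of seconds, which the examples in this document suggest but do not state.

```python
ALLOWED_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}
TARGET_OPTIONAL = {"submit"}  # assumption: every other type needs a target


def validate_actions(actions):
    """Return a list of problems found; an empty list means the actions look valid."""
    problems = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in ALLOWED_TYPES:
            problems.append(f"action {i}: unknown type {kind!r}")
            continue
        target = action.get("target")
        if kind not in TARGET_OPTIONAL and not target:
            problems.append(f"action {i}: {kind} needs a target")
        elif kind == "wait":
            try:
                float(target)  # assumed to be seconds, e.g. "0.5"
            except (TypeError, ValueError):
                problems.append(f"action {i}: wait target must be a number")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every malformed action in a single pass.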

Usage Examples

1. Simple Search Workflow

# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)

2. Login Form Workflow

# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)

3. Complex Multi-Step Workflow

# Fill message field, wait, then click submit button
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)

LiveKit Agent Integration

The workflow is integrated into the LiveKit agent as a function tool:

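@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = "",  # JSON-encoded list of actions; empty means none
):
    ...  # function body omitted; only the signature is shown here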

Usage in LiveKit Agent:

  • field_name: Natural language field identifier
  • field_value: Value to fill
  • actions: JSON string of actions to execute

Example Agent Commands:

"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')
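
Because the tool receives actions as a JSON string, that string has to be decoded into a list of dictionaries at some point. The sketch below shows one way this could be done; parse_actions is a hypothetical helper, and accepting a single JSON object as a one-item list is a convenience assumed here rather than documented behaviour.

```python
import json


def parse_actions(actions: str) -> list:
    """Decode the tool's `actions` string into a list of action dicts."""
    if not actions.strip():
        return []  # empty string: no follow-up actions
    parsed = json.loads(actions)  # raises ValueError on malformed JSON
    if isinstance(parsed, dict):
        parsed = [parsed]  # assumed convenience: wrap a lone object
    if not isinstance(parsed, list) or not all(isinstance(a, dict) for a in parsed):
        raise ValueError("actions must be a JSON array of objects")
    return parsed
```

For example, parse_actions('[{"type": "keyboard", "target": "Enter"}]') yields a one-element list, and an empty string yields an empty list.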

Error Handling and Reliability

Retry Mechanism

  • Configurable retry attempts (default: 3)
  • Progressive strategy fallback
  • Intelligent delay between retries
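
The document does not specify how the delay between retries is chosen. One common option is exponential backoff, sketched below as an assumption. The names retry_detection, attempt, and base_delay are hypothetical; only the default of three retries comes from the list above.

```python
import asyncio


async def retry_detection(attempt, max_retries=3, base_delay=0.5):
    """Call `attempt` up to max_retries times with exponential backoff.

    `attempt` is an async callable returning a selector or None.
    Returns (selector_or_None, list_of_error_messages).
    """
    errors = []
    for n in range(max_retries):
        try:
            result = await attempt()
            if result is not None:
                return result, errors
            errors.append(f"attempt {n + 1}: field not found")
        except Exception as exc:
            errors.append(f"attempt {n + 1}: {exc}")
        if n < max_retries - 1:
            # Wait base_delay, then 2x, then 4x ... before the next try.
            await asyncio.sleep(base_delay * 2 ** n)
    return None, errors
```

Backoff gives slowly rendering pages more time on each attempt without adding any delay to the common case where the first attempt succeeds.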

Error Reporting

  • Strategy-level error tracking
  • Action-level success/failure reporting
  • Detailed error messages for debugging

Performance Monitoring

  • Execution time tracking
  • Strategy performance metrics
  • Confidence scoring for detection methods
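
As a rough illustration of per-strategy metrics, the sketch below times a call and accumulates counts per strategy name. The helpers timed_call and record, and the shape of the metrics dictionary, are assumptions made for this example and not the project's actual instrumentation.

```python
import time


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def record(metrics, strategy_name, found, elapsed):
    """Accumulate call count, hit count, and total time for one strategy."""
    entry = metrics.setdefault(
        strategy_name, {"calls": 0, "hits": 0, "total_time": 0.0}
    )
    entry["calls"] += 1
    entry["hits"] += int(found)
    entry["total_time"] += elapsed
```

Dividing hits by calls gives a per-strategy hit rate, and total_time divided by calls gives its average latency.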

Testing

Use the provided test script to validate functionality:

python test_field_workflow.py

The test script includes scenarios for:

  • Google search workflow
  • Login form handling
  • Contact form submission
  • JSON action format validation

Configuration

The workflow uses the existing MCP Chrome client configuration:

chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}

Benefits

  1. Robust Field Detection: Multiple fallback strategies improve the chance of locating a field when the first strategy misses
  2. Automated Workflows: Complete automation from field detection to action execution
  3. Error Resilience: Comprehensive error handling and recovery mechanisms
  4. Performance Optimized: Intelligent caching and strategy ordering
  5. Easy Integration: Simple API that works with existing LiveKit agent infrastructure
  6. Detailed Reporting: Comprehensive execution results for debugging and monitoring

This implementation aims to make web automation tasks more reliable by combining layered field detection with automated execution of follow-up actions.