nasir@endelospay.com d97cad1736 first commit

2025-08-12 02:54:17 +05:00


Enhanced Field Detection and Filling Workflow

Overview

This implementation provides an advanced workflow for LiveKit agents to handle webpage fields that standard lookups fail to find, using MCP (Model Context Protocol) for automatic field detection and filling. When a field cannot be found using standard methods, the system automatically tries multiple detection strategies and, once the field is filled, executes the specified follow-up actions.

Key Features

1. Multi-Strategy Field Detection

The workflow employs five detection strategies in order of preference:

Cached Fields (Confidence: 0.9)
- Uses pre-detected and cached field information
- Fastest and most reliable method
- Automatically refreshes cache if empty
Enhanced Detection (Confidence: 0.8)
- Uses intelligent selector generation based on field names
- Supports multiple field name variations and patterns
- Handles common field types (email, password, username, etc.)
Label Analysis (Confidence: 0.7)
- Analyzes HTML labels and their associations with form fields
- Supports `for`-attribute relationships (`<label for="...">` pointing at a field's `id`)
- Context-aware field matching
Content Analysis (Confidence: 0.6)
- Analyzes page content for field-related keywords
- Matches form elements based on proximity to keywords
- Handles dynamic content and non-standard field naming
Fallback Patterns (Confidence: 0.3)
- Last resort using common CSS selectors
- Targets any visible input fields
- Provides basic functionality when all else fails
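The ordering above amounts to a cascade: try each strategy in turn, stop at the first one that yields a selector, and report its confidence. A minimal sketch, assuming each strategy is a function that takes a field name and returns a CSS selector or `None`. The detector functions and the `FIELD_SYNONYMS` table are hypothetical stand-ins, and label and content analysis are left out for brevity.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes a field name and returns a CSS selector, or None if it finds nothing.
Strategy = Callable[[str], Optional[str]]

# Hypothetical synonym table used by the "enhanced detection" stand-in below.
FIELD_SYNONYMS = {
    "email": ["email", "e-mail", "mail"],
    "search": ["search", "q", "query"],
}


def cached_fields(name: str) -> Optional[str]:
    # Stand-in for a lookup in previously detected fields (always empty here).
    cache: dict = {}
    return cache.get(name)


def enhanced_detection(name: str) -> Optional[str]:
    # Build a selector list from known name variations, e.g. input[name='q'].
    variants = FIELD_SYNONYMS.get(name.lower())
    if not variants:
        return None
    return ", ".join(f"input[name='{v}']" for v in variants)


def fallback_patterns(name: str) -> Optional[str]:
    # Last resort: any input that is not type="hidden", regardless of the name.
    return "input:not([type='hidden'])"


# Strategies in order of preference, each paired with its confidence score.
STRATEGIES: List[Tuple[str, float, Strategy]] = [
    ("cached_fields", 0.9, cached_fields),
    ("enhanced_detection", 0.8, enhanced_detection),
    ("fallback_patterns", 0.3, fallback_patterns),
]


def detect_field(name: str) -> Tuple[Optional[str], Optional[str], float, List[str]]:
    """Return (selector, method, confidence, errors) from the first strategy that succeeds."""
    errors: List[str] = []
    for method, confidence, strategy in STRATEGIES:
        try:
            selector = strategy(name)
        except Exception as exc:  # one failing strategy must not stop the cascade
            errors.append(f"{method}: {exc}")
            continue
        if selector:
            return selector, method, confidence, errors
        errors.append(f"{method}: no match")
    return None, None, 0.0, errors
```

Because every miss is appended to `errors`, the caller gets the per-strategy error trail alongside the winning method.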

2. Automatic Action Execution

After successful field filling, the workflow can execute a series of actions:

submit: Submit a form (with optional form selector)
click: Click on any element using CSS selector
navigate: Navigate to a new URL
wait: Pause execution for the time given in `target` (e.g. `"1"`)
keyboard: Send keyboard input (Enter, Tab, etc.)
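Routing an action to the right behaviour is naturally a lookup table keyed by `type`. A minimal sketch, assuming a hypothetical `Browser` protocol whose methods (`submit`, `click`, `navigate`, `press_key`) stand in for whatever MCP calls the real client makes. It also assumes the `wait` target is a number of seconds and that a `submit` without a target means the first `form` on the page.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Protocol


class Browser(Protocol):
    # Hypothetical interface; the real client issues MCP requests instead.
    async def submit(self, form_selector: str) -> None: ...
    async def click(self, selector: str) -> None: ...
    async def navigate(self, url: str) -> None: ...
    async def press_key(self, key: str) -> None: ...


Handler = Callable[[Browser, str], Awaitable[None]]


async def _wait(browser: Browser, target: str) -> None:
    # Assumption: the target of a "wait" action is a duration in seconds.
    await asyncio.sleep(float(target))


HANDLERS: Dict[str, Handler] = {
    "submit": lambda b, t: b.submit(t or "form"),  # assumed default: first form
    "click": lambda b, t: b.click(t),
    "navigate": lambda b, t: b.navigate(t),
    "wait": _wait,
    "keyboard": lambda b, t: b.press_key(t),
}


async def dispatch(browser: Browser, action: dict) -> None:
    """Look up the handler for action["type"] and run it with action["target"]."""
    handler = HANDLERS.get(action.get("type", ""))
    if handler is None:
        raise ValueError(f"unknown action type: {action.get('type')!r}")
    await handler(browser, action.get("target", ""))


class RecordingBrowser:
    """Test double that records calls instead of driving a real page."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    async def submit(self, form_selector: str) -> None:
        self.calls.append(("submit", form_selector))

    async def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    async def navigate(self, url: str) -> None:
        self.calls.append(("navigate", url))

    async def press_key(self, key: str) -> None:
        self.calls.append(("press_key", key))
```

Adding a new action type then means adding one entry to `HANDLERS` rather than another branch in an `if`/`elif` chain.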

3. Comprehensive Error Handling

Detailed error reporting for each detection strategy
Graceful fallback between strategies
Action-level error handling with optional/required flags
Execution time tracking and performance metrics
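The required/optional distinction can be expressed as a loop that records every action's outcome and stops early only when a required action fails. A minimal sketch, assuming actions are optional when `required` is absent (the default is not stated above) and taking the per-action executor as a parameter. `run_actions` and its result keys are illustrative names.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def run_actions(
    actions: List[dict],
    execute: Callable[[dict], Awaitable[None]],
) -> dict:
    """Run actions in order; stop at the first failing action marked required."""
    started = time.perf_counter()
    executed: List[dict] = []
    errors: List[str] = []
    aborted = False
    for action in actions:
        delay = float(action.get("delay", 0))
        if delay > 0:
            await asyncio.sleep(delay)  # optional pause before the action
        try:
            await execute(action)
            executed.append({"action": action, "success": True})
        except Exception as exc:
            executed.append({"action": action, "success": False, "error": str(exc)})
            errors.append(f"{action.get('type')}: {exc}")
            if action.get("required", False):  # assumption: optional by default
                aborted = True
                break
    return {
        "success": not aborted,  # optional failures are reported but not fatal
        "actions_executed": executed,
        "errors": errors,
        "execution_time": time.perf_counter() - started,
    }
```

In this sketch an optional failure still leaves `success` true; a stricter policy could require every action to succeed.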

Implementation Details

Core Method: `execute_field_workflow`

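from typing import List, Optional

async def execute_field_workflow(
    self,
    field_name: str,
    field_value: str,
    actions: Optional[List[dict]] = None,  # None means no follow-up actions
    max_retries: int = 3,                  # maximum detection attempts
) -> dict:
    ...  # method of the MCP client; body omitted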

Parameters:

field_name: Name or identifier of the field to find
field_value: Value to fill in the field
actions: List of actions to execute after successful field filling
max_retries: Maximum number of detection attempts

Returns: A dictionary containing:

success: Overall workflow success status
field_filled: Whether the field was successfully filled
actions_executed: List of executed actions with results
detection_method: Which strategy successfully found the field
errors: List of any errors encountered
execution_time: Total workflow execution time
field_selector: CSS selector used to fill the field
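The returned dictionary can be described with a `TypedDict`, which lets a type checker catch misspelled keys. The field types below are inferred from the descriptions above (for example, `execution_time` is assumed to be seconds as a float), so treat them as an assumption rather than the implementation's declared types. `summarize` is a hypothetical logging helper.

```python
from typing import List, Optional, TypedDict


class WorkflowResult(TypedDict):
    success: bool                    # overall workflow status
    field_filled: bool               # whether the field was filled
    actions_executed: List[dict]     # one entry per action, with its result
    detection_method: Optional[str]  # strategy that found the field, if any
    errors: List[str]                # errors collected along the way
    execution_time: float            # total time, assumed to be in seconds
    field_selector: Optional[str]    # CSS selector used to fill the field


def summarize(result: WorkflowResult) -> str:
    """One-line summary suitable for agent logs."""
    status = "ok" if result["success"] else "failed"
    method = result["detection_method"] or "none"
    return f"{status} via {method} in {result['execution_time']:.2f}s, errors={len(result['errors'])}"
```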

Action Format

Actions are specified as a list of dictionaries:

actions = [
    {
        "type": "submit",   # Action type
        "target": "form",   # Target selector/value (optional for submit)
        "delay": 0.5,       # Delay before the action (optional)
        "required": True,   # Whether a failure should stop the workflow (optional)
    },
    {
        "type": "click",
        "target": "button[type='submit']",
        "required": True,
    },
    {
        "type": "keyboard",
        "target": "Enter",
    },
]
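Because actions are plain dictionaries, it helps to validate and normalise them before running anything, so a typo like `"clik"` fails up front instead of halfway through a workflow. A minimal sketch; the defaults it fills in (`delay` of 0, `required` of `False`) and the rule that every type except `submit` needs a `target` are assumptions drawn from the format above, not documented behaviour.

```python
from typing import List

VALID_TYPES = {"submit", "click", "navigate", "wait", "keyboard"}


def normalize_actions(actions: List[dict]) -> List[dict]:
    """Check each action and return copies with optional keys filled in."""
    normalized = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in VALID_TYPES:
            raise ValueError(f"action {i}: unknown type {kind!r}")
        target = action.get("target", "")
        if kind != "submit" and not target:
            raise ValueError(f"action {i}: {kind!r} requires a target")
        if kind == "wait":
            float(target)  # raises ValueError if the duration is not numeric
        normalized.append({
            "type": kind,
            "target": str(target),
            "delay": float(action.get("delay", 0)),
            "required": bool(action.get("required", False)),
        })
    return normalized
```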

Usage Examples

1. Simple Search Workflow

# Fill search field and press Enter
result = await mcp_client.execute_field_workflow(
    field_name="search",
    field_value="LiveKit automation",
    actions=[{"type": "keyboard", "target": "Enter"}]
)

2. Form Submission Workflow

# Fill email field and submit form
result = await mcp_client.execute_field_workflow(
    field_name="email",
    field_value="user@example.com",
    actions=[
        {"type": "wait", "target": "1"},
        {"type": "submit", "target": "form#login"}
    ]
)

3. Complex Multi-Step Workflow

# Fill message field, wait, click submit, wait again, then navigate to a success page
result = await mcp_client.execute_field_workflow(
    field_name="message",
    field_value="Hello from LiveKit agent!",
    actions=[
        {"type": "wait", "target": "0.5"},
        {"type": "click", "target": "button[type='submit']"},
        {"type": "wait", "target": "2"},
        {"type": "navigate", "target": "https://example.com/success"}
    ]
)

LiveKit Agent Integration

The workflow is integrated into the LiveKit agent as a function tool:

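from livekit.agents import RunContext, function_tool

@function_tool
async def execute_field_workflow(
    context: RunContext,
    field_name: str,
    field_value: str,
    actions: str = "",  # JSON-encoded list of action objects; "" means none
):
    """Fill a webpage field, then run the optional follow-up actions."""
    ...  # body omitted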

Usage in LiveKit Agent:

field_name: Natural language field identifier
field_value: Value to fill
actions: JSON string of actions to execute
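Since the tool receives `actions` as a JSON string, the tool body has to decode it before calling the client. A minimal sketch of that step, assuming an empty string means "no actions" and that malformed JSON should be reported back to the LLM as an error message rather than raised. `parse_actions_arg` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def parse_actions_arg(actions: str) -> Tuple[List[dict], str]:
    """Decode the tool's JSON `actions` argument; return (actions, error)."""
    if not actions.strip():
        return [], ""
    try:
        parsed = json.loads(actions)
    except json.JSONDecodeError as exc:
        return [], f"actions is not valid JSON: {exc.msg}"
    if isinstance(parsed, dict):  # tolerate a single action object
        parsed = [parsed]
    if not isinstance(parsed, list) or not all(isinstance(a, dict) for a in parsed):
        return [], "actions must be a JSON list of objects"
    return parsed, ""
```

With the example commands below, `'[{"type": "submit"}]'` decodes to a single submit action, while an LLM that emits unquoted keys gets a readable error it can correct on the next turn.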

Example Agent Commands:

"Fill the search field with 'python tutorial' and press Enter"
execute_field_workflow("search", "python tutorial", '[{"type": "keyboard", "target": "Enter"}]')

"Fill email with test@example.com and submit the form"
execute_field_workflow("email", "test@example.com", '[{"type": "submit"}]')

Error Handling and Reliability

Retry Mechanism

Configurable retry attempts (default: 3)
Progressive strategy fallback
Intelligent delay between retries
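The retry behaviour can be sketched as a loop around detection with a growing pause between attempts. The source only calls the delay "intelligent"; the exponential backoff below (`base_delay * 2**attempt`) is a swapped-in assumption, and `detect` stands for any async detection routine that returns a selector or `None`.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


async def detect_with_retries(
    detect: Callable[[str], Awaitable[Optional[str]]],
    field_name: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> Tuple[Optional[str], List[str]]:
    """Call `detect` up to max_retries times; return (selector, errors)."""
    errors: List[str] = []
    for attempt in range(max_retries):
        try:
            selector = await detect(field_name)
            if selector:
                return selector, errors
            errors.append(f"attempt {attempt + 1}: field not found")
        except Exception as exc:
            errors.append(f"attempt {attempt + 1}: {exc}")
        if attempt < max_retries - 1:
            # Assumption: exponential backoff gives dynamic pages time to render.
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None, errors
```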

Error Reporting

Strategy-level error tracking
Action-level success/failure reporting
Detailed error messages for debugging

Performance Monitoring

Execution time tracking
Strategy performance metrics
Confidence scoring for detection methods
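Strategy performance metrics can be as simple as per-strategy counters for attempts, successes, and time spent. A minimal sketch; the `StrategyStats` and `StrategyMetrics` classes are illustrative, not part of the documented API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class StrategyStats:
    attempts: int = 0
    successes: int = 0
    total_seconds: float = 0.0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    @property
    def mean_seconds(self) -> float:
        return self.total_seconds / self.attempts if self.attempts else 0.0


class StrategyMetrics:
    def __init__(self) -> None:
        self.stats: Dict[str, StrategyStats] = defaultdict(StrategyStats)

    @contextmanager
    def measure(self, strategy: str) -> Iterator[dict]:
        """Time one attempt; the caller sets outcome["found"] = True on success."""
        outcome = {"found": False}
        started = time.perf_counter()
        try:
            yield outcome
        finally:
            # Recorded even if the strategy raised; the exception still propagates.
            entry = self.stats[strategy]
            entry.attempts += 1
            entry.total_seconds += time.perf_counter() - started
            if outcome["found"]:
                entry.successes += 1
```

Comparing `success_rate` and `mean_seconds` across strategies gives a data-driven check on whether the fixed ordering still makes sense for a given site.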

Testing

Use the provided test script to validate functionality:

python test_field_workflow.py

The test script includes scenarios for:

Google search workflow
Login form handling
Contact form submission
JSON action format validation

Configuration

The workflow uses the existing MCP Chrome client configuration:

chrome_config = {
    'mcp_server_type': 'chrome_extension',
    'mcp_server_url': 'http://localhost:3000',
    'mcp_server_command': '',
    'mcp_server_args': []
}

Benefits

Robust Field Detection: Multiple fallback strategies improve the chance of locating a field when standard lookups fail
Automated Workflows: Complete automation from field detection to action execution
Error Resilience: Per-strategy and per-action error handling, plus configurable retries
Performance-Conscious Ordering: Cached field lookups, the fastest method, are tried before the analysis-based strategies
Easy Integration: Simple API that works with existing LiveKit agent infrastructure
Detailed Reporting: Comprehensive execution results for debugging and monitoring

This implementation aims to improve the reliability of web automation tasks by combining multi-strategy field detection with automated workflow execution.
