broswer-automation/agent-livekit/REALTIME_FORM_DISCOVERY.md
nasir@endelospay.com d97cad1736 first commit
2025-08-12 02:54:17 +05:00


Real-Time Form Discovery System

Overview

The LiveKit agent now features a REAL-TIME ONLY form discovery system that NEVER uses cached selectors. Every form field discovery is performed live using MCP tools, ensuring the most current and accurate form element detection.

Key Principles

🚫 NO CACHE POLICY

  • Zero cached selectors - every request gets fresh selectors
  • Real-time discovery only - uses MCP tools on every call
  • No hardcoded selectors - all elements discovered dynamically
  • Fresh page analysis - adapts to dynamic content changes

🔄 Real-Time MCP Tools

  • chrome_get_interactive_elements - Gets current form elements
  • chrome_get_content_web_form - Analyzes form structure
  • chrome_get_web_content - Content analysis for field discovery
  • Live selector testing - Validates selectors before use

How Real-Time Discovery Works

1. Voice Command Processing

When a user says: "fill email with john@example.com"

# NO cache lookup - goes straight to real-time discovery
field_name = "email"
value = "john@example.com"

# Step 1: Real-time MCP discovery
discovery_result = await client._discover_form_fields_dynamically(field_name, value)

# Step 2: Enhanced detection with retry (only if Step 1 did not fill the field)
enhanced_result = await client._enhanced_field_detection_with_retry(field_name, value)

# Step 3: Direct MCP element search (final fallback, only if Step 2 also failed)
direct_result = await client._direct_mcp_element_search(field_name, value)

2. Real-Time Discovery Process

Strategy 1: Interactive Elements Discovery

# Get ALL current interactive elements
interactive_result = await client._call_mcp_tool("chrome_get_interactive_elements", {
    "types": ["input", "textarea", "select"]
})

# Match field name to current elements
# (`elements` is the element list parsed from interactive_result)
for element in elements:
    if client._is_field_match(element, field_name):
        selector = client._extract_best_selector(element)
        # Try to fill immediately with fresh selector

Strategy 2: Form Content Analysis

# Get current form structure
form_result = await client._call_mcp_tool("chrome_get_content_web_form", {})

# Parse form content for field patterns
selector = client._parse_form_content_for_field(form_content, field_name)

# Test and use selector immediately

Strategy 3: Direct MCP Element Search

# Exhaustive search through ALL elements
all_elements = await client._call_mcp_tool("chrome_get_interactive_elements", {})

# Very flexible matching for any possible match
for element in all_elements:
    if client._is_very_flexible_match(element, field_name):
        # Generate and test selector immediately
        selector = client._extract_best_selector(element)

3. Real-Time Selector Generation

The system generates selectors in real-time based on current element attributes:

def _extract_best_selector(self, element):
    attrs = element.get("attributes", {})

    # Priority order for reliability: most specific pattern first
    if attrs.get("id"):
        return f"#{attrs['id']}"
    # Check type + name before name alone; otherwise this branch is unreachable
    if attrs.get("type") and attrs.get("name"):
        return f"input[type='{attrs['type']}'][name='{attrs['name']}']"
    if attrs.get("name"):
        return f"input[name='{attrs['name']}']"
    # ... more patterns
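
To make the priority order concrete, here is a minimal standalone sketch. It assumes each element is a dict with an "attributes" mapping, as in the snippet above. The function name `best_selector` and the "tagName" key are hypothetical and are used only so the example does not hardcode `input`. The type-plus-name pattern is checked before the name-only pattern so that both branches can be reached.

```python
from typing import Optional


def best_selector(element: dict) -> Optional[str]:
    """Return a CSS selector for an element dict, or None if nothing usable exists.

    Assumes an "attributes" mapping on the element. The "tagName" key is a
    hypothetical field; when absent, the tag falls back to "input".
    """
    attrs = element.get("attributes", {})
    tag = element.get("tagName", "input").lower()

    if attrs.get("id"):
        return f"#{attrs['id']}"
    if attrs.get("type") and attrs.get("name"):
        return f"{tag}[type='{attrs['type']}'][name='{attrs['name']}']"
    if attrs.get("name"):
        return f"{tag}[name='{attrs['name']}']"
    return None


print(best_selector({"attributes": {"id": "email"}}))
print(best_selector({"tagName": "TEXTAREA", "attributes": {"name": "bio"}}))
```

Returning None when no attribute is usable lets the caller move on to the next element instead of testing an empty selector.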

API Reference

Real-Time Functions

fill_field_by_name(field_name: str, value: str) -> str

NOW REAL-TIME ONLY - No cache, fresh discovery every call.

fill_field_realtime_only(field_name: str, value: str) -> str

Guaranteed real-time - Explicit real-time discovery function.

get_realtime_form_fields() -> str

Live form discovery - Gets current form fields using only MCP tools.

_discover_form_fields_dynamically(field_name: str, value: str) -> dict

Pure real-time discovery - Uses chrome_get_interactive_elements and chrome_get_content_web_form.

_direct_mcp_element_search(field_name: str, value: str) -> dict

Exhaustive real-time search - Final fallback using comprehensive MCP element search.

Real-Time Matching Algorithms

_is_field_match(element: dict, field_name: str) -> bool

Standard real-time field matching using current element attributes.
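
The actual matching rules are not shown in this document. As a rough sketch of what standard attribute matching could look like, the example below normalizes the spoken field name and checks whether it appears in a handful of attributes. The attribute list and the helper names are assumptions, not the agent's real implementation.

```python
import re

# Attributes compared against the spoken field name (an assumed list).
MATCH_ATTRIBUTES = ("id", "name", "placeholder", "aria-label", "type")


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_field_match(element: dict, field_name: str) -> bool:
    """True if the normalized field name occurs in any compared attribute."""
    wanted = normalize(field_name)
    if not wanted:
        return False
    attrs = element.get("attributes", {})
    return any(
        wanted in normalize(str(attrs.get(key, "")))
        for key in MATCH_ATTRIBUTES
    )


print(is_field_match({"attributes": {"name": "user_email"}}, "email"))
```

Normalizing both sides means "e mail", "E-mail" and "user_email" all compare as containing "email".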

_is_very_flexible_match(element: dict, field_name: str) -> bool

Very flexible real-time matching for challenging cases.
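
One way "very flexible" matching could work is to accept approximate matches on any attribute value. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The 0.6 threshold, the token splitting and the function name are illustrative assumptions; the agent's real criteria may differ.

```python
from difflib import SequenceMatcher


def is_very_flexible_match(element: dict, field_name: str,
                           threshold: float = 0.6) -> bool:
    """True if any attribute value contains or closely resembles the field name."""
    wanted = field_name.lower().strip()
    if not wanted:
        return False
    attrs = element.get("attributes", {})
    for value in attrs.values():
        candidate = str(value).lower()
        # Split values such as "login-usr_name" into separate tokens.
        tokens = candidate.replace("-", " ").replace("_", " ").split()
        for token in tokens + [candidate]:
            if wanted in token:
                return True
            if SequenceMatcher(None, wanted, token).ratio() >= threshold:
                return True
    return False


print(is_very_flexible_match({"attributes": {"id": "usrname"}}, "username"))
```

A lower threshold finds more fields but also raises the chance of filling the wrong one, which is why this kind of matching fits best as a last resort.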

_generate_common_selectors(field_name: str) -> list

Generates common CSS selectors based on field name patterns.
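
As an illustration of name-based selector generation, the sketch below builds a candidate list from a field name. The type mapping and the exact patterns are assumptions chosen for the example; each candidate would still need to be tested against the live page before use.

```python
def generate_common_selectors(field_name: str) -> list:
    """Build candidate CSS selectors for a field name, most specific first."""
    label = field_name.strip().lower()
    name = label.replace(" ", "_")

    # Type-specific pattern first for well-known fields (an assumed mapping).
    type_hints = {"email": "email", "password": "password",
                  "search": "search", "phone": "tel"}
    selectors = []
    if name in type_hints:
        selectors.append(f"input[type='{type_hints[name]}']")

    selectors += [
        f"#{name}",
        f"input[name='{name}']",
        f"textarea[name='{name}']",
        f"select[name='{name}']",
        # The trailing "i" makes the attribute comparison case-insensitive.
        f"input[placeholder*='{label}' i]",
        f"input[aria-label*='{label}' i]",
    ]
    return selectors


for selector in generate_common_selectors("email"):
    print(selector)
```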

Usage Examples

Voice Commands (All Real-Time)

User: "fill email with john@example.com"
Agent: [Uses chrome_get_interactive_elements] ✓ Filled 'email' field using real-time discovery

User: "enter password secret123"
Agent: [Uses chrome_get_content_web_form] ✓ Filled 'password' field using form content analysis

User: "type hello in search box"
Agent: [Uses direct MCP search] ✓ Filled 'search' field using exhaustive element search
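
The three phrasings above could be mapped to a field name and a value with a small parser. The sketch below is hypothetical: the patterns and the function name `parse_fill_command` are assumptions for illustration, and the agent's real command handling is not shown in this document.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering the three phrasings shown above.
_PATTERNS = [
    re.compile(r"^fill (?P<field>.+?) with (?P<value>.+)$", re.IGNORECASE),
    re.compile(
        r"^type (?P<value>.+?) in(?:to)? (?:the )?(?P<field>.+?)(?: box| field)?$",
        re.IGNORECASE,
    ),
    re.compile(r"^enter (?P<field>\S+) (?P<value>.+)$", re.IGNORECASE),
]


def parse_fill_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (field_name, value) for a recognized command, else None."""
    text = utterance.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("field").lower(), match.group("value")
    return None


print(parse_fill_command("fill email with john@example.com"))
```

The field name is lowercased for matching, while the value keeps its original case because passwords and similar inputs are case-sensitive.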

Programmatic Usage

# All these functions use ONLY real-time discovery
result = await client.fill_field_by_name("email", "user@example.com")
result = await client.fill_field_realtime_only("search", "python")
result = await client._discover_form_fields_dynamically("username", "john_doe")

Real-Time Discovery Strategies

1. Interactive Elements Strategy

  • Uses chrome_get_interactive_elements to get current form elements
  • Matches field names to element attributes in real-time
  • Tests selectors immediately before use

2. Form Content Strategy

  • Uses chrome_get_content_web_form for form-specific analysis
  • Parses current form structure for field patterns
  • Generates selectors based on live content

3. Direct Search Strategy

  • Exhaustive search through ALL current page elements
  • Very flexible matching criteria
  • Tests multiple selector patterns

4. Common Selector Strategy

  • Generates intelligent selectors based on field name
  • Tests each selector against current page
  • Uses type-specific patterns for common fields

Benefits of Real-Time Discovery

🎯 Accuracy

  • Always current - reflects actual page state
  • No stale selectors - eliminates cached selector failures
  • Dynamic adaptation - handles page changes automatically

🔄 Reliability

  • Fresh discovery - every request gets new selectors
  • Multiple strategies - comprehensive fallback methods
  • Live validation - selectors tested before use

🌐 Compatibility

  • Works on any site - no pre-configuration needed
  • Handles dynamic content - adapts to JavaScript-generated forms
  • Framework-agnostic - works from the rendered page elements, regardless of how the site is built

🛠️ Maintainability

  • Zero maintenance - no selector databases to update
  • Self-adapting - automatically handles site changes
  • Future-proof - works with new web technologies

Testing Real-Time Discovery

Run the real-time test suite:

python test_realtime_form_discovery.py

This tests:

  • Real-time discovery on Google search
  • Form field discovery on GitHub
  • Direct MCP element search
  • Very flexible matching algorithms
  • Cross-website compatibility

Performance Considerations

Real-Time vs Speed

  • Slightly slower than cached selectors (by design)
  • More reliable than cached approaches
  • Eliminates cache invalidation issues
  • Prevents stale selector errors

Optimization Strategies

  • Parallel discovery - multiple strategies run concurrently
  • Early termination - stops on first successful match
  • Intelligent prioritization - most likely selectors first

Error Handling

Graceful Degradation

  1. Fallback order: Interactive elements → Form content → Direct search → Common selectors
  2. Detailed logging of each attempt
  3. Clear error messages about what was tried
  4. No silent failures - always reports what happened

Retry Mechanism

  • Multiple attempts with increasing flexibility
  • Different strategies on each retry
  • Configurable retry count (default: 3)
  • Delay between retries to handle loading
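
A minimal sketch of such a retry loop follows. The default of three attempts comes from the list above. The delay value, the function name `retry_fill` and the idea of passing the attempt number to the callback (so it can loosen its matching on later tries) are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List


async def retry_fill(attempt: Callable[[int], Awaitable[bool]],
                     max_retries: int = 3,
                     delay_seconds: float = 0.5) -> bool:
    """Call attempt(1), attempt(2), ... until one succeeds or retries run out."""
    for attempt_number in range(1, max_retries + 1):
        if await attempt(attempt_number):
            return True
        if attempt_number < max_retries:
            # Give the page time to finish loading before the next try.
            await asyncio.sleep(delay_seconds)
    return False


calls: List[int] = []


async def flaky(attempt_number: int) -> bool:
    """Stub that fails twice and succeeds on the third attempt."""
    calls.append(attempt_number)
    return attempt_number == 3


print(asyncio.run(retry_fill(flaky, delay_seconds=0.01)))
```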

Future Enhancements

Advanced Real-Time Features

  • Visual element detection using screenshots
  • Machine learning field recognition
  • Context-aware field relationships
  • Performance optimization for faster discovery

Real-Time Analytics

  • Discovery success rates by strategy
  • Performance metrics for each method
  • Field matching accuracy tracking
  • Site compatibility reporting

Migration from Cached System

Automatic Migration

  • No code changes required for existing voice commands
  • Backward compatibility maintained
  • Enhanced reliability with real-time discovery
  • Same API with improved implementation

Benefits of Migration

  • Eliminates cache issues - no more stale selectors
  • Improves accuracy - always uses current page state
  • Reduces maintenance - no cache management needed
  • Increases reliability - works on dynamic sites

The real-time discovery system ensures that the LiveKit agent always works with the most current page state, providing maximum reliability and compatibility across all websites.