Intelligent Selector Discovery

Overview

The LiveKit agent includes intelligent selector discovery, which adapts automatically to changing web page structures, particularly Google search results. When a standard CSS selector fails (for example with the common "No valid content found for selector: .r" error), the system discovers alternative selectors instead of giving up.

Problem Solved

Google and other search engines frequently change their HTML structure, which breaks hardcoded CSS selectors. Before this feature, the system failed with errors such as:

  • "No valid content found for selector: .r"
  • "No search results found on this page"

How It Works

1. Multi-Layer Fallback System

The intelligent discovery system uses a multi-layer approach:

  1. Standard Selectors: Try known working selectors first
  2. Intelligent Discovery: Generate smart selectors based on common patterns
  3. DOM Analysis: Analyze page structure using heuristics
  4. Final Fallback: Extract any meaningful content
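
The layered approach above can be sketched as an ordered chain of strategies, where the first layer to return results wins. This is a minimal illustration, not the code in mcp_chrome_client.py: the `run_fallback_chain` function, the `Strategy` type, and the strategy names in the usage note are assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: a strategy receives page HTML and returns extracted results.
Strategy = Callable[[str], List[str]]


def run_fallback_chain(
    html: str, strategies: Sequence[Tuple[str, Strategy]]
) -> Tuple[Optional[str], List[str]]:
    """Try each (name, strategy) pair in order and return the first non-empty result."""
    for name, strategy in strategies:
        try:
            results = strategy(html)
        except Exception:
            # A failing layer is skipped so that later layers still get a chance.
            continue
        if results:
            return name, results
    # No layer produced anything.
    return None, []
```

A caller would pass the layers in the order listed above, for example `[("standard", try_standard), ("intelligent", try_intelligent), ("dom", try_dom), ("final", try_final)]`, where each callable is a placeholder for the real layer. Returning the name of the winning layer makes it easy to log which method succeeded.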

2. Intelligent Selector Generation

The system generates selectors based on modern web patterns:

// Modern Google patterns (2024+)
'[data-ved] h3',
'[data-ved]:has(h3)',
'[jscontroller]:has(h3)',

// Generic search result patterns
'div[class*="result"]:has(h3)',
'article:has(h3)',
'[role="main"] div:has(h3)',

// Link-based patterns
'a[href*="http"]:has(h3)',
'div:has(h3):has(a[href*="http"])'
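
One way such a candidate list might be assembled in Python is sketched here. The selector strings are the ones listed above; the function name `build_candidate_selectors` and the `prefer_google` flag are hypothetical and are not taken from mcp_chrome_client.py. Every selector that uses `:has()` also assumes the browser evaluating it supports that pseudo-class.

```python
from typing import List

# Selector groups copied from the patterns listed above.
GOOGLE_PATTERNS = [
    "[data-ved] h3",
    "[data-ved]:has(h3)",
    "[jscontroller]:has(h3)",
]
GENERIC_PATTERNS = [
    'div[class*="result"]:has(h3)',
    "article:has(h3)",
    '[role="main"] div:has(h3)',
]
LINK_PATTERNS = [
    'a[href*="http"]:has(h3)',
    'div:has(h3):has(a[href*="http"])',
]


def build_candidate_selectors(prefer_google: bool = True) -> List[str]:
    """Return the candidate selectors in the order they should be tried."""
    if prefer_google:
        groups = [GOOGLE_PATTERNS, GENERIC_PATTERNS, LINK_PATTERNS]
    else:
        # Assumption for this sketch: on other engines, generic patterns go first.
        groups = [GENERIC_PATTERNS, LINK_PATTERNS, GOOGLE_PATTERNS]
    ordered = [selector for group in groups for selector in group]
    # dict.fromkeys drops duplicates while preserving the original order.
    return list(dict.fromkeys(ordered))
```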

3. Content Validation

Each discovered selector is validated to ensure it contains actual search results:

  • Must have headings (h1-h6) and links
  • Must contain substantial text content (>50 characters)
  • Must have search result indicators (URLs, titles, snippets)
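
The checks above could look roughly like the following standard-library sketch. It is an approximation under stated assumptions, not the body of `_validate_search_results_content()`: the `looks_like_search_result` name is hypothetical, and the "search result indicators" rule is reduced here to requiring at least one absolute http(s) link.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _ResultStats(HTMLParser):
    """Counts headings, absolute links, and visible text in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.headings = 0
        self.links = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings += 1
        elif tag == "a":
            # Attribute values can be None, so fall back to an empty string.
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self.links += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def looks_like_search_result(fragment: str, min_text_chars: int = 50) -> bool:
    """Return True if the fragment has a heading, a link, and enough text."""
    stats = _ResultStats()
    stats.feed(fragment)
    stats.close()
    return (
        stats.headings > 0
        and stats.links > 0
        and stats.text_chars > min_text_chars
    )
```

The 50-character threshold matches the rule stated above and is exposed as a parameter so it can be tuned per site.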

4. DOM Structure Analysis

If intelligent selectors fail, the system analyzes the DOM structure:

  • Looks for containers with multiple links
  • Identifies repeated structures
  • Finds main content areas
  • Uses semantic HTML patterns
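
As an illustration of the "repeated structures" heuristic, the sketch below builds a small tree with the standard library and reports sibling elements that share a tag and class and each contain a link. It is a simplified stand-in for `_analyze_dom_for_search_results()`, whose actual logic is not shown in this document; `Node`, `TreeBuilder`, and `find_repeated_containers` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Tuple

# Elements that never have a closing tag and must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    css_class: str = ""
    children: List["Node"] = field(default_factory=list)

    def contains(self, tag: str) -> bool:
        """True if any descendant has the given tag."""
        return any(c.tag == tag or c.contains(tag) for c in self.children)


class TreeBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs).get("class") or "")
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray closers are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break


def find_repeated_containers(html: str, min_repeats: int = 3) -> List[Tuple[str, int]]:
    """Return (selector, count) for link-bearing siblings repeated under one parent."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found: List[Tuple[str, int]] = []

    def walk(node: Node) -> None:
        signatures = Counter(
            (c.tag, c.css_class) for c in node.children if c.contains("a")
        )
        for (tag, css_class), count in signatures.items():
            if count >= min_repeats:
                selector = tag + "".join("." + part for part in css_class.split())
                found.append((selector, count))
        for child in node.children:
            walk(child)

    walk(builder.root)
    # Most frequently repeated structure first.
    return sorted(found, key=lambda item: -item[1])
```

The `min_repeats` threshold of 3 is an arbitrary choice for the sketch; a results page usually repeats its result container many more times than navigation or ad blocks do.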

Implementation Details

LiveKit Agent (Python)

The main implementation is in agent-livekit/mcp_chrome_client.py:

  • _discover_search_result_selectors(): Main discovery function
  • _generate_intelligent_search_selectors(): Generate smart selectors
  • _validate_search_results_content(): Validate content quality
  • _analyze_dom_for_search_results(): DOM structure analysis
  • _final_intelligent_discovery(): Last resort broad patterns

Chrome Extension (JavaScript)

Enhanced functionality in app/chrome-extension/inject-scripts/enhanced-search-helper.js:

  • discoverSearchResultElements(): Client-side intelligent discovery
  • validateSearchResultElement(): Element validation
  • analyzeDOMForSearchResults(): DOM analysis
  • extractResultFromElement(): Flexible data extraction

Usage

Intelligent discovery is triggered automatically when the standard selectors fail. No additional configuration is required.

Voice Commands

"Search for intelligent selector discovery"

The system will:

  1. Navigate to Google
  2. Perform the search
  3. Try standard selectors
  4. Fall back to intelligent discovery if needed
  5. Return formatted results

Logging

The system logs each stage in detail, so you can see which method succeeded:

🔍 Starting intelligent selector discovery for search results...
✅ Found valid search results with intelligent selector: [data-ved]:has(h3)

Benefits

  1. Resilience: Adapts to changing website structures
  2. Broad Compatibility: Works across different search engines
  3. Automatic: No manual intervention required
  4. Detailed Logging: Easy to debug and monitor
  5. Performance: Tries the cheapest strategies first and falls back to broader ones only when needed

Testing

Run the test suite to verify functionality:

node test-intelligent-search-selectors.js

This will test:

  • Google search result extraction
  • DuckDuckGo compatibility
  • Selector validation functions
  • Content extraction accuracy

Supported Patterns

Search Engines

  • Google (all modern layouts)
  • DuckDuckGo
  • Bing
  • Yahoo
  • Generic search result pages

Element Patterns

  • Modern data attributes (data-ved, jscontroller)
  • Semantic HTML (role="main", article)
  • Class-based patterns (class*="result")
  • Link and heading combinations
  • Container structures

Future Enhancements

  1. Machine Learning: Train models on successful selector patterns
  2. Site-Specific Rules: Custom rules for specific websites
  3. Performance Optimization: Cache successful selectors
  4. User Feedback: Learn from user corrections
  5. Visual Recognition: Use computer vision for element detection
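
To make the proposed selector cache concrete, here is one possible shape for it. This is only a sketch of the idea: nothing like it is described as existing in the current code, and the `SelectorCache` class, its per-host keying, and the one-hour default lifetime are all assumptions.

```python
import time
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


class SelectorCache:
    """Remembers the last selector that worked for each host, with expiry."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _host(url: str) -> str:
        return urlsplit(url).netloc.lower()

    def remember(self, url: str, selector: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._entries[self._host(url)] = (selector, stored_at)

    def lookup(self, url: str, now: Optional[float] = None) -> Optional[str]:
        host = self._host(url)
        entry = self._entries.get(host)
        if entry is None:
            return None
        selector, stored_at = entry
        current = time.time() if now is None else now
        if current - stored_at > self._ttl:
            # Expired entries are dropped so a stale selector is not reused forever.
            del self._entries[host]
            return None
        return selector
```

A cached selector would still need to pass content validation before its results are trusted, since the page structure can change between visits.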

Troubleshooting

Common Issues

  1. No results found: Check if the page has loaded completely
  2. Incorrect extraction: Verify the page structure hasn't changed dramatically
  3. Performance issues: Reduce the number of fallback selectors

Debug Mode

Enable detailed logging by setting the log level to DEBUG in the LiveKit agent configuration.
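
If the agent uses Python's standard `logging` module, raising the level could look like the sketch below. The logger name `mcp_chrome_client` and the `enable_debug_logging` helper are assumptions for illustration; check the agent's configuration for the actual logger name and the supported way to set its level.

```python
import logging


def enable_debug_logging(logger_name: str = "mcp_chrome_client") -> logging.Logger:
    """Set the named logger to DEBUG and attach a console handler once."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        # Only add a handler the first time to avoid duplicated log lines.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```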

Manual Override

If needed, you can specify custom selectors in the MCP client configuration.

Contributing

When adding new selector patterns:

  1. Test across multiple search engines
  2. Validate content quality
  3. Add appropriate logging
  4. Update test cases
  5. Document new patterns

Related Files

  • agent-livekit/mcp_chrome_client.py - Main Python implementation
  • app/chrome-extension/inject-scripts/enhanced-search-helper.js - JavaScript client
  • test-intelligent-search-selectors.js - Test suite
  • agent-livekit/livekit_agent.py - Integration with voice commands