API Reference: Core Interfaces
CrawlConfig​
Controls crawl behavior and output policies.
Key fields:
base_url: strmax_depth: intmax_pages: intsame_origin_only: booldenied_domains: list[str]output_dir: Pathpom_language: strlocator_storage: str(inlineorexternal)browser_adapter: strplaywright_headless: boolcdp_url: str | Nonechrome_profile: boolinteractive_pause: bool
BrowserAdapter​
Runtime browser abstraction for navigation and extraction.
Methods:
goto(url)url()title()extract_interactive_dom_summary(max_nodes=500)capture_screenshot(scale=0.4)is_visible(selector, timeout_ms=1500)close()
PlaywrightBrowserAdapter specifics​
Supports advanced initialization:
cdp_url: Connects to a running browser instance.chrome_profile: Launches with the default local Chrome profile.- Headless/Headed toggle.
Progress hook events​
AutoPomOrchestrator supports a progress callback used by the CLI to stream runtime visibility.
Event types:
dequeue- URL picked from frontier queue.modeled- page model generated and persisted.skip- URL ignored with a reason (policy, duplicate, depth).
PageModel​
Intermediate structured representation used for generation.
Includes:
- page identity and route metadata
- sectioned element mappings
- inferred actions
- discovered links
- navigation hints
CrawlResult​
Primary run result returned by orchestrator.
Includes:
pages: list[PageModel]model_paths: list[Path]pom_paths: list[Path](language-specific generated artifacts)report_path: Path
Compatibility helper:
java_pathsis preserved as an alias topom_pathsfor existing integrations.
Execution summary artifacts​
CLI runs also emit:
reports/execution_summary.md(manager-facing)reports/execution_summary.json(machine-facing)
These include effective configuration, elapsed time, counts for pages/elements/actions, and artifact paths.