{"files":{"SKILL.md":"---\nname: webscraping-ai\ndescription: \"WebScraping.AI API skill. Use when working with WebScraping.AI for ai, html, text. Covers 7 endpoints.\"\nversion: 1.0.0\ngenerator: lapsh\n---\n\n# WebScraping.AI\nAPI version: 3.2.1\n\n## Auth\nApiKey api_key in query\n\n## Base URL\nhttps://api.webscraping.ai\n\n## Setup\n1. Pass your API key as the api_key query parameter\n2. GET /ai/question -- verify access\n\n## Endpoints\n\n7 endpoints across 6 groups. See references/api-spec.lap for full details.\n\n### ai\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /ai/question | Get an answer to a question about a given web page |\n| GET | /ai/fields | Extract structured data fields from a web page |\n\n### html\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /html | Page HTML by URL |\n\n### text\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /text | Page text by URL |\n\n### selected\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /selected | HTML of a selected page area by URL and CSS selector |\n\n### selected-multiple\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /selected-multiple | HTML of multiple page areas by URL and CSS selectors |\n\n### account\n| Method | Path | Description |\n|--------|------|-------------|\n| GET | /account | Information about your account calls quota |\n\n## Common Questions\n\nMatch user requests to endpoints in references/api-spec.lap. 
Key patterns:\n- \"Ask a question about a page?\" -> GET /ai/question\n- \"Extract structured fields from a page?\" -> GET /ai/fields\n- \"Get a page's HTML?\" -> GET /html\n- \"Get a page's text?\" -> GET /text\n- \"Get the HTML of one CSS-selected area?\" -> GET /selected\n- \"Get the HTML of several CSS-selected areas?\" -> GET /selected-multiple\n- \"Check the account call quota?\" -> GET /account\n- \"How to authenticate?\" -> See Auth section\n\n## Response Tips\n- Check response schemas in references/api-spec.lap for field details\n\n## CLI\n\n```bash\n# Update this spec to the latest version\nnpx @lap-platform/lapsh get webscraping-ai -o references/api-spec.lap\n\n# Search for related APIs\nnpx @lap-platform/lapsh search webscraping-ai\n```\n\n## References\n- Full spec: See references/api-spec.lap for complete endpoint details, parameter tables, and response schemas\n\n> Generated from the official API spec by [LAP](https://lap.sh)\n","references/api-spec.lap":"@lap v0.3\n# Machine-readable API spec. Each @endpoint block is one API call.\n@api WebScraping.AI\n@base https://api.webscraping.ai\n@version 3.2.1\n@auth ApiKey api_key in query\n@endpoints 7\n@toc ai(2), html(1), text(1), selected(1), selected-multiple(1), account(1)\n\n@group ai\n@endpoint GET /ai/question\n@desc Get an answer to a question about a given web page\n@required {url: str # URL of the target page.}\n@optional {question: str # Question or instructions to ask the LLM model about the target page., headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. 
Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page., format: str(json/text)=json # Format of the response (text by default). 
\"json\" will return a JSON object with the response, \"text\" will return a plain text/HTML response.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endpoint GET /ai/fields\n@desc Extract structured data fields from a web page\n@required {url: str # URL of the target page., fields: map # Object describing fields to extract from the page and their descriptions}\n@optional {headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). 
Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endgroup\n\n@group html\n@endpoint GET /html\n@desc Page HTML by URL\n@required {url: str # URL of the target page.}\n@optional {headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. 
Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page., return_script_result: bool=false # Return result of the custom JavaScript code (js_script parameter) execution on the target page (false by default, page HTML will be returned)., format: str(json/text)=json # Format of the response (text by default). \"json\" will return a JSON object with the response, \"text\" will return a plain text/HTML response.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endgroup\n\n@group text\n@endpoint GET /text\n@desc Page text by URL\n@required {url: str # URL of the target page.}\n@optional {text_format: str(plain/xml/json)=plain # Format of the text response (plain by default). \"plain\" will return only the page body text. 
\"json\" and \"xml\" will return a json/xml with \"title\", \"description\" and \"content\" keys., return_links: bool=false # [Works only with text_format=json] Return links from the page body text (false by default). Useful for building web crawlers., headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). 
Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endgroup\n\n@group selected\n@endpoint GET /selected\n@desc HTML of a selected page area by URL and CSS selector\n@required {url: str # URL of the target page.}\n@optional {selector: str # CSS selector (null by default, returns whole page HTML), headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. 
Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page., format: str(json/text)=json # Format of the response (text by default). 
\"json\" will return a JSON object with the response, \"text\" will return a plain text/HTML response.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endgroup\n\n@group selected-multiple\n@endpoint GET /selected-multiple\n@desc HTML of multiple page areas by URL and CSS selectors\n@required {url: str # URL of the target page.}\n@optional {selectors: [str] # Multiple CSS selectors (null by default, returns whole page HTML), headers: map # HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})., timeout: int=10000 # Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000)., js: bool=true # Execute on-page JavaScript using a headless browser (true by default)., js_timeout: int=2000 # Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page., wait_for: str # CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout., proxy: str(datacenter/residential/stealth)=datacenter # Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). 
Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details., country: str(us/gb/de/it/fr/ca/es/ru/jp/kr/in/hk/tr)=us # Country of the proxy to use (US by default)., custom_proxy: str # Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (Smartproxy for example)., device: str(desktop/mobile/tablet)=desktop # Type of device emulation., error_on_404: bool=false # Return error on 404 HTTP status on the target page (false by default)., error_on_redirect: bool=false # Return error on redirect on the target page (false by default)., js_script: str # Custom JavaScript code to execute on the target page.}\n@returns(200) Success\n@errors {400: Parameters validation error, 402: Billing issue, probably you've run out of credits, 403: Wrong API key, 429: Too many concurrent requests, 500: Non-2xx and non-404 HTTP status code on the target page or unexpected error, try again or contact support@webscraping.ai, 504: Timeout error, try increasing timeout parameter value}\n\n@endgroup\n\n@group account\n@endpoint GET /account\n@desc Information about your account calls quota\n@returns(200) {email: str, remaining_api_calls: int, resets_at: int, remaining_concurrency: int} # Success\n@errors {403: Wrong API key}\n\n@endgroup\n\n@end\n"}}