Claude Code Web Scraping: How to Fetch and Extract Web Content
Claude Code cannot browse websites the way you can. It runs in your terminal with no browser and, beyond a limited built-in fetch tool, no way to reach the internet on its own. If you want Claude Code to read a web page reliably, you need to give it a tool that can fetch one.
This is the first thing people discover when they try to use Claude Code for web scraping. The model is great at parsing and extracting data from text, but it needs something else to get that text in the first place.
The Three Approaches
There are three common ways to give Claude Code web access. Each has different tradeoffs.
1. API-Based Scraping Tools (MCP)
An MCP server with a scraping tool is the most straightforward option. The tool takes a URL, fetches the page, strips the HTML down to clean text, and returns it to Claude Code. The agent never sees raw HTML or deals with rendering.
Pros:
- Fast. API calls complete in seconds.
- Clean output. The tool handles HTML parsing so the agent gets readable text, not a wall of tags.
- Reliable. No browser to crash, no rendering engine to time out.
- Works behind firewalls if the API server has access.
Cons:
- Can’t handle JavaScript-rendered pages. If the content is loaded by client-side JavaScript after the initial response, a simple fetch won’t see it.
- Some sites block automated requests based on user-agent or IP.
This approach works for most scraping tasks: documentation sites, blog posts, product pages, news articles, and any page that serves its content in the initial HTML response.
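The fetch-strip-return pipeline these tools run can be sketched with Python's standard library alone. This is a simplified illustration of the idea, not how any particular MCP server is implemented; the `strip_html` helper and the sample page are invented for the example.

```python
from html.parser import HTMLParser

# Tags whose contents are boilerplate or code, not readable text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/navigation blocks."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_html(html: str) -> str:
    """Reduce raw HTML to the readable text an agent actually needs."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

# Sample page: the tool returns only the readable content.
page = """
<html><head><style>body { color: red }</style></head>
<body><nav>Home | About</nav>
<h1>Pricing</h1><p>Basic plan: $10/month.</p>
<script>trackPageView();</script></body></html>
"""
print(strip_html(page))
```

The agent receives "Pricing" and the plan text, with the navigation, stylesheet, and tracking script gone. Real scraping APIs do considerably more (encoding detection, boilerplate heuristics, retries), but the shape is the same.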
2. Browser Automation (MCP)
Browser automation MCP servers like Playwright MCP or Puppeteer MCP give Claude Code a headless browser. The agent can navigate pages, wait for JavaScript to render, click buttons, fill forms, and take screenshots.
Pros:
- Handles JavaScript-rendered content (SPAs, dynamic dashboards, infinite scroll).
- Can interact with pages: clicking, scrolling, filling forms.
- Takes screenshots for visual verification.
Cons:
- Slower. Launching a browser, loading a page, and waiting for JavaScript adds seconds to every request.
- Resource-heavy. Headless browsers consume memory and CPU.
- More failure modes. Browser timeouts, element selectors breaking, pages loading differently than expected.
- Setup is more involved. You need Chrome/Chromium installed and a running MCP server process.
Use browser automation when the content you need requires JavaScript to render. For static or server-rendered pages, it’s overkill.
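It's easy to see why a plain fetch fails on these pages. The raw HTML of a typical SPA contains an empty mount point plus a script bundle; the content only exists after a browser executes that JavaScript. A quick illustration, with invented sample HTML:

```python
import re

# Raw HTML as a plain fetch would see it: the visible DOM is empty.
spa_html = """
<html><body>
<div id="app"></div>
<script>
  // The framework renders this data into #app at runtime.
  window.__DATA__ = {"plans": [{"name": "Pro", "price": "$29"}]};
  render(window.__DATA__);
</script>
</body></html>
"""

# What a non-JS scraper finds inside the mount point: nothing.
mount = re.search(r'<div id="app">(.*?)</div>', spa_html, re.S)
print(repr(mount.group(1)))  # '' -- the content isn't in the HTML yet
```

A headless browser run by Playwright or Puppeteer would execute the script, wait for the render, and hand back the populated DOM instead of the empty `<div>`.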
3. The Built-in Fetch Tool
Claude Code has a built-in WebFetch tool that can retrieve web content. It works for basic fetching but has limitations on output size and doesn’t handle complex scenarios like authentication or JavaScript rendering.
Pros:
- No setup required. It’s already available.
- Fine for grabbing a single page or checking a URL.
Cons:
- Output truncation on large pages.
- No control over how HTML is parsed.
- Limited error handling for edge cases.
For anything beyond quick page checks, a dedicated scraping tool gives you better results.
Choosing the Right Approach
The decision tree is short:
- Does the page require JavaScript to render its content? Use browser automation.
- Is this a one-off fetch of a small page? The built-in fetch tool works.
- For everything else, use an API-based scraping tool through MCP.
Most web scraping tasks fall into category three. Documentation, news articles, blog posts, product listings, and public data pages all serve their content in the initial HTML. An API-based tool handles these faster and with fewer failure modes than a full browser.
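The decision tree is small enough to encode directly. This is just the logic above written out; the function name and flags are made up for illustration.

```python
def pick_scraping_approach(needs_js: bool, one_off_small_page: bool) -> str:
    """Encodes the decision tree above. Order matters: JavaScript
    rendering forces a browser regardless of page size."""
    if needs_js:
        return "browser automation (MCP)"
    if one_off_small_page:
        return "built-in fetch tool"
    return "API-based scraping tool (MCP)"

# A documentation site: static HTML, repeated scraping.
print(pick_scraping_approach(needs_js=False, one_off_small_page=False))
```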
Web Scraping with AgentPatch
AgentPatch provides two tools relevant to web scraping in Claude Code:
- scrape-web: Takes a URL and returns clean, readable text. Handles HTML parsing, strips navigation and boilerplate, and returns the main content. Good for extracting article text, documentation, product information, or any page where you want the words, not the markup.
- screenshot: Captures a visual snapshot of any URL. Useful for verifying that you’re scraping the right page, seeing how content is laid out, or capturing pages where the visual structure matters more than the text.
Together, these cover the common case: fetch the text content, and optionally take a screenshot to verify what the page looks like.
An example session in Claude Code:
“Scrape the pricing page at example.com/pricing and extract all plan names, prices, and feature lists into a markdown table.”
Claude Code calls scrape-web to get the page content, then parses the text and builds the table. If you want to verify the result matches the actual page layout, follow up with:
“Take a screenshot of that page so I can compare.”
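The parsing half of that session, turning scraped text into a markdown table, looks roughly like this. The scraped text below is invented for the example; in a real session, scrape-web supplies it and Claude Code does the restructuring itself.

```python
# Hypothetical plain text as a scrape tool might return it for a pricing page.
scraped = """\
Basic; $10/month; 1 user, 5 projects
Team; $25/month; 10 users, unlimited projects
Enterprise; contact us; SSO, audit logs"""

# Split each line into plan, price, and feature list.
rows = [[cell.strip() for cell in line.split(";", 2)]
        for line in scraped.splitlines()]

table = ["| Plan | Price | Features |", "| --- | --- | --- |"]
table += [f"| {plan} | {price} | {features} |" for plan, price, features in rows]
print("\n".join(table))
```

The screenshot follow-up exists precisely because this step can go wrong: if the page's visual layout groups features differently than the text suggests, the table will be plausible but inaccurate.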
Setup
Connect AgentPatch to your AI agent to get access to the tools:
Claude Code
claude mcp add -s user --transport http agentpatch https://agentpatch.ai/mcp \
--header "Authorization: Bearer YOUR_API_KEY"
OpenClaw
Add AgentPatch to ~/.openclaw/openclaw.json:
{
  "mcp": {
    "servers": {
      "agentpatch": {
        "transport": "streamable-http",
        "url": "https://agentpatch.ai/mcp"
      }
    }
  }
}
Get your API key at agentpatch.ai.
Wrapping Up
Claude Code needs external tools to access the web. For most scraping tasks, an API-based tool is faster and more reliable than spinning up a headless browser. Browser automation is worth the overhead when you need JavaScript rendering. AgentPatch gives Claude Code a scrape tool and a screenshot tool that handle the common cases well. Check it out at agentpatch.ai.