The Model Context Protocol (MCP) has revolutionized how AI assistants interact with external tools and data sources. If you're using Claude Desktop or other MCP-compatible tools, you can now leverage ScrapeGraphAI's powerful web scraping capabilities directly through a Docker container.
In this comprehensive tutorial, I'll walk you through setting up the ScrapeGraphAI MCP Server using Docker, configuring it with Claude Desktop, and using all its powerful tools for web scraping, data extraction, and content conversion.
What is the ScrapeGraphAI MCP Server?
The ScrapeGraphAI MCP Server is a Dockerized Model Context Protocol server that provides AI assistants like Claude with direct access to ScrapeGraphAI's web scraping tools. Instead of writing code or making API calls manually, you can simply ask Claude to scrape websites, extract data, or convert pages to markdown—all through natural language conversations.
The Docker container (mcp/scrapegraph) runs as an MCP server that exposes five tools:
- smartscraper: Extract structured data from webpages using AI
- searchscraper: Perform AI-powered web searches with structured results
- markdownify: Convert webpages to clean, formatted markdown
- smartcrawler_initiate: Start intelligent multi-page web crawling operations
- smartcrawler_fetch_results: Retrieve results from crawling operations
Prerequisites
Before we begin, make sure you have:
- Docker installed: Download Docker Desktop from docker.com
- Claude Desktop (optional but recommended): Get it from Anthropic's website
- ScrapeGraphAI API Key: Sign up at dashboard.scrapegraphai.com and get your API key
Step 1: Pull the Docker Image
First, let's pull the ScrapeGraphAI MCP Server image from Docker Hub:
docker pull mcp/scrapegraph

This will download the latest version of the image (approximately 106 MB). Once complete, you can verify it's available:
docker images | grep scrapegraph

You should see output like:

mcp/scrapegraph   latest   sha256:9005c47bd...   106.2 MB   4 months ago
Step 2: Configure Claude Desktop
To use the MCP server with Claude Desktop, you need to configure it in Claude's settings. The configuration file location varies by operating system:
macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Windows:
%APPDATA%\Claude\claude_desktop_config.json
Linux:
~/.config/Claude/claude_desktop_config.json
Create or edit this file with the following configuration:
{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SGAI_API_KEY",
        "mcp/scrapegraph"
      ],
      "env": {
        "SGAI_API_KEY": "YOUR_SGAI_API_KEY_HERE"
      }
    }
  }
}

Important Security Note: Replace YOUR_SGAI_API_KEY_HERE with your actual ScrapeGraphAI API key. For better security, you can also set it as an environment variable:
{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SGAI_API_KEY",
        "mcp/scrapegraph"
      ],
      "env": {
        "SGAI_API_KEY": "${SGAI_API_KEY}"
      }
    }
  }
}

Then set the environment variable in your system before launching Claude Desktop.
After saving the configuration file, restart Claude Desktop for the changes to take effect.
Step 3: Verify the Connection
Once Claude Desktop restarts, you can verify that the MCP server is connected. In a new conversation, try asking Claude:
What MCP tools do you have available?
Claude should list the five ScrapeGraphAI tools. If you don't see them, check:
- The Docker daemon is running
- The configuration file is in the correct location
- The JSON syntax is valid (you can use JSONLint to verify)
- You've restarted Claude Desktop after making changes
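If you'd rather validate the JSON locally than paste it into an online tool, Python's built-in json module does the job. This snippet assumes the macOS path from Step 2; substitute the Windows or Linux location as needed:

import json
import pathlib

# Path shown for macOS; use the Windows or Linux location from Step 2 if needed.
config = pathlib.Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
json.loads(config.read_text())  # raises json.JSONDecodeError if the file is malformed
print("claude_desktop_config.json is valid JSON")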
Using the Tools
Now that everything is set up, let's explore how to use each tool through natural language conversations with Claude.
SmartScraper: Extract Structured Data
SmartScraper is perfect when you need to extract specific information from a webpage. Just describe what you want, and Claude will use the smartscraper tool to get it for you.
Example conversation:
You: Can you extract the product name, price, and description from
https://example.com/product?
Claude: I'll extract that information for you using SmartScraper.
[Claude uses the smartscraper tool]
Here's the extracted data:
- Product Name: Example Widget
- Price: $29.99
- Description: A high-quality widget perfect for everyday use...
Advanced usage: You can also ask Claude to extract data from multiple pages or handle dynamic content:
You: Extract all job listings from https://company.com/careers, including
title, location, and salary for each position.
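If you'd like to reproduce this call outside Claude, here's a minimal sketch using the official MCP Python SDK (pip install mcp). The session setup mirrors the Docker command from Step 2; the argument names user_prompt and website_url are assumptions based on ScrapeGraphAI's API, so verify them against the schema returned by list_tools():

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# The same Docker invocation Claude Desktop uses (see Step 2).
server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Argument names are assumptions; check the tool's real schema.
            result = await session.call_tool(
                "smartscraper",
                arguments={
                    "user_prompt": "Extract the product name, price, and description",
                    "website_url": "https://example.com/product",
                },
            )
            print(result.content)

asyncio.run(main())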
SearchScraper: AI-Powered Web Search
SearchScraper lets Claude perform web searches and extract structured information from multiple sources automatically.
Example conversation:
You: Search for the latest information about renewable energy trends in 2024
and extract the key findings.
Claude: I'll search for that information and extract the key findings.
[Claude uses the searchscraper tool]
Here are the key findings from my search:
1. Solar energy costs have decreased by 40%...
2. Wind power capacity has increased significantly...
By default, SearchScraper searches 3 websites, but you can ask Claude to adjust this:
You: Search 5 websites for information about Python web scraping best practices.
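The same pattern works programmatically. In this sketch, num_results is a hypothetical name for the website-count argument; confirm the actual parameter via list_tools() before relying on it:

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # num_results is a hypothetical argument name for the website count.
            result = await session.call_tool(
                "searchscraper",
                arguments={
                    "user_prompt": "Python web scraping best practices",
                    "num_results": 5,
                },
            )
            print(result.content)

asyncio.run(main())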
Markdownify: Convert Webpages to Markdown
Markdownify converts any webpage into clean, readable markdown format—perfect for documentation, content migration, or reading purposes.
Example conversation:
You: Convert https://docs.example.com/getting-started to markdown format.
Claude: I'll convert that page to markdown for you.
[Claude uses the markdownify tool]
# Getting Started
Welcome to our documentation...
[Markdown content continues]
This is especially useful for:
- Converting documentation to markdown for your own docs
- Creating backups of important web content
- Migrating content between systems
- Improving readability of web articles
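For bulk work like backups or migrations, a short script can write each converted page straight to disk. A sketch, assuming the tool accepts a website_url argument and returns the markdown as text content blocks:

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # website_url is an assumed argument name; verify via list_tools().
            result = await session.call_tool(
                "markdownify",
                arguments={"website_url": "https://docs.example.com/getting-started"},
            )
            # Join the returned text blocks and save them as a .md file.
            markdown = "".join(
                block.text for block in result.content if hasattr(block, "text")
            )
            with open("getting-started.md", "w") as f:
                f.write(markdown)

asyncio.run(main())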
SmartCrawler: Multi-Page Web Crawling
SmartCrawler is powerful for extracting data from multiple pages on a website. The process involves two steps: initiating the crawl, then fetching results.
Example conversation:
You: Crawl https://blog.example.com and extract all article titles,
authors, and publication dates. Limit it to the first 10 articles.
Claude: I'll initiate a SmartCrawler operation to extract article information.
[Claude uses smartcrawler_initiate]
[Claude waits for completion]
[Claude uses smartcrawler_fetch_results]
Here are the articles I found:
1. Title: "Introduction to Web Scraping"
Author: Jane Doe
Published: 2024-11-15
2. Title: "Advanced Data Extraction"
Author: John Smith
Published: 2024-11-10
[... continues with remaining articles]
SmartCrawler supports two modes:
- AI Extraction Mode (10 credits per page): Extracts structured data based on your prompt
- Markdown Mode (2 credits per page): Converts pages to markdown format
You can control the crawling behavior:
- depth: Maximum link traversal depth
- max_pages: Maximum number of pages to crawl
- same_domain_only: Whether to stay within the same domain
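Because the crawl is asynchronous, a custom client makes two calls: one to start the job and one to collect results once they're ready. Here's a minimal sketch of that flow; the argument names and the request-id handoff are assumptions about the tool schemas, so inspect list_tools() for the real ones:

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Step 1: start the crawl. Argument names are assumptions.
            started = await session.call_tool(
                "smartcrawler_initiate",
                arguments={
                    "url": "https://blog.example.com",
                    "prompt": "Extract article titles, authors, and publication dates",
                    "max_pages": 10,
                },
            )
            print(started.content)  # should contain an id for the crawl job

            # Step 2: give the crawl time to finish, then fetch results.
            # Real code should parse the id above and poll until complete.
            await asyncio.sleep(30)
            results = await session.call_tool(
                "smartcrawler_fetch_results",
                arguments={"request_id": "ID_FROM_STEP_1"},
            )
            print(results.content)

asyncio.run(main())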
Running the Container Manually
While using it with Claude Desktop is convenient, you can also run the MCP server manually for testing or integration with other tools.
Basic Usage
docker run -i --rm -e SGAI_API_KEY=your_api_key_here mcp/scrapegraph

Using Environment Variables
For better security, use environment variables:
export SGAI_API_KEY="your_api_key_here"
docker run -i --rm -e SGAI_API_KEY mcp/scrapegraph

Persistent Container (Development)
For development purposes, you might want to keep the container running:
docker run -d --name scrapegraph-mcp \
  -e SGAI_API_KEY=your_api_key_here \
  mcp/scrapegraph

Then attach to it:

docker attach scrapegraph-mcp

Advanced Configuration
Using Docker Compose
For easier management, you can create a docker-compose.yml file:
version: '3.8'
services:
  scrapegraph-mcp:
    image: mcp/scrapegraph:latest
    container_name: scrapegraph-mcp
    environment:
      - SGAI_API_KEY=${SGAI_API_KEY}
    stdin_open: true
    tty: true
    restart: unless-stopped

Then run:

docker-compose up -d

Environment Variable Files
For better security, use a .env file:
# .env file
SGAI_API_KEY=your_api_key_here

And reference it in docker-compose:

env_file:
  - .env

Or when running docker directly:

docker run -i --rm --env-file .env mcp/scrapegraph

Troubleshooting Common Issues
Issue: Claude Desktop Can't Connect to the MCP Server
Solutions:
- Verify Docker is running: docker ps should work without errors
- Test the image manually: docker run -i --rm -e SGAI_API_KEY=test mcp/scrapegraph
- Check the configuration file JSON syntax is valid
- Ensure the configuration file is in the correct location for your OS
- Restart Claude Desktop completely
Issue: "API Key Not Found" Error
Solutions:
- Verify your API key is correctly set in the configuration
- Check that the environment variable is accessible
- Test your API key at the ScrapeGraphAI dashboard
- Ensure there are no extra spaces or quotes around the API key
Issue: Docker Container Fails to Start
Solutions:
- Check Docker logs: docker logs <container_id>
- Verify you have the latest image: docker pull mcp/scrapegraph
- Ensure you have sufficient disk space: docker system df
- Try removing old containers: docker system prune
Issue: Tools Not Appearing in Claude
Solutions:
- Restart Claude Desktop completely (quit and reopen)
- Check that the MCP server appears in Claude's connection status
- Verify the Docker command works manually
- Check Claude Desktop's logs for errors
Best Practices
Security
- Never commit API keys: Use environment variables or secure secret management
- Use Docker secrets: For production deployments, use Docker secrets
- Limit container access: Run containers with minimal required permissions
- Regular updates: Keep the Docker image updated with docker pull mcp/scrapegraph
Performance
- Resource limits: Set appropriate CPU and memory limits for Docker containers
- Network configuration: Use Docker networks for secure communication
- Credits management: Monitor your ScrapeGraphAI usage to stay within budget
- Caching: Consider caching results for repeated requests
Usage Tips
- Be specific in prompts: Clearer prompts yield better extraction results
- Batch similar requests: Group related scraping tasks together
- Monitor credit usage: Track your API usage in the dashboard
- Test with small datasets first: Verify your approach before large crawls
Real-World Use Cases
Content Aggregation
You: Crawl https://tech-news.com and extract all article headlines and
summaries from the front page, then format them as a daily digest.
Market Research
You: Search for information about competitor pricing in the SaaS space
and extract pricing tiers, features, and target markets.
Documentation Migration
You: Convert all pages from https://old-docs.example.com/api to markdown
format so I can migrate them to our new documentation system.
Lead Generation
You: Extract company information including name, contact email, and
industry from https://directory.example.com/companies.
SEO Monitoring
You: Extract meta titles, descriptions, and H1 tags from
https://competitor.com/blog to analyze their SEO strategy.
Integration with Other Tools
While Claude Desktop is the primary use case, the MCP server can integrate with other MCP-compatible tools:
- Cline: VS Code extension with MCP support
- Continue: IDE extension for code completion
- Custom MCP clients: Build your own integrations
Check the Model Context Protocol documentation for more information about MCP client development.
Understanding Costs
ScrapeGraphAI uses a credit-based pricing system:
- SmartScraper: 10 credits per request
- SearchScraper: 10 credits per website searched (default 3 = 30 credits)
- Markdownify: 2 credits per page
- SmartCrawler (AI mode): 10 credits per page crawled
- SmartCrawler (Markdown mode): 2 credits per page crawled
Monitor your usage in the dashboard and adjust your scraping strategies accordingly.
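To see how the mode choice plays out, here's a quick back-of-the-envelope calculation using the rates above:

# Credit cost for a crawl at the per-page rates listed above.
AI_MODE_CREDITS = 10        # credits per page, AI extraction mode
MARKDOWN_MODE_CREDITS = 2   # credits per page, markdown mode

pages = 25
print(pages * AI_MODE_CREDITS)        # 250 credits in AI mode
print(pages * MARKDOWN_MODE_CREDITS)  # 50 credits in markdown mode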
Frequently Asked Questions
Can I use this without Claude Desktop?
Yes! The Docker container implements the MCP protocol, so any MCP-compatible client can use it. You can also use it programmatically by implementing an MCP client.
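As a starting point, here's a minimal client sketch using the official MCP Python SDK (pip install mcp) that launches the container over stdio and lists its tools:

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name)  # the five ScrapeGraphAI tools

asyncio.run(main())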
How do I update to the latest version?
Simply pull the latest image:
docker pull mcp/scrapegraph

Then restart Claude Desktop or your MCP client.
Is my data secure?
The Docker container runs locally on your machine. Data flows from:
- Your machine → Docker container → ScrapeGraphAI API
- Results flow back through the same path
Your API key is stored in your local configuration and never leaves your machine (except to authenticate with ScrapeGraphAI).
Can I use multiple API keys?
Yes, you can run multiple instances with different API keys by giving them different names in the configuration:
{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
      "env": {
        "SGAI_API_KEY": "key1"
      }
    },
    "scrapegraph-alt": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
      "env": {
        "SGAI_API_KEY": "key2"
      }
    }
  }
}

What happens if I exceed my credit limit?
The API will return an error. Monitor your usage in the dashboard and upgrade your plan if needed.
Can I customize the Docker image?
Yes! The source code is available at github.com/ScrapeGraphAI/scrapegraph-mcp. You can build a custom image with your modifications.
Is there a way to cache results?
The MCP server itself doesn't cache, but you can implement caching at the Claude Desktop level or build a custom MCP client with caching capabilities.
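If you go the custom-client route, a simple in-memory memoization layer around call_tool keeps identical requests from spending credits twice. This wrapper is just one reasonable design, not part of the MCP SDK:

import json

from mcp import ClientSession

class CachingSession:
    """Wraps an initialized ClientSession and memoizes tool results."""

    def __init__(self, session: ClientSession) -> None:
        self.session = session
        self.cache: dict[str, object] = {}

    async def call_tool(self, name: str, arguments: dict):
        # Key on the tool name plus a canonical serialization of arguments.
        key = name + json.dumps(arguments, sort_keys=True)
        if key not in self.cache:
            self.cache[key] = await self.session.call_tool(name, arguments=arguments)
        return self.cache[key]

Swap it in wherever you'd otherwise use the session directly, e.g. cached = CachingSession(session) followed by await cached.call_tool("smartscraper", {...}).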
Conclusion
The ScrapeGraphAI Docker MCP Server brings powerful web scraping capabilities directly to AI assistants like Claude. By following this tutorial, you've learned how to:
- Set up the Docker container
- Configure Claude Desktop to use the MCP server
- Use all five available tools through natural language
- Troubleshoot common issues
- Apply best practices for security and performance
The combination of Docker's containerization and MCP's protocol makes this a secure, portable, and powerful solution for AI-powered web scraping. Whether you're aggregating content, conducting market research, or migrating documentation, this setup gives Claude the tools it needs to help you work with web data efficiently.
Start experimenting with different scraping tasks and discover how much more productive you can be when Claude can access the web directly through ScrapeGraphAI!
Related Resources
Want to learn more about ScrapeGraphAI and web scraping? Explore these guides:
- ScrapeGraphAI Tutorial - Master AI-powered web scraping
- ScrapeGraphAI JavaScript SDK - Use ScrapeGraphAI in JavaScript/TypeScript
- LlamaIndex Integration - Combine with LlamaIndex for data pipelines
- Building AI Agents - Create powerful automation agents
- SmartCrawler Introduction - Learn about intelligent web crawling
- Web Scraping Best Practices - Production-ready scraping strategies
