ScrapeGraphAI Docker MCP Server: Complete Setup Guide

Marco Vinciguerra

The Model Context Protocol (MCP) has revolutionized how AI assistants interact with external tools and data sources. If you're using Claude Desktop or other MCP-compatible tools, you can now leverage ScrapeGraphAI's powerful web scraping capabilities directly through a Docker container.

In this comprehensive tutorial, I'll walk you through setting up the ScrapeGraphAI MCP Server using Docker, configuring it with Claude Desktop, and using all its powerful tools for web scraping, data extraction, and content conversion.

What is the ScrapeGraphAI MCP Server?

The ScrapeGraphAI MCP Server is a Dockerized Model Context Protocol server that provides AI assistants like Claude with direct access to ScrapeGraphAI's web scraping tools. Instead of writing code or making API calls manually, you can simply ask Claude to scrape websites, extract data, or convert pages to markdown—all through natural language conversations.

The Docker container (mcp/scrapegraph) runs as an MCP server that exposes five powerful tools:

  • smartscraper: Extract structured data from webpages using AI
  • searchscraper: Perform AI-powered web searches with structured results
  • markdownify: Convert webpages to clean, formatted markdown
  • smartcrawler_initiate: Start intelligent multi-page web crawling operations
  • smartcrawler_fetch_results: Retrieve results from crawling operations

Prerequisites

Before we begin, make sure you have:

  1. Docker installed: Download Docker Desktop from docker.com
  2. Claude Desktop (optional but recommended): Get it from Anthropic's website
  3. ScrapeGraphAI API Key: Sign up at dashboard.scrapegraphai.com and get your API key

Step 1: Pull the Docker Image

First, let's pull the ScrapeGraphAI MCP Server image from Docker Hub:

docker pull mcp/scrapegraph

This will download the latest version of the image (approximately 106 MB). Once complete, you can verify it's available:

docker images | grep scrapegraph

You should see output like:

mcp/scrapegraph    latest    9005c47bd...    4 months ago    106MB

Step 2: Configure Claude Desktop

To use the MCP server with Claude Desktop, you need to configure it in Claude's settings. The configuration file location varies by operating system:

macOS:

~/Library/Application Support/Claude/claude_desktop_config.json

Windows:

%APPDATA%\Claude\claude_desktop_config.json

Linux:

~/.config/Claude/claude_desktop_config.json
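
If the file doesn't exist yet, you can create it from the terminal (macOS path shown; adjust for your OS):

mkdir -p "$HOME/Library/Application Support/Claude"
touch "$HOME/Library/Application Support/Claude/claude_desktop_config.json"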

Create or edit this file with the following configuration:

{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SGAI_API_KEY",
        "mcp/scrapegraph"
      ],
      "env": {
        "SGAI_API_KEY": "YOUR_SGAI_API_KEY_HERE"
      }
    }
  }
}

Important Security Note: Replace YOUR_SGAI_API_KEY_HERE with your actual ScrapeGraphAI API key. For better security, you can also set it as an environment variable:

{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "SGAI_API_KEY",
        "mcp/scrapegraph"
      ],
      "env": {
        "SGAI_API_KEY": "${SGAI_API_KEY}"
      }
    }
  }
}

Then set the environment variable in your system before launching Claude Desktop.
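
For example, on macOS or Linux:

export SGAI_API_KEY="your_api_key_here"
open -a "Claude"   # macOS; on Linux, launch Claude Desktop from this same shell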

After saving the configuration file, restart Claude Desktop for the changes to take effect.

Step 3: Verify the Connection

Once Claude Desktop restarts, you can verify that the MCP server is connected. In a new conversation, try asking Claude:

What MCP tools do you have available?

Claude should list the five ScrapeGraphAI tools. If you don't see them, check:

  1. The Docker daemon is running
  2. The configuration file is in the correct location
  3. The JSON syntax is valid (you can use JSONLint or the command below to verify)
  4. You've restarted Claude Desktop after making changes
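
A quick way to validate the JSON from the terminal (macOS path shown):

python3 -m json.tool "$HOME/Library/Application Support/Claude/claude_desktop_config.json"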

Using the Tools

Now that everything is set up, let's explore how to use each tool through natural language conversations with Claude.

SmartScraper: Extract Structured Data

SmartScraper is perfect when you need to extract specific information from a webpage. Just describe what you want, and Claude will use the smartscraper tool to get it for you.

Example conversation:

You: Can you extract the product name, price, and description from 
https://example.com/product?

Claude: I'll extract that information for you using SmartScraper.

[Claude uses the smartscraper tool]

Here's the extracted data:
- Product Name: Example Widget
- Price: $29.99
- Description: A high-quality widget perfect for everyday use...

Advanced usage: You can also ask Claude to extract data from multiple pages or handle dynamic content:

You: Extract all job listings from https://company.com/careers, including 
title, location, and salary for each position.
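
Behind the scenes, Claude translates your request into the tool's arguments. A plausible smartscraper call looks roughly like this (parameter names follow the ScrapeGraphAI API, but treat them as an illustration; Claude sees the exact schema from the server):

{
  "user_prompt": "Extract all job listings with title, location, and salary",
  "website_url": "https://company.com/careers"
}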

SearchScraper: AI-Powered Web Search

SearchScraper lets Claude perform web searches and extract structured information from multiple sources automatically.

Example conversation:

You: Search for the latest information about renewable energy trends in 2024 
and extract the key findings.

Claude: I'll search for that information and extract the key findings.

[Claude uses the searchscraper tool]

Here are the key findings from my search:
1. Solar energy costs have decreased by 40%...
2. Wind power capacity has increased significantly...

By default, SearchScraper searches 3 websites, but you can ask Claude to adjust this:

You: Search 5 websites for information about Python web scraping best practices.

Markdownify: Convert Webpages to Markdown

Markdownify converts any webpage into clean, readable markdown format—perfect for documentation, content migration, or reading purposes.

Example conversation:

You: Convert https://docs.example.com/getting-started to markdown format.

Claude: I'll convert that page to markdown for you.

[Claude uses the markdownify tool]

# Getting Started

Welcome to our documentation...

[Markdown content continues]

This is especially useful for:

  • Converting documentation to markdown for your own docs
  • Creating backups of important web content
  • Migrating content between systems
  • Improving readability of web articles

SmartCrawler: Multi-Page Web Crawling

SmartCrawler is powerful for extracting data from multiple pages on a website. The process involves two steps: initiating the crawl, then fetching results.

Example conversation:

You: Crawl https://blog.example.com and extract all article titles, 
authors, and publication dates. Limit it to the first 10 articles.

Claude: I'll initiate a SmartCrawler operation to extract article information.

[Claude uses smartcrawler_initiate]
[Claude waits for completion]
[Claude uses smartcrawler_fetch_results]

Here are the articles I found:
1. Title: "Introduction to Web Scraping"
   Author: Jane Doe
   Published: 2024-11-15

2. Title: "Advanced Data Extraction"
   Author: John Smith
   Published: 2024-11-10

[... continues with remaining articles]

SmartCrawler supports two modes:

  1. AI Extraction Mode (10 credits per page): Extracts structured data based on your prompt
  2. Markdown Mode (2 credits per page): Converts pages to markdown format

You can control the crawling behavior:

  • depth: Maximum link traversal depth
  • max_pages: Maximum number of pages to crawl
  • same_domain_only: Whether to stay within the same domain
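
Putting these together, a smartcrawler_initiate call might look roughly like this (the url and prompt argument names are assumptions for illustration; the three options above come straight from the tool):

{
  "url": "https://blog.example.com",
  "prompt": "Extract the title, author, and publication date of each article",
  "depth": 2,
  "max_pages": 10,
  "same_domain_only": true
}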

Running the Container Manually

While using it with Claude Desktop is convenient, you can also run the MCP server manually for testing or integration with other tools.

Basic Usage

docker run -i --rm -e SGAI_API_KEY=your_api_key_here mcp/scrapegraph

Using Environment Variables

For better security, use environment variables:

export SGAI_API_KEY="your_api_key_here"
docker run -i --rm -e SGAI_API_KEY mcp/scrapegraph

Persistent Container (Development)

For development purposes, you might want to keep the container running. Keep stdin open with -i, since the server communicates over stdio and exits when stdin closes:

docker run -d -i --name scrapegraph-mcp \
  -e SGAI_API_KEY=your_api_key_here \
  mcp/scrapegraph

Then attach to it:

docker attach scrapegraph-mcp
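
When you're finished, stop and remove the container:

docker stop scrapegraph-mcp
docker rm scrapegraph-mcp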

Advanced Configuration

Using Docker Compose

For easier management, you can create a docker-compose.yml file:

version: '3.8'
 
services:
  scrapegraph-mcp:
    image: mcp/scrapegraph:latest
    container_name: scrapegraph-mcp
    environment:
      - SGAI_API_KEY=${SGAI_API_KEY}
    stdin_open: true
    tty: true
    restart: unless-stopped

Then run (on older Docker installs, use docker-compose instead of docker compose):

docker compose up -d

Environment Variable Files

For better security, use a .env file:

# .env file
SGAI_API_KEY=your_api_key_here

And reference it in docker-compose:

env_file:
  - .env

Or when running docker directly:

docker run -i --rm --env-file .env mcp/scrapegraph

Troubleshooting Common Issues

Issue: Claude Desktop Can't Connect to the MCP Server

Solutions:

  1. Verify Docker is running: docker ps should work without errors
  2. Test the image manually: docker run -i --rm -e SGAI_API_KEY=test mcp/scrapegraph
  3. Check the configuration file JSON syntax is valid
  4. Ensure the configuration file is in the correct location for your OS
  5. Restart Claude Desktop completely

Issue: "API Key Not Found" Error

Solutions:

  1. Verify your API key is correctly set in the configuration
  2. Check that the environment variable is accessible
  3. Test your API key at the ScrapeGraphAI dashboard
  4. Ensure there are no extra spaces or quotes around the API key

Issue: Docker Container Fails to Start

Solutions:

  1. Check Docker logs: docker logs <container_id>
  2. Verify you have the latest image: docker pull mcp/scrapegraph
  3. Ensure you have sufficient disk space: docker system df
  4. Try removing old containers: docker system prune

Issue: Tools Not Appearing in Claude

Solutions:

  1. Restart Claude Desktop completely (quit and reopen)
  2. Check that the MCP server appears in Claude's connection status
  3. Verify the Docker command works manually
  4. Check Claude Desktop's logs for errors (see the command below)
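
On macOS, for example, Claude Desktop writes MCP logs under ~/Library/Logs/Claude; you can follow them with:

tail -f ~/Library/Logs/Claude/mcp*.log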

Best Practices

Security

  1. Never commit API keys: Use environment variables or secure secret management
  2. Use Docker secrets: Prefer Docker secrets over plain environment variables for production deployments
  3. Limit container access: Run containers with minimal required permissions
  4. Regular updates: Keep the Docker image updated: docker pull mcp/scrapegraph

Performance

  1. Resource limits: Set appropriate CPU and memory limits for Docker containers (see the example after this list)
  2. Network configuration: Use Docker networks for secure communication
  3. Credits management: Monitor your ScrapeGraphAI usage to stay within budget
  4. Caching: Consider caching results for repeated requests
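
For example, CPU and memory limits can be applied directly on the docker run command (the 512 MB / half-CPU values here are arbitrary starting points):

docker run -i --rm \
  --memory=512m \
  --cpus=0.5 \
  -e SGAI_API_KEY \
  mcp/scrapegraph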

Usage Tips

  1. Be specific in prompts: Clearer prompts yield better extraction results
  2. Batch similar requests: Group related scraping tasks together
  3. Monitor credit usage: Track your API usage in the dashboard
  4. Test with small datasets first: Verify your approach before large crawls

Real-World Use Cases

Content Aggregation

You: Crawl https://tech-news.com and extract all article headlines and 
summaries from the front page, then format them as a daily digest.

Market Research

You: Search for information about competitor pricing in the SaaS space 
and extract pricing tiers, features, and target markets.

Documentation Migration

You: Convert all pages from https://old-docs.example.com/api to markdown 
format so I can migrate them to our new documentation system.

Lead Generation

You: Extract company information including name, contact email, and 
industry from https://directory.example.com/companies.

SEO Monitoring

You: Extract meta titles, descriptions, and H1 tags from 
https://competitor.com/blog to analyze their SEO strategy.

Integration with Other Tools

While Claude Desktop is the primary use case, the MCP server can integrate with other MCP-compatible tools:

  • Cline: VS Code extension with MCP support
  • Continue: IDE extension for code completion
  • Custom MCP clients: Build your own integrations

Check the Model Context Protocol documentation for more information about MCP client development.

Understanding Costs

ScrapeGraphAI uses a credit-based pricing system:

  • SmartScraper: 10 credits per request
  • SearchScraper: 10 credits per website searched (default 3 = 30 credits)
  • Markdownify: 2 credits per page
  • SmartCrawler (AI mode): 10 credits per page crawled
  • SmartCrawler (Markdown mode): 2 credits per page crawled
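
For example, a SmartCrawler run over 10 pages costs 10 × 10 = 100 credits in AI mode but only 10 × 2 = 20 credits in Markdown mode, and a default SearchScraper query (3 websites) costs 30 credits.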

Monitor your usage in the dashboard and adjust your scraping strategies accordingly.

Frequently Asked Questions

Can I use this without Claude Desktop?

Yes! The Docker container implements the MCP protocol, so any MCP-compatible client can use it. You can also use it programmatically by implementing an MCP client.
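
As a minimal sketch of a programmatic client using the official MCP Python SDK (pip install mcp; the website_url argument name is an assumption, so list the tools first to see the real schema):

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the Docker container as a stdio MCP server, passing the API key through.
server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
    env={
        "SGAI_API_KEY": os.environ["SGAI_API_KEY"],
        "PATH": os.environ["PATH"],  # so the subprocess can find `docker`
    },
)

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Assumed argument name; check the schema printed above.
            result = await session.call_tool(
                "markdownify", {"website_url": "https://example.com"}
            )
            print(result.content)

asyncio.run(main())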

How do I update to the latest version?

Simply pull the latest image:

docker pull mcp/scrapegraph

Then restart Claude Desktop or your MCP client.

Is my data secure?

The Docker container runs locally on your machine. Requests flow from your machine → Docker container → ScrapeGraphAI API, and results flow back through the same path.

Your API key is stored in your local configuration and never leaves your machine (except to authenticate with ScrapeGraphAI).

Can I use multiple API keys?

Yes, you can run multiple instances with different API keys by giving them different names in the configuration:

{
  "mcpServers": {
    "scrapegraph": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
      "env": {
        "SGAI_API_KEY": "key1"
      }
    },
    "scrapegraph-alt": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "SGAI_API_KEY", "mcp/scrapegraph"],
      "env": {
        "SGAI_API_KEY": "key2"
      }
    }
  }
}

What happens if I exceed my credit limit?

The API will return an error. Monitor your usage in the dashboard and upgrade your plan if needed.

Can I customize the Docker image?

Yes! The source code is available at github.com/ScrapeGraphAI/scrapegraph-mcp. You can build a custom image with your modifications.

Is there a way to cache results?

The MCP server itself doesn't cache, but you can implement caching at the Claude Desktop level or build a custom MCP client with caching capabilities.

Conclusion

The ScrapeGraphAI Docker MCP Server brings powerful web scraping capabilities directly to AI assistants like Claude. By following this tutorial, you've learned how to:

  • Set up the Docker container
  • Configure Claude Desktop to use the MCP server
  • Use all five available tools through natural language
  • Troubleshoot common issues
  • Apply best practices for security and performance

The combination of Docker's containerization and MCP's protocol makes this a secure, portable, and powerful solution for AI-powered web scraping. Whether you're aggregating content, conducting market research, or migrating documentation, this setup gives Claude the tools it needs to help you work with web data efficiently.

Start experimenting with different scraping tasks and discover how much more productive you can be when Claude can access the web directly through ScrapeGraphAI!
