Integrating ScrapeGraphAI with ADK: Complete Guide

Marco Vinciguerra

Agent Development Kit (ADK) is a powerful framework for building intelligent AI agents with Google's models such as Gemini, and it also supports other generative AI models. ScrapeGraphAI provides a complete MCP (Model Context Protocol) server that seamlessly integrates web scraping, crawling, and structured data extraction with ADK agents.

In this tutorial, we'll walk through how to combine these two technologies to create advanced AI agents capable of navigating the web, extracting information, and transforming it into ready-to-use structured data.


What is ADK?

Agent Development Kit (ADK) is a modern framework for developing AI agents with Google's Gemini models, with support for other generative AI models as well. ADK agents can:

  • Communicate naturally with users
  • Utilize external tools through the MCP protocol
  • Handle complex workflows with multi-step reasoning capabilities
  • Extract and process data from various sources

Why ScrapeGraphAI with ADK?

ScrapeGraphAI is an AI-powered platform for web scraping that offers:

  • Structured extraction: Transforms HTML into structured JSON using AI
  • Intelligent crawling: Automatically navigates complex websites
  • JavaScript support: Handles sites with heavy client-side rendering
  • MCP protocol: Standard integration with frameworks like ADK

By combining these technologies, you can create agents capable of:

  • Automatically searching for information on the web
  • Extracting structured data from web pages
  • Monitoring sites for changes
  • Gathering market intelligence
  • Automating complex research

Installation

To get started, you need to install the ScrapeGraphAI MCP server. The server is available as a Python package (requires Python 3.13 or higher):

pip install scrapegraph-mcp

Also make sure you have ADK installed:

pip install google-adk

Initial Setup

Before using ScrapeGraphAI with ADK, you'll need a ScrapeGraphAI API key. You can obtain it from the ScrapeGraphAI dashboard.

Save your API key in an environment variable or a Python variable:

SGAI_API_KEY = "YOUR_SCRAPEGRAPHAI_API_KEY"
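
Or export it as an environment variable, which the later examples read with os.getenv:

export SGAI_API_KEY="YOUR_SCRAPEGRAPHAI_API_KEY"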

Basic Integration with ADK

Here's how to set up a basic ADK agent that uses ScrapeGraphAI:

import asyncio
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters
 
SGAI_API_KEY = "YOUR_SCRAPEGRAPHAI_API_KEY"
 
# Create an ADK agent with ScrapeGraphAI integration
root_agent = Agent(
    model="gemini-2.5-pro",
    name="scrapegraph_assistant_agent",
    instruction="""Help the user with web scraping and data extraction using
ScrapeGraphAI. You can convert webpages to markdown, extract 
structured data using AI, perform web searches, crawl
multiple pages, and automate complex scraping workflows.""",
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    # The following CLI command is available
                    # from `pip install scrapegraph-mcp`
                    command="scrapegraph-mcp",
                    env={
                        "SGAI_API_KEY": SGAI_API_KEY,
                    },
                ),
                timeout=300,
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=["markdownify", "smartscraper", "searchscraper"]
        ),
    ],
)
 
runner = InMemoryRunner(agent=root_agent)

What This Code Does

  1. Imports Required Modules: Includes asyncio, Agent, InMemoryRunner, and MCP-related imports
  2. Creates an Agent: Uses Google's gemini-2.5-pro model
  3. Configures Instructions: Defines the agent's web scraping capabilities
  4. Adds MCPToolset: Integrates the ScrapeGraphAI MCP server
  5. Configures Connection: Uses stdio to communicate with the MCP server
  6. Sets Timeout: 300 seconds for complex operations
  7. Creates Runner: Initializes an InMemoryRunner to execute agent tasks

Using the Agent

Once the agent is configured, you can use it for various web scraping tasks:

# Example 1: Convert a webpage to markdown
response = asyncio.run(runner.run_debug("Convert this page to markdown: https://scrapegraphai.com"))
print(response)
 
# Example 2: Extract structured data
response = asyncio.run(runner.run_debug("Extract all products with name, price, and description from: https://scrapegraphai.com/blog"))
print(response)
 
# Example 3: Perform a web search
response = asyncio.run(runner.run_debug("Search for the latest AI news and return title, author, and publication date"))
print(response)

Available Tools

ScrapeGraphAI MCP Server offers a complete suite of tools for web scraping:

1. markdownify

Transforms any webpage into clean, structured markdown format.

Ideal for:

  • Archiving web content
  • Content migration
  • Reading and analyzing articles

Example:

response = asyncio.run(runner.run_debug("Convert https://docs.python.org/3/tutorial/ to markdown"))
print(response)

2. smartscraper

Uses AI to extract structured data from any webpage with support for infinite scrolling.

Ideal for:

  • E-commerce scraping (products, prices)
  • Business information extraction
  • Data collection from dynamic feeds

Example:

response = asyncio.run(runner.run_debug("""Extract all products from https://scrapegraphai.com with:
    - Product name
    - Price
    - Availability
    - Main image"""))
print(response)

3. searchscraper

Performs AI-powered web searches with structured, actionable results.

Ideal for:

  • Searching for information across multiple sites
  • Competitive intelligence
  • Market analysis

Example:

response = asyncio.run(runner.run_debug("Search for gaming laptop price information and return results from the top 5 sites"))
print(response)

4. scrape

Basic endpoint for fetching content with optional heavy JavaScript rendering.

Ideal for:

  • Fetching raw HTML
  • Page structure analysis
  • Pre-processing before other operations
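
Example (a sketch; the agent chooses the scrape tool based on the prompt wording):

response = asyncio.run(runner.run_debug("Fetch the raw HTML of https://scrapegraphai.com with heavy JavaScript rendering enabled"))
print(response)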

5. sitemap

Extracts sitemap URLs and structure for any website.

Ideal for:

  • Content discovery
  • Crawling planning
  • SEO analysis
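
Example (illustrative prompt):

response = asyncio.run(runner.run_debug("Get the sitemap of https://scrapegraphai.com and list the main sections of the site"))
print(response)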

6. smartcrawler_initiate / smartcrawler_fetch_results

Initiates intelligent multi-page crawling (asynchronous operation).

Ideal for:

  • Complete site crawling
  • Large-scale data collection
  • Content archiving
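
Example (a sketch; since crawling is asynchronous, the agent first initiates the crawl, then fetches the results):

response = asyncio.run(runner.run_debug("Crawl https://scrapegraphai.com, extract the title and summary of each page, and report the results once the crawl completes"))
print(response)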

7. agentic_scrapper

Runs advanced agentic scraping workflows with customizable steps and structured output schemas.

Ideal for:

  • Complex multi-step workflows
  • Form interactions
  • Guided navigation
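
Example (illustrative; the steps are described in natural language and the agent translates them into a workflow):

response = asyncio.run(runner.run_debug("""On https://scrapegraphai.com, navigate to the pricing page,
    then extract each plan with its name, monthly price, and included credits"""))
print(response)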

Practical Example: E-commerce Price Monitoring

Here's a complete example of how to use the integration to monitor product prices:

import asyncio
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters
import os
 
SGAI_API_KEY = os.getenv("SGAI_API_KEY")
 
# Create agent specialized for price monitoring
price_monitor_agent = Agent(
    model="gemini-2.5-pro",
    name="price_monitor_agent",
    instruction="""You are an agent specialized in e-commerce price monitoring.
When you receive a monitoring request:
1. Identify the product page
2. Extract product name, current price, and availability
3. Compare with historical prices if available
4. Return a structured summary with key information""",
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="scrapegraph-mcp",
                    env={"SGAI_API_KEY": SGAI_API_KEY},
                ),
                timeout=300,
            ),
            # Filter only necessary tools for better performance
            tool_filter=["smartscraper", "markdownify"]
        ),
    ],
)
 
runner = InMemoryRunner(agent=price_monitor_agent)
 
# Use the agent
result = asyncio.run(runner.run_debug("Monitor the price of this product: https://scrapegraphai.com/pricing"))
print(result)

Filtering Tools for Performance

To optimize performance, you can filter which tools from the MCP server to expose to your agent:

MCPToolset(
    connection_params=StdioConnectionParams(...),
    tool_filter=["markdownify", "smartscraper", "searchscraper"]
)

This limits the agent to using only the specified tools, reducing:

  • Latency: Fewer tools to load
  • Costs: Avoids calls to unnecessary tools
  • Complexity: Simpler interface for the agent

Error Handling

It's important to properly handle errors when working with web scraping:

import asyncio
from google.adk.runners import InMemoryRunner
import logging
 
logging.basicConfig(level=logging.INFO)
 
runner = InMemoryRunner(agent=root_agent)
 
try:
    response = asyncio.run(runner.run_debug("Extract data from https://scrapegraphai.com"))
    print(response)
except Exception as e:
    logging.error(f"Error during scraping: {e}")
    # Handle the error or retry with different parameters

Common errors and solutions:

  1. Timeout: Increase timeout for complex operations
  2. Rate limiting: Implement exponential backoff
  3. Dynamic content: Use smartscraper with render_heavy_js=True (see the sketch after this list)
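
For the dynamic-content case, you can state the rendering requirement directly in the prompt; a minimal sketch (the agent is expected to forward the option to the smartscraper tool):

response = asyncio.run(runner.run_debug("Extract the article list from https://scrapegraphai.com/blog and enable heavy JavaScript rendering, since the content loads client-side"))
print(response)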

Best Practices

1. Use Clear Instructions

Provide specific instructions to the agent for better results:

# ❌ Not optimal
"Extract data from the page"
 
# ✅ Optimal
"Extract all products with name, price, description, and rating from https://scrapegraphai.com/blog"

2. Optimize Timeouts

Set appropriate timeouts based on operation complexity (see the snippet after this list):

  • Simple: 60-120 seconds
  • Medium: 180-300 seconds
  • Complex: 300-600 seconds
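
For example, a toolset intended for long crawls keeps the same connection setup as the basic example and only raises the timeout:

MCPToolset(
    connection_params=StdioConnectionParams(
        server_params=StdioServerParameters(
            command="scrapegraph-mcp",
            env={"SGAI_API_KEY": SGAI_API_KEY},
        ),
        timeout=600,  # complex operations: allow up to 10 minutes
    ),
)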

3. Filter Tools When Possible

Limiting available tools improves performance and reduces costs:

tool_filter=["smartscraper"]  # Only when necessary

4. Handle Rate Limiting

Implement backoff when making many requests:

import asyncio
from google.adk.runners import InMemoryRunner
 
async def scrape_with_backoff(runner, url, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await runner.run_debug(f"Extract data from {url}")
        except Exception as e:
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise
 
# Usage
response = asyncio.run(scrape_with_backoff(runner, "https://scrapegraphai.com"))
print(response)

Advanced Use Cases

1. Competitive Intelligence

Create an agent that monitors competitors:

competitive_intel_agent = Agent(
    model="gemini-2.5-pro",
    name="competitive_intel",
    instruction="""Analyze competitor websites and extract:
    - Products and prices
    - Features and positioning
    - Marketing content
    - SEO strategies""",
    tools=[MCPToolset(...)],
)

2. Content Aggregation

Gather content from different sources:

content_aggregator = Agent(
    model="gemini-2.5-pro",
    name="content_aggregator",
    instruction="""Aggregate articles and content from different sources,
    extract title, author, date, and main content.""",
    tools=[MCPToolset(...)],
)

3. Market Research

Perform automated market research:

market_researcher = Agent(
    model="gemini-2.5-pro",
    name="market_researcher",
    instruction="""Perform market research on specific topics,
    aggregating data from multiple sources and providing structured insights.""",
    tools=[MCPToolset(...)],
)
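
These agents run exactly like the earlier examples; for instance (illustrative prompt):

runner = InMemoryRunner(agent=market_researcher)
result = asyncio.run(runner.run_debug("Research the market for AI-powered web scraping tools and summarize key players, pricing, and positioning"))
print(result)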

Limitations and Considerations

Technical Limitations

  • Python 3.13+: The MCP server requires Python 3.13 or higher
  • API Key Required: A valid ScrapeGraphAI API key is needed
  • Rate Limits: Respect the platform's rate limits

Cost Considerations

  • Each call to ScrapeGraphAI tools consumes credits
  • Monitor usage through the dashboard
  • Use tool filters to reduce costs

Legal Compliance

  • Always respect site robots.txt files
  • Don't violate terms of service
  • Use ethical and responsible scraping

Troubleshooting

Issue: MCP server won't start

Solution:

# Verify installation
pip show scrapegraph-mcp
 
# Verify API key is configured
echo $SGAI_API_KEY

Issue: Frequent timeouts

Solution:

# Increase timeout
timeout=600  # 10 minutes

Issue: Tools not available

Solution:

# Verify tools are correctly filtered
tool_filter=["smartscraper", "markdownify"]  # Explicit list


Conclusion

Integrating ScrapeGraphAI with ADK opens new possibilities for creating powerful AI agents capable of:

  • Navigating the web autonomously
  • Extracting structured data from any source
  • Aggregating information from multiple sources
  • Automating complex research
  • Providing insights based on real-time data

With this combination, you can build automated intelligence systems, market-monitoring pipelines, and advanced research agents that go far beyond traditional scraping capabilities.

Start experimenting with this powerful combination today and discover what you can build! 🚀


Frequently Asked Questions

How do I get a ScrapeGraphAI API key?

Sign up at dashboard.scrapegraphai.com to get your free API key.

Can I use different Gemini models?

Yes, you can use any Gemini model available in ADK:

  • gemini-2.5-pro (recommended for complex tasks)
  • gemini-1.5-pro (more economical)
  • gemini-1.5-flash (faster)

What is the cost of usage?

ScrapeGraphAI uses a credit system. Check the pricing page for details.

Can I use multiple agents simultaneously?

Yes, each agent can have its own MCP connection. Make sure to properly manage resources.
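
A minimal sketch, reusing the agents from the earlier examples (and run_debug as used throughout this guide):

async def run_in_parallel():
    # Each runner drives its own agent, each with its own MCP server process
    scraper_runner = InMemoryRunner(agent=root_agent)
    monitor_runner = InMemoryRunner(agent=price_monitor_agent)
    return await asyncio.gather(
        scraper_runner.run_debug("Convert https://scrapegraphai.com to markdown"),
        monitor_runner.run_debug("Check the current price on https://scrapegraphai.com/pricing"),
    )

results = asyncio.run(run_in_parallel())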

How do I handle network errors?

Implement retry logic with exponential backoff and appropriate error handling.
