
# **ScrapeGraphAI + Agno: The Ultimate Web Scraping Workflow**

Learn how to use ScrapeGraphAI with Agno to create a powerful web scraping workflow.

Tutorials · 15 min read · By Marco Vinciguerra

## **Introduction**

The landscape of AI agent frameworks has been rapidly evolving, with new platforms emerging that promise to revolutionize how developers build and deploy autonomous AI systems. The team behind ScrapeGraphAI, a cutting-edge AI-powered web scraping library, is constantly evaluating emerging technologies that could enhance its ability to create intelligent, adaptive scraping solutions.

In the world of web scraping and data extraction, the challenges are becoming increasingly complex. Modern websites employ sophisticated anti-bot measures, dynamic content loading, and intricate user interactions that traditional scraping methods struggle to handle. This has led the ScrapeGraphAI team to explore agentic AI systems—autonomous agents capable of reasoning, learning, and adapting to overcome these challenges in real-time.

Enter Agno, a platform that has caught the attention of developers across the AI community for its bold claims and innovative approach to agent development. Unlike traditional frameworks that often require extensive boilerplate code and complex workflow management, Agno positions itself as a lightweight, high-performance solution that can transform any large language model into a sophisticated autonomous agent.

What makes Agno particularly intriguing for the ScrapeGraphAI team is its promise of unprecedented performance—claiming to be approximately 10,000 times faster than popular alternatives like LangGraph while using 50 times less memory. For applications that need to process thousands of web pages, handle dynamic content, and make real-time decisions about data extraction strategies, these performance characteristics could be game-changing.

The ScrapeGraphAI team has been conducting an in-depth evaluation of Agno, examining its architecture, capabilities, and potential applications in the context of intelligent web scraping. This exploration has revealed several compelling features that could significantly impact how AI-powered data extraction systems are built and deployed in production environments.

## **What is Agno?**

Agno is an open-source platform designed to build, ship, and monitor agentic systems. Think of it as a comprehensive framework that transforms any large language model into a powerful, autonomous agent capable of reasoning, memory retention, and tool usage.

What sets Agno apart from other frameworks like LangGraph is its focus on performance and simplicity. Agno claims to be approximately 10,000x faster than LangGraph and uses 50x less memory, making it particularly attractive for teams building at scale.
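
To make that concrete, here is a minimal sketch of what an Agno agent looks like, following the patterns in Agno's public documentation; the OpenAI model and DuckDuckGo tool are illustrative choices, not requirements:

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

# An Agno agent is essentially a model, plus tools, plus instructions
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    instructions="Answer concisely and cite the sources you used.",
    markdown=True,
)

# Streams the answer to the terminal, showing tool calls along the way
agent.print_response("What is graph-based web scraping?", stream=True)
```

Note how little boilerplate is involved compared with graph-style frameworks: there is no explicit state machine or workflow graph to wire up.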

## **What is ScrapeGraphAI?**

ScrapeGraphAI is an AI-powered API for extracting data from the web. It uses graph-based scraping, which is typically faster and more robust than browser automation and, unlike search-focused APIs such as Tavily, can pull structured data out of individual pages. The service fits smoothly into your data pipeline through easy-to-use APIs known for their speed and accuracy, and it connects with no-code platforms like n8n, Bubble, and Make. We offer fast, production-ready APIs, Python and JS SDKs, auto-recovery, agent-friendly integrations (LangGraph, LangChain, and more), and a free tier with strong support.
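
Before looking at the Agno integration, here is a quick sketch of the standalone Python SDK (scrapegraph-py), so you can see what the underlying API call looks like; the URL and prompt are placeholders, and the exact response shape may vary by SDK version:

```python
from scrapegraph_py import Client

# The API key can also be read from the SGAI_API_KEY environment variable
client = Client(api_key="your_scrapegraph_api_key_here")

# smartscraper: describe what you want in natural language, get structured data back
response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract the main heading and a one-sentence description of the product.",
)

print(response["result"])  # response shape assumed; adjust to the SDK's output
client.close()
```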

## **ScrapeGraphTools Integration Examples**

Before diving into the complex Deep Researcher Agent implementation, let's explore the various ways you can integrate ScrapeGraphAI with Agno through the ScrapeGraphTools. The tool offers multiple scraping methods, each optimized for different use cases:

### **Basic ScrapeGraphTools Usage**

```python
from agno.agent import Agent
from agno.tools.scrapegraph import ScrapeGraphTools

# Example 1: Default behavior - only smartscraper enabled
scrapegraph = ScrapeGraphTools(smartscraper=True)
agent = Agent(tools=[scrapegraph], show_tool_calls=True, markdown=True, stream=True)

# Use smartscraper for structured data extraction
agent.print_response("""
Use smartscraper to extract the following from https://www.wired.com/category/science/:
- News articles
- Headlines
- Images
- Links
- Author
""")

# Example 2: Only markdownify enabled (by setting smartscraper=False)
scrapegraph_md = ScrapeGraphTools(smartscraper=False)
agent_md = Agent(tools=[scrapegraph_md], show_tool_calls=True, markdown=True)

# Use markdownify for content conversion
agent_md.print_response(
    "Fetch and convert https://www.wired.com/category/science/ to markdown format"
)

# Example 3: Enable searchscraper for targeted information extraction
scrapegraph_search = ScrapeGraphTools(searchscraper=True)
agent_search = Agent(tools=[scrapegraph_search], show_tool_calls=True, markdown=True)

# Use searchscraper for specific information retrieval
agent_search.print_response(
    "Use searchscraper to find the CEO of company X and their contact details from https://example.com"
)

# Example 4: Enable crawl for comprehensive site analysis
scrapegraph_crawl = ScrapeGraphTools(crawl=True)
agent_crawl = Agent(tools=[scrapegraph_crawl], show_tool_calls=True, markdown=True)

# Use crawl with schema-based extraction
agent_crawl.print_response(
    "Use crawl to extract what the company does and get text content from privacy and terms from https://scrapegraphai.com/ with a suitable schema."
)
```

### **Understanding ScrapeGraphTools Methods**

Each method in ScrapeGraphTools serves a specific purpose:

- **smartscraper**: Default method for structured data extraction with AI-powered analysis
- **markdownify**: Converts web pages to clean markdown format for better readability
- **searchscraper**: Targets specific information extraction based on search queries
- **crawl**: Comprehensive site crawling with schema-based data extraction

These examples demonstrate the flexibility of ScrapeGraphTools in handling different scraping scenarios, from simple content conversion to complex structured data extraction.
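
Since each method is toggled by a constructor flag, you can also enable several of them on a single toolkit and let the agent pick the right one per request. A small sketch, assuming the flags can be combined freely (the examples above only enable them one at a time):

```python
from agno.agent import Agent
from agno.tools.scrapegraph import ScrapeGraphTools

# Enable multiple scraping methods on one toolkit; the agent chooses per task
multi_tool = ScrapeGraphTools(smartscraper=True, searchscraper=True, crawl=True)
agent = Agent(tools=[multi_tool], show_tool_calls=True, markdown=True)

agent.print_response(
    "Use smartscraper to extract the main product description from "
    "https://scrapegraphai.com/, then use crawl to summarize its privacy and terms pages."
)
```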

## **Deep Researcher Agent Implementation**

### **Architecture Overview**

The Deep Researcher Agent is a multi-stage AI-powered research workflow that automates comprehensive web research, analysis, and report generation using Agno, Scrapegraph, and Nebius AI. This implementation demonstrates how to create a sophisticated research system with three specialized agents working in sequence.

### **Core Implementation (agents.py)**

```python
import os
from typing import Iterator

from agno.agent import Agent
from agno.models.nebius import Nebius
from agno.tools.scrapegraph import ScrapeGraphTools
from agno.utils.log import logger
from agno.workflow import RunResponse, Workflow
from dotenv import load_dotenv

load_dotenv()


class DeepResearcherAgent(Workflow):
    """
    A multi-stage research workflow that:
    1. Gathers information from the web using advanced scraping tools.
    2. Analyzes and synthesizes the findings.
    3. Produces a clear, well-structured report.
    """

    # Searcher: Finds and extracts relevant information from the web
    searcher: Agent = Agent(
        tools=[ScrapeGraphTools(api_key=os.getenv("SGAI_API_KEY"))],
        model=Nebius(
            id="deepseek-ai/DeepSeek-V3-0324", api_key=os.getenv("NEBIUS_API_KEY")
        ),
        show_tool_calls=True,
        markdown=True,
        description=(
            "You are ResearchBot-X, an expert at finding and extracting high-quality, "
            "up-to-date information from the web. Your job is to gather comprehensive, "
            "reliable, and diverse sources on the given topic."
        ),
        instructions=(
            "1. Search for the most recent and authoritative sources (news, blogs, official docs, research papers, forums, etc.) on the topic.\n"
            "2. Extract key facts, statistics, and expert opinions.\n"
            "3. Cover multiple perspectives and highlight any disagreements or controversies.\n"
            "4. Include relevant statistics, data, and expert opinions where possible.\n"
            "5. Organize your findings in a clear, structured format (e.g., markdown table or sections by source type).\n"
            "6. If the topic is ambiguous, clarify with the user before proceeding.\n"
            "7. Be as comprehensive and verbose as possible—err on the side of including more detail.\n"
            "8. Always cite the references and sources of the content (this is a must)."
        ),
    )

    # Analyst: Synthesizes and interprets the research findings
    analyst: Agent = Agent(
        model=Nebius(
            id="deepseek-ai/DeepSeek-V3-0324", api_key=os.getenv("NEBIUS_API_KEY")
        ),
        markdown=True,
        description=(
            "You are AnalystBot-X, a critical thinker who synthesizes research findings "
            "into actionable insights. Your job is to analyze, compare, and interpret the "
            "information provided by the researcher."
        ),
        instructions=(
            "1. Identify key themes, trends, and contradictions in the research.\n"
            "2. Highlight the most important findings and their implications.\n"
            "3. Suggest areas for further investigation if gaps are found.\n"
            "4. Present your analysis in a structured, easy-to-read format.\n"
            "5. Extract and list ONLY the reference links or sources that were ACTUALLY found and provided by the researcher in their findings. Do NOT create, invent, or hallucinate any links.\n"
            "6. If no links were provided by the researcher, do not include a References section.\n"
            "7. Do not hallucinate or make up information. Use ONLY the links that were explicitly passed to you by the researcher.\n"
            "8. Verify that each link you include was actually present in the researcher's findings before listing it.\n"
            "9. If no links were found by the previous agent, simply state that no references were found."
        ),
    )

    # Writer: Produces a final, polished report
    writer: Agent = Agent(
        model=Nebius(
            id="deepseek-ai/DeepSeek-V3-0324", api_key=os.getenv("NEBIUS_API_KEY")
        ),
        markdown=True,
        description=(
            "You are WriterBot-X, a professional technical writer. Your job is to craft "
            "a clear, engaging, and well-structured report based on the analyst's summary."
        ),
        instructions=(
            "1. Write an engaging introduction that sets the context.\n"
            "2. Organize the main findings into logical sections with headings.\n"
            "3. Use bullet points, tables, or lists for clarity where appropriate.\n"
            "4. Conclude with a summary and actionable recommendations.\n"
            "5. Include a References & Sources section ONLY if the analyst provided actual links from their analysis.\n"
            "6. Use ONLY the reference links that were explicitly provided by the analyst in their analysis. Do NOT create, invent, or hallucinate any links.\n"
            "7. If the analyst provided links, format them as clickable markdown links in the References section.\n"
            "8. If no links were provided by the analyst, do not include a References section at all.\n"
            "9. Never add fake or made-up links - only use links that were actually found and passed through the research chain."
        ),
    )

    def run(self, topic: str) -> Iterator[RunResponse]:
        """
        Orchestrates the research, analysis, and report-writing process for a given topic.
        """
        logger.info(f"Running deep researcher agent for topic: {topic}")

        # Step 1: Research
        research_content = self.searcher.run(topic)

        # Step 2: Analysis
        logger.info("Analysis started")
        analysis = self.analyst.run(research_content.content)

        # Step 3: Report writing (streamed)
        logger.info("Report writing started")
        report = self.writer.run(analysis.content, stream=True)
        yield from report


def run_research(query: str) -> str:
    agent = DeepResearcherAgent()
    final_report_iterator = agent.run(topic=query)

    # Collect all streaming content into a single string
    full_report = ""
    for chunk in final_report_iterator:
        if chunk.content:
            full_report += chunk.content

    logger.info("Report generated")
    return full_report


if __name__ == "__main__":
    topic = "Extract information about Nebius AI Studio, including its features, capabilities, and applications from available sources."
    response = run_research(topic)
    print(response)
```

## **Project Structure**

```text
deep_researcher_agent/
├── agents.py          # Core agent workflow implementation
├── server.py          # MCP server for integration
├── pyproject.toml     # Project configuration
├── .python-version    # Python version specification
├── uv.lock            # Lock file for dependencies
└── README.md          # Documentation
```

## **Installation and Setup**

### **Prerequisites**

* Python 3.10+  
* uv for dependency management  
* API keys for Nebius AI and Scrapegraph

### **Installation Steps**

1. **Install uv** (if you don't have it):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. **Clone the repository**:

```bash
git clone https://github.com/Arindam200/awesome-ai-apps.git
cd awesome-ai-apps/advance_ai_agents/deep_researcher_agent
```

3. **Install dependencies**:

```bash
uv sync
```

4. **Environment setup**: Create a `.env` file with your API keys:

```text
NEBIUS_API_KEY=your_nebius_api_key_here
SGAI_API_KEY=your_scrapegraph_api_key_here
```

## **Usage Options**

### **1. Command Line Interface**

Run research directly from the command line:

```bash
uv run python agents.py
```

### **2. MCP Server Integration**

Add the following configuration to your `.cursor/mcp.json` or Claude Desktop's `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "deep_researcher_agent": {
      "command": "uv",
      "args": [
        "--directory",
        "/Your/Path/to/directory/awesome-ai-apps/advance_ai_agents/deep_researcher_agent",
        "run",
        "server.py"
      ],
      "env": {
        "NEBIUS_API_KEY": "your_nebius_api_key_here",
        "SGAI_API_KEY": "your_scrapegraph_api_key_here"
      }
    }
  }
}
```
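
The post does not show `server.py` itself. As a minimal sketch of what it might contain, assuming the official `mcp` Python SDK's FastMCP helper (this is not the repository's actual implementation):

```python
# server.py - hypothetical sketch, not the repository's actual implementation
from mcp.server.fastmcp import FastMCP

from agents import run_research

mcp = FastMCP("deep_researcher_agent")


@mcp.tool()
def deep_research(topic: str) -> str:
    """Run the multi-stage research workflow and return the final report."""
    return run_research(topic)


if __name__ == "__main__":
    # stdio transport is what Claude Desktop and Cursor expect
    mcp.run(transport="stdio")
```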

### **Example Usage**

```python
from agents import run_research

# Example research topic
topic = "Extract information about Nebius AI Studio, including its features, capabilities, and applications from available sources."

# Run the research
response = run_research(topic)
print(response)
```

## **Project Dependencies**

The project uses uv for dependency management, and all dependencies are specified in `pyproject.toml`. The key dependencies are:

- **agno**: High-performance agent framework
- **streamlit**: Web interface framework
- **python-dotenv**: Environment variable management
- **pydantic**: Data validation and serialization

Install them with:

```bash
uv sync
```
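
streamlit is listed for a web interface, although no app file appears in the project structure above. A hypothetical minimal front end over `run_research` might look like this (`app.py` is an assumed filename):

```python
# app.py - hypothetical Streamlit front end, assumed for illustration
import streamlit as st

from agents import run_research

st.title("Deep Researcher Agent")
topic = st.text_input("Research topic")

if st.button("Run research") and topic:
    with st.spinner("Researching..."):
        report = run_research(topic)
    st.markdown(report)
```

Such an app would be launched with `uv run streamlit run app.py`.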

## **Key Features**

1. **Multi-Stage Research Workflow**: The agent follows a structured approach with three specialized agents:
   - **Searcher (ResearchBot-X)**: Finds and extracts high-quality, up-to-date information from the web
   - **Analyst (AnalystBot-X)**: Synthesizes and interprets research findings into actionable insights
   - **Writer (WriterBot-X)**: Produces clear, structured, and actionable reports
2. **Advanced Web Scraping**: Uses ScrapeGraphAI's tools for intelligent data extraction from complex websites.
3. **High Performance**: Leverages Agno's performance optimizations and Nebius AI's DeepSeek-V3 model for faster processing.
4. **Streaming Response**: Real-time report generation with streaming capabilities for a better user experience.
5. **Reference Management**: Intelligent handling of source references, with validation to prevent hallucinated links.
6. **MCP Server Support**: Integration with Claude Desktop and Cursor for seamless workflow integration.
7. **Robust Error Handling**: Built-in error handling and logging for production environments (illustrated in the sketch after this list).
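
As an illustration of features 4 and 7, here is a small, hypothetical wrapper around the workflow that prints chunks as they stream in and fails soft if a stage raises; `safe_research` is not part of the repository:

```python
import logging

from agents import DeepResearcherAgent

logging.basicConfig(level=logging.INFO)


def safe_research(topic: str) -> str:
    """Hypothetical helper: stream the report to stdout and fail soft on errors."""
    agent = DeepResearcherAgent()
    chunks = []
    try:
        for chunk in agent.run(topic=topic):
            if chunk.content:
                print(chunk.content, end="", flush=True)  # real-time display
                chunks.append(chunk.content)
    except Exception as exc:
        logging.error("Research run failed: %s", exc)
    return "".join(chunks)
```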

## **Development Commands**

**Code formatting:**

```bash
uv run black .
uv run isort .
```

**Type checking:**

```bash
uv run mypy .
```

**Testing:**

```bash
uv run pytest
```

## **Final Thoughts**

Agno appears ready for real workloads, especially for teams building agentic systems at scale. While frameworks like LangGraph excel in flow-oriented control logic, Agno's focus on performance, simplicity, and scalability makes it particularly compelling for production environments.

The integration with ScrapeGraphAI creates a powerful combination for intelligent data extraction and research tasks. The Deep Researcher Agent demonstrates how these technologies can work together to create sophisticated, autonomous research systems that can handle complex, multi-stage investigations.

For teams building AI agents that need to operate at scale, handle multimodal data, or require sophisticated reasoning capabilities, this [Agno](https://www.agno.com/?utm_source=scrapegraphai&utm_medium=partner-content&utm_campaign=partner-technical&utm_content=blog) + ScrapeGraphAI combination appears worth exploring. The open-source nature, combined with enterprise-ready features and impressive performance claims, positions it as a serious contender in the agent framework space.

## **Frequently Asked Questions (FAQ)**

### **What is ScrapeGraphAI, and how does it differ from Tavily?**

ScrapeGraphAI is an AI-powered, graph-based web scraping platform that maps websites as interconnected graphs to efficiently extract structured data from any web page or document. It offers fast, production-ready APIs with easy integration for comprehensive data extraction workflows. Tavily, on the other hand, is a search engine API specifically designed for AI agents that focuses on retrieving search results and content snippets from across the web, but it doesn't provide the deep structural scraping capabilities needed for extracting specific data elements from individual web pages.

### **Why should I choose ScrapeGraphAI over Tavily for data extraction needs?**

ScrapeGraphAI provides several key advantages for comprehensive data extraction: lightning-fast scraping with graph-based navigation, production-ready stability with auto-recovery mechanisms, the ability to extract structured data from any website layout, simple APIs and SDKs for Python and JavaScript, a generous free tier for testing, dedicated support, and seamless integration with existing data pipelines. While Tavily excels at search and content retrieval, it's primarily designed for finding information rather than performing detailed data extraction from specific web pages or handling complex scraping workflows.

### **Is ScrapeGraphAI suitable for users who need more than just search functionality?**

Yes, ScrapeGraphAI is designed for users who need comprehensive data extraction beyond simple search results. With minimal configuration, it can handle complex scraping tasks like extracting product catalogs, financial data, real estate listings, or any structured information from websites. Unlike Tavily, which focuses on search and content snippets optimized for AI processing, ScrapeGraphAI provides full-scale web scraping capabilities that can navigate dynamic content, handle authentication, and extract data in any desired format or structure.

### **How reliable is ScrapeGraphAI in production environments?**

ScrapeGraphAI is production-ready, operating 24/7 with built-in fault tolerance and auto-recovery mechanisms. It is designed to handle edge cases and maintain stability, unlike Browser-Use, which is prone to crashes and not optimized for production.

### **Can ScrapeGraphAI be integrated with AI agents?**

Absolutely. ScrapeGraphAI can be easily defined as a tool in frameworks like LangGraph, enabling AI agents to leverage its world-class scraping capabilities. The Deep Researcher Agent implementation above demonstrates how to integrate it with minimal effort using the Agno framework for enhanced performance and scalability.
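
As a rough sketch of that pattern, a ScrapeGraphAI-backed tool for LangChain/LangGraph could look like the following; `smart_scrape` is an illustrative helper, and the exact response shape may differ from what is shown:

```python
# Hypothetical sketch: wrapping ScrapeGraphAI as a LangChain/LangGraph tool
from langchain_core.tools import tool
from scrapegraph_py import Client

sgai_client = Client(api_key="your_scrapegraph_api_key_here")


@tool
def smart_scrape(url: str, prompt: str) -> str:
    """Extract structured data from a web page using ScrapeGraphAI."""
    response = sgai_client.smartscraper(website_url=url, user_prompt=prompt)
    return str(response["result"])  # response shape assumed; adjust to the SDK's output
```

Any tool-calling agent built with LangGraph or LangChain can then bind `smart_scrape` like any other tool.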

### **What are the performance benefits of using Agno with ScrapeGraphAI?**

The combination provides significant performance improvements:

* **Speed**: Agno's claimed 10,000x faster processing compared to alternatives like LangGraph  
* **Memory Efficiency**: 50x less memory usage for handling large-scale research tasks  
* **Scalability**: Designed to handle thousands of concurrent scraping operations  
* **Reliability**: Built-in error handling and auto-recovery mechanisms  
* **Real-time Processing**: Capability to make real-time decisions about data extraction strategies

This makes the combination particularly suitable for production environments requiring high-throughput data processing and analysis.

## **Related Resources**

Want to learn more about AI agents, frameworks, and advanced scraping techniques? Explore these comprehensive guides:

### **AI Agent Development**
- [Building AI Agents with LangChain, LlamaIndex, and CrewAI](/blog/agents-tutorial) - Master the fundamentals of AI agent development with popular frameworks
- [Building Agents Without Frameworks](/blog/how-to-create-agent-without-frameworks) - Learn to create AI agents from scratch using OpenAI SDK
- [Multi-Agent Systems with LangGraph](/blog/multi-agent) - Discover how to build complex, collaborative AI agent systems
- [AI Agent Web Scraping](/blog/ai-agent-webscraping) - Deep dive into AI-powered web scraping and automation

### **ScrapeGraphAI Integrations**
- [ScrapeGraphAI CrewAI Integration](/blog/scrapegraphai-crewai-integration) - See how to integrate ScrapeGraphAI with CrewAI for task automation
- [LlamaIndex Integration](/blog/scrapegraphai-llamaindex-integration) - Learn how to process scraped data with LlamaIndex for advanced analysis
- [Building Intelligent Agents with ScrapeGraphAI](/blog/integrating-scrapegraph-into-intelligent-agents) - Advanced techniques for integrating scraping into AI agents
- [Advanced Prompt Engineering with ScrapeGraphAI and LangChain](/blog/sgai-langchain) - Master sophisticated prompting techniques for intelligent data extraction

### **Web Scraping Fundamentals**
- [Web Scraping 101](/blog/101-scraping) - Master the basics of web scraping and data extraction
- [Pre-AI to Post-AI Scraping](/blog/pre-ai-to-post-ai-scraping) - See how AI has transformed web scraping from traditional methods
- [Automated Data Scraping with AI](/blog/automation-web-scraping) - Learn about LLM-enhanced scraping automation
- [Browser Automation vs Graph Scraping](/blog/scrapegraph-vs-browseruse) - Compare different scraping approaches and methodologies

### **Tool Comparisons and Alternatives**
- [ScrapeGraphAI vs Tavily](/blog/sgai-vs-tavily) - Detailed comparison of AI-powered scraping vs search APIs
- [Tavily Alternatives](/blog/tavily-alternatives) - Explore alternatives to Tavily for different use cases
- [BrowserUse Alternatives](/blog/browseruse-alternatives) - Compare browser automation tools with modern scraping solutions
- [AI Web Scraping Tools](/blog/ai-tools) - Comprehensive guide to the best AI-powered scraping tools in 2025

### **Advanced Applications**
- [Real Estate Web Scraping](/blog/real-estate-scraping) - Learn how to extract real estate data for market analysis
- [Social Media Trends Analysis](/blog/social-media-trends) - Discover how to scrape and analyze social media data
- [Academic Research with Graph-Based Scraping](/blog/empowering-academic-research) - See how to enhance academic research with AI-powered data extraction
- [Growth Strategies with AI Scraping](/blog/growth-scraping) - Learn how AI scraping can drive business growth

### **No-Code and Workflow Automation**
- [No-Code Platform Integrations](/blog/no-code-platforms) - Learn how to integrate ScrapeGraphAI with platforms like n8n, Zapier, and Bubble
- [Langflow Integration](/blog/langflow) - Discover visual workflow automation with ScrapeGraphAI
- [Structured Output](/blog/structured-output) - Master data formatting and schema-based extraction
- [Data Innovation](/blog/data-innovation) - Explore innovative approaches to data collection and processing

### **Legal and Best Practices**
- [Web Scraping Legality](/blog/legality-of-web-scraping) - Understand the legal aspects of AI-powered web scraping
- [Mastering ScrapeGraphAI](/blog/mastering-scrapegraphai-endpoint) - Deep dive into ScrapeGraphAI's advanced features and capabilities

These resources will help you understand different approaches to AI agent development, web scraping, and how to leverage these technologies effectively for your projects.