ScrapeGraphAI + LangChain: Web Tools for Your Agents

Written byMarco Vinciguerra

TL;DR

ScrapeGraphAI slots into LangChain as plain tools, so an agent can scrape, extract, and search the live web while it reasons.

Install: pip install langchain langchain-openai scrapegraph-py.
Keys: set both SGAI_API_KEY and OPENAI_API_KEY.
Wrap: decorate SDK calls with @tool, or use the full toolkit covering scrape, extract, search, crawl, monitors, and account endpoints.
Run: hand the tools to create_agent and let the model call them mid-conversation.

The gap LangChain leaves open

LangChain is great at orchestration. It handles the prompt templates, the tool-calling loop, the agent scaffolding. What it doesn't do is read the live web for you. A model wired up with LangChain still can't see this morning's prices or the page a user just pasted in.

That's the piece ScrapeGraphAI fills. Wrap its SDK methods as LangChain tools and the agent gains a reliable way to turn any URL or query into structured data. You write the tool definitions once; the model figures out when to reach for them.

Setup

Install LangChain, a model provider, and the ScrapeGraphAI SDK together:

pip install langchain langchain-openai scrapegraph-py

Two keys are in play here, one for scraping and one for the model:

export SGAI_API_KEY="your-scrapegraph-key"
export OPENAI_API_KEY="your-openai-key"

Your ScrapeGraphAI key comes from the dashboard.

Defining a tool

The cleanest pattern is a small module, say sgai_tools.py, that holds your client and your tool functions. LangChain's @tool decorator turns a function into something the agent can call:

from langchain_core.tools import tool
from scrapegraph_py import ScrapeGraphAI
 
sgai = ScrapeGraphAI()  # reads SGAI_API_KEY from env
 
@tool
def scrape(url: str) -> str:
    """Fetch a web page and return its content as markdown."""
    res = sgai.scrape(url)
    if res.status == "error":
        return f"error: {res.error}"
    return res.data

Ready to scrape?

Start for free

The docstring matters more than it looks. LangChain feeds it to the model as the tool description, so it's how the agent decides whether this is the right tool for the step.

You can build out the same way for the rest of the surface: an extract(url, prompt) tool for structured pulls, a search(query, num_results) tool for web search, and so on. The full toolkit covers seventeen endpoints in all, including crawl job control (crawl_start, crawl_get, crawl_stop, crawl_resume, crawl_delete), scheduled monitors (monitor_create, monitor_list, monitor_pause, and friends), and account calls like credits and history_list.

Calling a tool directly

Before you hand anything to an agent, you can invoke a tool on its own. Useful for testing the wiring:

from sgai_tools import scrape
 
print(scrape.invoke({"url": "https://example.com"}))

Letting an agent drive

Once the tools exist, building the agent is short. Give the model the toolset and a system prompt that frames the job:

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from sgai_tools import ALL_TOOLS
 
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_agent(
    model=llm,
    tools=ALL_TOOLS,
    system_prompt="You are a web research agent...",
)

Ask it a question that needs live data and it will pick the right tool, call it, read the result, and answer. You never wrote fetch-and-parse logic, only the goal and the toolbox.

Structured output

When you need typed results out the other end, pair extract with a Pydantic model and validate the response:

from pydantic import BaseModel
from sgai_tools import extract
 
class Company(BaseModel):
    name: str
    tagline: str
 
result = extract.invoke({"url": "https://scrapegraphai.com", "prompt": "Get the name and tagline"})
company = Company(**result["json_data"])

A small helper that unwraps SDK responses into plain dicts keeps every tool returning the same shape, which makes this validation step uniform across the toolkit.

Wrapping up

LangChain gives you the agent; ScrapeGraphAI gives that agent eyes on the web. Wrap the SDK methods as tools, write honest docstrings so the model chooses well, and hand the set to create_agent. Start with scrape, extract, and search, then add crawl and monitor tools as your workflows grow into multi-step research.

ScrapeGraphAI + LangGraph: Stateful Scraping Agents - Take the same tools into a graph with custom routing and state.
ScrapeGraphAI Python SDK: Scrape, Extract, Crawl - The methods underneath every tool above.
ScrapeGraphAI + CrewAI: Build Data Collection Agents - The same idea, applied to multi-agent crews.
ScrapeGraphAI + Vercel AI SDK: Web Tools for Agents - The TypeScript equivalent of this pattern.

ScrapeGraphAI + LangChain: Web Tools for Your Agents

Written byMarco Vinciguerra

TL;DR

ScrapeGraphAI slots into LangChain as plain tools, so an agent can scrape, extract, and search the live web while it reasons.

Install: pip install langchain langchain-openai scrapegraph-py.
Keys: set both SGAI_API_KEY and OPENAI_API_KEY.
Wrap: decorate SDK calls with @tool, or use the full toolkit covering scrape, extract, search, crawl, monitors, and account endpoints.
Run: hand the tools to create_agent and let the model call them mid-conversation.

The gap LangChain leaves open

Setup

Install LangChain, a model provider, and the ScrapeGraphAI SDK together:

pip install langchain langchain-openai scrapegraph-py

Two keys are in play here, one for scraping and one for the model:

export SGAI_API_KEY="your-scrapegraph-key"
export OPENAI_API_KEY="your-openai-key"

Your ScrapeGraphAI key comes from the dashboard.

Defining a tool

The cleanest pattern is a small module, say sgai_tools.py, that holds your client and your tool functions. LangChain's @tool decorator turns a function into something the agent can call:

from langchain_core.tools import tool
from scrapegraph_py import ScrapeGraphAI
 
sgai = ScrapeGraphAI()  # reads SGAI_API_KEY from env
 
@tool
def scrape(url: str) -> str:
    """Fetch a web page and return its content as markdown."""
    res = sgai.scrape(url)
    if res.status == "error":
        return f"error: {res.error}"
    return res.data

Ready to scrape?

Start for free

The docstring matters more than it looks. LangChain feeds it to the model as the tool description, so it's how the agent decides whether this is the right tool for the step.

Calling a tool directly

Before you hand anything to an agent, you can invoke a tool on its own. Useful for testing the wiring:

from sgai_tools import scrape
 
print(scrape.invoke({"url": "https://example.com"}))

Letting an agent drive

Once the tools exist, building the agent is short. Give the model the toolset and a system prompt that frames the job:

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from sgai_tools import ALL_TOOLS
 
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_agent(
    model=llm,
    tools=ALL_TOOLS,
    system_prompt="You are a web research agent...",
)

Ask it a question that needs live data and it will pick the right tool, call it, read the result, and answer. You never wrote fetch-and-parse logic, only the goal and the toolbox.

Structured output

When you need typed results out the other end, pair extract with a Pydantic model and validate the response:

from pydantic import BaseModel
from sgai_tools import extract
 
class Company(BaseModel):
    name: str
    tagline: str
 
result = extract.invoke({"url": "https://scrapegraphai.com", "prompt": "Get the name and tagline"})
company = Company(**result["json_data"])

A small helper that unwraps SDK responses into plain dicts keeps every tool returning the same shape, which makes this validation step uniform across the toolkit.

Wrapping up

ScrapeGraphAI + LangGraph: Stateful Scraping Agents - Take the same tools into a graph with custom routing and state.
ScrapeGraphAI Python SDK: Scrape, Extract, Crawl - The methods underneath every tool above.
ScrapeGraphAI + CrewAI: Build Data Collection Agents - The same idea, applied to multi-agent crews.
ScrapeGraphAI + Vercel AI SDK: Web Tools for Agents - The TypeScript equivalent of this pattern.

ScrapeGraphAI + LangChain: Web Tools for Your Agents

TL;DR

The gap LangChain leaves open

Setup

Defining a tool

Ready to scrape?

Calling a tool directly

Letting an agent drive

Structured output

Wrapping up

Give your AI Agent superpowers with lightning-fast web data!

ScrapeGraphAI + LangChain: Web Tools for Your Agents

TL;DR

The gap LangChain leaves open

Setup

Defining a tool

Ready to scrape?

Calling a tool directly

Letting an agent drive

Structured output

Wrapping up

Give your AI Agent superpowers with lightning-fast web data!

ScrapeGraphAI + LangChain: Web Tools for Your Agents

TL;DR

The gap LangChain leaves open

Setup

Defining a tool

Ready to scrape?

Calling a tool directly

Letting an agent drive

Structured output

Wrapping up

Related Articles

Give your AI Agent superpowers with lightning-fast web data!

ScrapeGraphAI + LangChain: Web Tools for Your Agents

TL;DR

The gap LangChain leaves open

Setup

Defining a tool

Ready to scrape?

Calling a tool directly

Letting an agent drive

Structured output

Wrapping up

Related Articles

Give your AI Agent superpowers with lightning-fast web data!