TL;DR
ScrapeGraphAI is a web data API for developers who need reliable page content, structured extraction, search-based research, crawls, and scheduled monitoring without maintaining CSS selectors. In V2, the current service names are scrape, extract, search, crawl, and monitor.
- scrape fetches a known URL and returns Markdown, HTML, links, images, summaries, screenshots, branding, or JSON.
- extract returns structured JSON from a URL, HTML, or Markdown using a prompt and optional schema.
- search starts from a web query, fetches result pages, and can extract structured answers.
- crawl traverses multiple pages from a starting URL and returns requested formats per page.
- monitor runs scheduled page checks, records diffs, and can push changes to a webhook.
The short version: use ScrapeGraphAI when you want the web turned into usable data for an app, agent, workflow, or data pipeline.
The Problem ScrapeGraphAI Solves
Traditional scraping often starts with a brittle chain of work:
- Inspect the HTML.
- Write CSS selectors or XPath.
- Add browser rendering when the page is dynamic.
- Add retries, proxy rules, and timeout handling.
- Parse raw text into a useful object.
- Repair the scraper when the site layout changes.
That workflow still works for some stable pages. It breaks down when teams need many sources, frequent updates, structured outputs, or fast iteration. ScrapeGraphAI replaces the selector-heavy parts with API services that understand page content and return the output shape you asked for.
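To make the brittleness concrete, here is a minimal sketch of a selector-style parser built on Python's standard `html.parser`. The markup, the `price` class name, and the helper names are hypothetical and only illustrate how a single renamed class silently empties the result.

```python
from html.parser import HTMLParser


class PriceBySelector(HTMLParser):
    """Collects text inside <span class="price">, like the CSS selector span.price."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Match only the exact class the scraper was written against.
        if tag == "span" and dict(attrs).get("class") == "price":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.prices.append(data.strip())


def extract_prices(html: str) -> list[str]:
    parser = PriceBySelector()
    parser.feed(html)
    return parser.prices


# Hypothetical markup before and after a redesign that renames one class.
before = '<div><span class="price">$19.99</span></div>'
after = '<div><span class="product-price">$19.99</span></div>'

prices_before = extract_prices(before)
prices_after = extract_prices(after)
print(prices_before)
print(prices_after)
```

The second call raises no error. It just returns nothing, which is why selector-based scrapers tend to fail quietly until someone notices missing data.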
For a comparison with parser-first scraping, read Traditional vs AI Scraping. For a buying view, read Web Scraping API: How to Choose One in 2026.
How The V2 Services Fit Together
| Service | Input | Output | Use it when |
|---|---|---|---|
| scrape | URL | Page formats | You need Markdown, HTML, links, images, summary, screenshot, branding, or JSON from a known URL |
| extract | URL, HTML, or Markdown plus prompt | Structured JSON | You need typed business data such as products, prices, jobs, articles, or company profiles |
| search | Search query | Search results or structured JSON | You do not know the source URLs yet |
| crawl | Starting URL | Multi-page job results | You need to traverse a site or section |
| monitor | URL plus cron schedule | Recurring ticks and diffs | You need ongoing change detection |
The services are designed to be combined. A product intelligence pipeline might use search to find retailer pages, extract to normalize product data, and monitor to watch a final set of important pages. A documentation ingestion pipeline might use crawl with markdown formats and then store the pages in a vector database.
That separation keeps the API choice practical. A team importing help docs should not pay for structured extraction when it only needs Markdown. A team building a product feed should not store raw Markdown and parse it later if the downstream system expects fields. A team watching competitor pricing should not run a cron job blindly if a monitor can record changes and expose activity.
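The selection logic in the table above can be sketched as a small helper. The function name, the flags, and the order in which they are checked are assumptions made for illustration; they are not part of the ScrapeGraphAI SDK.

```python
def choose_service(*, has_url: bool, multi_page: bool = False,
                   recurring: bool = False, needs_fields: bool = False) -> str:
    """Map a requirement to one of the five V2 service names."""
    if not has_url:
        return "search"    # source URLs are not known yet
    if recurring:
        return "monitor"   # ongoing change detection on a schedule
    if multi_page:
        return "crawl"     # traverse a site or section
    if needs_fields:
        return "extract"   # typed business data as structured JSON
    return "scrape"        # page formats from a known URL


# The three teams described above, expressed as flags.
docs_import = choose_service(has_url=True, multi_page=True)
product_feed = choose_service(has_url=True, needs_fields=True)
price_watch = choose_service(has_url=True, recurring=True, needs_fields=True)
print(docs_import, product_feed, price_watch)
```

Real pipelines often chain several services, so treat the return value as the first step of a workflow, not the only one.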
What ScrapeGraphAI Returns
ScrapeGraphAI returns API responses, not a hosted dashboard export that you have to download manually. That makes it useful inside backend jobs, notebooks, agents, ETL workflows, and internal tools.
The output depends on the service:
- scrape returns a results object keyed by format, such as markdown, links, screenshot, or json.
- extract returns JSON data that matches the prompt and optional schema.
- search returns result pages, or JSON data when you add a prompt.
- crawl returns an async job with page metadata and resolved page content.
- monitor returns monitor configuration, tick history, diffs, and optional webhook payloads.
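Because scrape results are keyed by format, a small accessor keeps downstream code from failing when a format was not requested. This is a sketch over plain dictionaries: the payload shape (each format holding its items under a `data` key) is inferred from the scrape example in the next section, and the sample values are invented.

```python
from typing import Any


def format_items(results: dict[str, Any], fmt: str) -> list[Any]:
    """Return the `data` list for one format, or an empty list if it is absent."""
    entry = results.get(fmt) or {}
    data = entry.get("data") or []
    return list(data)


# Invented sample that mimics a results object with two requested formats.
sample_results = {
    "markdown": {"data": ["# Example Domain"]},
    "links": {"data": ["https://example.com/a", "https://example.com/b"]},
}

markdown_items = format_items(sample_results, "markdown")
link_items = format_items(sample_results, "links")
missing_items = format_items(sample_results, "screenshot")
print(markdown_items, len(link_items), missing_items)
```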
That response model is the main difference from scraping libraries that leave you with raw HTML. ScrapeGraphAI is built for the step after fetching: turning the page into a shape another system can use.
scrape: Turn A URL Into Page Formats
Use scrape when you know the URL and want one or more page formats. A single call can return Markdown, HTML, links, images, summary, screenshot, branding, and JSON.

    from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig, LinksFormatConfig

    sgai = ScrapeGraphAI()

    # Request two formats from the same URL in a single call.
    page = sgai.scrape(
        "https://example.com",
        formats=[
            MarkdownFormatConfig(mode="reader"),
            LinksFormatConfig(),
        ],
    )

    if page.status == "success":
        # Each requested format appears under its own key in results.
        markdown = page.data.results.get("markdown", {}).get("data", [])
        links = page.data.results.get("links", {}).get("data", [])
        print(markdown[0] if markdown else "")
        print(len(links))
    else:
        print(page.error)

This is the right service for RAG ingestion, page archiving, content normalization, screenshots, link discovery, and multi-format page capture.
extract: Turn A Page Into Structured JSON
Use extract when the output needs a stable shape. You give ScrapeGraphAI a prompt and, when needed, a JSON schema. The service returns structured JSON instead of raw page text.

    from pydantic import BaseModel, Field
    from scrapegraph_py import ScrapeGraphAI


    class Product(BaseModel):
        name: str = Field(description="Product name")
        price: str | None = Field(default=None, description="Listed price")
        availability: str | None = None


    class Products(BaseModel):
        products: list[Product] = Field(default_factory=list)


    sgai = ScrapeGraphAI()

    # Pass the prompt plus the JSON schema generated from the Pydantic model.
    extraction = sgai.extract(
        "Extract all products with name, price, and availability.",
        url="https://example.com/products",
        schema=Products.model_json_schema(),
    )

    if extraction.status == "success":
        # Validate the returned JSON against the same model before using it.
        parsed = Products.model_validate(extraction.data.json_data)
        print(parsed.products)
    else:
        print(extraction.error)

extract is useful for ecommerce, real estate, jobs, healthcare directories, company profiles, lead enrichment, and any workflow where the next system expects fields instead of prose. See Web Scraping with Pydantic: Structured Data Guide for schema patterns.
search: Discover Sources Before Extracting
Use search when the workflow starts with a query. Without a prompt, it returns fetched search results. With a prompt and schema, it can summarize or extract structured data across the result pages.

    from scrapegraph_py import ScrapeGraphAI

    sgai = ScrapeGraphAI()

    research = sgai.search(
        "AI web scraping API pricing pages",
        num_results=5,
        prompt="Return company name, pricing URL, and cheapest paid plan.",
    )

    if research.status == "success":
        print(research.data.json_data)
    else:
        print(research.error)

search fits market research, competitor discovery, vendor lists, source finding, and trend checks. If you already have URLs, skip search and use scrape or extract.
crawl: Traverse A Site Or Section
Use crawl when a job starts from one URL and needs multiple linked pages. Crawls are async. Start the crawl, poll the job, then read the pages.

    import time

    from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

    sgai = ScrapeGraphAI()

    start = sgai.crawl.start(
        "https://docs.example.com",
        formats=[MarkdownFormatConfig(mode="reader")],
        max_pages=25,
        max_depth=2,
    )

    if start.status == "success":
        crawl_id = start.data.id
        # Poll the job until it reaches a terminal state.
        while True:
            time.sleep(2)
            status = sgai.crawl.get(crawl_id)
            if status.status != "success":
                # The status lookup itself failed; report it instead of exiting silently.
                print(status.error)
                break
            if status.data.status in ("completed", "failed"):
                print(status.data.status, status.data.finished)
                break
    else:
        print(start.error)

crawl is the right fit for documentation importers, site audits, knowledge base refreshes, and multi-page extraction jobs. Use include and exclude patterns when the crawl should stay inside a specific section.
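The polling loop above runs until the job reaches a terminal state. In production it is usually worth adding a deadline. The helper below is a generic sketch, not part of the SDK: `poll_until`, its parameters, and the stand-in status iterator are all hypothetical, and the status lookup is injected as a callable so the pattern can be tried without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], is_done: Callable[[T], bool], *,
               interval: float = 2.0, timeout: float = 300.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)


# Stand-in for repeated status lookups: two pending states, then a terminal one.
states = iter(["running", "running", "completed"])
final = poll_until(
    lambda: next(states),
    lambda s: s in ("completed", "failed"),
    interval=0.0,
)
print(final)
```

With the SDK, `fetch` would wrap the `sgai.crawl.get(crawl_id)` call from the example, and `is_done` would inspect the job status field the same way the loop does.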
monitor: Watch A Page Over Time
Use monitor when freshness matters. A monitor fetches a page on a cron schedule, records ticks, tracks diffs, and can call a webhook.

    from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

    sgai = ScrapeGraphAI()

    watch = sgai.monitor.create(
        "https://example.com/pricing",
        "0 */6 * * *",
        name="Pricing watch",
        formats=[MarkdownFormatConfig(mode="reader")],
    )

    if watch.status == "success":
        print(watch.data.cron_id)
    else:
        print(watch.error)

monitor is useful for pricing changes, stock availability, policy updates, job postings, brand mentions, and pages that should trigger a workflow only when something changes. For a practical monitoring use case, see Monitor Brand Mentions with ScrapeGraphAI.
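The schedule string in the example, `0 */6 * * *`, follows the standard five-field cron layout: minute 0 of every sixth hour. The sketch below expands only the minute and hour fields to show the daily run times. It ignores the day, month, and weekday fields, and it says nothing about which time zone the service applies, since the article does not state one. The helper names are hypothetical.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field: supports *, */n, a, a-b, a-b/n, and comma lists."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # A bare value with a step (e.g. 5/15) runs from that value to the top.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(expr: str) -> list[str]:
    """Return HH:MM run times implied by the minute and hour fields only."""
    minute, hour, *_ = expr.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


runs = daily_run_times("0 */6 * * *")
print(runs)  # ['00:00', '06:00', '12:00', '18:00']
```

Four ticks per day is a useful number to carry into any cost estimate for a monitor.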
Fetching, Rendering, And Stealth
The V2 services use fetchConfig (fetch_config in the Python SDK) for page fetch controls. Use it when a page needs JavaScript rendering, custom headers, cookies, scrolling, a wait time, a timeout, country routing, or stealth.

    from scrapegraph_py import FetchConfig, MarkdownFormatConfig, ScrapeGraphAI

    sgai = ScrapeGraphAI()

    rendered_page = sgai.scrape(
        "https://example.com",
        formats=[MarkdownFormatConfig()],
        fetch_config=FetchConfig(
            mode="js",     # render the page with JavaScript
            wait=2000,     # wait time before capture
            scrolls=2,     # number of scrolls
            country="us",  # country routing
        ),
    )

Start with the default fetch. Add rendering, waits, scrolling, cookies, or stealth only when a target needs them. That keeps latency and credits easier to predict.
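One way to apply that advice is an escalation ladder: try the cheapest fetch first and move up only when the result looks empty. The sketch below is an assumption-heavy illustration. The `LADDER` list, the `min_chars` heuristic, and the injected `fetch` callable are hypothetical, and the option keys merely mirror the FetchConfig arguments shown in the example above.

```python
from typing import Callable, Optional

# Ordered from cheapest to most involved; None stands for the default fetch.
LADDER: list[Optional[dict]] = [
    None,
    {"mode": "js"},
    {"mode": "js", "wait": 2000, "scrolls": 2},
]


def fetch_with_escalation(fetch: Callable[[Optional[dict]], str],
                          min_chars: int = 200) -> tuple[str, Optional[dict]]:
    """Try each rung until the returned text is long enough to be useful."""
    text = ""
    for options in LADDER:
        text = fetch(options)
        if len(text.strip()) >= min_chars:
            return text, options
    # Nothing met the threshold; return the last attempt for inspection.
    return text, LADDER[-1]


def fake_fetch(options: Optional[dict]) -> str:
    # Pretend the page only renders its content with JavaScript enabled.
    return "x" * 500 if options and options.get("mode") == "js" else ""


text, used = fetch_with_escalation(fake_fetch)
print(len(text), used)
```

In a real integration, `fetch` would call scrape with the chosen options and return the Markdown. Recording which rung succeeded per domain avoids repeating the cheaper attempts on every run.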
Pricing In Plain English
Pricing is credit based. scrape is billed as the sum of the selected formats. extract costs 5 credits per call. search costs per result. crawl costs a small startup fee plus per-page scrape costs. monitor costs per tick and adds a change bonus when a diff is detected.
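The pricing rules translate into simple arithmetic. In the sketch below, only the 5-credit extract figure comes from this article. Every other rate is a made-up placeholder chosen to keep the math readable, so substitute the published numbers before relying on any result.

```python
from dataclasses import dataclass

EXTRACT_CREDITS = 5  # the only figure stated in this article


@dataclass(frozen=True)
class Rates:
    """Placeholder rates, NOT real prices."""
    format_credits: dict[str, int]
    search_per_result: int
    crawl_startup: int
    monitor_per_tick: int
    monitor_change_bonus: int


def scrape_cost(rates: Rates, formats: list[str]) -> int:
    # scrape is the sum of the selected formats.
    return sum(rates.format_credits[f] for f in formats)


def crawl_cost(rates: Rates, pages: int, formats: list[str]) -> int:
    # crawl is a startup fee plus the per-page scrape cost.
    return rates.crawl_startup + pages * scrape_cost(rates, formats)


def monitor_cost(rates: Rates, ticks: int, changed_ticks: int) -> int:
    # monitor is charged per tick, plus a bonus for each tick with a diff.
    return ticks * rates.monitor_per_tick + changed_ticks * rates.monitor_change_bonus


# Hypothetical numbers for illustration only.
rates = Rates(
    format_credits={"markdown": 1, "links": 1},
    search_per_result=1,
    crawl_startup=2,
    monitor_per_tick=1,
    monitor_change_bonus=1,
)

extract_total = 100 * EXTRACT_CREDITS
crawl_total = crawl_cost(rates, pages=25, formats=["markdown"])
monitor_total = monitor_cost(rates, ticks=4 * 30, changed_ticks=3)
print(extract_total, crawl_total, monitor_total)
```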
For exact current numbers, use ScrapeGraphAI pricing. For workload math, use the price calculator. The calculator is the fastest way to compare a simple Markdown pipeline against schema extraction, search-based research, crawls, or monitors.
Example Production Workflow
A common workflow is competitor product tracking. The team starts with search to find relevant product or category pages for a query. The result URLs are reviewed, filtered, and stored as source candidates. Then extract turns each approved page into a product object with fields such as name, price, currency, availability, and source URL. If the site has many category pages, crawl can collect the page set first, then each page can be processed with the right format or extraction schema.
Once the important pages are known, monitor watches them on a schedule. A no-change tick only records that the page was checked. A changed tick can trigger a webhook, send the new structured data into a queue, or notify a human reviewer. For auditability, the application stores the request ID, source URL, prompt, schema version, status, and timestamp.
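The tick handling and audit trail described above can be sketched with standard-library types. The record fields follow the list in the paragraph. The shape of the `tick` dictionary (`request_id`, `url`, `changed`, `data`, and so on) is hypothetical and does not reflect the actual webhook payload.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    source_url: str
    prompt: str
    schema_version: str
    status: str
    timestamp: str  # ISO 8601, UTC


def handle_tick(tick: dict, queue: list, audit_log: list) -> None:
    """Always log the check; enqueue downstream work only when the page changed."""
    audit_log.append(AuditRecord(
        request_id=tick["request_id"],
        source_url=tick["url"],
        prompt=tick.get("prompt", ""),
        schema_version=tick.get("schema_version", "v1"),
        status=tick.get("status", "success"),
        timestamp=datetime.now(timezone.utc).isoformat(),
    ))
    if tick.get("changed"):
        queue.append({"url": tick["url"], "data": tick.get("data")})


queue: list = []
audit_log: list = []

# One unchanged tick and one changed tick, both with invented values.
handle_tick({"request_id": "r1", "url": "https://example.com/pricing",
             "changed": False}, queue, audit_log)
handle_tick({"request_id": "r2", "url": "https://example.com/pricing",
             "changed": True, "data": {"price": "$29"}}, queue, audit_log)
print(len(audit_log), len(queue))
```

Keeping the audit write unconditional means a no-change tick still leaves evidence that the page was checked.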
The same pattern works outside ecommerce. Recruiting teams can watch job boards, analysts can track vendor pages, security teams can monitor advisories, and AI products can refresh external context. The services stay the same; only the prompt, schema, schedule, and source list change.
When ScrapeGraphAI Is A Good Fit
ScrapeGraphAI is a good fit when the job needs one or more of these:
- Natural language extraction instead of hand-written selectors.
- Structured JSON output for a product, company, listing, article, job, or lead.
- Clean Markdown or HTML from pages that need rendering or cleanup.
- Multi-page crawling with progress tracking.
- Scheduled monitoring with diffs and webhooks.
- A developer API that can plug into agents, ETL jobs, analytics tools, and internal apps.
It is less useful when the target is a private page without valid access, when the only requirement is a one-off manual copy, or when a team already owns a stable in-house scraper for a tiny fixed source.