# llms.txt

## Overview of ScrapeGraphAI

ScrapeGraphAI is the only scraping API designed for autonomous AI agents. It's an advanced AI-powered web scraping platform that transforms any website into structured data using natural language instructions. No proxies needed. No maintenance required. Just reliable data extraction with clean JSON output that integrates seamlessly into data pipelines, RAG systems, analytics, and machine learning applications.

## Statistics

- **22k+ Stars on GitHub**: Top-rated open source web scraping project
- **40M+ Extracted Webpages**: Proven at scale across millions of operations
- **1M+ Unique Users**: Trusted by developers and businesses worldwide

## Pricing Model

ScrapeGraphAI uses a **credit-based pricing system** that covers both AI inference processing and web data fetching in a single unified cost structure.

### Subscription Plans (Yearly vs Monthly)

**Yearly Plans**: Save 15% - Billed upfront with all credits provided immediately for the full year

| Plan | Monthly Cost | Pages/Year | Credits | Rate Limits | Features |
|------|-------------|------------|---------|-------------|----------|
| **Free** | $0 | 25 pages/year | 50 credits | 10 requests/min | Community support |
| **Starter** | $17/month | 30,000 pages/year | 60,000 credits/year | 30 requests/min | 1 cron job, Email support |
| **Growth** | $85/month | 240,000 pages/year | 480,000 credits/year | 60 requests/min | 5 cron jobs, Priority support |
| **Pro** | $425/month | 1,500,000 pages/year | 3,000,000 credits/year | 200 requests/min | 25 cron jobs, High speed scraping, Dedicated support |
| **Enterprise** | Custom | Custom | Custom credits | Custom limits | Unlimited cron jobs, Dedicated infrastructure, High speed scraping, Dedicated support |

## API Endpoints

### 1. SmartScraper

**Description**: Extract specific data from a single webpage using natural language
**Credits per Page**: 2 credits
**Best For**: Product details, contact info, structured content extraction
**What's Included**: Web fetching + AI inference + JSON structuring

### 2. SearchScraper

**Description**: Search and analyze data across the entire web from a single prompt
**Credits per Page**: 20 credits
**Best For**: Market research, competitor analysis, brand mentions tracking
**What's Included**: Multi-page fetching + AI aggregation + web search

### 3. SmartCrawler

**Description**: Crawl and analyze entire websites with intelligent depth control
**Credits per Page**: 2 credits
**Best For**: Documentation analysis, site-wide data extraction, competitor intelligence
**What's Included**: Recursive fetching + schema extraction + depth management

### 4. Markdownify

**Description**: Convert any webpage to clean, well-formatted Markdown instantly
**Credits per Page**: 1 credit
**Best For**: Documentation, content migration, preparing data for LLMs
**What's Included**: Web fetching + content parsing + markdown conversion

### 5. AgenticScraper

**Description**: AI agent that autonomously navigates and interacts with websites to complete complex tasks
**Credits per Page**: 50 credits
**Best For**: Multi-step workflows, form filling, data behind interactions
**What's Included**: Autonomous navigation + interaction handling + complex task completion

### 6. Scrape

**Description**: Fetch a page's source and return it as raw HTML
**Credits per Page**: 1 credit
**Best For**: Simple scraping for basic HTML content extraction
**What's Included**: Basic HTML fetching

### 7. Sitemap

**Description**: Extract and parse sitemap URLs from websites
**Credits per Page**: 1 credit
**Best For**: Site structure analysis, URL discovery, SEO audits
**What's Included**: Sitemap parsing + URL extraction

## Cost Breakdown: What's Included

The credit system is **all-inclusive**, covering:

### Fetching Operations

- HTTP requests and response handling
- JavaScript rendering for dynamic content
- Proxy rotation and IP management (built-in, no extra cost)
- Rate limiting and retry logic
- Browser automation when needed

### AI Inference Processing

- Natural language prompt processing
- Content analysis and extraction
- Data structuring and validation
- Output formatting (JSON, CSV, Markdown)
- Schema validation and enforcement

## Key Use Cases

### Price Monitoring Bot

Track competitor prices on Amazon, eBay, and other e-commerce sites. Get alerts when prices drop or inventory changes.

- **Popular Targets**: Amazon products, eBay listings, Shopify stores

### Lead Generation Tool

Extract LinkedIn profiles, Twitter accounts, and contact information at scale without getting blocked.

- **Popular Targets**: LinkedIn profiles, Twitter users, Company contacts

### Market Research Dashboard

Aggregate reviews, ratings, and sentiment from multiple sites. Build comprehensive competitor analysis.

- **Popular Targets**: Product reviews, App ratings, Customer sentiment

### Real Estate Tracker

Monitor property listings on Zillow, Redfin, and local sites. Track price changes and new listings.

- **Popular Targets**: Zillow listings, Redfin data, Rental properties

### News Aggregator

Collect articles from 100+ news sources. Extract headlines, content, and metadata automatically.

- **Popular Targets**: News articles, Blog posts, Press releases

### AI Agent Tool

Provide agents with extremely fast web access. Perfect for RAG pipelines, autonomous research, and real-time data enrichment.

- **Popular Targets**: RAG pipelines, Autonomous agents, Real-time data

## Core Features

1. **AI-Powered Extraction**: Extract structured data from any website using advanced AI models
2. **No Proxies Needed**: Built-in proxy rotation and browser automation handle everything
3. **Simple API**: Just provide a URL and describe what data you want to extract
4. **Multi-Language SDKs**: Available for Python, JavaScript, and other major programming languages
5. **Smart Scraping**: Handles dynamic content, JavaScript rendering, and complex page structures
6. **Fast & Reliable**: Optimized for speed with 99.9% uptime and automatic retries
7. **Universal Extraction**: Support for static sites, dynamic websites, and Single Page Applications (React, Angular, Vue)
8. **Intelligent Processing**: Automatic detection of tables, lists, images, metadata, and semantic relationships
9. **SOC2 Compliant**: Certified security and compliance standards for enterprise-grade data protection

## MCP Server Integration

### Lightning-Fast Web Access for AI Agents

ScrapeGraphAI now offers an **MCP Server** (Model Context Protocol) for giving AI agents instant web access.
**Perfect For**:

- **RAG Pipelines**: Enhance retrieval-augmented generation with real-time web data
- **Autonomous Agents**: Empower agents with instant web access for independent research
- **Real-time Data**: Access live information for dynamic data enrichment and updates

**Compatible With**: Claude, OpenAI, and other major AI platforms

## Integrations

ScrapeGraphAI integrates seamlessly with popular automation platforms, AI frameworks, and development tools:

- **MCP Server**: Model Context Protocol for AI agents
- **Automation**: n8n, Zapier, Make
- **No-Code**: Bubble, Dify
- **AI Frameworks**: LangChain, LlamaIndex, CrewAI
- **Agent Tools**: Toolhouse, Composio
- **SDKs**: Python SDK, JavaScript SDK

## Data Output Formats

- **Structured JSON Output**: Consistent formats that simplify downstream system integration
- **CSV and Markdown Support**: Quick conversion for documentation and manual analysis
- **Schema Enforcement**: Define custom output schemas for consistent data structure
- **Metadata Included**: Execution time, tokens used, request IDs for tracking

## Authentication

- **API Key**: Include in the `Authorization: Bearer your-api-key` header
- **Rate Limits**: Use the `/status` endpoint to monitor usage in real time
- **Security**: Enterprise-grade security with SOC2 compliance and dedicated infrastructure options

## Usage Examples

### SmartScraper (Python)

```python
from pydantic import BaseModel
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")


# Illustrative output schema; the field names are assumptions, adapt them to your page
class ProductSchema(BaseModel):
    name: str
    price: float
    description: str


# Initialize the client with your API key
client = Client(api_key="your-scrapegraph-api-key")

# Extract product information from an e-commerce page
response = client.smartscraper(
    website_url="https://example-shop.com/product",
    user_prompt="Extract the product name, price, description, availability, and customer ratings",
)

# With output schema for structured data
response_with_schema = client.smartscraper(
    website_url="https://example-shop.com/product",
    user_prompt="Extract product details",
    output_schema=ProductSchema,
)

client.close()
```

### Markdownify (Python)

```python
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
client = Client(api_key="your-scrapegraph-api-key")

# Convert webpage to Markdown
response = client.markdownify(
    website_url="https://example.com"
)

client.close()
```

### SearchScraper (Python)

```python
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
client = Client(api_key="your-scrapegraph-api-key")

# Search and extract information across the web
response = client.searchscraper(
    user_prompt="Find the latest AI news and extract headlines, summaries, and sources"
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")
if response.get('reference_urls'):
    print(f"Reference URLs: {response['reference_urls']}")

client.close()
```

### JavaScript Example

```javascript
import { Client } from 'scrapegraph-js';

const client = new Client('your-scrapegraph-api-key');

const response = await client.smartscraper({
  website_url: 'https://example-shop.com/product',
  user_prompt: 'Extract the product name, price, description, availability, and customer ratings'
});

console.log(response);
```

### cURL Example

```bash
curl -X POST https://api.scrapegraphai.com/v1/smartscraper \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "website_url": "https://example-shop.com/product",
    "user_prompt": "Extract the product name, price, description, availability, and customer ratings"
  }'
```

## API Response Format

```json
{
  "request_id": "12345-67890-abcdef",
  "result": {
    "product_name": "Wireless Bluetooth Headphones",
    "price": 89.99,
    "in_stock": true,
    "rating": 4.5,
    "description": "Premium noise-cancelling headphones with 30-hour battery life",
    "customer_reviews": [
      {
        "rating": 5,
        "comment": "Excellent sound quality and comfortable fit"
      },
      {
        "rating": 4,
        "comment": "Great value for money, minor connectivity issues"
      }
    ]
  },
  "metadata": {
    "execution_time": 2.3,
    "tokens_used": 1250
  }
}
```

## Getting Started

1. **Sign up for free**: Get your API key at https://scrapegraphai.com
2. **Install SDK**: `pip install scrapegraph-py` or `npm install scrapegraph-js`
3. **Start scraping**: Use natural language to extract data from any website
4. **Scale as needed**: Upgrade to higher tiers as your scraping needs grow

## Support & Resources

- **Documentation**: Full API documentation at https://docs.scrapegraphai.com
- **GitHub**: 22k+ stars - https://github.com/VinciGit00/Scrapegraph-ai
- **Community Support**: Free plan includes community support
- **Email Support**: Available on Starter plan and above
- **Priority Support**: Growth plan includes priority support
- **Dedicated Support**: Pro and Enterprise plans include dedicated support
- **Contact**: contact@scrapegraphai.com

## Topics We Cover

Our content covers a comprehensive range of web scraping and data extraction topics:

### Tutorials & Guides

- **Getting Started**: Web scraping 101, ScrapeGraph tutorials, Python/JavaScript guides
- **Advanced Techniques**: Production pipelines, large-scale scraping, handling JavaScript-heavy sites
- **Platform-Specific Scrapers**: Amazon, eBay, LinkedIn, Instagram, YouTube, Reddit, TikTok, Google Maps, Walmart, Yahoo
- **Use Case Guides**: E-commerce price monitoring, real estate scraping, job postings, social media trends, healthcare data extraction, stock analysis, news aggregation

### Technical Deep Dives

- **AI-Powered Scraping**: LLM-based extraction, AI agents, intelligent data parsing
- **Integration Guides**: LangChain, LlamaIndex, CrewAI, n8n, Zapier, Make
- **Production Best Practices**: Zero-to-production pipelines, error handling, rate limiting, concurrent scraping
- **Technical Implementation**: JavaScript rendering, avoiding detection, proxy management, schema validation

### Comparisons & Alternatives

- **Platform Comparisons**: Firecrawl, Apify, Browserbase, Exa, Diffbot, Tavily, Linkup, Reworkd AI
- **Tool Alternatives**: Scrapy, BeautifulSoup, Selenium, Playwright alternatives
- **Competitive Analysis**: Feature comparisons, pricing analysis, use case fit

### Legal & Compliance

- **Legal Framework**: Web scraping legality, compliance best practices, ethical scraping
- **Enterprise Compliance**: SOC2 compliance, GDPR, data protection, terms of service compliance
- **Risk Management**: Avoiding legal issues, best practices for compliant scraping

### Industry Applications

- **E-commerce**: Price monitoring, product data extraction, competitor analysis
- **Real Estate**: Property listings, price tracking, market analysis
- **Healthcare**: Medical data extraction, research data collection
- **Finance**: Stock data, financial news, market intelligence
- **Social Media**: Profile extraction, trend analysis, brand monitoring
- **Travel & Hospitality**: Hotel data, flight information, booking platforms

### AI Agents & Automation

- **Agent Development**: Building AI agents, multi-agent systems, autonomous research
- **RAG Pipelines**: Retrieval-augmented generation, real-time data enrichment
- **Automation Workflows**: Cron jobs, scheduled scraping, automated data pipelines

## Key Blog Posts & Resources

### Getting Started

- **[Web Scraping 101](/blog/101-scraping)**: Complete beginner's guide to web scraping
- **[ScrapeGraph Tutorial](/blog/scrapegraph-tutorial)**: Step-by-step walkthrough of ScrapeGraphAI
- **[Python Web Scraping Guide](/blog/scrape-with-python)**: Python-specific scraping tutorial
- **[JavaScript Web Scraping](/blog/scrape-with-javascript)**: Node.js implementation guide

### Production & Scaling

- **[Zero to Production Scraping Pipeline](/blog/zero-to-production-scraping-pipeline)**: Building a 2.5M company dataset in 22 hours
- **[AI Scraping for Large-Scale Data](/blog/ai-large-scale)**: Enterprise-scale scraping strategies
- **[Master Production Best Practices](/blog/master-production-web-scraping-best-practices)**: Production-grade scraping patterns

### Platform-Specific Guides

- **[Best Amazon Scraper](/blog/best-amazon-scraper)**: E-commerce scraping guide
- **[Real Estate Scraping](/blog/real-estate-scraping)**: Property data extraction
- **[Instagram Data Extractor](/blog/instagram-data-extractor)**: Social media scraping
- **[LinkedIn Lead Generation](/blog/linkedin-lead-generation)**: Professional network data extraction
- **[Job Posting Scraping](/blog/7-best-tools-for-scraping-job-postings-in-2026)**: Employment data collection

### AI & Integrations

- **[Integrating ScrapeGraph into Intelligent Agents](/blog/integrating-scrapegraph-into-intelligent-agents)**: Agent-based scraping
- **[ScrapeGraphAI LlamaIndex Integration](/blog/scrapegraphai-llamaindex-integration)**: RAG pipeline integration
- **[CrewAI Integration](/blog/scrapegraphai-crewai-integration)**: Multi-agent systems
- **[LangChain Integration](/blog/sgai-langchain)**: LLM workflow integration
- **[MCP Server Tutorial](/blog/mcp-tutorial)**: Model Context Protocol for AI agents

### Legal & Compliance

- **[Is Web Scraping Legal?](/blog/is-web-scraping-legal)**: Legal considerations and framework
- **[Compliance Web Scraping](/blog/compliance-web-scraping)**: Stay compliant with regulations
- **[Compliance-First Web Scraping](/blog/death-x-path)**: Enterprise legal framework

### Advanced Topics

- **[Handling Heavy JavaScript](/blog/handling-heavy-javascript)**: JavaScript rendering strategies
- **[SmartCrawler Introduction](/blog/smartcrawler-introduction)**: Multi-page crawling
- **[Markdownify Guide](/blog/markdownify)**: Convert websites to markdown
- **[SearchScraper](/blog/searchscraper)**: Multi-page search capabilities
- **[Scraping Without Proxies](/blog/scraping-without-proxies)**: No-proxy scraping solutions

### Use Cases & Case Studies

- **[Healthcare Data Extraction](/blog/healthcare-data-extraction)**: Medical data collection guide
- **[Stock Analysis](/blog/stock-analysis)**: Financial data scraping
- **[Social Media Trends](/blog/social-media-trends)**: Social platform data analysis
- **[News Aggregation](/blog/news-aggregation)**: Multi-source news collection

### Comparisons

- **[ScrapeGraph vs Firecrawl](/blog/scrapegraph-vs-firecrawl)**: AI scraping platform comparison
- **[ScrapeGraph vs Apify](/blog/scrapegraph-vs-apify)**: Platform feature analysis
- **[ScrapeGraph vs Browserbase](/blog/scrapegraph-vs-browserbase)**: Browser automation comparison
- **[Best AI Web Scraping Tools](/blog/7-best-ai-web-scraping-tools)**: Comprehensive tool comparison

## Website

**Main Website**: https://scrapegraphai.com

### Key Pages

- **Home**: https://scrapegraphai.com - Product overview and getting started
- **Pricing**: https://scrapegraphai.com/pricing - Subscription plans and credit packages
- **Documentation**: https://docs.scrapegraphai.com - Complete API documentation
- **Blog**: https://scrapegraphai.com/blog - All articles, tutorials, and guides
- **Compare**: https://scrapegraphai.com/compare - Tool comparisons and alternatives
- **For Startups**: https://scrapegraphai.com/for-startups - Startup program information

### Developer Resources

- **GitHub**: https://github.com/VinciGit00/Scrapegraph-ai - Open source repository (22k+ stars)
- **Python SDK**: `pip install scrapegraph-py`
- **JavaScript SDK**: `npm install scrapegraph-js`
- **API Base URL**: https://api.scrapegraphai.com/v1

### Community & Support

- **Discord**: https://discord.gg/scrapegraphai - Community support and discussions
- **Twitter/X**: https://twitter.com/scrapegraphai - Updates and announcements
- **LinkedIn**: https://www.linkedin.com/company/101881123/ - Company updates
- **Medium**: https://medium.com/@scrapegraphai - Technical articles and insights
- **Email**: contact@scrapegraphai.com - Support and sales inquiries

### Additional Resources

- **Status Page**: https://scrapegraphai.com/api-status - API status and uptime monitoring
- **Manifesto**: https://github.com/ScrapeGraphAI/ScrapeGraphAI-manifesto - Project vision and principles
- **Privacy Policy**: https://scrapegraphai.com/privacy
- **Terms of Service**: https://scrapegraphai.com/terms

## Why ScrapeGraphAI?

- **Built for AI Agents**: Native support for RAG pipelines, autonomous agents, and real-time data enrichment
- **No Infrastructure Management**: No need to manage proxies, browsers, or servers
- **AI-Powered Intelligence**: Automatically adapts to any website structure
- **Proven at Scale**: 40M+ webpages extracted, trusted by 1M+ users
- **Simple Pricing**: All-inclusive credits cover both fetching and AI processing
- **Fast Integration**: Get started in minutes with Python, JavaScript, or cURL
- **Enterprise Ready**: SOC2 compliant with dedicated infrastructure, high-speed scraping, and unlimited cron jobs

## Enterprise Features

For organizations with demanding scraping needs:

- **SOC2 Compliance**: Certified security standards for enterprise data protection
- **Custom Credits**: Personalized credit allocations based on your needs
- **Custom Rate Limits**: Tailored throughput for your specific use case
- **Unlimited Cron Jobs**: Schedule as many automated scraping tasks as needed
- **Dedicated Infrastructure**: Isolated resources for maximum performance
- **High Speed Scraping**: Priority processing and optimized execution
- **Dedicated Support**: Direct access to engineering team
- **SLA Guarantees**: Uptime and performance guarantees
- **Bulk Discounts**: Volume pricing for large-scale operations

Contact sales for custom enterprise solutions.
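## Appendix: Estimating Credits and Run Time

The per-endpoint credit costs and the plan limits listed earlier can be combined into a rough budget check before starting a job. The sketch below is not part of any ScrapeGraphAI SDK: `estimate`, `CREDITS_PER_PAGE`, and `PLANS` are hypothetical names, the figures are copied from the tables in this document, and it assumes one API request per page, which the source does not state.

```python
import math

# Credits per page, copied from the "API Endpoints" section of this document.
CREDITS_PER_PAGE = {
    "smartscraper": 2,
    "searchscraper": 20,
    "smartcrawler": 2,
    "markdownify": 1,
    "agenticscraper": 50,
    "scrape": 1,
    "sitemap": 1,
}

# Credit allowance and rate limit (requests/min) per plan, from the plans table.
PLANS = {
    "free": {"credits": 50, "requests_per_min": 10},
    "starter": {"credits": 60_000, "requests_per_min": 30},
    "growth": {"credits": 480_000, "requests_per_min": 60},
    "pro": {"credits": 3_000_000, "requests_per_min": 200},
}


def estimate(endpoint: str, pages: int, plan: str) -> dict:
    """Hypothetical helper: estimate credit cost and minimum run time.

    Assumes one request per page (an assumption, not a documented rule).
    """
    credits = CREDITS_PER_PAGE[endpoint] * pages
    limits = PLANS[plan]
    return {
        "credits": credits,
        "fits_in_plan": credits <= limits["credits"],
        # Lower bound on wall-clock minutes imposed by the plan's rate limit.
        "min_minutes": math.ceil(pages / limits["requests_per_min"]),
    }


if __name__ == "__main__":
    print(estimate("smartscraper", 10_000, "starter"))
```

Under these assumptions, 10,000 SmartScraper pages on the Starter plan cost 20,000 credits, well inside the 60,000-credit allowance, and the 30 requests/min limit puts a floor of about 334 minutes on the run.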