Enhancing Your AI Agents with ScrapeGraphAI & LangChain: Advanced Prompt Examples

Learn how to enhance your AI agents with ScrapeGraphAI & LangChain. This guide walks through setting up LangChain agents that utilize ScrapeGraphAI to perform real-time web scraping tasks, including dynamic data extraction, intelligent structuring via LLMs, and integrated reasoning.

Tutorials · 5 min read · By Mohammad Ehsan Ansari
In today’s rapidly changing world, relying on static data is not enough. AI agents must be able to fetch live, relevant, and structured data from the web in real time. ScrapeGraphAI and LangChain together form a powerful combination to bring this capability to life. With LangChain's orchestration and ScrapeGraphAI's prompt-based intelligent scraping, developers can now build agents that think, act, and fetch.

The sections below cover tool setup, prompt and schema design, worked examples, configuration tips, and a before-and-after comparison of answer quality.


Why Combine ScrapeGraphAI and LangChain?

LangChain provides a framework for building AI-powered chains and agents with memory, tools, and external data integrations. ScrapeGraphAI brings language model-powered web scraping capabilities that remove the need for brittle XPath, CSS selectors, or custom scripts.

Combining them results in:

  • Agents that retrieve fresh data on-the-fly
  • Tools for summarizing or transforming real-time scraped data
  • More accurate responses that reflect the current state of the internet

Common Use Cases

  • AI assistants retrieving latest stock prices
  • Market analysis bots comparing product prices
  • Academic research agents gathering fresh government stats
  • Customer service bots checking real-time inventory or news
  • Live fact-checking tools for journalists

Step 1: Install the SDKs

Use pip to install the libraries, then download the browser binaries that ScrapeGraphAI uses to fetch pages:

bash
pip install scrapegraphai langchain langchain-openai
playwright install

Step 2: Create a ScrapeGraphAI Wrapper as a Tool

We will now build a LangChain tool that wraps ScrapeGraphAI’s SmartScraperGraph to scrape a webpage using prompts and return structured output.

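python
import json

from langchain.tools import Tool
from pydantic import BaseModel
from scrapegraphai.graphs import SmartScraperGraph


# Assumes a recent ScrapeGraphAI 1.x release: `schema` is a Pydantic model,
# the provider is part of the model name, and run() returns the extracted data.
class Article(BaseModel):
    headline: str
    summary: str


def scrape_with_prompt(tool_input: str) -> str:
    # A ReAct agent passes tool input as a single string, so expect JSON text
    # of the form {"url": "...", "prompt": "..."}.
    data = json.loads(tool_input)

    graph = SmartScraperGraph(
        prompt=data["prompt"],
        source=data["url"],
        schema=Article,
        config={
            "llm": {
                "api_key": "your-api-key",
                "model": "openai/gpt-4",
            }
        },
    )
    # Return a string so the agent can read the result as an observation.
    return json.dumps(graph.run())


scraper_tool = Tool(
    name="LiveScraper",
    func=scrape_with_prompt,
    description=(
        "Scrapes web content with ScrapeGraphAI. Input must be a JSON object "
        'with the keys "url" and "prompt".'
    ),
)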

Step 3: Initialize Agent with Scraper Tool

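python
# Requires the langchain-openai package (pip install langchain-openai).
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

llm = ChatOpenAI(model="gpt-4", temperature=0)

# initialize_agent is LangChain's legacy agent constructor; this assumes an
# installed LangChain version that still provides it.
agent = initialize_agent(
    tools=[scraper_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)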

Step 4: Run Real-Time Scraping with Agent

python
# The agent takes a natural-language request and decides how to call the tool.
result = agent.run(
    "Use the LiveScraper tool on https://example.com/latest-news "
    "to extract the latest headline and a short summary."
)
print(result)

Advanced Example: Scraping Product Pricing

python
from pydantic import BaseModel
from scrapegraphai.graphs import SmartScraperGraph


# Assumes a recent ScrapeGraphAI 1.x release, where `schema` is a Pydantic model.
class Product(BaseModel):
    product_title: str
    price: str


prompt = "Extract product title and price from this product detail page"

graph = SmartScraperGraph(
    prompt=prompt,
    source="https://example.com/product-page",
    schema=Product,
    config={
        "llm": {
            "api_key": "your-api-key",
            "model": "openai/gpt-4",
        }
    },
)

result = graph.run()  # returns the extracted data directly
print(result)
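
Because the schema returns `price` as a string, comparing products usually requires a numeric value first. A minimal sketch of one way to normalize it, assuming prices use a dot as the decimal separator and commas only as thousands separators. `parse_price` is a hypothetical helper, not part of ScrapeGraphAI:

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(raw: str) -> Optional[Decimal]:
    """Return the first number found in a scraped price string, or None.

    Assumes "1,299.00"-style formatting: commas group thousands and a dot
    marks the decimal part. Other locales need their own handling.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are later compared or summed.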

YAML-Based Configuration for Reusability

```yaml
llm:
  provider: openai
  api_key: YOUR_KEY
  model: gpt-4

schema:
  product_title: string
  price: string

prompt: Extract product title and price
source: https://example.com/product
```
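
A file like this still has to be read and split into the pieces `SmartScraperGraph` expects. A minimal sketch, assuming PyYAML is installed and treating the layout above as a project-specific format rather than something ScrapeGraphAI reads natively. `split_settings` and `load_settings` are hypothetical helpers:

```python
from typing import Any, Dict, Tuple


def split_settings(settings: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate LLM settings from the scraping job described in the file."""
    missing = [key for key in ("llm", "prompt", "source") if key not in settings]
    if missing:
        raise ValueError(f"settings file is missing: {', '.join(missing)}")
    graph_config = {"llm": dict(settings["llm"])}
    job = {
        "prompt": settings["prompt"],
        "source": settings["source"],
        "schema_fields": dict(settings.get("schema", {})),
    }
    return graph_config, job


def load_settings(path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Read a YAML file shaped like the one above (requires PyYAML)."""
    import yaml  # imported lazily so split_settings works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return split_settings(yaml.safe_load(handle))
```

The keys under `llm` are passed through unchanged, so they need to match what your ScrapeGraphAI version accepts, and `schema_fields` still has to be turned into the schema object that version expects.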

Accuracy Before & After Using ScrapeGraphAI

| Question | Without Real-Time | With ScrapeGraphAI |
| --- | --- | --- |
| What’s the latest iPhone price? | Incorrect guess | Live scraped data |
| What’s trending on government portal? | No results | Fetched and summarized |
| What’s the current weather alert? | Generic output | Live structured info |

Best Practices

  • Log all scraped URLs, timestamps, and schemas used
  • Use caching to avoid redundant calls to the same site
  • Create prompt-schema pairs for reusable scraping agents
  • Respect robots.txt and site rate limits
  • Handle failures gracefully with fallback messages
  • Validate outputs before using them in final answers
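
Several of these practices can live in one thin wrapper around the scrape call. The sketch below combines caching, logging, and a fallback message. It assumes the wrapped function takes `(url, prompt)` and returns a string; `CachedScraper` and `FALLBACK_MESSAGE` are hypothetical names, not part of either library:

```python
import logging
import time
from typing import Callable, Dict, Tuple

logger = logging.getLogger("scraper")

FALLBACK_MESSAGE = "Live data is unavailable right now."


class CachedScraper:
    """Wrap a scrape function with a time-based cache, logging, and a fallback."""

    def __init__(
        self,
        scrape: Callable[[str, str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scrape = scrape
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def __call__(self, url: str, prompt: str) -> str:
        key = (url, prompt)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            logger.info("cache hit url=%s", url)
            return cached[1]
        try:
            result = self._scrape(url, prompt)
        except Exception:
            # Log the failure and degrade to a fixed message instead of raising.
            logger.exception("scrape failed url=%s", url)
            return FALLBACK_MESSAGE
        logger.info("scraped url=%s prompt=%r", url, prompt)
        self._cache[key] = (now, result)
        return result
```

Timestamps appear in the log once logging is configured with a format that includes `%(asctime)s`. Failed calls are deliberately not cached, so the next request retries the site.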

Frequently Asked Questions

Can I use it with other LLM providers?

Yes, ScrapeGraphAI supports OpenAI, Groq, Mistral, and others via its flexible configuration.
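
Switching providers then comes down to changing the `llm` section of the config. A minimal sketch, assuming the "provider/model" naming used by recent ScrapeGraphAI releases. `build_llm_config` is a hypothetical helper and the Groq model name is a placeholder:

```python
from typing import Dict, Optional


def build_llm_config(
    provider: str, model: str, api_key: Optional[str] = None
) -> Dict[str, Dict[str, str]]:
    """Build a graph config dict using "provider/model" naming (an assumption)."""
    llm = {"model": f"{provider}/{model}"}
    if api_key is not None:  # some setups may not need a key
        llm["api_key"] = api_key
    return {"llm": llm}


openai_config = build_llm_config("openai", "gpt-4", api_key="your-openai-key")
groq_config = build_llm_config("groq", "your-groq-model", api_key="your-groq-key")
```

Check the provider prefixes and model names against the documentation for your installed version before relying on them.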

What happens when the HTML structure changes?

ScrapeGraphAI uses LLMs to adapt to layout shifts and interpret content semantically, unlike brittle CSS selectors.

Can I extract complex tables?

Yes. Define a schema to match table rows and columns, and use prompts that explain what the table represents.
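
Once the rows come back, it helps to check them against the expected columns before using them. A minimal standard-library sketch, assuming the scraper returns the table as a list of dictionaries, one per row. The column names and `validate_rows` are hypothetical:

```python
from typing import Any, Dict, Iterable, List, Tuple

# Expected column name -> Python type (hypothetical columns for a statistics table).
EXPECTED_COLUMNS: Dict[str, type] = {"region": str, "year": int, "population": int}


def validate_rows(
    rows: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Split scraped rows into valid rows and human-readable error messages."""
    valid: List[Dict[str, Any]] = []
    errors: List[str] = []
    for index, row in enumerate(rows):
        missing = [name for name in EXPECTED_COLUMNS if name not in row]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
            continue
        wrong = [
            name
            for name, expected in EXPECTED_COLUMNS.items()
            if not isinstance(row[name], expected)
        ]
        if wrong:
            errors.append(f"row {index}: wrong type for {', '.join(wrong)}")
            continue
        valid.append(row)
    return valid, errors
```

Rows that fail the check are reported rather than silently dropped, which makes it easier to tell a layout change on the site from a one-off extraction error.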

Can I scrape behind authentication?

ScrapeGraphAI is primarily designed for open-access content. Advanced setups can enable browser sessions if needed.


Conclusion

Integrating ScrapeGraphAI with LangChain empowers your AI agents to access, scrape, and structure real-time web data with precision. Whether building research bots, news aggregators, or product monitoring tools, this integration fills the last-mile gap between language understanding and live data retrieval. Let your AI agents not just think—but think with context and data that’s current, accurate, and aligned with the real world.

