Web Scraping with Pydantic and ScrapeGraphAI

2 min read · Tutorials
I'm excited to share a simple way to scrape web data using Pydantic schemas. This approach makes your code cleaner and your data more reliable.

Why It Matters

Using Pydantic helps you:

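  • Keep Data Consistent: Every record follows the same typed schema.
  • Catch Errors Early: Missing fields and wrong types are reported as soon as the data is validated, not later in your pipeline.
  • Easily Update Your Code: The schema is the single place to change when the data you need changes.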
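To make "catch errors early" concrete, here is a minimal offline sketch using only Pydantic v2 (no scraping involved). It reuses the three fields of the WebpageSchema model from the example below and validates a record that is missing summary and has a number where a string is expected. The bad_record data is made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


# Illustrative bad input: "title" is an int and "summary" is missing
bad_record = {"title": 42, "description": "Some text"}

errors = []
try:
    WebpageSchema.model_validate(bad_record)
except ValidationError as exc:
    # Each error names the offending field (loc) and the kind of failure (type)
    errors = [(err["loc"], err["type"]) for err in exc.errors()]

for loc, kind in errors:
    print(loc, kind)
```

Pydantic v2 does not silently turn the integer into a string here, so both problems surface in a single ValidationError instead of leaking into later processing.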

How It Works

We combine Pydantic with ScrapeGraphAI to define exactly what data we need. Here's an example:

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

# Define the schema of the data to extract
class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")

# Initialize the client (replace the placeholder with your own API key)
sgai_client = Client(api_key="your-api-key-here")

try:
    # Make a scraping request with the schema
    response = sgai_client.smartscraper(
        website_url="https://example.com",
        user_prompt="Extract webpage information",
        output_schema=WebpageSchema,
    )

    print(f"Request ID: {response['request_id']}")
    print(f"Result: {response['result']}")
finally:
    # Always close the client, even if the request raises
    sgai_client.close()
```
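The Field descriptions are more than comments. Pydantic v2 can export the model as JSON Schema with model_json_schema(), and the descriptions are carried into it. We assume, without showing the SDK internals, that a schema like this is what the client derives from output_schema; the sketch below only demonstrates the Pydantic side and runs offline.

```python
import json

from pydantic import BaseModel, Field


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


# Export the model as a JSON Schema dict (Pydantic v2 API)
schema = WebpageSchema.model_json_schema()

# Field descriptions appear under "properties"; fields without defaults are "required"
print(json.dumps(schema, indent=2))
```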

Example Response

Here's what the extracted data might look like:

```json
{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples in documents.",
  "summary": "A placeholder website used for documentation and testing purposes."
}
```
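Because the result has the same shape as the schema, you can turn it into a typed object. A minimal sketch, assuming the JSON above is available as a string; if your result arrives as a dict (as response['result'] is printed in the earlier example), use WebpageSchema.model_validate instead of model_validate_json.

```python
from pydantic import BaseModel, Field


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


# The example response from above, as raw JSON text
raw = """{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples in documents.",
  "summary": "A placeholder website used for documentation and testing purposes."
}"""

# Parse and validate in one step; raises ValidationError if the shape is wrong
page = WebpageSchema.model_validate_json(raw)

# Fields are now typed attributes instead of dictionary keys
print(page.title)
```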

Benefits

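  • Automatic Data Checking: Fields and types are defined once, in the schema, and Pydantic checks data against them.
  • Developer Friendly: Parsing and error handling come from Pydantic instead of hand-written checks.
  • Easy Integration: Validated models convert to plain dicts or JSON for the rest of your code.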
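On the integration point: once the data is a Pydantic object, handing it to other code takes one method call. A short sketch with Pydantic v2's model_dump and model_dump_json, using the values from the example response above.

```python
from pydantic import BaseModel, Field


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


page = WebpageSchema(
    title="Example Domain",
    description="This domain is for use in illustrative examples in documents.",
    summary="A placeholder website used for documentation and testing purposes.",
)

# Plain dict, e.g. for a DataFrame row or a database insert
row = page.model_dump()

# Compact JSON string, e.g. for a file or a message queue
line = page.model_dump_json()
print(line)

# Round trip: the JSON validates back into an equal model
restored = WebpageSchema.model_validate_json(line)
```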

Getting Started

  1. Define Your Schema: Create a Pydantic model for your data.
  2. Set Up the Client: Initialize the ScrapeGraphAI client with your API key.
  3. Scrape Data: Call the smartscraper method with your schema to get structured data.
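The three steps can be wrapped into one reusable function. This is a sketch, not part of the SDK: scrape_with_schema, SupportsSmartScraper and FakeClient are hypothetical names, and the code assumes only what the example above shows, namely that client.smartscraper(website_url=..., user_prompt=..., output_schema=...) returns a dict-like response with a "result" key. Taking the client as a parameter lets the function run offline against a stand-in.

```python
from typing import Any, Protocol, Type, TypeVar

from pydantic import BaseModel, Field

T = TypeVar("T", bound=BaseModel)


class SupportsSmartScraper(Protocol):
    """Hypothetical interface: anything with a smartscraper method like the example's."""

    def smartscraper(self, *, website_url: str, user_prompt: str, output_schema: Any) -> Any: ...


def scrape_with_schema(client: SupportsSmartScraper, url: str, prompt: str, schema: Type[T]) -> T:
    """Hypothetical helper: request data for url and validate it against schema."""
    response = client.smartscraper(
        website_url=url,
        user_prompt=prompt,
        output_schema=schema,
    )
    # Re-check the result locally so a malformed answer fails here, not downstream
    return schema.model_validate(response["result"])


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


class FakeClient:
    """Offline stand-in returning a canned response (illustrative data only)."""

    def smartscraper(self, *, website_url, user_prompt, output_schema):
        return {"request_id": "test", "result": {"title": "T", "description": "D", "summary": "S"}}


page = scrape_with_schema(FakeClient(), "https://example.com", "Extract webpage information", WebpageSchema)
print(page)
```

To run it against the real service, pass a Client(api_key=...) from scrapegraph_py in place of FakeClient and close it afterwards, as in the example above.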

Breaking Down the Code

  1. Schema Definition
    We create a Pydantic model that defines the structure of the data we want to extract.

  2. Client Setup
    Initialize the ScrapeGraphAI client with your API key.

  3. Making the Request
    Use the smartscraper method with your schema to extract structured data.

  4. Processing Results
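    The response is indexed like a dictionary: request_id identifies the request, and result holds the extracted data shaped by your schema. For Pydantic's own checks on your side, pass result to WebpageSchema.model_validate.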
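When you scrape more than one page, processing results usually means keeping the records that match the schema and setting the rest aside. A minimal sketch with Pydantic v2; the results dict is hypothetical input standing in for the result values of several requests, and the second entry is deliberately incomplete.

```python
from pydantic import BaseModel, Field, ValidationError


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


# Hypothetical results, e.g. collected from several smartscraper calls
results = {
    "https://example.com": {"title": "Example Domain", "description": "Demo page.", "summary": "Placeholder site."},
    "https://example.org": {"title": "Example Org"},  # incomplete answer
}

pages: dict[str, WebpageSchema] = {}
failures: dict[str, list[str]] = {}

for url, result in results.items():
    try:
        pages[url] = WebpageSchema.model_validate(result)
    except ValidationError as exc:
        # Keep only the names of the fields that failed
        failures[url] = [str(err["loc"][0]) for err in exc.errors()]

print(f"{len(pages)} valid, {len(failures)} failed: {failures}")
```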

Conclusion

Using Pydantic with ScrapeGraphAI simplifies web scraping and improves data quality. Give it a try in your own data extraction workflow.

Happy scraping!
