Web Scraping with Pydantic and ScraperGraphAI

I'm excited to share a simple way to scrape web data using Pydantic schemas. This approach makes your code cleaner and your data more reliable.
Why It Matters
Using Pydantic helps you:
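- Keep Data Consistent: Every extracted record is checked against the fields declared in your schema.
- Catch Errors Early: A missing or malformed field is reported as soon as the data is validated, not later in your pipeline.
- Easily Update Your Code: The schema is the one place where the expected output is defined, so changing what you extract means changing one class.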
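To make the "catch errors early" point concrete, here is a minimal sketch that uses Pydantic on its own, with no scraping involved. The model mirrors the WebpageSchema used in this article's example; the helper name `missing_or_invalid_fields` and the sample records are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


def missing_or_invalid_fields(record: dict) -> list:
    """Return the names of fields that fail validation (empty if all pass).

    Hypothetical helper, not part of Pydantic or any scraping library.
    """
    try:
        WebpageSchema(**record)
    except ValidationError as exc:
        # Each error entry carries a "loc" tuple naming the offending field.
        return [str(error["loc"][0]) for error in exc.errors()]
    return []


# Illustrative records: one complete, one missing two required fields.
complete = {
    "title": "Example Domain",
    "description": "A short description.",
    "summary": "A short summary.",
}
incomplete = {"title": "Example Domain"}

print(missing_or_invalid_fields(complete))
print(missing_or_invalid_fields(incomplete))
```

The incomplete record is rejected at construction time, so the problem shows up where the data enters your code.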
How It Works
We combine Pydantic with ScraperGraphAI to define exactly what data we need. Here's an example:
```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client


# Define the schema
class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


# Initialize the client
sgai_client = Client(api_key="your-api-key-here")

# Make a scraping request with the schema
response = sgai_client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract webpage information",
    output_schema=WebpageSchema,
)

print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")

sgai_client.close()
```
Example Response
Here's what the extracted data might look like:
```json
{
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples in documents.",
  "summary": "A placeholder website used for documentation and testing purposes."
}
```
Benefits
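- Automatic Data Checking: Extracted data is validated against your schema without extra checking code.
- Developer Friendly: You work with named fields instead of parsing raw HTML by hand.
- Easy Integration: The schema is an ordinary Pydantic model, so it can be shared with the rest of your Python code.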
Getting Started
- Define Your Schema: Create a Pydantic model for your data.
- Set Up the Client: Initialize ScraperGraphAI with your API key.
- Scrape Data: Use the smartscraper endpoint to get validated data.
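For the client setup step, one common pattern is to read the API key from the environment instead of hard-coding it. The sketch below uses only the standard library; the variable name `SGAI_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration, so adjust them to whatever your project uses.

```python
import os


def load_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is absent.

    The default variable name is an assumption, not a documented requirement.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key


# Intended use with the client from the example above (not executed here):
# sgai_client = Client(api_key=load_api_key())
```

Keeping the key out of the source file means the script can be committed and shared without leaking credentials.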
Breaking Down the Code
- Schema Definition: We create a Pydantic model that defines the structure of the data we want to extract.
- Client Setup: Initialize the ScraperGraphAI client with your API key.
- Making the Request: Use the smartscraper method with your schema to extract structured data.
- Processing Results: The response includes validated data matching your schema.
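As a sketch of the result-processing step, the code below turns the `result` entry of a response into a typed object. It assumes the response is a dictionary whose `result` value is itself a dictionary shaped like the example response above; `parse_result` and the stand-in `sample_response` are hypothetical and do not come from the ScraperGraphAI client.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")


def parse_result(response: dict) -> Optional[WebpageSchema]:
    """Build a WebpageSchema from response["result"], or return None if it does not fit."""
    try:
        return WebpageSchema(**response["result"])
    except (KeyError, TypeError, ValidationError):
        # KeyError: no "result" entry; TypeError: "result" is not a mapping;
        # ValidationError: a required field is missing or invalid.
        return None


# Stand-in for a real response, reusing the example values from this article.
sample_response = {
    "request_id": "demo-request",
    "result": {
        "title": "Example Domain",
        "description": "This domain is for use in illustrative examples in documents.",
        "summary": "A placeholder website used for documentation and testing purposes.",
    },
}

page = parse_result(sample_response)
if page is not None:
    print(page.title)
```

Returning None for anything that does not match keeps the calling code to a single check; raising the exception instead would be the better choice if a bad result should stop the run.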
Conclusion
Using Pydantic with ScraperGraphAI simplifies web scraping and improves data quality. Give it a try to enhance your data extraction process.
Happy scraping!