Building a Full-Stack AI Web App with Cursor, OpenAI o1, Vercel v0, ScrapeGraphAI, and Patched

Marco Vinciguerra

In this tutorial, we'll demonstrate how to rapidly develop a full-stack AI web application by integrating several powerful tools: Cursor, OpenAI's o1 model, Vercel's v0, ScrapeGraphAI, and Patched. This combination allows for efficient prototyping and deployment of AI-driven applications.

Tools Overview

  • Cursor: An AI-enhanced code editor that assists in writing and understanding code.
  • OpenAI o1 Model: OpenAI's reasoning-focused language model, well suited to analyzing data and producing structured insights.
  • Vercel v0: Vercel's AI-powered UI generator that turns natural-language descriptions into React components.
  • ScrapeGraphAI: An AI-powered web scraping tool that simplifies data extraction from websites.
  • Patched: A tool for managing and deploying AI agents in production environments.

Step 1: Set Up Your Development Environment

Begin by installing the necessary tools and setting up your development environment.

# Install the ScrapeGraphAI Python SDK (provides the scrapegraph_py client used in Step 2)
pip install scrapegraph-py
 
# Optional: install the open-source scrapegraphai library and Playwright
# if you also want to run scraping pipelines locally
pip install scrapegraphai
pip install playwright
playwright install

Step 2: Create a ScrapeGraphAI Pipeline

Use ScrapeGraphAI to extract data from a target website. Here's an example of how to set up a simple scraping pipeline:

from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger
 
sgai_logger.set_logging(level="INFO")
 
# Initialize the client
sgai_client = Client(api_key="your-api-key-here")
 
# SmartScraper request
response = sgai_client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract webpage information"
)
 
# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")
if response.get('reference_urls'):
    print(f"Reference URLs: {response['reference_urls']}")

Step 3: Design Your Frontend with Vercel v0

Vercel v0 lets you quickly generate UI components and layouts from natural-language prompts. Describe the interface you want on v0.dev (for example, "Create a dashboard with data visualization cards"), then pull the generated component into your project with the command v0 gives you:

# Add a generated component to your project (v0 shows the exact command and component ID)
npx v0@latest add <component-id>

This gives you React components that you can customize for your application.
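
The exact output depends on your prompt, but a generated dashboard card typically looks something like the sketch below. This is a hypothetical example (the StatCard name and its props are illustrative, not actual v0 output); it uses the same shadcn/ui primitives imported in Step 6.

// components/StatCard.tsx - illustrative example of a v0-style dashboard card
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
 
interface StatCardProps {
  title: string;  // label shown in the card header
  value: string;  // headline metric rendered in the card body
}
 
export function StatCard({ title, value }: StatCardProps) {
  return (
    <Card>
      <CardHeader>
        <CardTitle>{title}</CardTitle>
      </CardHeader>
      <CardContent>
        <p className="text-3xl font-bold">{value}</p>
      </CardContent>
    </Card>
  );
}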

Step 4: Integrate OpenAI o1 Model

Create an AI service that processes the scraped data:

import openai
from typing import Dict, Any
 
class AIAnalyzer:
    def __init__(self, api_key: str):
        self.client = openai.OpenAI(api_key=api_key)
    
    def analyze_scraped_data(self, data: Dict[str, Any]) -> str:
        """Analyze scraped data using OpenAI o1 model"""
        
        prompt = f"""
        Analyze the following scraped data and provide insights:
        
        Data: {data}
        
        Please provide:
        1. Key findings
        2. Patterns or trends
        3. Actionable recommendations
        """
        
        response = self.client.chat.completions.create(
            model="o1-preview",
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        
        return response.choices[0].message.content
 
# Usage: scraped_data is the `result` dict returned by the SmartScraper request in Step 2
analyzer = AIAnalyzer(api_key="your-openai-key")
insights = analyzer.analyze_scraped_data(scraped_data)

Step 5: Build the Backend API

Create a FastAPI backend that orchestrates all components:

import os
 
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
from scrapegraph_py import Client
 
app = FastAPI(title="AI Web Scraping App")
 
# Initialize the ScrapeGraphAI client once, reading the API key from the environment
sgai_client = Client(api_key=os.environ["SCRAPEGRAPH_API_KEY"])
 
class ScrapeRequest(BaseModel):
    url: str
    prompt: str
    analyze: Optional[bool] = True
 
class ScrapeResponse(BaseModel):
    data: dict
    analysis: Optional[str] = None
    request_id: str
 
@app.post("/scrape", response_model=ScrapeResponse)
async def scrape_and_analyze(request: ScrapeRequest):
    """Scrape data and optionally analyze it"""
    
    try:
        # Step 1: Scrape data
        scrape_response = sgai_client.smartscraper(
            website_url=request.url,
            user_prompt=request.prompt
        )
        
        analysis = None
        if request.analyze:
            # Step 2: Analyze data with AI
            # AIAnalyzer is the helper class from Step 4
            analyzer = AIAnalyzer(api_key=os.environ["OPENAI_API_KEY"])
            analysis = analyzer.analyze_scraped_data(scrape_response['result'])
        
        return ScrapeResponse(
            data=scrape_response['result'],
            analysis=analysis,
            request_id=scrape_response['request_id']
        )
        
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
 
@app.get("/health")
async def health_check():
    return {"status": "healthy"}

Step 6: Frontend Implementation with React and Next.js

Create a React frontend that interacts with your API:

// components/ScrapingDashboard.tsx
// Mark this as a Client Component if you use the Next.js App Router
'use client';
 
import React, { useState } from 'react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
import { Textarea } from '@/components/ui/textarea';
 
interface ScrapeData {
  data: any;
  analysis?: string;
  request_id: string;
}
 
export default function ScrapingDashboard() {
  const [url, setUrl] = useState('');
  const [prompt, setPrompt] = useState('');
  const [loading, setLoading] = useState(false);
  const [result, setResult] = useState<ScrapeData | null>(null);
 
  const handleScrape = async () => {
    setLoading(true);
    try {
      const response = await fetch('/api/scrape', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url,
          prompt,
          analyze: true
        }),
      });
 
      const data = await response.json();
      setResult(data);
    } catch (error) {
      console.error('Scraping failed:', error);
    } finally {
      setLoading(false);
    }
  };
 
  return (
    <div className="max-w-4xl mx-auto p-6 space-y-6">
      <Card>
        <CardHeader>
          <CardTitle>AI Web Scraper</CardTitle>
        </CardHeader>
        <CardContent className="space-y-4">
          <Input
            placeholder="Enter URL to scrape"
            value={url}
            onChange={(e) => setUrl(e.target.value)}
          />
          <Textarea
            placeholder="Describe what data you want to extract"
            value={prompt}
            onChange={(e) => setPrompt(e.target.value)}
            rows={3}
          />
          <Button 
            onClick={handleScrape} 
            disabled={loading || !url || !prompt}
          >
            {loading ? 'Scraping...' : 'Scrape & Analyze'}
          </Button>
        </CardContent>
      </Card>
 
      {result && (
        <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
          <Card>
            <CardHeader>
              <CardTitle>Scraped Data</CardTitle>
            </CardHeader>
            <CardContent>
              <pre className="text-sm overflow-auto max-h-96">
                {JSON.stringify(result.data, null, 2)}
              </pre>
            </CardContent>
          </Card>
 
          {result.analysis && (
            <Card>
              <CardHeader>
                <CardTitle>AI Analysis</CardTitle>
              </CardHeader>
              <CardContent>
                <div className="whitespace-pre-wrap text-sm">
                  {result.analysis}
                </div>
              </CardContent>
            </Card>
          )}
        </div>
      )}
    </div>
  );
}
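
Note that the component posts to /api/scrape, while the FastAPI service exposes /scrape on its own port. One way to bridge the two is a small Next.js route handler that proxies the request to the backend. The sketch below assumes the App Router and a BACKEND_URL environment variable pointing at the FastAPI service; both are assumptions, not part of the original setup.

// app/api/scrape/route.ts - proxy the dashboard's request to the FastAPI backend
import { NextResponse } from 'next/server';
 
const BACKEND_URL = process.env.BACKEND_URL ?? 'http://localhost:8000'; // assumed env var
 
export async function POST(request: Request) {
  const body = await request.json();
 
  // Forward the { url, prompt, analyze } payload to the FastAPI /scrape endpoint
  const res = await fetch(`${BACKEND_URL}/scrape`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
 
  const data = await res.json();
  return NextResponse.json(data, { status: res.status });
}

If you use the Pages Router instead, the equivalent proxy logic goes in pages/api/scrape.ts.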

Step 7: Enhance with Cursor AI

Use Cursor's AI capabilities to improve your code:

  1. Code Generation: Use Cursor to generate boilerplate code and components
  2. Error Fixing: Let Cursor help debug and fix issues
  3. Code Optimization: Get suggestions for improving performance
  4. Documentation: Generate comments and documentation

For example, a prompt like the one in the comments below might yield a small formatting helper:
// Use Cursor AI to generate utility functions
// Prompt: "Create a utility function to format scraped data for display"
 
export const formatScrapedData = (data: any): string => {
  if (typeof data === 'string') {
    return data;
  }
  
  if (Array.isArray(data)) {
    return data.map(item => formatScrapedData(item)).join('\n');
  }
  
  if (typeof data === 'object' && data !== null) {
    return Object.entries(data)
      .map(([key, value]) => `${key}: ${formatScrapedData(value)}`)
      .join('\n');
  }
  
  return String(data);
};
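
If you want the dashboard to show this formatted view instead of raw JSON, a small wrapper component can use the helper. A sketch, assuming the utility above is saved at @/lib/format (a hypothetical path):

// components/FormattedResult.tsx - readable view of scraped data (illustrative)
import { formatScrapedData } from '@/lib/format';  // assumed location of the helper above
 
export function FormattedResult({ data }: { data: unknown }) {
  return (
    <pre className="text-sm overflow-auto max-h-96">
      {formatScrapedData(data)}
    </pre>
  );
}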

Step 8: Deploy with Patched

Use Patched to manage and deploy your AI agents in production. The configuration below is illustrative; adapt the structure to Patched's deployment interface:

# patched_config.py
from patched import PatchedAgent
 
# Create an agent configuration
agent_config = {
    "name": "web-scraper-agent",
    "description": "AI-powered web scraping agent",
    "endpoints": [
        {
            "path": "/scrape",
            "method": "POST",
            "handler": "scrape_and_analyze"
        }
    ],
    "dependencies": [
        "scrapegraphai",
        "openai",
        "fastapi"
    ],
    "environment": {
        "SCRAPEGRAPH_API_KEY": "${SCRAPEGRAPH_API_KEY}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
    }
}
 
# Deploy the agent
agent = PatchedAgent(config=agent_config)
agent.deploy()

Step 9: Add Advanced Features

Real-time Updates with WebSockets

from fastapi import WebSocket, WebSocketDisconnect
import json
 
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    
    while True:
        try:
            # Receive a scraping request over the WebSocket
            data = await websocket.receive_text()
            request_data = json.loads(data)
            
            # Process the scraping request (process_scrape_request wraps the /scrape logic)
            result = await process_scrape_request(request_data)
            
            # Send the results back to the client
            await websocket.send_text(json.dumps(result))
            
        except WebSocketDisconnect:
            # Client closed the connection; stop the loop
            break
        except Exception as e:
            await websocket.send_text(json.dumps({
                "error": str(e)
            }))
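
On the client side, the same request shape can be sent over the socket. Here is a minimal browser sketch, assuming the backend is reachable at ws://localhost:8000/ws and that the JSON payload mirrors the REST request; the scrapeOverWebSocket name is illustrative.

// lib/ws-client.ts - send a scrape request over the WebSocket and await the result
export function scrapeOverWebSocket(
  url: string,
  prompt: string,
  wsUrl = 'ws://localhost:8000/ws'  // assumed local backend address
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const socket = new WebSocket(wsUrl);
 
    // Send the request once the connection is open
    socket.onopen = () => socket.send(JSON.stringify({ url, prompt }));
 
    // Resolve with the first message (the scrape result or an error payload)
    socket.onmessage = (event) => {
      resolve(JSON.parse(event.data));
      socket.close();
    };
 
    socket.onerror = () => reject(new Error('WebSocket connection failed'));
  });
}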

Data Visualization

// components/DataVisualization.tsx
import { BarChart, Bar, XAxis, YAxis, CartesianGrid, Tooltip, Legend } from 'recharts';
 
interface DataVisualizationProps {
  data: any[];
}
 
export function DataVisualization({ data }: DataVisualizationProps) {
  const processedData = data.map((item, index) => ({
    name: item.name || `Item ${index + 1}`,
    value: parseFloat(item.value) || 0
  }));
 
  return (
    <BarChart width={600} height={300} data={processedData}>
      <CartesianGrid strokeDasharray="3 3" />
      <XAxis dataKey="name" />
      <YAxis />
      <Tooltip />
      <Legend />
      <Bar dataKey="value" fill="#8884d8" />
    </BarChart>
  );
}

Step 10: Production Deployment

Deploy your application using Vercel:

# Install Vercel CLI
npm install -g vercel
 
# Deploy frontend
vercel --prod
 
# Set environment variables
vercel env add SCRAPEGRAPH_API_KEY
vercel env add OPENAI_API_KEY

For the backend, containerize the FastAPI app with Docker and deploy it to a service like Railway, Heroku, or AWS:

# Dockerfile
FROM python:3.11-slim
 
WORKDIR /app
 
COPY requirements.txt .
RUN pip install -r requirements.txt
 
COPY . .
 
EXPOSE 8000
 
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Best Practices and Tips

Error Handling

import logging
from functools import wraps
 
def handle_errors(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as e:
            logging.error(f"Error in {func.__name__}: {str(e)}")
            raise HTTPException(
                status_code=500,
                detail=f"Internal server error: {str(e)}"
            )
    return wrapper
 
@app.post("/scrape")
@handle_errors
async def scrape_endpoint(request: ScrapeRequest):
    # Your scraping logic here
    pass

Rate Limiting

from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
 
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
 
@app.post("/scrape")
@limiter.limit("10/minute")
async def scrape_endpoint(request: Request, scrape_request: ScrapeRequest):
    # Rate-limited scraping logic
    pass

Caching

import hashlib
import time
 
# Simple in-memory cache with a TTL (swap in Redis or a similar store for production)
_cache: dict = {}
CACHE_TTL = 3600  # seconds
 
def get_cached_scrape_result(url: str, prompt: str):
    """Cache scraping results to avoid duplicate requests"""
    cache_key = hashlib.md5(f"{url}:{prompt}".encode()).hexdigest()
    
    # Return the cached result if it is still fresh
    cached = _cache.get(cache_key)
    if cached and time.time() - cached[0] < CACHE_TTL:
        return cached[1]
    
    # Otherwise perform the scrape (perform_scraping wraps the SmartScraper call)
    result = perform_scraping(url, prompt)
    
    # Cache the result for 1 hour
    _cache[cache_key] = (time.time(), result)
    
    return result

Monitoring and Analytics

import time
 
from fastapi import Request
from prometheus_client import Counter, Histogram, start_http_server
 
# Metrics
scrape_requests = Counter('scrape_requests_total', 'Total scrape requests')
scrape_duration = Histogram('scrape_duration_seconds', 'Scrape request duration')
 
@app.middleware("http")
async def add_metrics(request: Request, call_next):
    start_time = time.time()
    
    response = await call_next(request)
    
    # Record metrics
    scrape_requests.inc()
    scrape_duration.observe(time.time() - start_time)
    
    return response
 
# Start metrics server
start_http_server(8001)

Conclusion

This tutorial demonstrates how to build a comprehensive full-stack AI web application using modern tools and services. The combination of ScrapeGraphAI for data extraction, OpenAI for analysis, and various deployment tools creates a powerful platform for AI-driven web applications.

Key takeaways:

  • Rapid prototyping with AI-assisted development tools
  • Modular architecture allowing for easy scaling and maintenance
  • Production-ready deployment with proper error handling and monitoring
  • AI integration throughout the stack for intelligent data processing
