In this tutorial, we'll demonstrate how to rapidly develop a full-stack AI web application by integrating several powerful tools: Cursor, OpenAI's o1 model, Vercel's v0, ScrapeGraphAI, and Patched. This combination allows for efficient prototyping and deployment of AI-driven applications.
Tools Overview
- Cursor: An AI-enhanced code editor that assists in writing and understanding code.
- OpenAI o1 Model: OpenAI's reasoning-focused model, well suited to multi-step analysis of the data you collect.
- Vercel v0: Vercel's AI-powered UI generator that turns natural-language prompts into React components you can deploy on Vercel.
- ScrapeGraphAI: An AI-powered web scraping tool that simplifies data extraction from websites.
- Patched: A tool for managing and deploying AI agents in production environments.
Step 1: Set Up Your Development Environment
Begin by installing the necessary tools and setting up your development environment.
# Install ScrapeGraphAI and its Python API client
pip install scrapegraphai scrapegraph-py
# Install Playwright for browser automation
pip install playwright
playwright install
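You will also need API keys for ScrapeGraphAI and OpenAI. One convenient pattern is to keep them in environment variables; the names below match the ones used in the deployment steps later in this tutorial:
# Export keys in your shell (or place them in a local .env file)
export SCRAPEGRAPH_API_KEY="your-scrapegraph-api-key"
export OPENAI_API_KEY="your-openai-api-key"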
Step 2: Create a ScrapeGraphAI Pipeline
Use ScrapeGraphAI to extract data from a target website. Here's an example of how to set up a simple scraping pipeline:
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger
sgai_logger.set_logging(level="INFO")
# Initialize the client
sgai_client = Client(api_key="your-api-key-here")
# SmartScraper request
response = sgai_client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract webpage information"
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")
if response.get('reference_urls'):
    print(f"Reference URLs: {response['reference_urls']}")
Step 3: Design Your Frontend with Vercel v0
Vercel v0 lets you quickly generate UI components and layouts from natural language. Describe the interface you want in v0 (v0.dev), for example:
"Create a dashboard with data visualization cards"
v0 responds with React components built on Tailwind CSS and shadcn/ui that you can copy into your project (or add with the install command v0 displays for each generation) and then customize for your application.
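The dashboard component built in Step 6 imports shadcn/ui primitives (Card, Button, Input, Textarea). If your v0 output does not already include them, you can add them with the shadcn CLI (a quick sketch, assuming a Next.js project with shadcn/ui initialized; the exact command depends on your CLI version):
npx shadcn@latest add card button input textarea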
Step 4: Integrate OpenAI o1 Model
Create an AI service that processes the scraped data:
import openai
from typing import Dict, Any

class AIAnalyzer:
    def __init__(self, api_key: str):
        self.client = openai.OpenAI(api_key=api_key)

    def analyze_scraped_data(self, data: Dict[str, Any]) -> str:
        """Analyze scraped data using OpenAI o1 model"""
        prompt = f"""
        Analyze the following scraped data and provide insights:

        Data: {data}

        Please provide:
        1. Key findings
        2. Patterns or trends
        3. Actionable recommendations
        """

        response = self.client.chat.completions.create(
            model="o1-preview",
            messages=[
                {"role": "user", "content": prompt}
            ]
        )

        return response.choices[0].message.content

# Usage: scraped_data is the 'result' field from the ScrapeGraphAI response in Step 2
analyzer = AIAnalyzer(api_key="your-openai-key")
insights = analyzer.analyze_scraped_data(scraped_data)
Step 5: Build the Backend API
Create a FastAPI backend that orchestrates all components:
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
from scrapegraph_py import Client

# AIAnalyzer is the class defined in Step 4; the module name here is just an example
from ai_analyzer import AIAnalyzer

app = FastAPI(title="AI Web Scraping App")

# Configure clients from environment variables (set in Step 10)
sgai_client = Client(api_key=os.environ["SCRAPEGRAPH_API_KEY"])
analyzer = AIAnalyzer(api_key=os.environ["OPENAI_API_KEY"])

class ScrapeRequest(BaseModel):
    url: str
    prompt: str
    analyze: Optional[bool] = True

class ScrapeResponse(BaseModel):
    data: dict
    analysis: Optional[str] = None
    request_id: str

@app.post("/scrape", response_model=ScrapeResponse)
async def scrape_and_analyze(request: ScrapeRequest):
    """Scrape data and optionally analyze it"""
    try:
        # Step 1: Scrape data
        scrape_response = sgai_client.smartscraper(
            website_url=request.url,
            user_prompt=request.prompt
        )

        analysis = None
        if request.analyze:
            # Step 2: Analyze data with AI
            analysis = analyzer.analyze_scraped_data(scrape_response['result'])

        return ScrapeResponse(
            data=scrape_response['result'],
            analysis=analysis,
            request_id=scrape_response['request_id']
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    return {"status": "healthy"}
Step 6: Frontend Implementation with React and Next.js
Create a React frontend that interacts with your API (the component posts to /api/scrape; the proxy configuration that forwards this path to the FastAPI backend follows the component):
// components/ScrapingDashboard.tsx
import React, { useState } from 'react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
import { Input } from '@/components/ui/input';
import { Textarea } from '@/components/ui/textarea';
interface ScrapeData {
data: any;
analysis?: string;
request_id: string;
}
export default function ScrapingDashboard() {
const [url, setUrl] = useState('');
const [prompt, setPrompt] = useState('');
const [loading, setLoading] = useState(false);
const [result, setResult] = useState<ScrapeData | null>(null);
const handleScrape = async () => {
setLoading(true);
try {
const response = await fetch('/api/scrape', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
url,
prompt,
analyze: true
}),
});
const data = await response.json();
setResult(data);
} catch (error) {
console.error('Scraping failed:', error);
} finally {
setLoading(false);
}
};
return (
<div className="max-w-4xl mx-auto p-6 space-y-6">
<Card>
<CardHeader>
<CardTitle>AI Web Scraper</CardTitle>
</CardHeader>
<CardContent className="space-y-4">
<Input
placeholder="Enter URL to scrape"
value={url}
onChange={(e) => setUrl(e.target.value)}
/>
<Textarea
placeholder="Describe what data you want to extract"
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
rows={3}
/>
<Button
onClick={handleScrape}
disabled={loading || !url || !prompt}
>
{loading ? 'Scraping...' : 'Scrape & Analyze'}
</Button>
</CardContent>
</Card>
{result && (
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
<Card>
<CardHeader>
<CardTitle>Scraped Data</CardTitle>
</CardHeader>
<CardContent>
<pre className="text-sm overflow-auto max-h-96">
{JSON.stringify(result.data, null, 2)}
</pre>
</CardContent>
</Card>
{result.analysis && (
<Card>
<CardHeader>
<CardTitle>AI Analysis</CardTitle>
</CardHeader>
<CardContent>
<div className="whitespace-pre-wrap text-sm">
{result.analysis}
</div>
</CardContent>
</Card>
)}
</div>
)}
</div>
);
}
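Since the FastAPI backend serves /scrape rather than /api/scrape, a Next.js rewrite can bridge the two paths (a minimal sketch; the BACKEND_URL variable and the localhost default are assumptions you should adjust for your deployment):
// next.config.mjs
/** @type {import('next').NextConfig} */
const nextConfig = {
  async rewrites() {
    return [
      {
        source: '/api/scrape',
        // Forward to the FastAPI backend
        destination: `${process.env.BACKEND_URL ?? 'http://localhost:8000'}/scrape`,
      },
    ];
  },
};

export default nextConfig;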
Step 7: Enhance with Cursor AI
Use Cursor's AI capabilities to improve your code:
- Code Generation: Use Cursor to generate boilerplate code and components
- Error Fixing: Let Cursor help debug and fix issues
- Code Optimization: Get suggestions for improving performance
- Documentation: Generate comments and documentation
// Use Cursor AI to generate utility functions
// Prompt: "Create a utility function to format scraped data for display"
export const formatScrapedData = (data: any): string => {
if (typeof data === 'string') {
return data;
}
if (Array.isArray(data)) {
return data.map(item => formatScrapedData(item)).join('\n');
}
if (typeof data === 'object' && data !== null) {
return Object.entries(data)
.map(([key, value]) => `${key}: ${formatScrapedData(value)}`)
.join('\n');
}
return String(data);
};
Step 8: Deploy with Patched
Use Patched to manage and deploy your AI agents in production:
# patched_config.py
from patched import PatchedAgent
# Create an agent configuration
agent_config = {
"name": "web-scraper-agent",
"description": "AI-powered web scraping agent",
"endpoints": [
{
"path": "/scrape",
"method": "POST",
"handler": "scrape_and_analyze"
}
],
"dependencies": [
"scrapegraphai",
"openai",
"fastapi"
],
"environment": {
"SCRAPEGRAPH_API_KEY": "${SCRAPEGRAPH_API_KEY}",
"OPENAI_API_KEY": "${OPENAI_API_KEY}"
}
}
# Deploy the agent
agent = PatchedAgent(config=agent_config)
agent.deploy()
Step 9: Add Advanced Features
Real-time Updates with WebSockets
from fastapi import WebSocket, WebSocketDisconnect
import json

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        try:
            # Receive scraping request via WebSocket
            data = await websocket.receive_text()
            request_data = json.loads(data)

            # Process scraping request (process_scrape_request wraps the
            # scrape-and-analyze logic from Step 5)
            result = await process_scrape_request(request_data)

            # Send results back
            await websocket.send_text(json.dumps(result))
        except WebSocketDisconnect:
            break
        except Exception as e:
            await websocket.send_text(json.dumps({"error": str(e)}))
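On the client side, the same request shape can be sent over the socket (a minimal sketch; the host, port, and message fields mirror the endpoint above and should be adjusted for your deployment):
// lib/scrapeSocket.ts
export function openScrapeSocket(onResult: (result: unknown) => void) {
  const socket = new WebSocket('ws://localhost:8000/ws');

  // Forward every message from the server to the caller
  socket.onmessage = (event) => {
    onResult(JSON.parse(event.data));
  };

  return {
    scrape: (url: string, prompt: string) =>
      socket.send(JSON.stringify({ url, prompt, analyze: true })),
    close: () => socket.close(),
  };
}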
Data Visualization
// components/DataVisualization.tsx
import { BarChart, Bar, XAxis, YAxis, CartesianGrid, Tooltip, Legend } from 'recharts';
interface DataVisualizationProps {
data: any[];
}
export function DataVisualization({ data }: DataVisualizationProps) {
const processedData = data.map((item, index) => ({
name: item.name || `Item ${index + 1}`,
value: parseFloat(item.value) || 0
}));
return (
<BarChart width={600} height={300} data={processedData}>
<CartesianGrid strokeDasharray="3 3" />
<XAxis dataKey="name" />
<YAxis />
<Tooltip />
<Legend />
<Bar dataKey="value" fill="#8884d8" />
</BarChart>
);
}
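You can then render the chart inside the results grid of ScrapingDashboard, where result is already known to be non-null (this assumes the scraped result is an array of objects with name and value fields; adapt it to your data shape):
{Array.isArray(result.data) && (
  <DataVisualization data={result.data} />
)}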
Step 10: Production Deployment
Deploy your application using Vercel:
# Install Vercel CLI
npm install -g vercel
# Deploy frontend
vercel --prod
# Set environment variables
vercel env add SCRAPEGRAPH_API_KEY
vercel env add OPENAI_API_KEY
For the backend, use a service like Railway, Heroku, or AWS:
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Best Practices and Tips
Error Handling
import logging
from functools import wraps

from fastapi import HTTPException

def handle_errors(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as e:
            logging.error(f"Error in {func.__name__}: {str(e)}")
            raise HTTPException(
                status_code=500,
                detail=f"Internal server error: {str(e)}"
            )
    return wrapper

@app.post("/scrape")
@handle_errors
async def scrape_endpoint(request: ScrapeRequest):
    # Your scraping logic here
    pass
Rate Limiting
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/scrape")
@limiter.limit("10/minute")
async def scrape_endpoint(request: Request, scrape_request: ScrapeRequest):
    # Rate-limited scraping logic
    pass
Caching
import hashlib
import time

# Simple in-memory TTL cache; swap in Redis or similar for production
_cache = {}  # cache_key -> (timestamp, result)
CACHE_TTL_SECONDS = 3600

def get_cached_scrape_result(url: str, prompt: str):
    """Cache scraping results to avoid duplicate requests"""
    cache_key = hashlib.md5(f"{url}:{prompt}".encode()).hexdigest()

    # Check the cache first and honour the TTL (1 hour)
    cached = _cache.get(cache_key)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]

    # If not cached (or expired), perform scraping
    # (perform_scraping wraps the SmartScraper call from Step 2)
    result = perform_scraping(url, prompt)

    # Cache the result
    _cache[cache_key] = (time.time(), result)
    return result
Monitoring and Analytics
import time

from fastapi import Request
from prometheus_client import Counter, Histogram, start_http_server

# Metrics
scrape_requests = Counter('scrape_requests_total', 'Total scrape requests')
scrape_duration = Histogram('scrape_duration_seconds', 'Scrape request duration')

@app.middleware("http")
async def add_metrics(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)

    # Record metrics
    scrape_requests.inc()
    scrape_duration.observe(time.time() - start_time)

    return response

# Expose metrics on a separate port
start_http_server(8001)
Conclusion
This tutorial demonstrates how to build a comprehensive full-stack AI web application using modern tools and services. The combination of ScrapeGraphAI for data extraction, OpenAI for analysis, and various deployment tools creates a powerful platform for AI-driven web applications.
Key takeaways:
- Rapid prototyping with AI-assisted development tools
- Modular architecture allowing for easy scaling and maintenance
- Production-ready deployment with proper error handling and monitoring
- AI integration throughout the stack for intelligent data processing
Related Resources
Want to learn more about building AI applications and web scraping? Check out these guides:
- Web Scraping 101 - Master the basics of web scraping
- AI Agent Web Scraping - Advanced AI-powered scraping techniques
- Mastering ScrapeGraphAI - Complete guide to ScrapeGraphAI
- Building Intelligent Agents - AI agent development
- Scraping with Python - Python web scraping techniques
- Web Scraping Legality - Legal considerations
These resources will help you build sophisticated AI applications and become proficient in modern web scraping techniques.