Scraping Kayak Flight Data with ScrapeGraphAI: A Complete Guide
Kayak is a popular travel search engine that aggregates flight data from multiple airlines, making it a valuable resource for travel analysts, bloggers, and developers. In this guide, we'll demonstrate how to extract flight information from Kayak using ScrapeGraphAI. The extracted data can feed tools for price comparison, trend analysis, and market research.
Why Scrape Kayak?
Scraping flight data from Kayak can help you:
- Monitor Flight Prices - Stay updated with real-time fare changes
- Competitive Analysis - Compare airline pricing and schedule trends
- Content Creation - Generate data-driven travel content to boost your SEO
- Data-Driven Decisions - Enhance your travel business strategy with accurate data
Getting Started
Before you begin, make sure you have:
- Python 3.8 or later installed on your system
- The ScrapeGraphAI SDK installed via pip install scrapegraph-py
- An API key from the ScrapeGraphAI Dashboard
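One way to keep the API key out of your source code is to read it from an environment variable. The sketch below assumes a variable named `SGAI_API_KEY`; that name and the `load_api_key` helper are hypothetical choices for this guide, not something the SDK requires.

```python
import os


def load_api_key(env=os.environ, var_name="SGAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `SGAI_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage (assumes the variable was exported in your shell):
# sgai_client = Client(api_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the API later on.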
Example: Scraping Kayak Flight Data
Let's look at how to extract flight information from Kayak's search results using Python, JavaScript, and cURL:
Python Example
```python
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="sgai-********************")

# SmartScraper request
response = sgai_client.smartscraper(
    website_url="https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
    user_prompt="extract me all the flights"
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")

sgai_client.close()
```
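If you want to query several routes or dates, it can help to generate the search URL rather than hard-code it. The helper below is a minimal sketch: `build_kayak_url` is a hypothetical function, and the path layout is inferred only from the single URL used in the Python example, so it may not hold for every Kayak domain or search type. The `ucs` query parameter from that URL is left out because its meaning is not documented here.

```python
from datetime import date


def build_kayak_url(origin, destination, depart, return_date=None,
                    base="https://www.kayak.it"):
    """Build a flight-search URL of the form /flights/ORIGIN-DEST/DEPART/RETURN.

    The layout is an assumption inferred from one example URL. Omitting
    `return_date` simply drops the last path segment, which is also untested.
    """
    parts = [f"{origin.upper()}-{destination.upper()}", depart.isoformat()]
    if return_date is not None:
        if return_date < depart:
            raise ValueError("return_date must not be earlier than depart")
        parts.append(return_date.isoformat())
    return f"{base}/flights/" + "/".join(parts)


url = build_kayak_url("MIL", "LON", date(2025, 3, 15), date(2025, 3, 19))
print(url)  # https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19
```

The resulting string can be passed as `website_url` in the SmartScraper request.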
JavaScript Example
```javascript
import { Client } from 'scrapegraph-js';
import { z } from 'zod';

// Define the schema
const flightSchema = z.object({
  departure_time: z.string(),
  arrival_time: z.string(),
  departure_airport: z.string(),
  arrival_airport: z.string(),
  airline: z.string(),
  duration: z.string(),
  price: z.string()
});

// TypeScript users can additionally derive a static type from the schema:
// type FlightSchema = z.infer<typeof flightSchema>;

// Initialize the client
const sgai_client = new Client("sgai-********************");

try {
  const response = await sgai_client.smartscraper({
    websiteUrl: "https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
    userPrompt: "extract me all the flights",
    outputSchema: flightSchema
  });

  console.log('Request ID:', response.requestId);
  console.log('Result:', response.result);
} catch (error) {
  console.error(error);
} finally {
  sgai_client.close();
}
```
cURL Example
```bash
curl -X 'POST' \
  'https://api.scrapegraphai.com/v1/smartscraper' \
  -H 'accept: application/json' \
  -H 'SGAI-APIKEY: sgai-********************' \
  -H 'Content-Type: application/json' \
  -d '{
  "website_url": "https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
  "user_prompt": "extract me all the flights",
  "output_schema": {
    "type": "object",
    "properties": {
      "departure_time": { "type": "string" },
      "arrival_time": { "type": "string" },
      "departure_airport": { "type": "string" },
      "arrival_airport": { "type": "string" },
      "airline": { "type": "string" },
      "duration": { "type": "string" },
      "price": { "type": "string" }
    },
    "required": ["departure_time", "arrival_time", "departure_airport", "arrival_airport", "airline", "duration", "price"]
  }
}'
```
The response will look something like this:
```json
{
  "flights": [
    {
      "departure_time": "22:15",
      "arrival_time": "23:20",
      "departure_airport": "BGY",
      "arrival_airport": "STN",
      "airline": "Ryanair",
      "duration": "2 h 05 min",
      "price": "50.67 €"
    },
    {
      "departure_time": "06:20",
      "arrival_time": "09:10",
      "departure_airport": "STN",
      "arrival_airport": "BGY",
      "airline": "Ryanair",
      "duration": "1 h 50 min",
      "price": "57 €"
    },
    {
      "departure_time": "21:20",
      "arrival_time": "22:25",
      "departure_airport": "BGY",
      "arrival_airport": "STN",
      "airline": "Ryanair",
      "duration": "2 h 05 min",
      "price": "55.25 €"
    },
    {
      "departure_time": "20:25",
      "arrival_time": "23:25",
      "departure_airport": "LGW",
      "arrival_airport": "MXP",
      "airline": "Wizz Air",
      "duration": "2 h 00 min",
      "price": "52 €"
    },
    {
      "departure_time": "07:00",
      "arrival_time": "10:00",
      "departure_airport": "LGW",
      "arrival_airport": "MXP",
      "airline": "easyJet",
      "duration": "2 h 00 min",
      "price": "47 €"
    }
  ]
}
```
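Prices and durations in the sample response above are strings, so comparing fares needs a small post-processing step. The following sketch shows one possible approach; the helper names (`parse_price`, `parse_duration_minutes`, `cheapest`) are hypothetical, and the parsing assumes the exact string formats seen in that sample (for instance, no thousands separators in prices).

```python
import re

REQUIRED_FIELDS = ("departure_time", "arrival_time", "departure_airport",
                   "arrival_airport", "airline", "duration", "price")


def parse_price(text):
    """Convert a price string such as '50.67 €' to a float (currency is dropped)."""
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        raise ValueError(f"No number found in price: {text!r}")
    return float(match.group().replace(",", "."))


def parse_duration_minutes(text):
    """Convert '2 h 05 min' to minutes; return None if the format differs."""
    match = re.fullmatch(r"\s*(\d+)\s*h\s*(\d+)\s*min\s*", text)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


def cheapest(flights):
    """Return the lowest-priced flight among records that have every field."""
    complete = [f for f in flights if all(f.get(k) for k in REQUIRED_FIELDS)]
    return min(complete, key=lambda f: parse_price(f["price"]), default=None)


# Two records copied from the sample response.
result = {"flights": [
    {"departure_time": "22:15", "arrival_time": "23:20",
     "departure_airport": "BGY", "arrival_airport": "STN",
     "airline": "Ryanair", "duration": "2 h 05 min", "price": "50.67 €"},
    {"departure_time": "07:00", "arrival_time": "10:00",
     "departure_airport": "LGW", "arrival_airport": "MXP",
     "airline": "easyJet", "duration": "2 h 00 min", "price": "47 €"},
]}

best = cheapest(result["flights"])
print(best["airline"], parse_price(best["price"]))  # easyJet 47.0
```

Skipping incomplete records before comparing keeps one malformed entry from breaking the whole run.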
Best Practices for Flight Data Scraping
When scraping data from travel websites like Kayak, consider these tips:
- Respect Rate Limits: Insert delays between requests to avoid overloading the server.
- Error Handling: Implement robust error handling to manage potential scraping issues.
- Data Validation: Regularly verify that the extracted data is accurate and complete.
- Stay Compliant: Always review the website's terms of service and robots.txt before scraping.
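The first two tips can be combined in a small wrapper that pauses between requests and retries failed ones with an increasing delay. This is a minimal sketch rather than a production setup: `call_with_retry` and `scrape_all` are hypothetical helpers, the delay values are arbitrary, and `fetch_one` stands for whatever function performs a single scrape.

```python
import time


def call_with_retry(fetch, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `fetch()` and retry on failure, doubling the wait each time."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # narrow this to specific errors in real code
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            wait = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait:.0f}s")
            sleep(wait)


def scrape_all(urls, fetch_one, pause=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(pause)
        results[url] = call_with_retry(lambda: fetch_one(url), sleep=sleep)
    return results


# Usage with the client from the Python example (not executed here):
# fetch_one = lambda url: sgai_client.smartscraper(
#     website_url=url, user_prompt="extract me all the flights")
# results = scrape_all(list_of_urls, fetch_one)
```

Passing `sleep` as a parameter keeps the helpers easy to test without real waiting.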
Conclusion
Scraping flight data from Kayak using ScrapeGraphAI is an efficient way to gather valuable travel insights. Whether you're tracking price fluctuations or building a travel comparison tool, this method can empower you with up-to-date and actionable data.
Remember to secure your API key, follow best practices, and update your scraping scripts as needed to keep up with website changes.
Happy scraping and safe travels!