Scraping Kayak Flight Data with ScrapeGraphAI: A Complete Guide

Scraping Kayak Flight Data with ScrapeGraphAI

Kayak is a popular travel search engine that aggregates flight data from multiple airlines, making it a valuable resource for travel analysts, bloggers, and developers. In this guide, we'll demonstrate how to extract flight information from Kayak using ScrapeGraphAI. With this approach, you can build powerful tools for price comparison, trend analysis, and market research.

Why Scrape Kayak?

Scraping flight data from Kayak can help you:

  • Monitor Flight Prices - Stay updated with real-time fare changes
  • Competitive Analysis - Compare airline pricing and schedule trends
  • Content Creation - Generate data-driven travel content to boost your SEO
  • Data-Driven Decisions - Enhance your travel business strategy with accurate data

Getting Started

Before you begin, make sure you have:

  • Python 3.8 or later installed on your system
  • The ScrapeGraphAI SDK installed via pip install scrapegraph-py
  • An API key from the ScrapeGraphAI Dashboard
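To keep the API key out of source files, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name SGAI_API_KEY and the helper load_api_key are assumptions made for this example, not something this guide or the SDK prescribes.

```python
import os


def load_api_key(env=os.environ, var_name="SGAI_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name chosen for illustration. `env` is
    injectable so the helper can be exercised without touching the
    real process environment.
    """
    key = env.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The returned string can then be passed as the api_key argument when creating the client.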

Example: Scraping Kayak Flight Data

Let's look at how to extract flight information from Kayak's search results using Python, JavaScript, and cURL:

Python Example

python
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="sgai-********************")

# SmartScraper request
response = sgai_client.smartscraper(
    website_url="https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
    user_prompt="extract me all the flights"
)

# Print the response
print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")

sgai_client.close()
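The search URL used above appears to encode the route and the two travel dates in its path. Assuming that pattern holds for other routes (an inference from this one URL, not documented Kayak behavior), a small hypothetical helper such as build_search_url can generate URLs for several searches. The ucs query parameter from the example is left out because its meaning is not explained here.

```python
from datetime import date

# Base path taken from the example URL in this guide.
BASE_URL = "https://www.kayak.it/flights"


def build_search_url(origin, destination, depart, return_date):
    """Build a round-trip search URL that mirrors the example URL's path.

    origin / destination: city or airport codes such as "MIL" and "LON".
    depart / return_date: datetime.date objects.
    """
    if return_date < depart:
        raise ValueError("return date must not precede departure date")
    route = f"{origin.upper()}-{destination.upper()}"
    # date.isoformat() yields YYYY-MM-DD, the format seen in the example.
    return f"{BASE_URL}/{route}/{depart.isoformat()}/{return_date.isoformat()}"
```

The resulting string can be passed as website_url in the request.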

JavaScript Example

javascript
import { Client } from 'scrapegraph-js';
import { z } from 'zod';

// Define the schema
const flightSchema = z.object({
  departure_time: z.string(),
  arrival_time: z.string(),
  departure_airport: z.string(),
  arrival_airport: z.string(),
  airline: z.string(),
  duration: z.string(),
  price: z.string()
});

// Initialize the client
const sgai_client = new Client("sgai-********************");

try {
  const response = await sgai_client.smartscraper({
    websiteUrl: "https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
    userPrompt: "extract me all the flights",
    outputSchema: flightSchema
  });

  console.log('Request ID:', response.requestId);
  console.log('Result:', response.result);
} catch (error) {
  console.error(error);
} finally {
  sgai_client.close();
}

cURL Example

bash
curl -X 'POST' \
  'https://api.scrapegraphai.com/v1/smartscraper' \
  -H 'accept: application/json' \
  -H 'SGAI-APIKEY: sgai-********************' \
  -H 'Content-Type: application/json' \
  -d '{
  "website_url": "https://www.kayak.it/flights/MIL-LON/2025-03-15/2025-03-19?ucs=obhoc7",
  "user_prompt": "extract me all the flights",
  "output_schema": {
    "type": "object",
    "properties": {
      "departure_time": { "type": "string" },
      "arrival_time": { "type": "string" },
      "departure_airport": { "type": "string" },
      "arrival_airport": { "type": "string" },
      "airline": { "type": "string" },
      "duration": { "type": "string" },
      "price": { "type": "string" }
    },
    "required": ["departure_time", "arrival_time", "departure_airport", "arrival_airport", "airline", "duration", "price"]
  }
}'

The response will look something like this:

json
{
  "flights": [
    {
      "departure_time": "22:15",
      "arrival_time": "23:20",
      "departure_airport": "BGY",
      "arrival_airport": "STN",
      "airline": "Ryanair",
      "duration": "2 h 05 min",
      "price": "50.67 €"
    },
    {
      "departure_time": "06:20",
      "arrival_time": "09:10",
      "departure_airport": "STN",
      "arrival_airport": "BGY",
      "airline": "Ryanair",
      "duration": "1 h 50 min",
      "price": "57 €"
    },
    {
      "departure_time": "21:20",
      "arrival_time": "22:25",
      "departure_airport": "BGY",
      "arrival_airport": "STN",
      "airline": "Ryanair",
      "duration": "2 h 05 min",
      "price": "55.25 €"
    },
    {
      "departure_time": "20:25",
      "arrival_time": "23:25",
      "departure_airport": "LGW",
      "arrival_airport": "MXP",
      "airline": "Wizz Air",
      "duration": "2 h 00 min",
      "price": "52 €"
    },
    {
      "departure_time": "07:00",
      "arrival_time": "10:00",
      "departure_airport": "LGW",
      "arrival_airport": "MXP",
      "airline": "easyJet",
      "duration": "2 h 00 min",
      "price": "47 €"
    }
  ]
}
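Because the price and duration fields come back as strings, they usually need converting before any comparison. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the sample records are copied from the response above (trimmed to three fields), and the parsing assumes the simple formats shown there.

```python
import re

# Records copied from the sample response, trimmed to the fields used here.
SAMPLE_FLIGHTS = [
    {"airline": "Ryanair", "duration": "2 h 05 min", "price": "50.67 €"},
    {"airline": "Ryanair", "duration": "1 h 50 min", "price": "57 €"},
    {"airline": "Ryanair", "duration": "2 h 05 min", "price": "55.25 €"},
    {"airline": "Wizz Air", "duration": "2 h 00 min", "price": "52 €"},
    {"airline": "easyJet", "duration": "2 h 00 min", "price": "47 €"},
]


def parse_price(text):
    """Convert a price string such as '50.67 €' to a float.

    Assumes a plain number with an optional decimal part; values with
    thousands separators (e.g. '1.234,56 €') are not handled.
    """
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        raise ValueError(f"no numeric price in {text!r}")
    return float(match.group().replace(",", "."))


def parse_duration_minutes(text):
    """Convert a duration such as '2 h 05 min' to total minutes."""
    match = re.fullmatch(r"\s*(\d+)\s*h\s*(\d+)\s*min\s*", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def cheapest(flights):
    """Return the flight record with the lowest parsed price."""
    return min(flights, key=lambda flight: parse_price(flight["price"]))
```

Keeping the raw strings alongside the parsed values makes it easier to spot records whose format differs from what these helpers expect.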

Best Practices for Flight Data Scraping

When scraping data from travel websites like Kayak, consider these tips:

  • Respect Rate Limits: Insert delays between requests to avoid overloading the server.
  • Error Handling: Implement robust error handling to manage potential scraping issues.
  • Data Validation: Regularly verify that the extracted data is accurate and complete.
  • Stay Compliant: Always review the website's terms of service and robots.txt before scraping.
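The first three tips can be sketched with the standard library alone. In the example below, every name (call_with_retry, scrape_all, missing_fields) is hypothetical, and fetch stands for any zero-argument callable that performs one scraping request, so nothing here depends on a particular SDK. The delay values are placeholders, not recommended limits.

```python
import time

# Field names taken from the output schema used in this guide.
REQUIRED_FIELDS = (
    "departure_time", "arrival_time", "departure_airport",
    "arrival_airport", "airline", "duration", "price",
)


def call_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on failure.

    Catching Exception is deliberately broad for this sketch; real code
    should catch only the errors it expects to be transient.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def scrape_all(fetchers, pause=2.0, sleep=time.sleep):
    """Run each fetcher in turn, pausing between consecutive requests."""
    results = []
    for index, fetch in enumerate(fetchers):
        if index > 0:
            sleep(pause)  # fixed delay between requests
        results.append(call_with_retry(fetch, sleep=sleep))
    return results


def missing_fields(flight):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = flight.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Passing sleep as a parameter keeps the helpers easy to test, since a test can record the requested delays instead of actually waiting.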

Conclusion

Scraping flight data from Kayak using ScrapeGraphAI is an efficient way to gather valuable travel insights. Whether you're tracking price fluctuations or building a travel comparison tool, this method can empower you with up-to-date and actionable data.

Remember to secure your API key, follow best practices, and update your scraping scripts as needed to keep up with website changes.

Happy scraping and safe travels!
