TL;DR
A complete guide to extracting data from X/Twitter in 2026, covering three methods with step-by-step examples.
- Three extraction methods compared — official X API, AI-powered scraping, and open-source tools
- X API is costly for serious use — free tier is write-only, Basic starts at $100/month
- ScrapeGraphAI uses natural language prompts — describe what you want and get structured JSON back
- Extract tweets, profiles, and engagement metrics — text, media, timestamps, likes, retweets, and more
- Schema-based extraction with Pydantic — enforce consistent, validated output structures
Introduction
Twitter (now X) remains one of the most valuable sources of real-time public data. Whether you're tracking brand sentiment, monitoring competitors, or conducting academic research, extracting data from Twitter can provide actionable insights that are difficult to find elsewhere.
This guide covers the best tools, techniques, and legal considerations for Twitter data extraction in 2026, with step-by-step examples using ScrapeGraphAI, snscrape, and the official X API.
Why Extract Data from Twitter?
There are several compelling use cases for Twitter data extraction:
- Sentiment analysis — track how people feel about your brand, products, or industry
- Market research — monitor competitors and identify emerging trends
- Academic research — study social dynamics, information spread, and public discourse
- Lead generation — identify potential customers based on engagement patterns
- Trend detection — spot trending hashtags and topics early
- Crisis management — monitor mentions during PR events in real time
- Influencer identification — find key voices in specific niches
What Data Can You Extract?
| Category | Examples |
|---|---|
| Tweets | Text, media, timestamps, engagement metrics (likes, retweets, replies) |
| User profiles | Bio, follower counts, verification status, join date |
| Followers/following | Account lists and relationships |
| Hashtags & trends | Topic volume, trending data |
| Conversations | Reply threads, quote tweets |
| Lists | Memberships, subscribers |
| Spaces | Audio room metadata |
Three Methods for Twitter Data Extraction
Method 1: Official X API
The X API provides official, rate-limited access to Twitter data.
Pricing tiers:
| Tier | Price | Access |
|---|---|---|
| Free | $0 | Write-only access |
| Basic | $100/month | 10,000 reads/month |
| Pro | $5,000/month | 1M reads/month |
| Enterprise | Custom | Full firehose access |
The official API is reliable but expensive for serious data collection, and the free tier is write-only — no data extraction at all.
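To make the pricing comparison concrete, the short sketch below works out the effective cost per 1,000 reads for the two paid tiers that have a published quota. The prices and quotas are copied from the table above; the function name and the assumption that the full monthly quota is used are illustrative only.

```python
# Tier figures copied from the pricing table above:
# (monthly price in USD, reads included per month).
TIERS = {
    "Basic": (100, 10_000),
    "Pro": (5_000, 1_000_000),
}


def cost_per_thousand_reads(price_usd, reads_per_month):
    """Effective USD cost of 1,000 reads, assuming the whole quota is used."""
    return price_usd * 1_000 / reads_per_month


for tier, (price, reads) in TIERS.items():
    print(f"{tier}: ${cost_per_thousand_reads(price, reads):.2f} per 1,000 reads")
```

With these figures, Basic works out to $10.00 per 1,000 reads and Pro to $5.00, so the higher tier is cheaper per read only if you actually consume its quota.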
Method 2: AI-Powered Web Scraping
AI-powered tools like ScrapeGraphAI use natural language prompts to extract structured data from Twitter. This approach bypasses API limitations, handles rendering and anti-bot measures automatically, and delivers clean JSON output.
Method 3: Open-Source Scrapers
Free tools like snscrape can scrape Twitter without API access. The trade-off is that they risk breaking when the platform updates its frontend, and they require more maintenance.
Extracting Twitter Data with ScrapeGraphAI
ScrapeGraphAI is the most reliable approach for Twitter data extraction. You describe the data you want in plain English, and the AI handles the rest.
Basic Tweet Extraction
from scrapegraphai import ScrapeGraphAI  # import the class instantiated below; confirm the name in the SDK docs

sgai = ScrapeGraphAI()  # uses SGAI_API_KEY env var
response = sgai.extract(
    url="https://x.com/elonmusk",
    prompt="Extract the last 10 tweets with text, date, likes, retweets, and reply count"
)
print(response["result"])

Schema-Based Extraction with Pydantic
For consistent, validated output, define a schema:
from pydantic import BaseModel, Field
from scrapegraphai import ScrapeGraphAI  # import the class instantiated below; confirm the name in the SDK docs


class Tweet(BaseModel):
    text: str = Field(description="Tweet text content")
    date: str = Field(description="Publication date")
    likes: int = Field(description="Number of likes")
    retweets: int = Field(description="Number of retweets")
    replies: int = Field(description="Number of replies")


class TweetList(BaseModel):
    tweets: list[Tweet] = Field(description="List of extracted tweets")


sgai = ScrapeGraphAI()  # uses SGAI_API_KEY env var
response = sgai.extract(
    url="https://x.com/elonmusk",
    prompt="Extract the latest tweets with engagement metrics",
    schema=TweetList
)

Keyword-Based Search
response = sgai.extract(
    url="https://x.com/search?q=artificial+intelligence&src=typed_query",
    prompt="Extract tweets about AI including author, text, date, and engagement metrics"
)

Trending Topics
response = sgai.extract(
    url="https://x.com/explore/tabs/trending",
    prompt="Extract the top 20 trending topics with their categories and tweet volumes"
)

Extracting Twitter Data with snscrape
snscrape is a free, open-source option for basic Twitter scraping.
User Timeline Scraping
import snscrape.modules.twitter as sntwitter

tweets = []
for i, tweet in enumerate(sntwitter.TwitterUserScraper("elonmusk").get_items()):
    if i >= 100:
        break
    tweets.append({
        "date": tweet.date,
        "text": tweet.rawContent,
        "likes": tweet.likeCount,
        "retweets": tweet.retweetCount,
    })

Keyword Search
for i, tweet in enumerate(
    sntwitter.TwitterSearchScraper("artificial intelligence since:2026-01-01").get_items()
):
    if i >= 50:
        break
    print(f"{tweet.date} - {tweet.rawContent[:100]}")

Note that snscrape may break when Twitter updates its frontend, and it requires regular maintenance.
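Because a scraper can fail partway through a run, it can help to keep whatever was collected before the failure. The sketch below shows one way to do that with a generic helper. `collect_safely` and `flaky_source` are hypothetical names, and `flaky_source` simply simulates a scraper generator that breaks after two items; neither is part of snscrape.

```python
from itertools import islice


def collect_safely(items, limit):
    """Collect up to `limit` items, keeping partial results if the source fails."""
    collected = []
    error = None
    try:
        for item in islice(items, limit):
            collected.append(item)
    except Exception as exc:  # scraper errors vary by version, so catch broadly
        error = exc
    return collected, error


def flaky_source():
    """Hypothetical stand-in for a scraper generator that fails mid-run."""
    yield "tweet 1"
    yield "tweet 2"
    raise RuntimeError("simulated frontend change")


items, error = collect_safely(flaky_source(), limit=100)
print(len(items), "items collected; error:", error)
```

In real use you would pass something like the `get_items()` generator in place of `flaky_source()` and decide, based on the returned error, whether to retry or fall back to another method.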
Using the Official X API with tweepy
import tweepy

client = tweepy.Client(bearer_token="your-bearer-token")
response = client.search_recent_tweets(
    query="artificial intelligence -is:retweet",
    max_results=100,
    tweet_fields=["created_at", "public_metrics", "author_id"]
)
# response.data is None when the query matches no tweets
for tweet in response.data or []:
    print(f"{tweet.created_at}: {tweet.text}")
    print(f"Likes: {tweet.public_metrics['like_count']}")

Tool Comparison
| Aspect | ScrapeGraphAI | X API (Pro) | snscrape | Apify |
|---|---|---|---|---|
| Cost | $19+/month | $5,000/month | Free | $35+/month |
| Rate limits | Generous | Strict | None built in (platform throttling still applies) | Credit-based |
| Setup complexity | Low | Medium | Low | Low |
| Resilience | High (AI-adaptive) | High (official) | Low | Medium |
| Structured output | Yes (schema) | Partial | Manual | Partial |
Use Case Examples
Brand Monitoring
Track brand mentions and analyze sentiment polarity across Twitter to understand public perception and respond to issues quickly.
response = sgai.extract(
    url="https://x.com/search?q=YourBrand&src=typed_query",
    prompt="Extract tweets mentioning this brand with sentiment (positive/negative/neutral), author, date, and engagement metrics"
)

Competitor Analysis
Compare engagement metrics across rival accounts to benchmark performance and identify content strategies that resonate.
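As a rough illustration of that comparison, the sketch below averages total engagement per tweet for each account and ranks the accounts. The account handles and numbers are made-up sample data, and summing likes, retweets, and replies is just one possible engagement metric.

```python
from statistics import mean

# Hypothetical extracted data: account handle -> tweets with engagement counts.
accounts = {
    "brand_a": [
        {"likes": 120, "retweets": 30, "replies": 10},
        {"likes": 80, "retweets": 10, "replies": 10},
    ],
    "brand_b": [
        {"likes": 40, "retweets": 5, "replies": 5},
        {"likes": 60, "retweets": 15, "replies": 5},
    ],
}


def average_engagement(tweets):
    """Mean of likes + retweets + replies per tweet; 0.0 for an empty list."""
    if not tweets:
        return 0.0
    return mean(t["likes"] + t["retweets"] + t["replies"] for t in tweets)


# Rank accounts from highest to lowest average engagement.
ranking = sorted(
    ((handle, average_engagement(tweets)) for handle, tweets in accounts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for handle, score in ranking:
    print(f"{handle}: {score} average interactions per tweet")
```

For a fairer benchmark you may want to normalize by follower count, since larger accounts tend to collect more raw interactions.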
Hashtag Research
Analyze hashtag performance for content strategy optimization. Track volume, engagement rates, and associated topics over time.
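A simple starting point for hashtag volume is to count tags across the tweet texts you have already extracted. The sketch below does this with a regular expression and a counter. The sample texts are invented, and the pattern is a simplification that may not match X's exact hashtag rules.

```python
import re
from collections import Counter

# Simplified pattern: "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Count hashtags case-insensitively across a list of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


# Hypothetical sample of extracted tweet texts.
sample_texts = [
    "Shipping our new model today #AI #MachineLearning",
    "Great thread on #ai safety",
    "Conference notes #machinelearning #Python",
]

hashtag_counts = count_hashtags(sample_texts)
for tag, count in hashtag_counts.most_common():
    print(f"#{tag}: {count}")
```

Running the same count on snapshots taken at regular intervals gives a basic view of how a hashtag's volume changes over time.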
Legal and Ethical Considerations
Before extracting Twitter data, be aware of these important considerations:
- X's Terms of Service prohibit scraping without authorization
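- Court precedent: in hiQ Labs v. LinkedIn (Ninth Circuit, 2022), the court held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, although breach-of-contract claims based on a site's terms can still apply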
- GDPR and CCPA compliance is required when collecting data on EU or California users
- Best practices: collect only necessary data, anonymize personal information, and respect rate limits
Always consult legal counsel for your specific use case.
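One way to act on the "anonymize personal information" practice is to replace usernames with a keyed hash before storing records. The sketch below uses HMAC-SHA256 from the standard library. The function name, key, and sample record are hypothetical. Note that this is pseudonymization rather than full anonymization, and the tweet text itself may still identify a person.

```python
import hashlib
import hmac


def pseudonymize(username, secret_key):
    """Return a stable keyed hash (HMAC-SHA256, hex) of a username."""
    normalized = username.lower().lstrip("@")
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice load it from a secret store, not source code.
key = b"replace-with-a-secret-key"

# Hypothetical extracted record.
record = {"author": "@ExampleUser", "text": "Loving the new release", "likes": 12}

# Copy the record, swapping the handle for its pseudonym.
safe_record = {**record, "author": pseudonymize(record["author"], key)}
print(safe_record)
```

The same handle always maps to the same pseudonym under a given key, so per-author analysis still works without storing the handle itself.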
Tips for Effective Twitter Data Extraction
- Start with ScrapeGraphAI's free tier before committing to a paid plan
- Define schemas for consistent, validated output
- Use search operators like from:, since:, and -is:retweet for precise queries
- Batch requests to manage rate limits efficiently
- Cache results locally to avoid redundant extraction
- Monitor for platform changes that could affect scraping
- Combine methods — use the API for real-time data and scraping for historical data
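The search-operator tip can be sketched as a small helper that assembles a query and encodes it into a search URL of the same shape used earlier in this guide. `build_search_url` and its parameters are hypothetical, and which operators the web search page honors is an assumption to verify, since `-is:retweet` appears above in the API query.

```python
from urllib.parse import urlencode


def build_search_url(keywords, from_user=None, since=None, exclude_retweets=False):
    """Combine keywords and optional operators into an x.com search URL."""
    parts = [keywords]
    if from_user:
        parts.append(f"from:{from_user}")
    if since:
        parts.append(f"since:{since}")  # date in YYYY-MM-DD form
    if exclude_retweets:
        parts.append("-is:retweet")
    query = " ".join(parts)
    # urlencode percent-encodes the operators and turns spaces into "+".
    return "https://x.com/search?" + urlencode({"q": query, "src": "typed_query"})


url = build_search_url(
    "artificial intelligence", since="2026-01-01", exclude_retweets=True
)
print(url)
```

Building URLs through one function keeps queries consistent and gives you a stable string to use as a cache key.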
Conclusion
Twitter remains a rich source of real-time public data. ScrapeGraphAI offers the best balance of power, simplicity, and cost efficiency for most use cases. Natural language prompts eliminate the complexity of traditional scraping, and structured JSON output integrates seamlessly into data pipelines.
FAQ
Is scraping Twitter legal?
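US courts, notably the Ninth Circuit in hiQ Labs v. LinkedIn, have indicated that scraping publicly accessible data is unlikely to violate the Computer Fraud and Abuse Act. However, X's terms of service prohibit unauthorized scraping, which can still expose you to contract claims. Always consult legal counsel for your specific situation.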
What is the cheapest way to extract Twitter data?
ScrapeGraphAI's free tier and snscrape offer the most affordable options. The official X API free tier is write-only and cannot be used for data extraction.
How do I handle rate limits?
The official API enforces strict rate limits. ScrapeGraphAI and web scrapers have built-in rate limit management that handles this automatically.
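When you make the requests yourself, a common pattern is to retry with exponential backoff after a rate-limit response. The sketch below shows the idea in isolation. `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names, and the sleep function is injected so the demo runs without actually waiting.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)


attempts = {"count": 0}


def flaky_request():
    """Simulated request that is rate limited twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"status": "ok"}


waits = []  # record the delays instead of sleeping, to keep the demo instant
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, "after", attempts["count"], "attempts")
```

If you use tweepy, its `Client` also accepts `wait_on_rate_limit=True`, which makes it wait for the rate-limit window to reset instead of raising an error.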
Can I access private account data?
No — only publicly available data can be extracted regardless of the tool you use.
How far back can I get historical data?
The Basic API tier provides 7 days of history. Pro and Enterprise tiers offer full archive access. Web scraping tools can access whatever is visible on the page.