Instagram Scraping with ScrapeGraphAI Smart Scraper

Written by Marco Vinciguerra

Instagram scraping has a reputation for being painful. Login walls, aggressive bot detection, heavy JavaScript rendering, and a page structure that changes without notice — traditional scrapers spend more time fighting the platform than extracting data.

ScrapeGraphAI's Smart Scraper handles all of that for you. You write a natural language prompt describing what you want, and you get back clean JSON. No proxy rotation, no CAPTCHA solving, no CSS selectors to maintain.

Why Instagram Is Hard to Scrape

Instagram was built to be consumed in-browser, not by scrapers. Here's what you're up against:

  • JavaScript-heavy rendering — almost no meaningful content is in the raw HTML
  • Login walls — many pages push you to sign in before showing data
  • Rate limiting and bot detection — unusual request patterns trigger blocks quickly
  • Frequent layout changes — Instagram updates their structure often, breaking selector-based scrapers

With traditional tools you'd need a headless browser, rotating proxies, and a custom parser. With ScrapeGraphAI you skip all of it.

A Real Example

Here's how to extract profile data from a public Instagram account:

from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Replace with your own API key
sgai_client = Client(api_key="sgai-********************")

url_list = [
    "https://www.instagram.com/natgeo/",
    "https://www.instagram.com/nasa/",
]

try:
    for url in url_list:
        # Describe the fields you want in plain English instead of writing selectors
        response = sgai_client.smartscraper(
            website_url=url,
            user_prompt="Extract username, bio, follower count, following count, and post count"
        )

        print(f"Request ID: {response['request_id']}")
        print(f"Result: {response['result']}")
finally:
    # Close the client even if one of the requests raises
    sgai_client.close()

That's the entire scraper. No browser setup, no proxy config, no XPath gymnastics.

What You Get Back

Here's real output from scraping a couple of public profiles:

{
  "username": "natgeo",
  "bio": "Experience the world through the eyes of National Geographic photographers.",
  "followers": "284,000,000",
  "following": "170",
  "posts": 27543
}
{
  "username": "nasa",
  "bio": "Explore the universe and discover our home planet with the official NASA Instagram account.",
  "followers": "97,500,000",
  "following": "60",
  "posts": 4321
}

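Clean and structured. One caveat: the types aren't uniform, since followers and following came back as formatted strings while posts came back as integers.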
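Because extracted counts can arrive either as integers or as strings such as "284,000,000", it's worth normalizing them before you sort, sum, or compare. Below is a minimal sketch; `parse_count` is a hypothetical helper, and support for abbreviated values like "97.5M" is an assumption about formats you might encounter, not something the sample output shows.

```python
from __future__ import annotations

from decimal import Decimal

# Multipliers for abbreviated counts like "97.5M" (an assumed format, not seen in the sample)
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(value: int | str | None) -> int | None:
    """Convert an extracted count to an int; return None if it can't be parsed."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    text = str(value).strip().replace(",", "").upper()
    if not text:
        return None
    multiplier = 1
    if text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    try:
        # Decimal avoids float rounding errors for values like "97.5"
        return int(Decimal(text) * multiplier)
    except (ArithmeticError, ValueError):
        # ArithmeticError covers decimal.InvalidOperation for non-numeric text
        return None
```

Applied to one result, something like `{k: parse_count(result.get(k)) for k in ("followers", "following", "posts")}` gives you plain integers (or `None` where extraction failed) that are safe for arithmetic.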

What You Can Extract

From Profiles

  • Username, display name, bio
  • Follower, following, and post counts
  • Verification status
  • Business category (if set)
  • Profile picture URL
  • External link in bio

From Posts

  • Caption and hashtags
  • Like and comment counts
  • Post timestamp
  • Media URLs (images and videos)
  • Location tag (when available)

Derived Metrics

  • Engagement rate ((likes + comments) / followers)
  • Posting frequency
  • Most used hashtags
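You can ask the model for derived metrics directly, but computing them yourself from raw per-post fields keeps the math transparent. A minimal sketch, assuming you've already extracted posts into dicts with the hypothetical keys `likes`, `comments`, `caption`, and `posted_at` (counts as ints, timestamps as `datetime` objects) and have the account's follower count as an int. Engagement is averaged per post here; other definitions are in use, so pick one and apply it consistently.

```python
from __future__ import annotations

import re
from collections import Counter
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")


def engagement_rate(likes: int, comments: int, followers: int) -> float:
    """Engagement for one post: (likes + comments) / followers, as a fraction."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return (likes + comments) / followers


def average_engagement(posts: list[dict], followers: int) -> float | None:
    """Mean per-post engagement rate, or None if there are no posts."""
    if not posts:
        return None
    rates = [engagement_rate(p["likes"], p["comments"], followers) for p in posts]
    return sum(rates) / len(rates)


def posts_per_week(timestamps: list[datetime]) -> float | None:
    """Posting frequency: gaps between first and last post, scaled to a 7-day week."""
    if len(timestamps) < 2:
        return None
    span_days = (max(timestamps) - min(timestamps)).total_seconds() / 86_400
    if span_days == 0:
        return None
    return (len(timestamps) - 1) * 7 / span_days


def top_hashtags(captions: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Most frequent hashtags across captions, counted case-insensitively."""
    counts = Counter(tag.lower() for text in captions for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)
```

Posting frequency counts the intervals between posts (n posts span n - 1 gaps), so a handful of recent posts gives a rough rate rather than a precise one.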

Customizing Your Prompts

The natural language interface is flexible. Tailor your prompt to exactly what you need:

For influencer research:

user_prompt = "Extract username, followers, bio, and average engagement on recent posts"

For post analysis:

user_prompt = "Extract the caption, hashtags, like count, comment count, and post date"

For competitor tracking:

user_prompt = "Extract posting frequency, top hashtags, and average likes per post"
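The model chooses the JSON key names itself (the sample prompt asked for "follower count" and got back `followers`), so downstream code can break if a key name shifts. Naming the exact keys in the prompt tends to make output more predictable, though it's a prompting convention, not a guarantee. A minimal sketch; `build_prompt` and `INFLUENCER_FIELDS` are hypothetical. If your SDK version supports passing an output schema to `smartscraper`, that is likely a sturdier way to pin field names, so check the scrapegraph_py documentation.

```python
from __future__ import annotations


def build_prompt(fields: dict[str, str]) -> str:
    """Turn {json_key: description} into a prompt that names the keys you expect back."""
    if not fields:
        raise ValueError("fields must not be empty")
    parts = [f'"{key}" ({description})' for key, description in fields.items()]
    return "Extract the following and return them under these exact JSON keys: " + ", ".join(parts)


# Hypothetical field set for influencer research
INFLUENCER_FIELDS = {
    "username": "the account handle",
    "followers": "follower count as an integer",
    "bio": "the profile bio text",
}

user_prompt = build_prompt(INFLUENCER_FIELDS)
```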

Building a Competitor Monitor

Here's a practical script that tracks multiple Instagram accounts and saves the results:

import json
import time
from datetime import datetime
from scrapegraph_py import Client
 
class InstagramMonitor:
    def __init__(self, api_key: str):
        self.client = Client(api_key=api_key)
 
    def analyze_profile(self, username: str) -> dict | None:
        url = f"https://www.instagram.com/{username}/"
        try:
            response = self.client.smartscraper(
                website_url=url,
                user_prompt=(
                    "Extract username, follower count, following count, post count, "
                    "bio, verification status, and average engagement on recent posts"
                )
            )
            return {
                "username": username,
                "scraped_at": datetime.now().isoformat(),
                "data": response["result"],
            }
        except Exception as e:
            print(f"Error scraping @{username}: {e}")
            return None
 
    def monitor_accounts(self, usernames: list[str]) -> list[dict]:
        results = []
        for username in usernames:
            print(f"Scraping @{username}...")
            result = self.analyze_profile(username)
            if result:
                results.append(result)
            time.sleep(2)  # be respectful with timing
        return results
 
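    def save(self, results: list[dict], filename: str) -> None:
        # UTF-8 plus ensure_ascii=False keeps emoji and non-Latin bios readable
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(results, f, indent=2, ensure_ascii=False)
        print(f"Saved to {filename}")

    def close(self) -> None:
        # Release the underlying API client when you're done
        self.client.close()

monitor = InstagramMonitor(api_key="sgai-********************")

try:
    accounts = ["competitor_one", "competitor_two", "competitor_three"]
    results = monitor.monitor_accounts(accounts)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    monitor.save(results, f"instagram_analysis_{timestamp}.json")
finally:
    monitor.close()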
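Running the monitor on a schedule produces a series of timestamped JSON files, and a natural next step is flagging accounts whose numbers moved between two runs. A minimal sketch, assuming each record has the shape written by `save()` (a `username` plus a `data` dict) and that the extracted data uses a `followers` key; since the model picks key names, adjust `field` to match your output. All function names here are hypothetical.

```python
from __future__ import annotations

import json


def to_int(value) -> int | None:
    """Accept 27543 or "284,000,000"; return None for anything else."""
    if isinstance(value, int):
        return value
    try:
        return int(str(value).replace(",", ""))
    except ValueError:
        return None


def followers_by_user(records: list[dict], field: str = "followers") -> dict[str, int]:
    """Map username -> follower count from one saved snapshot."""
    counts = {}
    for record in records:
        data = record.get("data")
        if not isinstance(data, dict):
            continue  # skip records where extraction returned nothing usable
        count = to_int(data.get(field))
        if count is not None:
            counts[record["username"]] = count
    return counts


def flag_changes(old: dict[str, int], new: dict[str, int], threshold: float = 0.05) -> dict[str, float]:
    """Relative follower change for accounts that moved by at least `threshold`."""
    flagged = {}
    for user in old.keys() & new.keys():
        if old[user] == 0:
            continue
        change = (new[user] - old[user]) / old[user]
        if abs(change) >= threshold:
            flagged[user] = change
    return flagged


def compare_snapshots(old_path: str, new_path: str) -> dict[str, float]:
    """Load two files written by save() and return the flagged changes."""
    with open(old_path, encoding="utf-8") as f_old, open(new_path, encoding="utf-8") as f_new:
        return flag_changes(followers_by_user(json.load(f_old)), followers_by_user(json.load(f_new)))
```

`compare_snapshots(old_path, new_path)` returns a dict mapping each flagged username to its relative change, where 0.2 means a 20% increase. The 5% default threshold is arbitrary; tune it to how noisy your accounts are.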

Handling Errors Gracefully

Requests can fail, and not every profile will return every field. Wrap your calls so a single failure doesn't stop the whole run:

from scrapegraph_py import Client

def safe_scrape(client: Client, url: str, prompt: str) -> dict | None:
    """Return the extracted result, or None if the request fails."""
    try:
        response = client.smartscraper(
            website_url=url,
            user_prompt=prompt
        )
        return response["result"]
    except Exception as e:
        print(f"Failed to scrape {url}: {e}")
        return None

client = Client(api_key="sgai-********************")
result = safe_scrape(client, "https://www.instagram.com/someaccount/", "Extract follower count and bio")
if result:
    print("Success:", result)
client.close()
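`safe_scrape` gives up after the first failure. For transient problems such as timeouts or rate limiting, retrying with exponential backoff is a common pattern. Here's a minimal, SDK-agnostic sketch: `with_retries` is a hypothetical helper that wraps any zero-argument callable, so you would pass it something like `lambda: client.smartscraper(website_url=url, user_prompt=prompt)`.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay, 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Catching every `Exception` means permanent errors (an invalid API key, for example) are retried too; in practice you'd narrow the `except` clause to errors you know are transient. The `sleep` parameter exists so the backoff can be tested without actually waiting.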

Practical Use Cases

Influencer vetting: Before a campaign, quickly pull follower counts and engagement rates for a shortlist of accounts. Spot inflated follower counts by comparing engagement to audience size: a huge following with very little engagement is a red flag.

Competitor analysis: Track posting cadence, top hashtags, and engagement trends across competitor accounts. See what content is working for them.

Brand monitoring: Watch for brand mentions and user-generated content on public profiles. Set up a scheduled job to run daily and flag changes.

Content research: Identify which hashtag categories and content formats drive the most engagement in your niche before building your content calendar.

Market research: Aggregate public data on consumer preferences, trending aesthetics, and audience demographics across public brand profiles.

Things to Keep in Mind

Be specific with your prompts. "Extract follower count and recent post engagement" gives better results than "get all data." The more precise your request, the more reliable the extraction.

Add delays between requests. Even with managed infrastructure, firing requests in rapid bursts is poor practice. A time.sleep(1) or time.sleep(2) between calls is usually enough.

Validate your output. AI extraction is generally accurate but not infallible. For business-critical data, cross-check a sample of results manually.

Public profiles only. Private accounts are not accessible; the scraper works with the same publicly visible data any browser would see.
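One way to put the validation advice into practice is to check each result for missing or non-integer counts, then send everything suspicious, plus a random sample of the clean results, for manual review. The required key names below are an assumption based on the sample output earlier in this post; `find_problems` and `pick_for_review` are hypothetical helpers.

```python
from __future__ import annotations

import random

REQUIRED_FIELDS = ("username", "followers", "following", "posts")  # assumed key names
COUNT_FIELDS = ("followers", "following", "posts")


def find_problems(result: dict, required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """List missing fields and counts that aren't plain integers in one extracted profile."""
    problems = [f"missing {name}" for name in required if result.get(name) in (None, "")]
    for name in COUNT_FIELDS:
        value = result.get(name)
        if value in (None, ""):
            continue  # already reported as missing
        if not str(value).replace(",", "").isdigit():
            problems.append(f"non-numeric {name}: {value!r}")
    return problems


def pick_for_review(results: list[dict], k: int = 5, seed: int | None = None) -> list[dict]:
    """Every result with problems, plus up to k randomly chosen clean results for spot checks."""
    flagged, clean = [], []
    for r in results:
        (flagged if find_problems(r) else clean).append(r)
    rng = random.Random(seed)
    return flagged + rng.sample(clean, min(k, len(clean)))
```

Abbreviated values such as "97.5M" get flagged as non-numeric here, which is deliberate: flagged results go to a human, not to the bin.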

Common Questions

Is scraping Instagram legal? It depends on your jurisdiction and on what you do with the data. In the US, the Ninth Circuit's ruling in hiQ Labs v. LinkedIn held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA), though that ruling doesn't settle every question, such as contract claims. Instagram's Terms of Service also prohibit automated access, so use common sense: scrape public data, don't misuse what you collect, and get legal advice if you're building at scale.

What if Instagram changes its layout? That's the advantage of AI-powered extraction. Traditional scrapers break when the HTML structure changes. ScrapeGraphAI works from the semantic meaning of the page rather than fixed selectors, so it is far more resilient to layout updates. It's still worth spot-checking results after a major redesign.

Can I scrape Stories or Reels? Public Reels and Stories can be scraped, though the available data varies. Posts and profile-level data are the most reliable targets.

How many accounts can I scrape? This depends on your ScrapeGraphAI plan. Start with small batches, add delays, and scale up as you validate your pipeline.

What about private profiles? Only publicly accessible data is available. Private accounts require authentication and explicit permission — that's outside the scope of this tool.

The Bottom Line

Instagram data is genuinely useful for market research, influencer vetting, and competitive intelligence. Getting to it reliably used to require a lot of infrastructure. With ScrapeGraphAI you describe what you want in plain English and get back structured JSON — no proxy setup, no CAPTCHA handling, no fragile selectors.

Start with a single profile, make sure you're getting what you need, then scale up.
