How we turned Claude into a beast machine for web scraping

Lorenzo Padoan

Why raw LLM "fetch" tools fail and how ScrapeGraphAI turns Claude into an autonomous scraper.

The Truth: LLM fetchers are still pretty bad at real-world data acquisition

Claude, OpenAI, Gemini: all of them still suffer from the same problem.

Their built-in "fetch URL → extract data" tools break the moment real automation tasks begin.

Dynamic websites, pagination, JavaScript rendering, structured extraction... LLMs can describe scraping but fail at actually doing it. They hallucinate, return empty pages, misunderstand structure, or simply refuse to fetch.

We tested this live.

The experiment: Scraping IBM's partner directory

Target:

https://www.ibm.com/partnerplus/directory/companies?p=1

The request was simple: extract all the company URLs, open each profile, gather the overview, address, telephone, website and proficiencies, and finally assemble everything into an Excel file.
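To make the shape of that request concrete, here is a minimal Python sketch of the target record and of the paginated directory URL. It is an illustration under assumptions, not what Claude or ScrapeGraphAI ran: the `PartnerProfile` class, its field names, and both helper functions are hypothetical, and CSV (which Excel opens directly) stands in for the XLSX output so the sketch needs only the standard library.

```python
from dataclasses import dataclass, field, asdict
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode
import csv
import io

DIRECTORY_URL = "https://www.ibm.com/partnerplus/directory/companies?p=1"


@dataclass
class PartnerProfile:
    """One row of the requested output (field names are illustrative)."""
    company: str
    profile_url: str
    overview: str = ""
    address: str = ""
    telephone: str = ""
    website: str = ""
    proficiencies: list[str] = field(default_factory=list)


def directory_page_url(page: int, base: str = DIRECTORY_URL) -> str:
    """Return the directory URL with its `p` query parameter set to `page`."""
    parts = urlsplit(base)
    query = parse_qs(parts.query)
    query["p"] = [str(page)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def profiles_to_csv(profiles: list[PartnerProfile]) -> str:
    """Serialize profiles to CSV text; proficiencies are joined with ', '."""
    columns = ["company", "profile_url", "overview", "address",
               "telephone", "website", "proficiencies"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for profile in profiles:
        row = asdict(profile)
        row["proficiencies"] = ", ".join(profile.proficiencies)
        writer.writerow(row)
    return buffer.getvalue()
```

The hard part is everything this sketch leaves out: rendering the JavaScript-driven directory, finding the profile links, and filling each field reliably.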

What happened without a scraping engine

The result was a classic LLM meltdown:

  • Empty fetches
  • Domain restrictions
  • Wrong URLs
  • Invented data
  • 47 companies reported instead of the 30 actually listed
  • An Excel file full of hallucinations

LLM fetcher = great talker, terrible scraper.

Original attempt: https://claude.ai/share/bcf1349b-0c87-416c-bb9f-2d1ced848b76

Then We Equipped Claude With a Real Scraping Engine

Same request. Same page.

But this time Claude had access to ScrapeGraphAI.

Result: https://claude.ai/share/b16acfb0-ba07-4116-9e46-3b781724a5b4

The difference was immediate. Claude correctly detected JavaScript-heavy content, extracted all 30 companies from page one, followed each link, pulled accurate structured data, and built a clean Excel file, all without a single hallucination.

Why ScrapeGraphAI Works (While LLM Fetchers Fail)

Because LLMs don't have the infrastructure for scraping at scale. For ScrapeGraphAI, scraping is the main mission.

LLM fetch tools struggle with:

  • JavaScript-rendered pages
  • Pagination
  • Anti-bot logic
  • Multi-step workflows
  • Large-scale crawling
  • Consistent structured data extraction

ScrapeGraphAI solves this by performing:

  • Real browser-level fetching
  • DOM parsing
  • Schema validation
  • Recursive crawling
  • Deduplication logic
  • Robust retry mechanisms
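Three items on that list (schema validation, deduplication, and retries) are easy to picture in code. The sketch below is a generic, standard-library illustration of those ideas, not ScrapeGraphAI's actual implementation. The required fields, the `company` deduplication key, and every function name are assumptions made for the example.

```python
import time
from typing import Callable, Iterable, Optional

REQUIRED_FIELDS = ("company", "address", "website")  # assumed schema


def validate(record: dict) -> bool:
    """Schema check: every required field is present and a non-empty string."""
    return all(
        isinstance(record.get(name), str) and bool(record[name].strip())
        for name in REQUIRED_FIELDS
    )


def deduplicate(records: Iterable[dict], key: str = "company") -> list:
    """Keep the first record seen for each value of `key` (case-insensitive)."""
    seen = set()
    unique = []
    for record in records:
        marker = record[key].strip().lower()
        if marker not in seen:
            seen.add(marker)
            unique.append(record)
    return unique


def fetch_with_retry(fetch: Callable[[], dict], attempts: int = 3,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch` until it returns a valid record, backing off exponentially."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            record = fetch()
            if validate(record):
                return record
            last_error = ValueError(f"record failed validation: {record!r}")
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

The point of the design is that an empty or malformed fetch counts as a failure to retry, so it never reaches the model as data to "fill in".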

The LLM is the brain, ScrapeGraphAI is the arm.

LLM + ScrapeGraphAI = Agentic Scraping

With ScrapeGraphAI behind it, Claude suddenly acts like a real agent. It navigates pages naturally, following links from one profile to the next without getting lost or confused. When it encounters a page, it extracts clean structured fields exactly as requested, and if something fails, it retries intelligently instead of crashing or hallucinating.

The results speak for themselves: Claude generates Excel files with perfectly organized data, summarizes entire datasets on demand, and continues working seamlessly across multi-page flows. All of this happens without the overhead of dealing with a slow and heavy browser, because ScrapeGraphAI handles the infrastructure while Claude focuses on understanding and organizing the data.

This is true agentic data acquisition.

All 30 companies extracted perfectly with ScrapeGraphAI as the scraping engine

| # | Company | Overview | Address | Telephone | Website | Key Proficiencies |
|---|---|---|---|---|---|---|
| 1 | Crayon Poland Sp. Z o.o. | Global technology player with 47 offices worldwide, HQs in Oslo. One of IBM's larger Business Partners with strong competencies across IBM's software stack. | Zlota 59, Warsaw, Poland | +48 48222782777 | crayon.com/pl-pl | watsonx.ai, watsonx.data, Maximo, Guardium, Instana |
| 2 | Arrow ECS | Mayorista de Soluciones Globales (Global Solutions Distributor) | Avenida de Europa 21, Alcobendas, Madrid, Spain | +34 0685914729 | ibm.com | MQ, Event Automation, QRadar, Guardium, watsonx suite |
| 3 | CAPGEMINI Technology Services | Global leader in partnering with companies to transform business. 55-year heritage with deep industry expertise in cloud, data, AI, connectivity and platforms. | 145-151 Quai du President Roosevelt, Paris, France | +33 1 49673000 | capgemini.com | watsonx.ai, watsonx.data, Cloudability, Turbonomics, Maximo |
| 4 | YCOS Yves Colliard Software GmbH | Since 1989 offering training, consultancy and products on MVS, OS/390 and z/OS platform. | Bienenstr. 2, Euskirchen, Germany | +49 2251 6250090 | ycos.de | z/OS platform, MVS, OS/390, ISV |
| 5 | Prolifics, Inc. | Digital engineering and consulting firm helping navigate digital transformation. Expertise in Data & AI, Integration, Business Automation, DevXOps, Cybersecurity. | Rödingsmarkt 20, Hamburg, Germany | +49 40 89066770 | prolifics.de | Data & AI, Business Automation, DevXOps, Cybersecurity |
| 6 | iSky Development | Founded 2013 in Cairo by Egyptian entrepreneurs. Solutions company serving Europe and Middle East for businesses, governments and non-profits. | Unit 14, Tower 1, Silver Mall, Cairo, Egypt | +20 20238326694 | iskydev.com | Event Automation, MQ, Turbonomics, Instana, API Connect |
| 7 | Deloitte Australia | Industry-leading audit, tax, consulting, financial advisory and risk advisory services. Part of Deloitte Touche Tohmatsu Limited network. | Quay Quarter Tower Level 46, Sydney, Australia | 0293227000 | deloitte.com.au | Cloudability, Turbonomics, Verify, watsonx.ai, Terraform |
| 8 | TECH-HUB | Professional IT Services Provider providing excellent business solutions to increase client revenue and provide competitive edge. | 3 Road 262, New Maadi, Cairo, Egypt | +20 101 6000789 | tech-hub.tech | Turbonomics, Instana, Guardium, watsonx Assistant |
| 9 | Cohesive | Leading Maximo provider with 700+ successful implementations over 25 years. A Bentley brand for asset lifecycle management. | Glenwood Office Park, Pretoria, South Africa | Not available | cohesivesolutions.com | Maximo, TRIRIGA, ELM Suite, QRadar Suite |
| 10 | Jones Lang Lasalle Holding AB | JLL Technologies delivers market-leading technology to power the future of real estate with purpose-built solutions. | Birger Jarlsgatan 25, Stockholm, Sweden | +46 84535000 | jllsweden.se | Maximo, TRIRIGA, Envizi Sustainability |
| 11 | Deloitte Poland (Consulting) | Professional advisory services in audit, tax, economic, risk management, financial and legal advisory. | al. Jana Pawła II 22, Warsaw, Poland | (+)48 (22) 511 08 11 | deloitte.com/pl | Cloudability, Guardium, Verify, watsonx.ai, Terraform |
| 12 | ITALWARE SrL | System integrator supporting Digital Transformation through ICT infrastructure solutions in partnership with major vendors. | Via della Maglianella 65E, Roma, Italy | +39 39 0666411156 | italware.it | Turbonomics, Power hardware, watsonx.ai, Guardium |
| 13 | GBM Dominicana, S.A. | Leading IT services company, solutions integrator and IT expert. Exclusive IBM distributor in Central America, Dominican Republic and Haiti. | John F Kennedy Ave, No. 14, Santo Domingo | +809 566 5161 | gbm.net | Power systems, Turbonomics, Instana, Maximo, watsonx suite |
| 14 | CrushBank Technology, Inc. | Award-winning Data and AI platform using IBM watsonx for faster IT support information access and problem resolution. | 5 Aerial Way, Syosset, New York, USA | 5163776585 | crushbank.com | Data and AI platform, IT support, ISV |
| 15 | Arrow ECS Baltic OÜ | Global value add distributor with strong local teams around IBM technologies supporting customers at every journey stage. | Sõpruse pst 145, Tallinn, Estonia | Not available | arrow.com/globalecs/ee | Full IBM portfolio, watsonx suite, Maximo, Guardium |
| 16 | Cubewise China | Full-service IBM Planning Analytics (TM1) supplier with hundreds of happy customers across six continents. | 虹口区天潼路328号 WeWork, Shanghai, China | +86 4000188803 | cubewise.com | Planning Analytics, watsonx.ai, Cognos Analytics |
| 17 | Cubewise Canada | Largest, most enduring team of IBM Planning Analytics (TM1) specialists in the world dedicated to quality craftsmanship. | 100 King Street W Suite 5700, Toronto, Canada | 857 208 7267 | cubewise.com | Planning Analytics, watsonx.ai, Cognos Analytics |
| 18 | Phoenix Technologies AG | Pioneers sovereign Cloud & AI solutions for large enterprises, governments and public entities in Switzerland. | Alpenstrasse 9, Zug, Switzerland | Not available | phoenix-technologies.ch | AI solutions, Cloud solutions, Sovereign Cloud & AI |
| 19 | CRAYON LITHUANIA, UAB | Global technology player with 45 offices worldwide. One of IBM's larger Business Partners specializing in IBM Software optimization. | 16F G Jasinskio G, Vilnius, Lithuania | Not available | crayon.com | Cloudability, Instana, Turbonomics, watsonx.ai |
| 20 | Intercomputer Bulgaria | Leading system integrator specializing in IT solutions including infrastructure, data analytics, middleware, security, AI/automation. | 593 street, Sofia, Bulgaria | Not available | intercomputer.bg | Event Automation, Power hardware, Maximo, watsonx suite |
| 21 | Crayon Australia | Global technology player with 45 offices worldwide with strong competencies across IBM's software stack. | 44 Lakeview Drive Scoresby, Melbourne, Australia | +61 22891085 | crayon.com/en-au | Event Automation, Cloudability, Turbonomics, watsonx.ai |
| 22 | SHI International Corp. | Passionate about delivering exceptional value helping customers select, deploy, and manage technology at global scale. | 290 Davidson Avenue, Somerset, New Jersey, USA | +1 888 7648888 | shi.com | Event Automation, Instana, Maximo, watsonx suite |
| 23 | Pedab Norway | Dedicated IBM Value Add Distributor and Techbroker with European presence and high focus on IBM offerings for 30+ years. | Stortingsgata 4, Oslo, Norway | +47 476 57 700 | pedab.no | Cloudability, Power hardware, FlashSystems, watsonx.ai |
| 24 | Dun & Bradstreet | D&B Ask Procurement is a GenAI assistant for procurement teams. World's leading source of commercial information with 550M+ business records. | 5335 Gate Pkwy Ste 100, Jacksonville, Florida, USA | Not available | dnb.com | ISV, GenAI assistant for procurement |
| 25 | PERSISTENT SYSTEMS LIMITED | Global services company delivering AI-led Digital Engineering and Enterprise Modernization. 26,000+ employees in 18 countries. | Bhageerath, 402, Senapati Bapat Road, Pune, India | +91 20 30234000 | persistent.com | SevOne, Instana, watsonx Assistant, API Connect |
| 26 | Crayon Deutschland | Global IBM Platinum Business Partner authorized in 30+ countries. Specialized in Software Licensing, SAM, Training and Consulting. | Crayon Deutschland GmbH, Oberhaching, Germany | +49 89 67804650 | crayon.de | Event Automation, Cloudability, Turbonomics, watsonx.ai |
| 27 | Dedagroup SPA | Present with 35 locations in Italy, operating in UK, USA, Mexico and Tunisia. Partners in France, Germany and China. | Via di Spini 50, Trento, Italy | +39 461 997111 | dedagroup.it | Turbonomics, Maximo, Power hardware, watsonx.ai |
| 28 | MACS BV | Software services provider for maintenance, service and facility management. 25 years of solutions across Europe, UK and worldwide. | Museumstraat 8, Antwerpen, Belgium | +32 3 2371755 | macs.eu | Maximo, Envizi, TRIRIGA |
| 29 | Kenac Computer Systems | Zimbabwean company specializing in Enterprise ICT Solutions for Hardware and Software including Sales and support. | 109 Enterprise Road, Highlands, Harare, Zimbabwe | +263 4 0773836664 | kenac.co.zw | Event Automation, MQ, SevOne, Guardium, Power systems |
| 30 | InTTrust SA | Provides high-quality IT services and solutions enhancing collaboration and productivity with current and emerging technologies. | 2 Ipeirou Str, Ag. Paraskevi, Athens, Greece | +30 210 6513040 | inttrust.gr | Event Automation, Maximo, Power systems, watsonx suite |

The final XLSX file contained fully structured data, summary statistics and every corresponding profile URL.
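The post does not say which summary statistics the file contained, so the following Python sketch only illustrates the kind of roll-up that becomes trivial once the data is structured. The `summarize` function and its output keys are hypothetical. The three sample rows are copied from the table above, with the other 27 left out for brevity.

```python
from collections import Counter

# Three rows taken from the results table; only the fields used here are kept.
SAMPLE_ROWS = [
    {"company": "Cohesive", "telephone": "Not available",
     "proficiencies": ["Maximo", "TRIRIGA", "ELM Suite", "QRadar Suite"]},
    {"company": "Jones Lang Lasalle Holding AB", "telephone": "+46 84535000",
     "proficiencies": ["Maximo", "TRIRIGA", "Envizi Sustainability"]},
    {"company": "MACS BV", "telephone": "+32 3 2371755",
     "proficiencies": ["Maximo", "Envizi", "TRIRIGA"]},
]


def summarize(rows):
    """Count rows, list companies without a phone, rank proficiencies."""
    missing_phone = [r["company"] for r in rows
                     if r["telephone"] == "Not available"]
    counts = Counter(p for r in rows for p in r["proficiencies"])
    return {
        "companies": len(rows),
        "missing_telephone": missing_phone,
        "top_proficiencies": counts.most_common(2),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_ROWS))
```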

Exactly what scraping is supposed to produce.

The big lesson: LLMs don't need better fetchers, they need real scraping engines

A fetch function isn't a scraper.

ScrapeGraphAI provides the missing infrastructure.

Give Claude a real scraping engine.

How to Set Up Claude With Scraping Capabilities

If you want Claude to behave exactly like the Scraping Beast described above, here is the exact setup process.

Step 1: Install the MCP Server

Go to the ScrapeGraphAI MCP server page:

https://smithery.ai/server/@ScrapeGraphAI/scrapegraph-mcp

Once you are on the page, scroll to the Auto section. Under the list of supported clients, you will find Claude Desktop.

Step 2: Run the Installation Command

Copy the npx command shown there and execute it in your terminal. This installs the ScrapeGraphAI MCP server locally and makes it visible to Claude Desktop.

Step 3: Restart Claude Desktop

Restart Claude Desktop so it can automatically detect the new MCP server.
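If the server does not show up after the restart, a quick sanity check is to look at Claude Desktop's local config file, which keeps locally configured MCP servers under an "mcpServers" key. The Python sketch below assumes the installer registered the server there. The Linux path is a guess, and the name the server is registered under depends on the installer, so the script simply lists whatever it finds.

```python
import json
import os
import sys
from pathlib import Path


def claude_desktop_config_path() -> Path:
    """Default location of claude_desktop_config.json on macOS and Windows."""
    if sys.platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if sys.platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    # Assumption: other platforms follow the XDG-style ~/.config layout.
    return Path.home() / ".config" / "Claude" / "claude_desktop_config.json"


def registered_mcp_servers(config_path: Path) -> list:
    """Return the server names listed under "mcpServers", or [] if none."""
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return sorted(config.get("mcpServers", {}))


if __name__ == "__main__":
    servers = registered_mcp_servers(claude_desktop_config_path())
    print("MCP servers known to Claude Desktop:", servers or "none found")
```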

Step 4: Get Your API Key

Head over to ScrapeGraphAI and retrieve your API key.

Step 5: Configure Claude

Open Claude Desktop and ask it to set up your ScrapeGraphAI API key. You can follow the same flow shown in this example chat:

https://claude.ai/share/990c3025-40ec-49c8-b0bb-7c556ac033b1

(The API key shown in that example is no longer active, sorry guys)

Step 6: Start Scraping!

Once the key is configured, Claude instantly becomes a true Scraping Beast, fully equipped with real browserless scraping power and agentic extraction capabilities.

Enjoy your Scraping Beast!
