Definition
Crawl depth is the number of link hops away from the starting URL that a web crawler will follow before stopping. A depth of 0 means only the starting page is crawled. A depth of 1 includes the starting page plus all pages directly linked from it. A depth of 2 adds pages linked from those pages, and so on.
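In practice this is usually implemented as a breadth-first traversal that records the depth at which each URL was discovered and stops following links past the limit. The sketch below uses only Python's standard library; the function and class names are illustrative, not part of any particular tool.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_depth):
    """Breadth-first crawl that stops following links beyond max_depth.

    Depth 0 fetches only start_url; depth 1 adds the pages it links to, etc.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])  # each entry is (url, depth)
    pages = []

    while queue:
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip pages that fail to load
        pages.append((url, depth))

        if depth >= max_depth:
            continue  # do not follow links past the depth limit

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return pages
```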
Why Crawl Depth Matters
Resource Management
The number of pages to crawl can grow exponentially with depth. A site averaging 50 links per page yields up to 50 new pages at depth 1, 2,500 at depth 2, and 125,000 at depth 3. Without depth limits, crawlers can spiral into millions of pages, consuming time, bandwidth, and storage.
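The worst case is simply the branching factor raised to the depth, as this small calculation (assuming no shared or duplicate links) illustrates:

```python
branching = 50  # average links per page

# Upper bound on new pages discoverable at each depth
for depth in range(1, 4):
    print(f"depth {depth}: up to {branching ** depth:,} new pages")
# depth 1: up to 50 new pages
# depth 2: up to 2,500 new pages
# depth 3: up to 125,000 new pages
```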
Relevance
Content relevance often decreases with depth. The starting page is presumably the most relevant target. Pages one click away are closely related. By depth 3 or 4, you may be crawling tangentially related or entirely irrelevant content.
Completeness vs Efficiency
There is an inherent trade-off between thoroughness and efficiency. A shallow crawl is fast but may miss important content buried deeper in the site structure. A deep crawl is comprehensive but expensive and slow.
Choosing the Right Depth
The optimal crawl depth depends on the site structure and your goals:
- Depth 0 — single-page scraping, when you only need one specific page
- Depth 1 — collecting a page and its immediate links, common for category pages linking to product pages
- Depth 2-3 — exploring a section of a site, suitable for most content collection tasks
- Unlimited depth — full site archival or comprehensive crawling (use with careful filtering)
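Using the depth-limited `crawl` sketch from the Definition section (the URLs below are placeholders), these presets translate into straightforward calls:

```python
# Single page only (depth 0)
page = crawl("https://example.com/report", max_depth=0)

# A category page plus the product pages it links to (depth 1)
catalog = crawl("https://example.com/category/widgets", max_depth=1)

# A whole site section (depth 2-3); expect page counts to grow quickly
section = crawl("https://example.com/docs/", max_depth=3)
```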
Combining Depth with Filters
Depth alone is a blunt instrument. Effective crawlers combine depth limits with URL pattern filters, content type restrictions, and domain boundaries to target specific content regardless of its position in the link graph.
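One common pattern is a single predicate that combines the depth limit with domain boundaries and URL patterns, applied before a link is enqueued. The domains and patterns below are placeholders; in the earlier `crawl` sketch, such a predicate would replace the bare `link not in seen` check.

```python
import re
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com", "docs.example.com"}  # domain boundary
INCLUDE_PATTERN = re.compile(r"/blog/|/docs/")         # URL patterns to keep
EXCLUDE_PATTERN = re.compile(r"\.(pdf|zip|jpg|png)$")  # skip non-HTML assets


def should_follow(url, depth, max_depth=3):
    """Decide whether a discovered link is worth enqueueing."""
    if depth > max_depth:
        return False
    if urlparse(url).netloc not in ALLOWED_DOMAINS:
        return False
    if EXCLUDE_PATTERN.search(url):
        return False
    return bool(INCLUDE_PATTERN.search(url))
```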
Crawl Depth in ScrapeGraphAI
ScrapeGraphAI allows you to configure crawl depth when initiating crawl operations, giving you precise control over how extensively the crawler explores a site. Combined with URL filtering, this ensures efficient, targeted data collection.
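The snippet below is a sketch based on the scrapegraph-py SDK; the exact client and parameter names (`Client`, `crawl`, `depth`, `max_pages`) are assumptions here and should be verified against the current ScrapeGraphAI documentation.

```python
# Illustrative only: parameter names such as `depth` and `max_pages` are
# assumptions; confirm them against the current ScrapeGraphAI docs.
from scrapegraph_py import Client

client = Client(api_key="your-api-key")

result = client.crawl(
    url="https://example.com/docs/",
    prompt="Extract the title and summary of each documentation page",
    depth=2,       # follow links up to two hops from the start URL
    max_pages=50,  # hard cap on the number of pages fetched
)
```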