Definition
CSS selectors are pattern-matching expressions originally designed for applying styles to HTML elements. In web scraping, they serve as a primary method for targeting specific elements on a page to extract their content. Selectors identify elements by tag name, class, ID, attributes, position in the document tree, and various combinations thereof.
Common Selector Types
Basic Selectors
- Element: div, p, or a targets all elements of that type
- Class: .product-title targets elements with that class name
- ID: #main-content targets the element with that unique ID
- Attribute: [data-price] targets elements with a specific attribute
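To make the four basic selector types concrete, here is a minimal sketch using BeautifulSoup's select() and select_one() methods. It assumes the third-party beautifulsoup4 package is installed; the HTML snippet and its values are invented purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<div id="main-content">
  <p class="product-title">Blue Kettle</p>
  <p class="product-title">Red Toaster</p>
  <span data-price="19.99">$19.99</span>
  <a href="/about">About</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

paragraphs = soup.select("p")              # element selector: every <p>
titles = soup.select(".product-title")     # class selector
main = soup.select_one("#main-content")    # ID selector: first (only) match
priced = soup.select("[data-price]")       # attribute selector: attribute is present

title_texts = [el.get_text(strip=True) for el in titles]
price_values = [el["data-price"] for el in priced]

print(title_texts)
print(price_values)
```

select() always returns a list (empty when nothing matches), while select_one() returns a single element or None, so the ID lookup is the natural place to use the latter.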
Combinators
- Descendant: div .title selects .title elements inside any div
- Child: ul > li selects only direct li children of ul
- Adjacent sibling: h2 + p selects the first p immediately after an h2
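The difference between the combinators is easiest to see on nested markup. The sketch below again assumes beautifulsoup4 is installed and uses an invented HTML fragment; it is an illustration, not a recommended page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<article>
  <div>
    <section><span class="title">Nested title</span></section>
  </div>
  <ul>
    <li>One<ol><li>Nested item</li></ol></li>
    <li>Two</li>
  </ul>
  <h2>Heading</h2>
  <p>First paragraph</p>
  <p>Second paragraph</p>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Descendant: matches .title at any depth inside a div.
nested_titles = soup.select("div .title")

# Child: only <li> elements whose parent is the <ul> itself.
direct_items = soup.select("ul > li")

# Descendant, for comparison: also picks up the <li> inside the nested <ol>.
all_items = soup.select("ul li")

# Adjacent sibling: only the <p> that directly follows the <h2>.
after_heading = soup.select("h2 + p")

print(len(direct_items), len(all_items))
print([p.get_text(strip=True) for p in after_heading])
```

Here ul > li finds two items while ul li finds three, because the nested item is a descendant of the ul but a child of the ol.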
Pseudo-Selectors
- :first-child targets the first child element
- :nth-child(n) targets the nth child element
- :not(.class) excludes elements matching the inner selector
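As a short sketch of these three pseudo-selectors in practice, the following uses BeautifulSoup (beautifulsoup4 assumed installed) on an invented list. The class name sponsored is a hypothetical stand-in for whatever you want to exclude.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<ul>
  <li class="item">Alpha</li>
  <li class="item sponsored">Beta</li>
  <li class="item">Gamma</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

def texts(selector):
    """Return the stripped text of every element matching the selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]

first = texts("li:first-child")        # the <li> that is its parent's first child
second = texts("li:nth-child(2)")      # the <li> in second position (1-based)
organic = texts("li:not(.sponsored)")  # every <li> without the sponsored class

print(first, second, organic)
```

Note that :nth-child counts from 1, not 0, which is a common source of off-by-one mistakes when translating list indexes into selectors.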
CSS Selectors in Web Scraping
Selectors are the traditional workhorse of web scraping. To extract product prices, you might use .product-card .price. To collect every link inside an article, you might use article a[href]. Libraries such as BeautifulSoup (select()), Cheerio ($()), and Puppeteer (page.$$()) all accept CSS selector queries.
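The two selectors mentioned above can be tried out with a few lines of BeautifulSoup. This is a minimal sketch assuming beautifulsoup4 is installed; the product names, prices, and URLs are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<div class="product-card"><h3>Kettle</h3><span class="price">$19.99</span></div>
<div class="product-card"><h3>Toaster</h3><span class="price">$34.50</span></div>
<article>
  <a href="/posts/1">First post</a>
  <a name="anchor-only">No href here</a>
  <a href="/posts/2">Second post</a>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Price elements nested anywhere inside a product card.
prices = [el.get_text(strip=True) for el in soup.select(".product-card .price")]

# Only anchors that actually carry an href attribute.
links = [a["href"] for a in soup.select("article a[href]")]

print(prices)
print(links)
```

The [href] filter skips the anchor without a link target, so reading a["href"] afterwards cannot raise a KeyError.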
Limitations
The fundamental weakness of CSS selectors in scraping is their brittleness. They are tightly coupled to the HTML structure. When a website is redesigned (class names change, elements are restructured, or the CSS framework is swapped), selectors stop matching and must be updated by hand.
Maintaining selectors across hundreds of target sites becomes a significant operational burden.
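The brittleness described above can be reproduced in a few lines. In this sketch (beautifulsoup4 assumed installed), the same selector is run against two invented versions of a page: one before and one after a hypothetical redesign that renames the classes. The function name extract_prices is illustrative only.

```python
from typing import List

from bs4 import BeautifulSoup

SELECTOR = ".product-card .price"

# Hypothetical markup before and after a redesign that renames CSS classes.
BEFORE = '<div class="product-card"><span class="price">$19.99</span></div>'
AFTER = '<div class="card card--product"><span class="card__amount">$19.99</span></div>'

def extract_prices(html: str, selector: str = SELECTOR) -> List[str]:
    """Return the text of every element matching the selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

before_prices = extract_prices(BEFORE)
after_prices = extract_prices(AFTER)

print(before_prices)  # the price is found
print(after_prices)   # empty list: the selector no longer matches anything
```

The second call does not raise an error; it simply returns an empty list even though the price is still on the page. A scraper that does not check for empty results can therefore fail without any visible signal.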
Beyond CSS Selectors with ScrapeGraphAI
ScrapeGraphAI's AI-powered extraction reduces dependence on CSS selectors by understanding page content semantically. Instead of specifying exact element paths, you describe the data you want, and the AI identifies it regardless of the underlying HTML structure. Because the extraction does not depend on specific class names or element paths, this approach is more resilient to site changes than selector-based scraping.