PandasAI x ScrapeGraphAI: Building a scraping agent that creates analytics

Clean data, neatly visualised, is the backbone of data science and business analytics, and it can boost your understanding of the market you operate in.
Finding the right tools for the job is easier and smoother than it has ever been.
Introducing ScrapeGraphAI x PandasAI
What is ScrapeGraphAI 🕷️
ScrapeGraphAI is an API for extracting data from the web using AI.
It handles the data side of the workflow: scraping and aggregating information from various sources so you can draw insights from it. Its APIs are easy to use, fast, and accurate, so the service fits neatly into your data pipeline, and it's all AI-powered.

What is PandasAI 🐼
PandasAI is an open-source framework that brings together intelligent data processing and natural language analysis. Whether you're working with complex datasets or just starting your data journey, PandasAI provides the tools to define, process, and analyze your data efficiently. Through its data preparation layer and natural language interface, you can transform raw data into actionable insights without writing complex code. It was created by Gabriele Venturi in 2022.
PandasAI is the second piece of our pipeline: it operates on and visualizes the data we scraped with ScrapeGraphAI. It lets you transform DataFrames through natural-language prompts instead of doing the tedious transformation work by hand, and it can also plot the data for you from a prompt.

How to build a dataset from the web and create analytics
You can either clone the repository and run the notebook directly, or follow the tutorial below.
In the guide below we create a simple agent with access to two tools, scrape_website and analyze_data, which scrape the website with ScrapeGraphAI and plot the graph with PandasAI respectively. We use OpenAI as our model provider, but you can choose your own.
1. First, install the following dependencies:
```bash
pip install ipython==9.2.0 langchain_core==0.3.59 langchain_openai==0.3.16 \
    langgraph==0.4.3 pandasai numpy==1.24.4 python-dotenv==1.1.0 \
    scrapegraph_py==1.12.0
```
2. Then import the following packages:
```python
import os
import logging

from dotenv import load_dotenv
import pandas as pd
import pandasai as pai
from scrapegraph_py import Client
from langchain_openai import ChatOpenAI
from langgraph.graph import MessagesState, StateGraph, START
from langgraph.prebuilt import tools_condition, ToolNode
from langchain_core.messages import HumanMessage, SystemMessage
```
3. Next, load your API keys:
```python
# Load environment variables from .env file
load_dotenv()

# Access environment variables
scrape_graph_api_key = os.getenv('SCRAPEGRAPH_API_KEY')
pandasai_api_key = os.getenv('PANDASAI_API_KEY')
openai_api_key = os.getenv('OPENAI_API_KEY')

# Set up logging
logging.basicConfig(level=logging.INFO)
```
You can get the API keys from the following links:
- ScrapeGraphAI: [https://dashboard.scrapegraphai.com/](https://dashboard.scrapegraphai.com/)
- PandasAI: [https://app.pandabi.ai/sign-in](https://app.pandabi.ai/sign-in)
- OpenAI: [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
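os.getenv silently returns None for a missing key, so a typo in your .env file only surfaces later as an authentication error from one of the three APIs. A small fail-fast check, run right after load_dotenv(), makes the problem obvious. This is a minimal sketch; `require_env` is a hypothetical helper of our own, not part of python-dotenv.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: return the requested environment variables,
    raising a single error that lists every missing or empty one."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

For this tutorial you would call `require_env("SCRAPEGRAPH_API_KEY", "PANDASAI_API_KEY", "OPENAI_API_KEY")` and read the keys from the returned dict.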
4. Now we will define the tool that uses PandasAI.
This tool plots the graphs for us. We have added some validation on the data to keep the values consistent.
```python
def analyze_data(df_dict: dict, question: str):
    """
    Analyze the dataset and answer a question using pandasai.

    Parameters:
    - df_dict (dict): Dictionary containing data as lists under arbitrary keys and numbers as floats
    - question (str): Prompt for plotting the graph based on the data

    Returns:
    - The model's response

    Raises:
    - ValueError: If input cannot be converted to a valid DataFrame
    """

    def convert_to_float(value):
        """Convert strings that represent numbers to float; leave other values unchanged."""
        if isinstance(value, str):
            # Remove currency symbols and replace commas with dots
            cleaned_value = value.replace('€', '').replace('$', '').replace(',', '.').strip()
            # Check if the cleaned value can be converted to float
            try:
                return float(cleaned_value)
            except ValueError:
                # Return original value if it can't be converted
                return value
        return value

    try:
        # Convert dictionary to DataFrame
        df = pd.DataFrame(df_dict)

        # Clean columns
        for column in df.columns:
            # Check if the column might contain mixed data
            if df[column].dtype == object:
                # Remove 'default' or 'N/A' entries (copy to avoid chained-assignment warnings)
                df = df[~df[column].isin(['default', 'N/A'])].copy()
                # Apply conversion to the column
                df[column] = df[column].apply(convert_to_float)

            # If the column contains numeric values, ensure they're float
            if df[column].apply(lambda x: isinstance(x, (int, float))).any():
                try:
                    df[column] = pd.to_numeric(df[column], errors='coerce')
                    # Drop rows with NaN in this column (unless the whole column is NaN)
                    if df[column].notna().any():
                        df = df.dropna(subset=[column])
                except (AttributeError, ValueError):
                    continue

        logging.info("Setting pandasai API key")
        pai.api_key.set(pandasai_api_key)

        logging.info("Creating DataFrame for analysis")
        # Use pandasai directly with the DataFrame
        response = pai.DataFrame(df).chat(question)

        # Save the generated plot if the response supports it (e.g. a chart)
        if hasattr(response, 'save'):
            response.save('plot.png')
        return response
    except Exception as e:
        logging.error(f"Error in analyze_data: {str(e)}")
        raise ValueError(f"Failed to process data: {str(e)}") from e
```
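The convert_to_float helper treats every comma as a decimal separator, so a US-formatted price such as 1,299.99 becomes 1.299.99, fails to parse, and its row is typically dropped by the later `pd.to_numeric(..., errors='coerce')` step. If your scraped data mixes formats, a more tolerant parser could help. The sketch below is a heuristic of our own, not part of pandas or PandasAI, and `parse_number` is a hypothetical name: it takes the first number-like token in the string, treats a final `,` or `.` followed by one or two digits as the decimal mark, and drops every other separator as a thousands separator.

```python
import re
from typing import Optional

# First number-like token: optional minus, a digit, then digits and separators
_NUMBER_TOKEN = re.compile(r"-?\d[\d.,]*")
# A trailing separator followed by 1-2 digits is read as the decimal mark
_DECIMAL_TAIL = re.compile(r"[.,](\d{1,2})$")


def parse_number(raw) -> Optional[float]:
    """Hypothetical helper: turn '$1,299.99', '1.299,99 €' or
    '4.5 out of 5 stars' into a float; return None if no number is found."""
    if isinstance(raw, (int, float)):
        return float(raw)
    match = _NUMBER_TOKEN.search(str(raw))
    if match is None:
        return None
    # Drop dangling separators such as the final dot in "1,299."
    token = match.group().rstrip(".,")
    tail = _DECIMAL_TAIL.search(token)
    if tail:
        # Separators before the decimal mark are thousands separators
        integer_part = re.sub(r"[.,]", "", token[: tail.start()])
        return float(f"{integer_part}.{tail.group(1)}")
    # No decimal part: every separator is a thousands separator
    return float(re.sub(r"[.,]", "", token))
```

Because it returns None for text without digits, apply it only to columns you expect to be numeric (prices, ratings), never to a product-name column. Note the trade-off in the heuristic: 1,299 is read as one thousand two hundred ninety-nine, which suits prices but would be wrong for a decimal-comma quantity with three decimals.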
5. Now we will define the tool that uses ScrapeGraphAI. It accepts a website URL and a user prompt as arguments and scrapes the content of the website:
```python
def scrape_website(website_url: str, user_prompt: str) -> dict:
    """
    Perform a scraping request on a website using ScrapeGraphAI.

    Parameters:
    - website_url (str): The URL of the website to scrape
    - user_prompt (str): The data extraction prompt

    Returns:
    - A dictionary containing the scraped data
    """
    logging.info("Creating ScrapeGraphAI client")
    sgai_client = Client(api_key=scrape_graph_api_key)
    try:
        response = sgai_client.smartscraper(
            website_url=website_url,
            user_prompt=user_prompt,
        )
        # The response is a dict; the extracted data lives under its "result" key
        data = response.get("result", response)
        # Ensure the return value is a dictionary
        if not isinstance(data, dict):
            data = {"data": data}
        return data
    except Exception as e:
        logging.error(f"Error in scrape_website: {str(e)}")
        raise
    finally:
        # Always release the client, even if the request fails
        sgai_client.close()
```
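Scraped results usually come back as nested records, for example `{"products": [{"name": ..., "price": ...}, ...]}`, whereas analyze_data passes df_dict straight to `pd.DataFrame`, which turns such a dict into a single column of dicts rather than one column per field. In this setup the model has to do that reshaping itself when it builds the analyze_data arguments. A minimal sketch of doing the pivot deterministically instead; `records_to_columns` is a hypothetical helper you could call inside analyze_data or expose as a third tool.

```python
from typing import Any


def records_to_columns(data: dict[str, Any]) -> dict[str, list]:
    """Hypothetical helper: take the first list of dicts in a scraped result
    and pivot it into one list per field, filling missing fields with None."""
    records = next(
        (
            value
            for value in data.values()
            if isinstance(value, list) and value and all(isinstance(r, dict) for r in value)
        ),
        None,
    )
    if records is None:
        raise ValueError("no list of records found in the scraped data")
    # Collect field names in first-seen order across all records
    fields: list[str] = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    # Equal-length lists, so pd.DataFrame(result) gives one column per field
    return {field: [record.get(field) for record in records] for field in fields}
```

It also works on the `{"data": [...]}` wrapper that scrape_website produces when the result is a list.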
6. Now we will create the agent and give it these tools:
```python
# Create the agent
tools = [analyze_data, scrape_website]
llm = ChatOpenAI(model="gpt-4o", api_key=openai_api_key)
llm_with_tools = llm.bind_tools(tools)

# System message for the assistant
sys_msg = SystemMessage(content="""You are a helpful assistant tasked with performing scraping scripts with scrapegraphai and analyzing the data with pandasai.
You have access to the following tools:
- scrape_website: to scrape a website
- analyze_data: to analyze a pandas dataframe
""")

# Assistant node: call the model with the system message plus the conversation so far
def assistant(state: MessagesState):
    return {"messages": [llm_with_tools.invoke([sys_msg] + state["messages"])]}

# Build the graph
builder = StateGraph(MessagesState)
builder.add_node("assistant", assistant)
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "assistant")
# Route to "tools" if the model requested a tool call, otherwise end the run
builder.add_conditional_edges(
    "assistant",
    tools_condition,
)
builder.add_edge("tools", "assistant")
react_graph = builder.compile()
```
7. You can see the plot of the graph by using the code below:
```python
from IPython.display import Image, display
from langchain_core.runnables.graph import MermaidDrawMethod

display(
    Image(
        react_graph.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API,
        )
    )
)
```
This is the structure of our graph
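The compiled graph is a small loop: the assistant node calls the model; if the reply contains tool calls, tools_condition routes to the tools node, whose results are appended to the message list before control returns to the assistant; otherwise the run ends. To make that control flow concrete, here is a framework-free sketch with a scripted stand-in for the LLM and stub tools. Everything in it (Message, run_agent, scripted_model, stub_tools) is hypothetical and illustrative; it is not how LangGraph is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Message:
    role: str  # "user", "assistant" or "tool"
    content: str
    # Requested tool calls as (tool_name, kwargs) pairs; empty means no tool call
    tool_calls: list = field(default_factory=list)


def run_agent(model: Callable[[list], Message],
              tools: dict[str, Callable[..., Any]],
              messages: list,
              max_steps: int = 10) -> list:
    """Alternate between the model (the "assistant" node) and tool execution
    (the "tools" node) until the model answers without requesting a tool."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        # The routing decision tools_condition makes: no tool calls -> end
        if not reply.tool_calls:
            return messages
        # "tools" node: run each requested tool and append its result
        for name, kwargs in reply.tool_calls:
            messages.append(Message("tool", str(tools[name](**kwargs))))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_model(messages: list) -> Message:
    """Stand-in for the LLM: scrape first, then analyze, then answer."""
    done = sum(m.role == "tool" for m in messages)
    if done == 0:
        return Message("assistant", "", [("scrape_website", {
            "website_url": "https://example.com", "user_prompt": "List products"})])
    if done == 1:
        return Message("assistant", "", [("analyze_data", {
            "df_dict": {"price": [19.99, 49.99]}, "question": "Plot price"})])
    return Message("assistant", "Saved the histogram to plot.png")


# Stub tools with the same parameter names as the real ones
stub_tools = {
    "scrape_website": lambda website_url, user_prompt: {"data": [{"price": "$19.99"}]},
    "analyze_data": lambda df_dict, question: "chart created",
}

history = run_agent(scripted_model, stub_tools, [Message("user", "Draw me a histogram")])
print([m.role for m in history])
# → ['user', 'assistant', 'tool', 'assistant', 'tool', 'assistant']
```

The max_steps guard plays a role similar to LangGraph's recursion limit: it stops a model that keeps requesting tools forever.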

8. Now let's run the agent:
```python
messages = react_graph.invoke(
    input={
        "messages": [
            HumanMessage(content="""Draw me a histogram for rating against price for the products in the following link: https://www.amazon.com/s?k=keyboards&crid=2F2S3TU22QHOF&sprefix=keyboar%2Caps%2C442&ref=nb_sb_noss_2""")
        ]
    }
)
```
It will output a file called plot.png in your working directory.
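The file only appears if the model actually calls analyze_data and the tool succeeds; otherwise you get no file, or a stale one left over from an earlier run. A minimal stdlib sketch for checking that a real PNG was written, by comparing the file's first eight bytes with the standard PNG signature (`is_png` is a hypothetical helper name):

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(path) -> bool:
    """Return True if `path` exists and starts with the PNG signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

If `is_png("plot.png")` is False after a run, printing `messages["messages"][-1].content` shows the agent's final reply, which usually explains what happened. Deleting plot.png before each run avoids being fooled by an old chart.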

This is how you can combine ScrapeGraphAI and PandasAI to speed up your data analysis work, with AI-native pipelines that handle the complex parts for you.
Frequently Asked Questions (FAQ)
Are PandasAI and ScrapeGraphAI open source?
Absolutely! Both projects are open source, so do leave a star on our repos.
Why should I use ScrapeGraphAI and PandasAI together?
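They cover complementary halves of the pipeline: ScrapeGraphAI collects structured data from the web, and PandasAI cleans, transforms, and visualizes it through natural-language prompts. Both are AI-powered and slot into existing data pipelines.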
In which scenarios is the ScrapeGraphAI + PandasAI combo most powerful?
If you want to research competitors or study product prices on e-commerce websites, this combination is a great fit: such data is quantitative, ScrapeGraphAI can scrape it, and PandasAI can transform it and plot clear graphs to visualize it.
Ready to Scale Your Data Collection?
Join thousands of businesses using ScrapeGraphAI to automate their web scraping needs. Start your journey today with our powerful API.