Sneaker Data Analysis and Visualization
Role: Data Analyst
Team: Abir Ahmed, Loannis Parankas
Timeline: March 2022 - April 2022, 1 month
For a Data-Oriented Programming project, I collaborated with another data analyst to develop a Python script that scrapes a user-defined number of products across Goat.com and StockX.com (two popular sneaker resale websites).
Skills and Tools: Python, APIs, SQL, HTML, Visualizations, Documentation, Data Scraping
Background
The goal is to gather data, store that data in a database in several tables, calculate something from the data in the database, create visualizations from what you calculated, and print out a report with the calculated data and the visualizations. We decided to work with sneakers since we love sneaker culture and reselling. 👟
Work with Two APIs or one API and one website (with BSoup)
Create at least 2 visualizations
Goals
Original Goal
The original goal of this project was to create a database of historic sales data from StockX and then monitor for any price changes. As soon as a price change was detected that was greater than a % selected by users, the goal was to send an SMS to users so they could buy the product. The goal was to send the SMS using the Twilio API and to send product links encoded with the VigLink API to monetize any purchases made by users.
Adapted Goal
Instead, we designed a script that scrapes a user-defined number of products across Goat.com and StockX.com (two popular sneaker resale websites). The script then scrapes product data for said popular SKUs, and calculates the average sale price by size, the total sales by size, and total bids by size across the collected data. These 3 calculations are then visualized, using MatplotLib. Visualizations are additionally generated for all StockX and Goat products scraped.
Problems Faced
StockX.com and Goat.com did not offer public APIs
Their website data was not easily scrapeable using BeautifulSoup. To get around this issue, our team used the mitmproxy program to intercept the traffic each site used with their apps. Mitmproxy only shows traffic through curl requests, so our team then converted the requests to Python, to get access to the site’s API.
Access to the APIs was data protected.
(In the case of both apps, offered by PerimeterX). For StockX we were able to bypass the limitation by editing the headers we used when making requests. For Goat, we were unable to find a bypass and instead had to generate multiple headers in the development of our program.
The difference between the StockX and Goat API?
StockX reveals a lot more information about products than Goat, greatly impacting the amount of analysis we were able to complete on the data we collected.
Calculation Files
The file “avgSales.txt” contains a dictionary with the average price of every scraped product by size across both sneaker platforms.
The file “bidsTotals.txt” contains a dictionary with the total number of bids by size on StockX.
The file “salesTotals.txt” contains a dictionary with the total number of sales by size on StockX.
The files with the format “StockX_product_...headers.txt” contain product names, and product links that have been ran through the VigLink and Bit.ly API’s. They correspond to the “StockX_product_...data.txt” files. The same goes for the “Goat_product_...headers.txt” and “Goat_product_...data.txt” files.
Note that since there are ~400 of these files, we are only including 4 calculations (1 of each file type).
Visualizations
Average Sale Price By Size Across Goat and StockX
Total Bids By Size on StockX
Total Unit Sold By Size on StockX
Note that since there are ~200 visualization files, only 2 calculations are included (1 of each site).
Air Jordan 1 Retro High OG Patent Bred 555088-063
Nike Dunk Low Grey Fog
Instructions for Running Code
Navigate to “SI206FinalProject.py”
Run the code
Follow the instructions prompted by the program. To avoid any PerimeterX flagging when scraping Goat.com data, we recommend loading no more than 15 products. Use function “5” if you only have the .py file downloaded. Otherwise, feel free to use any of the functions the script describes.
-
get_stockX_popular
Inputs: Quantity
Outputs: A list of products
API’s Used: StockX API
Description: Function takes in a quantity, and then returns the top quantity # worth of popular products on StockX. StockX defines a product as being popular if it has the most sales in the past 72 hours.
build_stockX_product
Inputs: StockX Product UUID
Outputs: Product Dictionary
API’s Used: StockX API
Description: Function takes in a UUID and builds a dictionary including product lowest ask, total asks, highest bid, total bids, total sold, average sale price, and market cap using the StockX API.
get_stockX_product_link
Inputs: StockX Product UUID
Outputs: StockX Product Link
API’s Used: StockX API
Description: Function takes in a StockX product UUID and returns its respective product link.
get_Goat_popular
Inputs: Quantity
Outputs: A list of products
API’s Used: Goat API
Description: Function takes in a quantity, and then returns the top quantity # worth of popular products on Goat.
build_goat_product
Inputs: productTemplateId
Outputs: Product Dictionary
API’s Used: Goat API
Description: Function takes in a product slug and returns the current price per size.
get_Goat_product_link
Inputs: Product Slug
Outputs: Product Link
API’s Used: None
Description: Function takes in a product slug and returns its respective product link.
makeMoney
Inputs: Link
Outputs: Bit.ly link
API’s Used: VigLink API, Bit.ly
Description: Function takes in a product link, and then runs it through the VigLink API to generate a product affiliate link. The function then shortens the link using the Bit.ly API.
setUpDatabase
Inputs: Database Name
Outputs: Database Cursor and Connection
API’s Used: None
Description: Function takes in a name, and returns a database cursor and connection object.
build_StockX_popular_DB
Inputs: Popular StockX Styles, Database Cursor, Database Connection
Outputs: Database containing UUID, Description, and Product Links
API’s Used: None
Description: Function uses data returned from get_stockX_popular and makeMoney to build StockXPopular table in database.
build_StockX_product_DB
Inputs: Product UUID, Product Data, Database Cursor, Database Connection
Outputs: Individual Product Database
API’s Used: VigLink, Bit.ly
Description: Function generates the Bit.ly product link, and then populates a table using inputted product data
build_Goat_popular_DB
Inputs: Popular Goat Styles, Database Cursor, Database Connection
Outputs: Database containing ID, Slug ,and Product Links
API’s Used: None
Description: Function uses data returned from get_Goat_popular and makeMoney to build StockXPopular table in database.
build_Goat_product_DB
Inputs: Product Slug, Product Data, Database Cursor, Database Connection
Outputs: Individual Product Database
API’s Used: VigLink, Bit.ly
Description: Function generates the Bit.ly product link, and then populates a table using inputted product data.
read_StockX_popular_table
Inputs: Database Cursor, Database Connection
Outputs: Individual Product Databases for each product in StockX Popular Products.
API’s Used: StockX API
Description: Function reads the StockXPopular Database and creates a database for every popular product (1 at a time to comply with class 25 limit).
read_Goat_popular_table
Inputs: Database Cursor, Database Connection
Outputs: Individual Product Databases for each product in Goat Popular Products.
API’s Used: Goat API
Description: Function reads the GoatPopular Database and creates a database for every popular product (1 at a time to comply with class 25 limit).
get_avg_price_all
Inputs: Database Cursor, Database Connection
Outputs: avgSales.txt
API’s Used: None
Description: Function parses through every product database and creates a dictionary with a calculated average sale price per size. The calculation is done across products from both Goat and StockX.
total_bids_by_size_StockX
Inputs: Database Cursor, Database Connection
Outputs: bidsTotals.txt
API’s Used: None
Description: Function parses through all StockX products, and calculates the total bids per size.
total_sold_by_size_StockX
Inputs: Database Cursor, Database Connection
Outputs: salesTotals.txt
API’s Used: None
Description: Function parses through all StockX products, and calculates the total units sold per size.
create_StockX_product_TXT
Inputs: Database Cursor, Database Connection
Outputs: Many TXT Files
API’s Used: None
Description: Function uses a JOIN function to create two TXT files per StockX product. One file is used as the header for visualization, while the other is used as the main data for visualization. The JOIN function allows the chart title to be the correct shoe name for each visualization.
create_Goat_product_TXT
Inputs: Database Cursor, Database Connection
Outputs: Many TXT Files
API’s Used: None
Description: Function uses a JOIN function to create two TXT files per Goat product. One file is used as the header for visualization, while the other is used as the main data for visualization. The JOIN function allows the chart title to be the correct shoe name for each visualization.
visualize_Goat_TXTs
Inputs: None
Outputs: Visualizations
API’s Used: None
Description: Function uses the TXT files created by create_Goat_product_TXT to create a data visualization using MatplotLib.
visualize_StockX_TXTs
Inputs: None
Outputs: Visualizations
API’s Used: None
Description: Function uses the TXT files created by create_StockX_product_TXT to create a data visualization using MatplotLib.
visualize_avg_price_all
Inputs: None
Outputs: Visualizations
API’s Used: None
Description: Function uses avgSales.txt to create a visualization using MatplotLib.
visualize_total_bids_by_size_StockX
Inputs: None
Outputs: Visualizations
API’s Used: None
Description: Function uses bidsTotals.txt to create a visualization using MatplotLib.
visualize_sales_by_size_StockX
Inputs: None
Outputs: Visualizations
API’s Used: None
Description: Function uses salesTotals.txt to create a visualization using MatplotLib.
initialize_popular_data
Inputs: None
Outputs: None
API’s Used: None
Description: Function initializes all popular data, and asks for user input.
refresh_SKU_data
Inputs: None
Outputs: None
API’s Used: None
Description: Function refreshes all popular product data, pulling from the site popular databases.
create_all_visuals
Inputs: None
Outputs: None
API’s Used: None
Description: Function calls all visualization functions and generates data for them to function properly.
Main
Inputs: None
Outputs: None
API’s Used: None
Description: Function uses initialize_popular_data, refresh_SKU_data, and create_all_visuals, to synthesize the use of all created functions.