UX/UI Designer

Sneaker Data Analysis

 

 Sneaker Data Analysis and Visualization

Role: Data Analyst

Team: Abir Ahmed, Loannis Parankas

Timeline: March 2022 - April 2022, 1 month

For a Data-Oriented Programming project, I collaborated with another data analyst to develop a Python script that scrapes a user-defined number of products across Goat.com and StockX.com (two popular sneaker resale websites).

Skills and Tools: Python, APIs, SQL, HTML, Visualizations, Documentation, Data Scraping

Full GitHub repository with report here


Background

The goal is to gather data, store that data in a database in several tables, calculate something from the data in the database, create visualizations from what you calculated, and print out a report with the calculated data and the visualizations. We decided to work with sneakers since we love sneaker culture and reselling. 👟

  • Work with Two APIs or one API and one website (with BSoup)

  • Create at least 2 visualizations


Goals

Original Goal

The original goal of this project was to create a database of historic sales data from StockX and then monitor for any price changes. As soon as a price change was detected that was greater than a % selected by users, the goal was to send an SMS to users so they could buy the product. The goal was to send the SMS using the Twilio API and to send product links encoded with the VigLink API to monetize any purchases made by users.

Adapted Goal

Instead, we designed a script that scrapes a user-defined number of products across Goat.com and StockX.com (two popular sneaker resale websites). The script then scrapes product data for said popular SKUs, and calculates the average sale price by size, the total sales by size, and total bids by size across the collected data. These 3 calculations are then visualized, using MatplotLib. Visualizations are additionally generated for all StockX and Goat products scraped.


Problems Faced

StockX.com and Goat.com did not offer public APIs

Their website data was not easily scrapeable using BeautifulSoup. To get around this issue, our team used the mitmproxy program to intercept the traffic each site used with their apps. Mitmproxy only shows traffic through curl requests, so our team then converted the requests to Python, to get access to the site’s API.

Access to the APIs was data protected.

(In the case of both apps, offered by PerimeterX). For StockX we were able to bypass the limitation by editing the headers we used when making requests. For Goat, we were unable to find a bypass and instead had to generate multiple headers in the development of our program.

The difference between the StockX and Goat API?

StockX reveals a lot more information about products than Goat, greatly impacting the amount of analysis we were able to complete on the data we collected.


Calculation Files

The file “avgSales.txt” contains a dictionary with the average price of every scraped product by size across both sneaker platforms.

The file “bidsTotals.txt” contains a dictionary with the total number of bids by size on StockX.

The file “salesTotals.txt” contains a dictionary with the total number of sales by size on StockX.

The files with the format “StockX_product_...headers.txt” contain product names, and product links that have been ran through the VigLink and Bit.ly API’s. They correspond to the “StockX_product_...data.txt” files. The same goes for the “Goat_product_...headers.txt” and “Goat_product_...data.txt” files.

Note that since there are ~400 of these files, we are only including 4 calculations (1 of each file type).


Visualizations

Average Sale Price By Size Across Goat and StockX

 

Total Bids By Size on StockX

 

Total Unit Sold By Size on StockX

Note that since there are ~200 visualization files, only 2 calculations are included (1 of each site).

 

Air Jordan 1 Retro High OG Patent Bred 555088-063

Nike Dunk Low Grey Fog


Instructions for Running Code

  • Navigate to “SI206FinalProject.py”

  • Run the code

  • Follow the instructions prompted by the program. To avoid any PerimeterX flagging when scraping Goat.com data, we recommend loading no more than 15 products. Use function “5” if you only have the .py file downloaded. Otherwise, feel free to use any of the functions the script describes.

  • get_stockX_popular

    Inputs: Quantity

    Outputs: A list of products

    API’s Used: StockX API

    Description: Function takes in a quantity, and then returns the top quantity # worth of popular products on StockX. StockX defines a product as being popular if it has the most sales in the past 72 hours. 

    build_stockX_product

    Inputs: StockX Product UUID

    Outputs: Product Dictionary

    API’s Used: StockX API

    Description: Function takes in a UUID and builds a dictionary including product lowest ask, total asks, highest bid, total bids, total sold, average sale price, and market cap using the StockX API. 

    get_stockX_product_link

    Inputs: StockX Product UUID

    Outputs: StockX Product Link

    API’s Used: StockX API

    Description: Function takes in a StockX product UUID and returns its respective product link. 

    get_Goat_popular

    Inputs: Quantity

    Outputs: A list of products

    API’s Used: Goat API

    Description: Function takes in a quantity, and then returns the top quantity # worth of popular products on Goat.

    build_goat_product

    Inputs: productTemplateId

    Outputs: Product Dictionary

    API’s Used: Goat API

    Description: Function takes in a product slug and returns the current price per size. 

    get_Goat_product_link

    Inputs: Product Slug

    Outputs: Product Link

    API’s Used: None

    Description: Function takes in a product slug and returns its respective product link.

     

    makeMoney

    Inputs: Link

    Outputs: Bit.ly link

    API’s Used: VigLink API, Bit.ly

    Description: Function takes in a product link, and then runs it through the VigLink API to generate a product affiliate link. The function then shortens the link using the Bit.ly API. 

    setUpDatabase

    Inputs: Database Name

    Outputs: Database Cursor and Connection

    API’s Used: None

    Description: Function takes in a name, and returns a database cursor and connection object. 

    build_StockX_popular_DB

    Inputs: Popular StockX Styles, Database Cursor, Database Connection

    Outputs: Database containing UUID, Description, and Product Links

    API’s Used: None

    Description: Function uses data returned from get_stockX_popular and makeMoney to build StockXPopular table in database. 

    build_StockX_product_DB

    Inputs:  Product UUID, Product Data, Database Cursor, Database Connection

    Outputs: Individual Product Database

    API’s Used: VigLink, Bit.ly

    Description: Function generates the Bit.ly product link, and then populates a table using inputted product data

    build_Goat_popular_DB

    Inputs: Popular Goat Styles, Database Cursor, Database Connection

    Outputs: Database containing ID, Slug ,and Product Links

    API’s Used: None

    Description: Function uses data returned from get_Goat_popular and makeMoney to build StockXPopular table in database. 

    build_Goat_product_DB

    Inputs: Product Slug, Product Data, Database Cursor, Database Connection

    Outputs: Individual Product Database

    API’s Used: VigLink, Bit.ly

    Description: Function generates the Bit.ly product link, and then populates a table using inputted product data. 

    read_StockX_popular_table

    Inputs: Database Cursor, Database Connection

    Outputs: Individual Product Databases for each product in StockX Popular Products. 

    API’s Used: StockX API

    Description: Function reads the StockXPopular Database and creates a database for every popular product (1 at a time to comply with class 25 limit).

    read_Goat_popular_table

    Inputs: Database Cursor, Database Connection

    Outputs: Individual Product Databases for each product in Goat Popular Products. 

    API’s Used: Goat API

    Description: Function reads the GoatPopular Database and creates a database for every popular product (1 at a time to comply with class 25 limit).

    get_avg_price_all

    Inputs: Database Cursor, Database Connection

    Outputs: avgSales.txt

    API’s Used: None

    Description: Function parses through every product database and creates a dictionary with a calculated average sale price per size. The calculation is done across products from both Goat and StockX. 

    total_bids_by_size_StockX

    Inputs: Database Cursor, Database Connection

    Outputs: bidsTotals.txt

    API’s Used: None

    Description: Function parses through all StockX products, and calculates the total bids per size. 

    total_sold_by_size_StockX

    Inputs: Database Cursor, Database Connection

    Outputs: salesTotals.txt

    API’s Used: None

    Description: Function parses through all StockX products, and calculates the total units sold per size. 

    create_StockX_product_TXT

    Inputs: Database Cursor, Database Connection

    Outputs: Many TXT Files

    API’s Used: None

    Description: Function uses a JOIN function to create two TXT files per StockX product. One file is used as the header for visualization, while the other is used as the main data for visualization. The JOIN function allows the chart title to be the correct shoe name for each visualization. 

    create_Goat_product_TXT

    Inputs: Database Cursor, Database Connection

    Outputs: Many TXT Files

    API’s Used: None

    Description: Function uses a JOIN function to create two TXT files per Goat product. One file is used as the header for visualization, while the other is used as the main data for visualization. The JOIN function allows the chart title to be the correct shoe name for each visualization. 

    visualize_Goat_TXTs

    Inputs: None

    Outputs: Visualizations

    API’s Used: None

    Description: Function uses the TXT files created by create_Goat_product_TXT to create a data visualization using MatplotLib. 

    visualize_StockX_TXTs

    Inputs: None

    Outputs: Visualizations

    API’s Used: None

    Description: Function uses the TXT files created by create_StockX_product_TXT to create a data visualization using MatplotLib. 

    visualize_avg_price_all

    Inputs: None

    Outputs: Visualizations

    API’s Used: None

    Description: Function uses avgSales.txt to create a visualization using MatplotLib. 

    visualize_total_bids_by_size_StockX

    Inputs: None

    Outputs: Visualizations

    API’s Used: None

    Description: Function uses bidsTotals.txt to create a visualization using MatplotLib.

    visualize_sales_by_size_StockX

    Inputs: None

    Outputs: Visualizations

    API’s Used: None

    Description: Function uses salesTotals.txt to create a visualization using MatplotLib.

    initialize_popular_data

    Inputs: None

    Outputs: None

    API’s Used: None

    Description: Function initializes all popular data, and asks for user input. 

    refresh_SKU_data

    Inputs: None

    Outputs: None

    API’s Used: None

    Description: Function refreshes all popular product data, pulling from the site popular databases. 

    create_all_visuals

    Inputs: None

    Outputs: None

    API’s Used: None

    Description: Function calls all visualization functions and generates data for them to function properly. 

    Main

    Inputs: None

    Outputs: None

    API’s Used: None

    Description: Function uses initialize_popular_data, refresh_SKU_data, and create_all_visuals, to synthesize the use of all created functions.

 

Documentation of Resouces


Full GitHub Project here

Full report here


For more work inquiries, or to grab a coffee do email me at abiahmed@umich.edu

Thank you for reading!📚