SQL Data Cleaning Project: Layoffs Data

Welcome to my SQL data cleaning project! In this project, I tackled a dataset with over 2,300 records detailing layoffs across various industries. Using MySQL, I imported, cleaned, and standardized the data to ensure its usability and accuracy. Below, you can read more about the challenges I faced, the steps I took, and the tools and techniques I used to achieve my goals.

View Code

Objective

Clean and standardize a dataset of 2,361 records detailing layoffs across various industries to ensure data quality and usability.

Challenges

Data Import Issues: Initially, MySQL only imported 540 of the 2,361 records from a .csv file. After troubleshooting, I resolved this by converting the file to .json format, which allowed all records to be imported successfully.

Steps Taken

Data Import:
Imported a .json file to successfully include all 2,361 records into the MySQL database.
Data Cleaning:
- Removing Duplicates: Identified and deleted duplicate records using a ROW_NUMBER() function with a PARTITION BY clause.
- Standardizing Data:
  - Trimmed unnecessary spaces from text fields.
  - Standardized industry names (e.g., consolidated variations of "Crypto").
  - Reformatted date values to a consistent DATE format using STR_TO_DATE.
- Handling Missing/Null Values:
  - Identified and replaced missing or blank values where possible by cross-referencing other entries.
  - Deleted records where critical fields (e.g., total_laid_off, percentage_laid_off) were missing.
- Column Optimization: Dropped unnecessary columns (row_num after deduplication was complete).
Validation: Verified data consistency and integrity through targeted queries.

Tools Used

Database: MySQL
File Formats: .csv and .json

Key SQL Techniques

ROW_NUMBER() with PARTITION BY for deduplication.
String functions (TRIM, LIKE) for standardization.
STR_TO_DATE for date conversion.
Joins and subqueries for filling missing values.

Results

Cleaned and standardized a dataset of 2,361 records. Prepared data for further analysis and visualization.