ETL_project

This project is part of the Data Science and Visualization Bootcamp at UCSD Extension.

ERD

Project Intro/Objective

The aim of this project is to perform ETL (extract, transform, load) on two datasets, PPP loans and COVID-19 cases for California, and merge them so that researchers can evaluate a possible link between COVID-19 infection cases and receipt of PPP loans. The project was co-engineered by four people: Stephen Hong, Laura Paakh May, Nghia Nguyen, and Sagar Patel.

Methods and Technologies Used

Step 1 - Extract the data
Searched Kaggle and downloaded four datasets:

  A. PPP loans
  B. US county demographics
  C. US counties and COVID-19 cases
  D. US counties and their corresponding zip codes

Step 2 - ERD
Created an entity relationship diagram (ERD).

Step 3 - Transform
Cleaned the data using Python in Jupyter Notebook.
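As an illustration of the kind of cleaning this step involves, here is a minimal sketch using only the Python standard library. The column names (`state`, `zip`, `business_type`), the helper names, and the sample rows are hypothetical and are not taken from the project's notebooks, which may use different tools and rules.

```python
import csv
import io


def clean_zip(raw):
    """Return a 5-digit zip code string, or None if the value is unusable."""
    text = str(raw).strip()
    # Keep only the part before a ZIP+4 suffix ("-0001") or a float artifact (".0").
    for sep in ("-", "."):
        text = text.split(sep)[0]
    if not text.isdigit() or len(text) > 5:
        return None
    # Restore leading zeros lost when zip codes were stored as numbers.
    return text.zfill(5)


def clean_loans(rows):
    """Keep California rows that have a usable zip code; tidy the text fields."""
    cleaned = []
    for row in rows:
        state = (row.get("state") or "").strip().upper()
        zip_code = clean_zip(row.get("zip") or "")
        if state != "CA" or zip_code is None:
            continue
        cleaned.append(
            {
                "state": state,
                "zip": zip_code,
                "business_type": (row.get("business_type") or "").strip(),
            }
        )
    return cleaned


# Made-up sample rows standing in for a raw PPP loan file (hypothetical columns).
sample = (
    "state,zip,business_type\n"
    "ca,92093-0001, corporation \n"
    "CA,,LLC\n"
    "NY,10001,LLC\n"
    "CA,92101.0,sole proprietorship\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = clean_loans(rows)
```

Normalizing zip codes to a common 5-digit text form matters here because the zip code is the natural key for linking loans to counties.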

Step 4 - SQL
Created SQL tables in PostgreSQL.
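The table design might look something like the sketch below. The table and column names are hypothetical and do not come from the project's ERD. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the `CREATE TABLE` statements are written in plain SQL that PostgreSQL should also accept.

```python
import sqlite3

# Hypothetical schema: one table per source dataset, keyed so that loans
# can be linked to counties through the zip code crosswalk.
# Simplification: each zip code is mapped to a single county.
SCHEMA = """
CREATE TABLE county (
    county_fips TEXT PRIMARY KEY,
    county_name TEXT NOT NULL,
    population  INTEGER
);

CREATE TABLE zip_county (
    zip         TEXT PRIMARY KEY,
    county_fips TEXT NOT NULL REFERENCES county (county_fips)
);

CREATE TABLE covid_cases (
    county_fips TEXT NOT NULL REFERENCES county (county_fips),
    report_date DATE NOT NULL,
    cases       INTEGER NOT NULL,
    PRIMARY KEY (county_fips, report_date)
);

CREATE TABLE ppp_loan (
    loan_id       INTEGER PRIMARY KEY,
    zip           TEXT NOT NULL REFERENCES zip_county (zip),
    business_type TEXT
);
"""


def create_tables(conn):
    """Create all tables in one call (executescript is specific to sqlite3)."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
create_tables(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Storing zip codes and county identifiers as text rather than integers preserves leading zeros, which keeps the join keys consistent across tables.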

Step 5 - Load
Connected to PostgreSQL and uploaded the data to the tables.
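The load step, and the merge it enables, can be sketched as follows. This is an assumption-laden illustration rather than the project's actual loading script: it uses `sqlite3` in memory as a stand-in for PostgreSQL, the table and column names are hypothetical, and every value is made up for demonstration and is not real loan or case data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
conn.executescript(
    """
    CREATE TABLE zip_county (zip TEXT PRIMARY KEY, county TEXT NOT NULL);
    CREATE TABLE covid_cases (county TEXT PRIMARY KEY, cases INTEGER NOT NULL);
    CREATE TABLE ppp_loan (loan_id INTEGER PRIMARY KEY, zip TEXT NOT NULL);
    """
)


def load_rows(conn, table, columns, rows):
    """Bulk-insert rows into a table inside a single transaction.

    The table and column names must come from trusted code, not user input.
    sqlite3 uses "?" placeholders; a PostgreSQL driver such as psycopg2
    uses "%s" instead.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)


# Made-up rows for illustration only.
load_rows(conn, "zip_county", ["zip", "county"],
          [("92093", "San Diego"), ("92101", "San Diego"), ("90001", "Los Angeles")])
load_rows(conn, "covid_cases", ["county", "cases"],
          [("San Diego", 100), ("Los Angeles", 250)])
load_rows(conn, "ppp_loan", ["loan_id", "zip"],
          [(1, "92093"), (2, "92101"), (3, "90001")])

# Link each loan to its county through the zip code, then compare
# the number of loans with the case count per county.
summary = conn.execute(
    """
    SELECT z.county, c.cases, COUNT(l.loan_id) AS loans
    FROM ppp_loan AS l
    JOIN zip_county AS z ON l.zip = z.zip
    JOIN covid_cases AS c ON c.county = z.county
    GROUP BY z.county, c.cases
    ORDER BY z.county
    """
).fetchall()
```

Once the tables are loaded, a join of this shape gives researchers a per-county view of loans alongside case counts.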

Needs of this project

  • data exploration/descriptive statistics
  • data processing/cleaning
  • statistical modeling
  • ERD modeling
  • database loading/table design
  • writeup/reporting

Getting Started

  1. Clone this repo (for help, see this tutorial).

  2. Raw data is kept here within this repo.

  3. Data processing/transformation scripts are kept here.

  4. Scripts that load the data into the SQL tables are kept here.
