A Data Analytics Enthusiast

Logo


Simplifying the world,
One dataset at a time.

I'm fuelled by using data to simplify complexities into actionable insights!

MY PROJECTS PORTFOLIO
MY ARTICLES
ABOUT ME

View My LinkedIn Profile

View My GitHub Profile

SQL

Module Introduction

Structured Query Language was the fourth core in the curriculum after the modules of Basics of Data Science, Power Query, and Excel. We started off with an introduction to databases, schemas, Entity-Relationship Diagrams, then moved on to SQL datatypes, basic commands, conditional operators, functions and aggregate functions, set operators, joins, and finished off with stored procedures.

This Capstone Project was designed to ignite our cognitive process, reinforce and apply our knowledge to “real-life” cases so as to solidify our learnings. We were graded on our ability to showcase and exhibit the five stages of the data lifecycle through the use of SQL and Excel (data collection, creating tables and loading data into SQL databases, data preparation, data analysis using SQL, creating Excel interactive dashboard and visualizations, presentation to stakeholders).

Capstone Project 2: The Movie Industry, An Overview

Click here to download my SQL code
Click here to download my Excel Interactive Dashboard
Click here to download my PowerPoint Presentation Show

Abstract

As like most things in life, there are a plethora of variables as to what makes a movie a “success”. This project is based off the angle that my target audience is looking towards going into the movie business and would like to get a overall sense of the movie industry.

Dataset Introduction

The Movies Dataset consists of 7 csv files (credits, keywords, links, moview_metadata, ratings, etc.). For this project’s context, I have only used the following 4:

The files are too massive to process in excel, and so SQL comes in to save the day!
Files are massive to even load in Github!
Just the ratings file alone is 692,921 KB.

Problem Statement

What is the direction I should take in coming up with my debut movie?

Some of the questions answered include:

Process Workflow

Planning

As Benjamin Franklin said,

“If you fail to plan, you are planning to fail.”

Likewise when dealing with databases, it is good practice (an essential, for me), to start off with planning.

I did this by doing up an ER-diagram:



Creating Database, Loading Data, Data Preparation

In the SQL Server, I created a new database named MOVIES, uploaded the files, used aggregate functions to check for duplicate rows and deleted the duplications in the files.

Data Analysis

Moving on to the Data Analysis part, as usual, I took a top-down approach in analysing the data as I like to have an overall feel of what the dataset is about before I drill down into the nitty gritties (which might overwhelm if I started off there).

Then, I started drilling down to find out:

Data Manipulation

And finally, I created a data dump for the Excel Dashboard by creating views and joining all the relevant views and tables together.

I imported the data dump into Excel, did the necessary pivot tables, visualizations, arranged them on the Dashboard, and did the slicers to enable interactions.









Dashboard Takeaways

I presented the following takeaways to the Boss.


I showed the Boss how to work the Excel Interactive Dashboard by using the slicers… by taking the Star Wars Collection as an example. It was:

The Star Wars Collection falls under the genres of Action, Adventure, Science, Fiction, Fantasy


The Dashboard would give him an efficient view of:

With my final suggestion to start off with a high return on investment movie (Horror/Thriller genre)…

The Boss is very pleased.
I aced the project.
I am pleased.
I scored a 19 out of 20.


Project Scoring Rubrics

We were rated based on the following 4 components:



Contact me

LinkedIn https://www.linkedin.com/in/megan-ng-7708a74/
Email megan_ng@hotmail.com

Hosted on GitHub Pages — Theme by orderedlist