Simplifying the world,
One dataset at a time.
I'm fuelled by using data to simplify complexities into actionable insights!
MY PROJECTS PORTFOLIO
MY ARTICLES
ABOUT ME
View My LinkedIn Profile
Structured Query Language was the fourth core in the curriculum after the modules of Basics of Data Science, Power Query, and Excel. We started off with an introduction to databases, schemas, Entity-Relationship Diagrams, then moved on to SQL datatypes, basic commands, conditional operators, functions and aggregate functions, set operators, joins, and finished off with stored procedures.
This Capstone Project was designed to ignite our cognitive process, reinforce and apply our knowledge to “real-life” cases so as to solidify our learnings. We were graded on our ability to showcase and exhibit the five stages of the data lifecycle through the use of SQL and Excel (data collection, creating tables and loading data into SQL databases, data preparation, data analysis using SQL, creating Excel interactive dashboard and visualizations, presentation to stakeholders).
Click here to download my SQL code
Click here to download my Excel Interactive Dashboard
Click here to download my PowerPoint Presentation Show
As like most things in life, there are a plethora of variables as to what makes a movie a “success”. This project is based off the angle that my target audience is looking towards going into the movie business and would like to get a overall sense of the movie industry.
The Movies Dataset consists of 7 csv files (credits, keywords, links, moview_metadata, ratings, etc.). For this project’s context, I have only used the following 4:
The files are too massive to process in excel, and so SQL comes in to save the day!
Files are massive to even load in Github!
Just the ratings file alone is 692,921 KB.
What is the direction I should take in coming up with my debut movie?
Some of the questions answered include:
As Benjamin Franklin said,
“If you fail to plan, you are planning to fail.”
Likewise when dealing with databases, it is good practice (an essential, for me), to start off with planning.
I did this by doing up an ER-diagram:
In the SQL Server, I created a new database named MOVIES, uploaded the files, used aggregate functions to check for duplicate rows and deleted the duplications in the files.
Moving on to the Data Analysis part, as usual, I took a top-down approach in analysing the data as I like to have an overall feel of what the dataset is about before I drill down into the nitty gritties (which might overwhelm if I started off there).
Then, I started drilling down to find out:
And finally, I created a data dump for the Excel Dashboard by creating views and joining all the relevant views and tables together.
I imported the data dump into Excel, did the necessary pivot tables, visualizations, arranged them on the Dashboard, and did the slicers to enable interactions.
I presented the following takeaways to the Boss.
Most released genre is Drama, followed by Comedy, then Thriller, followed by Action
Paranormal Activitiy, and The Blair Witch Project being outliers with massive ROIs!
I showed the Boss how to work the Excel Interactive Dashboard by using the slicers… by taking the Star Wars Collection as an example. It was:
The Star Wars Collection falls under the genres of Action, Adventure, Science, Fiction, Fantasy
The Dashboard would give him an efficient view of:
With my final suggestion to start off with a high return on investment movie (Horror/Thriller genre)…
The Boss is very pleased.
I aced the project.
I am pleased.
I scored a 19 out of 20.
We were rated based on the following 4 components:
https://www.linkedin.com/in/megan-ng-7708a74/ | |
megan_ng@hotmail.com |
Hosted on GitHub Pages — Theme by orderedlist