As part of the Applied Data Analysis course (CS-401) at EPFL, we had to carry out a data analysis project on one of the proposed datasets. My team (Baptiste Lecoeur, Enzo Palmisano, Jamil Maj & Mariella Daghfal) and I chose the CMU Movie Summary Corpus.
For our data story, we decided to analyze the user-critic divide in cinema by comparing movie ratings on IMDb and Metacritic. To augment the original dataset, we scraped ratings from both websites and then performed a statistical analysis on them. We also built a website to present our results.
We analyzed this data with statistical methods such as t-tests, Pearson correlation, ordinary least squares (OLS) regression, and the variance inflation factor (VIF). Our findings revealed several patterns: some genres showed a larger gap between user and critic ratings, awards correlated with differences in perception, and a movie's country of origin influenced how it was received, pointing to cultural differences in cinematic taste.
You can find the website here.
More details about the project can be found in the GitHub repository.