From 70ad4a839ec15d46641aebe02231782a60a1faad Mon Sep 17 00:00:00 2001
From: Emile Cornamusaz
Date: Fri, 20 Dec 2024 20:56:48 +0100
Subject: [PATCH] prophet method explanation in readme

---
 README.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 4683a56..7aae75d 100644
--- a/README.md
+++ b/README.md
@@ -68,7 +68,15 @@ After filtering the data, we proceeded to identify the main characters. For this
 
 This approach allowed us to efficiently detect main characters. Ultimately, we created a DataFrame containing the character names and their respective counts for each movie.
 
-### prophet
+### Machine learning prediction with Prophet
+
+The metric given in the section above identifies which names are potential candidates, but we still need to find a way to know whether the name was actually influenced.
+
+To do so, we use a technique called Interrupted Time Series. We take the data about a name from before a movie was released and use a machine learning model to estimate how the name would normally have evolved.
+
+This leaves us with two curves that represent the name's evolution after the release of the movie: one containing the actual data from the datasets and one predicted from the earlier counts. If the actual curve is much higher than the predicted one, we can assume that the movie has influenced this name!
+
+Thus we run this algorithm on the main character names of every movie in the dataset and then define a threshold on the distance between the curves to decide whether a movie influenced a name or not.
 
 ## Contribution of group members
 - Jeremy :
@@ -80,8 +88,8 @@ This approach allowed us to efficiently detect main characters. Ultimately, we c
 - Emile :
   - datasets and naïve approach model presentation
   - "try it yourself" results display
-  - Movie influence over time analysis
-  - Birth of a new name analysis
+  - "Movie influence over time" analysis
+  - "Birth of a new name" analysis
 - Corentin :
   - Predicted name counts using Prophet and SARIMA models, incorporating confidence intervals and metric computation to determine name influence.
   - Developed the character name recognition system.
@@ -91,7 +99,7 @@ This approach allowed us to efficiently detect main characters. Ultimately, we c
   - Worked on defining what a blockbuster is
   - Studied of genre movie on names
   - Studied on the case of Norwegian names (only on the results notebook)
-  - Updated the website to sho the findings of the analysis for the datastory
+  - Updated the website to show the findings of the analysis for the datastory
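
The Interrupted Time Series comparison described in the README section added by this patch can be illustrated with a minimal sketch. It is not the project's implementation: a plain least-squares trend line stands in for the Prophet forecast, the "distance between the curves" is assumed to be the mean of actual minus predicted counts, and the names `fit_linear_trend`, `influence_gap`, `is_influenced`, `horizon`, and `threshold` are all hypothetical.

```python
from statistics import mean


def fit_linear_trend(years, counts):
    """Fit count = slope * year + intercept by ordinary least squares."""
    x_bar, y_bar = mean(years), mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, counts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def influence_gap(series, release_year, horizon=5):
    """Mean of (actual - predicted) over `horizon` years from the release.

    `series` maps year -> count for one name. The prediction is fitted only
    on years before `release_year` (stand-in for the Prophet forecast).
    """
    points = sorted(series.items())
    pre = [(y, c) for y, c in points if y < release_year]
    post = [(y, c) for y, c in points
            if release_year <= y < release_year + horizon]
    if len(pre) < 2 or not post:
        raise ValueError("need at least two pre-release points and one post-release point")
    slope, intercept = fit_linear_trend([y for y, _ in pre], [c for _, c in pre])
    return mean(c - (slope * y + intercept) for y, c in post)


def is_influenced(series, release_year, threshold, horizon=5):
    """Flag the name when the actual curve exceeds the prediction by more than `threshold`."""
    return influence_gap(series, release_year, horizon) > threshold


# Made-up counts for illustration only: steady growth, then a jump at release.
example = {2000: 100, 2001: 102, 2002: 104, 2003: 106, 2004: 108,
           2005: 150, 2006: 160, 2007: 170, 2008: 180, 2009: 190}
print(is_influenced(example, release_year=2005, threshold=20))
```

Averaging the signed difference, rather than the absolute one, matches the README's criterion that the actual curve must be higher than the predicted one; a name that drops after release is not flagged.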