From 70ad4a839ec15d46641aebe02231782a60a1faad Mon Sep 17 00:00:00 2001
From: Emile Cornamusaz
Date: Fri, 20 Dec 2024 20:56:48 +0100
Subject: [PATCH 1/2] prophet method explanation in readme

---
 README.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 4683a56..7aae75d 100644
--- a/README.md
+++ b/README.md
@@ -68,7 +68,15 @@ After filtering the data, we proceeded to identify the main characters. For this
 This approach allowed us to efficiently detect main characters. Ultimately, we created a DataFrame containing the character names and their respective counts for each movie.
 
-### prophet
+### Machine learning prediction with Prophet
+
+The metric given in the section above identifies which names are potential candidates, but we still need a way to determine whether a name was actually influenced.
+
+To do so, we use a technique called Interrupted Time Series: we take the data about a name from before a movie's release and use a machine learning model to estimate how the name's count would normally have evolved.
+
+This leaves us with two curves that represent the name's evolution after the release of the movie: one containing the actual data from the datasets, and one predicted from the previous counts. If the actual curve is much higher than the predicted one, we can assume that the movie has influenced this name!
+
+We therefore run this algorithm on the main character names of every movie in the dataset, then define a threshold on the distance between the curves to decide whether a movie influenced a name.
 
 ## Contribution of group members
 - Jeremy :
@@ -80,8 +88,8 @@ This approach allowed us to efficiently detect main characters. Ultimately, we c
 - Emile :
   - datasets and naïve approach model presentation
   - "try it yourself" results display
-  - Movie influence over time analysis
-  - Birth of a new name analysis
+  - "Movie influence over time" analysis
+  - "Birth of a new name" analysis
 - Corentin :
   - Predicted name counts using Prophet and SARIMA models, incorporating confidence intervals and metric computation to determine name influence.
   - Developed the character name recognition system.
@@ -91,7 +99,7 @@ This approach allowed us to efficiently detect main characters. Ultimately, we c
   - Worked on defining what a blockbuster is
   - Studied of genre movie on names
   - Studied on the case of Norwegian names (only on the results notebook)
-  - Updated the website to sho the findings of the analysis for the datastory
+  - Updated the website to show the findings of the analysis for the datastory

From 6c2238f709ee8b4a7341016c6d04436011954a14 Mon Sep 17 00:00:00 2001
From: Emile Cornamusaz
Date: Fri, 20 Dec 2024 21:02:39 +0100
Subject: [PATCH 2/2] =?UTF-8?q?d=C3=A9grad=C3=A9?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7aae75d..5b91aa8 100644
--- a/README.md
+++ b/README.md
@@ -58,9 +58,8 @@ We tried another approach to detect unusual trends in name counts following a ke
 ### Name detection
 
-To identify the main characters in our movies, we processed the plot_summaries.txt file, which contains plot summaries for 42,306 movies extracted from English-language Wikipedia. Each entry in the file follows a consistent structure:
-
-Wikipedia ID \t Plot Summary \n
+To identify the main characters in our movies, we processed the plot_summaries.txt file, which contains plot summaries for 42,306 movies extracted from English-language Wikipedia.
+Every line in the file represents a movie, with its Wikipedia ID and plot summary separated by a tab character.
Using this format, we extracted both the Wikipedia ID and the plot summary, linking each movie’s name to its corresponding Wikipedia ID and release year.
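
A minimal sketch of the parsing step described above, assuming each line holds an integer Wikipedia ID, a tab, and the plot summary. The function name `parse_plot_summaries` and the two sample lines are hypothetical and only illustrate the format; they are not taken from the project code or from the real file.

```python
import io


def parse_plot_summaries(lines):
    """Yield (wikipedia_id, summary) pairs from tab-separated lines.

    Assumption: the ID is the text before the first tab and is an integer;
    everything after the first tab is the plot summary.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        wiki_id, _, summary = line.partition("\t")
        yield int(wiki_id), summary.strip()


# Hypothetical stand-in for plot_summaries.txt (made-up IDs and summaries).
sample = io.StringIO("1\tAlice meets Bob.\n2\tCarol leaves town.\n")
summaries = dict(parse_plot_summaries(sample))
```

With the real file, the same function could be fed an open file handle, for example `open("plot_summaries.txt", encoding="utf-8")`, and the resulting dictionary joined to movie names and release years by Wikipedia ID.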