Files

README.md
age-test.txt
age-train.txt
final_dataframe_extended.csv
gender-test_set.txt
gender-train_set.txt
vox1_meta.csv
vox2_meta.csv
voxceleb2_person_videos.json
voxceleb2_video_people.json
voxceleb_person_videos.json
voxceleb_video_people.json
yt_metadata.json

README.md

Dataset

This repository contains the enriched dataset for gender and age recognition, together with information on the train and test speaker splits.

VoxCeleb enriched dataset

The complete dataset is a CSV file with the following columns:

Name: Full name (or stage name) of the celebrity
gender_wiki: Gender according to Wikidata
birth_date_wiki: Birth date according to Wikidata
nationality_wiki: Nationality according to Wikidata
gender_dbpedia: Gender according to DBpedia
birth_date_dbpedia: Birth date according to DBpedia
nationality_dbpedia: Nationality according to DBpedia
gender_gkg: Gender according to the Google Knowledge Graph
birth_date_gkg: Birth date according to the Google Knowledge Graph
nationality_gkg: Nationality according to the Google Knowledge Graph
video_id: ID of the YouTube video from which the utterance(s) were obtained
title: Title of the YouTube video
publishing_date: Upload date of the YouTube video
description: Description of the YouTube video
year_in_title: Four-digit year appearing in the title, if any
VoxCeleb_ID: ID of the celebrity as found in the VoxCeleb1 and VoxCeleb2 metadata CSV files (vox1_meta.csv, vox2_meta.csv)
gender: Gender, set only if there is unanimous consensus between DBpedia and the Google Knowledge Graph
birth_year: Year of birth, set only if there is unanimous consensus between DBpedia and the Google Knowledge Graph
year_upload_yt: Year in which the YouTube video was uploaded
recording_year: Recording year, computed as described in the paper (year_upload_yt referenced in both Title and Description)
recording_year_title_only: Alternative recording year used in some training scenarios (year_upload_yt referenced only in Title)
speaker_age: Proposed age (recording_year - birth_year)
speaker_age_title_only: Alternative age used in some training scenarios (recording_year_title_only - birth_year)
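The simpler derived columns (year_in_title, gender, birth_year, speaker_age) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: all helper names are hypothetical, birth dates are assumed to begin with a four-digit year (e.g. ISO YYYY-MM-DD), and "unanimous consensus" is taken to mean that every listed source is present and identical. recording_year is read from the CSV rather than recomputed, since its exact derivation is described in the paper.

```python
import re
from typing import Any, Optional

# Hypothetical helpers that re-derive some columns of the enriched CSV.
# They illustrate the column definitions above; they are NOT the authors' code.

FOUR_DIGITS = re.compile(r"\b(\d{4})\b")


def extract_year_in_title(title: Optional[str]) -> Optional[int]:
    """Return the first standalone 4-digit number in a title, or None."""
    match = FOUR_DIGITS.search(title or "")
    return int(match.group(1)) if match else None


def consensus(*values: Any) -> Any:
    """Return the common value if every source is present and all agree, else None."""
    if values and all(v not in (None, "") for v in values) and len(set(values)) == 1:
        return values[0]
    return None


def birth_year_from_date(date: Optional[str]) -> Optional[int]:
    """Assumption: dates start with a 4-digit year, e.g. 'YYYY-MM-DD'."""
    match = re.match(r"\s*(\d{4})", date or "")
    return int(match.group(1)) if match else None


def speaker_age(recording_year: Optional[int], birth_year: Optional[int]) -> Optional[int]:
    """speaker_age = recording_year - birth_year, when both are known."""
    if recording_year is None or birth_year is None:
        return None
    return recording_year - birth_year


def enrich(row: dict) -> dict:
    """Derive gender, birth_year and speaker_age from one CSV row (a dict of strings)."""
    gender = consensus(row.get("gender_dbpedia"), row.get("gender_gkg"))
    birth_year = consensus(
        birth_year_from_date(row.get("birth_date_dbpedia")),
        birth_year_from_date(row.get("birth_date_gkg")),
    )
    raw = row.get("recording_year")
    # CSVs written by pandas may store years as "2015.0", hence float() first.
    recording_year = int(float(raw)) if raw not in (None, "") else None
    return {
        "gender": gender,
        "birth_year": birth_year,
        "speaker_age": speaker_age(recording_year, birth_year),
    }


row = {
    "gender_dbpedia": "male",
    "gender_gkg": "male",
    "birth_date_dbpedia": "1970-05-01",
    "birth_date_gkg": "1970-05-01",
    "recording_year": "2015",
}
print(enrich(row))  # {'gender': 'male', 'birth_year': 1970, 'speaker_age': 45}
```

Rows read with csv.DictReader could be passed to enrich() to cross-check the published gender, birth_year and speaker_age values; that final_dataframe_extended.csv is the CSV described here is an assumption based on the file list.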
