Corpus for Aspect Based Sentiment Analysis (ABSA) of Hungarian parliamentary texts, annotated at token level.

poltextlab/HunEmPoli_corpus

General information

The manually annotated HunEmPoli corpus was built from pre-agenda speeches delivered during the 2014-2018 parliamentary term. It was created within the framework of the Hungarian Comparative Agendas Project (CAP) of the Institute of Political Science of the Centre for Social Science Research. The corpus is freely accessible for research purposes upon registration; for any further questions, please contact ring.orsolya@tk.hu.

Annotation details

In the course of our research, we created an inductive emotion category system whose categories can be mapped to Plutchik's emotion category system, which distinguishes eight classes, and can also be converted to the positive-negative categories used in sentiment analysis. This extended system was necessary because, in our previous experience, sentences in political texts either could not be classified into Plutchik's system, one of the most commonly used category systems for emotion analysis, or could be classified only with very low annotator agreement. The extended system, by contrast, allowed the corpus to be annotated with high inter-annotator agreement. The final annotation guide defines a total of 12 so-called emotion topics (ET), each accompanied by at least three cue words or phrases to facilitate the annotators' work.

| Related concepts | Emotion topic | In Plutchik's system | Sentiment |
|---|---|---|---|
| fear, threat, intimidation, dread, anxiety | Fear | Fear | Negative |
| suffering, deprivation, misery, poverty, torment, failure, negative change | Suffering | Sadness | Negative |
| sorrow, despair, hopelessness, melancholy | Sorrow | Sadness | Negative |
| misfortune, catastrophe | Misfortune | Sadness | Negative |
| crime, terror, assassination, persecution, cruelty, organized crime, vandalism, intentional harm, violence | Crime | Anger | Negative |
| anger, fury, hatred | Anger | Anger | Negative |
| conflict, confusion, conflict of interest, revenge, punishment | Conflict | Disgust | Negative |
| contempt, mockery | Contempt | Disgust | Negative |
| improvement, relief, development, success, positive change | Improvement | Success | Positive |
| joy, enjoyment, merriment, serenity, love, acceptance, tolerance | Joy | Joy | Positive |
| assistance, rescue, relief, healing, care, deliverance | Assistance | Trust | Positive |
| justice, investigation | Justice | Trust | Positive |
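
The mapping in the table can be expressed as a small lookup, which may be convenient when collapsing ET labels into coarser classes. The following is a minimal sketch, not part of the corpus tooling: the names `ET_MAPPING`, `to_plutchik`, and `to_sentiment` are hypothetical, and the values assume that each Plutchik category and sentiment label in the table applies to all emotion topics listed until the next label appears.

```python
# Emotion topic -> (Plutchik category, sentiment), transcribed from the table.
# Assumption: a Plutchik/sentiment label covers every row up to the next label.
ET_MAPPING = {
    "Fear": ("Fear", "Negative"),
    "Suffering": ("Sadness", "Negative"),
    "Sorrow": ("Sadness", "Negative"),
    "Misfortune": ("Sadness", "Negative"),
    "Crime": ("Anger", "Negative"),
    "Anger": ("Anger", "Negative"),
    "Conflict": ("Disgust", "Negative"),
    "Contempt": ("Disgust", "Negative"),
    "Improvement": ("Success", "Positive"),
    "Joy": ("Joy", "Positive"),
    "Assistance": ("Trust", "Positive"),
    "Justice": ("Trust", "Positive"),
}


def to_plutchik(emotion_topic: str) -> str:
    """Return the Plutchik-style category listed for an emotion topic."""
    return ET_MAPPING[emotion_topic][0]


def to_sentiment(emotion_topic: str) -> str:
    """Return the positive/negative sentiment listed for an emotion topic."""
    return ET_MAPPING[emotion_topic][1]


if __name__ == "__main__":
    for topic in ("Sorrow", "Justice"):
        print(topic, to_plutchik(topic), to_sentiment(topic))
```

Keeping both target label sets in one dictionary means a single ET annotation can be projected to either granularity without re-annotating.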

Quality Assurance

To ensure the quality of the corpus, the inter-annotator agreement (Cohen's Kappa) was calculated periodically. Although annotators marked Emotion Topics at clause level, we evaluated agreement at token level, because we did not want minor errors (e.g. differences in the marking of punctuation marks or hyphens) to distort the results.

| ET | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|
| Kappa | 0.564 | 0.885 | 0.971 | 0.930 | 0.615 | 0.846 | 0.5 | 1 | 0.264 | 1 |

Cohen's Kappa in different ETs - 1: Fear, 2: Suffering, 3: Crime, 4: Improvement, 5: Conflict, 6: Sorrow, 7: Sadness, 8: Joy, 11: Assistance, 12: Justice.
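
For readers who want to reproduce this kind of check, token-level agreement for a single ET can be sketched as follows. This is an illustration under stated assumptions, not the script used for the corpus: the function name `cohen_kappa`, the label strings, and the toy annotations are hypothetical. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of token labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy example: one label per token, "O" meaning "outside the ET".
    annotator_a = ["O", "Fear", "Fear", "Fear", "O", "O"]
    annotator_b = ["O", "Fear", "Fear", "O", "O", "O"]
    print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.667
```

In the toy example the annotators disagree on a single boundary token out of six, so a one-token slip costs a fraction of the score rather than counting the whole clause as a mismatch.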

Emotion Topic metrics per Party

The resulting corpus contains 1008 pre-agenda speeches, with a total of 764008 tokens in 36475 sentences. Breakdown of each ET by party:

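| Emotion topic | LMP | KDNP | MSZP | Jobbik | Fidesz | Independent | All |
|---|---|---|---|---|---|---|---|
| Fear | 133 | 96 | 87 | 133 | 161 | 9 | 619 |
| Suffering | 2702 | 880 | 2370 | 1968 | 1156 | 56 | 9132 |
| Crime | 265 | 172 | 284 | 452 | 406 | 2 | 1581 |
| Improvement | 2031 | 2694 | 1923 | 2189 | 3454 | 164 | 12455 |
| Conflict | 2202 | 953 | 1974 | 2271 | 1977 | 12 | 9389 |
| Sorrow | 1044 | 298 | 988 | 989 | 575 | 0 | 3894 |
| Sadness | 64 | 84 | 33 | 43 | 56 | 1 | 281 |
| Joy | 47 | 514 | 69 | 52 | 168 | 15 | 865 |
| Anger | 33 | 30 | 36 | 49 | 20 | 0 | 168 |
| Misfortune | 14 | 4 | 5 | 0 | 37 | 12 | 72 |
| Assistance | 43 | 165 | 82 | 32 | 84 | 5 | 411 |
| Justice | 138 | 82 | 170 | 250 | 332 | 1 | 973 |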
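
Because the parties differ in how many annotated spans they contribute, raw counts are hard to compare directly. The sketch below transcribes the table and normalises one party's column into shares. The names `PARTIES`, `COUNTS`, `totals_are_consistent`, and `party_shares` are hypothetical, and treating the per-party column sum as the denominator is an assumption made for illustration, not a method described by the authors.

```python
PARTIES = ["LMP", "KDNP", "MSZP", "Jobbik", "Fidesz", "Independent"]

# Emotion topic -> (per-party counts in PARTIES order, value of the "All" column),
# transcribed from the table.
COUNTS = {
    "Fear": ([133, 96, 87, 133, 161, 9], 619),
    "Suffering": ([2702, 880, 2370, 1968, 1156, 56], 9132),
    "Crime": ([265, 172, 284, 452, 406, 2], 1581),
    "Improvement": ([2031, 2694, 1923, 2189, 3454, 164], 12455),
    "Conflict": ([2202, 953, 1974, 2271, 1977, 12], 9389),
    "Sorrow": ([1044, 298, 988, 989, 575, 0], 3894),
    "Sadness": ([64, 84, 33, 43, 56, 1], 281),
    "Joy": ([47, 514, 69, 52, 168, 15], 865),
    "Anger": ([33, 30, 36, 49, 20, 0], 168),
    "Misfortune": ([14, 4, 5, 0, 37, 12], 72),
    "Assistance": ([43, 165, 82, 32, 84, 5], 411),
    "Justice": ([138, 82, 170, 250, 332, 1], 973),
}


def totals_are_consistent() -> bool:
    """Check that every row's per-party counts add up to its "All" value."""
    return all(sum(per_party) == total for per_party, total in COUNTS.values())


def party_shares(party: str) -> dict:
    """Return each emotion topic's share of one party's annotated ET counts."""
    idx = PARTIES.index(party)
    column = {topic: per_party[idx] for topic, (per_party, _) in COUNTS.items()}
    column_total = sum(column.values())
    return {topic: count / column_total for topic, count in column.items()}


if __name__ == "__main__":
    print(totals_are_consistent())
    for topic, share in party_shares("KDNP").items():
        print(f"{topic}: {share:.1%}")
```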

Citation

Please cite the following publication:

@inproceedings{ring_et_al_2023,
 author = {Ring, Orsolya and Vincze, Veronika and Guba, {\relax Cs}enge and {\"U}veges, István},
 editor = {...},
 title = {{HunEmPoli: magyar nyelv{\H u}, r\'eszletesen annot\'alt em\'oci\'okorpusz}},
 booktitle = {XIX. Magyar Sz\'am\'it\'og\'epes Nyelv\'eszeti Konferencia},
 publisher = {Szegedi Tudom\'anyegyetem},
 year = {2022},
 address = {Szeged}
}
