Embeddings-Based Unsupervised Stance Detection

This repository contains the implementation of an unsupervised method for target-specific stance detection using embeddings-based clustering, as presented in our ICWSM 2021 paper.

Publications

Paper (ICWSM'21): Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey
Paper Presentation: PaperTalk ICWSM'21
Thesis (MSc August 2020): Embeddings-Based Clustering For Target Specific Stances

Overview

We propose an unsupervised method for stance detection that can capture fine-grained divergences across various topics in polarized communities. Our approach overcomes the limitations of previous methods by:

Not requiring platform-specific features (like retweets)
Working effectively with limited data
Supporting hierarchical clustering without specifying the number of clusters
Using pre-trained language models to handle morphologically rich languages

Methodology

The method consists of five main steps:

Data Collection: Collect tweets related to specific topics or targets
Feature Extraction: Encode tweets using pre-trained universal sentence encoders
User Representation: Average tweet vectors per user to create user embeddings
Projection: Project user vectors to lower dimensional space using UMAP
Clustering: Cluster the projected vectors using HDBSCAN
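Steps 3 to 5 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation. The helper names `average_user_vectors` and `project_and_cluster` are hypothetical. Tweet vectors from step 2 are assumed to be plain lists of floats, and the `umap-learn` and `hdbscan` packages are assumed to be installed. The default parameter values, including the cosine metric, are illustrative rather than the paper's settings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def average_user_vectors(
    user_ids: Sequence[str], tweet_vectors: Sequence[Sequence[float]]
) -> Dict[str, List[float]]:
    """Step 3: average all tweet vectors that belong to the same user."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for user, vec in zip(user_ids, tweet_vectors):
        if user not in sums:
            sums[user] = [0.0] * len(vec)
        acc = sums[user]
        for i, x in enumerate(vec):
            acc[i] += x
        counts[user] += 1
    return {u: [x / counts[u] for x in acc] for u, acc in sums.items()}


def project_and_cluster(
    user_vectors: Dict[str, List[float]],
    n_neighbors: int = 15,      # illustrative value (assumption)
    min_dist: float = 0.0,      # illustrative value (assumption)
    min_cluster_size: int = 10,  # illustrative value (assumption)
    min_samples: int = 5,       # illustrative value (assumption)
) -> Dict[str, int]:
    """Steps 4-5: project to 2-D with UMAP, then cluster with HDBSCAN.

    HDBSCAN labels users it cannot place in any cluster as -1 (noise).
    """
    # Lazy imports so step 3 can run without these third-party packages.
    import numpy as np
    import umap
    import hdbscan

    users = list(user_vectors)
    X = np.array([user_vectors[u] for u in users])
    projected = umap.UMAP(
        n_neighbors=n_neighbors, min_dist=min_dist, n_components=2, metric="cosine"
    ).fit_transform(X)
    labels = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size, min_samples=min_samples
    ).fit_predict(projected)
    return dict(zip(users, labels.tolist()))
```

For example, `average_user_vectors(["a", "a", "b"], [[1.0, 3.0], [3.0, 5.0], [2.0, 2.0]])` returns `{"a": [2.0, 4.0], "b": [2.0, 2.0]}`. Averaging gives every user a single fixed-size vector regardless of how many tweets they posted, which is what lets UMAP and HDBSCAN operate on users rather than on individual tweets.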

Key Features

Fine-grained Stance Detection

Our method can automatically detect stances down to the party-affiliation level in a completely unsupervised manner, outperforming previous approaches.

Cross-Topic Mutual Information

Using our clustering method, we can analyze the correlations between user stances across different topics, allowing for deeper insight into the structure of polarization.
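As an illustration, the correlation between stances on two topics can be measured with mutual information over the users who were clustered in both. This sketch computes plain mutual information in nats and is not necessarily the exact measure used in the paper. It assumes each topic's result is stored as a hypothetical `{user_id: cluster_label}` dict, and it drops HDBSCAN noise points (label -1). scikit-learn's `adjusted_mutual_info_score` is a chance-corrected alternative.

```python
import math
from collections import Counter
from typing import Dict, Hashable

NOISE = -1  # HDBSCAN assigns -1 to points it leaves unclustered


def cross_topic_mi(topic_a: Dict[str, Hashable], topic_b: Dict[str, Hashable]) -> float:
    """Mutual information (in nats) between two topics' cluster labels.

    Only users that are clustered (non-noise) in both topics are counted.
    """
    users = [
        u for u in topic_a.keys() & topic_b.keys()
        if topic_a[u] != NOISE and topic_b[u] != NOISE
    ]
    n = len(users)
    if n == 0:
        return 0.0
    joint = Counter((topic_a[u], topic_b[u]) for u in users)
    count_a = Counter(topic_a[u] for u in users)
    count_b = Counter(topic_b[u] for u in users)
    # I(A;B) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))),
    # rewritten with raw counts: (c/n) * log(c * n / (count_a * count_b)).
    return sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
```

Two identical partitions of four users into two equal groups give ln 2 (about 0.693 nats), while statistically unrelated partitions give 0. Cluster IDs only need to be consistent within a topic; they do not have to match across topics.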

Semantic Analysis Between Clusters

We identify the most prominent terms in each cluster to show how different groups talk about the same issues in different contexts, revealing semantic divergences between polarized groups.
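To illustrate how a term can be scored as characteristic of one cluster versus another, the sketch below uses a valence score, a measure used in related stance-detection work. This is a swapped-in choice and may differ from the paper's exact procedure. The function names are hypothetical, and tokenization is naive whitespace splitting, which would need improvement for real Turkish text.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def term_counts(texts: Iterable[str]) -> Counter:
    """Count lowercase whitespace-separated tokens (naive tokenization)."""
    return Counter(tok for text in texts for tok in text.lower().split())


def valence_scores(
    texts_a: Iterable[str], texts_b: Iterable[str], min_count: int = 1
) -> Dict[str, float]:
    """Score each term in [-1, 1]: +1 = used only by cluster A, -1 = only by B."""
    tf_a, tf_b = term_counts(texts_a), term_counts(texts_b)
    total_a, total_b = sum(tf_a.values()), sum(tf_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both clusters need at least one token")
    scores: Dict[str, float] = {}
    for term in tf_a.keys() | tf_b.keys():
        if tf_a[term] + tf_b[term] < min_count:
            continue
        # Relative frequencies make clusters of different sizes comparable.
        rel_a, rel_b = tf_a[term] / total_a, tf_b[term] / total_b
        scores[term] = 2 * rel_a / (rel_a + rel_b) - 1
    return scores


def top_terms(scores: Dict[str, float], k: int = 10) -> Tuple[List[str], List[str]]:
    """Return the k terms leaning most toward cluster A and toward cluster B."""
    toward_a = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    toward_b = sorted(scores, key=lambda t: (scores[t], t))[:k]
    return toward_a, toward_b
```

With `valence_scores(["vote yes now"], ["vote no now"])`, the term "yes" scores 1.0, "no" scores -1.0, and the shared terms "vote" and "now" score 0.0. Raising `min_count` filters out rare terms that would otherwise reach the extremes from a single occurrence.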

Performance

Our method achieves:

90% precision in identifying user stances
Over 80% recall
Competitive performance with supervised methods, while being completely unsupervised
Ability to detect fine-grained sub-groups that previous methods couldn't identify
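This README does not spell out the evaluation protocol behind these figures. One common way to score unsupervised clusters against a small hand-labeled sample is sketched below as an assumption, not as the paper's procedure. Each cluster is mapped to the majority gold stance of its labeled members. Precision counts correct users among labeled users that were clustered, and recall counts correct users among all labeled users, so noise points count as misses. The function name and input dicts are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Tuple

NOISE = -1  # HDBSCAN label for unclustered users


def majority_precision_recall(
    clusters: Dict[str, int], gold: Dict[str, Hashable]
) -> Tuple[float, float]:
    """Score clusters against gold stance labels via majority mapping.

    precision = correct / labeled users that were placed in a cluster
    recall    = correct / all labeled users (noise or missing users are misses)
    """
    by_cluster: Dict[int, Counter] = defaultdict(Counter)
    for user, stance in gold.items():
        cluster = clusters.get(user, NOISE)
        if cluster != NOISE:
            by_cluster[cluster][stance] += 1
    clustered = sum(sum(c.values()) for c in by_cluster.values())
    # Users whose gold stance matches their cluster's majority stance are correct.
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    precision = correct / clustered if clustered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Several clusters may map to the same stance under this scheme, so fine-grained sub-groups of one side are not penalized. Only the cluster-to-stance mapping is checked.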

Installation

```bash
# Clone this repository
git clone https://github.com/AmmarRashed/UnsupervisedStanceDetection.git
cd UnsupervisedStanceDetection

# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Requirements

Note: This work was tested using specific versions of packages. Newer versions might not work as expected.

Usage

```bash
# Basic usage example
python clusterUsersUniversalSentenceEncoder.py your_data.tsv
```

The input file should be a tab-separated file with no header row and one tweet per line:

First column: UserIDs
Second column: Tweets
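For reference, a stdlib-only sketch of reading this format and grouping tweets by user might look like the following. The function name `read_user_tweets` is hypothetical. It assumes no header row, ignores extra columns, skips blank or malformed lines, and uses `csv.QUOTE_NONE` so quotation marks inside tweets are kept as literal text, which is an assumption about how the data is quoted.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def read_user_tweets(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group tweets by user from tab-separated 'UserID<TAB>Tweet' lines."""
    tweets_by_user: Dict[str, List[str]] = defaultdict(list)
    # QUOTE_NONE: treat quote characters inside tweets as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:  # blank or malformed line
            continue
        user, tweet = row[0].strip(), row[1].strip()
        if user and tweet:
            tweets_by_user[user].append(tweet)
    return dict(tweets_by_user)
```

Any iterable of lines works, for example an open file: `with open("your_data.tsv", encoding="utf-8", newline="") as f: tweets = read_user_tweets(f)`.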

Code Sample

```python
from clusterUsersUniversalSentenceEncoder import cluster_users, plot_clusters_no_labels
import tensorflow_hub as hub
import pandas as pd

# Load the Universal Sentence Encoder (this model is English-only;
# a multilingual encoder may suit non-English data better)
embed = hub.load('https://tfhub.dev/google/universal-sentence-encoder/4')

# Load and prepare your data: column 0 = user ID, column 1 = tweet text.
# dtype=str keeps numeric user IDs as strings so .str.strip() works on both columns
df_text = pd.read_csv('your_data.tsv', header=None, usecols=[0, 1], sep='\t', dtype=str)
df_text.columns = ['User', 'Text']
df_text = df_text.apply(lambda s: s.str.strip())

# Cluster users based on their tweets; results are saved based on save_at
cluster_users(df_text, embed, user_col='User', tweet_col='Text', save_at='results.npz')

# Visualize the clusters
plot_clusters_no_labels('results.npz.cluster')
```

Customization Options

The method can be customized with different parameters:

Sentence Encoder: Different pre-trained models can be used (multilingual, transformer-based, etc.)
UMAP Parameters: Adjust min_dist and n_neighbors to control projection characteristics
HDBSCAN Parameters: Modify min_cluster_size and min_samples to control clustering sensitivity
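Because good UMAP and HDBSCAN settings depend on the dataset, a small grid search is one way to explore them. The sketch below assumes the `umap-learn` and `hdbscan` packages and an already-built user matrix `X`. The grid values and helper names are hypothetical, and summarizing each run by cluster count and noise fraction is only one illustrative way to compare settings.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical search space; values are illustrative, not the paper's settings.
UMAP_GRID = {"n_neighbors": [10, 15, 30], "min_dist": [0.0, 0.1]}
HDBSCAN_GRID = {"min_cluster_size": [5, 10, 20], "min_samples": [None, 5]}


def parameter_grid(*grids: Dict[str, List]) -> Iterator[Dict]:
    """Yield every combination of the given parameter grids as one flat dict."""
    merged = {k: v for grid in grids for k, v in grid.items()}
    keys = list(merged)
    for values in product(*(merged[k] for k in keys)):
        yield dict(zip(keys, values))


def summarize(labels: Sequence[int]) -> Tuple[int, float]:
    """Return (number of clusters, fraction of points labeled as noise, i.e. -1)."""
    n_clusters = len({lab for lab in labels if lab != -1})
    noise = sum(1 for lab in labels if lab == -1) / len(labels) if len(labels) else 0.0
    return n_clusters, noise


def run_setting(X, params: Dict) -> List[int]:
    """Project with UMAP and cluster with HDBSCAN using one parameter setting."""
    import umap  # lazy imports: only needed when actually clustering
    import hdbscan

    projected = umap.UMAP(
        n_neighbors=params["n_neighbors"], min_dist=params["min_dist"], n_components=2
    ).fit_transform(X)
    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=params["min_cluster_size"], min_samples=params["min_samples"]
    ).fit(projected)
    return clusterer.labels_.tolist()
```

With a user matrix `X`, a loop such as `for p in parameter_grid(UMAP_GRID, HDBSCAN_GRID): print(p, summarize(run_setting(X, p)))` covers 36 settings. Passing `min_samples=None` lets HDBSCAN fall back to its default of using `min_cluster_size`.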

Applications

This method has been successfully applied to:

Political polarization analysis
Election stance detection
Sports fan sentiment analysis
Cross-cultural stance detection

Citation

If you use this code in your research, please cite our paper:

Rashed, A., Kutlu, M., Darwish, K., Elsayed, T., & Bayrak, C. (2021). Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey. Proceedings of the International AAAI Conference on Web and Social Media, 15(1), 537-548. https://doi.org/10.1609/icwsm.v15i1.18082

BibTeX format:

@article{rashed2021embeddings,
  title={Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey},
  author={Rashed, Ammar and Kutlu, Mucahid and Darwish, Kareem and Elsayed, Tamer and Bayrak, Cansın},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  volume={15},
  number={1},
  pages={537--548},
  year={2021},
  doi={10.1609/icwsm.v15i1.18082}
}

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Ammar Rashed (ammar.rasid@ozu.edu.tr)
Kareem Darwish (kdarwish@hbku.edu.qa)
