This tool identifies coordinated groups in social media data. Its main functions are detecting coordinated group activity within a specified time window and calculating user and group statistics. Coordination is defined here as two or more accounts disseminating the same object (e.g., a post, hashtag, or link) within a specified time range (the time window), with at least a given number of repeated occurrences (minimum repetitions).
This project builds on the work of the CooRNet team [1, 2, 3]. Instead of focusing primarily on coordinated link sharing, this tool supports digital analysts who wish to examine other forms of coordination, e.g., co-posting, co-sharing, or hashtag inflation. It is therefore useful in research on co-posting, where the same or similar [4] content is posted by multiple users.
The small changes and optimizations include the following:
- Ported to Python.
- Filters on minimum repetitions before examining timestamp differences, which reduces the compute load.
- Broadens the definition of the objects to be investigated, which can take any form, as detailed below.
- Detects coordinated groups based on shared content within a time window.
- Filters users and content based on a minimum repetition threshold.
- Provides detailed group and user statistics.
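The filter-then-compare order listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `find_coordinated_pairs`, its parameters `time_window` (in seconds) and `min_repetitions`, and the sample data are all hypothetical. It also assumes one possible reading of "minimum repetitions", namely that a pair of users counts as coordinated once it has shared at least `min_repetitions` distinct objects within the time window. The column names follow the renaming example in this README.

```python
from itertools import combinations

import pandas as pd


def find_coordinated_pairs(df, time_window=60, min_repetitions=2):
    """Return {(user_a, user_b): n_objects} for pairs that co-shared at least
    `min_repetitions` distinct objects within `time_window` seconds.

    Hypothetical helper for illustration; not the tool's actual API.
    """
    shares = df[["object_id", "id_user", "timestamp_share"]].copy()
    shares["timestamp_share"] = pd.to_datetime(shares["timestamp_share"])

    # Step 1: cheap filters before any timestamp comparison.
    # A user with fewer than `min_repetitions` distinct objects can never
    # qualify, and an object shared by a single user cannot link two accounts.
    user_objects = shares.groupby("id_user")["object_id"].transform("nunique")
    shares = shares[user_objects >= min_repetitions]
    object_users = shares.groupby("object_id")["id_user"].transform("nunique")
    shares = shares[object_users >= 2]

    # Step 2: pairwise timestamp differences, only on the reduced data.
    window = pd.Timedelta(seconds=time_window)
    co_shared = {}  # (user_a, user_b) -> set of object_ids shared in the window
    for object_id, group in shares.groupby("object_id"):
        rows = list(group[["id_user", "timestamp_share"]].itertuples(index=False))
        for (user_a, time_a), (user_b, time_b) in combinations(rows, 2):
            if user_a != user_b and abs(time_a - time_b) <= window:
                pair = tuple(sorted((user_a, user_b)))
                co_shared.setdefault(pair, set()).add(object_id)

    # Step 3: keep only pairs that repeated the behaviour often enough.
    return {
        pair: len(objects)
        for pair, objects in co_shared.items()
        if len(objects) >= min_repetitions
    }


# Made-up sample data: alice and bob share t1 and t2 within a minute of each other.
sample = pd.DataFrame({
    "object_id": ["t1", "t1", "t2", "t2", "t2", "t3"],
    "id_user": ["alice", "bob", "alice", "bob", "carol", "carol"],
    "timestamp_share": [
        "2024-01-01 10:00:00", "2024-01-01 10:00:30",
        "2024-01-01 11:00:00", "2024-01-01 11:00:20",
        "2024-01-01 12:30:00", "2024-01-01 13:00:00",
    ],
})
pairs = find_coordinated_pairs(sample, time_window=60, min_repetitions=2)
print(pairs)
```

Running the count-based filters first means the quadratic pairwise comparison in step 2 only touches objects and users that could still qualify, which is the source of the reduced compute load.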
Clone the repository and install the dependencies:
git clone https://github.com/stephenw17/coordinated-group-detection.git
cd coordinated-group-detection
pip install -r requirements.txt
Note: Dependencies include pandas, numpy, and tqdm for progress updates.
Any part of a tweet, such as text, images, links, hashtags, or the entire message, can be considered an object for coordination. The most frequently observed form of coordination (whether organic or otherwise) is retweet coordination, which can be analyzed by setting columns in your data as follows:
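# Assumes `df` is a pandas DataFrame of retweets that has these source columns.
df_pre_coor_check = df.rename(
    columns={
        'sourcetweet_id': 'object_id',  # ID of the originally posted tweet
        'user_username': 'id_user',
        'tweet_id': 'content_id',
        'created_at': 'timestamp_share'
    }
)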
In the above example, the sourcetweet_id column, which holds the ID of the originally posted tweet, serves as the object of interest. The same can be done with processed text, using the text itself or, if you intend to conduct cosine similarity analysis (co-posting), embeddings of the text; with hashtags present in the messages; or with any other textual or metadata element.
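As an illustration of using processed text as the object, the sketch below derives `object_id` from a normalized copy of each message. It assumes the data has a `text` column; the helpers `normalize_text` and `text_to_object_id`, the normalization rules (lowercasing, URL removal, whitespace collapsing), and the sample posts are hypothetical choices, not part of this tool.

```python
import hashlib
import re

import pandas as pd


def normalize_text(text):
    """Lowercase, drop URLs, and collapse whitespace so near-identical posts match."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def text_to_object_id(text):
    """Hash the normalized text into a compact identifier (hypothetical helper)."""
    return hashlib.sha1(normalize_text(text).encode("utf-8")).hexdigest()


# Made-up posts: the first two differ only in case, spacing, and link.
posts = pd.DataFrame({
    "user_username": ["alice", "bob", "carol"],
    "tweet_id": [101, 102, 103],
    "created_at": [
        "2024-01-01 10:00:00", "2024-01-01 10:00:30", "2024-01-01 10:05:00",
    ],
    "text": [
        "Vote NOW   https://a.example/1",
        "vote now https://b.example/2",
        "Something else",
    ],
})

# Derive the object from the text, then apply the same renaming as above.
df_pre_coor_check = posts.assign(
    object_id=posts["text"].map(text_to_object_id)
).rename(
    columns={
        "user_username": "id_user",
        "tweet_id": "content_id",
        "created_at": "timestamp_share",
    }
)
print(df_pre_coor_check[["id_user", "object_id"]])
```

Hashing only groups messages that are identical after normalization. Hashtags could be handled in a similar spirit by expanding a list-valued column with pandas `DataFrame.explode` so that each hashtag becomes its own row; similarity based on embeddings would need a separate grouping step that is not shown here.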
coordinated-group-detection/
├── src/
├── examples/ #Coming soon
├── README.md
└── requirements.txt
Footnotes
1. Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). Coordinated Link Sharing Behavior as a Signal to Surface Sources of Problematic Information on Facebook. International Conference on Social Media and Society, 85–91.
2. Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Information, Communication and Society, 1–25.
3. Giglietto, F., Righetti, N., & Marino, G. (2019). Understanding Coordinated and Inauthentic Link Sharing Behavior on Facebook in the Run-up to 2018 General Election and 2019 European Election in Italy.
4. My research team and I are currently working on integrating this technique with transformer models to identify semantic similarity among messages, which could in turn serve as a form of 'object'.