Data Literacies

Abstract

What is data? What counts as data? These are questions we will explore throughout the workshop.

Data is foundational to nearly all digital projects and often helps us understand and express our ideas and narratives. Hence, to do digital work, we should know how data is captured, constructed, and manipulated. In this workshop we will discuss the basics of research data in terms of its material, transformation, and presentation. We will also engage with the ethical dimensions of what it means to work with data, from collection to visualization to representation.

Learning Objectives

By the end of this workshop, participants will:

Know the stages of data analysis
Understand the difference between proprietary and open data formats
Become familiar with the specific requirements of "high quality data"
Learn about ethical issues around working with different types of data and analysis

Estimated time

3–4 hours

Prerequisites

Introduction to the Command Line (required) This workshop makes reference to concepts from the Command Line workshop. Some knowledge of how to use the command line is central for anyone who wants to learn how to handle, process, and analyze data.
Download the workshop dataset (required) The dataset, moSmall.csv, will be used throughout the challenges in the workshop. To save the file to your local computer, right-click on the "Download the workshop dataset" link and choose Save Link As... Note: It is important to make sure your file is saved as a .csv file. The original dataset is taken from The Metropolitan Museum of Art's collection data, released under Creative Commons Zero.
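
Once the dataset is saved, a quick check can confirm that it really is a readable .csv file. The following is a minimal sketch using Python's standard csv module, not part of the workshop materials. The function name preview_csv is hypothetical, and the sketch assumes that the file sits in the current working directory and that its first row holds column names.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return the header row and the first n_rows data rows of a CSV file."""
    path = Path(path)
    # A wrong extension often means the browser saved the page as .txt or .html.
    if path.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {path.name!r}")
    # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # assumption: first row holds column names
        if header is None:
            raise ValueError(f"{path.name!r} is empty")
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    target = Path("moSmall.csv")  # assumption: saved in the current directory
    if target.exists():
        header, rows = preview_csv(target)
        print(len(header), "columns:", header)
        for row in rows:
            print(row)
    else:
        print("moSmall.csv not found in the current directory")
```

If the script raises an error or prints HTML tags instead of column names, the file was probably not saved as a .csv and should be downloaded again.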

Contexts

Pre-reading suggestions

In Big? Smart? Clean? Messy? Data in the Humanities, Christof Schöch discusses what data means in the humanities and the necessity of "smart big data."
The book Bit by Bit: Social Research in the Digital Age, by Matthew Salganik, approaches data and social research from a computational social science perspective. Salganik also discusses the idea of "readymade" and "custommade" data alongside ethics.
Ten Simple Rules for Responsible Big Data Research explores some guidelines for addressing complex ethical issues that arise in any research project.

Projects that use these skills

Data for Public Good is a semester-long collaborative project led by CUNY graduate students. Each semester, a different public-interest dataset is explored to present information that is useful and informative to a public audience.
SAFElab, led by Dr. Desmond U. Patton, uses computational and social work approaches to understand the mechanisms of violence and to work on prevention and intervention in violence that occurs in neighborhoods and on social media.

Ethical Considerations

Data and data analysis are not free from bias. There is no magic black box from which data emerges; data is always contextually driven. As we think about automating the analysis of "big" data, we have to be aware of the "hidden" biases that get reproduced.
De-identified information can be reconstructed from piecemeal data found across different sources. When we consider what we are doing with the data we have collected, we also need to think about the possible re-identification of our participants.
Consider how you may use differential privacy as a strategy against re-identification. The 2020 US Census is one example of using this strategy to address privacy concerns.
Big data projects often require sharing data sets across different individuals and teams. In addition, to ensure that our work is reproducible and accountable, we may also feel inclined to share the data we have collected. As such, figuring out how to share such data is a crucial part of the project planning stage.
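
To make the idea of differential privacy more concrete, the sketch below adds random noise to a count before it is released, so that the presence or absence of any one participant has only a limited effect on the published number. This is a minimal illustration of the Laplace mechanism for a single counting query, not the system used for the 2020 US Census. The function names, the sample records, and the epsilon value are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy predicate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    participants = [{"age": 34}, {"age": 51}, {"age": 29}, {"age": 62}]
    noisy = private_count(participants, lambda p: p["age"] >= 50, epsilon=0.5)
    print("noisy count of participants aged 50 or older:", round(noisy, 2))
```

A smaller epsilon adds more noise, which gives stronger privacy but a less accurate count. Choosing that trade-off is a project-level ethical decision rather than a purely technical one.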

Datasets

National Science Foundation's open datasets
Resources to Find the Data You Need (2016)
Awesome Public Datasets

Acknowledgements

Current author: Di Yoong
Past contributing author: Stephen Zweibel
Past reviewer: Stefano Morello
Past reviewer: Filipa Calado
Current editor: Lisa Rhody
Current editor: Kalle Westerling
