Data Literacies

Abstract

What is data? What counts as data? These are questions we will explore throughout the workshop.

Data is foundational to nearly all digital projects and often helps us understand and express our ideas and narratives. Hence, to do digital work, we should know how data is captured, constructed, and manipulated. In this workshop we will discuss the basics of research data in terms of its material, transformation, and presentation. We will also engage with the ethical dimensions of working with data, from collection to visualization to representation.

Learning Objectives

By the end of this workshop, participants will:

  • Know the stages of data analysis
  • Understand the difference between proprietary and open data formats
  • Become familiar with the specific requirements of "high-quality data"
  • Learn about ethical issues around working with different types of data and analysis

Estimated time

3–4 hours

Prerequisites

  • Introduction to the Command Line (required): This workshop refers to concepts from the Command Line workshop, and some knowledge of the command line is central for anyone who wants to learn how to handle, process, and analyze data.
  • Download the workshop dataset (required): The dataset, moSmall.csv, is used throughout the challenges in the workshop. To save the file to your local computer, right-click the "Download the workshop dataset" link and choose Save Link As.... Note: It is important to make sure your file is saved as a .csv file. The original dataset is taken from The Metropolitan Museum of Art's open access data, released under Creative Commons Zero.
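One way to confirm that the download was saved as a readable .csv file is to open it with Python's standard csv module. The sketch below is only an illustration: the helper name describe_csv is hypothetical, the file is assumed to sit in the current working directory, and nothing is assumed about which columns moSmall.csv contains.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Return (column_names, data_row_count) for a comma-separated file.

    Hypothetical helper for illustration; not part of the workshop materials.
    """
    path = Path(path)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {path.name!r}")
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # first row holds the column names
        row_count = sum(1 for _ in reader)   # remaining rows are data
    return header, row_count


if __name__ == "__main__":
    target = Path("moSmall.csv")  # assumed location: the current directory
    if target.exists():
        columns, rows = describe_csv(target)
        print(f"{len(columns)} columns, {rows} data rows")
        print("First columns:", columns[:5])
    else:
        print(f"{target} not found; check where the download was saved")
```

If the script reports a single column or raises an error, the file was probably saved in another format (for example as an HTML page) and should be downloaded again.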

Contexts

Pre-reading suggestions

Projects that use these skills

  • Data for Public Good is a semester-long collaborative project led by CUNY graduate students. Each semester, the project explores a different public-interest dataset to present information that is useful and informative to a public audience.
  • SAFElab, led by Dr. Desmond U. Patton, uses computational and social work approaches to understand the mechanisms of violence and to work on prevention of and intervention in violence that occurs in neighborhoods and on social media.

Ethical Considerations

  • Data and data analysis are not free from bias. Data does not emerge from a magic black box; it is contextually driven. As we think about automating the analysis of "big" data, we have to be aware of the "hidden" biases that get reproduced.
  • De-identified information can be reconstructed from piecemeal data found across different sources. When we consider what we are doing with the data we have collected, we also need to think about the possible re-identification of our participants.
  • Consider how you might use differential privacy as a strategy against re-identification. The 2020 US Census is one example of using this strategy to address privacy concerns.
  • Big data projects often require sharing datasets across different individuals and teams. In addition, to ensure that our work is reproducible and accountable, we may feel inclined to share the data we have collected. As such, figuring out how to share such data is a crucial part of the project planning stage.
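To make the idea of differential privacy mentioned above more concrete, here is a minimal sketch of one textbook technique, the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not the method used for the 2020 US Census: the function names, the toy records, and the epsilon value are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Toy, made-up records used only to demonstrate the call.
    people = [{"age": 30}, {"age": 70}, {"age": 68}, {"age": 41}, {"age": 90}]
    noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5)
    print(f"Noisy count of people aged 65 or older: {noisy:.2f}")
```

Each run prints a slightly different number. That is the intended trade-off: the published figure stays close to the true count, while no single person's presence in the data can be confidently inferred from it.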

Datasets

Acknowledgements