<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Human-in-the-loop Data Management</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<style>
body {
padding-top: 20px;
}
</style>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-89881917-1', 'auto');
ga('send', 'pageview');
</script>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="container">
<h2 id="cmpt884">CMPT 884: Human-in-the-loop Data Management (SFU, Fall 2016)</h2>
<p>In the Big Data era, humans play an increasingly important role in almost every phase of data management. For example, a human can act as a <b>Data Producer</b> who generates data (e.g., Twitter), a <b>Data Processor</b> who annotates or processes data (e.g., Amazon MTurk), a <b>Data Scientist</b> who analyzes data interactively (e.g., Jupyter, Spark), or a <b>Data Consumer</b> who benefits from the value extracted from data (e.g., Business/Healthcare Intelligence).</p>
<p>
Because of this, human-in-the-loop data management has recently become an active research topic in many fields, including Databases, Machine Learning, HCI, and Visualization. In this course, we will focus on recent research that treats the human as a <b>Data Processor</b> or a <b>Data Scientist</b>.
</p>
<p>
This graduate seminar has two objectives:
</p>
<ol>
<li>Introducing students to cutting-edge research on Human-in-the-loop Data Management;</li>
<li>Training students to master the basic skills of a researcher.</li>
</ol>
<p>
To achieve the first objective, the course selects a list of papers on different aspects of the topic; students will present each paper and lead a discussion in class. To achieve the second objective, the course creates a number of opportunities for students to learn how to read a paper, how to write a paper review, how to give a good research talk, and how to ask questions during a talk.
</p>
<h3 id="logistics">Logistics</h3>
<ul>
<li><b>Instructor: </b><a href="http://cs.sfu.ca/~jnwang/">Jiannan Wang</a></li>
<li><b>Time: </b>Monday, 12:30 - 2:20pm; Wednesday, 12:30 - 1:20pm</li>
<li><b>Location: </b>AQ5018 (Mon), AQ5037 (Wed)</li>
<li><b>Office Hours: </b>By appointment. <a href="mailto:jnwang@sfu.ca">E-mail me</a> to book a slot</li>
</ul>
<h3 id="pre-requisites">Pre-requisites</h3>
<ul>
<li> Knowledge of basic concepts in database systems and machine learning.</li>
<li> Knowledge of Python will be useful for the assignments.</li>
</ul>
<h3 id="grading">Grading</h3>
<ul>
<li>Paper Presentation: 25% </li>
<li>Questions: 10%</li>
<li>Paper Review: 15%</li>
<li>Assignments: 20%</li>
<li>Final Project: 30% (2% plan + 14% poster + 14% paper)</li>
</ul>
<h3 id="assignments">Assignments</h3>
<ul>
<li> <a href="./Assignments/A1/A1-instruction.html">Assignment 1: Crowdsourced Data Management</a> (<span class="bg-primary">Due: 23:59:59 Oct 26</span>)</li>
<li> <a href="./Assignments/A2/A2-instruction.html">Assignment 2: Interactive Analytics</a> (<span class="bg-primary">Due: 23:59:59 Dec 04 </span>)</li>
</ul>
<h3 id="final-project">Final Project</h3>
<ul>
<li> <a href="./Project/project-instruction.html">Final Project: Human Computation + X</a> </li>
</ul>
<h3 id="schedule">Schedule</h3>
<p>
<mark>If you are a speaker, please see <a href="https://docs.google.com/document/d/1y__UCKs9-jks7Qcs1DRkkXP4k51DVvwhm8LGxepaYdE/edit?usp=sharing">this doc</a> for how to upload your slides after the presentation. <br/>If you ask a question in a Q/A session, please record it in this <a href="https://docs.google.com/spreadsheets/d/1jh_IdS83h3aOIS7xp_HII74W1WBf1NRrzi9xs8b0ix4/edit?usp=sharing">form</a> (one question per row). </mark>
</p>
<table class="table table-bordered">
<thead>
<tr class="info">
<td><b>Date</b></td>
<td><b>Topic</b></td>
<td><b>Content</b></td>
<td><b>Presenter</b></td>
</tr>
</thead>
<tbody>
<tr>
<td>Wed 9/7</td>
<td>Course Objective I</td>
<td> Introduction to Human-in-the-loop Data Management</td>
<td> Jiannan [<a href='./Lectures/intro.pdf'>slides</a>]</td>
</tr>
<tr class="active">
<td>Mon 9/12</td>
<td>Course Objective II</td>
<td> Essential Skills Needed for a PhD Student (How to <a href='./Papers/How%20to%20Read%20a%20Paper.pdf'>read</a> & <a href='./Papers/review-writing.pdf'>review</a> a paper? How to give a <a href="./Papers/giving-a-talk.pdf">talk?</a> How to <a href="https://www.amazon.com/Asking-Right-Questions-11th-Browne/dp/0321907957">ask</a> questions?)</td>
<td> Jiannan [<a href='./Lectures/research-skills.pdf'>slides</a>]</td>
</tr>
<tr class="info">
<td colspan="4" style="text-align:center"><b>Part 1: Crowdsourced Data Processing (Human as Data Processor)</b></td>
</tr>
<tr>
<td>Wed 9/14</td>
<td>Background</td>
<td> <a href="./Papers/Crowdsourcing%20systems%20on%20the%20world-wide%20web.pdf">Crowdsourcing systems on the world-wide web</a></td>
<td> Jiannan [<a href='./Lectures/crowdsourcing.pdf'>slides</a>]</td>
</tr>
<tr class="active">
<td>Mon 9/19</td>
<td rowspan="2" style="vertical-align:middle"> Systems and Programming Models</td>
<td> <a href="./Papers/crowddb_sigmod2011.pdf">CrowdDB: Answering Queries Using Crowdsourcing</a> <br/>
<a href="./Papers/Little-UIST10.pdf">TurKit: Human Computation Algorithms on Mechanical Turk</a></td>
<td> Sima [<a href = './Lectures/CrowdDB.pdf'>slides</a>] <br/> Han Shen [slides]</td>
</tr>
<tr class="active">
<td>Wed 9/21</td>
<td> <a href="./Papers/CrowdForge.pdf">CrowdForge: crowdsourcing complex work</a></td>
<td> Han Bao [<a href = './Lectures/CrowdForge.pdf'>slides</a>] </td>
</tr>
<tr>
<td>Mon 9/26</td>
<td rowspan="2" style="vertical-align:middle"> Quality / Latency Control</td>
<td> <a href="./Papers/Get%20Another%20Label.pdf">Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers</a> <br/>
<a href="./Papers/SQUARE.pdf">SQUARE: A Benchmark for Research on Computing Crowd Consensus</a></td>
<td> Akash [<a href = './Lectures/Get_Another_Label_slides.pdf'>slides</a>] <br/> Srikanth [<a href = './Lectures/square.pdf'>slides</a>]</td>
</tr>
<tr>
<td>Wed 9/28</td>
<td> <a href="./Papers/CLAMShell.pdf">CLAMShell: Speeding up Crowds for Low-latency Data Labeling</a></td>
<td> Yan [<a href="./Lectures/CLAMShell.pdf">slides</a>] </td>
</tr>
<tr class = "active">
<td>Mon 10/3</td>
<td rowspan="2" style="vertical-align:middle"> Data Annotation</td>
<td> <a href="./Papers/labeling-images.pdf">Labeling images with a computer game</a> <br/>
<a href="./Papers/imagenet_cvpr09.pdf">ImageNet: A Large-Scale Hierarchical Image Database</a></td>
<td> Akshay [<a href = './Lectures/LabelingImages_with_a_ComputerGame.pdf'>slides</a>] <br/> Nazanin [<a href = './Lectures/ImageNet.pdf'>slides</a>]</td>
</tr>
<tr class = "active">
<td>Wed 10/5</td>
<td> <a href="./Papers/nlp-data-annotation.pdf">Cheap and Fast — But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks</a></td>
<td> Yifang [<a href = './Lectures/nlp_annotation.pdf'>slides</a>] </td>
</tr>
<tr>
<td>Mon 10/10</td>
<td> Thanksgiving</td>
<td> No classes </td>
<td> ------- </td>
</tr>
<tr class = "active">
<td>Wed 10/12</td>
<td rowspan="2" style="vertical-align:middle"> Crowdsourced Operators I</td>
<td> <a href="./Papers/qurk.pdf">Human-powered Sorts and Joins</a> </td>
<td> Xiaoyi [<a href = 'https://github.com/fxy8211/cmpt884-fall16/blob/master/Lectures/Qurk.pdf'>slides</a>]</td>
</tr>
<tr class = "active">
<td>Mon 10/17</td>
<td> <a href="./Papers/crowder.pdf">CrowdER: Crowdsourcing Entity Resolution</a><br/><a href="./Papers/crowder-transitivity.pdf">Leveraging Transitive Relationships for Crowdsourced Joins</a>
</td>
<td> Ruochen [<a href = './Lectures/crowder.pdf'>slides</a>] <br/> Loong [<a href="./Lectures/crowder-transitivity.pdf">slides</a>]</td>
</tr>
<tr>
<td>Wed 10/19</td>
<td rowspan="2" style="vertical-align:middle"> Crowdsourced Operators II</td>
<td> <a href="./Papers/ChiltonCascadeCHI2013.pdf">Cascade: Crowdsourcing Taxonomy Creation</a> </td>
<td> Bandeep [<a href = './Lectures/cascade.pdf'>slides</a>]</td>
</tr>
<tr>
<td>Mon 10/24</td>
<td> <a href="./Papers/crowd-topk.pdf">Using the crowd for top-k and group-by queries</a><br/>
<a href="./Papers/crowdscreen.pdf">Crowdscreen: Algorithms for filtering data with humans</a></td>
<td> Mohan [<a href = './Lectures/seminar.pdf'>slides</a>]<br/> Venkatesh [<a href='./Lectures/Crowd_screen_ppt.pdf'>slides</a>]</td>
</tr>
<tr class="info">
<td colspan="4" style="text-align:center"><b>Part 2: Interactive Analytics (Human as Data Scientist)</b></td>
</tr>
<tr>
<td>Wed 10/26</td>
<td rowspan="2" style="vertical-align:middle"> Background</td>
<td> <a href="./Papers/ipython07_pe-gr_cise.pdf">IPython: A System for Interactive Scientific Computing</a></td>
<td> Rashmisnata [<a href='./Lectures/IPython.pdf'>slides</a>] </td>
</tr>
<tr>
<td>Mon 10/31</td>
<td> <a href="./Papers/2012-EnterpriseAnalysisInterviews-VAST.pdf">Enterprise data analysis and visualization: An interview study</a> <br/>
<a href="./Papers/kim-icse-2016.pdf">The Emerging Role of Data Scientists on Software Development Teams</a> </td>
<td> Abhishek [<a href = './Lectures/enterprise_data_analysis.pdf'>slides</a>] <br/> Si [<a href = './Lectures/the_emerging_role_of_scientists_on_software_development_teams.pdf'>slides</a>]
</td>
</tr>
<tr class = "active">
<td>Wed 11/2</td>
<td rowspan="2" style="vertical-align:middle"> Interactive Data Cleaning</td>
<td>
<a href="./Papers/wrangler.pdf">Wrangler: Interactive Visual Specification of Data Transformation Scripts</a></td>
<td> Eshan [<a href = './Lectures/Wrangler.pdf'>slides</a>] </td>
</tr>
<tr class = "active">
<td>Mon 11/7</td>
<td> <a href="./Papers/sampleclean-sigmod14.pdf">SampleClean: Fast and Accurate Query Processing on Dirty Data</a> <br/><a href="./Papers/scorpion.pdf">Scorpion: Explaining Away Outliers in Aggregate Queries</a></td>
<td> Jinglin [<a href = './Lectures/sample_clean_slides.pdf'>slides</a>] <br/> Sha [<a href = './Lectures/presentation_scorpion.pdf'>slides</a>]</td>
</tr>
<tr>
<td>Wed 11/9</td>
<td rowspan="3" style="vertical-align:middle"> Interactive Visualization</td>
<td> <a href="./Papers/polaris.pdf">Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases</a> </td>
<td> Walther [<a href="./Lectures/Polaris_presentation.pdf">slides</a>] </td>
</tr>
<tr>
<td>Mon 11/14</td>
<td>
<a href="./Papers/prefuse.pdf">Prefuse: a toolkit for interactive information visualization</a> <br/> <a href="./Papers/seedb.pdf">SEEDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics</a></td>
<td> Pei [<a href="./Lectures/prefuse-revised.pdf">slides</a>] <br/> Kiana [slides]</td>
</tr>
<tr>
<td>Wed 11/16</td>
<td> <a href="./Papers/immens.pdf">imMens: Real-time Visual Querying of Big Data</a> </td>
<td> Saeedeh [<a href = './Lectures/imMens.pdf'>slides</a>] </td>
</tr>
<tr class = "active">
<td>Mon 11/21</td>
<td rowspan="2" style="vertical-align:middle"> Interactive Machine Learning</td>
<td>
<a href="./Papers/settles.activelearning.pdf">Active Learning Literature Survey (Sec 1-4)</a> <br/> <a href="./Papers/activeclean-vldb16.pdf">ActiveClean: Interactive Data Cleaning For Statistical Modeling</a></td>
<td> Lovedeep [<a href = './Lectures/884_presentation_on_active_learning.pdf'>slides</a>] <br/> Mohamad [<a href = './Lectures/ActiveClean.pdf'>slides</a>] </td>
</tr>
<tr class = "active">
<td>Wed 11/23</td>
<td> <a href="./Papers/amershi_AIMagazine2014.pdf">Power to the People: The Role of Humans in Interactive Machine Learning</a> </td>
<td> Saif [<a href = './Lectures/interactiveMachineLearning.pdf'>slides</a>] </td>
</tr>
<tr>
<td>Mon 11/28</td>
<td rowspan="2" style="vertical-align:middle"> Interactive SQL Analytics</td>
<td> <a href="./Papers/SparkSQLSigmod2015.pdf">Spark SQL: Relational Data Processing in Spark</a><br/>
<a href="./Papers/implementing_data_cube.pdf">Implementing Data Cubes Efficiently</a>
</td>
<td> Mangesh [<a href ="./Lectures/SparkSQL.pdf">slides</a>]<br/> Manpreet [<a href ="./Lectures/DataCubes.pdf">slides</a>] </td>
</tr>
<tr>
<td>Wed 11/30</td>
<td> <a href="./Papers/blinkdb.pdf">BlinkDB: queries with bounded errors and bounded response times on very large data</a></td>
<td> Jacky [<a href = './Lectures/BlinkDB.pdf'>slides</a>] </td>
</tr>
<tr class="info">
<td colspan="4" style="text-align:center"><b>Final Project</b></td>
</tr>
<tr>
<td>Wed 12/7 </td>
<td> Final Project</td>
<td> Final Project Poster Session</td>
<td> Groups </td>
</tr>
</tbody>
</table>
<br/><br/>
<h3 id="related-course">Related Courses</h3>
<ul>
<li><a href="http://crowdsourcing-class.org/">NETS 213: Crowdsourcing & Human Computation (UPenn, 2016 Spring)</a> </li>
<li><a href="https://sites.google.com/a/brown.edu/hdi/">CSCI2950-T: Human-in-the-loop Data Management (Brown, 2015 Spring)</a> </li>
<li><a href="http://data-people.cs.illinois.edu/courses/cs598/">CS598: Human-in-the-loop Data Management (UIUC, 2014 Fall)</a> </li>
</ul>
<div class="row"><h4> </h4><hr><p class="text-center"> © Jiannan Wang 2016</p></div>
</div>
</body>
</html>