Update NCSLGR #79

cleong110 · 2024-06-12T21:27:17Z

cleong110#21 used by SignBLEU. They say

We use the ELAN version of Boston University’s The National Center for Sign Language and Gesture Resources corpus (NCSLGR) (Neidle and Sclaroff, 2012)

Carol Neidle and Stan Sclaroff. 2012. National Center for Sign Language and Gesture Resources (NCSLGR) corpus. Boston University. ISLRN, American Sign Language Linguistic Research Project (ASLLRP), ISLRN 833-505-711-564-4.

Which links to https://www.islrn.org/resources/833-505-711-564-4/, which links to https://www.bu.edu/asllrp/ncslgr.html as the source.

Currently we have an entry for NCSLGR that points to dataset:databases2007volumes, aka

Databases, NCSLGR. 2007. “Volumes 2–7.” American Sign Language Linguistic Research Project (Distributed on CD-ROM ….

and it's got some TODOs

https://www.bu.edu/asllrp/ncslgr-for-download/download-info.html

cleong110 · 2024-06-14T15:23:22Z

http://asl.cs.depaul.edu/corpus/index.html actually might be the "ELAN Version" they mention in SignBLEU

cleong110 · 2024-06-14T15:25:33Z

Aha!

https://www.bu.edu/asllrp/data-credits.html

cleong110 · 2024-06-14T15:29:24Z

But I still don't know the precise citation for the Corpus itself? It says cite the corpus AND this publication. ???

cleong110 · 2024-06-14T15:29:55Z

https://www.bu.edu/asllrp/publications.html doesn't have a paper called "The National Center for Sign Language and Gesture Resources (NCSLGR) Corpus".

cleong110 · 2024-06-14T15:46:35Z

I think I'll just... cite this:

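@inproceedings{Vogler2012ANW,
  title={A new web interface to facilitate access to corpora: development of the ASLLRP data access interface},
  author={Carol Neidle and Christian Vogler},
  booktitle={Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, LREC 2012},
  address={Istanbul, Turkey},
  year={2012},
  url={https://api.semanticscholar.org/CorpusID:58305327}
}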

cleong110 · 2024-06-14T15:52:17Z

And maybe add a custom citation like this:

@misc{dataset:Neidle_2020_NCSLGR_ISLRN,
  type = {Languageresource},
  title = {National Center for Sign Language and Gesture Resources (NCSLGR) corpus. ISLRN 833-505-711-564-4},
  author = {Carol Neidle and Stan Sclaroff},
  year = {2012},
  publisher = {Boston University},
  url = {https://www.islrn.org/resources/833-505-711-564-4/}
}

cleong110 · 2024-06-20T21:09:42Z

Previously the JSON pointed to

databases2007volumes

cleong110 · 2024-06-20T21:10:14Z

In index.md that is cited only here:

###### Continuous sign corpora {-}
contain parallel sequences of signs and spoken language.
Available continuous sign corpora are extremely limited, containing 4-6 orders of magnitude fewer sentence pairs than similar corpora for spoken language machine translation [@arivazhagan2019massively].
Moreover, while automatic speech recognition (ASR) datasets contain up to 50,000 hours of recordings [@pratap2020mls], the most extensive continuous sign language corpus contains only 1,150 hours, and only 50 of them are publicly available [@dataset:hanke-etal-2020-extending].
These datasets are usually synthesized [@dataset:databases2007volumes;@dataset:Crasborn2008TheCN;@dataset:ko2019neural;@dataset:hanke-etal-2020-extending] or recorded in studio conditions [@dataset:forster2014extensions;@cihan2018neural], which does not account for noise in real-life conditions. Moreover, some contain signed interpretations of spoken language rather than naturally-produced signs, which may not accurately represent native signing since translation is now a part of the discourse event.
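
To double-check that the old key really is cited in only one place before swapping it out, a small search over the markdown can help. This is a minimal sketch: `find_citations` is a hypothetical helper, not something that exists in the repo, and the sample text is a shortened stand-in for index.md.

```python
import re


def find_citations(markdown_text, key):
    """Return (line_number, line) pairs for lines that cite `key` as @key.

    Hypothetical helper for illustration only.
    """
    # The lookahead keeps a short key from matching inside a longer one,
    # e.g. "dataset:hanke-etal" inside "dataset:hanke-etal-2020-extending".
    pattern = re.compile(r"@" + re.escape(key) + r"(?![\w-])")
    return [
        (number, line)
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Shortened stand-in for the real index.md content.
sample = (
    "These datasets are usually synthesized "
    "[@dataset:databases2007volumes;@dataset:Crasborn2008TheCN].\n"
    "Another line citing [@dataset:hanke-etal-2020-extending].\n"
)

hits = find_citations(sample, "dataset:databases2007volumes")
for number, line in hits:
    print(f"{number}: {line}")
```

On the real file one would read index.md into a string and pass it in the same way.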

cleong110 · 2024-06-20T21:14:51Z

As for JSON updates:
going off of https://www.bu.edu/asllrp/ncslgr-for-download/download-info.html, it seems the available features are:

Linguistic
gloss
video

Also

    Most of these data are from four native signers of ASL.

    This dataset includes 1,866 distinct canonical signs (i.e., grouping together very slight variants in production). The total number of sign tokens is 11,854.

    Restricting consideration to signs other than gestures and classifiers, there are 1,278 distinct canonical signs, and a total of 10,719 tokens.

    1,002 of the utterances in this collection are part of short spontaneous narratives (19). The remaining 885 utterances were elicited to illustrate a variety of constructions and sentence types.
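
As a rough sketch of how these numbers could be collected for the JSON update: the field names below are hypothetical and do not follow any confirmed schema from the repo; only the figures themselves come from the download-info page quoted above. The derived totals are simple arithmetic on those figures.

```python
import json

# Figures quoted from the download-info page; all key names are
# hypothetical and chosen only for this illustration.
ncslgr = {
    "name": "NCSLGR",
    "language": "ASL",
    "features": ["linguistic", "gloss", "video"],
    "distinct_signs": 1866,
    "sign_tokens": 11854,
    "distinct_signs_excl_gestures_classifiers": 1278,
    "sign_tokens_excl_gestures_classifiers": 10719,
    "narrative_utterances": 1002,
    "elicited_utterances": 885,
}

# Derived values: total utterances, plus the gesture/classifier share
# obtained by subtracting the restricted counts from the full counts.
ncslgr["total_utterances"] = (
    ncslgr["narrative_utterances"] + ncslgr["elicited_utterances"]
)
ncslgr["gesture_classifier_signs"] = (
    ncslgr["distinct_signs"] - ncslgr["distinct_signs_excl_gestures_classifiers"]
)
ncslgr["gesture_classifier_tokens"] = (
    ncslgr["sign_tokens"] - ncslgr["sign_tokens_excl_gestures_classifiers"]
)

print(json.dumps(ncslgr, indent=2))
```

The utterance total works out to 1,887, which is the number that would go into a sample-count field if the JSON has one.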

cleong110 · 2024-06-20T21:19:02Z

Licensing is the big one: https://www.bu.edu/asllrp/data-credits.html

The data available from these pages can be used for research and education purposes, but cannot be redistributed without permission.

Commercial use, without explicit permission, is not allowed, nor are any patents and copyrights based on this material.

Those making use of these data must, in resulting publications or presentations, cite: The National Center for Sign Language and Gesture Resources (NCSLGR) Corpus and this publication:

    Carol Neidle and Christian Vogler [2012] "A New Web Interface to Facilitate Access to Corpora: Development of the ASLLRP Data Access Interface," Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, LREC 2012, Istanbul, Turkey.

and also include the following URL's: http://www.bu.edu/asllrp// and http://secrets.rutgers.edu/dai/queryPages/.

cleong110 mentioned this issue Jun 20, 2024

Straighten out RWTH-PHOENIX-* citations #90

Merged

cleong110 linked a pull request Jun 20, 2024 that will close this issue

CDL: updating NCSLGR (take 2) #99

Open
