
Some file extensions excluded from the published dataset (Racket) #55

Open
@flobbit1

programming-languages-to-file-extensions.json correctly lists 'rkt', the most common file extension for Racket. However, the Racket subset of the dataset at https://huggingface.co/datasets/bigcode/the-stack/tree/main/data/racket contains zero files with this extension, and the paper at https://arxiv.org/abs/2305.06161 specifically names 'rkt' as an excluded extension. This likely excludes the majority of actual Racket files found on GitHub.
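One way to confirm which expected extensions are missing from a subset is to count extensions over the subset's file paths. Below is a minimal sketch that assumes the paths have already been collected into a list (for example, from the downloaded Racket files). The function names, the extension list and the sample paths are hypothetical and purely illustrative. They are not taken from programming-languages-to-file-extensions.json or from the dataset itself.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(paths):
    """Count file extensions (lowercased, without the leading dot) over an iterable of paths."""
    counts = Counter()
    for p in paths:
        suffix = PurePosixPath(p).suffix  # e.g. ".rkt"; "" when the file has no extension
        counts[suffix.lstrip(".").lower()] += 1
    return counts


def missing_extensions(expected, paths):
    """Return the expected extensions that never occur among the given paths."""
    counts = extension_counts(paths)
    # Indexing a Counter with an absent key returns 0 without inserting it.
    return sorted(ext for ext in expected if counts[ext] == 0)


# Hypothetical example: 'rkt' is expected but no path in the subset uses it.
print(missing_extensions(["rkt", "rktl"], ["src/main.rktl", "lib/util.rktl"]))
```

A non-empty result for a language's most common extension, as with 'rkt' here, points to a filtering step that dropped those files rather than to an absence of such files on GitHub.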
