tokenize function code in data_utils.py is incorrect #31

Open
zpengc opened this issue Dec 9, 2021 · 0 comments
zpengc commented Dec 9, 2021

The test shows that the intended behavior is:

>>> tokenize('Bob dropped the apple. Where is the apple?')
['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

To produce that output, the function should be written like this:

import re

def tokenize(sent):
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)
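As a quick sanity check of the proposed pattern, here is a minimal standalone sketch. The name TOKEN_PATTERN and the second input sentence are illustrative assumptions of mine, not taken from data_utils.py.

```python
import re

# Pattern from the proposal above: a word with an optional apostrophe
# suffix, or any single character that is neither a word character nor
# whitespace. TOKEN_PATTERN is a hypothetical name used only here.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(sent):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sent)


if __name__ == "__main__":
    # Sentence from the test quoted in this issue.
    print(tokenize("Bob dropped the apple. Where is the apple?"))
    # Extra, assumed input showing that a contraction stays one token.
    print(tokenize("Where's Bob?"))
```

Because the group is non-capturing, re.findall returns the full matches, so punctuation comes out as separate tokens and no empty strings appear in the result.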