Email address containing three or more special chars (punctuation) is removed by Docsplit.clean_text method #144

Open
mraj-rpx opened this issue Dec 4, 2017 · 0 comments

Comments


mraj-rpx commented Dec 4, 2017

I have an email address in a PDF, such as mohan-ramanujam@gmail.com or mohan.raman.visal@gmail.com, and Docsplit.clean_text removes it. The corresponding line in the text_cleaner.rb file is line 81:
(w[1...-1].scan(PUNCT).uniq.length >= 3) ||
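
To make the check easier to reason about, here is a minimal standalone sketch of that one condition. It is not the library's code: `PUNCT` is redefined here on the assumption that it matches any POSIX punctuation character, and the helper names `distinct_inner_punct` and `trips_punct_rule?` are hypothetical.

```ruby
# Assumption: stands in for the PUNCT constant used in text_cleaner.rb.
PUNCT = /[[:punct:]]/

# Hypothetical helper: the distinct punctuation characters found in a word,
# ignoring its first and last characters (as w[1...-1] does on line 81).
def distinct_inner_punct(word)
  word[1...-1].scan(PUNCT).uniq
end

# Hypothetical helper: true when the word would satisfy the quoted condition.
def trips_punct_rule?(word)
  distinct_inner_punct(word).length >= 3
end

emails = ["mohan-ramanujam@gmail.com", "mohan.raman.visal@gmail.com"]
emails.each do |email|
  puts "#{email}: #{distinct_inner_punct(email).inspect} -> #{trips_punct_rule?(email)}"
end
```

Because of `uniq`, the condition counts distinct punctuation characters rather than total occurrences. Under the assumed `PUNCT`, the first address contains three distinct characters (`-`, `@`, `.`) and satisfies the condition, while the second contains only two (`.`, `@`), so if it is also being removed, a different condition may be responsible.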
@knowtheory, @jashkenas, @samuelclay: Please provide your opinion on this.
