`httpspell`

httpspell is a spellchecker that recursively fetches HTML pages, converts them to plain text (using pandoc), and spellchecks them with hunspell. Unknown words are printed to stdout, which makes the tool a good fit for CI pipelines that should take action when a spelling error is found on a web page.

Words that are not in the dictionary for the given language (inferred from the lang attribute of the HTML document's root element) can be added to a personal dictionary, which marks them as correctly spelled.

Usage

The following command will retrieve the HTML document at https://example.com, spellcheck it, and not print anything because there are no errors:
```
$ httpspell https://example.com
```
The exit code is 0.

The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. Note that we set the language to en_US because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like en_US:
```
$ httpspell https://github.com/suhlig/httpspell/blob/master/README.markdown --language en_US
suhlig
Permalink
httpspell
sloc
pandoc
hunspell
...
```
The exit code is 1.
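
For a CI pipeline, the two exit codes shown above can be turned into a pass/fail decision. The following is a minimal sketch, not part of httpspell: `run_spellcheck` is a hypothetical helper, and it assumes that unknown words arrive on stdout one per line, as in the example output.

```ruby
require 'open3'

# Hypothetical helper (not part of httpspell): runs a spellcheck command given
# as an array, e.g. ['httpspell', 'https://example.com'], and collects the
# words it printed to stdout. Assumes one unknown word per line.
def run_spellcheck(command)
  stdout, status = Open3.capture2(*command)
  words = stdout.lines.map(&:strip).reject(&:empty?)
  { ok: status.success?, unknown_words: words }
end

# Possible use in a CI script:
#   result = run_spellcheck(['httpspell', 'https://example.com'])
#   abort("Unknown words: #{result[:unknown_words].join(', ')}") unless result[:ok]
```

Passing the command as an array avoids going through a shell, so URLs containing characters such as `&` or `?` need no quoting.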

What is not checked

When spidering a site, httpspell skips all responses with a content-type header other than text/html (unless it is pointed at a file, in which case it accepts anything).
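
The content-type rule can be illustrated with a small predicate. This is a sketch only, not httpspell's actual code: `html_content_type?` is a hypothetical name, and ignoring parameters such as `charset` and letter case is an assumption about what a reasonable check would do.

```ruby
# Hypothetical predicate (not taken from httpspell): returns true when a
# Content-Type header value denotes an HTML document. Parameters such as
# "; charset=utf-8" are ignored and the comparison is case-insensitive.
def html_content_type?(header_value)
  media_type = header_value.to_s.split(';').first.to_s.strip.downcase
  media_type == 'text/html'
end
```

A missing header (`nil`) is treated as "not HTML" here, so such a response would be skipped as well.
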
Before converting, httpspell removes the following nodes from the HTML DOM as they are not a good target for spellchecking:
- code
- pre
- Elements with spellcheck='false' (this is how HTML5 allows marking an element as a target for spellchecking or not)
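
The effect of the list above can be sketched as follows. This is an illustration under stated assumptions, not httpspell's implementation: it uses REXML, which only accepts well-formed XHTML, and it skips the excluded elements while walking the tree rather than removing them from the DOM. `SKIPPED_TAGS`, `skip_element?` and `spellcheckable_text` are hypothetical names.

```ruby
require 'rexml/document'

# Tag names excluded from spellchecking (hypothetical constant).
SKIPPED_TAGS = %w[code pre].freeze

# True for <code>, <pre>, and any element carrying spellcheck="false".
def skip_element?(element)
  SKIPPED_TAGS.include?(element.name.downcase) ||
    element.attributes['spellcheck'] == 'false'
end

# Walks the tree below `node` and returns the text fragments that remain
# once the skipped elements (and everything inside them) are left out.
def spellcheckable_text(node)
  node.children.flat_map do |child|
    case child
    when REXML::Element
      skip_element?(child) ? [] : spellcheckable_text(child)
    when REXML::Text
      [child.value]
    else
      [] # comments, processing instructions, etc.
    end
  end
end
```

Real-world HTML is often not well-formed XML, so a production version would need a proper HTML parser.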

Misc

If you produce content with kramdown (e.g. using Jekyll), an Inline Attribute List can be used to set spellcheck='false' for an element by adding this line after the element (e.g. a heading):

{: spellcheck="false"}

Dictionaries

Hunspell uses the system dictionary paths; on the Mac this is ~/Library/Spelling/. Get some dictionaries as explained in the hunspell project:

$ wget -O ~/Library/Spelling/en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff
$ wget -O ~/Library/Spelling/en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic

German:

$ wget -O ~/Library/Spelling/de_DE.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.dic
$ wget -O ~/Library/Spelling/de_DE.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.aff

Italian (for integration tests):

$ wget -O ~/Library/Spelling/it_IT.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.dic
$ wget -O ~/Library/Spelling/it_IT.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.aff
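
Hunspell needs both the `.aff` and the `.dic` file for a language, so it can be useful to verify the downloads. The following sketch lists the languages for which both files are present in a directory; `installed_dictionaries` is a hypothetical helper, not part of httpspell or hunspell.

```ruby
# Hypothetical helper: returns the sorted names (e.g. "en_US") of all
# dictionaries in `dir` for which both NAME.dic and NAME.aff exist.
def installed_dictionaries(dir)
  dir = File.expand_path(dir) # resolves a leading "~"
  Dir.glob('*.dic', base: dir)
     .map { |file| File.basename(file, '.dic') }
     .select { |name| File.exist?(File.join(dir, "#{name}.aff")) }
     .sort
end
```

After the downloads above, `installed_dictionaries('~/Library/Spelling')` should include `de_DE`, `en_US` and `it_IT`; a name missing from the result points to an incomplete pair.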
