Add an option for the/th #1

Open
mark-petersen opened this issue Jan 4, 2017 · 1 comment

Comments

@mark-petersen
Owner

On the options page, let the user choose whether the word "the" is rendered as "the" or "th".

@mark-petersen
Owner Author

From Alan Mole:
However, "the" should be th. I think you have not used all the vocabularies. There are three dictionaries: common words (about 2000 I think; I read these into an array for faster operation when computers were slow. That dictionary seems to be missing, because "the" is a common word.)

The rest of the 44,000 words were read into a file and the file searched. It appears this is the dictionary you used. Finally, there are ambiguous words like bow and present. That file worked like this: if the input word was "present" and the preceding word was "a" or "the" or another word in a list, the SS word was present; otherwise it was pressent (s or z sound as in standard English, "the snake: his hiss"). This is not an actual example, just what I remember.
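
(Editorial sketch, not part of Alan's message.) The context rule described above could look roughly like the following. This is a minimal illustration only: the names `DETERMINERS`, `AMBIGUOUS`, and `respell` are hypothetical, the determiner list is incomplete because the original list is not given in this thread, and the "present"/"pressent" pair is the recalled, explicitly non-authoritative example rather than data from the real file.

```python
# Hypothetical sketch of a context-dependent lookup for ambiguous words.
# Nothing here is taken from the actual translator files.

# Words that trigger the first spelling; the real list ("or others in a
# list") is not given in the thread, so this set is an assumption.
DETERMINERS = {"a", "the"}

# ambiguous word -> (spelling after a determiner, spelling otherwise)
AMBIGUOUS = {"present": ("present", "pressent")}


def respell(words, dictionary):
    """Respell a list of words, using the previous word to resolve ambiguity."""
    out = []
    prev = None
    for word in words:
        key = word.lower()
        if key in AMBIGUOUS:
            after_det, otherwise = AMBIGUOUS[key]
            out.append(after_det if prev in DETERMINERS else otherwise)
        else:
            # Fall back to the plain dictionary; unknown words pass through.
            out.append(dictionary.get(key, word))
        prev = key
    return out
```

For instance, `respell(["the", "present"], {})` keeps "present", while `respell(["they", "present"], {})` produces "pressent". Looking only at the immediately preceding word keeps the rule cheap, which matches the description above, but it is an assumption that no wider context was used.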

me:
Alan, thanks for checking it out. This is definitely an issue worth figuring out, as I may have all the common words wrong.

I started with allspl.zip, downloaded from
http://ententetranslator.com/allspl.zip
I think the relevant files are:

ls -l DIAM*

-rwx------ 1 mpeterse staff 43946 Sep 18 1997 DIAMBAS
-rwx------ 1 mpeterse staff 2457600 Sep 9 1997 DIAMBG
-rwx------ 1 mpeterse staff 2457600 Sep 8 1997 DIAMBGB
-rwx------ 1 mpeterse staff 18297 May 26 1997 DIAMSM
-rwx------ 1 mpeterse staff 18270 Aug 13 1997 DIAMSMB

I started with DIAMBG, and simply converted it to DIAMBG.csv (attached) where each word is on a new line, as follows:

head DIAMBG.csv
a,a
aah,aa
aback,abak
abacus,abacus
abaft,abaft
abalone,abaloeny
abandon,abandon

I'm guessing the common word file is DIAMSM (1025 words). But the words in that file also appear in DIAMBG, which has 49k words. For example:

DIAMSM:
"above","abuv"
"according","acording"
"actually","acchualy"

DIAMBG.csv
above,abuv
according,acording
actually,acchualy

So it appears that the common words are indeed included in DIAMBG.
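
A check like this can be automated rather than done by spot-checking a few words. The sketch below is one possible way to do it, assuming both files have been reduced to `word,respelling` lines (quoted as in the DIAMSM excerpt, or unquoted as in DIAMBG.csv); the helper names `load_pairs` and `compare` are hypothetical, and the inline strings are just the excerpts shown above, not the full files.

```python
import csv
import io


def load_pairs(text):
    """Parse 'word,respelling' lines (quoted or unquoted) into a dict."""
    pairs = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            pairs[row[0].strip()] = row[1].strip()
    return pairs


def compare(small, big):
    """Return (words missing from big, words whose respelling differs)."""
    missing = sorted(w for w in small if w not in big)
    differing = sorted(w for w in small if w in big and small[w] != big[w])
    return missing, differing


# Excerpts from this thread only; substitute the real file contents.
diamsm = load_pairs('"above","abuv"\n"according","acording"\n"the","the"\n')
diambg = load_pairs('a,a\nabove,abuv\naccording,acording\nthe,the\n')

missing, differing = compare(diamsm, diambg)
print("missing from DIAMBG:", missing)
print("different respelling:", differing)
```

If both lists come back empty for the full files, the common-word dictionary is a strict subset of the large one and nothing is lost by using only DIAMBG.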

The word 'the' appears in both:
DIAMSM:
"the","the"

DIAMBG.csv:
the,the

Also, 'the' is spelled 'the' in the exceptions list on the SoundSpel Wikipedia page:
https://en.wikipedia.org/wiki/SoundSpel
Do you know who wrote that or where these rules came from?

I thought SoundSpel was the same as the Rondthaler & Lias Dictionary of Simplified American Spelling, but maybe not. In that PDF, the translation of 'the' is 'th' (PDF page 261). Did SoundSpel evolve further after that publication?

In summary, I think the DIAMBG file that I use in my Chrome extension includes all words and is not missing the subset of common words. Please tell me if you agree. The word 'the' may be a separate, isolated issue: your files DIAMBG and DIAMSM both translate it to 'the'. Do you think this is incorrect?

Alan: Soundspel should be the same as the Rondthaler dictionary.
