Managing Vocabulary Programmatically #776

Open
alexboche opened this issue Mar 30, 2020 · 3 comments
Labels
New Feature A new feature that is not currently implemented.

Comments

@alexboche
Contributor

alexboche commented Mar 30, 2020

Is your feature request related to a problem? Please describe.
It's nice to be able to keep all your custom vocabulary in a file rather than using Dragon's vocabulary editor. Dragon vocabularies can get deleted easily and are slow to change at scale.

Describe the solution you'd like
(Ideally a single solution that works across engines, or at least one for Dragon.) I think Caster should have a file containing: 1) a dictionary of the form pronunciation: written-form for adding custom words; 2) a list of words to remove from the vocabulary; 3) possibly also a replacement dictionary that catches commonly misrecognized words (i.e. catches the words they are misrecognized as) and replaces them with the intended word.
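A minimal sketch of the three pieces described above, written as a plain Python module. The names ADD_WORDS, REMOVE_WORDS, REPLACEMENTS and apply_replacements are hypothetical (nothing like them exists in Caster as of this issue), and the replacement step is assumed to run as post-processing on dictated text rather than inside the engine:

```python
import re
from typing import Dict, List

# Hypothetical vocabulary file contents (illustrative names only).
# 1) spoken form (pronunciation) -> written form
ADD_WORDS: Dict[str, str] = {
    "caster": "Caster",
    "dragon fly": "dragonfly",
}

# 2) written forms to remove from the engine's vocabulary
REMOVE_WORDS: List[str] = ["colonel"]

# 3) commonly misrecognized form -> intended form
REPLACEMENTS: Dict[str, str] = {
    "castor": "Caster",
}


def apply_replacements(text: str, replacements: Dict[str, str]) -> str:
    """Replace whole-word misrecognitions in dictated text, ignoring case."""
    if not replacements:
        return text
    # Normalize keys so a case-insensitive match can be looked up directly.
    lookup = {key.lower(): value for key, value in replacements.items()}
    # Try longer keys first so multi-word entries win over shorter ones.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

Matching on word boundaries keeps a short entry such as "castor" from rewriting longer words that merely contain it.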

I don't know the best way to do this. I recall @lexxish saying he could do this using Natlink, and @quintijn may have ideas. @Danesprite, I'm not sure whether changes would need to be made at the dragonfly level, but I definitely think Caster should ship these files already set up (e.g. with the dictionary in place) so that users can start adding words out of the box.

Dragon's vocabulary also stores per-word properties, such as whether a word should be capitalized when it starts a sentence. I don't think that's very important, but it could potentially be handled, though Dragon makes these properties hard to access (possibly relevant, but I don't recommend spending time on this: dictation-toolbox/dragonfly#111).

@LexiconCode LexiconCode added the New Feature A new feature that is not currently implemented. label Mar 31, 2020
@LexiconCode
Member

LexiconCode commented Mar 31, 2020

I think something like this is needed, especially as the speech recognition backends for dragonfly diversify. I don't mind it starting off in Caster, but ultimately the logic belongs in the dragonfly repository or in a new project. A GUI or other interfaces could then be built on top of it, using the vocabulary configuration files.

@daanzu
Contributor

daanzu commented Mar 31, 2020

I have been working on this for the Kaldi backend as part of improving its dictation capabilities. But you're right: it really should be generalized to work with all the backends, to avoid duplicated effort and to make switching between backends easier. I agree that at least the API should live in dragonfly. A GUI could then be an optional component of dragonfly or a separate package.
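One way such an engine-agnostic API could be shaped, as a sketch only: a small per-engine adapter interface plus one function that applies a shared vocabulary description to any adapter. VocabularyBackend, InMemoryBackend and sync_vocabulary are hypothetical names, not an existing dragonfly API; a real Natlink or Kaldi adapter would implement the two methods with that engine's own calls:

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class VocabularyBackend(ABC):
    """Hypothetical per-engine adapter (not part of dragonfly)."""

    @abstractmethod
    def add_word(self, written: str, spoken: str) -> None:
        """Make `written` recognizable when `spoken` is said."""

    @abstractmethod
    def remove_word(self, written: str) -> None:
        """Remove `written` from the engine's vocabulary, if present."""


class InMemoryBackend(VocabularyBackend):
    """Stand-in backend for testing: records words in a dict."""

    def __init__(self) -> None:
        self.words: Dict[str, str] = {}

    def add_word(self, written: str, spoken: str) -> None:
        self.words[written] = spoken

    def remove_word(self, written: str) -> None:
        self.words.pop(written, None)


def sync_vocabulary(
    backend: VocabularyBackend,
    add_words: Dict[str, str],
    remove_words: Iterable[str],
) -> None:
    """Apply one shared vocabulary description to any backend.

    add_words maps spoken form -> written form. Removals run after
    additions, so a word listed in both ends up removed.
    """
    for spoken, written in add_words.items():
        backend.add_word(written, spoken)
    for written in remove_words:
        backend.remove_word(written)
```

Keeping the vocabulary description as plain data and pushing engine specifics into adapters is what would let one configuration file serve Dragon, Kaldi and the other backends.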

@lexxish
Contributor

lexxish commented Mar 31, 2020

For natlink the functions to add/remove words are below (thanks to @quintijn). I agree the implementation should be consistent and handle multiple engines.

import re

import natlink


def deleteWordIfNecessary(w):
    """Delete w from Dragon's active vocabulary if it is there."""
    if not w:
        return
    # Flag 0: look only in the active vocabulary
    if natlink.getWordInfo(w, 0) is not None:
        natlink.deleteWord(w)

# TODO add unicode support
# Accept only ASCII letters, hyphen, backslash, slash and space
recharspace = re.compile(r"^[a-zA-Z\-\\/ ]+$")

def add_word(w):
    """Add w to the active vocabulary, reactivating it from the backup vocabulary if possible."""
    w = w.strip()
    if not w:
        return
    if not recharspace.match(w):
        print('invalid character in word to add: %s' % w)
        return

    # Flag 1 also searches the backup vocabulary; flag 0 only the active one
    isInVoc = natlink.getWordInfo(w, 1) is not None
    isInActiveVoc = natlink.getWordInfo(w, 0) is not None
    if isInActiveVoc:
        return
    try:
        if isInVoc:  # from backup vocabulary
            print('make backup word active: %s' % w)
            natlink.addWord(w, 0)
            #add2logfile(w, 'activated words.txt')
        else:
            print('adding word %s' % w)
            natlink.addWord(w)
            #add2logfile(w, 'new words.txt')
    except natlink.InvalidWord:
        print('not added to vocabulary, invalid word: %s' % w)

I currently load the word lists from CSV files in ~/dragon/. Each line of addWords.csv is either "word" or "word,pronunciation"; each line of removeWords.csv is a single word.

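import csv
import os

# get_error, F and launch_file are helpers defined elsewhere (not shown).

def vocab_mapping():
    dragon_dir = os.path.join(os.path.expanduser("~"), "dragon")

    # addWords.csv: one "word" or "word,pronunciation" entry per line
    add_words_file = os.path.join(dragon_dir, "addWords.csv")
    with open(add_words_file) as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            if len(row) == 1:
                add_word(row[0])
            elif len(row) == 2:
                # Dragon word string: written form, separator, spoken form
                add_word(row[0] + "\\\\" + row[1])
            else:
                raise get_error("addWords.csv", row)

    # removeWords.csv: one word per line
    remove_words_file = os.path.join(dragon_dir, "removeWords.csv")
    with open(remove_words_file) as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            if len(row) == 1:
                deleteWordIfNecessary(row[0])
            else:
                raise get_error("removeWords.csv", row)

    # Voice commands that open the word lists for editing
    return {
        "edit add words": F(launch_file, file=add_words_file),
        "edit remove words": F(launch_file, file=remove_words_file),
    }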
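The file parsing above can be split from the Natlink calls so it is testable without Dragon running. A minimal sketch, assuming the same addWords.csv format and the same character whitelist as recharspace in the snippet; parse_add_words is a hypothetical helper, and the default separator simply copies the "\\\\" used above (whether a given Dragon version expects that form is an assumption to check):

```python
import csv
import io
import re
from typing import List, TextIO

# Same whitelist as recharspace: ASCII letters, hyphen, backslash, slash, space.
VALID_WORD = re.compile(r"^[a-zA-Z\-\\/ ]+$")


def parse_add_words(csvfile: TextIO, separator: str = "\\\\") -> List[str]:
    """Turn 'word' or 'word,pronunciation' rows into Dragon word strings.

    Raises ValueError for rows with more than two fields or with
    characters outside the whitelist, instead of touching any engine.
    """
    words = []
    for row_no, row in enumerate(csv.reader(csvfile), start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank or whitespace-only rows
        if len(cells) == 1:
            word = cells[0]
        elif len(cells) == 2:
            word = cells[0] + separator + cells[1]
        else:
            raise ValueError(
                "addWords.csv row %d: expected 1 or 2 fields, got %r" % (row_no, row)
            )
        if not VALID_WORD.match(word):
            raise ValueError("addWords.csv row %d: invalid character in %r" % (row_no, word))
        words.append(word)
    return words
```

Usage: parse_add_words(io.StringIO("Caster\ndragonfly,dragon fly\n")) yields the plain word "Caster" and the written/spoken string for "dragonfly"; the caller can then pass each result to add_word. Stripping each cell lets entries like "dragonfly, dragon fly" work too.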


4 participants