Edit-distance weights need to be easily configurable #221

Open
aarppe opened this issue Jan 18, 2020 · 1 comment
Labels
duplicate/out-of-date: Issues that are partially or entirely outdated or redundant in being superseded by later issues
Improvement: Expansion or improvement of a current functionality that does already work and meets previous specs

Comments


aarppe commented Jan 18, 2020

As an improvement on and generalization of #195, we need a way for linguists and other non-programmers to specify edit-distance weights manually, without writing code.

In the first instance, this could be done context-independently, though in the longer term one would want to restrict certain weight-adjusted edits to certain contexts. The weights can be extracted from the regular expression that is specified for the weighted descriptive FST:

[
  ! Contraction of short /i/ marked with apostrophe 
  %' (->) i::0.0 ,,
  ! English interference
  d (->) t::0.0 ,, g (->) k::0.0 ,, j (->) c::0.0 ,,
  {iw} (->) {ow}::0.5 ,, {ow} (->) {iw}::0.5 ,,
  {tch} (->) c::0.0 ,, {ch} (->) c::0.0 ,, {ts} (->) c::0.5 ,,
  {ee} (->) [ i | î ]::0.0 ,, u (h) (->) a::0.0 ,,
  ! SRO-internal orthographical variation in hyphenation
  [..] (->) %-::0.5 ,,
  ! Cree-internal perception
  e (->) ê::0.0 ,,
  a (->) â::0.25 ,, i (->) î::0.25 ,, o (->) ô::0.25 ,,
  â (->) a::0.5 ,, î (->) i::0.5 ,, ô (->) o::0.5 ,,
  [..] (->) h::0.5 || [ a | â | ê | i | î | o | ô ] _ [ c | k | p | t ] ,,
  h (->) 0::0.75 || [ a | â | ê | i | î | o | ô ] _ [ c | k | p | t ]
] ;

Every other edit should have a standard weight of 1.0.
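
As a rough illustration of the context-independent first step, the sketch below pulls weights out of a short excerpt of the rules above and uses them as substitution costs in a plain edit-distance computation, falling back to 1.0 for every other edit. It is only a sketch under stated assumptions: the names `extract_weights`, `weighted_distance`, `SIMPLE_RULE` and `DEFAULT_WEIGHT` are hypothetical and not part of this project, only rules of the form `x (->) y::w` with single-character symbols are read, and multi-character rules, insertions and contextual rules (`|| ... _ ...`) are skipped rather than interpreted.

```python
import re

DEFAULT_WEIGHT = 1.0  # standard weight for every edit not listed in the rules

# Excerpt of the rules above; the {iw} rule is included to show that
# multi-character rules are skipped by this simplified reader.
RULES = """
! English interference
d (->) t::0.0 ,, g (->) k::0.0 ,, j (->) c::0.0 ,,
{iw} (->) {ow}::0.5 ,,
! Cree-internal perception
e (->) ê::0.0 ,,
a (->) â::0.25 ,, i (->) î::0.25 ,, o (->) ô::0.25 ,,
â (->) a::0.5 ,, î (->) i::0.5 ,, ô (->) o::0.5
"""

# Assumption: only "x (->) y::w" with single, precomposed characters is handled.
SIMPLE_RULE = re.compile(r"^(\S) \(->\) (\S)::(\d+(?:\.\d+)?)$")


def extract_weights(rules):
    """Return {(typed, intended): weight} for simple substitution rules."""
    # Drop "!" comment lines, then split the remainder on the ",," separator.
    lines = [ln for ln in rules.splitlines() if not ln.strip().startswith("!")]
    weights = {}
    for chunk in " ".join(lines).split(",,"):
        match = SIMPLE_RULE.match(chunk.strip())
        if match:
            typed, intended, weight = match.groups()
            weights[(typed, intended)] = float(weight)
    return weights


def weighted_distance(typed, target, weights):
    """Edit distance where listed substitutions use their configured weight."""
    prev = [j * DEFAULT_WEIGHT for j in range(len(target) + 1)]
    for i, a in enumerate(typed, 1):
        cur = [i * DEFAULT_WEIGHT]
        for j, b in enumerate(target, 1):
            sub = 0.0 if a == b else weights.get((a, b), DEFAULT_WEIGHT)
            cur.append(min(prev[j] + DEFAULT_WEIGHT,      # delete a
                           cur[j - 1] + DEFAULT_WEIGHT,   # insert b
                           prev[j - 1] + sub))            # substitute a -> b
        prev = cur
    return prev[-1]


weights = extract_weights(RULES)
```

With this table, `weighted_distance("da", "tâ", weights)` combines the `d -> t` and `a -> â` weights, while an unlisted substitution such as `b -> c` costs the default 1.0. A real implementation would presumably leave the weighting to the FST itself; the point here is only that the rule text is regular enough to be read as a plain configuration table.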

aarppe added the Improvement label Jan 18, 2020

aarppe commented Jun 26, 2020

The above code follows standard Xerox-style regular-expression syntax for rewrite rules (Beesley & Karttunen 2003).

aarppe added the duplicate/out-of-date label Jan 11, 2023
fbanados moved this to To do in Third release Aug 2, 2024