Show morphological breakdown structure (inflectional morpheme boundaries first) #397

Closed
3 tasks done
aarppe opened this issue Apr 23, 2020 · 8 comments
Labels
feature Improvement Expansion or improvement of a current functionality that does already work and meets previous specs requires-backend-work Requires work to Python, scripts, automation, etc. requires-frontend-work Work needs to be done on HTML, CSS, and/or JavaScript

Comments

@aarppe
Contributor

aarppe commented Apr 23, 2020

EDIT (29.2.2022): removed the exception for morpheme boundaries adjacent to hyphens (-).

Subtasks (added July 20):

  • 1. implement marking of inflectional morpheme boundaries in FST output (now indicated with < and > marks for inflectional boundaries, and / for derivational boundaries): nôhkom+N+A+D+Px1Pl+Pl --> ni4<ohkom>i2nân>ak --> n<ôhkom>inân>ak
  • 2. implement back-end interpretation and representation of morpheme boundaries: look up the FST morpheme boundaries (which could be done when dictionary content is imported and when paradigm content is dynamically generated) and communicate them appropriately to the front-end.
  • 3. implement front-end representation of morpheme boundaries (e.g. with a middle dot). For now, this could even be implemented as showing nothing, until we decide how best to represent morpheme boundaries.
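A minimal sketch of subtask 2, assuming the FST returns surface forms marked with < and > (inflectional) and / (derivational) as in the example in subtask 1. The back-end could turn such a string into a list of segments, each tagged with the type of boundary that follows it, and pass that to the front-end as JSON. The function name `segment` and the JSON shape are hypothetical, not part of the project.

```python
import json
from typing import List, Optional, TypedDict


class Segment(TypedDict):
    text: str
    # Type of the boundary that follows this segment, or None for the last one.
    boundary_after: Optional[str]


# Boundary symbols as described in subtask 1.
BOUNDARY_TYPES = {"<": "inflectional", ">": "inflectional", "/": "derivational"}


def segment(fst_output: str) -> List[Segment]:
    """Split a boundary-marked FST surface form into typed segments."""
    segments: List[Segment] = []
    current = ""
    for ch in fst_output:
        if ch in BOUNDARY_TYPES:
            segments.append({"text": current, "boundary_after": BOUNDARY_TYPES[ch]})
            current = ""
        else:
            current += ch
    segments.append({"text": current, "boundary_after": None})
    return segments


if __name__ == "__main__":
    segs = segment("n<ôhkom>inân>ak")
    # JSON payload for the front-end; ensure_ascii=False keeps ô and â readable.
    print(json.dumps(segs, ensure_ascii=False))
    # Plain word-form: the segment texts joined without any boundary marks.
    print("".join(s["text"] for s in segs))
```

Keeping the boundary type in the payload lets the front-end decide how, or whether, to show each kind of boundary, e.g. only inflectional ones at first. Assuming the FST also emits < directly after a prefix hyphen (for instance ni<kî-<wâpam>âw>ak), the segment kî- would be followed by an inflectional boundary, which a middle-dot display would render as kî-·.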

Since our FST already outputs morpheme boundaries (primarily inflectional ones), there would be many circumstances in which it would be advantageous to show them, both in the standardized version of the search string and in the generated inflectional paradigms.

One way to achieve this would be to display the inflectional morpheme boundaries, which the FST outputs as < and >, as a middle dot (·), something like the following:

[image: mock-up visualization]

Ideally, that middle dot (or any other boundary character) would not be copyable, so that when one selects and copies a word-form, one gets only the actual characters.
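One way to keep the dot out of copied text, sketched here as an assumption rather than a project decision: emit an empty element at each boundary and draw the dot with CSS generated content (::before), which in most current browsers is neither selectable nor included in copied text (worth checking on the target browsers). The class name `morpheme-boundary` and the function `to_html` are hypothetical.

```python
import html
import re

# Hypothetical class name; the stylesheet rule below must use the same one.
BOUNDARY_CLASS = "morpheme-boundary"

# CSS rule drawing a middle dot (U+00B7) as generated content, so the dot
# is not part of the element's text content.
BOUNDARY_CSS = "." + BOUNDARY_CLASS + '::before { content: "\\00B7"; }'


def to_html(fst_output: str) -> str:
    """Render a boundary-marked FST form as HTML with empty boundary markers.

    Only inflectional boundaries (< and >) are marked; derivational
    boundaries (/) are simply dropped.
    """
    morphemes = [m for m in re.split(r"[<>]", fst_output.replace("/", "")) if m]
    marker = f'<span class="{BOUNDARY_CLASS}" aria-hidden="true"></span>'
    return marker.join(html.escape(m) for m in morphemes)


if __name__ == "__main__":
    print(BOUNDARY_CSS)
    print(to_html("n<ôhkom>inân>ak"))
```

Because the markers are empty, the text content of the rendered word is just nôhkominânak, so selecting and copying it should yield only the actual characters.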

Alternatives could be to use different colors or shading to differentiate the morphemes, or visual effects such as slight magnification when hovering over individual morphemes. On the other hand, making the morpheme boundaries immediately but non-intrusively visible might be the simpler solution; alternatively, the morpheme-boundary display could be an output setting that can be toggled, similar to the selection of orthography. We might also want to reserve magnification or pop-ups for later, for giving the plain-language definition of each morpheme. Finally, such a breakdown could be provided explicitly in the full paradigms.
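The output-setting idea can be sketched as a single rendering function driven by a user preference, in the same spirit as the orthography choice. The setting name and its three values (hidden, middle dots, alternating shading) are hypothetical and only illustrate the alternatives listed above.

```python
import html
import re
from enum import Enum


class BoundaryDisplay(Enum):
    # Hypothetical setting values.
    HIDDEN = "hidden"  # plain word-form, no boundaries shown
    DOTS = "dots"      # middle dot at each inflectional boundary
    SHADED = "shaded"  # alternating shading per morpheme, no extra characters


def render(fst_output: str, mode: BoundaryDisplay) -> str:
    """Render a boundary-marked FST form according to the user's setting.

    Derivational boundaries (/) are ignored, since inflectional
    boundaries are to be shown first.
    """
    form = fst_output.replace("/", "")
    if mode is BoundaryDisplay.HIDDEN:
        return re.sub(r"[<>]", "", form)
    if mode is BoundaryDisplay.DOTS:
        return re.sub(r"[<>]", "\u00b7", form)
    # SHADED: wrap each morpheme in a span with an alternating CSS class.
    morphemes = [m for m in re.split(r"[<>]", form) if m]
    return "".join(
        f'<span class="morpheme-{i % 2}">{html.escape(m)}</span>'
        for i, m in enumerate(morphemes)
    )


if __name__ == "__main__":
    for mode in BoundaryDisplay:
        print(mode.value, render("n<ôhkom>inân>ak", mode))
```

The shaded mode inserts no characters, so copying is unaffected; the dotted mode, as written, puts real · characters into the text.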

[image: mock-up visualization]

First, we could implement this for inflectional morpheme boundaries, and later on for derivational boundaries as well.

@aarppe aarppe added feature Improvement Expansion or improvement of a current functionality that does already work and meets previous specs labels Apr 23, 2020
@aarppe
Contributor Author

aarppe commented Apr 24, 2020

And here's a draft visualization for providing further information about inflectional morphemes as pop-ups:

[image: draft pop-up visualization]

@aarppe aarppe changed the title Show morphological breakdown structure (inflectional first) Show morphological breakdown structure (inflectional morpheme boundaries first) Apr 25, 2020
@aarppe
Contributor Author

aarppe commented Apr 30, 2020

@kobexamoh Note: I updated the mock-up visualizations above.
The first (i) refers to the linguistic breakdown of the inflected word form.
The second (i) refers to inflectional subcategory information for the dictionary entry.

These are two different types of information. Keeping them separate clarifies things, as does moving the (Verb/Noun - ...) information next to the dictionary entry rather than the word-form.

@aarppe aarppe added requires-backend-work Requires work to Python, scripts, automation, etc. requires-frontend-work Work needs to be done on HTML, CSS, and/or JavaScript labels Jul 20, 2020
@aarppe
Contributor Author

aarppe commented Jul 31, 2020

As discussed in our meeting during the last week of July, we might want to have multiple types of information available for each paradigm layout cell, for instance:

V+TA+Ind+1Sg+2SgO:
    kiwâpamitin              : (1) surface word-form without morpheme boundaries
    ki<wâpam>iti>n           : (2) surface word-form with morpheme boundaries
    kit2<wâpam>i2ti>n        : (3) underlying word-form with original morphemes and boundaries
    I see you, I witness you : (4) generated English translation of the cell word-form
    4                        : (5) corpus frequency
                             : (6) human recording
                             : (7) generated robot recording
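The list above maps naturally onto a small per-cell record. A sketch, assuming Python dataclasses on the back-end; the class and field names are hypothetical. Form (1) is derived from form (2) by deleting the boundary symbols, so it need not be stored separately.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParadigmCell:
    """One paradigm-layout cell holding the kinds of information listed above."""

    analysis: str                              # e.g. "V+TA+Ind+1Sg+2SgO"
    segmented_form: str                        # (2) surface form with boundaries
    underlying_form: str                       # (3) underlying form
    translation: Optional[str] = None          # (4) generated English translation
    corpus_frequency: Optional[int] = None     # (5) corpus frequency
    human_recording_url: Optional[str] = None  # (6) human recording
    robot_recording_url: Optional[str] = None  # (7) generated robot recording

    @property
    def surface_form(self) -> str:
        """(1) Surface form without boundaries, derived from (2)."""
        return re.sub(r"[<>/]", "", self.segmented_form)


if __name__ == "__main__":
    cell = ParadigmCell(
        analysis="V+TA+Ind+1Sg+2SgO",
        segmented_form="ki<wâpam>iti>n",
        underlying_form="kit2<wâpam>i2ti>n",
        corpus_frequency=4,
    )
    print(cell.surface_form)
```

Deriving (1) from (2) avoids the two forms ever getting out of sync; the optional fields can stay empty until translations, frequencies, or recordings are available for a cell.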

@nienna73
Contributor

I found a demo of how this currently works; here's how it looks:
[Screenshot: 2022-03-29, 11:44:12 AM]

Is this more or less the expected behaviour on the main page?

Here's another with multiple morphemes:
[Screenshot: 2022-03-29, 11:45:40 AM]

If this is visually what we're going for, then I can work on adding the option to see morpheme boundaries to the settings page, as well as showing morpheme boundaries within paradigm layouts.

@aarppe
Contributor Author

aarppe commented Mar 29, 2022

@nienna73 Yes, it looks like what we were expecting visually. We had thought that the middle dot would be a good way to indicate the morpheme boundaries. I think we might also want to show the middle dot next to hyphens (to the right of the hyphen, where the FST outputs the prefix boundary marker <), to indicate that there's a morpheme boundary there as well, e.g. ni·kî-·wâpam·âw·ak (I realize I'm changing my mind from what I had written earlier above). Further inspiration can be found in #505.

@nienna73
Contributor

I added morpheme boundaries to the settings:

[Screenshot: 2022-03-29, 2:11:37 PM]

This is what they look like in the paradigm:

[Screenshot: 2022-03-29, 2:12:08 PM]

The one place I can't seem to get them to show up is here:
[Screenshot: 2022-03-29, 2:12:15 PM]

@aarppe
Contributor Author

aarppe commented Mar 29, 2022

Great progress! Looks good! The reason for the latter case, i.e. wâpamêw, is that this word-form comes from the lexical database, which is statically defined and doesn't contain morpheme boundaries. We'd have to add those in a separate computational step. That is generally not too difficult, but I would not be surprised by edge cases that cause some extra headaches.
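A sketch of that extra step, under the assumption that at import time the back-end can ask the FST to generate boundary-marked forms for an entry's analysis. The generator is passed in as a function because its real name and signature are not specified here; the check rejects the edge cases where no generated form, or more than one differently segmented form, matches the stored spelling.

```python
import re
from typing import Callable, List, Optional

# Hypothetical: maps an FST analysis to boundary-marked surface forms.
Generator = Callable[[str], List[str]]


def strip_boundaries(form: str) -> str:
    """Remove FST boundary symbols (<, >, /) from a word-form."""
    return re.sub(r"[<>/]", "", form)


def segment_stored_form(stored: str, analysis: str, generate: Generator) -> Optional[str]:
    """Return a boundary-marked version of a statically stored word-form.

    A generated form is accepted only if its boundary-free spelling equals
    the stored form exactly. If nothing matches, or several differently
    segmented forms match, None is returned so the entry can be reviewed.
    """
    matches = {f for f in generate(analysis) if strip_boundaries(f) == stored}
    if len(matches) == 1:
        return matches.pop()
    return None


if __name__ == "__main__":
    # Stub generator with illustrative output only, not real FST data.
    fake_fst = {"ANALYSIS": ["wâpam>êw"]}
    print(segment_stored_form("wâpamêw", "ANALYSIS", lambda a: fake_fst.get(a, [])))
```

Returning None instead of guessing keeps the static entry displayable as-is while flagging the awkward cases for manual review.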

@aarppe
Contributor Author

aarppe commented May 30, 2022

The next step, showing information on individual morphemes, has moved to #1093, and the same for individual word-forms in paradigm cells has moved to #1094.

Projects
Status: Done
Development

No branches or pull requests

2 participants