Word alignment try 2 #267

Merged
merged 7 commits into from
Feb 12, 2025

Collaborator

@johnml1135 johnml1135 commented Nov 5, 2024

Add word alignment engine to IInteractiveTranslationEngine.



@johnml1135 johnml1135 requested a review from ddaspit November 5, 2024 21:58
@codecov-commenter

codecov-commenter commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 24.39024% with 31 lines in your changes missing coverage. Please review.

Project coverage is 70.22%. Comparing base (3a9b17e) to head (cd92a37).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/SIL.Machine/Corpora/AlignedWordPair.cs | 0.00% | 28 Missing ⚠️ |
| src/SIL.Machine/Corpora/NParallelTextCorpus.cs | 50.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #267      +/-   ##
==========================================
- Coverage   70.28%   70.22%   -0.06%     
==========================================
  Files         385      385              
  Lines       32019    32056      +37     
  Branches     4504     4511       +7     
==========================================
+ Hits        22503    22512       +9     
- Misses       8471     8499      +28     
  Partials     1045     1045              


Contributor

@ddaspit ddaspit left a comment

What is this change for?

Reviewable status: 0 of 3 files reviewed, all discussions resolved

@johnml1135
Collaborator Author

This is needed for adding the word alignment engine to Serval. Just exposing the alignment endpoints to the interactive engine.

@johnml1135
Collaborator Author

This needs to be merged and released before the Serval changes will be able to compile.

Contributor

@ddaspit ddaspit left a comment

I'm still not sure I understand what this is for. There are already interfaces for word alignment models. Also, phrase alignment isn't word alignment. That is specific to the Thot SMT engine.

Reviewable status: 0 of 3 files reviewed, all discussions resolved

@johnml1135
Collaborator Author

The ThotSmtModel appears to be the best place to add the alignment routines onto - as the "phrase alignment" just means that the tokenizer can be configured. If I don't use ThotSmtModel, what specific things would I use? IWordAligner assumes that the source and target are already tokenized. Also, how would it interact with loading models built by machine.py?

Contributor

@ddaspit ddaspit left a comment

For word alignment, you should use one of the classes that inherits from ThotWordAlignmentModel. For SMT and word alignment models, you will need to tokenize the text. We should just use the LatinWordTokenizer like we do for the SMT engine.

Reviewable status: 0 of 3 files reviewed, all discussions resolved

@johnml1135
Collaborator Author

Hmm. It would be quite a bit of reworking. I would have to use a different wording than ThotWordAlignmentModel because that is just referring to the asymmetrical alignment, not the symmetrical alignment with tokenizer. In Python, the word aligner has the tokenizer connected to it. I could rework the Machine word aligner to have the tokenizer in it, but that would be a fair amount of work. The solution I have appears to be a good minimal solution: treat the ThotSmtModel as a SymmetrizedWordAlignmentModel with tokenizers. It already has the capability of having the truecaser as null.

Otherwise, I think I would have to create a base class of ThotSmtModel called something like ThotSymmetrizedWordAlignmentModelWithTokenizer, in which half of the functionality of ThotSmtModel is implemented. And even then, all the configurations, trainers, and everything else would need to be torn apart and rewritten.

I think this minimal change is the best solution - it looks like a word aligner on Serval but is just an SMT model underneath.

Contributor

@ddaspit ddaspit left a comment

The ThotSmtModel is a full phrase-based SMT system and takes a lot more computation and time to train. The phrase alignment from the SMT model uses a different algorithm than the word alignment models and is much more expensive. Unfortunately, it is not a replacement for the word alignment models. We should meet to discuss how best to proceed. I'm sure that if I had a better understanding of what you are trying to achieve, we could come up with a good solution.

Reviewable status: 0 of 3 files reviewed, all discussions resolved

@johnml1135 johnml1135 force-pushed the word_alignment_try_2 branch 3 times, most recently from 3c8ddd6 to 1f29ecd Compare November 27, 2024 17:34
@johnml1135 johnml1135 force-pushed the word_alignment_try_2 branch from 1f29ecd to f905c2d Compare December 9, 2024 16:31
@Enkidu93
Collaborator

I'll need to update the branch and otherwise might have some small TODOs floating around, but I think this PR is generally ready for review, @johnml1135 @ddaspit 🥳

@johnml1135
Collaborator Author

src/SIL.Machine.Tool/ToolHelpers.cs line 17 at r2 (raw file):

internal static class ToolHelpers
{
    public const string FastAlign = "fast_align";

Why were these helpers moved from the generic "ToolHelpers" to the specific ones for symmetrization and thot?

@johnml1135
Collaborator Author

I am thinking (though not remembering fully) - are any tests needed? It doesn't really seem to be changing anything core, just rearranging where the content is and exposing it in a way more friendly to Serval.

@ddaspit - do you know if anyone will be affected by the changes we are making to these libraries?

Contributor

@ddaspit ddaspit left a comment

Reviewed 20 of 20 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine/Translation/WordAlignmentResult.cs line 7 at r2 (raw file):

namespace SIL.Machine.Translation
{
    public class WordAlignmentResult

What is this class used for?


src/SIL.Machine/Translation/SymmetrizationHeuristic.cs line 46 at r2 (raw file):

    }

    public static class SymmetrizationHelpers

If this is only used in SIL.Machine.Tool, then you can just leave it there.


src/SIL.Machine.Translation.Thot/ThotWordAlignmentModelType.cs line 16 at r2 (raw file):

    }

    public static class ThotWordAlignmentHelpers

If this is only used in SIL.Machine.Tool, I would just leave it there.


src/SIL.Machine.Translation.Thot/ThotWordAlignmentModel.cs line 124 at r2 (raw file):

        }

        public ITrainer CreateTrainer(IParallelTextCorpus corpus, ITokenizer<string, int, string> tokenizer = null)

We should handle this the same way that we handle it in ThotSmtModel. Add new properties for the source and target tokenizers to this class.
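
As a rough illustration of this suggestion, the sketch below puts the source and target tokenizers on the model as properties, so `CreateTrainer` no longer takes a tokenizer argument. Every type here (`ISketchTokenizer`, `WhitespaceSketchTokenizer`, `SketchAlignmentModel`, `SketchTrainer`) is a hypothetical stand-in, not the actual SIL.Machine or Thot API, and whitespace splitting stands in for a real tokenizer such as LatinWordTokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; these are not the SIL.Machine ITokenizer or ITrainer types.
public interface ISketchTokenizer
{
    IReadOnlyList<string> Tokenize(string text);
}

public sealed class WhitespaceSketchTokenizer : ISketchTokenizer
{
    public IReadOnlyList<string> Tokenize(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}

public sealed class SketchAlignmentModel
{
    // The tokenizers live on the model, so CreateTrainer needs no tokenizer argument.
    public ISketchTokenizer SourceTokenizer { get; set; } = new WhitespaceSketchTokenizer();
    public ISketchTokenizer TargetTokenizer { get; set; } = new WhitespaceSketchTokenizer();

    public SketchTrainer CreateTrainer(IEnumerable<(string Source, string Target)> corpus) =>
        new SketchTrainer(this, corpus);
}

public sealed class SketchTrainer
{
    private readonly SketchAlignmentModel _model;
    private readonly IEnumerable<(string Source, string Target)> _corpus;

    public SketchTrainer(SketchAlignmentModel model, IEnumerable<(string Source, string Target)> corpus)
    {
        _model = model;
        _corpus = corpus;
    }

    // Tokenizes each segment pair with the model's tokenizers.
    // A real trainer would go on to estimate alignment parameters from these tokens.
    public List<(IReadOnlyList<string> Source, IReadOnlyList<string> Target)> TokenizeCorpus() =>
        _corpus
            .Select(row => (_model.SourceTokenizer.Tokenize(row.Source), _model.TargetTokenizer.Tokenize(row.Target)))
            .ToList();
}
```

Keeping the tokenizers on the model means callers configure them once and every trainer created from that model picks them up.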


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

namespace SIL.Machine.Translation
{
    public interface IWordAlignmentEngine : IWordAligner, IDisposable

What is the purpose of this interface?


src/SIL.Machine/Corpora/AlignedWordPair.cs line 29 at r2 (raw file):

        {
            alignedWordPairs = null;
            try

You should only use exceptions for exceptional cases and not normal logic like this. Is this to catch the case where integers cannot be parsed? If so, you should just check the result from int.TryParse.
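
A minimal sketch of the pattern being asked for, assuming the input is a space-separated list of `source-target` index pairs such as `0-0 1-2`. The `WordPairParser` class and its tuple result are hypothetical and are not the actual `AlignedWordPair` code; the point is only that a failed `int.TryParse` returns `false` immediately instead of relying on a caught exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the actual SIL.Machine AlignedWordPair class.
public static class WordPairParser
{
    // Parses text such as "0-0 1-2" without using exceptions for control flow.
    public static bool TryParse(string text, out IReadOnlyList<(int Source, int Target)> pairs)
    {
        pairs = null;
        var result = new List<(int Source, int Target)>();
        foreach (string token in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int dash = token.IndexOf('-');
            if (dash < 0)
                return false; // no separator between the two indices

            // Check the TryParse results directly and return on the first failure.
            if (!int.TryParse(token.Substring(0, dash), out int source) || source < 0)
                return false;
            if (!int.TryParse(token.Substring(dash + 1), out int target) || target < 0)
                return false;

            result.Add((source, target));
        }
        pairs = result;
        return true;
    }
}
```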

@Enkidu93
Collaborator

Enkidu93 commented Jan 27, 2025

@johnml1135 Would you like to take a first pass at addressing Damien's concerns since you architected a lot of those changes? I'm happy to code the fixes, but you are probably better positioned to answer some of these since I'm not really aware of the rationale for some of the changes. I, in general, didn't change work you had already done that was already working properly.

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

What is the purpose of this interface?

It was more logical (when I was writing it) to make the SymmetrizedWordAlignmentEngine separate from the SymmetrizedWordAlignmentModel, but I am seeing that I am not using the engine, nor the engine interface, anywhere in particular. It mirrors the breakdown of engines and models in the translation space, but has no analogous use cases. I can't determine right now whether AQUA, when it is added, will use the engine itself or not.

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/SymmetrizationHeuristic.cs line 46 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

If this is only used in SIL.Machine.Tool, then you can just leave it there.

Makes sense.

@johnml1135
Collaborator Author

src/SIL.Machine/Corpora/AlignedWordPair.cs line 29 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should only use exceptions for exceptional cases and not normal logic like this. Is this to catch the case where integers cannot be parsed? If so, you should just check the result from int.TryParse.

Makes sense.

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

It was more logical (when I was writing it) to make the SymmetrizedWordAlignmentEngine separate from the SymmetrizedWordAlignmentModel, but I am seeing that I am not using the engine, nor the engine interface, anywhere in particular. It mirrors the breakdown of engines and models in the translation space, but has no analogous use cases. I can't determine right now whether AQUA, when it is added, will use the engine itself or not.

Correct that - it's used in Serval. A lot of these changes are to better expose WordAlignment to Serval for use.

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/WordAlignmentResult.cs line 7 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

What is this class used for?

It's used in a handful of places in Serval, though it should likely be removed. It was used more when the SMT engine and the statistical engine were sharing more infrastructure. This should be removed and replaced in all locations with WordAlignmentResult from Serval.WordAlignment.Models.

@johnml1135
Collaborator Author

src/SIL.Machine.Translation.Thot/ThotWordAlignmentModel.cs line 124 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should handle this the same way that we handle it in ThotSmtModel. Add new properties for the source and target tokenizers to this class.

sounds good to me.

@johnml1135
Collaborator Author

src/SIL.Machine.Translation.Thot/ThotWordAlignmentModelType.cs line 16 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

If this is only used in SIL.Machine.Tool, I would just leave it there.

Ok by me.

@johnml1135
Collaborator Author

I took the first pass.

Contributor

@ddaspit ddaspit left a comment

Reviewed 2 of 4 files at r3, all commit messages.
Reviewable status: 18 of 20 files reviewed, 7 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Correct that - it's used in Serval. A lot of these changes are to better expose WordAlignment to Serval for use.

Can we use one of the existing interfaces (IWordAlignmentModel or IWordAligner)?

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Can we use one of the existing interfaces (IWordAlignmentModel or IWordAligner)?

At the Serval level, we needed a distinction between the Engine and the Model, especially to support the Statistical engine service and the WordAlignmentEngineState. Neither uses the Trainer routines, so it makes more sense for them to just load the engine.

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewable status: 14 of 20 files reviewed, 7 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine/Corpora/AlignedWordPair.cs line 29 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Makes sense.

Is this better? It's a little odd, but it'd be nice to have a TryParse function for use in the serializer.


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Can we use one of the existing interfaces (IWordAlignmentModel or IWordAligner)?

@johnml1135, would you like to give your rationale as to why we couldn't use one of the existing interfaces like Damien mentioned since you're the one who architected the change?


src/SIL.Machine.Tool/ToolHelpers.cs line 17 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Why were these helpers moved from the generic "ToolHelpers" to the specific ones for symmetrization and thot?

I don't know. I think you did that, right? Sounds like you're OK moving them back.


src/SIL.Machine.Translation.Thot/ThotWordAlignmentModel.cs line 124 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

sounds good to me.

Done.


src/SIL.Machine.Translation.Thot/ThotWordAlignmentModelType.cs line 16 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Ok by me.

Done.

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewable status: 6 of 20 files reviewed, 7 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine/Translation/SymmetrizationHeuristic.cs line 46 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Makes sense.

Done.

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewable status: 6 of 20 files reviewed, 7 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

@johnml1135, would you like to give your rationale as to why we couldn't use one of the existing interfaces like Damien mentioned since you're the one who architected the change?

Sorry, I had this drafted before I saw your message

@Enkidu93
Collaborator

Enkidu93 commented Feb 3, 2025

src/SIL.Machine/Translation/WordAlignmentResult.cs line 7 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

It's used in a handful of places in Serval, though it should likely be removed. It was used more when the SMT engine and the statistical engine were sharing more infrastructure. This should be removed and replaced in all locations with WordAlignmentResult from Serval.WordAlignment.Models.

OK, it will make Serval.Machine.Shared depend on Serval.WordAlignment. As long as that's alright with everyone, I can do it.

Collaborator

@Enkidu93 Enkidu93 left a comment

Reviewable status: 6 of 20 files reviewed, 7 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine/Translation/WordAlignmentResult.cs line 7 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

OK, it will make Serval.Machine.Shared depend on Serval.WordAlignment. As long as that's alright with everyone, I can do it.

I guess that's how it already is for Serval.Translation, so that shouldn't be an issue. I'll just not use a global import.

@johnml1135
Collaborator Author

src/SIL.Machine.Tool/ToolHelpers.cs line 17 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I don't know. I think you did that, right? Sounds like you're OK moving them back.

Sure - move them back.

Collaborator

@Enkidu93 Enkidu93 left a comment

@ddaspit, @johnml1135, I believe I've responded to all comments and that this PR is ready for review.

Reviewable status: 5 of 20 files reviewed, 6 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine.Tool/ToolHelpers.cs line 17 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Sure - move them back.

Done.

Collaborator Author

@johnml1135 johnml1135 left a comment

Reviewed 20 of 20 files at r2, 4 of 4 files at r3, 4 of 4 files at r4, 9 of 9 files at r5, 4 of 4 files at r6, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @ddaspit)

@Enkidu93 Enkidu93 force-pushed the word_alignment_try_2 branch from 0d6fee1 to d46b467 Compare February 3, 2025 20:20
Contributor

@ddaspit ddaspit left a comment

Reviewed 17 of 17 files at r8, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/SIL.Machine/Corpora/AlignedWordPair.cs line 29 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Is this better? It's a little odd, but it'd be nice to have a TryParse function for use in the serializer.

Looks good. I would return as soon as a TryParseIndex fails.


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sorry, I had this drafted before I saw your message

I took a look at the Serval code to better understand how the interface is being used. It looks like StatisticalEngineService needs to call GetBestAlignedWordPairs. Rather than add a new interface, we should add GetBestAlignedWordPairs to the IWordAligner interface. It makes sense there anyway. Then, we can use IWordAligner instead. It should be easy to implement for most classes that implement the IWordAligner interface. You can just add a default implementation in WordAlignerBase that looks something like this:

WordAlignmentMatrix matrix = Align(sourceSegment, targetSegment);
return matrix.ToAlignedWordPairs();
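
The two-line snippet above can be fleshed out into a self-contained sketch. The types below (`SketchAlignmentMatrix`, `SketchWordAlignerBase`, `DiagonalAligner`) are simplified stand-ins invented for illustration; the real `WordAlignmentMatrix`, `WordAlignerBase`, and `IWordAligner` in SIL.Machine have richer signatures that are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for an alignment matrix; not the SIL.Machine WordAlignmentMatrix.
public sealed class SketchAlignmentMatrix
{
    private readonly bool[,] _cells;

    public SketchAlignmentMatrix(int rowCount, int columnCount)
    {
        _cells = new bool[rowCount, columnCount];
    }

    public bool this[int i, int j]
    {
        get => _cells[i, j];
        set => _cells[i, j] = value;
    }

    // Lists every (source, target) cell that is marked as aligned, in row-major order.
    public IReadOnlyList<(int Source, int Target)> ToAlignedWordPairs()
    {
        var pairs = new List<(int Source, int Target)>();
        for (int i = 0; i < _cells.GetLength(0); i++)
            for (int j = 0; j < _cells.GetLength(1); j++)
                if (_cells[i, j])
                    pairs.Add((i, j));
        return pairs;
    }
}

public abstract class SketchWordAlignerBase
{
    public abstract SketchAlignmentMatrix Align(IReadOnlyList<string> sourceSegment, IReadOnlyList<string> targetSegment);

    // Default implementation: derive the pairs from the matrix, so subclasses only implement Align.
    public virtual IReadOnlyList<(int Source, int Target)> GetBestAlignedWordPairs(
        IReadOnlyList<string> sourceSegment,
        IReadOnlyList<string> targetSegment)
    {
        SketchAlignmentMatrix matrix = Align(sourceSegment, targetSegment);
        return matrix.ToAlignedWordPairs();
    }
}

// Toy aligner that links word i to word i; it exists only to exercise the default method.
public sealed class DiagonalAligner : SketchWordAlignerBase
{
    public override SketchAlignmentMatrix Align(IReadOnlyList<string> sourceSegment, IReadOnlyList<string> targetSegment)
    {
        var matrix = new SketchAlignmentMatrix(sourceSegment.Count, targetSegment.Count);
        for (int i = 0; i < Math.Min(sourceSegment.Count, targetSegment.Count); i++)
            matrix[i, i] = true;
        return matrix;
    }
}
```

Because the default is virtual, an aligner that can produce pairs more cheaply than building a full matrix could still override it.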

src/SIL.Machine/Translation/IWordAlignmentModel.cs line 10 at r8 (raw file):

    {
        ITrainer CreateTrainer(IParallelTextCorpus corpus);
        Task SaveAsync(CancellationToken cancellationToken = default);

We shouldn't need the Save methods. The IWordAlignmentModel interface provides no methods that update the model.


src/SIL.Machine/Corpora/AlignedWordPair.cs line 100 at r8 (raw file):

                return true;
            }
            if (indexString == "NULL")

I would perform this check first.
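
One way the reordered check might look, as a sketch only: the literal `NULL` is tested before any numeric parsing is attempted. The `AlignmentIndexParser` class is hypothetical, and representing an unaligned word as `-1` is an assumption made for this example rather than something stated in the PR.

```csharp
using System;

// Hypothetical helper for illustration; not the actual AlignedWordPair implementation.
public static class AlignmentIndexParser
{
    // Assumption: -1 stands for an unaligned (NULL) word in this sketch.
    public const int NullIndex = -1;

    public static bool TryParseIndex(string indexString, out int index)
    {
        // Handle the NULL marker first, so it never reaches the integer parser.
        if (indexString == "NULL")
        {
            index = NullIndex;
            return true;
        }

        // Accept only non-negative integers as real word indices.
        if (int.TryParse(indexString, out index) && index >= 0)
            return true;

        index = 0;
        return false;
    }
}
```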

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/IWordAlignmentModel.cs line 10 at r8 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We shouldn't need the Save methods. The IWordAlignmentModel interface provides no methods that update the model.

I agree - see other comments.

@johnml1135
Collaborator Author

src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I took a look at the Serval code to better understand how the interface is being used. It looks like StatisticalEngineService needs to call GetBestAlignedWordPairs. Rather than add a new interface, we should add GetBestAlignedWordPairs to the IWordAligner interface. It makes sense there anyway. Then, we can use IWordAligner instead. It should be easy to implement for most classes that implement the IWordAligner interface. You can just add a default implementation in WordAlignerBase that looks something like this:

WordAlignmentMatrix matrix = Align(sourceSegment, targetSegment);
return matrix.ToAlignedWordPairs();

The IWordAligner interface does not handle Tokenization. So either we add tokenization to the IWordAligner interface, or we have a different interface. The lack of tokenization is why I didn't use it before.

Contributor

@ddaspit ddaspit left a comment

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, johnml1135 (John Lambert) wrote…

The IWordAligner interface does not handle Tokenization. So either we add tokenization to the IWordAligner interface, or we have a different interface. The lack of tokenization is why I didn't use it before.

A while back I did a refactor of the translation interfaces to support passing strings as well as lists of tokens. Do you want something similar for the word alignment interfaces? I never got around to doing the same thing for the word alignment interfaces. I always planned to, so if we need it, then we should add it. It does look like the IWordAlignmentEngine interface does not have any methods like that though.

@johnml1135
Collaborator Author

Is everything ready? There are still a few comments unaddressed. I am OK with removing the engines and just folding their behavior into the higher level (the model level).

Collaborator

@Enkidu93 Enkidu93 left a comment

Sorry - I missed a round of Damien's comments somehow. I think everything's done now.

Reviewable status: 4 of 20 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine/Corpora/AlignedWordPair.cs line 29 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Looks good. I would return as soon as a TryParseIndex fails.

Done.


src/SIL.Machine/Corpora/AlignedWordPair.cs line 100 at r8 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would perform this check first.

Done.


src/SIL.Machine/Translation/IWordAlignmentEngine.cs line 8 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

A while back I did a refactor of the translation interfaces to support passing strings as well as lists of tokens. Do you want something similar for the word alignment interfaces? I never got around to doing the same thing for the word alignment interfaces. I always planned to, so if we need it, then we should add it. It does look like the IWordAlignmentEngine interface does not have any methods like that though.

(I'm sorry - I think I missed your earlier comment, Damien. I think this is fine now, right?)


src/SIL.Machine/Translation/IWordAlignmentModel.cs line 10 at r8 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I agree - see other comments.

Done.

@johnml1135
Collaborator Author

:lgtm:

Contributor

@ddaspit ddaspit left a comment

Reviewed 13 of 13 files at r10, 4 of 4 files at r11, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)


src/SIL.Machine/Translation/SymmetrizedWordAlignmentModel.cs line 31 at r11 (raw file):

        }

        public IWordAlignmentModel DirectWordAlignmentEngine

This property was renamed. It should be reverted back to the original name.


src/SIL.Machine/Translation/SymmetrizedWordAlignmentModel.cs line 41 at r11 (raw file):

        }

        public IWordAlignmentModel InverseWordAlignmentEngine

This property was renamed. It should be reverted back to the original name.

@Enkidu93
Collaborator

Reviewed 13 of 13 files at r10, 4 of 4 files at r11, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)

src/SIL.Machine/Translation/SymmetrizedWordAlignmentModel.cs line 31 at r11 (raw file):

        }

        public IWordAlignmentModel DirectWordAlignmentEngine

This property was renamed. It should be reverted back to the original name.

src/SIL.Machine/Translation/SymmetrizedWordAlignmentModel.cs line 41 at r11 (raw file):

        }

        public IWordAlignmentModel InverseWordAlignmentEngine

This property was renamed. It should be reverted back to the original name.

Done. Sorry I missed that!

Collaborator Author

@johnml1135 johnml1135 left a comment

Reviewed 17 of 17 files at r8, 4 of 4 files at r9, 13 of 13 files at r10, 4 of 4 files at r11, 13 of 13 files at r12, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

Contributor

@ddaspit ddaspit left a comment

:lgtm:

Reviewed 13 of 13 files at r12, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

johnml1135 and others added 7 commits February 12, 2025 17:08
Add tokenizer to trainer
Small bug in NParallelTextCorpus

Extend parsing of aligned word pairs to accommodate NULLs

Add save functionality for WA; small bug in NPTC

Edits to AlignedWordPair functionality

PR review fixes

Get rid of WordAlignmentResult
Collaborator Author

@johnml1135 johnml1135 left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

@johnml1135 johnml1135 merged commit 485acb3 into master Feb 12, 2025
4 checks passed
@johnml1135 johnml1135 deleted the word_alignment_try_2 branch February 12, 2025 22:45