@pmachapman
Collaborator

@pmachapman pmachapman commented Dec 15, 2025

Fixes #816. Requires sillsdev/machine.py#254


Collaborator

@Enkidu93 Enkidu93 left a comment

:lgtm:

@Enkidu93 reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

Contributor

@ddaspit ddaspit left a comment

@ddaspit reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/IParallelCorpusPreprocessingService.cs line 12 at r1 (raw file):

    Task PreprocessAsync(
        IReadOnlyList<ParallelCorpus> corpora,
        Func<Row, bool, Task> train,

I would prefer an enum here. It allows us to easily support more than two types of data and make it clearer what this parameter means.
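A minimal sketch of what this suggestion could look like, assuming hypothetical names throughout: `TrainingDataType`, its members, the `Row` stand-in, and the simplified `PreprocessAsync` signature are illustrations only, since the thread does not show the enum that was eventually merged.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical enum: the names are assumptions, not the ones merged in this PR.
public enum TrainingDataType
{
    Text,
    KeyTerm,
}

// Stand-in for the real Row type, which is not shown in this thread.
public sealed record Row(string SourceSegment, string TargetSegment);

public static class EnumCallbackSketch
{
    // Before: Func<Row, bool, Task> train
    //   A call such as train(row, true) does not say what "true" means.
    // After:  Func<Row, TrainingDataType, Task> train
    //   The call site names the kind of data, and a third kind can be
    //   added later without changing the delegate signature.
    public static async Task PreprocessAsync(
        IEnumerable<(Row Row, TrainingDataType Type)> rows,
        Func<Row, TrainingDataType, Task> train)
    {
        foreach (var (row, type) in rows)
            await train(row, type);
    }
}
```

With this shape, a caller can route each row by its type, for example by writing `TrainingDataType.KeyTerm` rows to a separate file from `TrainingDataType.Text` rows.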

@pmachapman pmachapman force-pushed the separate_key_terms_files branch from c46a1eb to 9beab59 Compare December 15, 2025 17:41
Collaborator Author

@pmachapman pmachapman left a comment

Reviewable status: 6 of 11 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/IParallelCorpusPreprocessingService.cs line 12 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would prefer an enum here. It allows us to easily support more than two types of data and make it clearer what this parameter means.

An enum is a great idea. Please let me know if the enum and its members are named appropriately.

@codecov-commenter

Codecov Report

❌ Patch coverage is 87.67123% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.11%. Comparing base (564f686) to head (9beab59).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...Shared/Services/WordAlignmentPreprocessBuildJob.cs | 43.75% | 9 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #846      +/-   ##
==========================================
+ Coverage   66.04%   66.11%   +0.06%     
==========================================
  Files         382      382              
  Lines       20635    20696      +61     
  Branches     2700     2706       +6     
==========================================
+ Hits        13629    13683      +54     
- Misses       6043     6050       +7     
  Partials      963      963              

Contributor

@ddaspit ddaspit left a comment

:lgtm:

@ddaspit reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/IParallelCorpusPreprocessingService.cs line 12 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

An enum is a great idea. Please let me know if the enum and its members are named appropriately.

Looks good.

@pmachapman pmachapman merged commit f5919f2 into main Dec 16, 2025
3 checks passed
@pmachapman pmachapman deleted the separate_key_terms_files branch December 16, 2025 00:51
Development

Successfully merging this pull request may close these issues.

Separate key terms from other training data on S3

5 participants