
predict_proba functionality for kNN. Small bugfix. More concise and e…#362

Open
skywardfire1 wants to merge 1 commit into smartcorelib:development from skywardfire1:feature/predict_proba_knn

Conversation

skywardfire1 commented Mar 20, 2026

This PR adds:

  1. Bugfix: robust class selection. The running maximum was initialized as let mut max_c = 0f64, so whenever no computed probability exceeded that initial value, the result was always class 0. It is now initialized with the first computed probability.
  2. Refactoring & DRY. predict_for_row is now a thin wrapper around predict_proba_for_row. The latter serves as the main routine for 2 or 3 functions, so the code is not repeated.
  3. predict_proba functionality for kNN. Returns class probability distributions for input samples. Works with both search algorithms.
  4. Extensive test suite. Two tests containing ~15-20 distinct assertion checks, covering:
    • Validity of probability distributions (sum to 1.0, range [0, 1])
    • Consistency between predict() and predict_proba()
    • Equivalence of results across search algorithms
    • Edge cases: zero-weight sums, extreme k values, multiclass labels
    • Behavior differences between Uniform and Distance weighting
    • Batch prediction correctness
  5. Backward compatibility. No breaking changes to the public API. All existing tests pass.
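To make items 1 to 3 concrete, here is a minimal standalone sketch of the idea. It is not the code from this PR and does not use smartcore types: the function signatures, the `pick_class` helper, and the zero-weight fallback policy are assumptions made for illustration only. It shows a probability routine that a prediction wrapper delegates to, with the running maximum seeded from the first probability rather than from 0.

```rust
/// Hypothetical sketch: build a class probability distribution from the
/// (class index, weight) pairs of the k nearest neighbors.
fn predict_proba_for_row(neighbors: &[(usize, f64)], n_classes: usize) -> Vec<f64> {
    assert!(n_classes > 0, "need at least one class");
    let mut proba = vec![0.0_f64; n_classes];
    for &(class_idx, weight) in neighbors {
        proba[class_idx] += weight;
    }
    let mut total: f64 = proba.iter().sum();
    if total <= 0.0 {
        // Zero-weight edge case (assumed policy): fall back to neighbor counts.
        proba.iter_mut().for_each(|p| *p = 0.0);
        for &(class_idx, _) in neighbors {
            proba[class_idx] += 1.0;
        }
        total = proba.iter().sum();
    }
    if total > 0.0 {
        // Normalize so the distribution sums to 1.
        proba.iter_mut().for_each(|p| *p /= total);
    } else {
        // No usable neighbors: return a uniform distribution (also an assumption).
        let uniform = 1.0 / n_classes as f64;
        proba.iter_mut().for_each(|p| *p = uniform);
    }
    proba
}

/// Index of the largest value. The running maximum starts from the first
/// element instead of 0f64, so the result does not silently default to
/// class 0 when no value exceeds zero. Panics on an empty slice.
fn pick_class(proba: &[f64]) -> usize {
    let mut max_i = 0;
    let mut max_c = proba[0];
    for (i, &c) in proba.iter().enumerate().skip(1) {
        if c > max_c {
            max_c = c;
            max_i = i;
        }
    }
    max_i
}

/// Thin wrapper: the predicted class is the most probable one.
fn predict_for_row(neighbors: &[(usize, f64)], n_classes: usize) -> usize {
    pick_class(&predict_proba_for_row(neighbors, n_classes))
}
```

With neighbors `[(0, 1.0), (1, 1.0), (1, 2.0)]` and three classes, the weights per class are 1, 3, and 0, so the sketch yields the distribution `[0.25, 0.75, 0.0]` and the wrapper returns class 1. Keeping the class choice in one place is what lets predict() and predict_proba() stay consistent by construction.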

I also ran a real-world test.

These are the predictions from the previous implementation:

 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 5, 0, 1, 1, 5, 0, 1, 1, 2, 0, 2, 2, 0, 0, 4, 3, 3, 3, 3, 3, 0, 1, 0, 4, 4, 4, 4, 4, 4, 4, 5, 5, 1, 5, 5, 5, 5, 5, 0, 0, 7]
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 5, 0, 1, 3, 1, 2, 2, 2, 2, 2, 0, 3, 3, 3, 3, 3, 0, 0, 0, 4, 2, 4, 4, 4, 4, 4, 1, 5, 0, 5, 5, 5, 5, 5, 5, 0, 7]
 [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 1, 1, 4, 1, 1, 0, 1, 2, 4, 2, 4, 2, 0, 0, 3, 3, 3, 3, 0, 4, 0, 4, 0, 4, 0, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 0, 5, 0]
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 5, 1, 1, 1, 1, 1, 2, 4, 2, 2, 4, 1, 2, 3, 3, 3, 3, 3, 4, 0, 4, 4, 0, 4, 3, 4, 4, 4, 5, 5, 1, 5, 5, 0, 5, 5, 5, 0]
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 2, 0, 0, 2, 0, 0, 4, 3, 3, 3, 0, 0, 0, 4, 0, 0, 3, 4, 4, 0, 4, 0, 0, 5, 5, 5, 5, 4, 5, 5, 2]

These are the predictions from the new implementation:

 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 5, 0, 1, 1, 5, 0, 1, 1, 2, 0, 0, 2, 0, 0, 4, 3, 3, 3, 3, 3, 0, 1, 0, 4, 0, 4, 4, 4, 4, 4, 5, 5, 1, 0, 1, 5, 5, 5, 0, 2, 7]
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 3, 1, 2, 2, 2, 2, 2, 0, 3, 3, 3, 3, 3, 0, 0, 0, 4, 2, 4, 4, 0, 0, 4, 1, 0, 0, 5, 0, 5, 5, 5, 5, 0, 0]
 [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 1, 1, 0, 1, 2, 4, 2, 1, 0, 0, 0, 3, 3, 3, 3, 0, 4, 0, 4, 0, 4, 0, 4, 4, 4, 0, 5, 5, 0, 5, 5, 5, 0, 0, 1, 7]
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 4, 0, 0, 3, 3, 3, 3, 3, 4, 0, 4, 4, 0, 0, 3, 4, 4, 4, 5, 5, 1, 5, 5, 0, 5, 5, 5, 0]
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 0, 0, 2, 4, 2, 4, 0, 3, 3, 0, 0, 0, 4, 0, 0, 0, 4, 4, 0, 4, 0, 0, 1, 5, 5, 5, 1, 5, 5, 1]

Please note that they are not identical. Keep in mind that each result shown is the outcome of a vote among 18 kNN models, each of them built with smartcore.
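The voting code itself is not part of this PR, so the following is only a sketch of how such an ensemble vote might look. The function name `majority_vote` and the tie-breaking rule (the smallest label wins) are assumptions. It illustrates why a changed prediction in a single model can flip the final label when the vote is close.

```rust
use std::collections::BTreeMap;

/// Hypothetical majority vote over the labels predicted by several models.
/// Ties are broken in favor of the smallest label (an assumed policy).
fn majority_vote(votes: &[u32]) -> Option<u32> {
    let mut counts: BTreeMap<u32, usize> = BTreeMap::new();
    for &v in votes {
        *counts.entry(v).or_insert(0) += 1;
    }
    // BTreeMap iterates in ascending label order, so replacing the current
    // best only on a strictly larger count keeps the smallest label on ties.
    let mut best: Option<(u32, usize)> = None;
    for (&label, &count) in &counts {
        match best {
            Some((_, c)) if count <= c => {}
            _ => best = Some((label, count)),
        }
    }
    best.map(|(label, _)| label)
}
```

For example, with 18 votes split nine to nine between labels 0 and 1, this sketch returns 0. If one model switches its vote from 0 to 1, the result becomes 1.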

In my opinion, the difference comes from floating-point arithmetic. I did some research to make sure the new kNN implementation is slightly more concise and exact, and I made some of the computations more mathematically correct.
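As an illustration of the kind of floating-point effect meant here (not a diagnosis of this specific diff): f64 addition is not associative, so accumulating the same weights in a different order can change the last bits of a sum. When two class scores are nearly tied, that may be enough to change which one compares greater. The helper name `sum_in_order` is made up for this sketch.

```rust
/// Sum the values strictly left to right, as a plain accumulation loop would.
fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, &v| acc + v)
}

/// Returns the same three weights summed in two different orders.
fn order_sensitive_sums() -> (f64, f64) {
    let forward = sum_in_order(&[0.1, 0.2, 0.3]);
    let backward = sum_in_order(&[0.3, 0.2, 0.1]);
    (forward, backward)
}
```

The two sums returned by `order_sensitive_sums` differ in the last bit even though the inputs are identical. This is why the tests compare probabilities with a tolerance rather than with exact equality.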

skywardfire1 requested a review from Mec-iS as a code owner March 20, 2026 14:39
codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 62.06897% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.35%. Comparing base (70d8a0f) to head (9b94198).
⚠️ Report is 11 commits behind head on development.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/neighbors/knn_classifier.rs | 62.06% | 11 Missing ⚠️ |

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #362      +/-   ##
===============================================
- Coverage        45.59%   44.35%   -1.24%     
===============================================
  Files               93       95       +2     
  Lines             8034     8017      -17     
===============================================
- Hits              3663     3556     -107     
- Misses            4371     4461      +90     
