
cardmagic/classifier

Classifier

A Ruby library for text classification using Bayesian and Latent Semantic Indexing (LSI) algorithms.

Installation

Add to your Gemfile:

gem 'classifier'

Then run:

bundle install

Or install directly:

gem install classifier

Optional: GSL for Faster LSI

For significantly faster LSI operations, install the GNU Scientific Library.

Ruby 3+

The released gsl gem doesn't support Ruby 3+. Install from source:

# Install GSL library
brew install gsl        # macOS
apt-get install libgsl-dev  # Ubuntu/Debian

# Build and install the gem
git clone https://github.com/cardmagic/rb-gsl.git
cd rb-gsl
git checkout fix/ruby-3.4-compatibility
gem build gsl.gemspec
gem install gsl-*.gem

Ruby 2.x
# macOS
brew install gsl
gem install gsl

# Ubuntu/Debian
apt-get install libgsl-dev
gem install gsl

When GSL is installed, Classifier automatically uses it. To suppress the GSL notice:

SUPPRESS_GSL_WARNING=true ruby your_script.rb
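Automatic GSL detection like this is usually done with a common Ruby pattern for optional native dependencies: try to `require` the library and fall back to pure Ruby when it raises `LoadError`. The sketch below is a hypothetical illustration of that pattern, not Classifier's source. `choose_backend` and its `:accelerated`/`:pure_ruby` return values are invented, and it assumes the notice is printed when GSL is missing and that `NATIVE_VECTOR=true` (the switch used for pure-Ruby test runs) forces the fallback.

```ruby
# Hypothetical sketch of optional-dependency detection; not the gem's actual code.
def choose_backend(lib = "gsl", env = ENV)
  # Assumed behaviour: an explicit opt-out skips the native library entirely.
  return :pure_ruby if env["NATIVE_VECTOR"] == "true"

  require lib
  :accelerated
rescue LoadError
  # Assumed behaviour: warn about the slower fallback unless asked to stay quiet.
  warn "Notice: #{lib} not found; falling back to pure Ruby" unless env["SUPPRESS_GSL_WARNING"] == "true"
  :pure_ruby
end

puts "Using #{choose_backend} backend"
```

Rescuing `LoadError` (rather than checking for the gem up front) keeps the check correct however the library was installed: from a released gem, a locally built gem, or a path on `$LOAD_PATH`.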

Compatibility

Ruby Version   Status
4.0            Supported
3.4            Supported
3.3            Supported
3.2            Supported
3.1            EOL (unsupported)

Bayesian Classifier

Fast, accurate classification with modest memory requirements. Ideal for spam filtering, sentiment analysis, and content categorization.

Quick Start

require 'classifier'

classifier = Classifier::Bayes.new('Spam', 'Ham')

# Train the classifier
classifier.train_spam "Buy cheap viagra now! Limited offer!"
classifier.train_spam "You've won a million dollars! Claim now!"
classifier.train_ham "Meeting scheduled for tomorrow at 10am"
classifier.train_ham "Please review the attached document"

# Classify new text
classifier.classify "Congratulations! You've won a prize!"
# => "Spam"
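To illustrate the idea behind Bayesian classification, here is a self-contained sketch of multinomial naive Bayes with add-one (Laplace) smoothing, scored in log space to avoid underflow. `TinyBayes` and its tokenizer are hypothetical; this is not the gem's implementation, and the gem's own tokenization (for example stemming and stop words) and scoring may differ.

```ruby
# Hypothetical TinyBayes: a standalone multinomial naive Bayes sketch,
# not the classifier gem's implementation.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] } # category => {word => count}
    @doc_counts = Hash.new(0)                               # category => training docs seen
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Pick the category with the highest log prior + sum of log word likelihoods.
  def classify(text)
    vocab_size = @word_counts.values.flat_map(&:keys).uniq.size
    total_docs = @doc_counts.values.sum.to_f
    best = @word_counts.max_by do |category, counts|
      total_words = counts.values.sum
      log_prior = Math.log(@doc_counts[category] / total_docs)
      tokenize(text).sum(log_prior) do |word|
        # Add-one smoothing keeps an unseen word from zeroing out a category.
        Math.log((counts[word] + 1.0) / (total_words + vocab_size))
      end
    end
    best.first
  end

  private

  # Lowercase and keep runs of letters/apostrophes; no stemming or stop words.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new("Spam", "Ham")
bayes.train("Spam", "Buy cheap viagra now! Limited offer!")
bayes.train("Spam", "You've won a million dollars! Claim now!")
bayes.train("Ham", "Meeting scheduled for tomorrow at 10am")
bayes.train("Ham", "Please review the attached document")
p bayes.classify("Congratulations! You've won a prize!") # prints "Spam"
```

Working in log space turns the product of many small word probabilities into a sum, which is why long documents do not underflow to zero.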

Persistence with Madeleine

require 'classifier'
require 'madeleine'

m = SnapshotMadeleine.new("classifier_data") {
  Classifier::Bayes.new('Interesting', 'Uninteresting')
}

m.system.train_interesting "fascinating article about science"
m.system.train_uninteresting "boring repetitive content"
m.take_snapshot

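# Later (e.g. in a new process), reopening the same directory restores the
# latest snapshot; the block only runs when no snapshot exists yet:
m = SnapshotMadeleine.new("classifier_data") {
  Classifier::Bayes.new('Interesting', 'Uninteresting')
}
m.system.classify "new scientific discovery"
# => "Interesting"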
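For simpler setups, Ruby's built-in `Marshal` can also write an object graph to disk and read it back. This is a minimal sketch, assuming the model is a plain Ruby object with no procs, IO handles, or singleton methods (Marshal cannot dump those). The hash of word counts is hypothetical stand-in data, and `save_model`/`load_model` are illustrative helper names, not part of Classifier.

```ruby
require 'tmpdir'

# Hypothetical helper: write any Marshal-able Ruby object to disk.
def save_model(model, path)
  File.binwrite(path, Marshal.dump(model))
end

# Hypothetical helper: read it back. Only load files you wrote yourself;
# Marshal.load on untrusted input is unsafe.
def load_model(path)
  Marshal.load(File.binread(path))
end

# Stand-in for a trained model: per-category word counts.
model = { "Interesting" => { "science" => 1 }, "Uninteresting" => { "boring" => 1 } }

Dir.mktmpdir do |dir|
  path = File.join(dir, "model.dat")
  save_model(model, path)
  p load_model(path) == model # prints true
end
```

Unlike Madeleine, this takes no automatic snapshots; you decide when to call `save_model`.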

LSI (Latent Semantic Indexing)

Semantic analysis using Singular Value Decomposition (SVD). More flexible than Bayesian classifiers, providing search, clustering, and classification based on meaning rather than just keywords.

Quick Start

require 'classifier'

lsi = Classifier::LSI.new

# Add documents with categories
lsi.add_item "Dogs are loyal pets that love to play fetch", :pets
lsi.add_item "Cats are independent and love to nap", :pets
lsi.add_item "Ruby is a dynamic programming language", :programming
lsi.add_item "Python is great for data science", :programming

# Classify new text
lsi.classify "My puppy loves to run around"
# => :pets

# Get classification with confidence score
lsi.classify_with_confidence "Learning to code in Ruby"
# => [:programming, 0.89]

Search and Discovery

# Find similar documents
lsi.find_related "Dogs are great companions", 2
# => ["Dogs are loyal pets that love to play fetch", "Cats are independent..."]

# Search by keyword
lsi.search "programming", 3
# => ["Ruby is a dynamic programming language", "Python is great for..."]
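Classification, `find_related`, and `search` all rest on the same LSI idea: build a term-document matrix, keep only its top k singular values, and compare texts by cosine similarity in that reduced space. The sketch below illustrates this with Ruby's `matrix` library (a bundled gem since Ruby 3.1) under simplifying assumptions: whitespace tokenization, raw counts, and an SVD obtained from the eigen decomposition of AᵀA. `TinyLSI` is hypothetical and is not the gem's implementation.

```ruby
require 'matrix'

# Hypothetical TinyLSI: a bare-bones LSI sketch, not the classifier gem's code.
class TinyLSI
  # k must not exceed the number of nonzero singular values of the corpus.
  def initialize(docs, k: 2)
    @docs = docs
    @vocab = docs.flat_map(&:split).uniq
    # Term-document matrix A (terms x docs) of raw counts, stored as Floats.
    @a = Matrix.build(@vocab.size, docs.size) do |t, d|
      docs[d].split.count(@vocab[t]).to_f
    end
    # SVD via the eigen decomposition of A^T A: eigenvalues are sigma^2 and
    # eigenvectors are the right singular vectors (columns of V).
    eig = (@a.transpose * @a).eigensystem
    pairs = eig.eigenvalues.zip(eig.eigenvectors).sort_by { |val, _| -val }
    @top = pairs.first(k) # keep the k largest singular values
  end

  # Fold text into the latent space: q_k = Sigma^-1 U^T q = Sigma^-2 V^T A^T q.
  def fold(text)
    words = text.split
    q = Vector.elements(@vocab.map { |t| words.count(t).to_f })
    atq = @a.transpose * q
    Vector.elements(@top.map { |val, vec| vec.inner_product(atq) / val })
  end

  # Rank stored documents by cosine similarity to the query in latent space.
  def related(text)
    qk = fold(text)
    @docs.sort_by { |doc| -cosine(qk, fold(doc)) }
  end

  private

  def cosine(u, v)
    denom = u.norm * v.norm
    denom.zero? ? 0.0 : u.inner_product(v) / denom
  end
end

docs = ["dogs pets play fetch", "cats pets nap", "ruby programming language", "python programming data"]
lsi = TinyLSI.new(docs, k: 2)
p lsi.related("ruby code").first(2) # the two programming documents rank first
```

With k = 2 each topic collapses onto one latent axis, so "ruby code" matches the Python document too even though they share no words; that generalization from co-occurrence is what distinguishes LSI from plain keyword matching.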

Performance

GSL vs Native Ruby

GSL speeds up every LSI operation, most dramatically build_index (the SVD computation), and the advantage grows with the number of documents:

Documents   build_index speedup   Overall speedup
5           4x                    2.5x
10          24x                   5.5x
15          116x                  17x

Detailed benchmark (15 documents)
Operation              Native          GSL      Speedup
----------------------------------------------------------
build_index            0.1412       0.0012       116.2x
classify               0.0142       0.0049         2.9x
search                 0.0102       0.0026         3.9x
find_related           0.0069       0.0016         4.2x
----------------------------------------------------------
TOTAL                  0.1725       0.0104        16.6x

Running Benchmarks

rake benchmark              # Run with current configuration
rake benchmark:compare      # Compare GSL vs native Ruby

Development

Setup

git clone https://github.com/cardmagic/classifier.git
cd classifier
bundle install

Running Tests

rake test                        # Run all tests
ruby -Ilib test/bayes/bayesian_test.rb  # Run specific test file

# Test without GSL (pure Ruby)
NATIVE_VECTOR=true rake test

Console

rake console

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -am 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Authors

License

This library is released under the GNU Lesser General Public License (LGPL) 2.1.

About

A general classifier module to allow Bayesian and LSI classifications.
