# Data-Genie

A lightweight, efficient ETL engine written in TypeScript for reading, filtering, transforming, and writing tabular data.

## 📦 Features

- 🔄 Read from various data sources (CSV, TSV, JSON, NDJSON, FixedWidth, etc.)
- ✍️ Write to multiple formats (JSON, NDJSON, CSV, TSV, FixedWidth, SQL, Console, etc.)
- ✂️ Filter and transform data with powerful field filters
- 📊 Supports complex filtering expressions
- 🔗 Chainable, high-performance operations for flexible data processing
- 🔍 Supports data validation and transformation
- 📈 Ideal for data cleaning, migration, and analysis
- 🧩 Modular design for easy integration into existing projects
- 🧪 Easy to use from TypeScript, JavaScript, or the browser
- 🔒 Reliable thanks to TypeScript's type safety
- 🔧 Quick to install and get started (with examples)

## 🚀 Getting Started

### 🔧 Installation

Install from npm:

```sh
npm install @pujansrt/data-genie
```

Or, with yarn:

```sh
yarn add @pujansrt/data-genie
```

For a development install (clone and build):

```sh
git clone https://github.com/pujansrt/data-genie.git
cd data-genie
npm install
npm run build
```

## 📚 How to use

### Read a CSV file, remove duplicates, transform fields, and write to the console

```ts
import { ConsoleWriter, CSVReader, Job, SetCalculatedField, TransformingReader, RemoveDuplicatesReader, RemoveFields } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new CSVReader('input/credit-balance-01.csv').setFieldNamesInFirstRow(true);

  reader = new RemoveDuplicatesReader(reader, 'Rating', 'CreditLimit');

  reader = new TransformingReader(reader)
    .add(new SetCalculatedField('AvailableCredit', 'parseFloat(record.CreditLimit) - parseFloat(record.Balance)').transform())
    .add(new RemoveFields('CreditLimit', 'Balance').transform());

  await Job.run(reader, new ConsoleWriter());
  // await Job.run(reader, new JsonWriter('output/filtered-data.json'));
  // await Job.run(reader, new CsvWriter('output/filtered-data.csv'));
  // await Job.run(reader, new FixedWidthWriter('output/filtered-data.fw').setFieldNamesInFirstRow(true).setFieldWidths(10, 15, 10, 15));
}

runExample().catch(console.error);
```
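Conceptually, `RemoveDuplicatesReader` keeps only the first record seen for each combination of the named key fields. A minimal standalone sketch of that idea (illustrative only — `removeDuplicates` is a hypothetical helper, not the library's actual implementation):

```ts
type DataRecord = { [key: string]: any };

// Keep the first record seen for each combination of key field values,
// similar in spirit to RemoveDuplicatesReader(reader, 'Rating', 'CreditLimit').
function removeDuplicates(records: DataRecord[], ...keys: string[]): DataRecord[] {
  const seen = new Set<string>();
  return records.filter((record) => {
    // Build a composite key; '\u0000' separates field values unambiguously.
    const key = keys.map((k) => String(record[k])).join('\u0000');
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```
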

### Writing to a Fixed-Width File

```ts
const fwWriter = new FixedWidthWriter('output/ex-simulated.fw').setFieldNamesInFirstRow(true).setFieldWidths(10, 15, 10, 15);

await Job.run(reader, fwWriter);
```
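A fixed-width writer pads or truncates each field to its configured column width so every row has the same byte layout. A minimal sketch of that formatting step (illustrative only — `toFixedWidth` is a hypothetical helper, not the library's API):

```ts
// Format one record's values as a single fixed-width line.
// Each value is truncated to its column width, then right-padded with spaces.
function toFixedWidth(values: string[], widths: number[]): string {
  return values
    .map((value, i) => value.slice(0, widths[i]).padEnd(widths[i], ' '))
    .join('');
}
```
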

### Read a CSV file, filter data, and write to the console

```ts
import { ConsoleWriter, CSVReader, FieldFilter, FilterExpression, FilteringReader, IsNotNull, IsType, Job, PatternMatch, ValueMatch } from "@pujansrt/data-genie";

async function runExample() {
  const reader = new CSVReader('input/example.csv').setFieldNamesInFirstRow(true);

  const filteringReader = new FilteringReader(reader)
    .add(new FieldFilter('Rating').addRule(IsNotNull()).addRule(IsType('string')).addRule(ValueMatch('B', 'C')).createRecordFilter())
    .add(new FieldFilter('Account').addRule(IsNotNull()).addRule(IsType('string')).addRule(PatternMatch('[0-9]*')).createRecordFilter())
    .add(
      new FilterExpression(
        'record.CreditLimit !== undefined && record.Balance !== undefined && parseFloat(record.CreditLimit) >= 0 && parseFloat(record.CreditLimit) <= 5000 && parseFloat(record.Balance) <= parseFloat(record.CreditLimit)'
      ).createRecordFilter()
    );

  await Job.run(filteringReader, new ConsoleWriter());
}

runExample().catch(console.error);
```
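A `FilterExpression` string references each row through the `record` variable. One way such a string can be turned into a predicate is with the `Function` constructor — a minimal sketch under that assumption (illustrative only; `compileFilter` is a hypothetical helper and may not match how the library evaluates expressions internally):

```ts
type DataRecord = Record<string, unknown>;

// Compile an expression string into a predicate over a record.
// 'record' is the only name the expression may reference.
function compileFilter(expression: string): (record: DataRecord) => boolean {
  const fn = new Function('record', `return Boolean(${expression});`);
  return (record) => fn(record) as boolean;
}
```

Note the usual caveat: compiling user-supplied strings with `Function` executes arbitrary code, so expressions should come only from trusted sources.
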

### Read a JSON file and transform data

```ts
import { ConsoleWriter, Job, JsonReader, SetCalculatedField, TransformingReader } from "@pujansrt/data-genie";

async function runExample() {
  let reader: any = new JsonReader('input/simple-json-input.json');

  reader = new TransformingReader(reader)
    .setCondition((record) => record.balance < 0)
    .add(new SetCalculatedField('balance', '0.0').transform()); // clamp negative balances to zero

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```
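The `setCondition` call above means the transform applies only to records matching the predicate; all other records pass through unchanged. A minimal sketch of that pattern (illustrative only — `transformWhere` is a hypothetical helper, not part of the library):

```ts
type DataRecord = { [key: string]: any };

// Apply a transform only to records that satisfy the condition,
// leaving every other record untouched.
function transformWhere(
  records: DataRecord[],
  condition: (r: DataRecord) => boolean,
  transform: (r: DataRecord) => DataRecord
): DataRecord[] {
  return records.map((r) => (condition(r) ? transform(r) : r));
}
```
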

### Fixed-Width Example

```ts
import { ConsoleWriter, FixedWidthReader, Job } from "@pujansrt/data-genie";

async function runExample() {
  let reader: any = new FixedWidthReader('input/credit-balance-01.fw');
  reader.setFieldWidths(8, 16, 16, 12, 14, 16, 7);
  reader.setFieldNamesInFirstRow(true);

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```
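Parsing a fixed-width line is the inverse of writing one: slice the line at the configured widths and trim the padding. A minimal sketch (illustrative only — `parseFixedWidth` is a hypothetical helper, not the library's actual reader logic):

```ts
// Split one fixed-width line into trimmed field values using the same
// widths a FixedWidthReader would be configured with.
function parseFixedWidth(line: string, widths: number[]): string[] {
  const fields: string[] = [];
  let offset = 0;
  for (const width of widths) {
    fields.push(line.slice(offset, offset + width).trim());
    offset += width;
  }
  return fields;
}
```
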


## Upcoming Features

- Support for Apache Avro
- Support for Apache Parquet
- Enhanced data validation rules

## 🧪 Use Cases

- Data cleaning and transformation
- Data validation and filtering
- Data migration and ETL processes
- Data analysis and reporting
- Data integration from multiple sources

## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.


## 📜 License

MIT License: free for personal and commercial use.


## 👤 Author

Developed and maintained by Pujan Srivastava, a mathematician and software engineer with 18+ years of programming experience.
