medjed/embulk-filter-distinct

 
 

Distinct filter plugin for Embulk

This filter returns only the records that are distinct with respect to the columns you configure.

Overview

  • Plugin type: filter

Configuration

  • columns: names of the columns used to distinguish records (array of strings, required)

Example

filters:
  - type: distinct
    columns: [c0, c1]

Run Example

$ ./gradlew classpath
$ embulk run -I lib example/config.yml

Note

This plugin uses a lot of memory because it holds the distinct column values.
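
To make the memory cost concrete, here is a minimal sketch of distinct filtering on a subset of columns. It is an illustration only, not the plugin's actual code: the class `DistinctSketch`, the method `distinct`, and the representation of a record as a `Map` are all assumptions made for this example. The point it shows is that one key must be retained for every distinct combination of configured column values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the plugin's actual implementation.
public class DistinctSketch {
    // Returns the records whose values for `columns` have not been seen
    // before, preserving input order.
    public static List<Map<String, Object>> distinct(
            List<Map<String, Object>> records, List<String> columns) {
        // One entry is kept per distinct key, so memory grows with the
        // number of distinct value combinations.
        Set<List<Object>> seen = new HashSet<>();
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            List<Object> key = new ArrayList<>();
            for (String column : columns) {
                key.add(record.get(column));
            }
            // Set.add returns false when the key is already present.
            if (seen.add(key)) {
                out.add(record);
            }
        }
        return out;
    }
}
```

In this sketch, with `columns` set to `[c0, c1]`, two records that agree on `c0` and `c1` but differ in another column count as duplicates, and only the first one is kept.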

TODO

  • further reduce the memory used by the filter, e.g. use the CRC32 of the values as the distinct key?
    • want ideas!
  • add tests
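
The CRC32 idea from the list above could look roughly like the following sketch. It is one possible approach under stated assumptions, not something the plugin implements: the class `Crc32KeySketch` and its methods are hypothetical, and column values are assumed to be non-null strings. It uses the standard `java.util.zip.CRC32` class and stores one checksum per distinct key instead of the values themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical illustration of the TODO idea; not part of the plugin.
public class Crc32KeySketch {
    // Stores one checksum per distinct key rather than the column values.
    private final Set<Long> seen = new HashSet<>();

    // Computes a CRC32 over the values, writing a zero byte after each one
    // so that ("ab", "c") and ("a", "bc") feed different byte sequences.
    public static long checksum(List<String> values) {
        CRC32 crc = new CRC32();
        for (String value : values) {
            crc.update(value.getBytes(StandardCharsets.UTF_8));
            crc.update(0);
        }
        return crc.getValue();
    }

    // Returns true the first time a checksum is seen, false afterwards.
    public boolean isFirstOccurrence(List<String> values) {
        return seen.add(checksum(values));
    }
}
```

The trade-off is correctness: a CRC32 has only 32 bits, so two different keys can share a checksum, in which case a record that is actually distinct would be dropped.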

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously
