This filter returns distinct records based on the columns you configure.
- Plugin type: filter
- columns: list of column names used to distinguish records (array of strings, required)
filters:
  - type: distinct
    columns: [c0, c1]
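To make the effect of `columns: [c0, c1]` concrete, here is a minimal Python sketch of distinct-by-columns filtering. It is an illustration only, not the plugin's actual code: the `distinct` function and the sample records are hypothetical, and it assumes that the first record seen for each key is the one kept, which the description above does not state.

```python
def distinct(records, columns):
    """Keep one record per unique combination of the given columns.

    Assumption: the first record seen for each key is kept.
    """
    seen = set()      # every distinct key encountered so far
    result = []
    for record in records:
        # The key is the tuple of values in the configured columns only.
        key = tuple(record[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


records = [
    {"c0": 1, "c1": "a", "c2": "x"},
    {"c0": 1, "c1": "a", "c2": "y"},  # same (c0, c1) as the first record
    {"c0": 1, "c1": "b", "c2": "z"},
]

for row in distinct(records, ["c0", "c1"]):
    print(row)
# {'c0': 1, 'c1': 'a', 'c2': 'x'}
# {'c0': 1, 'c1': 'b', 'c2': 'z'}
```

Columns that are not listed (here `c2`) play no part in the comparison, so the second record is dropped even though its `c2` value differs.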
$ ./gradlew classpath
$ embulk run -I lib example/config.yml
This plugin uses a lot of memory because it keeps every distinct combination of column values it has seen.
- further reduce the filter's memory usage, e.g. use the CRC32 of the values as the distinct key?
- want ideas!
- test
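The CRC32 idea in the list above could look roughly like the following Python sketch. It is not implemented in the plugin; `crc32_key` and `distinct_by_crc32` are hypothetical names, and the choice of `repr` plus a unit-separator character to serialize the values is an assumption made for the example. It uses `zlib.crc32` from the Python standard library.

```python
import zlib


def crc32_key(record, columns):
    """Reduce the configured column values to one 32-bit integer."""
    # Assumption: values are serialized with repr() and joined with the
    # ASCII unit separator so that ("ab", "c") and ("a", "bc") differ.
    joined = "\x1f".join(repr(record[c]) for c in columns)
    return zlib.crc32(joined.encode("utf-8"))


def distinct_by_crc32(records, columns):
    """Yield records whose CRC32 key has not been seen before."""
    seen = set()  # stores small integers instead of full value tuples
    for record in records:
        key = crc32_key(record, columns)
        if key not in seen:
            seen.add(key)
            yield record


records = [
    {"c0": 1, "c1": "a"},
    {"c0": 1, "c1": "a"},
    {"c0": 1, "c1": "b"},
]
print(len(list(distinct_by_crc32(records, ["c0", "c1"]))))
```

The trade-off is that CRC32 has only 2^32 possible values, so two different value combinations can collide, and a collision would cause a genuinely distinct record to be dropped. Whether that risk is acceptable depends on the data volume.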
$ ./gradlew gem # -t to watch change of files and rebuild continuously