🕵🏻‍♂️ Explore your data for better labeling 🕵🏼

IanMenendez/bundler

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

46 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

bundler

Bundler is a tool that helps you gain insights from your dataset. Given 2D embeddings, you can use the UI to spot wrong annotations or to label samples in batches.

Install

You first need to clone the repo:

git clone https://github.com/JulesBelveze/bundler.git

Then download and install the package dependencies using poetry:

python3 -m venv .venv/bundler
source .venv/bundler/bin/activate
pip3 install --upgrade pip
poetry install
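If poetry is not installed yet, install it first (the get-poetry.py script below has since been deprecated by the Poetry project in favor of https://install.python-poetry.org):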
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3 -

Disclaimer

The original tool by koaning can be found here. I forked the project and kept working on a separate copy because our use cases were different.

Use case

The interface can help you label data quickly. It lets you create tabs directly in Label Studio to correct existing labels or create new ones.

The tool was developed to handle textual data. However, it should be easy to extend to other data types.

Usage

To use bundler, you first need to map your dataset into a 2D embedding space. The scripts/ folder contains a couple of scripts that show how to compute such coordinates.
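To illustrate what "mapping into a 2D embedding space" means, here is a minimal standard-library sketch. It is not one of the scripts from the scripts/ folder: it swaps in a simple bag-of-words representation followed by PCA (computed with power iteration) in place of whatever embedding those scripts use. The function names and the `text`, `x`, `y` column names are assumptions made for this example; check the scripts/ folder for the file layout bundler actually expects.

```python
import csv
import math
import random
import sys
from collections import Counter


def bag_of_words(texts):
    """Turn each text into a dense word-count vector over a shared vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for tokens in tokenized:
        row = [0.0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[index[word]] = float(count)
        rows.append(row)
    return rows


def top_component(rows, iters=200, seed=0):
    """Approximate the leading principal direction with power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def project_2d(rows):
    """Center the vectors and project them onto two principal directions."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    rows = [[value - m for value, m in zip(row, means)] for row in rows]
    axes = []
    for _ in range(2):
        v = top_component(rows)
        scores = [sum(a * b for a, b in zip(row, v)) for row in rows]
        axes.append(scores)
        # Deflate: remove the direction just found before searching again.
        rows = [[x - s * vj for x, vj in zip(row, v)] for row, s in zip(rows, scores)]
    return list(zip(*axes))


def write_csv(handle, texts, coords):
    """Write one row per sample; the column names are an assumption."""
    writer = csv.writer(handle)
    writer.writerow(["text", "x", "y"])
    for text, (x, y) in zip(texts, coords):
        writer.writerow([text, x, y])


if __name__ == "__main__":
    samples = ["great movie", "great film", "awful plot", "awful ending"]
    write_csv(sys.stdout, samples, project_2d(bag_of_words(samples)))
```

Word counts are a crude stand-in for real text embeddings; in practice you would likely replace `bag_of_words` with sentence embeddings and the PCA step with a dedicated dimensionality-reduction library, while keeping the same "one row per sample with two coordinates" output idea.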

Once your dataset is ready, open the UI and start exploring your data by running:

python3 -m bundler text [MY_FILE]

Note that one of bundler's main features is the ability to create tabs directly in Label Studio. This requires you to authenticate and specify the project of interest, so set the following environment variables before running bundler:

export LS_TOKEN=<YOUR_LS_TOKEN>
export LS_ENDPOINT=<ENDPOINT_OF_YOUR_LS>
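Since the Label Studio integration depends on these two variables, it can be handy to verify that they are set before launching the UI. The following is a small sketch of such a check; the helper name `missing_label_studio_vars` is hypothetical and not part of bundler, and the check only tests that the variables are non-empty, not that the token or endpoint is valid.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("LS_TOKEN", "LS_ENDPOINT")


def missing_label_studio_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_label_studio_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Label Studio settings found; ready to run bundler.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test without touching the real environment.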
