spurgeonhc/k-means-u-star

This branch is 1 commit behind gittar/k-means-u-star:master.

The k-means-u* algorithm

non-local jumps and greedy retries improve k-means++ clustering

This repository contains example Python code for the k-means-u and k-means-u* algorithms as proposed in https://arxiv.org/abs/1706.09059.
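
The "non-local jumps" in the tagline can be illustrated with a small dependency-free sketch. It assumes a jump relocates the center with the lowest utility (the error increase that removing it would cause) next to the center with the highest error, reruns k-means, and keeps the result only if the total error drops. The function names, the `eps` offset, and the toy data are hypothetical; the paper and kmeansu.py remain the authoritative description.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centers, iters=50):
    """Plain k-means (Lloyd) iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its old center.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers

def error_and_utility(points, centers):
    """Per-center error and utility (needs at least two centers)."""
    err = [0.0] * len(centers)
    util = [0.0] * len(centers)
    for p in points:
        ranked = sorted((sq_dist(p, c), i) for i, c in enumerate(centers))
        (d1, nearest), (d2, _) = ranked[0], ranked[1]
        err[nearest] += d1        # error contributed to the nearest center
        util[nearest] += d2 - d1  # extra error if that center were removed
    return err, util

def jump_step(points, centers, rng, eps=1e-3):
    """Try one jump; return (centers, total_error, accepted)."""
    err, util = error_and_utility(points, centers)
    total = sum(err)
    worst = max(range(len(centers)), key=err.__getitem__)
    idle = min(range(len(centers)), key=util.__getitem__)
    if worst == idle:
        return centers, total, False
    trial = [list(c) for c in centers]
    # Assumption: place the moved center at a small random offset.
    trial[idle] = [x + rng.uniform(-eps, eps) for x in centers[worst]]
    trial = lloyd(points, trial)
    new_total = sum(error_and_utility(points, trial)[0])
    if new_total < total:
        return trial, new_total, True
    return centers, total, False

# Toy data: three small clusters; the start wastes two centers on one cluster.
points = [(x + dx, dy) for x in (0, 10, 20) for dx in (0, 1) for dy in (0, 1)]
start = lloyd(points, [(0, 0), (1, 1), (15, 0.5)])
old_err = sum(error_and_utility(points, start)[0])
centers, new_err, accepted = jump_step(points, start, random.Random(0))
```

Repeating `jump_step` until it is rejected gives a simple loop in the spirit of k-means-u. The greedy retries of k-means-u* are not covered by this sketch.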

Quick Start

  • clone the repository: git clone https://github.com/gittar/k-means-u-star
  • change into the main directory: cd k-means-u-star
  • install miniconda or anaconda: https://conda.io/docs/install/quick.html
  • create the kmus environment: conda env create -f envsimple.yml
  • activate the environment: source activate kmus (on Windows: activate kmus)
  • start one of the Jupyter notebooks, e.g.: jupyter notebook notebooks/algo-pure.ipynb
  • continue in the browser window that opens (Jupyter manual: http://jupyter-notebook.readthedocs.io/en/latest/)

jupyter notebooks:

  • algo-pure.ipynb
    a bare-bones implementation meant for easy understanding of the algorithms
  • simu-detail.ipynb
    detailed simulations and graphics to illustrate the way the algorithms work, uses kmeansu.py
  • simu-bulk.ipynb
    systematic simulations with various data sets to compare k-means++, k-means-u and k-means-u*, uses kmeansu.py
  • dataset_class.ipynb
    examples for using the data generator

python files:

  • kmeansu.py
    main implementation of k-means-u and k-means-u*. Makes heavy use of http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html for efficient implementations of k-means and k-means++. It also gathers certain statistics during training to enable systematic evaluation, so the code is somewhat larger.
  • bfdataset.py
    contains a class "dataset" to generate test data sets, plus a separate implementation of k-means++ that gives access to the codebook after initialization but before k-means is run
  • bfutil.py
    various utility functions for plotting etc.
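
To illustrate what access to the codebook "after initialization but before the run of k-means" can look like, here is a minimal, dependency-free sketch of k-means++ seeding. It is not the code from bfdataset.py; the function name `kmeanspp_codebook` and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_codebook(points, k, rng):
    """Return k initial centers chosen by k-means++ seeding (no k-means run)."""
    # First center: uniformly at random from the data.
    codebook = [list(rng.choice(points))]
    # Squared distance from every point to its nearest chosen center.
    d2 = [sq_dist(p, codebook[0]) for p in points]
    while len(codebook) < k:
        if sum(d2) > 0:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        else:
            # Every point already coincides with a center; pick uniformly.
            idx = rng.randrange(len(points))
        codebook.append(list(points[idx]))
        d2 = [min(old, sq_dist(p, codebook[-1])) for old, p in zip(d2, points)]
    return codebook

# Hypothetical toy data: six distinct 2-D points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
codebook = kmeanspp_codebook(points, 3, random.Random(1))
```

The returned codebook can be inspected or plotted before any k-means iteration. With scikit-learn, such an array can then be handed to `sklearn.cluster.KMeans` through its `init` parameter (with `n_init=1`) to run k-means from exactly these centers.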

About

implementation of the k-means-u* clustering algorithm

Languages

  • Jupyter Notebook 99.4%
  • Python 0.6%