Pic2Vec

Featurize images using a small, contained pre-trained deep learning network

Free software: BSD license

Features

This is a prototype for image feature engineering. It supports Python 2.7, 3.4, 3.5, and 3.6.

pic2vec is a Python package that performs automated feature extraction for image data. It supports training models via the DataRobot modeling API, as well as feature engineering on new image data.

Input Specification

Data Format

pic2vec works on image data represented in one of three ways:

A directory of image files.
A CSV containing URL pointers to the images.
A directory of images together with a CSV containing pointers to the image files.

If no CSV is provided with the directory, pic2vec automatically generates one to store the features alongside the corresponding images.
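As a rough illustration of the directory-only case, the sketch below lists the files in a folder and writes a one-column CSV of their names. It uses only the standard library and assumes Python 3. The function name `build_image_csv` and the default `images` column name are hypothetical; they are not part of pic2vec's API, and pic2vec may build its own CSV differently.

```python
import csv
import os


def build_image_csv(image_dir, csv_path, column_name="images"):
    """Write a one-column CSV listing the files in image_dir, sorted by name.

    Hypothetical helper for illustration only; not part of pic2vec.
    """
    # Sort the names so the row order does not depend on the filesystem.
    names = sorted(
        name for name in os.listdir(image_dir)
        if os.path.isfile(os.path.join(image_dir, name))
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column_name])
        writer.writerows([name] for name in names)
    return names
```

Calling `build_image_csv('path/to/image/directory/', 'path/to/data.csv')` would produce a CSV with one row per file, which could then be extended with label or metadata columns.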

Each row of the CSV represents a different image, and a row can also have columns containing other data about that image. Each image's featurized representation is appended as a series of new columns at the end of its row.
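To make the output layout concrete, here is a minimal sketch of appending feature columns to existing rows. It assumes Python 3 and uses only the standard library. The function `append_features` and the `feature_` column prefix are assumptions made for illustration; the column names pic2vec actually writes may differ.

```python
import csv
import io


def append_features(csv_text, features, prefix="feature_"):
    """Return CSV text with one new column per feature appended to each row.

    `features` holds one equal-length list of numbers per data row, in row
    order. Hypothetical helper for illustration only; not part of pic2vec.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if len(data) != len(features):
        raise ValueError("need exactly one feature vector per image row")
    width = len(features[0]) if features else 0

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Existing columns stay in place; feature columns go at the end.
    writer.writerow(header + [prefix + str(i) for i in range(width)])
    for row, vector in zip(data, features):
        writer.writerow(row + list(vector))
    return out.getvalue()
```

Any extra columns already in the CSV, such as a label, are carried through unchanged, and the feature columns follow them.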

Constraints Specification

The goal of this project is to make the featurizer as easy to use and as hard to break as possible. When working properly, it should be resistant to badly formatted data, such as missing rows or columns in the CSV, image mismatches between a CSV and an image directory, and invalid image formats.

However, the featurizer works best when the following constraints hold:

The CSV should have no missing columns or rows, and there should be full overlap between the images in the CSV and the images in the directory.
If checking predictions on a separate test set (such as on Kaggle), the filesystem needs to sort filepaths consistently with the sorting of the test set labels. The order in the CSV (whether generated automatically or passed in) is treated as the canonical order for the feature vectors.
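Both constraints above can be checked before featurizing. The sketch below is one way to do so under simple assumptions: image names are plain strings, and feature vectors are looked up by name. The helpers `compare_csv_to_directory` and `order_by_csv` are hypothetical and are not part of pic2vec.

```python
def compare_csv_to_directory(csv_names, directory_names):
    """Report mismatches between image names in a CSV and in a directory.

    Returns (missing_from_directory, missing_from_csv), each sorted.
    Hypothetical helper for illustration only; not part of pic2vec.
    """
    csv_set, dir_set = set(csv_names), set(directory_names)
    missing_from_directory = sorted(csv_set - dir_set)
    missing_from_csv = sorted(dir_set - csv_set)
    return missing_from_directory, missing_from_csv


def order_by_csv(csv_names, vectors_by_name):
    """Arrange feature vectors in CSV row order, treated as canonical."""
    return [vectors_by_name[name] for name in csv_names]
```

Two empty lists from `compare_csv_to_directory` indicate full overlap. `order_by_csv` raises `KeyError` if a CSV row has no matching vector, which surfaces a mismatch early rather than silently misaligning labels.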

The featurizer can only process .png, .jpeg, or .bmp image files. Any other file is left out of the featurization and is represented by a zero vector in the image batch.
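The zero-vector behavior can be pictured with the following sketch. It is not pic2vec's implementation: `is_supported_image`, `featurize_or_zero`, and the `featurize_one` callback are hypothetical names, and the case-insensitive extension match is an assumption. The extension set simply mirrors the three formats listed above.

```python
import os

# The formats named in this README; anything else gets a zero vector.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".bmp"}


def is_supported_image(path):
    """Return True if the file extension is in the supported set.

    Matching is case-insensitive here, which is an assumption.
    """
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED_EXTENSIONS


def featurize_or_zero(paths, featurize_one, num_features):
    """Apply featurize_one to supported files; use a zero vector otherwise."""
    return [
        featurize_one(path) if is_supported_image(path) else [0.0] * num_features
        for path in paths
    ]
```

Keeping a zero vector in place of an unsupported file preserves the one-row-per-image alignment with the CSV.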

Quick Start

The following Python code shows a typical usage of pic2vec:

import pandas as pd
from pic2vec import ImageFeaturizer

image_column_name = 'images'
my_csv = 'path/to/data.csv'
my_image_directory = 'path/to/image/directory/'

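# Configure the featurizer with the pre-trained Xception network.
my_featurizer = ImageFeaturizer(model='xception', depth=2, auto_sample=True)

# Load the data: the CSV column holding the image pointers, plus the CSV and image directory.
my_featurizer.load_data(image_column_name, csv_path=my_csv, image_path=my_image_directory)

# Generate the feature vectors for the loaded images.
my_featurizer.featurize()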

Examples

To get started, see the following example:

Cats vs. Dogs: Dataset from combined directory + CSV

Examples coming soon:

Facebook Like Prediction: Dataset from unsupervised directory only, with PCA visualization
URLs: Dataset from CSV with URLs and no image directory

Installation

See the Installation Guide for details.

Installing Keras/TensorFlow

If you run into trouble installing Keras or TensorFlow as a dependency, read the Keras installation guide and the TensorFlow installation guide for details about installing them on your machine.

Using Featurizer Output With DataRobot

pic2vec generates a CSV that can be dropped directly into the DataRobot application, provided the data has been labelled with a column that can serve as the target. Each image feature is treated like a regular data column.

Running tests

To run the unit tests with pytest, run:

py.test tests

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
