
alivcor/node-red-contrib-sparkml


node-red-contrib-sparkml

This is a Node-RED extension pack containing a set of nodes that offer Spark DataFrame, SQL, and machine learning functionality. All nodes have a Python/PySpark core.

It enables drag-and-drop machine learning with Spark through a visual interface.

Features

(Image: Drag & Drop Spark ML)

Functionalities

This project is a work in progress (WIP). I plan to add more nodes, covering as many of the available Spark Transformers and Estimators as possible.

Feature Extractors

  • TF-IDF
  • Word2Vec
  • CountVectorizer
  • FeatureHasher
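To give some intuition for what the TF-IDF extractor computes, here is a plain-Python sketch of the weighting described in the Spark ML documentation. TF is the raw count of a term in a document. IDF is log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents that contain the term. This sketch is for illustration only and is not the node's implementation. Spark itself operates on feature vectors produced by HashingTF or CountVectorizer, not on Python dicts, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents (illustrative only).

    TF is the raw term count in a document; IDF uses the smoothed formula
    log((m + 1) / (df + 1)), where m is the number of documents and df is
    the number of documents containing the term.
    """
    m = len(docs)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({term: count * idf[term] for term, count in tf.items()})
    return weights

docs = [["spark", "ml", "spark"], ["node", "red", "spark"]]
for w in tf_idf(docs):
    print(w)
```

A term that appears in every document ('spark' here) gets an IDF of log(1) = 0, so it adds nothing to the weights.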

Feature Transformers

  • Tokenizer
  • StopWordsRemover
  • n-gram
  • Binarizer
  • PCA
  • StringIndexer
  • IndexToString
  • OneHotEncoderEstimator
  • VectorIndexer
  • SQLTransformer
  • VectorAssembler
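Several of these transformers are usually chained together. The sketch below approximates in plain Python what a Tokenizer → StopWordsRemover → n-gram chain does to a sentence. Spark's Tokenizer lowercases the text and splits it on whitespace. StopWordsRemover drops stop words, case-insensitively by default. NGram joins each run of n consecutive tokens with a space. The function names and the three-word stop list are hypothetical. Spark's StopWordsRemover ships with a much larger default English list.

```python
def tokenize(text):
    """Lowercase the text and split it on whitespace (approximating Spark's Tokenizer)."""
    return text.lower().split()

def remove_stop_words(tokens, stop_words):
    """Drop tokens found in stop_words, comparing case-insensitively."""
    stop = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop]

def ngrams(tokens, n):
    """Join each run of n consecutive tokens with a space; empty if there are fewer than n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical stop-word list for illustration only
STOP_WORDS = ["the", "a", "with"]

tokens = tokenize("Train a Model with the Spark nodes")
filtered = remove_stop_words(tokens, STOP_WORDS)
print(filtered)             # ['train', 'model', 'spark', 'nodes']
print(ngrams(filtered, 2))  # ['train model', 'model spark', 'spark nodes']
```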

Classification Algorithms

  • Decision Tree Classifier
  • Logistic Regression
  • Gradient-boosted Tree Classifier
  • Multilayer Perceptron
  • Random Forest Classifier
  • Support Vector Machines
  • k-Nearest Neighbour Classifier
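Once a classifier has been trained, making a prediction means applying the learned model to a feature vector. For Logistic Regression in the binary case, the model computes a score from its coefficients and intercept, maps that score to a probability with the logistic function, and compares the probability against a threshold. Spark's LogisticRegression uses a `threshold` parameter that defaults to 0.5. The sketch below is a plain-Python illustration of that prediction step. The coefficients are made up, as if produced by a training step, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, intercept, features, threshold=0.5):
    """Return (probability of class 1, predicted label) for one feature vector."""
    # Raw score: dot product of coefficients and features, plus the intercept
    z = sum(w * x for w, x in zip(weights, features)) + intercept
    p = sigmoid(z)
    return p, 1.0 if p > threshold else 0.0

# Hypothetical coefficients, as if learned by a training step
weights, intercept = [2.0, -1.0], -0.5
print(predict(weights, intercept, [1.0, 0.5]))  # score z = 1.0, label 1.0
print(predict(weights, intercept, [0.0, 1.0]))  # score z = -1.5, label 0.0
```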

Clustering Algorithms

  • K-Means Clustering
  • Latent Dirichlet allocation (LDA)
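K-Means clustering alternates between two steps. First, it assigns each point to its nearest centroid. Then it moves each centroid to the mean of the points assigned to it. The plain-Python sketch below shows this basic iteration, known as Lloyd's algorithm. It is a simplification for intuition and is not Spark's implementation. Spark's KMeans runs distributed and by default chooses its initial centroids with the k-means|| method, whereas this sketch takes fixed initial centroids so that its output is deterministic. The function name `kmeans` is hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared distance)
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:  # leave a centroid with no assigned points where it is
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)]))
# [[0.0, 0.5], [10.0, 10.5]]
```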

Prerequisites

Make sure you have a working installation of Node-RED, then install the following:

  • Python 3.6.4 or higher, accessible via the command 'python' (on Linux, 'python3')
  • PySpark
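Before starting Node-RED, you can confirm these prerequisites with a small script like the one below. It is a hypothetical helper and is not part of the package. The script checks the running interpreter against the 3.6.4 minimum, uses `importlib.util.find_spec` to test whether PySpark can be imported, and looks for a 'python' or 'python3' command on the PATH. Run it with the same 'python' (or 'python3') command that the nodes will use.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 6, 4)  # minimum version stated in the prerequisites

def check_prerequisites():
    """Return a dict describing whether each prerequisite looks satisfied."""
    return {
        # Compare (major, minor, micro) of the running interpreter to the minimum
        "python_version_ok": tuple(sys.version_info[:3]) >= MIN_PYTHON,
        # find_spec returns None when the package cannot be found for import
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        # Path of the 'python' command, falling back to 'python3' (None if neither exists)
        "python_cmd": shutil.which("python") or shutil.which("python3"),
    }

if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```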

Install

To install the latest version, use the Menu > Manage palette option and search for node-red-contrib-sparkml, or run the following command in your Node-RED user directory (typically ~/.node-red):

npm i node-red-contrib-sparkml

Usage

These flows create a dataset, train a model, and then evaluate it. After training, models can be used in real scenarios to make predictions.

There is an example flow and a test dataset available in the 'test' folder.

Tip: You can run 'node-red' (or 'sudo node-red' on Linux/macOS) from the folder '.node-red/node_modules/node-red-contrib-sparkml' to avoid confusion.

Example Deployment

(Image: Deployment)

Contributors Welcome

I am looking for contributors! Feel free to open issues directly on GitHub, or email me with questions, feature suggestions, or general feedback!