This course is designed to build upon your fundamental knowledge of Pentaho Data Integration (PDI).
Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios.
You'll add PDI as a data source for a variety of visualization options, utilize PDI's streaming data processing capabilities, build transformations with metadata injection, and scale and performance-tune your PDI solution.
The following software needs to be installed and configured:
Pentaho Business Analytics 8.1.x
Java JDK 9.0.x
Docker Toolbox
Confluent 5.x
Kafka Tool 2.x
MQTT.fx
Git / GitHub
Visual Studio Code
R
RStudio
Python 2.7
On completing this course, you will be able to:
Understand Metadata Injection
* Metadata Injection Workflows
- Standard
- Push / Pull
- 2-phase
- Filters
* Use Case - Retail Sales
Configure PDI as a data source for various scenarios:
* Pentaho Reports step
* Google BigQuery & Drive
* CDA
* Machine Learning
* Data Services
Implement an MQTT Broker
* Stream GPS coordinates to PDI to demonstrate IoT
* Use Case - Racing Cars
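As a sketch of the MQTT streaming scenario: the snippet below builds the kind of JSON GPS payload a device might publish and that PDI's MQTT Consumer step could then parse. The topic name, broker address, and payload field names are assumptions for illustration, not part of the course materials.

```python
import json
import time

def make_gps_message(device_id, lat, lon):
    # Build a JSON payload a PDI transformation could parse downstream
    # with a JSON Input step. Field names are illustrative assumptions.
    return json.dumps({
        "device": device_id,
        "lat": round(lat, 6),
        "lon": round(lon, 6),
        "ts": int(time.time() * 1000),  # epoch milliseconds
    })

# Publishing (requires the paho-mqtt package and a running broker,
# e.g. one started via Docker Toolbox as listed above) would look like:
#
# import paho.mqtt.client as mqtt
# client = mqtt.Client()
# client.connect("localhost", 1883)          # hypothetical broker address
# client.publish("racing/gps",               # hypothetical topic name
#                make_gps_message("car-7", 50.4372, 5.9714))
```

MQTT.fx, listed in the prerequisites, can subscribe to the same topic to inspect the messages as they flow.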
Implement Kafka
* Use Case - Twitter Stream (a Twitter account is required)
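To give a feel for the Twitter-to-Kafka use case: the function below flattens a raw tweet payload into the handful of fields a PDI transformation would typically extract. The field names follow the Twitter v1.1 status object; the Kafka topic name and broker address in the commented section are assumptions.

```python
import json

def tweet_to_record(raw_tweet_json):
    # Flatten the fields of interest from a raw tweet payload
    # (field names per the Twitter v1.1 status object).
    t = json.loads(raw_tweet_json)
    return {
        "id": t["id_str"],
        "user": t["user"]["screen_name"],
        "text": t["text"],
        "created_at": t["created_at"],
    }

# Producing these records to Kafka (requires the kafka-python package
# and a running broker, e.g. the Confluent stack listed above):
#
# from kafka import KafkaProducer
# producer = KafkaProducer(
#     bootstrap_servers="localhost:9092",   # hypothetical broker address
#     value_serializer=lambda d: json.dumps(d).encode("utf-8"),
# )
# producer.send("tweets", tweet_to_record(raw))  # "tweets" topic is an assumption
```

On the PDI side, a Kafka Consumer step would read the same topic, and Kafka Tool (listed in the prerequisites) can be used to inspect the messages.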
Configure Master & Slave nodes
* Clustering
* Partitioning
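As a rough sketch of what the master/slave setup involves: PDI's Carte server reads a `carte-config.xml` file at startup. A minimal master-node configuration might look like the following; the name, hostname, and port are placeholders, and production setups would also configure credentials.

```xml
<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8080</port>
    <master>Y</master>
  </slaveserver>
</slave_config>
```

Slave nodes use a similar file that additionally lists the master under a `<masters>` element and sets `<report_to_masters>Y</report_to_masters>` so they register themselves with the master.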
Scheduling
Checkpoints
Course Materials - Batch script for the GitHub repositories (requires Git to be installed)
Software - Shared file on Dropbox
Beppe Raymaekers
Morgan Senechal
Caio Moreno de Souza