
Advanced Pentaho Data Integration

This course is designed to build on your fundamental knowledge of Pentaho Data Integration (PDI).
Moving beyond the basics of creating transformations and jobs, you will learn how to use PDI in real-world project scenarios. You'll add PDI as a data source for a variety of visualization options, use PDI's streaming data processing capabilities, build transformations with metadata injection, and scale and performance-tune your PDI solution.

Prerequisites

The following software needs to be installed and configured:

Pentaho Business Analytics 8.1.x
Java JDK 9.0.x
Docker Toolbox
Confluent 5.x
Kafka Tool 2.x
MQTT.fx
Git / GitHub
Visual Studio Code
R
RStudio
Python 2.7

Course Overview

Upon completing this course, you will be able to:

Module 1 - Metadata Injection

  Understand metadata injection:
  * Metadata Injection Workflows
    - Standard
    - Push / Pull
    - Two-phase
    - Filters
  * Use Case - Retail Sales
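The core idea of metadata injection can be sketched outside PDI: one generic "template" routine whose field mapping arrives as data at run time, so a single transformation serves many source layouts. This is a minimal illustration of the concept, not PDI's ETL Metadata Injection API; the field and function names are assumptions.

```python
def run_template(rows, field_mapping):
    """Select and rename fields on each row according to injected metadata.

    field_mapping maps target field name -> source field name, playing the
    role of the metadata rows fed into a template transformation.
    """
    return [{target: row[source] for target, source in field_mapping.items()}
            for row in rows]

# "Metadata" describing one retail feed; a second feed with a different
# layout would simply supply a different mapping to the same template.
mapping = {"store_id": "Store", "sales": "NetSales"}
rows = [{"Store": "S01", "NetSales": 120.5, "Region": "EU"}]
print(run_template(rows, mapping))
# → [{'store_id': 'S01', 'sales': 120.5}]
```

In PDI the same separation holds: the template transformation stays fixed, and a driver transformation injects the per-source metadata.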

Module 2 - PDI as a Data Source

  Configure PDI as a data source for various scenarios:
  * Pentaho Reports step
  * Google BQ & Drive
  * CDA
  * Machine Learning
  * Data Services
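As a taste of the CDA scenario, the sketch below builds (but does not send) the kind of HTTP query a client might issue against a CDA endpoint that fronts a PDI transformation. The host, CDA file path, and data access ID are hypothetical values for illustration.

```python
from urllib.parse import urlencode

def cda_query_url(base, cda_path, data_access_id, **params):
    """Build a CDA doQuery URL; extra keyword args become paramXXX values."""
    query = {"path": cda_path, "dataAccessId": data_access_id}
    query.update({f"param{name}": value for name, value in params.items()})
    return f"{base}/plugin/cda/api/doQuery?{urlencode(query)}"

# Hypothetical server, CDA file, and parameter for a sales query.
url = cda_query_url("http://localhost:8080/pentaho",
                    "/public/sales.cda", "salesById", region="EU")
print(url)
```

The returned rows would then feed whichever visualization front end consumes the CDA output.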

Module 3 - Streaming Data

  Implement an MQTT broker:
  * Stream GPS coordinates to PDI to demonstrate IoT
  * Use Case - Racing Cars
  Implement Kafka:
  * Use Case - Twitter Stream (you will need a Twitter account)
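To make the racing-cars use case concrete, this sketch shows the shape of a GPS message a car might publish to an MQTT topic for PDI's MQTT Consumer step to pick up. The topic scheme and payload fields are assumptions for illustration; no broker connection is made here.

```python
import json

def gps_message(car_id, lat, lon, speed_kph):
    """Return the (topic, JSON payload) pair for one GPS reading."""
    topic = f"race/cars/{car_id}/gps"  # assumed topic layout
    payload = json.dumps({"lat": lat, "lon": lon, "speed_kph": speed_kph})
    return topic, payload

topic, payload = gps_message("car-7", 50.44, 5.97, 212.4)
print(topic)    # → race/cars/car-7/gps
print(payload)
```

With a real broker (e.g. via the Paho MQTT client), the payload would be passed to a publish call on that topic, and PDI would subscribe with a matching topic filter.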

Module 4 - Scalability

  Configure master and slave nodes:
  * Clustering
  * Partitioning
  Scheduling
  Checkpoints
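The partitioning topic can be previewed with a small sketch of the underlying idea: a partitioning schema hashes a key field so that every row with the same key is routed to the same node. The node names and key field are illustrative, not PDI configuration.

```python
def partition(rows, key, nodes):
    """Assign each row to a node by hashing its key field."""
    buckets = {node: [] for node in nodes}
    for row in rows:
        node = nodes[hash(row[key]) % len(nodes)]
        buckets[node].append(row)
    return buckets

nodes = ["slave1", "slave2"]
rows = [{"customer": c} for c in ("A", "B", "A", "C")]
buckets = partition(rows, "customer", nodes)
# Invariant: both rows for customer "A" land on the same slave,
# so per-key aggregations can run locally on each node.
```

In a PDI cluster the master carves the row stream up the same way, and steps downstream of the partitioner run once per partition on the slave servers.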

Getting Started

Course Materials - Batch script for the GitHub repositories (requires Git to be installed)

Software - Shared file on Dropbox

Acknowledgments

Beppe Raymaekers
Morgan Senechal
Caio Moreno de Souza
