This demo and its accompanying playbook show how to deploy a Kafka streaming ETL pipeline that uses KSQL for stream processing and Confluent Control Center for monitoring. All components of Confluent Platform have security enabled end-to-end. Run the demo by following the playbook and video tutorials.
The use case is a streaming ETL deployment on live edits to real Wikipedia pages. The Wikimedia Foundation has IRC channels that publish edits to real wiki pages (e.g. #en.wikipedia, #en.wiktionary) in real time. Using Kafka Connect, the Kafka source connector kafka-connect-irc streams raw messages from these IRC channels, and a custom Kafka Connect transform, kafka-connect-transform-wikiedit, transforms the messages before they are written to a Kafka cluster. The demo uses KSQL for data enrichment, or you can optionally develop and run your own Kafka Streams application. A Kafka sink connector, kafka-connect-elasticsearch, then streams the data out of Kafka, applying another custom Kafka Connect transform called NullFilter. The data is materialized into Elasticsearch for analysis with Kibana.
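To make the sink side of the pipeline more concrete, here is a minimal sketch of how a connector with a single transform attached might be described to the Kafka Connect REST API (a JSON body with `name` and `config`, sent to `POST /connectors`). It is not the demo's actual configuration: the connector name, topic name, Elasticsearch URL, and the fully qualified class name used for NullFilter are all hypothetical placeholders, since the text above does not give them. The sketch only builds and prints the request body; it does not contact a Connect worker.

```python
import json

# Hypothetical placeholder: the text names the transform "NullFilter" but not
# its fully qualified class, so this value is an assumption for illustration.
NULL_FILTER_CLASS = "com.example.transforms.NullFilter"


def build_sink_request(name, topic, es_url):
    """Build a JSON body suitable for POST /connectors on Kafka Connect."""
    return {
        "name": name,
        "config": {
            # Connector class of the Confluent Elasticsearch sink connector.
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            # Kafka topic(s) to read from; the topic name is an assumption.
            "topics": topic,
            # Where the sink writes documents; the URL is an assumption.
            "connection.url": es_url,
            # Transforms are declared as a list of aliases, and each alias
            # gets its own "transforms.<alias>.type" entry.
            "transforms": "nullFilter",
            "transforms.nullFilter.type": NULL_FILTER_CLASS,
        },
    }


if __name__ == "__main__":
    request = build_sink_request(
        "elasticsearch-sink",       # hypothetical connector name
        "wikipedia-enriched",       # hypothetical topic name
        "http://localhost:9200",    # hypothetical Elasticsearch endpoint
    )
    print(json.dumps(request, indent=2))
```

The source connector and its kafka-connect-transform-wikiedit transform would follow the same pattern, with the transform alias and class declared under `transforms` in the source connector's configuration.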
You can find the documentation for running this demo, playbook, and video tutorials at https://docs.confluent.io/current/tutorials/cp-demo/docs/index.html.