Name: Harel Ben Attia
Talk Title: River - A data flow management infrastructure
Starting from the algorithms which are at Outbrain's core, and ending with internal and customer reporting, the Outbrain backend is a data processing monster. As the company grows, our data processing needs grow as well, leading to very complex dependencies between the various processes. These dependencies pose a growing challenge, both from an operational viewpoint and from a development viewpoint. The Outbrain River infrastructure has been created in order to provide a solution for this challenge.
Outbrain River provides the following major features:
- Declarative job definitions
- Event-driven dependency management
- Decentralized development of data flows
- Ops-level manageability
- Out-of-the-box support for JDBC and Hive/Hadoop, easily extensible to any other unit-of-work
- A clear roadmap for distributed processing and high availability
Outbrain is working on open sourcing River.
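As a rough illustration only of how declarative job definitions and event-driven dependency management can fit together, here is a minimal Python sketch. It is not River's actual API: the job names, the `waits_for` and `emits` keys, and the `run_flow` function are all hypothetical and chosen purely for this example.

```python
from collections import defaultdict, deque

# Hypothetical declarative job definitions (not River's real format):
# each job lists the events it waits for and the event it emits when done.
JOBS = {
    "load_clicks": {"waits_for": [], "emits": "clicks_loaded"},
    "load_views": {"waits_for": [], "emits": "views_loaded"},
    "compute_ctr": {
        "waits_for": ["clicks_loaded", "views_loaded"],
        "emits": "ctr_ready",
    },
    "customer_report": {"waits_for": ["ctr_ready"], "emits": "report_ready"},
}


def run_flow(jobs, unit_of_work):
    """Run each job as soon as every event it waits for has fired.

    `unit_of_work` is a callable taking the job name; it stands in for
    whatever the job really does (a JDBC query, a Hive job, and so on).
    Returns the list of job names in the order they were run.
    """
    # Events each job is still waiting for.
    waiting = {name: set(spec["waits_for"]) for name, spec in jobs.items()}

    # Reverse index: event name -> jobs subscribed to that event.
    subscribers = defaultdict(list)
    for name, spec in jobs.items():
        for event in spec["waits_for"]:
            subscribers[event].append(name)

    # Jobs with no dependencies can start immediately.
    ready = deque(sorted(name for name, events in waiting.items() if not events))
    order = []

    while ready:
        name = ready.popleft()
        unit_of_work(name)
        order.append(name)

        # Completing a job fires its event, which may release subscribers.
        event = jobs[name]["emits"]
        for sub in subscribers[event]:
            if event in waiting[sub]:
                waiting[sub].remove(event)
                if not waiting[sub]:
                    ready.append(sub)

    return order


if __name__ == "__main__":
    run_flow(JOBS, lambda name: print("running", name))
```

In this sketch no job refers to another job directly, only to events, so a team could add a new job that waits for `ctr_ready` without touching the definitions of the jobs upstream of it.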
I am a senior software engineer with 12 years of experience in the field. I've been working for Outbrain in the Data Infrastructure team for the last year, and previously worked at VMware and B-hive Networks. With more than 7 years of experience in large scale monitoring, lots of OS and networking knowledge, and experience working on big data infrastructures, I am an "all around" engineer. Having been part of the world of software since the Sinclair Spectrum I had at age 10, and now working with Hadoop and Storm clusters on live systems, I enjoy both the macro and the micro of the software engineering world.
I am the creator of q - a Linux tool which merges the world of Linux and the world of databases by allowing text files to be treated as databases.
Specialties: Monitoring, Performance Analysis, Large scale designs and topologies, Python and Java, Linux, DevOps
Twitter: @harelba