by Ayyappadas Ravindran Nair (@ayyappa) on Sunday, 31 January 2016

Status: Submitted
Full talk

Technical level


  1. Understanding the life cycle of a Kafka request
  2. Understanding how a trivial change (a metrics addition) caused a Kafka cluster to crumble under high load, resulting in frontend user impact (KAFKA-2664)


This talk is about a Kafka outage that caused frontend user impact. This is a very rare occasion at LinkedIn: an outage in a backend messaging system rarely affects the frontend. The presentation will cover the Kafka request life cycle, dissect a fetch request, profile various API calls, and describe how we fixed the issue.


A good understanding of Kafka and its ecosystem is expected; we won’t be able to cover Kafka basics.

Speaker bio

I lead the “Data Infra Streaming” SRE team at LinkedIn Bangalore. My team takes care of the Kafka, Samza and ZooKeeper platforms at LinkedIn. Before joining LinkedIn, I worked at Intuit and at Yahoo!, where I led a Service Engineering (SE) team responsible for Yahoo's Hadoop platform. A detailed profile can be found here.