Reliability Engineering Behind The Most Trusted Kafka Platform at LINE
Apache Kafka is one of the most frequently used pieces of middleware in service architectures at LINE. To meet the many requirements that arise in service development, we operate a common Kafka platform for internal services. It is used by many services with high service-level requirements, such as LINE, LINE Ads Platform, and LINE Pay, helping them build reliable services quickly and easily, which in turn makes the service-level requirements on our Kafka platform itself very high. Meeting those requirements, keeping cluster availability near 100% with response times under 10 milliseconds while handling traffic of over 260 billion messages per day, is an extreme challenge.
Recently we had an interesting case in which a Kafka broker performed poorly because of an extremely long JVM pause that was caused neither by GC nor by any other typical cause. In this talk I will describe how we investigated the issue, diving beyond the JVM into the Linux kernel, and how we finally solved it.
Reliability engineers, performance engineers, and anyone interested in Kafka, the JVM, or the Linux kernel
Yuto Kawamura is a Senior Software Engineer at LINE. He leads a team that provides the company-wide Kafka platform and does a lot of reliability and performance engineering work. He is an Apache Kafka contributor and has given many presentations at large conferences such as Kafka Summit 2017/2018 and LINE DEVELOPER DAY 2018.