by Gaurav on Monday, 21 May 2012

Status: Confirmed

Session type

Technical level


Monitoring a distributed system for system- and application-level statistics using popular open source tools


Active real-time monitoring is one of the most basic prerequisites for designing a scalable distributed system. The easier it is to add and track custom metrics across the system, the easier it is to get a clear picture of current performance, identify bottlenecks, and implement the design changes needed to scale in a particular direction.

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Nagios is a popular IT infrastructure monitoring tool, which we use to manage email and SMS alerts. This talk covers how we use and integrate these open source tools to build a customized system that is easy to integrate with and gathers metrics centrally. It gives us a clear picture of the current state of the server farm, lets us run commands in parallel across a selection of servers, and notifies us of any erroneous state as soon as it occurs.
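As a rough illustration of the three pieces mentioned above (custom metrics, alerting, and parallel commands), here is a minimal Python sketch. It is not the setup presented in the talk. It assumes Ganglia's `gmetric` command-line tool is available on each host and relies on the standard Nagios plugin exit codes; the function names, the metric name, and the thresholds are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def gmetric_command(name, value, metric_type="float", units=""):
    """Build the argument list for a gmetric call that publishes one
    custom metric to Ganglia (hypothetical helper, not part of Ganglia)."""
    cmd = ["gmetric", "--name", name, "--value", str(value),
           "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def nagios_status(value, warn, crit):
    """Map a metric value to a Nagios exit code.
    Assumes higher values are worse; None means the metric is missing."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_on_hosts(hosts, run_one, max_workers=8):
    """Call run_one(host) for every host in parallel and
    return a {host: result} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(run_one, hosts)))
```

In practice, the list returned by `gmetric_command` could be passed to `subprocess.run`, the value returned by `nagios_status` could be used as the exit code of a Nagios check script, and `run_one` could wrap an SSH call to each selected server.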

Speaker bio

I am a Linux enthusiast who works with the Platforms & Systems team at Capillary Technologies, where I develop various cloud applications and optimize them for scalability.


    Kartik Singhal (@k4rtik) 5 years ago

    Can we get the slides of your presentation?
