by Gaurav on Monday, 21 May 2012
- Session type
- Technical level
To be able to monitor a distributed system for system-level and application-level statistics using popular open source tools.
Active real-time monitoring is one of the most basic prerequisites for designing a scalable distributed system. The easier it is to add and track custom metrics across the distributed system, the easier it is to get a clear idea of current system performance, identify bottlenecks, and implement design changes to scale in a particular direction.
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Nagios is a popular IT infrastructure monitoring tool, which we use for managing email/SMS alerts. This talk covers how we use and integrate these open source tools to build a customized system that is easy to integrate with and gathers metrics centrally. It gives us a clear picture of the current state of the server farm, lets us run commands in parallel across a selection of these servers, and notifies us of any erroneous state as soon as it occurs.
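As a rough illustration of the kind of integration described above, here is a minimal sketch in Python. It is not the setup presented in the talk. It assumes Ganglia's `gmetric` command-line tool is installed on the host, and it relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL). The metric name `queue_depth`, the thresholds, and the helper names `gmetric_command`, `publish` and `nagios_status` are all hypothetical.

```python
import shutil
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def gmetric_command(name, value, metric_type="uint32", units=""):
    """Build the argument list for Ganglia's gmetric CLI without running it."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish(name, value, metric_type="uint32", units=""):
    """Send the metric if gmetric is on PATH; return True when it was sent."""
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(gmetric_command(name, value, metric_type, units), check=True)
    return True


def nagios_status(name, value, warn, crit):
    """Map a metric value to a Nagios exit code and a one-line status message."""
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - {name}={value}"


if __name__ == "__main__":
    depth = 42  # hypothetical application-level metric
    publish("queue_depth", depth, units="jobs")
    code, message = nagios_status("queue_depth", depth, warn=30, crit=100)
    print(message)
    raise SystemExit(code)  # Nagios reads the plugin's exit code
```

The same value feeds both tools: Ganglia records it for graphing and trend analysis, while the exit code lets Nagios decide whether to raise an alert.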
I am a Linux enthusiast who works with the Platforms & Systems team at Capillary Technologies. I develop various applications in the cloud and optimize them for scalability.