by Shailesh Hegde (@shlsh) on Monday, 18 January 2016

Status: Confirmed & Scheduled
Section
Crisp talk

Technical level
Intermediate

Objective

To discuss the challenges of resiliency testing in large-scale cloud deployments and how to automate that testing (think Chaos Monkey, but with a few key differences).

Description

This talk will cover the following:

  • What is resiliency in a large-scale distributed system?
  • Challenges in resiliency testing of a large-scale distributed system that depends on third-party applications and protocols: messaging (RabbitMQ/AMQP), caching and NoSQL stores (Couchbase, Cassandra), service discovery (ZooKeeper), and media (SIP, RTP, H.323, PSTN, audio/video)
  • Gotchas: what you think won’t fail, but does
  • The Goblin framework (we are working to open source it in Q1 2016), which induces faults, runs tests, verifies results, and recovers the system, all in a controlled manner
  • How to use Goblin for live group testing as well as nightly automated runs
  • Extending Goblin to other systems
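
The induce, test, verify, recover cycle listed above can be illustrated with a minimal Python sketch. This proposal does not show Goblin's API, so nothing here should be read as Goblin's actual design: `FaultScenario`, `ScenarioResult`, `run_scenario`, and the toy `system` dictionary are all hypothetical names chosen for illustration. The one idea the sketch tries to capture is the "controlled manner" part: recovery is always attempted, even when the tests fail or raise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class FaultScenario:
    """Hypothetical description of one resiliency experiment (not Goblin's API)."""
    name: str
    inject: Callable[[], None]      # induce the fault, e.g. stop a broker node
    run_tests: Callable[[], bool]   # exercise the system; True means it coped
    recover: Callable[[], None]     # undo the fault and restore the system


@dataclass
class ScenarioResult:
    name: str
    passed: bool      # did the tests pass while the fault was active?
    recovered: bool   # did the recovery step complete without error?


def run_scenario(scenario: FaultScenario) -> ScenarioResult:
    """Inject a fault, run the tests, and always attempt recovery."""
    passed = False
    recovered = False
    try:
        scenario.inject()
        passed = scenario.run_tests()
    except Exception:
        # A crash during injection or testing counts as a failed scenario.
        passed = False
    finally:
        # Recovery runs regardless of the test outcome.
        try:
            scenario.recover()
            recovered = True
        except Exception:
            recovered = False
    return ScenarioResult(scenario.name, passed, recovered)


def demo() -> Tuple[ScenarioResult, Dict[str, bool]]:
    """Run one scenario against a toy in-memory 'system' (an assumption)."""
    system = {"broker_up": True}
    scenario = FaultScenario(
        name="stop-message-broker",
        inject=lambda: system.update(broker_up=False),
        # Stand-in check: a real test would verify that clients fail over.
        run_tests=lambda: system["broker_up"] is False,
        recover=lambda: system.update(broker_up=True),
    )
    return run_scenario(scenario), system


if __name__ == "__main__":
    result, system = demo()
    print(result, system)
```

A nightly automated run could, under the same assumptions, loop over a list of such scenarios and report any result where `passed` or `recovered` is false.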

Requirements

Working in Linux-based cloud environments

Speaker bio

Currently working as a lead QA engineer at BlueJeans Network, and part of the core team that built Goblin.

Comments

  • Virendra Singh Bhalothia (@bhalothia) a year ago

    All the best, Shailesh!

  • Philip Paeps (@trouble) a year ago

    What is the status of the open sourcing effort? Will it be open source by April?

  • Shailesh Hegde (@shlsh) a year ago

    Philip, yes. I expect it to be open source by April.
