Lead Site Reliability Engineer

Full Time
Posted 7 months ago

Required Experience Level – 6 to 9 years

● Implement secure, scalable and automated infrastructure architectures on Public Cloud Platforms (AWS/GCP, etc.).
● Serve as the primary point of responsibility for the overall operability, resiliency, performance, and capacity of owned production services.
● Collaborate with Engineering Leads to execute strategic changes in the Infrastructure based on the product roadmap.
● Collaborate with other SREs, L2/Support and Developers in the deployment and scaling of new product features to facilitate rapid iteration and massive growth.
● Develop tools to improve our ability to rapidly deploy and effectively monitor production applications in a large-scale Linux environment.

● Proven experience in Cloud Platforms – AWS (preferred)/GCP.
● Proven experience in Linux systems administration.
● Proven production service troubleshooting skills that span applications, systems and networks.
● Strong experience in web application concepts and standards.
● Experience managing and mentoring small teams of junior DevOps/SRE members.
● Demonstrated programming skills in any of Ruby/Python/Java, etc.
● Solid understanding of operational principles, such as capacity planning, monitoring and incident handling.
● Very comfortable working in an agile, DevOps-oriented capacity, alongside Development partners.


● Languages: Ruby/Python/Java/Go/Shell, etc.
● Infrastructure as Code: Ansible/Chef/Salt, Terraform, etc.
● Web Frameworks: Rails/Sinatra/Django/Spring, etc.
● Databases: PostgreSQL, MySQL (hosted and RDS/Cloud SQL).
● NoSQL: Redis, MongoDB, Riak.
● Version Control: Git, SVN.
● Appservers: Passenger (Nginx, Apache)/Puma/Unicorn/mod_wsgi/JBoss.
● Load Balancers: Nginx/HAProxy/F5 BigIP.
● Collaboration & ALM: Trello/Targetprocess/Jira.
● Build Tools: Rake/Paver/Ant.
● Continuous Integration: Jenkins/ThoughtWorks GoCD.
● Monitoring & Metrics Tools: Nagios/Zabbix/SaaS monitoring tools like Scout/Datadog.
● APM: New Relic/Dynatrace/Honeybadger.
● Log Management: Sumo Logic/Kibana/Splunk/ELK.
● Cloud Platforms: AWS/Google Cloud.
● Container Orchestration: Kubernetes/Mesos/Docker Swarm.
● Cloud Networking: VPC, Calico/Flannel, etc.
● OS: RHEL/Debian and their derivatives; Windows Server 2008/2012/2016, etc.

Job Features

Job Category: Engineering
