Graduate
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other availability-related concerns. Our goal is to build, scale and guard the systems that delight our guests.
Responsibilities
Design, write and build tools to improve reliability, latency, availability and scalability
Engender reliability and availability starting with metrics and measurements
Enable scaling by providing tools, developing training and/or augmenting processes
Build tools and automation to prevent recurrence of problems in mission-critical products and services
Engage with development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
Dynamically manage the SRE team's workload; drive and deliver on multiple priorities simultaneously
Provide thought leadership in architecture, design, and product features, and give feedback on products built on a variety of platforms
Design, code, test, and deliver software to automate manual operational work
Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
Identify application patterns and analytics in support of better service level objectives
Design self-healing and resiliency patterns
Design automated software and product upgrades, change management, and release management solutions
Coach or manage teams as applicable
Participate in the 24x7 support coverage as needed
Be self-motivated and able to work with minimal supervision
Qualifications
Bachelor's degree or equivalent experience in a software engineering discipline
3 to 6 years of experience.
Software development experience in one or more of the following programming languages is a must: Python, Go
Expertise in designing, coding, testing, and delivering software in at least one technology stack
Experience with DevOps tools: Jenkins, Ansible, Git workflows, and CI/CD
Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
