Global leader in Augmented Identity
• Harden platforms before go-live by reviewing their design and implementation, tuning their configuration, and developing auxiliary tools and monitoring of critical health indicators
• Maintain platforms after go-live by measuring and monitoring their availability, performance, and overall system health
• Restore platforms during production incidents to meet target SLOs, and perform detailed root cause analysis to prevent recurrence. Work model: 8×5 plus on-call
• Proactively seek improvements to non-functional requirements, and work with development teams to improve the operational aspects of the platforms under your responsibility
• Validate the readiness and maturity of new rollouts by developing, executing, and verifying automated smoke test suites
• Provide technical expertise on products and support processes to internal and external customers, including defining SLIs/SLOs acceptable to all parties involved
Nice to have:
The candidate must be curious, autonomous, and highly motivated by new information technologies, as the role requires in-depth technical testing along with problem and root cause analysis
The SRE Engineer, as part of the Services Reliability Engineers team, is responsible for providing automated operations and preventive monitoring of SLA-critical production platforms.
SRE teams apply their technical background and engineering skills to improve the reliability, availability, and efficiency of the services they operate. In effect, it’s “what happens when a software engineer is tasked with what used to be called operations”, as Ben Treynor stated when setting up SRE teams for Google’s search engine.