Artificial intelligence, biometrics, cryptography, embedded systems and software, and cyber security.
The Site Reliability Engineer, as part of the SRE team, is responsible for providing automated processes for building and deploying software in our own data center (DC) and on AWS, and for developing the scripts and software the SRE team needs for its activities. The SRE also handles operations related to monitoring SLA-critical production platforms, resolving issues and performing manual interventions. All of these activities are carried out in close cooperation with the software development teams.
Key Responsibilities / Main Activities:
•Deploy platforms on public or private cloud environments and work closely with the Development team to prepare operations.
•Harden and automate platforms before they go live by reviewing their design and implementation, tuning configuration as well as developing auxiliary tools and necessary monitoring of critical health indicators.
•Maintain platforms after go-live by measuring and monitoring their availability, performance and overall system health.
•Recover platforms during production incidents to meet the targeted SLOs; perform detailed root cause analysis to prevent regressions.
•Provide technical expertise on products and support processes to internal and external customers, including defining SLIs/SLOs acceptable to all involved parties.
•Provide technical and first-level business support to customers on an 8/5 basis. Ensure that each product includes all required O&M functionality.
•Validate readiness and maturity of new rollouts through development, execution and verification of automated smoke test suites.
•Help onboard new customers to existing services and assist them during their technical onboarding.
•Participate in on-call duties.
Typical education: Engineering degree
Experience: 2+ years
Technologies needed: Kubernetes, Docker, Bash scripting and automation, Java 11+, Git.
Experience with AWS services would be a plus.