Site Reliability Engineer - Production Engineering
About the Job:
We're hiring a Site Reliability Engineer to join one team and support the reliability and performance of next-generation software systems at OutSystems, a global leader in low-code application development platforms.
If you like automation, cloud-native infrastructure, and enabling development teams to build reliable and scalable systems, this is your chance to play a key role in a forward-thinking tech environment.
Responsibilities:
- Act as a key partner to development teams, driving the adoption of Site Reliability Engineering best practices.
- Define, implement, and manage Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Design and implement resilient, secure, and scalable infrastructure using cloud-native technologies.
- Lead efforts to improve observability, including monitoring, alerting, logging, and tracing.
- Drive the incident management lifecycle, from detection to resolution, and lead post-incident reviews.
- Automate operational processes to reduce toil and increase system reliability.
- Champion a culture of continuous improvement, knowledge sharing, and accountability.
- Participate in a 24/7 on-call rotation to support production systems.
Required Skills & Experience:
- 5+ years of experience in Software Engineering, DevOps, or Site Reliability Engineering.
- Proficiency in at least one programming language: Python, Go, Java, C#, or similar.
- Experience with Kubernetes, EKS, and container orchestration platforms.
- Familiarity with AWS services (e.g., EC2, RDS, Lambda, ELB, CloudFront).
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, Puppet, etc.).
- Skilled in implementing monitoring and incident management using tools like Prometheus, Grafana, ELK stack, or equivalent.
- Strong troubleshooting and debugging skills, especially in distributed systems.
- Fluent in English with excellent communication skills.
Nice to Have:
- Certifications such as CKA, CKAD, or CKS.
- Experience with Spacelift, Chef, or similar automation tools.
- Familiarity with SLOs, SLIs, and data-driven reliability engineering.
Soft Skills:
- Excellent problem-solving mindset and top-down analytical approach.
- A humble, collaborative attitude and the ability to take ownership.
- Skilled in negotiation, expectation management, and stakeholder communication.
- Process-oriented and eager to challenge inefficiencies.
Location & Collaboration:
- 100% remote, but candidates must be based in Portugal.
- Strong collaboration with international product teams.
- Participation in on-call support rotation required.
Detailed information about the job offer:
Company: emagine - Portugal
Location: Leiria, Leiria District, Portugal
Published: 23 May 2025