Site Reliability Engineer - Production Engineering
About the Job:
We're hiring a Site Reliability Engineer to join one team and support the reliability and performance of next-generation software systems at OutSystems, a global leader in low-code application development platforms.
If you like automation, cloud-native infrastructure, and enabling development teams to build reliable and scalable systems, this is your chance to play a key role in a forward-thinking tech environment.
Responsibilities:
Act as a key partner to development teams, driving the adoption of Site Reliability Engineering best practices.
Define, implement, and manage Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
Design and implement resilient, secure, and scalable infrastructure using cloud-native technologies.
Lead efforts to improve observability, including monitoring, alerting, logging, and tracing.
Drive the incident management lifecycle, from detection to resolution, and lead post-incident reviews.
Automate operational processes to reduce toil and increase system reliability.
Champion a culture of continuous improvement, knowledge sharing, and accountability.
Participate in a 24/7 on-call rotation to support production systems.
Required Skills & Experience:
5+ years of experience in Software Engineering, DevOps, or Site Reliability Engineering.
Proficiency in at least one programming language: Python, Go, Java, C#, or similar.
Experience with Kubernetes, EKS, and container orchestration platforms.
Familiarity with AWS services (e.g., EC2, RDS, Lambda, ELB, CloudFront).
Hands-on experience with Infrastructure as Code (Terraform, CloudFormation, Puppet, etc.).
Skilled in implementing monitoring and incident management using tools like Prometheus, Grafana, ELK stack, or equivalent.
Strong troubleshooting and debugging skills, especially in distributed systems.
Fluent in English with excellent communication skills.
Nice to Have:
Certifications such as CKA, CKAD, or CKS.
Experience with Spacelift, Chef, or similar automation tools.
Familiarity with SLOs, SLIs, and data-driven reliability engineering.
Soft Skills:
Excellent problem-solving mindset and top-down analytical approach.
A humble, collaborative attitude and the ability to take ownership.
Skilled in negotiation, expectation management, and stakeholder communication.
Process-oriented and eager to challenge inefficiencies.
Location & Collaboration:
100% remote, but candidates must be based in Portugal.
Strong collaboration with international product teams.
Participation in on-call support rotation required.
Detailed Information About the Job Offer:
Company: emagine - Portugal
Location: Faro, Faro District, Portugal
Published: 23 May 2025