Site Reliability Engineer

We are looking for a Site Reliability Engineer to join the team of our client, a company specializing in the technology sector.

Operate and support the production environment, responding to incidents and ensuring systems remain highly available;
Triage and troubleshoot production issues across services, infrastructure and network layers;
Monitor systems using observability tools, contributing to alert tuning and service level objectives;
Collaborate with platform teams to improve reliability, operability, and scalability;
Execute standard operational procedures (e.g., deployments, rollbacks, failovers);
Identify common BAU operational tasks and automate them in a safe, auditable and scalable way.

Degree in Computer Science, Engineering, or a similar field;
At least 2 to 3 years of experience in a similar role;
Solid understanding of Linux systems administration (troubleshooting, permissions, system services);
Experience with AWS services (e.g., VPCs, EC2, S3, IAM, EKS) and Kubernetes;
Hands‑on experience with production environments, preferably in roles such as SRE, Cloud Support Engineer or Production Support Engineer;
Familiarity with incident response and operational runbooks;
Skills in Bash, Go, Python, or similar;
Familiarity with CI/CD pipelines and deployment automation;
Knowledge of monitoring/logging tools such as Prometheus, Grafana and ELK;
Exposure to security and compliance practices in cloud environments;
Strong communication and collaboration skills;
Calm under pressure, particularly during incident response;
Eagerness to learn and continuously improve operational excellence;
Fluency in English, written and spoken.

Sounds like you? Send us your CV and let’s talk!
