Site Reliability Engineer
We are looking for a Site Reliability Engineer to join the team of our client, a company specializing in the technology sector.
Responsibilities
- Operate and support the production environment, responding to incidents and ensuring systems remain highly available;
- Triage and troubleshoot production issues across services, infrastructure and network layers;
- Monitor systems using observability tools, contributing to alert tuning and service level objectives;
- Collaborate with platform teams to improve reliability, operability, and scalability;
- Execute standard operational procedures (e.g., deployments, rollbacks, failovers);
- Identify common business-as-usual (BAU) operational tasks and automate them in a safe, auditable and scalable way.
Requirements
- Degree in Computer Science, Engineering, or a similar field;
- At least 2 years of experience in a similar role;
- Solid understanding of Linux systems administration (troubleshooting, permissions, system services);
- Experience with AWS services (e.g., VPCs, EC2, S3, IAM, EKS) and Kubernetes;
- Hands‑on experience with production environments, preferably in roles such as SRE, Cloud Support Engineer or Production Support Engineer;
- Familiarity with incident response and operational runbooks;
- Scripting or programming skills in Bash, Go, Python, or a similar language;
- Familiarity with CI/CD pipelines and deployment automation;
- Knowledge of monitoring/logging tools such as Prometheus, Grafana and ELK;
- Exposure to security and compliance practices in cloud environments;
- Strong communication and collaboration skills;
- Calm under pressure, particularly during incident response;
- Eagerness to learn and continuously improve operational excellence;
- Fluency in English, written and spoken.
Sounds like you? Send us your CV and let’s talk!
Job details
Company: QiBit
Location: Braga, Braga, Portugal
Published: 18.11.2025