Site Reliability Engineer
About Us
Planck Technologies is a company specializing in software development, dedicated to shaping futures and creating value through innovative IT solutions. By expanding teams and offering a comprehensive range of services, from Software Development and Infrastructure Management to Cybersecurity, we give clients all the expertise they need in one place. Inspired by the principles of quantum physics, we push beyond traditional boundaries to deliver customized solutions that redefine the IT landscape and drive shared success.
About the job
As a Site Reliability Engineer with a focus on Observability Engineering, you will:
Design, implement, and maintain end-to-end observability solutions across metrics, logs, traces, and real user monitoring;
Work hands-on with the Grafana stack (Grafana Cloud, Tempo, Loki, Mimir, Alloy) and OpenTelemetry to build scalable observability pipelines;
Develop reliable alerting and monitoring systems based on SLOs/SLAs, emphasizing automation and self-healing solutions;
Ensure data accuracy and system visibility by maintaining the health of telemetry flows from instrumentation to visualization;
Collaborate with development and operations teams to embed observability principles throughout the software delivery lifecycle;
Establish and advocate best practices, standards, and frameworks for observability across the organization;
Drive modernization by evolving legacy monitoring and alerting systems into a unified, future-proof observability platform;
Partner with FinOps to monitor and optimize observability costs while improving platform efficiency.
What are we looking for?
Must-have:
3+ years of professional experience as an SRE, Observability Engineer, or similar role;
Hands-on experience with OpenTelemetry or equivalent instrumentation frameworks;
Proficiency in Kubernetes, Helm, Terraform, and ArgoCD;
Strong expertise in designing and managing telemetry pipelines (metrics, logs, traces), including exporters and sidecars;
Solid background in performance monitoring, alerting, dashboarding, and root cause analysis;
Knowledge of Java development and code-level instrumentation;
Strong mindset oriented toward automation and ownership (“you build it, you run it” culture);
Fluency in English (written and spoken).
Nice-to-have:
Knowledge of APM and distributed tracing solutions;
Experience applying FinOps practices to observability cost optimization;
Proven track record in replacing or modernizing legacy monitoring stacks;
Familiarity with cloud environments (Azure preferred);
Contributions to open-source observability tools and communities.
Location: Remote, Leiria
We're waiting for you!
Detailed information about the job offer
Company: Planck Technologies
Location: Lisbon, Portugal
Published: 22 August 2025