Senior Site Reliability Engineer, Observability - Lisboa

Lisboa, Lisboa, Portugal

Overview


Chainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web. Chainlink is the industry-standard platform for providing access to real-world data, offchain computation, and secure cross-chain interoperability across any blockchain. Chainlink Labs helps power verifiable applications for banking, DeFi, global trade, and gaming by collaborating with some of the world’s largest financial institutions, notably Swift, DTCC, and ANZ. Chainlink Labs also works with top Web3 teams, including Aave, Compound, GMX, Maker, and Synthetix. Chainlink Labs was ranked as one of the Global Top 100 Most Loved Workplaces by Newsweek 2025.

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact on the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow. We are committed to diversity and inclusion and welcome applicants from all backgrounds, even if you don’t meet 100% of the job requirements.

Responsibilities

Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform
Support multiple telemetry types, such as metrics, logs, and traces
Define and support modern observability governance, and address observability problems at scale
Ensure reliability, security, and performance exceed our defined SLAs
Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
Lead the design and deployment of monitoring/observability services to detect and alert the team of needed action
Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline
Oversee the availability, performance, and supportability of our observability infrastructure
Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data
Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release
Champion reliability and security by taking the time to do your work right the first time

Requirements

7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before
Ability to develop software outside of the scope of typical infrastructure requirements and configurations
Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
Expert knowledge in all aspects of designing, developing, and managing large-scale, real-time systems
Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution such as an ELK Stack, Splunk, or the Grafana Stack
Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them
Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews

Desired Qualifications

Excitement for blockchain, Web 3.0, and similar decentralized technologies
Experience running any infrastructure in the blockchain/web3 space
Ability to scale systems sustainably through automation and reliability improvements
Experience working remotely in a distributed team
A strong desire to grow and automate services to reduce toil

Tools and environment

AWS; Terraform/Terragrunt; Kubernetes, Calico, and Argo CD; Prometheus and Grafana; GitHub Actions; Packer
We expect you to be comfortable with most of those tools and proficient in several of them

All roles with Chainlink Labs are global and remote-based. We ask that you overlap some working hours with EST.

We carefully review all applications and aim to respond within two weeks after the job posting closes. The closing date is listed on the job advert.

Commitment to Equal Opportunity: Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability, contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants: Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit is subject to our Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.


Job offer details

Company:	Chainlink Labs
Location:	Lisboa, Lisboa, Portugal
Published:	31.10.2025 (current vacancy)

