## Resilience Engineer
LISBOA, Portugal
By connecting people, places, and things, Vodafone IoT enables organisations to thrive in the digital world. Leveraging our expertise in connectivity, our advanced IoT platform, and our extensive global reach, we deliver the results necessary for our customers' progress and success. We support businesses of all sizes and sectors in their efforts to connect for a better future.

The Vodafone Internet of Things (IoT) suite of products and services is specifically designed to meet the demands of emerging business verticals. Our connection base has experienced 20% year-on-year growth, reaching over 200 million connections by the end of the financial year 2025. Vodafone IoT maintains its position as a consecutive leader in the IoT Connectivity Gartner Magic Quadrant.

To address the technological needs of IoT, Vodafone has developed an industry-leading IoT Connectivity Management Platform, targeting key strategic growth opportunities to meet the global requirements of IoT customers. Vodafone has also carved out the IoT Connectivity business to secure additional external investment and maintain our leading position in the industry through the following:

1. Continue accelerating and enhancing our Platform as a Service for Vodafone customers on footprint.
2. Introduce service propositions in markets beyond Vodafone's current footprint.
3. Address the long-tail, lower-volume segment through a digital self-service platform globally.

We are seeking a senior Resilience Engineer to own and evolve the stability, availability, and recoverability of our IoT platforms. This role operates at the intersection of system architecture, reliability engineering, and operational excellence, with end-to-end accountability for designing resilience into our services. You will define and govern resilience strategies, influence platform architecture, and partner across product, infrastructure, and engineering teams to ensure our systems continue to perform under failure, scale, and unexpected disruption.

* Developing and governing resilience strategies across system architecture, deployment, monitoring, and incident response;
* Defining and tracking stability KPIs (e.g., MTTD, MTTR, error budgets), partnering with performance and operations teams to meet or exceed targets;
* Designing and implementing fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness;
* Collaborating with product, infrastructure, architecture and development teams to co-design services with built-in redundancy, failover, and graceful degradation;
* Driving automation and observability improvements to reduce noise, increase fault detection speed, and support predictive failure mitigation;
* Contributing to the design and maintenance of our Business Continuity and Disaster Recovery (BCDR) Plan, ensuring IoT systems remain resilient and recoverable in the face of unexpected disruptions;
* Owning the resilience roadmap and continuously assessing emerging threats, technologies, and architectural shifts to guide evolution of stability practices;
* Evangelizing a culture of resilience through internal communication, workshops, and post-incident learning programs;
* Delivering high-quality engineering solutions while continuously strengthening the resilience, scalability, and cost efficiency of our IoT platform;
* Consistently meeting or exceeding delivery expectations by prioritizing the highest-leverage resilience initiatives that improve customer experience, business outcomes, and financial performance;
* Building trusted, transparent, and data-driven relationships by providing clear technical direction and trade-off recommendations to business and engineering stakeholders.

## **Who you are**

* Educated to BSc degree level in Software Engineering, Computer Science, or a related discipline;
* Strong scripting and automation experience (e.g., Python, Bash, Go, PowerShell), with a demonstrated ability to replace manual processes with reliable, scalable automation;
* Proven experience designing and operating high-availability, fault-tolerant systems, including the use of chaos engineering techniques and proactive risk-mitigation strategies;
* Experience applying Business Continuity and resilience standards (e.g., ISO 22301) in the context of real-world platform design and operational readiness;
* Hands-on experience designing or integrating monitoring, alerting, and automated testing frameworks to support early fault detection and system validation;
* Broad experience working with Linux-based platforms across on-premises and cloud environments, with an understanding of how infrastructure choices impact reliability, scalability, and recovery;
* Deep expertise in Site Reliability Engineering principles, including SLOs/SLIs, error budgets, observability, toil reduction, and automation, with the ability to apply them at platform and system scale to guide architectural decisions and long-term resilience strategy;
* Proven ability to balance long-term platform stability with delivery velocity by making clear, data-driven trade-offs;
* Strong understanding of security principles, practices, and standards, and the ability to incorporate them into resilient, real-world technical solutions;
* Deep command of telemetry, logging, and alerting ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, Splunk), with the ability to design signals that enable early fault detection and informed decision-making;
* Experience defining meaningful SLIs and building dashboards that drive architectural insight, prioritization, and corrective action;
* Proven experience leading blameless post-incident reviews, root cause analysis, and systemic improvements across multiple teams;
* Expertise in identifying and addressing system bottlenecks, latency issues, and throughput constraints in distributed environments;
* Proficiency in forecasting demand, planning capacity, and managing system growth in a cost-efficient and sustainable manner;
* Strong track record of partnering with software engineering, infrastructure, product, and business teams to embed resilience into the full development lifecycle;
* Fluency in English.

We are a leading international Telco, serving millions of customers. At Vodafone, we believe that connectivity is a force for good. If we use it for the things that really matter, it can improve people's lives and the world around us. Through our technology we empower people, connecting everyone regardless of who they are or where they live, and we protect the planet, whilst helping our customers do the same.

Belonging at Vodafone isn't a concept; it's lived, breathed, and cultivated through everything we do. You'll be part of a global and diverse community, with many different minds, abilities, backgrounds and cultures. We're committed to increasing diversity, ensuring equal representation, and making Vodafone a place everyone feels safe, valued and included. If you require any reasonable adjustments or have an accessibility request as part of your recruitment journey, for example, extended time or breaks in between online assessments, please refer to for guidance. Together we can.

Top skills: Ansible

Job offer details

* Company: Vodafone
* Location: Viseu, Viseu District, Portugal
* Published: 14.1.2026