Site Reliability Engineer - Flutter Functions, Hybrid - Porto

Porto, Porto District, Portugal

About Betfair Romania Development

Betfair Romania Development is the largest technology hub of Flutter Entertainment, with over 2,000 people powering the world’s leading sports betting and iGaming brands. Exciting, immersive and safe experiences are delivered to over 18 million customers worldwide, from our office in Cluj‑Napoca. Driven by relentless innovation and commitment to excellence, we operate our own unbeatable portfolio of diverse proprietary brands such as FanDuel, PokerStars, Sportsbet, Betfair, Paddy Power, and Sky Betting & Gaming.

Our Values

We are looking for passionate individuals who align with our values and are committed to making a difference.

Win together | Raise the bar | Got your back | Own it | Positive impact

About Flutter Functions

The Flutter Functions division is a key component of Flutter Entertainment, responsible for providing essential support and services across the organization. The division encompasses various corporate functions, including finance, legal, human resources, technology, and more, ensuring seamless operations and strategic alignment throughout the company.

Role Overview

The Site Reliability Engineer will be responsible for ensuring the reliability, availability, and performance of Flutter Entertainment's critical gaming and betting platforms across our global operations. This role combines software engineering expertise with operational excellence to maintain 24/7/365 service availability for millions of customers worldwide. As part of the Service Management Function within Flutter Functions, you will collaborate closely with development teams, infrastructure specialists, and business stakeholders to maintain the high-performance, scalable systems that power our iGaming & Sport platforms across multiple markets. Your role will involve implementing automation, monitoring, and incident response procedures to support Flutter's mission of delivering world-class entertainment experiences.

You understand and embrace the philosophy of continuous improvement and have experience of leading teams operating within a continuous improvement (CI) culture. You don't complain about recurring incidents; you drive process improvements and implement preventative measures to eliminate root causes. You work with internal and external teams to drive best-in-class practices and develop real-world solutions and positive user experiences for every interaction.

This role requires exceptional communication skills, as interaction and engagement with senior management during incident escalations and post-incident reviews will be a regular aspect of the role.

Key Accountabilities & Responsibilities

Maintain 99.9%+ uptime for critical gaming and betting platforms serving millions of concurrent users
Design and implement monitoring, alerting, and observability solutions using tools such as Grafana, Splunk & CloudWatch
Conduct capacity planning and performance optimization to ensure systems can handle peak loads during major sporting events
Establish and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services with support from Service Management
Support ProdOps and Service Management teams during P1/P2 incident response, providing technical expertise and facilitating cross-functional coordination to minimize customer impact
Collaborate with Service Management on post-incident reviews, contributing technical insights and supporting the implementation of preventative measures to reduce repeat occurrences
Assist in developing and maintaining comprehensive runbooks and incident response procedures in partnership with Service Management teams
Grafana Stack Management: Design, deploy, and maintain comprehensive Grafana dashboards for real-time system visibility across all Flutter platforms
Advanced Visualization: Create custom Grafana panels and dashboards for business metrics, technical KPIs, and operational insights tailored to different stakeholder needs
Multi-Source Data Integration: Configure and optimize Grafana data sources including Prometheus, InfluxDB, Elasticsearch, CloudWatch, and custom APIs
Alerting Strategy: Implement intelligent alerting rules using Grafana Alerting, reducing alert fatigue while ensuring critical issues are promptly escalated
Performance Monitoring: Establish application performance monitoring (APM) using Grafana Agent and integrate with existing observability stack
Custom Metrics Development: Work with development teams to implement custom business and technical metrics that provide actionable insights
Partner with development teams to improve application reliability and deployment practices
Mentor junior team members and contribute to the development of SRE practices across Flutter
Participate in architecture reviews and provide reliability expertise for new system designs
Document procedures, troubleshooting guides, and system architecture for knowledge sharing
Look for ways to use AI to triage and investigate alerts allowing for more rapid resolution
Use AI to find root cause by connecting the dots between code changes, alerts and past incidents
Investigate the use of AI to provide more collaboration and identify possible resolutions to incidents

Skills, Capabilities & Experience Required

Cloud Platforms: Advanced experience with AWS, Azure, or Google Cloud Platform services and architecture
Containerization: Proficiency with Docker and Kubernetes for container orchestration and management
Programming: Strong scripting abilities in Python, Go, Bash, or PowerShell; familiarity with Java or .NET advantageous
Monitoring & Observability: Hands-on experience with Prometheus, Grafana, ELK stack, or similar monitoring solutions
CI/CD: Proficiency with Jenkins, GitLab CI, Azure DevOps, or similar continuous integration tools
Database Technologies: Working knowledge of SQL databases (PostgreSQL, MySQL) and NoSQL solutions
Networking: Understanding of load balancers, CDNs, DNS, and network security principles

Benefits

Hybrid & remote working options
€1,000 per year for self-development
Company share scheme
25 days of annual leave per year
20 days per year to work abroad
5 personal days/year
Flexible benefits: travel, sports, hobbies
Extended health, dental and travel insurance
Customized well-being programmes
Career growth sessions
Thousands of online courses through Udemy
A variety of engaging office events

Disclaimer

We are an inclusive employer. By embracing diverse experiences and perspectives, we create a lasting, positive impact for our employees, customers, and the communities we’re part of. You don't have to meet all the requirements listed to apply for this role. If you need any adjustments to make this role work for you, let us know, and we’ll see how we can accommodate them.

We thank all applicants for their interest; however, only the candidates who best meet the job requirements will be contacted for an interview.

By submitting your application online, you agree that your details will be used to progress your application for employment. If your application is successful, your details will be used to administer your personnel record. If your application is unsuccessful, we will retain your details for a period no longer than three years, to consider you for prospective roles within the company.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Software Development

Detailed information about the job offer

Company:	Betfair Romania Development
Location:	Porto, Porto District, Portugal
Published:	31.10.2025 (current job opening)
