Senior Platform Engineer (Kubernetes & Data Infrastructure)
About Sybilion
Sybilion builds AI-driven market forecasting for process industries (chemicals, packaging, pulp & paper, textiles, and broader manufacturing). We help procurement, supply chain, and commercial teams make better buy/sell decisions by turning messy external signals and internal operational data into clear, defensible forecasts that teams trust and act on.
Our stack includes Python-based microservices, PostgreSQL data infrastructure, and ML/AI workflows that support forecasting models and decision tooling.
About the Role
We’re hiring someone to own both our platform and data infrastructure: Kubernetes administration, Linux systems, CI/CD, observability, and PostgreSQL administration for our data lakes and ML pipelines. You’ll keep production reliable, fast, secure, and scalable, while supporting the day-to-day needs of our engineers and ML workflows.
This is an on-site role in Maia (Porto). We value in-person collaboration and move quickly.
What You’ll Do
Platform / Kubernetes / Systems
- Design, deploy, and operate Kubernetes clusters in production (networking, storage, security)
- Operate Linux server infrastructure (Ubuntu/RHEL), patching, hardening, and reliability
- Manage Docker image lifecycle (builds, optimisation, registry management, security scanning)
- Implement and maintain CI/CD pipelines for microservices deployments and infrastructure changes
- Build and maintain Infrastructure as Code (Terraform, Ansible, Helm) and Git workflows
- Operate and improve monitoring, logging, and alerting (Prometheus/Grafana, ELK/EFK/Loki, etc.)
- Manage secrets and credentials securely (Vault, Sealed Secrets, or equivalent)
- Ensure high availability, capacity planning, incident response, and disaster recovery readiness
- Support GPU-enabled workloads and ML/LLM deployments (resource allocation, utilisation, scaling)
PostgreSQL / Data Infrastructure
- Administer and optimise PostgreSQL databases and data lake infrastructure (performance, reliability, cost)
- Own backup/recovery and disaster recovery procedures (including point-in-time recovery)
- Design schemas, indexing strategies, and query optimisation approaches; analyse execution plans
- Manage migrations and versioning (schema changes, rollout strategies, rollback plans)
- Implement replication/failover/clustering patterns for high availability
- Own database security: access controls, encryption at rest/in transit, audit logging, compliance needs
Python Microservices / Data Pipelines / ML Workflows
- Support deployment and troubleshooting of Python microservices (FastAPI/Flask/Django or similar)
- Help maintain Python environments and dependency management (pip/poetry/conda/mamba)
- Support ETL/ELT pipelines feeding our data lake and ML training workflows
- Implement data quality checks and validation where needed
- Partner with engineers and ML team to improve runtime performance, reliability, and operational visibility
Must-Have Experience (Required)
- 5+ years of hands-on production experience in: Linux, Docker, Kubernetes, and PostgreSQL
- Strong Kubernetes administration skills (clusters, networking, ingress, storage, RBAC, security)
- Strong PostgreSQL administration skills (performance tuning, backups, replication/HA, security)
- Strong Linux systems skills (operations, troubleshooting, hardening)
- CI/CD experience (GitHub Actions/GitLab CI/Jenkins or similar)
- Infrastructure as Code experience (Terraform and/or Ansible; Helm for Kubernetes)
- Observability experience (metrics, logs, alerting; root-cause analysis)
- Solid Python literacy for debugging services and automating operational tasks
- Strong communication skills in English and comfort working independently end-to-end
- Willingness to participate in an on-call rotation for critical systems
Preferred (Nice to Have)
- Startup background (you’ve worked in small teams, moved fast, and owned outcomes end-to-end)
- Experience running ML infrastructure (MLflow, Kubeflow, Airflow, KServe/TorchServe, etc.)
- GPU cluster experience (NVIDIA GPU Operator or similar) and model serving optimisation
- Experience with service mesh (Istio/Linkerd)
- Experience with cloud managed databases (AWS RDS, GCP Cloud SQL, Azure Database)
- Familiarity with data lake / warehouse patterns and data versioning (DVC/MLflow tracking)
- Experience with Redis/MongoDB or other complementary data systems
Soft Skills We Value
- Strong problem-solving and analytical mindset
- Calm, structured incident handling and good judgement under pressure
- Proactive improvement orientation (you spot issues before they become outages)
- High bar for security, documentation, and operational hygiene
- Collaborative approach with product, engineering, and ML teams
What We Offer
- €40,000–€70,000 salary range (depending on experience)
- Professional development and training budget
- Modern office environment in Maia, Porto
- Opportunity to work with cutting-edge ML/AI infrastructure and real-world data systems
- Career growth path within a growing technology organisation
- Coffee and snacks
Work Environment
This is an on-site position based in our Maia office. We value in-person collaboration and believe being together improves speed, clarity, and ownership.
Job listing details
Company: Sybilion
Location: Viseu, Viseu District, Portugal
Published: 9.1.2026