DevOps & ML Ops Engineer
The DevOps & ML Ops Engineer will be responsible for developing and maintaining scalable, stable services that deliver machine learning models to end users with guaranteed uptime. The primary focus will be on the infrastructure, deployment, and continuous integration/continuous delivery (CI/CD) processes for our ML services.
Responsibilities
- Manage resource allocation and workload scheduling for multiple ML services, ensuring efficient utilization of CPU/GPU resources and creating reliable queues based on service priorities.
- Maintain VM environments and manage OS updates, keeping the VM inventory up to date.
- Collaborate with the Dev and QA teams to detect hot spots in our applications and put preventive measures in place before they become live issues.
- Troubleshoot system configuration issues and provide solutions.
- Plan, execute, and test disaster recovery.
- Monitor and examine all application, performance, event, and system logs to assist in troubleshooting.
- File all IT/colocation tickets, ensure requests are fulfilled, and escalate to the appropriate person when necessary.
- Design, develop, and maintain the infrastructure required for deploying and scaling machine learning services.
- Implement and manage CI/CD pipelines to ensure seamless and efficient deployment of ML models.
- Collaborate with data scientists, ML researchers, and language experts to understand the requirements for deploying ML models and provide necessary infrastructure support.
- Automate and streamline the build, test, and deployment processes to enhance efficiency and reduce time‑to‑market.
- Monitor and optimize the performance, availability, and scalability of production ML systems.
- Develop and maintain robust monitoring, logging, and alerting systems to proactively identify and address issues.
- Implement security best practices to protect sensitive data and ensure compliance with relevant regulations.
- Stay up to date with industry trends and emerging technologies related to ML Ops and DevOps, proposing innovative solutions to improve our ML service delivery.
Required Skills, Experience and Qualifications
- Strong knowledge of cloud platforms (such as AWS, Azure, or GCP) and local cluster deployments, and experience in deploying and managing ML services on these platforms.
- Knowledge of distributed computing frameworks (e.g., Spark) and big data technologies (e.g., Hadoop, Kafka).
- Proficiency in Python, Shell, Ruby, Golang, or C++ and experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Hands‑on experience with containerization technologies (e.g., Docker) and orchestration frameworks (e.g., Kubernetes).
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI/CD) and version control systems (e.g., Git).
- Solid understanding of networking, security, and system administration concepts.
- Strong problem‑solving and troubleshooting skills, with the ability to quickly analyze and resolve issues in complex ML systems.
- Excellent communication and collaboration skills, with the ability to work effectively in a team‑oriented environment.
- Bachelor's or higher degree in Computer Science, Engineering, or a related field.
- Proven experience as an ML Ops Engineer, DevOps Engineer, or in a similar role, with a focus on deploying and maintaining machine learning models in production environments.
Desired Skills and Experience
- Experience with machine learning frameworks and libraries, such as TensorFlow, PyTorch, or scikit‑learn.
- Familiarity with serverless computing and event‑driven architectures.
- Experience with logging and monitoring tools (e.g., ELK Stack, Prometheus, Grafana).
- Understanding of software development methodologies and agile practices.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Engineering and Information Technology
Industries
Translation and Localization, Software Development, and IT Services and IT Consulting
Company
TransPerfect
Location
Lisboa, Lisboa, Portugal
Published
13 November 2025