Data Scientist (ML, Speech, NLP & Multimodal Expertise) - Lisboa

Lisboa, Lisboa, Portugal

Overview

Join to apply for the Data Scientist (ML, Speech, NLP & Multimodal Expertise) role at TransPerfect

We are looking to hire a Data Scientist with strong expertise in machine learning, speech and language processing, and multimodal systems. This role is essential to driving our product roadmap forward, particularly in building out our core machine learning systems and developing next-generation speech technologies.

The ideal candidate will be capable of working independently while effectively collaborating with cross-functional teams. In addition to deep technical knowledge, we are looking for someone who is curious, experimental, and communicative.

Key Responsibilities

Create maintainable, elegant code and high-quality data products that are modeled, well-documented, and simple to use.
Build, maintain, and improve the infrastructure to extract, transform, and load data from a variety of sources using SQL, Azure, GCP and AWS technologies.
Perform statistical analysis of training datasets to identify biases, quality issues, and coverage gaps.
Implement automated evaluation pipelines that scale across multiple models and tasks.
Create interactive dashboards and visualization tools for model performance analysis.

Additional Responsibilities

Design and implement robust data ingestion pipelines for large-scale text and speech corpora, including automated data preprocessing and cleaning pipelines.
Create data validation frameworks and monitoring systems for dataset quality.
Develop sampling strategies for balanced and representative training data.
Implement comprehensive experiment tracking and hyperparameter optimization frameworks.
Conduct statistical analysis of training dynamics and convergence patterns.
Create automated model selection pipelines based on multiple evaluation criteria.
Design comprehensive benchmark suites with statistical significance testing.
Develop fairness metrics and bias detection systems.
Build real-time monitoring systems for model performance in production.
Implement feature drift detection and data quality monitoring.
Design feedback loops to capture user interactions and model effectiveness.
Create automated retraining pipelines based on performance degradation signals.
Develop business metrics and ROI analysis for model deployments.

Required Skills, Experience and Qualifications

Programming & Software Engineering

Python (Expert Level): Advanced proficiency in the scientific computing stack (NumPy, Pandas, SciPy, Scikit-learn).
Version Control: Git workflows, collaborative development, and code review processes.
Software Engineering Practices: Testing frameworks, CI/CD pipelines, and high-quality code development.

Machine Learning and Language Model Expertise

Traditional Machine Learning and Deep Learning Knowledge: Proficiency in classical ML algorithms (Naive Bayes, SVM, Random Forest, etc.) and Deep Learning architectures.
Understanding of Transformer Architecture: Attention mechanisms, positional encoding, and scaling laws.
Training Pipeline Knowledge: Data preprocessing for large corpora, tokenization strategies, and distributed training concepts.
Evaluation Frameworks: Experience with standard NLP benchmarks (GLUE, SuperGLUE, etc.) and custom evaluation design.
Fine-tuning Techniques: Understanding of PEFT methods, instruction tuning, and alignment techniques.
Model Deployment: Knowledge of model optimization, quantization, and serving infrastructure for large models.

Collaboration & Adaptability

Strong communication skills are a must
Self-reliant but knows when to ask for help
Comfortable working in an environment where conventional development practices may not always apply
Proactive and takes initiative rather than waiting for PBIs to be assigned when circumstances call for it
Strong interest in AI and its possibilities
Curious and open to experimenting with technologies or languages outside their comfort zone

Mindset & Work Approach

Takes ownership when things don’t go as planned
Capable of working from high-level explanations and general guidance on implementations and final outcomes
Continuous, clear communication is crucial
Self-starter, self-motivated, and proactive in problem-solving
Enjoys exploring and testing different approaches, even in unfamiliar programming languages

Additional Skills, Experience and Qualifications

Framework Proficiency: Scikit-learn, XGBoost, PyTorch (preferred) or TensorFlow for model implementation and experimentation.
MLOps Expertise: Model versioning, experiment tracking, model monitoring (MLflow, Weights & Biases), data monitoring and validation (Great Expectations, Prometheus, Grafana), and automated ML pipelines (GitHub CI/CD, Jenkins, CircleCI, GitLab, etc.).
Statistical Modeling: Hypothesis testing, experimental design, causal inference, and Bayesian statistics.
Model Evaluation: Cross-validation strategies, bias-variance analysis, and performance metric design.
Feature Engineering: Advanced techniques for text, time-series, and multimodal data.
Big Data Technologies: Spark (PySpark), Hadoop ecosystem, and distributed computing frameworks (DDP, TP, FSDP).
Cloud Platforms: AWS (SageMaker, S3, EMR), GCP (Vertex AI, BigQuery), or Azure ML.
Database Systems: NoSQL databases (MongoDB, Elasticsearch), graph databases (Neo4j), and vector databases (Pinecone, Milvus, ChromaDB, FAISS).
Data Pipeline Tools: Airflow, Prefect, or similar orchestration frameworks.

By applying, I confirm I have read and accept TransPerfect's Privacy Policy: https://www.transperfect.com/about/data-privacy-recruiting.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Other

Industries

Translation and Localization, Software Development, and Technology, Information and Media


Job posting details

Company:	TransPerfect
Location:	Lisboa, Portugal
Published:	14 September 2025 (current job opening)
