AI Cloud Solution Architect & Engineer
About The Project
Join Neurons Lab as an AI Cloud Solution Architect & Engineer – a unique hybrid role combining strategic solution design with hands‑on engineering execution. You’ll bridge the gap between client requirements and technical implementation, designing AI/ML architectures and then building them yourself using modern cloud infrastructure practices.
Our Focus: We specialize in serving Banking, Financial Services, and Insurance (BFSI) enterprise customers with stringent compliance, security, and regulatory requirements. You’ll work on mission‑critical AI/ML systems where security architecture, data governance, and regulatory compliance are paramount.
Duration: Part‑time, long‑term engagement with project‑based allocations
Reporting: Direct report to Head of Cloud
Objective
- Architecture & Design: Gather requirements, design cloud architectures, calculate ROI, and create technical proposals for AI/ML solutions
- Engineering Excellence: Build production‑grade infrastructure using IaC, develop APIs and prototypes, implement CI/CD pipelines, and manage AI workload operations
- Client Success: Transform business requirements into working solutions that are secure, scalable, cost‑effective, and aligned with AWS best practices
- Knowledge Transfer: Create reusable artifacts, comprehensive documentation, and architectural patterns that accelerate future project delivery
KPI – Architecture & Pre‑Sales
- Design and document 3+ solution architectures per month with comprehensive diagrams and specifications
- Achieve 80%+ client acceptance rate on proposed architectures and estimates
- Deliver ROI calculations and cost models within 2 business days of request
KPI – Engineering Delivery
- Deploy infrastructure through IaC (AWS CDK/Terraform) with zero manual configuration
- Create at least 3 reusable IaC components or architectural patterns per quarter
- Implement CI/CD pipelines for all projects with automated testing and deployment
- Maintain 95%+ uptime for production AI/ML inference endpoints
- Document architecture and implementation details weekly for knowledge sharing
KPI – Quality & Best Practices
- Ensure all solutions pass AWS Well‑Architected Review standards
- Deliver comprehensive documentation within 1 week of architecture completion
- Create simplified UIs/demos for PoC validation and client presentations
Areas of Responsibility – Solution Architecture (40%)
- Elicit and document business and technical requirements from clients
- Design end‑to‑end cloud architectures for AI/ML solutions (training, inference, data pipelines)
- Create architecture diagrams, technical specifications, and implementation roadmaps
- Evaluate technology options and recommend optimal AWS services for specific use cases
Business Analysis
- Calculate ROI, TCO, and cost‑benefit analysis for proposed solutions
- Estimate project scope, timelines, team composition, and resource requirements
- Participate in presales activities: technical presentations, demos, and proposal support
- Collaborate with sales team on SOW creation and customer workshops
Strategic Planning
- Design for scalability, security, compliance, and cost optimization from day one
- Create reusable architectural patterns and reference architectures
- Stay current with AWS AI/ML services and emerging cloud technologies
Cloud Engineering & AI Infrastructure (60%)
- Build and maintain cloud infrastructure using AWS CDK (primary) and Terraform
- Develop reusable IaC components and modules for common patterns
- Implement infrastructure for AI/ML workloads: GPU clusters, model serving, data lakes
- Manage compute resources: EC2, ECS, EKS, Lambda, SageMaker compute instances
Application Development
- Develop Python applications: FastAPI backends, data processing scripts, automation tools
- Create prototype interfaces using Streamlit, React, or similar frameworks
- Build and integrate RESTful APIs for AI model serving and data access
- Implement authentication, authorization, and API security best practices
AI/ML Operations (MLOps)
- Deploy and manage AI/ML model serving infrastructure (SageMaker endpoints, containerized models)
- Build ML pipelines: data ingestion, preprocessing, training automation, model deployment
- Implement model versioning, experiment tracking, and A/B testing frameworks
- Monitor model performance, inference latency, and system health metrics
DevOps & Automation
- Design and implement CI/CD pipelines using GitHub Actions, GitLab CI, or AWS CodePipeline
- Automate deployment processes with infrastructure testing and validation
- Implement monitoring, logging, and alerting using CloudWatch, Prometheus, Grafana
- Manage containerization with Docker and orchestration with Kubernetes/ECS
Data Engineering
- Build data pipelines for AI training and inference using AWS Glue, Step Functions, Lambda
- Design and implement data lakes using S3, Lake Formation, and data cataloging
- Implement automated and scheduled data synchronization processes
- Optimize data storage and retrieval for ML workloads
Security & Compliance
- Implement cloud security best practices: IAM, VPC design, encryption, secrets management
- Build enterprise security and compliance strategies for AI/ML workloads
- Ensure solutions meet regulatory requirements (PCI‑DSS, GDPR, SOC 2, MAS TRM, etc., where applicable)
- Conduct security reviews and implement remediation strategies
Cost & Performance Optimization
- Optimize cloud spend for compute‑intensive AI workloads
- Implement spot instance strategies, auto‑scaling, and resource scheduling
- Monitor and optimize GPU utilization, inference latency, and throughput
- Perform cost analysis and implement cost‑saving measures
Operations & Support
- Implement disaster recovery procedures for AI models and training data
- Manage backup strategies and business continuity planning
- Troubleshoot and resolve production issues in AI infrastructure
- Provide technical guidance to project teams during implementation
Skills – Cloud Architecture & Design
- Strong solution architecture skills with ability to translate business requirements into technical designs
- Experience with AWS Well‑Architected reviews and remediation
- Deep understanding of AWS services, particularly compute, storage, networking, and AI/ML services
- Experience designing scalable, highly available, and fault‑tolerant systems
- Ability to create clear architecture diagrams and technical documentation
- Cost modeling and ROI calculation capabilities
Technical Leadership
- Comfortable leading technical discussions with clients and stakeholders
- Ability to guide engineers and share knowledge effectively
- Strong problem‑solving and analytical thinking skills
- Experience with architectural decision‑making and trade‑off analysis
Programming & Development
- Advanced Python programming: object‑oriented design, async programming, testing
- API development with FastAPI, Flask, or similar frameworks
- Frontend development basics: React, etc. (for prototypes and demos with AI code generation tools)
- Shell scripting for automation and deployment
- Git version control and collaborative development workflows
Infrastructure as Code
- AWS CDK (required) – CloudFormation experience is valuable
- Terraform (highly preferred) for multi‑cloud or hybrid scenarios
- Understanding of IaC best practices: modularity, reusability, testing
- Experience with infrastructure testing and validation frameworks
AI/ML Infrastructure
- Hands‑on experience with AWS SageMaker: training jobs, endpoints, pipelines, notebooks
- Understanding of ML lifecycle: data preparation, training, deployment, monitoring
- Experience with GPU management and optimization for training/inference
- Knowledge of containerization for ML models (Docker, container registries)
- Familiarity with ML frameworks: PyTorch, TensorFlow, LangChain, LlamaIndex, etc.
DevOps & Automation
- CI/CD pipeline design and implementation (GitHub Actions, GitLab CI, AWS CodePipeline)
- Container orchestration: Docker, Kubernetes, Amazon ECS
- Configuration management and deployment automation
- Monitoring and observability: CloudWatch, Prometheus, Grafana, ELK stack
Communication & Collaboration
- Excellent written and verbal communication; advanced English
- Ability to explain complex technical concepts to non‑technical stakeholders
- Comfortable with client‑facing presentations and technical demos
- Strong documentation skills with attention to detail
- Collaborative mindset with ability to work across functional teams
Problem‑Solving
- Advanced task breakdown and estimation abilities
- Debugging and troubleshooting complex distributed systems
- Performance optimization and tuning
- Incident response and root cause analysis
Experience – Cloud Engineering & Architecture
- 5+ years in cloud engineering, DevOps, or solution architecture roles
- 3+ years hands‑on experience with AWS services and architecture
- Proven track record of designing and implementing cloud solutions from scratch
- Experience with both greenfield projects and cloud migration initiatives
Experience – AI/ML Infrastructure
- 2+ years working with AI/ML workloads on cloud platforms
- Hands‑on experience deploying and managing ML models in production
- Experience with GPU‑based compute for training or inference
- Understanding of AI/ML infrastructure challenges and optimization techniques
Experience – Infrastructure as Code
- 3+ years building infrastructure using IaC tools (AWS CDK, Terraform, CloudFormation)
- Experience creating reusable IaC modules and components
- Track record of infrastructure automation and standardization
Experience – Software Development
- 4+ years programming experience in Python (required)
- Experience building APIs with FastAPI, Flask, or similar frameworks
- History of creating prototypes, MVPs, or PoC applications
- Comfortable with full‑stack development for demos and prototypes
Experience – DevOps & Automation
- 3+ years implementing CI/CD pipelines and deployment automation
- Experience with containerization (Docker) and orchestration (Kubernetes/ECS)
- Linux/UNIX system administration experience
- Monitoring and observability implementation
Experience – Client‑Facing Work
- Experience gathering requirements and translating them into technical solutions
- History of presenting technical architectures to clients and stakeholders
- Participation in presales activities, demos, or technical workshops
- Ability to work directly with customers to solve complex problems
Experience – Industry (Preferred)
- Consulting or professional services background
- Experience in regulated industries (FinTech, Insurance, Banks)
- Work with enterprise clients on large‑scale implementations
- Startup or fast‑paced environment experience
Seniority level
Mid‑Senior level
Employment type
Part‑time
Job function
Engineering and Information Technology
Job details
Company: Neurons Lab
Location: Lisbon, Portugal
Published: 4.12.2025