ML Engineer

We are looking for an experienced ML Engineer to join our ML infrastructure team. The team is responsible for the ML pipelines and the related services and technologies that are the backbone of our ML-based strategies.
Our infrastructure is written in Python, Go, and C++.

Software Development

Amsterdam, full-time
Responsibilities
The team owns the company's ML and Python infrastructure.

– Building and maintaining pipelines for training models. Key tasks include:

  • Training models using deep learning frameworks such as PyTorch and JAX on our cluster;
  • Training models using machine learning libraries like CatBoost and XGBoost on our cluster.

– Optimizing model training. Key tasks include:

  • Profiling GPU, CPU, memory, and I/O during training to identify bottlenecks and optimization opportunities;
  • Debugging performance and quality issues in the training process, such as handling gradient explosions or addressing slowdowns when switching between FP32 and FP16 precision;
  • Monitoring and improving GPU utilization on the cluster.

– Building GPU data centers. Key tasks include:

  • Comparing and selecting GPUs based on the tasks and training workloads we run;
  • Debugging hardware stack issues, for example multinode DDP training bottlenecks caused by the interconnect between two GPU hosts.


Requirements
– At least 3 years of professional experience as an ML/Python/C++ Engineer;

– Strong expertise in databases, data pipelines, and service development;

– Proficiency in Python;

– Experience contributing to projects with complex architecture and high load;

– Linux expertise;

– Expertise in algorithms and data structures;

– Ability to work in a fast-paced environment and multitask efficiently;

– Ability to communicate effectively within your team and with other teams.

Would be great if you had this
– Experience in developing high-load services;

– Experience with SQL/Spark/Pandas/NumPy/PyTorch/JAX;

– Competency in Go/C++;

– Understanding of operating systems, networks, and performance optimization;

– Any experience in competitive programming contests (IOI, ICPC, Hash Code) or CTFs.
