Team Lead
ML Engineer
We are looking for an experienced ML Engineer Lead to join our ML infrastructure team. The team is responsible for the ML and Python infrastructure that forms the backbone of our ML-based strategies.
Our infrastructure is written in Python, Go, and C++.

Software Development

Amsterdam, FULLTIME
Responsibilities
The team is responsible for ML Infrastructure, which includes:

– Building and maintaining pipelines for training models. Key tasks include:

  • Training models using deep learning frameworks such as PyTorch and JAX on our cluster;
  • Training models using machine learning libraries like CatBoost and XGBoost on our cluster.

– Optimizing model training. Key tasks include:

  • Profiling GPU, CPU, memory, and I/O during training to identify bottlenecks and optimization opportunities;
  • Debugging performance and quality issues in the training process, such as handling gradient explosions or addressing slowdowns when switching between FP32 and FP16 precision;
  • Monitoring and improving GPU utilization on the cluster.

– Building GPU data centers. Key tasks include:

  • Comparing and selecting GPUs based on the tasks and training workloads we run;
  • Debugging hardware stack issues, for example, addressing multi-node DDP training bottlenecks related to the interconnect between two GPU hosts.

The team is also responsible for Python Infrastructure, which includes:

– Developing a library for executing task graphs on the cluster, with a focus on improving the user experience of writing, maintaining, and debugging these graphs.

– Building a library for working with the company’s data. Key tasks include:

  • Improving the user experience in data analytics and dataset creation;
  • Optimizing storage formats and data retrieval methods.

– Building datasets from the company’s data. Key tasks include:

  • Creating user-friendly interfaces for feature and target engineering;
  • Automating the generation and updating of datasets with new features and data;
  • Improving dataset creation speed, for example, by optimizing Pandas/Polars/Spark queries.

– Supporting the Python simulator. Key tasks include:

  • Optimizing execution time;
  • Working closely with quantitative researchers to improve the realism of the simulation and the user experience.
Requirements
– At least 3 years of professional experience as an ML/Python/C++ Engineer;

– At least 1 year of professional experience as a Team Lead;

– Ability to manage a team of 3-4 engineers;

– Proficiency in Python;

– Experience with PyTorch/JAX;

– Experience contributing to projects with complex architecture;

– Linux expertise;

– Expertise in algorithms and data structures;

– Ability to work in a fast-paced environment and multitask efficiently;

– Ability to communicate effectively within your team and with other teams.


Would be great if you had this
– Experience in developing high-load services;

– Experience with SQL/Spark/Pandas/NumPy;

– Competency in Go/C++;

– Understanding of operating systems, networks, and performance optimization;

– Any experience in competitive programming contests (IOI, ICPC, Hash Code) or CTFs.
