MLOps Lead

We are looking for an experienced MLOps Lead to join our ML infrastructure team. The team is responsible for the ML and Python infrastructure that forms the backbone of our ML-based strategies.
Our infrastructure is written in Python, Go, and C++.

Software Development

Amsterdam, full-time

Responsibilities

The team's responsibilities include ML and Python infrastructure within the company.

— Developing a library for executing task graphs on the cluster, with a focus on improving the user experience of writing, maintaining, and debugging these graphs;

— Building a library for working with the company’s data. Key tasks include:

Improving the user experience in data analytics and dataset creation;
Optimizing storage formats and data retrieval methods.

— Building datasets from the company’s data. Key tasks include:

Creating user-friendly interfaces for feature and target engineering;
Automating the generation and updating of datasets with new features and data;
Working on dataset creation speed, for example, by optimizing Pandas/Polars/Spark queries.

— Supporting the Python simulator. Key tasks include:

Optimizing execution time;
Working closely with quantitative researchers to improve the realism of simulation and user experience.

The team is also responsible for ML infrastructure, which includes:
— Building and maintaining pipelines for training models. Key tasks include:

Training models using deep learning frameworks such as PyTorch and JAX on our cluster;
Training models using machine learning libraries like CatBoost and XGBoost on our cluster.

— Optimizing model training. Key tasks include:

Profiling GPU, CPU, memory, and I/O during training to identify bottlenecks and optimization opportunities;
Debugging performance and quality issues in the training process, such as handling gradient explosions or addressing slowdowns when switching between FP32 and FP16 precision;
Monitoring and improving GPU utilization on the cluster.

— Building GPU data centers. Key tasks include:

Comparing and selecting GPUs based on the tasks and training workloads we run;
Debugging hardware stack issues, for example, addressing multi-node DDP training bottlenecks related to the interconnect between two GPU hosts.

Requirements

— At least 3 years of professional experience as an ML/Python/C++ Engineer;

— At least 1 year of professional experience as a Team Lead;

— Ability to manage a team of 3-4 engineers;

— Proficiency in Python;

— Experience with PyTorch/JAX;

— Experience contributing to projects with complex architecture;

— Linux expertise;

— Expertise in algorithms and data structures;

— Ability to work in a fast-paced environment and multitask efficiently;

— Ability to communicate effectively within your team and with other teams.

Would be great if you had this

— Experience developing high-load services;

— Experience with SQL/Spark/Pandas/NumPy;

— Competency in Go/C++;

— Understanding of operating systems, networks, and performance optimization;

— Any experience in competitive programming contests (IOI, ICPC, Hash Code) or CTFs.

What we offer

— Above-market compensation, with bonuses twice a year of up to 50% of annual salary;

— Sophisticated internal training and development programs;

— Comprehensive health insurance;

— Reimbursement for sports activities;

— Corporate events twice a year;

— High level of influence and ownership of the process;

— Close collaboration with an experienced team in a flat organizational structure.
