— Demonstrated experience optimizing neural network training in production or large-scale research settings, e.g. reducing training time, improving hardware utilization, or accelerating feedback cycles for ML researchers;
— Extensive practical experience with ML frameworks such as PyTorch or JAX;
— Hands-on experience training and optimizing deep learning architectures such as LSTMs and Transformer-based models, including different attention mechanisms (see the attention sketch after this list);
— Experience working with CUDA, Triton, or other low-level GPU technologies for performance tuning (see the Triton kernel sketch after this list);
— Proficiency in profiling and debugging training pipelines, using tools such as Nsight, cProfile, cuda-gdb, and the PyTorch profiler (see the profiler sketch after this list);
— Understanding of distributed training concepts, e.g. data, model, tensor, sequence, pipeline, and context parallelism, and the memory and compute tradeoffs between them (a minimal data-parallel sketch follows this list);
— A collaborative and proactive mindset, with strong communication skills and the ability to mentor teammates and partner effectively within the team;
— Strong proficiency in Python for building infrastructure-level tooling, debugging training systems, and integrating with ML frameworks and profiling tools;
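As an illustration of the attention mechanisms referenced above, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, shapes, and tensors are illustrative assumptions, not tied to any specific model or codebase:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Positions where mask == 0 receive ~zero attention weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Illustrative shapes: batch=2, heads=4, seq_len=8, head_dim=16.
q = k = v = torch.randn(2, 4, 8, 16)
out = scaled_dot_product_attention(q, k, v)
assert out.shape == (2, 4, 8, 16)
```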
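For the low-level GPU work, a minimal Triton kernel sketch, using the standard vector-add pattern; it assumes the `triton` package and a CUDA-capable GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the arrays.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```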
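For pipeline profiling, a minimal example with the built-in PyTorch profiler; it is CPU-only so it runs anywhere, and on a GPU machine `ProfilerActivity.CUDA` can be added to `activities`:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
x = torch.randn(64, 1024)

# Record operator-level timings for one forward pass.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Show the ops that dominate wall-clock time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```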
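And for distributed training, a minimal data-parallel sketch using PyTorch DistributedDataParallel, the simplest of the parallelism strategies listed above. The gloo backend keeps it runnable on CPU; launch with `torchrun --nproc_per_node=2 script.py`, where the script name is a placeholder:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / WORLD_SIZE / MASTER_ADDR in the environment.
    dist.init_process_group(backend="gloo")

    model = torch.nn.Linear(32, 1)
    # DDP replicates the model on every rank and all-reduces gradients.
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    x = torch.randn(16, 32)
    loss = ddp_model(x).pow(2).mean()
    loss.backward()  # gradient buckets are synchronized across ranks here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```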