
Kog | GPU Engineer | Paris, France | REMOTE within a Europe-compatible timezone, one week per month onsite in Paris

Kog (https://kog.ai)
Paris, France 🌐 Remote

Job Description

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs.
You would own low-level kernel work in CUDA/PTX or HIP/CDNA ISA, the monokernel pipeline and the profiling infrastructure inside it, scaling to the frontier MoE models that run in production, and the agents we are building to optimize kernels and inference autonomously.
We generate 3,000 tokens/s per request on 8x AMD MI300X and 2,100 on 8x NVIDIA H200, at batch size 1, FP16, no speculative decoding.
At batch size 1, decode reduces to matrix-vector products (GEMV), so it is memory-bandwidth bound, and memory bandwidth utilization (MBU) is the metric that counts.
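To make the MBU point concrete, here is a minimal sketch of how the metric is commonly estimated for batch-size-1 decode. It assumes a dense model whose weights are each read once per generated token, and it ignores KV-cache and activation traffic. The function name `decode_mbu` and every number in the example call are hypothetical round values chosen for illustration; they are not Kog's model size, throughput, or hardware figures.

```python
def decode_mbu(param_count, bytes_per_param, tokens_per_s, peak_bandwidth_bytes_per_s):
    """Estimate memory bandwidth utilization (MBU) for batch-size-1 decode.

    Assumption: every weight is read from memory exactly once per token,
    and KV-cache / activation traffic is ignored.
    """
    if peak_bandwidth_bytes_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    # Bytes that must be streamed from memory to produce one token.
    bytes_per_token = param_count * bytes_per_param
    # Bandwidth actually achieved at the measured decode rate.
    achieved_bytes_per_s = bytes_per_token * tokens_per_s
    return achieved_bytes_per_s / peak_bandwidth_bytes_per_s


# Hypothetical example: 1e9 parameters in FP16 (2 bytes each),
# 1,000 tokens/s, and 4e12 bytes/s of aggregate peak bandwidth.
example_mbu = decode_mbu(
    param_count=1e9,
    bytes_per_param=2,
    tokens_per_s=1000,
    peak_bandwidth_bytes_per_s=4e12,
)
print(f"MBU = {example_mbu:.2f}")
```

Under these assumptions, tokens per second scale directly with achieved bandwidth, which is why a bandwidth-bound decode is judged by MBU rather than by FLOP utilization.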
We rewrote the whole hot path ourselves, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel.
Try it at https://playground.kog.ai
Showing your code is part of the process.
If you are outside a Europe-compatible timezone, relocation to one is required.
Apply: https://jobs.ashbyhq.com/kog/e3950334-a2a6-43cc-a744-df6c38683166
Questions? Email me at [email protected]
