Navan

Senior Machine Learning Engineer - LLMs & Self-Hosted AI

SaaS
Tel Aviv
Engineering

Job Description

We are looking for a highly skilled Senior ML Engineer to lead our transition from third-party LLM APIs to a fully self-hosted ecosystem by fine-tuning high-performance, domain-specific models.

Our core product is an advanced, agentic support chatbot capable of complex reasoning, API tool calling, database lookups, and orchestrating specialized LLMs for specific tasks.

What You’ll Do:

  • Model Fine-Tuning: Design and execute fine-tuning strategies to improve model accuracy on specific domain tasks and tool-calling execution.
  • Agentic Workflows: Develop and refine the chatbot's agentic capabilities, ensuring reliable tool use, routing, and interaction between large LLMs and specialized small language models (SLMs).
  • Inference Optimization: Deploy and manage large-scale models using high-performance inference engines (like vLLM) to ensure low latency and high throughput for our agentic chatbot.
  • Rigorous Evaluation: Build comprehensive offline and online evaluation frameworks to continuously measure model performance and business impact through structured A/B testing.

What We’re Looking For:

Core Engineering & AI Frameworks

  • Deep experience with PyTorch and the Hugging Face ecosystem.
  • Strong data engineering skills: data manipulation, synthetic data generation, and active learning (e.g., margin sampling).
  • High proficiency with AI-assisted development workflows (e.g., Claude Code, Cursor, Codex) to accelerate development.

LLMs & Agents

  • Strong fundamental understanding of LLM architectures, attention mechanisms, and generation parameters.
  • Hands-on experience building agentic systems (ReAct, function/tool calling, RAG).
  • Expertise in fine-tuning strategies (e.g., SFT, RLHF, DPO) and parameter-efficient techniques (PEFT/LoRA).

Bonus Points

  • Alignment Techniques: Experience with RLHF and DPO strategies for future reasoning-model development.
  • Distributed Orchestration: Experience with Ray for orchestrating large-scale model deployments across multi-GPU clusters.
  • Model Quantization: Experience with quantization methods and formats such as AWQ, GPTQ, or GGUF to fit 70B-parameter models efficiently onto available hardware.
  • API Development: Proficiency in building robust, asynchronous microservices using FastAPI to serve model requests.
  • MLOps: Experience with core MLOps practices, including dataset versioning (e.g., DVC), experiment tracking (e.g., Weights & Biases, MLflow), and model registries.

First seen: April 8, 2026
Last updated: June 16, 2026