Senior Machine Learning Engineer - LLMs & Self-Hosted AI
Tel Aviv
Engineering
Job Description
We are looking for a highly skilled Senior ML Engineer to lead our transition from third-party LLM APIs to a fully self-hosted ecosystem by fine-tuning high-performance, domain-specific models.
Our core product is an advanced, agentic support chatbot capable of complex reasoning, API tool calling, database lookups, and orchestrating specialized LLMs for specific tasks.
What You’ll Do:
- Model Fine-Tuning: Design and execute fine-tuning strategies to improve model accuracy on specific domain tasks and tool-calling execution.
- Agentic Workflows: Develop and refine the chatbot's agentic capabilities, ensuring reliable tool use, routing, and interaction between large LLMs and specialized small language models (SLMs).
- Inference Optimization: Deploy and manage large-scale models using high-performance inference engines (like vLLM) to ensure low latency and high throughput for our agentic chatbot.
- Rigorous Evaluation: Build comprehensive offline and online evaluation frameworks to continuously measure model performance, and quantify business impact through structured A/B testing.
What We’re Looking For:
Core Engineering & AI Frameworks
- Deep experience with PyTorch and the Hugging Face ecosystem.
- Strong data engineering skills: data manipulation, synthetic data generation, and active learning (e.g., margin sampling).
- High proficiency with AI-assisted development workflows (e.g., Claude Code, Cursor, Codex) to accelerate development.
LLMs & Agents
- Strong fundamental understanding of LLM architectures, attention mechanisms, and generation parameters.
- Hands-on experience building agentic systems (ReAct, function/tool calling, RAG).
- Expertise in fine-tuning strategies (e.g., SFT, RLHF, DPO) and parameter-efficient techniques (PEFT/LoRA).
Bonus Points
- Alignment Techniques: Experience with RLHF and DPO strategies for future reasoning-model development.
- Containerization & Orchestration: Experience with Ray for orchestrating large-scale model deployments across multi-GPU clusters.
- Model Quantization: Experience with memory-optimizing quantization methods and formats such as AWQ, GPTQ, or GGUF to fit 70B-parameter models efficiently onto available hardware.
- API Development: Proficiency in building robust, asynchronous microservices using FastAPI to serve model requests.
- MLOps: Experience with core MLOps practices, including dataset versioning (e.g., DVC), experiment tracking (e.g., Weights & Biases, MLflow), and model registries.