Senior Machine Learning Engineer - LLMs & Self-Hosted AI

We are looking for a highly skilled Senior ML Engineer to lead our transition from third-party LLM APIs to a fully self-hosted ecosystem by fine-tuning high-performance, domain-specific models.

Our core product is an advanced, agentic support chatbot capable of complex reasoning, API tool calling, database lookups, and orchestrating specialized LLMs for specific tasks.

What You’ll Do:

Model Fine-Tuning: Design and execute fine-tuning strategies to improve model accuracy on specific domain tasks and tool-calling execution.
Agentic Workflows: Develop and refine the chatbot's agentic capabilities, ensuring reliable tool use, routing, and interaction between large LLMs and specialized small language models (SLMs).
Inference Optimization: Deploy and manage large-scale models using high-performance inference engines (such as vLLM) to ensure low latency and high throughput for our agentic chatbot.
Rigorous Evaluation: Build comprehensive offline and online evaluation frameworks to continuously measure model performance, and quantify business impact through structured A/B testing.

What We’re Looking For:

Core Engineering & AI Frameworks

Deep experience with PyTorch and the Hugging Face ecosystem.
Strong data engineering skills: data manipulation, synthetic data generation, and active learning (e.g., margin sampling).
High proficiency with AI-assisted development workflows (e.g., Claude Code, Cursor, Codex) to accelerate development.

LLMs & Agents

Strong fundamental understanding of LLM architectures, attention mechanisms, and generation parameters.
Hands-on experience building agentic systems (ReAct, function/tool calling, RAG).
Expertise in fine-tuning strategies (e.g., SFT, RLHF, DPO) and parameter-efficient techniques (PEFT/LoRA).

Bonus Points

Alignment Techniques: Experience with RLHF and DPO strategies for future reasoning-model development.
Distributed Orchestration: Experience with Ray for orchestrating large-scale model deployments across multi-GPU clusters.
Model Quantization: Experience with quantization methods and formats such as AWQ, GPTQ, or GGUF to fit 70B-parameter models efficiently onto available hardware.
API Development: Proficiency in building robust, asynchronous microservices using FastAPI to serve model requests.
MLOps: Experience with core MLOps practices, including dataset versioning (e.g., DVC), experiment tracking (e.g., Weights & Biases, MLflow), and model registries.

Job Description

About Navan