Principal Software Engineer, Data Engineering

About Highspot

Highspot is pioneering the category that is fundamentally changing the way companies increase sales productivity. On a mission to transform the way millions of people work with sales enablement, Highspot is committed to building breakthrough software with a spark of magic. We believe a great place to work is about more than the work – it’s about what the company stands for, and how it authentically represents its values in the real world. To this end, we have put intentional focus on creating equitable workspaces for each of our employees. Our goal is to create a culture where everyone feels a deep sense of belonging and is empowered to be an agent of change, with the ability to transform themselves, their workplace, and their world.

About the Role

We are looking for a Principal Data Engineer to join our growing Data Platform team. This is a rare opportunity to define the technical vision for high-scale data products that power customer-facing analytics, intelligent AI agents, and core product capabilities.

Highspot's data engineering challenges are uniquely interesting: our platform generates rich, deeply nested document-oriented data from millions of enterprise interactions – content engagement, CRM activity, sales workflows – and our job is to make all of it available, trustworthy, and fast for radically different consumption patterns. Customer-facing analytics dashboards need pre-aggregated, low-latency reads. Scorecards and reports need flexible dimensional models. AI agents need fresh, contextually assembled data with retrieval characteristics that look nothing like traditional BI. You'll own the architecture that serves all three from a unified foundation.

As a Principal Data Engineer, you will shape our overarching data architecture direction and fundamentally influence our data strategy. Your impact will extend well beyond your immediate crew: you will act as a bridge, coordinating directly with upstream software engineering teams (data producers) and downstream engineering and AI teams (data consumers) across multiple organizational pillars. You will lead the technical execution of a high-performing engineering crew, balancing deep architectural pipeline design with advanced data modeling, query optimization, and data trust.

What You'll Do

Architect the data platform – drive the technical direction for a scalable, reliable data platform built on a medallion architecture that serves customer-facing analytics, reporting, and agentic AI from a unified foundation.

Build and optimize ingestion pipelines – design robust CDC, real-time streaming (Kafka, Flink), and batch processing pipelines that transform complex, nested document-oriented operational data into clean analytical models at enterprise scale.

Tame schema complexity – build resilient ingestion and transformation layers that gracefully handle deeply nested, continuously evolving document schemas. Decide where to absorb complexity (ingestion, transformation, or query time), and make those tradeoffs explicit and sustainable.

Serve AI and analytics consumption patterns – architect data products that support both traditional BI workloads (pre-aggregated dashboards, dimensional models for scorecards and reports) and emerging AI consumption patterns (low-latency retrieval, contextual assembly, freshness-sensitive agent queries).

Own data quality, contracts, and observability – establish the data trust infrastructure that makes cross-team data consumption reliable: schema contracts with upstream producers, data quality monitoring, lineage tracking, freshness SLAs, and clear escalation paths when things break.

Drive cost-aware architecture – own Snowflake warehouse optimization, compute governance, and cost-efficient pipeline design. Build the practices and visibility so the team makes principled cost/performance tradeoffs rather than discovering them on the invoice.

Bridge producers and consumers – collaborate across organizational boundaries to align upstream software engineering teams and downstream analytics and AI teams around unified data strategies, shared contracts, and engineering standards.

Lead and grow the team – provide technical leadership and coaching for a diverse crew of data engineers. Champion best practices across the full spectrum of data engineering disciplines, from low-level pipeline architecture to sophisticated data modeling and analytical query performance.

Your Background

What will set you apart:

Demonstrated depth in building production data platforms that serve multiple consumption patterns – you've gone beyond traditional BI to support real-time product features, AI/ML workloads, or customer-facing analytics from the same data foundation.
Deep experience with the impedance mismatch between document-oriented operational stores and analytical systems – you've dealt with nested, schema-evolving source data (MongoDB, DynamoDB, or similar) and have opinions on where flattening and transformation should live.
Hands-on experience with data quality and trust at scale – you've built or operated schema registries, data contracts, quality monitoring, or lineage systems in an environment where multiple teams depend on shared data products.
Track record of cost-conscious data architecture – you've optimized Snowflake (or comparable) warehouse spend, designed compute governance policies, or re-architected pipelines to materially reduce cost without sacrificing reliability.
Strong instinct for the bridge role: you're as comfortable pushing back on an upstream team's schema change as you are negotiating freshness SLAs with a downstream AI consumer.

Foundations:

8+ years of professional software engineering experience, with significant time spent on distributed, data-intensive production systems – including substantial depth in data pipeline and platform architecture.
Deep hands-on expertise with modern data technologies: Snowflake, Apache Kafka, Apache Flink, and CDC tooling (Debezium or similar).
Experience developing and operating cloud data infrastructure at enterprise scale (AWS preferred), including infrastructure-as-code (Terraform) and CI/CD automation.
Strong programming skills in Python, Java, and SQL. You write production-grade code, not just scripts.
A track record of designing performant data models that support fast, efficient querying for analytical and product-facing use cases.
Strong cross-functional communication skills – you work effectively with software engineers, data scientists, AI teams, and business stakeholders across organizational boundaries.
Experience mentoring engineers and building collaborative, high-performing teams.
