Principal Architect - Infrastructure
Remote US, USA
Remote | Full-time | R&D
Job Description
Aera Technology is a pioneer in the growing category of Decision Intelligence Platforms, the technology to digitize, augment, and automate decision-making processes with AI and machine learning, and a Leader in the Gartner® Magic Quadrant™ for 2026. Through our AI decision automation platform, Aera Decision Cloud™, we are helping the best-known brands in the world make smarter, faster decisions.
Privately held and VC-funded, we have a global team of over 400 Aeranauts, and we’re growing. We deliver Decision Intelligence innovation and services that enable enterprises to automate and scale decision making with accuracy and speed. We continue to be the trusted choice of market leaders for our proven ability to generate value and unlock opportunities that were previously unattainable.
As a Principal Architect, you will play a crucial role in developing the multi-cloud infrastructure to ensure the scalability, reliability, and security of our systems. You will work closely with cross-functional teams to drive innovation, streamline operations, and optimize our platform for performance and efficiency.
Responsibilities
- Architect and scale enterprise-grade AKS clusters built for high concurrency, performance, and real-time AI inference, ensuring the platform is globally distributed and highly available.
- Leverage Crossplane to provision Azure services, creating a Kubernetes-native control plane for rapid scaling of AI services.
- Champion GitOps practices with Argo CD to standardize deployments across multiple environments and regions, enabling reliable, automated delivery of mission-critical SaaS workloads.
- Engineer infrastructure that supports data-intensive AI/ML pipelines, integrating compute, storage, and messaging with Kubernetes to power real-time decision intelligence use cases.
- Optimize scalability and concurrency with autoscaling, pod disruption budgets, and advanced workload scheduling, ensuring millions of daily requests are served with low latency.
- Develop and maintain automation, tooling, and integrations using Python, Ruby, and Terraform, enabling teams to scale infrastructure and AI services efficiently.
- Design and enforce secure, compliant, multi-tenant architectures with Azure AD SSO, managed identities, RBAC, and Key Vault integration.
- Build resilient networking topologies with VNets, VNet peering, Private Link, service mesh technologies (e.g., Istio, Linkerd), and Emissary-ingress for advanced security and reliability.
- Integrate observability frameworks at scale using Prometheus, Grafana, Azure Monitor, and OpenTelemetry, providing deep visibility into performance, availability, and latency.
- Collaborate closely with AI/ML engineering teams to align infrastructure with real-time inference and streaming data requirements, enabling cutting-edge decision automation.
- Mentor engineering and operations teams while documenting and evangelizing Kubernetes-native and Azure-native best practices, driving innovation across the organization.
About You
- 10+ years of cloud infrastructure experience with expert-level skills in Kubernetes and Azure.
- Proven experience designing and operating multi-tenant SaaS platforms where performance, scalability, and security are critical.
- Hands-on expertise with Crossplane for Kubernetes-controlled Azure service provisioning.
- Deep familiarity with Azure services, including AKS, Azure Database for MySQL Flexible Server, Blob Storage, Event Hubs, and Key Vault.
- Strong coding and automation background with GitHub Actions, Python, and Terraform, plus experience with other high-level programming and scripting languages.
- Skilled in Infrastructure as Code (Terraform, Crossplane, Helm) and GitOps (Argo CD).
- In-depth knowledge of Kubernetes networking, autoscaling, and workload orchestration for AI/ML inference workloads.
- Proficiency with observability tooling: Prometheus, Grafana, Azure Monitor, and OpenTelemetry.
- A collaborative leader who thrives on mentoring and enabling teams, with excellent communication and documentation skills.
- Motivated to build the core infrastructure behind AI-powered decision intelligence at global scale, driving meaningful impact for some of the world’s most recognized brands.
Nice to Have
- Background with large-scale, real-time data streaming platforms.
- Prior collaboration on AI/ML infrastructure platforms or decision intelligence systems.
- Contributions to open-source projects, especially in Kubernetes or the cloud-native ecosystem.
About Aera Technology
First seen: April 10, 2026
Last updated: May 2, 2026