
Remote DevOps & SRE Jobs: Kubernetes, Platform, AI Infrastructure
Remote DevOps, SRE, platform engineer, Kubernetes, Terraform, AWS, observability, GPU infrastructure, and AI infrastructure jobs. Updated daily.
DevOps and infrastructure engineers keep the systems that power modern AI and tech products running. These roles span platform engineering, site reliability engineering (SRE), cloud infrastructure, developer productivity, and AI infrastructure. Job searches often cluster around Kubernetes jobs, Terraform jobs, AWS platform engineer roles, remote SRE jobs, GPU infrastructure jobs, and production reliability for AI products.
Common toolchains include Kubernetes, Terraform, ArgoCD, Datadog, Prometheus, OpenTelemetry, and cloud platforms such as AWS, GCP, and Azure. Infrastructure engineers at AI companies often work on GPU cluster management, training job orchestration, high-throughput inference serving, and reliability for data pipelines.
Remote DevOps jobs are common because platform teams can support distributed engineering organizations across time zones. Senior candidates should expect interviews on production incident response, capacity planning, infrastructure as code, observability, networking, and tradeoffs between managed cloud services and custom systems.
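As a rough illustration of the capacity-planning arithmetic such interviews often probe, here is a minimal Python sketch. The function name `replicas_needed`, the 30% headroom default, and the single spare replica (N+1) are illustrative assumptions, not a standard formula; real plans also account for latency targets, autoscaling lag, and failure domains.

```python
import math


def replicas_needed(peak_rps: float, per_replica_rps: float,
                    headroom: float = 0.3, spare: int = 1) -> int:
    """Back-of-envelope replica count (hypothetical helper).

    peak_rps:        observed or forecast peak requests per second
    per_replica_rps: sustainable throughput of one replica (e.g. from a load test)
    headroom:        fractional buffer above peak (0.3 = plan for 30% more traffic)
    spare:           extra replicas kept so a failure does not drop below capacity (N+1)
    """
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    if headroom < 0 or spare < 0:
        raise ValueError("headroom and spare must be non-negative")
    # Capacity required at peak plus headroom, rounded up to whole replicas.
    base = math.ceil(peak_rps * (1 + headroom) / per_replica_rps)
    return base + spare


# Example: 1,200 RPS peak, 150 RPS per replica, default 30% headroom and one spare.
print(replicas_needed(peak_rps=1200, per_replica_rps=150))
```

Rounding up before adding the spare keeps the two concerns separate: headroom absorbs traffic growth, while the spare absorbs the loss of a replica.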
Frequently Asked Questions
What is the difference between DevOps and SRE?
DevOps focuses on the culture and practices of integrating software development and operations: building CI/CD pipelines, automating deployments, and improving developer productivity. SRE (Site Reliability Engineering) applies software engineering to operations problems, with an emphasis on reliability, availability, and running production systems at scale. In practice, the roles overlap significantly, and many companies use the titles interchangeably.
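One common way SRE teams make "reliability and availability" concrete is to turn an availability target into an allowed-downtime budget (often called an error budget). The minimal Python sketch below assumes a simple time-based availability definition and a 30-day window. The function name `downtime_budget_minutes` is hypothetical, and many teams measure availability by request success ratio rather than by minutes.

```python
def downtime_budget_minutes(availability_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in a window for a target such as 0.999.

    Assumes availability is measured as the fraction of time the service is up.
    """
    if not 0 < availability_target < 1:
        raise ValueError("availability_target must be strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_target)


# Each extra "nine" shrinks the budget by a factor of ten.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} over 30 days -> {downtime_budget_minutes(target):.1f} min")
```

The tenfold shrink per nine is why higher targets usually require heavier investment in redundancy, automation, and incident response.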
What do infrastructure engineer jobs pay?
DevOps and infrastructure engineers typically earn $130,000 to $230,000 in base salary at tech companies. Senior SREs and platform engineers at top-tier AI companies (OpenAI, Anthropic, xAI) can earn $200,000 to $300,000+. Roles involving GPU infrastructure and ML systems command a premium.
Which DevOps skills are most useful for AI companies?
Kubernetes, Terraform, AWS or GCP, observability, Linux networking, and incident response remain core. AI companies also value GPU infrastructure, model serving, distributed training systems, queueing, storage performance, and experience operating high-cost compute workloads.
Are platform engineer jobs the same as DevOps jobs?
They overlap, but platform engineer jobs usually focus more on internal developer platforms, deployment systems, infrastructure abstractions, and paved-road tooling. DevOps and SRE roles can include the same work while also covering incident response, cloud operations, and reliability targets.