AI Safety& Alignment Jobs

AI safety, alignment, and governance roles at Anthropic, DeepMind, OpenAI, and leading AI research labs. Work on RLHF, red teaming, mechanistic interpretability, and AI policy. $160k-$400k+ salaries.

Showing 0 of 0 jobs

AI safety is a rapidly growing field as governments, companies, and researchers work to ensure AI systems are safe, trustworthy, and beneficial. Roles span technical alignment research, red teaming and adversarial testing, AI governance and policy, and trust and safety engineering. Major employers include Anthropic, Google DeepMind, OpenAI, Meta's AI Safety team, and government bodies like the UK AI Safety Institute.

Technical AI safety roles involve mechanistic interpretability (understanding what happens inside neural networks), RLHF and Constitutional AI techniques, red teaming LLMs for harmful outputs, and building evaluation frameworks. Policy and governance roles focus on regulatory compliance, risk assessments, and shaping AI standards. Both tracks offer competitive compensation and work on some of the most important problems in technology.

Frequently Asked Questions

What kinds of roles exist in AI safety?

AI safety spans several distinct tracks: technical safety research (interpretability, alignment, robustness), red teaming and evaluation (adversarially testing models for harmful behaviors), trust and safety engineering (building content moderation and safety systems), and AI governance and policy (regulatory compliance, risk frameworks, standards bodies). Each track has different skill requirements.

What background is needed for AI safety jobs?

Technical AI safety roles typically require a strong ML background with Python and PyTorch, ideally with publication experience or significant research projects. Red teaming roles value creativity and security thinking alongside ML knowledge. Governance roles often come from law, policy, or social science backgrounds with domain knowledge of AI systems. All tracks benefit from clear written communication skills.