HPC System Engineer
Islandwide, Singapore
Contract, Full Time
Information Technology
Job Description
Key Responsibilities
- Design and develop compute cluster architectures optimized for performance, reliability, scalability, and serviceability within KLA systems.
- Define and validate server hardware configurations, including CPUs, GPUs, memory subsystems, storage, networking, and specialized accelerators.
- Analyze and optimize system-level performance across hardware and software layers, including CPU/GPU utilization, memory bandwidth, PCIe topology, NUMA architecture, and I/O performance.
- Collaborate with hardware, software, firmware, and systems engineering teams to ensure seamless integration of compute clusters into broader system architectures.
- Support server bring-up, hardware integration, diagnostics, benchmarking, stress testing, and root-cause analysis activities.
- Manage and troubleshoot enterprise server platforms, including BIOS/firmware configuration, BMC/IPMI management, thermal and power optimization, and hardware health monitoring.
- Participate in architecture reviews, integration planning, technical discussions, and cross-functional problem-solving sessions.
- Create and maintain technical documentation for hardware design decisions, validation procedures, deployment standards, and troubleshooting workflows.
Required Skills & Qualifications
- Strong experience in computer hardware and system architecture design, particularly in compute clusters, HPC environments, or enterprise server platforms.
- Deep understanding of modern CPU and GPU architectures, including multicore processing, NUMA, PCIe, memory hierarchy, and hardware-software interactions.
- Experience with GPU-accelerated systems and accelerator integration (e.g., NVIDIA GPU platforms, CUDA environments, or similar technologies).
- Hands-on experience with Linux system administration and OS customization (preferably SUSE Linux Enterprise Server).
- Familiarity with enterprise server management technologies such as BIOS/UEFI, BMC, IPMI, iDRAC, or similar remote management tools.
- Understanding of distributed systems, high-performance networking, and cluster infrastructure technologies such as InfiniBand, RDMA, or high-speed Ethernet.
- Experience with system performance tuning, hardware validation, benchmarking, and low-level troubleshooting.
- Strong analytical, documentation, and communication skills.
Preferred Qualifications
- Experience in high-performance computing (HPC), AI/ML infrastructure, or large-scale distributed compute environments.
- Familiarity with server hardware bring-up, failure analysis, thermal/power optimization, and reliability engineering.
- Exposure to hardware diagnostic and monitoring tools for server and cluster environments.
- Understanding of storage architectures, parallel file systems, and distributed storage solutions.
- Experience working in cross-functional engineering teams across hardware, firmware, and software domains.
- Test-driven and detail-oriented engineering mindset with strong problem-solving skills.
- Self-motivated individual with a proactive approach to continuous improvement and technical innovation.
About Dexian Singapore Pte. Ltd.
First seen: May 21, 2026
Last updated: June 15, 2026