Reflexive Concepts is seeking a skilled System Administrator to join our team!
The System Administrator will provide High Performance Computing (HPC) support, including HPC-enhanced sustainment capabilities, to two geographically dispersed locations. These capabilities include multi-vendor HPC servers, clusters, and SPD servers running Red Hat, CentOS, SUSE, and custom vendor-specific operating systems and system tools, as well as storage that is dedicated or shared over InfiniBand and Ethernet. They also include high-speed shared parallel storage that uses Lustre or GPFS to provide a performant shared storage solution between two or more HPC systems. An interconnect service integrates HPC systems via a dedicated high-speed network that connects several storage appliances to dedicated HPC LNETs. The contractor must provide support for multi-vendor HPC systems and servers (cluster, SMP, MPP, and SPD FrontEnds) running various operating systems (Red Hat, CentOS, SUSE, IBM AIX) and specialized system software. Required support includes directly attached and mounted storage capabilities (FC SAN, Ethernet, and InfiniBand).
Qualifications:
- B.S. in a technical discipline and 10 years’ experience as a System Administrator in programs and contracts of similar scope, type, and complexity
- 5 additional years of experience may be substituted for a degree
Required:
- Linux (RHEL, CentOS, Rocky, SLES, Ubuntu [new])
- Experience with OS installation, file system configuration, TCP/IP network configuration, operating system and application troubleshooting, Bash scripting, and software compilation and installation
- Understanding of HPC architecture and knowledge of high-speed networks such as InfiniBand and Slingshot
- Familiarity with Jira, Confluence, Grafana, Prometheus, Nagios, Slurm, Git, Salt, Ansible
- Good troubleshooting skills: each system is slightly different, and there's no "one fix" for a particular problem
- Lustre file system configuration, administration, and troubleshooting knowledge
- Experience with DDN EXAScaler file system appliances
- TCP/IP networking knowledge, specifically storage fabrics
- Experience with Cisco and Juniper networking equipment (Arista experience is also desired)
- Genuine curiosity and proactive effort to learn and grow in what comes next: E1000s, ESNs, DAOS, Weka
- Experience with benchmarking tools (e.g. IOR, iperf, FIO, lnet_selftest)
- DoD 8570 IAT II level certification required