Job Opening: Future Computing, System Administrator (f/m/d)
The Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, LRZ) stands at the forefront of its field as a leadership-class IT service and computing user facility serving Munich’s top universities and colleges as well as research institutions in Bavaria, Germany, and Europe. As an institute of the Bavarian Academy of Sciences and Humanities, LRZ has provided a robust, holistic IT infrastructure for its users throughout the scientific community for nearly sixty years. It offers a complete range of resources, services, consulting, and support – from e-mail, web servers, and Internet access to virtual machines, cloud solutions, data storage, and the Munich Scientific Network (MWN).
Home to SuperMUC-NG, LRZ is part of Germany’s Gauss Centre for Supercomputing (GCS) and serves as part of the nation’s backbone for the advanced research and discovery possible through high-performance computing (HPC). In addition to current systems, LRZ plays a leading role in future-facing initiatives focusing on the evaluation of emerging exascale-class architectures and technologies, development of highly scalable artificial intelligence and machine learning, and system integration of quantum acceleration with classical supercomputing.
We have an opening for:
Future Computing, System Administrator (f/m/d)
The LRZ Future Computing Group explores supercomputing technologies and supports LRZ management in decisions on future procurements for our supercomputing portfolio with deep bench expertise and knowledge of HPC and AI hardware, software, and workflows. This work extends from evaluations of current and commercial assets to future road-mapped technology with vendors via co-design relations and research activities. The FC team maintains an accurate benchmark suite reflecting current and predicted LRZ user demands, and provides recommendations for programming and usage models to ensure users get the best performance with acceptable effort. The BEAST (Bavarian Energy, Architecture, and Software Testbed) Program is the vehicle to drive most of these efforts, with new systems and accelerator components added every year. As a member of the Future Computing team, you will work at the very forefront of innovative hardware technologies and software stacks from various vendors – some not yet publicly available – and you will help shape the future of computing at LRZ.
As the system administrator for the Future Computing team and closely related teams and efforts, your role is to wrangle the systems of the BEAST program. These range from experimental/research HPC and AI subsystems, including new accelerator technologies, to some production subsystems (e.g., smaller HPC systems running quantum simulators), and from single nodes to larger scales. Due to the nature of the systems and the group's mission, an adventurous spirit comfortable with change and pivoting, strong attention to detail, thoroughness, and anticipation of needs are paramount to success. As the BEAST master, you will be tightly integrated in the FC team, whose members all share information and activity updates within the group so that they can help and back up one another. As such, part of your time will be spent working on various topics around future software stacks, benchmarking, and programming models, and on putting your results out into the community (where applicable).
- Ensure access to the various systems in BEAST for LRZ internal staff and efforts as well as for selected users from university research partners
- Support interactive and batch usage models, including changes to OS images and root access for users to explore kernels, drivers, and system-level software tools
- Troubleshoot hardware, system software, and user issues
- Serve as technical point of contact for vendor support
- Make recommendations on how to best integrate new BEAST components
- Manage the interface to the upstream LRZ network environment
- Maintain reference installation images for the systems in BEAST, able to run typical LRZ HPC workloads, using the Spack package system
- Help design, develop, and maintain a system able to automatically fetch, build, and run HPC benchmark codes for performance regression tests on the various BEAST systems, in the scope of toolchain and code changes. This should take reproducibility and archival of results into account and may be based on container techniques.
- Ensure the protection of vendor and partner proprietary or sensitive information, complying with information security policies and agreements (e.g. NDAs)
- Assist in guidance of new resources through LRZ's policy and process compliance workflows (I/SMS, IDM, service catalog inclusion), where applicable
- Proven experience and expertise in administering HPC systems
- Bachelor's degree in computer science
- Proficiency in script programming (Bash, Perl, Python)
- Proficiency in low-level programming (C, C++)
- Experience in system administration for Linux (IPMI, NFSv4/Kerberos, networking, user management, containers)
- Interest and experience in new hardware technologies (CPUs, GPUs, memory, networks, storage)
- Excellent written and spoken English
- Ability to communicate results clearly and comprehensively in reports, diagrams, etc.
- Ability to work across the organization and at different levels of management
- Ability to interact with and represent LRZ to users, vendors, and other HPC centers
- Strong desire to contribute your education, experience, energy, and enthusiasm to help build a dynamic and progressive team and share in its success
- Friendly, collegial, and positive personality with a strong drive to roll up your sleeves, get involved and get things done
- Proven experience and expertise in administering non-production, research-oriented supercomputing systems
- Knowledge of system-side HPC environments (management/monitoring tools, SLURM)
- Proficiency in kernel-level programming
- Experience with git and CI/CD
- Interest in interdisciplinary codes and workflow methods
- Experience working with users with complex codes and applications
- Interest in mentoring student research projects and instruction in education/training offerings related to BEAST
What you can expect from LRZ
- Ample room for contributing and implementing your own ideas
- A smart, motivated, fun, and tightly coupled team with an important mission for you to join and be part of
- An organization that greatly values your contribution to our common success
We offer a multifaceted and intellectually stimulating position with flexible working hours and a family-friendly atmosphere in one of the largest and most innovative scientific data centres in Europe. You will work in a dynamic, collaborative, and innovative environment characterised by an excellent working atmosphere and creative leeway.
Salary and benefits are in accordance with the collective employment agreement of the German Federal States (Tarifvertrag der Länder, TV-L). Classification is based upon qualifications and assigned duties. LRZ operates flexible work schemes. Persons with disabilities will be given preference over other equally qualified applicants. As an Institute of the Bavarian Academy of Sciences, we are an equal opportunity, affirmative action employer and strongly encourage applications from women, men, and non-binary persons alike, regardless of social or cultural background.
This full-time position will initially be limited until December 31, 2021, and is to be staffed immediately.
We look forward to receiving your complete application documents (including cover letter, CV, certificates, list of publications, and academic service record) in a single PDF file via e-mail no later than 19.01.2021:
Subject: FC-SYS (2021/04)
If you have open questions regarding this position, our colleagues are happy to answer them at the above e-mail address.
See "Informationen über die Erhebung personenbezogener Daten" (Information on the Collection of Personal Data) for information regarding the EU General Data Protection Regulation and our application procedure.