SuperMUC: phase 2 operation scheduled Nov 15
Dear users of SuperMUC,
Due to the bring-up of the next-generation system SuperMUC-NG and constraints on our electrical infrastructure, we need to limit the power consumption of SuperMUC.
The following table lists the power-saving measures currently in effect, as well as those planned for the near term:
| Date | Affected systems | Measure |
|---|---|---|
| August 23, 2018 | All phase 2 (Haswell) nodes | The frequency is capped at 1.8 GHz. Even with an energy tag, processors will run at no more than this frequency. While this measure is in effect, the maximum job execution time on the micro and general queues has been raised to 60 hours to compensate for the slower execution speed. |
| August 24, 2018 | All phase 1 (Sandy Bridge) nodes | The frequency is capped at 2.0 GHz. Even with an energy tag, processors will run at no more than this frequency. While this measure is in effect, the maximum job execution time on the general and large queues has been raised to 70 hours to compensate for the slower execution speed. |
| October 4, 2018 | All phase 1 (Sandy Bridge) nodes | Phase 1 thin nodes will become unavailable for user operation in the late afternoon of October 4. The scheduler will be switched to draining mode in advance. The outage is expected to last several weeks. We will update this document once we know the date on which phase 1 returns to user operation. The fat island remains available for job processing. |
| October 19, 2018 | File systems | The WORK and SCRATCH file systems are accessible again via the phase 2 login nodes. |
| October 22, 2018 | Project status | All SuperMUC projects that would normally have expired between 2018-10-17 and 2019-02-10 are extended by two months, without additional CPU hours. |
| November 15, 2018 | All phase 2 (Haswell) nodes | System operation is expected to resume. The frequency is capped at 1.8 GHz as described above. |
| November 21, 2018 | Phase 1 (Sandy Bridge) nodes | 10 islands of phase 1 have been returned to user operation. The frequency is capped at 2.0 GHz as described above. |
Please note that these measures affect job processing:
- Individual jobs may run more slowly and therefore need more wall time to complete. You may need to adjust the checkpointing frequency or the requested job run time.
- Overall job throughput will be reduced.
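As a rough planning aid, the extra wall time can be estimated by assuming a fully compute-bound job whose run time scales inversely with clock frequency. This is a worst case: memory- or I/O-bound jobs usually slow down less. The sketch below is hypothetical. The function names and the previous clock frequency you supply are assumptions, not LRZ tools or documented values. Only the 1.8 GHz and 2.0 GHz caps and the 60-hour and 70-hour queue limits come from the table above.

```python
# Frequency caps and adjusted maximum wall times taken from the power-saving table.
FREQ_CAP_GHZ = {"phase1": 2.0, "phase2": 1.8}
QUEUE_LIMIT_HOURS = {
    ("phase2", "micro"): 60,
    ("phase2", "general"): 60,
    ("phase1", "general"): 70,
    ("phase1", "large"): 70,
}


def scaled_walltime_hours(old_hours: float, old_ghz: float, phase: str) -> float:
    """Worst-case wall time under the cap, assuming run time ~ 1 / clock frequency.

    old_ghz is the frequency the job ran at before the cap. It is an input
    you must supply (hypothetical here), not a value defined by the notice.
    """
    if old_hours < 0 or old_ghz <= 0:
        raise ValueError("old_hours must be >= 0 and old_ghz must be > 0")
    cap = FREQ_CAP_GHZ[phase]
    # A job that already ran at or below the cap is not slowed down by it.
    factor = max(1.0, old_ghz / cap)
    return old_hours * factor


def fits_queue(hours: float, phase: str, queue: str) -> bool:
    """True if the estimated wall time is within the adjusted queue limit."""
    return hours <= QUEUE_LIMIT_HOURS[(phase, queue)]


if __name__ == "__main__":
    # Hypothetical job: 36 h at an assumed 2.4 GHz on phase 2 before the cap.
    est = scaled_walltime_hours(36, 2.4, "phase2")
    print(f"estimated: {est:.1f} h, fits general queue: {fits_queue(est, 'phase2', 'general')}")
```

For example, a job that previously needed 36 hours at an assumed 2.4 GHz would need about 48 hours under the 1.8 GHz cap, which is still within the 60-hour limit of the general queue. If the estimate exceeds the limit, more frequent checkpointing lets the job be split across several submissions.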