HPC systems security issue: Schedule for reopening

Update (July 7, 2020)

Here is the updated schedule for the systems that are still unavailable:

System                                  Estimated restart   Current status
Linux Cluster: Login nodes lxlogin8,9   10.07.2020          offline
Linux Cluster IvyMUC                    01.07.2020          online
Login node lxlogin10                    10.07.2020          offline
Linux Cluster teramem1                  03.07.2020          online
Housed Cluster (SLURM matum)            01.07.2020          online
Linux Cluster RVS (remote vis)          01.07.2020          online
SuperMUC-NG fat island                  07.07.2020          online

Update (June 18, 2020)

Although the CoolMUC-3 batch nodes are back in operation, the front-end node lxlogin8 is not yet available.
Jobs can be submitted from the CoolMUC-2 login nodes.

The KCS cluster is not yet ready; we hope to open it tomorrow.

Update (June 17, 2020)

Status

The systems can only be reopened once additional precautions are in place. We are currently
implementing the required measures.

For SuperMUC-NG, the service is being reopened in coordination with the other GCS sites.

Recommendations

Because executable files may have been modified to misuse the computing resources, we strongly recommend recompiling your applications.

Schedule for Resumption of Services

The following table will be updated as the precise dates become known.

System                                        Date of restart   Current status
SuperMUC-NG Globus Online data services       17.06.2020        online
SuperMUC-NG login nodes and batch operation   16.06.2020        online
Linux Cluster CoolMUC-2                                         online
  (includes associated housed systems)
Linux Cluster CoolMUC-3                       18.06.2020        online
  (includes associated housed systems)

Changes to Usage Policies

In addition to the measures implemented at the end of May, the following changes to the usage policies apply:

  • On the Linux Cluster systems, regular outgoing ssh connections are no longer possible.
  • All authorized ssh keys must specify a "from" clause that limits the networks from which access is possible. LRZ will audit and enforce this.
  • The public-key algorithms accepted for authentication are limited to ecdsa and ed25519.
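The "from" and key-type rules can be checked locally before LRZ's own audit. The following is a minimal Python sketch, assuming the standard OpenSSH authorized_keys layout (optional comma-separated options, then key type, base64 key and comment) and assuming that "ecdsa and ed25519" corresponds to the key-type names in ALLOWED_KEY_TYPES. The helpers split_unquoted, audit_line and audit_file are hypothetical and are not an LRZ tool; LRZ's actual audit criteria may differ.

```python
from typing import Callable, List

# Assumption: the policy "ecdsa and ed25519" maps to these OpenSSH key-type names.
ALLOWED_KEY_TYPES = {
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
    "ssh-ed25519",
}
# Heuristic: a key-type field starts with one of these; no option name does.
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def split_unquoted(text: str, is_sep: Callable[[str], bool]) -> List[str]:
    """Split text at separator characters that are not inside double quotes."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    prev = ""
    for ch in text:
        if ch == '"' and prev != "\\":
            in_quotes = not in_quotes
        if is_sep(ch) and not in_quotes:
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
        prev = ch
    if current:
        parts.append("".join(current))
    return parts


def audit_line(line: str) -> List[str]:
    """Return the policy problems found in one authorized_keys line (empty list if none)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return []  # sshd ignores blank lines and comment lines
    fields = split_unquoted(line, str.isspace)
    if fields[0].startswith(KEY_TYPE_PREFIXES):
        options, key_type = [], fields[0]
    else:
        options = split_unquoted(fields[0], lambda c: c == ",")
        key_type = fields[1] if len(fields) > 1 else ""
    problems = []
    if key_type not in ALLOWED_KEY_TYPES:
        problems.append(f"key type {key_type!r} not allowed")
    # sshd treats option keywords case-insensitively.
    if not any(opt.lower().startswith("from=") for opt in options):
        problems.append("missing from= clause")
    return problems


def audit_file(path: str) -> dict:
    """Map 1-based line numbers to problems for every non-compliant line in path."""
    report = {}
    with open(path, encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            problems = audit_line(line)
            if problems:
                report[number] = problems
    return report
```

For example, audit_file(os.path.expanduser("~/.ssh/authorized_keys")) returns an empty dict when every entry passes both checks. Quote-aware splitting is needed because option values such as command="..." may contain spaces and commas.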

All users are strongly urged to read the updated secure shell documentation page, which describes all technical details that need to be taken care of.

Additional notes for LMU user accounts (May 28, 2020)

Passwords of LMU user accounts ("LMU Campuskennungen") with permission to access the Linux Cluster must be reset via the
web interface https://login.portal.uni-muenchen.de/pwreset/, unless this has already been done in the last few days.

Update (May 21, 2020)

As announced on May 13, 2020, we had to close access to all HPC systems at LRZ (SuperMUC-NG and Linux Cluster) because of
a security incident. The public authorities have been informed. Below we describe the first steps toward restoring
operation.

Course of Events

The attack on the systems was made possible by a combination of two factors:

  1. A number of compromised user accounts on external systems whose private SSH keys had been configured with an empty passphrase,
  2. A software bug that permits privilege escalation after a regular login to the system.

The aims of the perpetrators are currently unknown. So far, we have found no indication of concrete activities such as
access to or manipulation of regular users' data. If you find any evidence of tampering with your data, please report it
to us immediately so that we can inform the authorities.

Further measures

Because we cannot rule out that the perpetrators have gained knowledge of other users' passwords and secure
shell key pairs, we will carry out the following measures at short notice:

  1. The SIM password of all user accounts with access permission to an HPC system at LRZ (Linux Cluster, SuperMUC-NG, Cluster-Housing) will be invalidated on May 22 at 15:00. All of these users must reset their passwords via https://idmportal.lrz.de/pwreset/.
  2. All secure shell keys of regular users stored on the system will be invalidated and can no longer be used for authentication. All users will therefore need to generate new key pairs. It is essential that the private key used for authentication from the user's workstation is not configured with an empty passphrase.
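Whether a private key is protected by a passphrase can be checked locally. The following is a minimal Python sketch, assuming the key is stored in the OpenSSH private-key format that recent OpenSSH versions write by default (the file starts with -----BEGIN OPENSSH PRIVATE KEY-----). In that format the cipher name is stored unencrypted directly after the openssh-key-v1 header and is "none" when no passphrase was set. Older PEM-format keys are not handled and raise ValueError; the function names are hypothetical.

```python
import base64
import struct

BEGIN = "-----BEGIN OPENSSH PRIVATE KEY-----"
END = "-----END OPENSSH PRIVATE KEY-----"
MAGIC = b"openssh-key-v1\x00"  # fixed header of the OpenSSH private-key container


def openssh_cipher_name(key_text: str) -> str:
    """Return the cipher name stored in an OpenSSH-format private key."""
    lines = [line.strip() for line in key_text.strip().splitlines()]
    if not lines or lines[0] != BEGIN or END not in lines:
        raise ValueError("not an OpenSSH-format private key")
    blob = base64.b64decode("".join(lines[1:lines.index(END)]))
    if not blob.startswith(MAGIC):
        raise ValueError("missing openssh-key-v1 header")
    # The header is followed by the cipher name as an SSH string:
    # a 4-byte big-endian length, then that many bytes.
    offset = len(MAGIC)
    (length,) = struct.unpack(">I", blob[offset:offset + 4])
    return blob[offset + 4:offset + 4 + length].decode("ascii")


def has_passphrase(key_text: str) -> bool:
    """True if the private key is encrypted, i.e. its cipher name is not "none"."""
    return openssh_cipher_name(key_text) != "none"
```

For example, has_passphrase(Path("~/.ssh/id_ed25519").expanduser().read_text()) should return True for a key pair created with a non-empty passphrase, e.g. via ssh-keygen -t ed25519. Only the unencrypted header is read, so the check never needs the passphrase itself.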

Furthermore, all users of the cluster systems are obliged to add key-specific "from" clauses to all entries in their
~/.ssh/authorized_keys files, in order to limit access to the external systems they actually need. We will update the
secure shell documentation accordingly before user operation resumes on our systems.
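Adding the clauses by hand is error-prone when a file contains many entries. The following is a minimal Python sketch, assuming the OpenSSH authorized_keys layout, in which options form the first, comma-separated field and from="pattern-list" takes comma-separated host or address patterns. The helper add_from_clause is hypothetical, and the patterns in the usage note are placeholders: which external systems a user actually needs is user-specific.

```python
import re
from typing import List

# Heuristic (assumption): the key-type field starts with one of these prefixes,
# while no authorized_keys option name does.
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")
# Leading options field: non-space characters; double-quoted parts may contain spaces.
OPTIONS_RE = re.compile(r'(?:[^\s"]|"(?:[^"\\]|\\.)*")+')


def add_from_clause(line: str, patterns: List[str]) -> str:
    """Return an authorized_keys line restricted to logins from the given host patterns."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped  # blank lines and comments stay as they are
    clause = 'from="' + ",".join(patterns) + '"'
    if stripped.startswith(KEY_TYPE_PREFIXES):
        # No options yet: the clause becomes the options field.
        return clause + " " + stripped
    match = OPTIONS_RE.match(stripped)
    options = match.group(0) if match else ""
    if re.search(r"(?:^|,)from=", options, re.IGNORECASE):
        return stripped  # already restricted; leave for manual review
    # Existing options: prepend the clause to the comma-separated list.
    return clause + "," + stripped
```

For example, applying add_from_clause to each line of ~/.ssh/authorized_keys with ["192.0.2.0/24"] restricts every key to that placeholder network; entries that already carry a from= option are returned unchanged so that they can be reviewed manually.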

We reserve the right to impose further restrictions on the access mechanisms should circumstances require it.

We cannot yet give a date for the resumption of operation. This document will be updated as soon as this changes.

(Initial report May 13)

Dear users of the HPC systems at LRZ,

due to a security issue, we have temporarily closed external access to all HPC systems.


Author: R. Bader
Published: 2020-06-08