SuperMUC Petascale System

SuperMUC

SuperMUC is the name of the new supercomputer at the Leibniz-Rechenzentrum (Leibniz Supercomputing Centre) in Garching near Munich (the MUC suffix is borrowed from the Munich airport code). With more than 155,000 cores and a peak performance of 3 Petaflop/s (10^15 floating-point operations per second), SuperMUC is, as of June 2012, one of the fastest supercomputers in the world.
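
As a quick cross-check of the peak figure, the sketch below derives it from the core count, the clock frequency, and the floating-point rate per core. The 2.7 GHz clock of the Xeon E5-2680 and the 8 double-precision operations per core and cycle (with AVX) are assumptions not stated on this page.

    #include <stdio.h>

    /* Back-of-the-envelope peak-performance estimate for the thin-node part
     * of SuperMUC.  Assumptions (not taken from this page): 2.7 GHz nominal
     * clock of the Xeon E5-2680, 8 double-precision FLOPs per core and cycle
     * with AVX. */
    int main(void)
    {
        const double cores         = 147456;  /* thin-node cores, see table below */
        const double clock_hz      = 2.7e9;   /* assumed nominal clock            */
        const double flops_per_cyc = 8.0;     /* assumed DP FLOPs per core/cycle  */

        double peak = cores * clock_hz * flops_per_cyc;          /* FLOP/s */
        printf("Estimated peak: %.3f PFlop/s\n", peak / 1e15);   /* ~3.185 */
        return 0;
    }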

System purpose and target users

SuperMUC strengthens the position of Germany's Gauss Centre for Supercomputing in Europe by delivering outstanding compute power and integrating it into the European High Performance Computing ecosystem. With the operation of SuperMUC, LRZ will act as a European Centre for Supercomputing and will be a Tier-0 centre of PRACE, the Partnership for Advanced Computing in Europe.

SuperMUC is available to all European researchers to expand the frontiers of science and engineering.

System overview

  • 155,656 processor cores in 9400 compute nodes
  • >300 TB RAM
  • Infiniband FDR10 interconnect
  • 4 PB of NAS-based permanent disk storage
  • 10 PB of GPFS-based temporary disk storage
  • >30 PB of tape archive capacity
  • Powerful visualization systems
  • Highest energy-efficiency

Energy Efficiency

SuperMUC uses a new, revolutionary form of warm-water cooling developed by IBM. Active components like processors and memory are cooled directly with water that can have an inlet temperature of up to 40 degrees Celsius. This "High Temperature Liquid Cooling", together with very innovative system software, promises to cut the energy consumption of the system. In addition, all LRZ buildings will be heated by re-using this energy.

Why "warm" water cooling?

Typically, the water used in data centres has an inlet temperature of approx. 16 degrees Celsius and, after leaving the system, an outlet temperature of approx. 20 degrees Celsius. Producing water at 16 degrees Celsius requires complex and energy-hungry cooling equipment, while there is hardly any use for the warmed-up water, as it is too cold to be used in any technical process.

SuperMUC allows a much higher inlet temperature. Water of up to 40 degrees Celsius can easily be provided with simple "free-cooling" equipment, as outside temperatures in Germany hardly ever exceed 35 degrees Celsius. At the same time, the outlet water can become quite hot (up to 70 degrees Celsius) and can be re-used in other technical processes, for example to heat buildings.

By reducing the number of cooling components and using free cooling, LRZ expects to save several million Euros in cooling costs over the five-year lifetime of the system.
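
To see how savings of that order can arise, here is a purely illustrative estimate; the chiller efficiency and the electricity price below are assumptions, not LRZ figures, and only the 2.3 MW system power is taken from the technical data further down.

    #include <stdio.h>

    /* Illustrative estimate of the chiller electricity avoided by free cooling.
     * The coefficient of performance and the electricity price are assumed
     * values for the sake of the example, not LRZ figures. */
    int main(void)
    {
        const double it_load_mw   = 2.3;    /* system power, see technical data */
        const double chiller_cop  = 4.0;    /* assumed chiller efficiency       */
        const double eur_per_kwh  = 0.15;   /* assumed electricity price        */
        const double hours_per_yr = 8760.0;
        const double years        = 5.0;    /* system lifetime                  */

        double chiller_mw  = it_load_mw / chiller_cop;   /* ~0.58 MW avoided  */
        double cost_per_yr = chiller_mw * 1000.0 * hours_per_yr * eur_per_kwh;
        printf("Avoided chiller power: ~%.2f MW\n", chiller_mw);
        printf("Avoided cost over %.0f years: ~%.1f million EUR\n",
               years, cost_per_yr * years / 1e6);
        return 0;
    }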

Figure: SuperMUC in the computer room

System Configuration Details

LRZ's target architecture combines a large number of moderately powerful compute nodes, each with a peak performance of several hundred GFlop/s, with a small number of fat compute nodes offering a large shared memory. The network interconnect between the nodes allows near-linear scaling of parallel applications up to more than 10,000 tasks.

SuperMUC consists of 18 Thin Node Islands and one Fat Node Island, which initially also serves as the migration system SuperMIG. Each Island contains at least 8,192 cores. All compute nodes within an individual Island are connected via a fully non-blocking Infiniband network (FDR10 for the Thin Nodes, QDR for the Fat Nodes). Above the Island level, the high-speed interconnect is pruned to a 4:1 ratio of intra-Island to inter-Island bi-directional bisection bandwidth.
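
Because inter-Island bandwidth is pruned 4:1 relative to intra-Island bandwidth, communication-heavy applications benefit from keeping their densest traffic inside an Island. The following minimal MPI sketch splits the global communicator into per-Island communicators; it assumes, purely for illustration, that consecutive ranks are placed on the same Island (512 thin nodes x 16 cores = 8,192 ranks per Island).

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: derive an Island-local communicator so that dense collectives
     * run over the fully non-blocking intra-Island fabric.  The rank-to-Island
     * mapping below is a hypothetical placement, not guaranteed by the system. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int ranks_per_island = 8192;      /* 512 thin nodes x 16 cores */
        int island = rank / ranks_per_island;   /* hypothetical placement    */

        MPI_Comm island_comm;
        MPI_Comm_split(MPI_COMM_WORLD, island, rank, &island_comm);

        int lrank;
        MPI_Comm_rank(island_comm, &lrank);
        if (lrank == 0)
            printf("Island %d: local communicator ready\n", island);

        MPI_Comm_free(&island_comm);
        MPI_Finalize();
        return 0;
    }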

In June 2014, the Intel Xeon Phi based cluster SuperMIC was integrated into SuperMUC. Details on SuperMIC can be found at http://www.lrz.de/services/compute/supermuc/supermic/.
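
On the SuperMIC nodes, the two Xeon Phi 5110P cards can be used in offload mode with the Intel compiler. A minimal, hedged sketch is shown below; the kernel and the array size are purely illustrative, and the pragma requires the Intel offload tool chain.

    #include <stdio.h>

    #define N 1000000
    static float a[N], b[N];

    /* Minimal offload sketch for a Xeon Phi coprocessor using the Intel
     * compiler's offload pragma.  The computation itself is a placeholder. */
    int main(void)
    {
        for (int i = 0; i < N; i++) b[i] = (float)i;

        #pragma offload target(mic:0) in(b) out(a)
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0f * b[i];      /* placeholder kernel */

        printf("a[42] = %f\n", a[42]);
        return 0;
    }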

The SuperMUC system will be expanded in 2015, doubling its peak performance.

Figure: Schematic view of SuperMUC

Technical data

| Item | Thin Node Islands (Phase 1, installed 2012) | Fat Node Island (Phase 1, installed 2011) | Many Core Island (Phase 2, installed 2013) | Thin Node Islands (Phase 2, installed 2014/2015) |
|---|---|---|---|---|
| System | IBM System x iDataPlex | IBM BladeCenter HX5 | | |
| Processor type | Sandy Bridge-EP Intel Xeon E5-2680 8C | Westmere-EX Intel Xeon E7-4870 10C | Ivy Bridge (IvyB) + Intel Xeon Phi 5110P | Intel Haswell |
| Number of Islands | 18 | 1 | 1 | |
| Nodes per Island | 512 | 205 | 32 | |
| Processors per node | 2 | 4 | 2 (IvyB, 2.6 GHz) + 2 Phi 5110P | |
| Cores per processor | 8 | 10 | 8 (IvyB) + 60 (Phi) | |
| Cores per node | 16 | 40 | 16 (host) + 120 (Phi) | |
| Logical CPUs per node (Hyperthreading) | 32 | 80 | 32 (host) + 480 (Phi) | |
| Total number of nodes | 9,216 | 205 | 32 | |
| Total number of cores | 147,456 | 8,200 | 4,352 | |
| Peak performance [PFlop/s] | 3.185 | 0.078 | 0.064 (Phi) | 3.2 |
| Linpack performance [PFlop/s] | 2.897 | 0.065 | n.a. | |
| Total size of memory [TByte] | 288 | 52 | 2.56 | |
| Memory per core [GByte] (typically available for applications) | 2 (~1.5) | 6.4 (~6.0) | 4 (host) + 2 x 0.13 (Phi) | |
| Shared memory per node [GByte] | 32 | 256 | 64 (host) + 2 x 8 (Phi) | |
| Memory bandwidth per node [GByte/s] | 102.4 | 136.4 | 384 (Phi) | |
| Latency to local memory [ns (cycles)] | ~50 (~135) | ~70 (~170) | | |
| Latency to remote memory [ns (cycles)] | ~90 (~240) | ~120 (~200) | | |
| Level 3 cache size (shared) [MByte] | 20 | 24 | | |
| Level 2 cache size [kByte] | 256 | 256 | 60 x 512 | |
| Level 1 cache size [kByte], associativity | 32, 8-way | 32 | 32 | |
| Level 3 cache bandwidth and latency (shared) [byte/cycle] | 1 x 32 @ 31 cycles | 1 x 32 | | |
| Level 2 cache bandwidth and latency [byte/cycle] * | 1 x 32 @ 12 cycles | 1 x 32 | | |
| Level 1 cache bandwidth and latency [byte/cycle] * | 2 x 16 @ 4 cycles | 2 x 16 | | |
| Level 3 cache line size [Byte] | 64 | 64 | | |
| Electrical power consumption of total system [MW] | < 2.3 | | | |
| Network technology | Infiniband FDR10 | | Infiniband FDR14 | |
| Intra-Island topology | non-blocking tree | | | |
| Inter-Island topology | pruned tree 4:1 | | | |
| Bisection bandwidth of interconnect [TByte/s] | 35.6 | | | |
| File system for SCRATCH and WORK | IBM GPFS | | | |
| File system for HOME | NetApp NAS | | | |
| Size of parallel storage [PByte] | 10 | | | 5 |
| Size of NAS user storage [PByte] | 1.5 (+ 1.5 for replication) | | | |
| Aggregated bandwidth to/from GPFS [GByte/s] | 200 | | | 100 |
| Aggregated bandwidth to/from NAS storage [GByte/s] | 10 | | | 5 |
| Login servers for users | 5 | | | |
| Service and management servers | 12 | | | |
| Batch system | IBM LoadLeveler | | | |
| Archive and backup software | IBM TSM | | | |
| Capacity of archive and backup storage [PByte] | > 30 | | | |

* Latency is much longer if the data are also held in the L1 or L2 cache of another core.
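
The per-node memory bandwidth quoted above (102.4 GByte/s on a thin node) can be checked with a simple STREAM-like triad; the sketch below uses OpenMP, and the array length is an arbitrary choice made large enough to spill well out of the 20 MByte L3 cache.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Minimal STREAM-like triad to estimate sustainable per-node memory
     * bandwidth.  The array length is an arbitrary choice, large enough to
     * exceed the shared L3 cache by a wide margin. */
    int main(void)
    {
        const size_t n = 1 << 26;              /* 64 Mi doubles, ~512 MB per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        if (!a || !b || !c) return 1;

        #pragma omp parallel for
        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + 3.0 * c[i];          /* triad: ~24 bytes moved per element */
        double t1 = omp_get_wtime();

        printf("Triad bandwidth: %.1f GByte/s\n",
               3.0 * n * sizeof(double) / (t1 - t0) / 1e9);
        free(a); free(b); free(c);
        return 0;
    }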

Details on processors

System Software

SuperMUC uses the following software components (see also the technical data above): the IBM LoadLeveler batch system, IBM's General Parallel File System (GPFS) for the work and scratch areas, NetApp NAS for the home file systems, and IBM TSM for archive and backup.

From the user side, a wide range of compilers, tools, and commercial and free applications is provided. Many scientists also build and run their own software.

Storage Systems

SuperMUC has a powerful I/O subsystem that helps process the large amounts of data generated by simulations.

Home file systems

Permanent storage for data and programs is provided by a 16-node NAS cluster from NetApp. This primary cluster has a capacity of 2 Petabytes and has demonstrated an aggregated throughput of more than 10 GB/s using NFSv3. NetApp's ONTAP 8 "Cluster-Mode" provides a single namespace for several hundred project volumes on the system. Users can access multiple snapshots of the data in their home directories.

For recovery purposes, data is regularly replicated to a separate 4-node NetApp cluster with another 2 PB of storage. Replication uses SnapMirror technology and runs at up to 2 GB/s in this setup.

The storage hardware consists of more than 3,400 SATA disks of 2 TB each, protected by double-parity RAID and integrated checksums.

Work and Scratch areas

For highest-performance checkpoint I/O, IBM's General Parallel File System (GPFS) is available with 10 PB of capacity and an aggregated throughput of 200 GB/s. The disk storage subsystems were built by DDN.
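
To make use of the parallel bandwidth of GPFS, checkpoints are best written collectively rather than funnelled through a single task. A minimal MPI-IO sketch of this pattern follows; the file name and the per-task buffer size are placeholders.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch of a collective checkpoint write to a GPFS work/scratch area.
     * "checkpoint.dat" and the per-task buffer size are placeholders. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;                  /* 1 Mi doubles per task */
        double *buf = malloc(count * sizeof *buf);
        for (int i = 0; i < count; i++) buf[i] = (double)rank;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each task writes its contiguous slice; the collective call lets the
         * MPI-IO layer aggregate requests into large, GPFS-friendly writes. */
        MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, count, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }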

Tape backup and archives

LRZ's tape backup and archive systems, based on IBM's Tivoli Storage Manager (TSM), are used for archiving and backup. They have been extended to provide more than 30 Petabytes of capacity to the users of SuperMUC. Digital long-term archives help to preserve the results of scientific work on SuperMUC. User archives are also transferred to a disaster recovery site.

Visualization and Support systems

SuperMUC will be connected to powerful visualization systems: the new LRZ office building houses a large 4K stereoscopic powerwall as well as a 5-sided CAVE artificial virtual reality environment.


See also: