Hardware Description of HLRB II

The HLRB II is based on SGI's Altix 4700 platform. The system installed at LRZ
is optimized for high application performance and high memory bandwidth.

Photograph of the HLRB II system

The following table provides an overview of the hardware and characteristics of the HLRB II.

Overall Characteristics for both installation phases 

 

 

Phase 1
(until 03/2007)

Phase 2
(since 04/2007)

 

Total number of cores

4096

9728

 

Peak Performance of the entire system

26.2 TFlop/s

62.3 TFlop/s

Linpack Performance

24.5 TFlop/s

56.5 TFlop/s

 

Total size of memory for entire system

17.5 TByte

39 TByte

 

Direct Attached Disks

300 TByte

600 TByte

 

Network Attached Disks

40 TByte

60 TByte

Granularity

 

Number of compute partitions

16

19 (13 + 6 with high density blades)

 

Number of cores per compute partition

256

512

 

Number of blades (memory channels) per compute partition

256

128 (high density) or 256

 

Number of cores per socket

1

2

 

Number of cores per blade

1

2 or 4 (high density blades)

Processor

Processor type

Intel Itanium2 Madison 9M

Intel Itanium2 Montecito Dual Core

 

Clock rate

1.6 GHz

1.6 GHz

 

Number of Floating Point Operations per clock

4 (= 2 FMAs)

4 (= 2 FMAs)

 

Peak performance of a socket

6.4 GFlop/s

12.8 GFlop/s

 

Max. number of Instructions per clock tick

6

12 (6 per Core)

 

Peak number of instructions per second of a socket (Gip/s)

9.6 Gip/s

19.2 Gip/s (9.6 per Core)

 

Number of FP Registers

128

256 (128 per core)

Memory

 

Memory per core

4 GByte (8 GByte on interactive node)

4 GByte per Core
(1st socket in Partition contains 16 GByte)

 

Clock rate of frontside bus (FSB)

533 MHz

533 MHz

 

Peak bandwidth to local memory

8.5 GByte/s per core

8.5 GByte/s shared between 2 or 4 cores (high density blades)

 

Total bandwidth to local memory of the entire system

34816 GByte/s

34816 GByte/s

 

Latency to local memory

approx. 210 cycles

approx. ??? cycles

Memory Hierarchy

 

L1 Data Cache (not used for floating point data)

 

 

               size

16 kByte

16 kByte

 

               cacheline size

64 Byte

64 Byte

 

               associativity

4-way

4-way

 

               latency

1 cycle

1 cycle

 

               Bandwidth

25.6 GByte/s

25.6 GByte/s

 

L2 Data Cache (per core)

 

 

               Size

256 kByte

256 kByte

 

               Cacheline size

128 Byte

128 Byte

 

               Associativity

8-way

8-way

 

               min. Latency

INT: 5 cycles,
FP: 6 cycles

INT: 5 cycles,
FP: 6 cycles

 

               Bandwidth

51.2 GByte/s (FP)  (+25.6 GByte/s (INT))

51.2 GByte/s (FP)  (+25.6 GByte/s (INT))

 

               Data banks

16 Bytes/bank

16 Bytes/bank

 

L2 Instr. Cache (per core)

 

 

               Size

n/a

1 MByte

 

L3 Cache (per core)

 

 

               Size

6 MByte

9 MByte

 

               Cacheline size

128 Byte

128 Byte

 

               Associativity

12-way

12-way

 

               min. Latency

14 cycles

14 cycles

 

               Bandwidth

51.2 GByte/s 

51.2 GByte/s 

 

               Fill  Bandwidth

128 Byte in 4 cycles 

128 Byte in 4 cycles 

 

L2 Data TLB

 

 

               Entries

128

128

 

               Latency

30 cycle penalty for TLB miss

30 cycle penalty for TLB miss

Internal Interconnect

 

Connection network type

NUMAlink 4

NUMAlink 4

 

Number of (bidirectional) links per blade

2

2

 

Bandwidth of one link (bidirectional)

6.4 GByte/s

6.4 GByte/s

 

MPI latency

1-5 µs

1-5 µs

Disks

 

Direct attached disks

 

   

 

              Characteristics

few, but large files; high bandwidth;
Pseudo Temporary Files

few, but large files; high bandwidth;
Pseudo Temporary Files,
Temporary Project Files
 

              Size

300 TByte

600 TByte

 

              aggr. bandwidth
              to disks

20 GByte/s

40 GByte/s

 

Network attached disks (Home Directories)

 

              Characteristics

many, but small files; high transaction rate

many, but small files; high transaction rate

 

              Size

40 TByte

60 TByte

 

              bandwidth to
              disks

600 MByte/s

800 MByte/s

Environment

 

Footprint

24 m x 12 m

24 m x 12 m

 

Total weight

103 metric tons

103 metric tons

 

Total electrical power

~1000 kVA

~1100 kVA
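
The peak performance rows in the table can be reproduced from the core count, the clock rate and the number of floating point operations per clock. The following short Python sketch is only a consistency check on the published numbers, not an official LRZ tool. All input values are copied from the table; the function and dictionary names are made up for this illustration.

```python
# Consistency check of the peak-performance rows.
# All numbers are copied from the table; all names are illustrative.
PHASES = {
    "Phase 1": {"partitions": 16, "cores_per_partition": 256,
                "cores_per_socket": 1, "clock_ghz": 1.6,
                "flops_per_clock": 4, "linpack_tflops": 24.5},
    "Phase 2": {"partitions": 19, "cores_per_partition": 512,
                "cores_per_socket": 2, "clock_ghz": 1.6,
                "flops_per_clock": 4, "linpack_tflops": 56.5},
}


def total_cores(p):
    """Total number of cores = partitions x cores per partition."""
    return p["partitions"] * p["cores_per_partition"]


def peak_core_gflops(p):
    """Peak GFlop/s of one core = clock rate x flops per clock."""
    return p["clock_ghz"] * p["flops_per_clock"]


def peak_socket_gflops(p):
    """Peak GFlop/s of one socket = per-core peak x cores per socket."""
    return peak_core_gflops(p) * p["cores_per_socket"]


def peak_system_tflops(p):
    """Peak TFlop/s of the whole system."""
    return peak_core_gflops(p) * total_cores(p) / 1000.0


def linpack_efficiency(p):
    """Measured Linpack performance as a fraction of peak."""
    return p["linpack_tflops"] / peak_system_tflops(p)


for name, p in PHASES.items():
    print(f"{name}: {total_cores(p)} cores, "
          f"socket peak {peak_socket_gflops(p):.1f} GFlop/s, "
          f"system peak {peak_system_tflops(p):.1f} TFlop/s, "
          f"Linpack efficiency {linpack_efficiency(p):.1%}")
```

Under these inputs the computed system peak rounds to the 26.2 and 62.3 TFlop/s listed above, and the Linpack results correspond to roughly 93% and 91% of peak.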

See also: SGI's fact sheet
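
The table lists the same total memory bandwidth for both phases although the number of cores more than doubled. A short Python sketch suggests why. It assumes that every blade provides one memory channel with the 8.5 GByte/s peak quoted in the table, which is how the row "Number of blades (memory channels) per compute partition" reads, and it derives cache bandwidth as bytes per cycle times the 1.6 GHz clock. This is an illustrative reading of the table, with made-up names, not a statement from SGI or LRZ.

```python
# Bandwidth cross-check. Inputs are copied from the table; the
# "one 8.5 GByte/s channel per blade" model is an assumption.
CLOCK_GHZ = 1.6          # core clock, identical in both phases
CHANNEL_GBYTE_S = 8.5    # peak bandwidth to local memory per blade

# (number of partitions, blades per partition) from the Granularity rows
BLADES = {
    "Phase 1": [(16, 256)],
    "Phase 2": [(13, 256), (6, 128)],  # 6 partitions use high density blades
}
CORES = {"Phase 1": 4096, "Phase 2": 9728}


def total_blades(phase):
    """Number of blades (memory channels) in the whole system."""
    return sum(parts * blades for parts, blades in BLADES[phase])


def total_memory_bandwidth(phase):
    """Aggregate peak bandwidth to local memory in GByte/s."""
    return total_blades(phase) * CHANNEL_GBYTE_S


def bandwidth_per_core(phase):
    """Average peak memory bandwidth available to one core in GByte/s."""
    return total_memory_bandwidth(phase) / CORES[phase]


def cache_bandwidth(bytes_per_cycle):
    """Cache bandwidth in GByte/s for a given transfer width per cycle."""
    return bytes_per_cycle * CLOCK_GHZ


for phase in BLADES:
    print(f"{phase}: {total_blades(phase)} blades, "
          f"{total_memory_bandwidth(phase):.0f} GByte/s total, "
          f"{bandwidth_per_core(phase):.2f} GByte/s per core on average")

# L3 fill rate from the table: 128 Byte in 4 cycles
print(f"L3 fill: {cache_bandwidth(128 / 4):.1f} GByte/s")
```

With these assumptions both phases have 4096 blades and therefore the same 34816 GByte/s aggregate, while the average share per core is lower in Phase 2. The L3 fill rate of 128 Byte in 4 cycles works out to the 51.2 GByte/s given in the L3 bandwidth row.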