ALIs

To be added.

Resource Limits

Description of the constraints under which jobs run on the Cluster systems: maximum run times, maximum memory, and other SGE-imposed parameters.

Policies for interactive shells and interactive jobs

Limitations

  • On login shells, programs should not run for more than a few minutes. If possible, please start interactive runs with a nice value of at least 5, e.g.:

        nohup nice -n 10 ./my_program > prog.out 2> prog.err &

    This raises the nice value of my_program from the default of 0 to 10. At LRZ's discretion, jobs that run for too long will be forcibly removed from the interactive nodes.

  • Overloading of interactive nodes with jobs may also lead to job termination by LRZ personnel, especially if memory consumption exceeds available resources.

  • Usage of the netscape/mozilla browsers is only allowed on interactive nodes; on all other nodes instances of netscape/mozilla are regularly removed by the LRZ surveillance system.

  • Usage of the cron system (e.g. via /usr/bin/crontab) as well as the /usr/bin/at or /usr/bin/batch commands is not allowed.
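To confirm that a program started from a login shell really runs at the reduced priority recommended in the first item above, the nice value can be read back. The following is a minimal sketch, assuming a Linux login node with GNU coreutils, where nice called without a command prints the current nice value. The helper name run_niced is hypothetical and not an LRZ-provided tool.

```shell
#!/bin/sh
# run_niced is a hypothetical helper, not an LRZ-provided command.
# It starts the given command with its nice value raised by 10,
# the same increment as in the nohup example above.
run_niced() {
    nice -n 10 "$@"
}

# Assumption: GNU coreutils nice, which prints the nice value it runs
# with when it is called without a command.
base=$(nice)             # nice value of the current shell
niced=$(run_niced nice)  # nice value a program started via run_niced gets
echo "current shell: $base, program started via run_niced: $niced"
```

A real run would then look like `run_niced ./my_program > prog.out 2> prog.err &`, with my_program being the placeholder name used in the list above.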

Resource limits for interactive jobs

Partition                              Host Name  Remarks                                              Run time limit (hours)  Memory limit (GBytes)
x86_64 interactive node (login shell)  lx64ia2    2 load-balanced Opteron nodes (4 sockets, 8 cores)   4                       32 (shared)
EM64T interactive node (login shell)   lx64ia3    Intel Nocona (2 cores)                               4                       2 (shared)

Software licenses

Many commercial software packages have been licensed for use on the cluster. Most of these require so-called floating licenses, of which typically only a limited number are available. Since it is not possible to check whether a license is available before a batch job starts, LRZ cannot guarantee that an SGE job requesting such a license will run.

Policies for queued batch jobs

Scheduling

The scheduler assigns an initiation priority to every queued batch job. This priority increases while the job is waiting, until the job reaches the head of the queue; the job is then started as soon as the resources it needs are available.

LRZ has also introduced user shares to prevent individual users from monopolizing the cluster. This means that if you have used many cycles during the last few weeks and the cluster is very busy, your currently queued jobs may be started at a much lower rate until your share, compared to that of other users, has dropped back to the threshold value.

Jobs in Hold

Jobs in user hold may be removed by LRZ administrators if older than 8 weeks.

Memory use

Jobs exceeding the physical memory available on the selected node (set) will be removed at LRZ's discretion, since such usage typically has a negative impact on system stability.

Resource Limits

The following is an overview of the resource limits imposed on the various classes of jobs. These comprise run time limits and memory limits.

Job Type                       Architecture                  Run time limit (hours)             Memory limit (GByte)
serial execution               4-way Opteron or Intel EM64T  240                                7.9
long running serial execution  4-way Opteron                 336 (two weeks) or 1344 (8 weeks)  7.9
(at increased risk)

Remarks

serial execution: Single core in a multi-core node. Please specify the architecture via

        -l march=x86_64

    If more than 2 GB of memory are needed, please request it explicitly, e.g. -l mf=6gb for 6 GB.

long running serial execution (at increased risk): Single core in a multi-core node. The SGE specifications

        -l march=x86_64
        -l h_rt=hh:mm:ss

    with hour values larger than 240 must be specified. The remarks on memory usage for serial execution also apply here.

    Warning:

      • Only limited resources are available for this job type, so waiting times may become long.
      • There is an increased danger of premature job termination due to hardware failures.
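
The resource requests described in the remarks above can be assembled mechanically from a run time and a memory requirement. The following is a minimal sketch under stated assumptions, not an LRZ-provided tool: the function name sge_serial_opts is hypothetical, it accepts whole numbers only (so at most 7 GB, staying below the 7.9 GB limit), and it emits -l h_rt only for run times above 240 hours, where the remarks require it.

```shell
#!/bin/sh
# sge_serial_opts HOURS MEM_GB
# Hypothetical helper: prints the SGE resource requests for a serial job,
# using only the options and limits quoted in the table above.
sge_serial_opts() {
    hours=$1
    mem_gb=$2
    # 1344 hours (8 weeks) is the largest run time limit in the table.
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 1344 ]; then
        echo "run time must be between 1 and 1344 hours" >&2
        return 1
    fi
    # The memory limit is 7.9 GB; with whole GB values, 7 is the maximum.
    if [ "$mem_gb" -lt 1 ] || [ "$mem_gb" -gt 7 ]; then
        echo "memory must be between 1 and 7 GB" >&2
        return 1
    fi
    opts="-l march=x86_64"
    # More than 2 GB must be requested explicitly.
    if [ "$mem_gb" -gt 2 ]; then
        opts="$opts -l mf=${mem_gb}gb"
    fi
    # Long running jobs must state a run time above 240 hours.
    if [ "$hours" -gt 240 ]; then
        opts="$opts -l h_rt=${hours}:00:00"
    fi
    printf '%s\n' "$opts"
}

sge_serial_opts 336 6   # prints: -l march=x86_64 -l mf=6gb -l h_rt=336:00:00
```

The printed options could then be passed to the SGE submit command, e.g. `qsub $(sge_serial_opts 336 6) my_job.sh`, where my_job.sh is a placeholder for your own job script.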