ALIs

Coming soon.

MPI Shell

mpiShell is a wrapper around MPI that lets end users partition data among the processors of a cluster, run non-parallel applications in parallel on the cluster (one per partition), and combine the results that the applications produce for each partition.

mpiShell requires an installed MPI (Message Passing Interface) implementation. Currently, the compiled shell must be started with the following command:

 mpirun -np [SIZE] -machinefile [MFname] mpiShell [f.pcf] 

  • SIZE: number of processes to execute
  • MFname: name of the file listing the machines on which to run the processes (see the MPI documentation)
  • f.pcf: name of the process control file that contains the instructions for each process to execute (as defined below)
  • On the HLRBII, a module is available that simplifies usage:

    > module load mpish

    The mpish script is then called with:

    mpish [N] [file]

    The mpiShell reads a control file that directs the shell's actions. The Process Control File [f.pcf] is broadcast from rank 0 to all processes, and each process reads and executes the functions specified in it. If a function requires no communication (as with the minor function "I", instruction), each process performs it without additional communication.

    Two wildcard variables, $R and $S, may be used in any string (a file name or an instruction to be processed by the shell). Each process replaces $R with its own rank number and $S with the MPI world size (the number of processes).
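
The wildcard replacement can be pictured with a short Python sketch. This is an illustration only, not mpiShell's own code: `expand_wildcards` is a hypothetical helper, and it assumes a plain textual substitution of $R and $S with no escaping rules.

```python
def expand_wildcards(text: str, rank: int, world_size: int) -> str:
    """Replace $R with the rank and $S with the world size.

    Assumption: a plain textual substitution, with no escaping mechanism.
    """
    return text.replace("$R", str(rank)).replace("$S", str(world_size))


if __name__ == "__main__":
    # How rank 3 of 16 would see an instruction containing $R.
    print(expand_wildcards("mkdir /tmp/P8$R", rank=3, world_size=16))
    # prints: mkdir /tmp/P83
```

Because every rank expands the same string differently, a single BI record is enough to give each process its own working directory.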

    Sample Process Control Files

    In the following examples typical tasks are parallelized using mpish.

    Example 1

    Use 16 processors: broadcast several commands to all processes, partition the data, and finally collect all data into one file.

    
    > mpish 16 example1.mpish
    
    # example1.mpish
    BI mkdir /tmp/P8$R
    BI cd /tmp/P8$R
    BI cp /shared/testData.hd .
    BI ln testData.hd testPartition.hd
    
    BP 0 1 /shared/testData.db testData.db testPartition.db
    BI /tmp/bin/kMeans testPartition -train test
    BI /tmp/bin/kMeans testData -predict test
    
    GC 0   region_case.merge    merged_region_case.merge
    

    Example 2

    Use 5 processors: broadcast commands to all processes, send an individual instruction to each of the processes 0 to 4, broadcast another command, and finally collect all data into one file.

    
    > mpish 5 example2.mpish
    
    # example2.mpish
    BI mkdir /tmp/P8$R
    BI cd /tmp/P8$R
    BI cp /shared/testData.hd .
    BI ln testData.hd testPartition.hd
    BP 0 1 /shared/testData.db testData.db testPartition.db
    
    MI 0 /tmp/bin/cMeans testPartition -train test
    MI 1 /tmp/bin/cMeans testPartition -train test
    MI 2 /tmp/bin/kMeans testPartition -train test
    MI 3 /tmp/bin/kMeans testPartition -train test
    MI 4 /tmp/bin/cMeans testPartition -train test
    
    BI /tmp/bin/kMeans testData -predict test
    GC 0   region_case.merge    merged_region_case.merge
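
The per-rank MI records in Example 2 follow a regular pattern, so for larger runs they could be generated rather than typed. A minimal Python sketch, under the assumption (taken from the description of Example 2) that the number after MI is the rank that runs the instruction; `mi_records` is a hypothetical helper and not part of mpiShell:

```python
def mi_records(commands):
    """Return one MI record per rank; commands[i] is the instruction for rank i."""
    return [f"MI {rank} {command}" for rank, command in enumerate(commands)]


if __name__ == "__main__":
    # Same assignment of programs to ranks as in Example 2.
    programs = ["cMeans", "cMeans", "kMeans", "kMeans", "cMeans"]
    commands = [f"/tmp/bin/{name} testPartition -train test" for name in programs]
    for record in mi_records(commands):
        print(record)
```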
    

    Syntax

    The "Process Control File" [f.pcf] supports the functions and formats shown below. The first two columns of each record must contain the major and minor control functions. If the last character of a line is '\', the next line is concatenated to it.

    .----------  major control function occurs in Col 1 and is defined as:
    |                    
    |		[" ", "#", "|", "."] - Comment   
    |
    |		["N", "n"] - NumberOfProcessors - for validation: checks that
    |				the user defined all the processes (can be left out when using the mpish module)
    |
    |		["B", "b"] - BroadCast - send to all processes (root#)
    |
    |		["M", "m"] - Message - send to specific process 
    |					(sender#, receiver#)
    |			
    |		["S", "s"] - Scatter data  (by Rows)
    |			
    |		["G", "g"] - Gather data (data will be ordered by rank) 	
    |			
    |		["X", "x"] - Exit execution (MPI_Finalize) (or end of file of f.pcf)
    |			
    |   ------ Each of these functions is further defined below ------
    x
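
The record rules above (major function in column 1, comment characters, and '\' continuation) can be sketched in Python as follows. This is a hedged illustration, not mpiShell's parser: `join_continuations` and `classify` are hypothetical names, and the sketch assumes that the trailing '\' is dropped when lines are joined and that empty records count as comments.

```python
COMMENT_CHARS = {" ", "#", "|", "."}
MAJOR_NAMES = {
    "N": "NumberOfProcessors",
    "B": "Broadcast",
    "M": "Message",
    "S": "Scatter",
    "G": "Gather",
    "X": "Exit",
}


def join_continuations(lines):
    """Merge each line that ends in a backslash with the line that follows it."""
    joined, pending = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]  # assumption: the backslash itself is dropped
        else:
            joined.append(pending + line)
            pending = ""
    if pending:  # file ended on a continuation line
        joined.append(pending)
    return joined


def classify(record):
    """Name the major control function found in column 1 of a record."""
    if not record or record[0] in COMMENT_CHARS:
        return "Comment"  # assumption: empty records are treated as comments
    return MAJOR_NAMES.get(record[0].upper(), "Unknown")


if __name__ == "__main__":
    pcf = ["# demo", "BI echo first \\", "second", "X"]
    for record in join_continuations(pcf):
        print(classify(record), "|", record)
```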
    
    
    
    .------- N - Major control function to define the number of processors
    |
    |  .---------- xxx Defines the number of processors, space before is optional
    |  |
    N xxx  
    
    
    		
    .------- B - broadcast major control function
    |
    |.----------  minor control function occurs in Col 2 and is defined as:
    ||                    			
    ||              ["B", "b"]	Barrier      			
    ||                    			
    ||		["I", "i"]	Instruction to be passed to "system" command
    ||				by all processes
    ||
    ||		["P", "p"]	Partitioning of a data file.  The data file
    ||				will be passed in Block File format, but each
    ||				record will now consist of two strings.  The
    ||				first is the rank # of receiving process and
    ||				the second is the original input record
    ||			Three partitioning modes (round-robin, block, random)
    ||			
    ||		["F", "f"]	Broadcast a file (in block format) to all ranks
    ||			
    ||		["T", "t"]	Time stamp - number of seconds since start
    ||			
    Bx
    
    
    .------- B - major control function for broadcast
    |.---------- B - minor control function - Barrier
    ||
    BB 
    
    .------- B - major control function for broadcast
    |.---------- I - minor control function - instruction for "system" command
    ||
    ||	 .-----  instruction to be processed by all processes
    ||       |
    BI xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx......
    
    .------- B - major control function for broadcast
    |.---------- P - minor control function - Partitioning of the data
    ||
    ||.------------- optional space
    ||| .-------------  process number of root of broadcast (integer 0..999)
    ||| |
    ||| | .-------------- required space after each field from here on
    ||| | |
    ||| | |.---------------- code to identify partitioning method
    ||| | ||
    ||| | ||    .------------ file name that root will read/broadcast
    ||| | ||    |         .------- file name, for each process to save entire file
    ||| | ||    |         |         .-- file name, each process's partition of file
    ||| | ||    |         |         |
    BP ### # xxxxxxxxx xxxxxxxxx xxxxxxxxx 
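
The three partitioning modes named above (round-robin, block, random) can be illustrated with a Python sketch that tags each record with the rank of its receiving process, mirroring the "rank plus original record" pairs described for BP. The document does not list the numeric method codes, so the sketch uses mode names instead; `assign_ranks`, the way blocks are sized, and the seeded random generator are all assumptions made for illustration.

```python
import random


def assign_ranks(records, world_size, mode, seed=0):
    """Return (rank, record) pairs for the chosen partitioning mode."""
    n = len(records)
    if mode == "round-robin":
        ranks = [i % world_size for i in range(n)]
    elif mode == "block":
        # Contiguous chunks; the first n % world_size ranks get one extra record.
        base, extra = divmod(n, world_size)
        ranks = []
        for rank in range(world_size):
            ranks.extend([rank] * (base + (1 if rank < extra else 0)))
    elif mode == "random":
        rng = random.Random(seed)  # seeded so that a run can be repeated
        ranks = [rng.randrange(world_size) for _ in range(n)]
    else:
        raise ValueError(f"unknown partitioning mode: {mode}")
    return list(zip(ranks, records))


if __name__ == "__main__":
    rows = ["row0", "row1", "row2", "row3", "row4"]
    print(assign_ranks(rows, 2, "round-robin"))
    print(assign_ranks(rows, 2, "block"))
```

Each process would then keep only the records tagged with its own rank as its partition file.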
    
    .------- B - major control function for broadcast
    |.---------- F - minor control function - Broadcast file
    ||
    ||.-- optional space
    ||| .-------------  process number of root of broadcast (integer 0..999)
    ||| |
    ||| | .-- required space after each field from here on
    ||| | |
    ||| | |    .------------ file name that root will read/broadcast
    ||| | |    |
    ||| | |    |         .------- file name, for each process to save entire file
    ||| | |    |         |
    BF ### xxxxxxxxx xxxxxxxxx 
    
    .------- B - major control function for broadcast
    |.---------- T - minor control function - Time Stamp
    ||
    BT 
    
    
    
    .------- M - Message major control function
    |.----------  minor control function occurs in Col 2 and is defined as:
    ||                    			
    ||		["I", "i"]	Instruction to be passed to "system" command
    ||				by indicated processes
    ||			
    ||		["F", "f"]	"file" for sending files (in block format)
    ||			
    ||		["T", "t"]	Time stamp - number of seconds since start
    ||			
    Mx
    			
    
    .------- M - major control function for Message
    |
    |.---------- I - minor control function - instruction for "system" command
    ||
    ||  .----------- rank number of receiving process
    ||  |   
    ||  |          .-----  instruction to be processed by receiving process
    ||  |          |
    MI ### xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx......
    
    .------- M - major control function for Message
    |
    |.---------- F - minor control function - Send file
    ||
    ||.------------- optional space
    |||
    ||| .-------------  process number of sender (integer 0..999)
    ||| |
    ||| | .-------------- required space after each field from here on
    ||| | |
    ||| | | .-------------- process number of receiver (integer 0..999)
    ||| | | |
    ||| | | |      .------------ file name used by sender
    ||| | | |      |
    ||| | | |      |         .------- file name used by receiver
    ||| | | |      |         |
    MF ### ### xxxxxxxxx xxxxxxxxx 
    
    .------- M - major control function for Message
    |
    |.---------- T - minor control function - Time Stamp
    ||
    ||  .----------- rank number of process to generate time stamp
    ||  |   
    MT ### 
    
    
    
    .--------- S - major control function for Scatter (row-wise scatter)
    |
    |   .---------------- rank number of root process (scattering the data)
    |   |
    |   |  .---------------- code to identify partitioning method
    |   |  | 
    |   |  |      .------------ file name to be scattered
    |   |  |      |
    |   |  |      |            .----  file name of output file
    |   |  |      |            |
    S  ### # xxxxxxxxxxx  xxxxxxxxxx
    
    
    
    .--------- G - major control function for gather (ordered by rank)
    |
    |.------------ minor control function for gather
    ||
    ||            ["R", "r"] - row-wise collection of data - records are appended
    ||            ["C", "c"] - column-wise collection of data - data is merged by columns
    ||
    ||  .---------------- rank number of root process (collecting the data)
    ||  |
    ||  |      .------------ file name to be gathered (source)
    ||  |      |
    ||  |      |            .----  file name of output file (destination)
    ||  |      |            |
    Gx ### xxxxxxxxxxx  xxxxxxxxxx
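
The difference between the two gather modes can be shown with a small Python sketch. It is an illustration only: `gather_rows` and `gather_columns` are hypothetical helpers, the space separator for merged columns is an assumption, and the column merge stops at the shortest input because the document does not say how unequal lengths are handled.

```python
def gather_rows(parts):
    """GR-style gather: append each rank's records, in rank order."""
    return [record for part in parts for record in part]


def gather_columns(parts, sep=" "):
    """GC-style gather: merge the i-th record of every rank into one record.

    Assumption: fields are joined with `sep`; zip() stops at the shortest part.
    """
    return [sep.join(row) for row in zip(*parts)]


if __name__ == "__main__":
    # parts[r] holds the records of the file gathered from rank r.
    parts = [["a1", "a2"], ["b1", "b2"]]
    print(gather_rows(parts))     # ['a1', 'a2', 'b1', 'b2']
    print(gather_columns(parts))  # ['a1 b1', 'a2 b2']
```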
    
    
    
    .--------- X - major control function telling process to exit
    |
    X
    

    References: http://penguin.lhup.edu/~bwooley/software/mpiShellDoc.html