Secure file transfer using dmscp2

Nowadays the secure copy command scp is the method of choice for secure authentication and file transfer, since the passwords as well as the data itself are encrypted.

However, encrypting and decrypting the data requires considerable computational effort, which may limit the transfer rates to values well below those achieved by a simple ftp transfer. On the other hand, in many cases encryption of the transmitted (raw) data itself is not necessary, which makes it desirable to have a method that combines secure authentication with fast, unencrypted data transfer.

Such a method is provided by the dmscp2 package developed by Manfred Stolle at Konrad-Zuse-Institut (ZIB), Berlin. It offers the additional advantage of being able to open multiple parallel data streams. This enables a more efficient use of network I/O buffers, which usually yields a considerable speedup. A schematic of the tool's functionality is shown below. Note that everything takes place under a normal user account; no special permissions or system services are necessary.

[Figure: schematic of the dmscp2 functionality]

Prerequisites and Installation

At LRZ, dmscp2 has been installed on the 64-bit Linux cluster and on the HLRB2 system. Using dmscp2 between LRZ hosts is usually not recommended, since the network infrastructure is already very fast.

In order to transfer data to external hosts, dmscp2 has to be installed on these systems. Before that, make sure that at least one TCP/IP port above 1000 is usable between the systems targeted for the data transfer! This information can usually be obtained from the person responsible for security or the firewall at the remote site. You can test whether port XX is open on remote_host by issuing

telnet remote_host XX

If the port is open in the firewall (but no service is currently listening on it), you will get a message like:

Trying 129.187.20.168...
telnet: connect to address 129.187.20.168: Connection refused

Otherwise, the output stops after the first line, since most firewalls silently drop all packets sent to closed ports.
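The telnet check above can also be scripted. The following is a minimal sketch, not part of dmscp2: it assumes a bash built with /dev/tcp support and the GNU timeout command on the local machine, and the function names check_port and classify_status are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash with /dev/tcp support and GNU timeout.

# Map the exit status of the connection attempt to a short verdict.
#   0   -> the connection was accepted (a service is listening)
#   124 -> GNU timeout gave up: no answer, packets were probably dropped
#   *   -> the attempt failed quickly (connection refused or host unknown)
classify_status() {
    case "$1" in
        0)   echo "open" ;;
        124) echo "filtered" ;;
        *)   echo "refused" ;;
    esac
}

# Try to open a TCP connection to host $1, port $2, waiting at most 5 seconds.
check_port() {
    local host=$1 port=$2 rc=0
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null || rc=$?
    echo "$host:$port $(classify_status "$rc")"
}

# Example call (remote_host and 24000 are placeholders):
#   check_port remote_host 24000
```

A result of "refused" corresponds to the telnet output shown above: the packets got through, but nothing is listening yet. A result of "filtered" corresponds to the case where telnet hangs after the first line.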

In order to transfer data to LRZ hosts, make sure /lrz/sys/bin is in your path on the LRZ machine. On the LRZ High-Performance systems, most ports are closed for security reasons. You can, however, contact LRZ support and ask for the GridFTP ports to be opened for your systems (i.e. the ones you submitted in the HLRB proposal). Out of these ports, the range 24000-25000 can conveniently be used for dmscp2.
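One possible way to satisfy the path requirement is a small snippet in your shell startup file on the LRZ machine. This is only a sketch; the helper name ensure_in_path is invented for this example, and which startup file to use depends on your login shell.

```shell
# Append a directory to PATH unless it is already listed there.
ensure_in_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;        # append at the end
    esac
}

ensure_in_path /lrz/sys/bin
export PATH
```

Appending rather than prepending keeps your existing command lookup order unchanged.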

To install dmscp2 on your system, make sure ksh is installed, download the package, and run

tar xzf dmscp2.1.2.0.tar.gz
cd dmscp2.1.2.0

By default, the binary and manual pages are installed in subdirectories of /usr/local. Edit the file configure to change this, then run

./configure --minport=24000 --maxport=25000

(this sets the default port range to the ports that are open on HLRB2)

make && make install
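Before and after the build it can be useful to check that the required commands are actually available. The sketch below assumes that the installed binary is called dmscp2 and that its installation directory is in your PATH; require_cmd is a helper name made up for this example.

```shell
# Report whether a command can be found in PATH.
# Returns 0 if it is found, 1 otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example calls:
#   require_cmd ksh       # needed before installing
#   require_cmd dmscp2    # should succeed after "make install"
```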

Basic Usage

dmscp2 is commonly called with the following options:

{-w|-r}

mandatory; either read from (-r) or write to (-w) the remote server.

-s <targethost>

remote host IP address or name

-u <user>

user name on remote host

-port <port>

high port to use (defaults to the --minport specified with ./configure)

-portrange <#ports>

if high port is busy, search this number of ports from there to find an open one (default: 0)

-l <localfile>

file on the local host

-f <remotefile>

file on the remote host

-streams <#streams>

number of network streams to use (default: 4).
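As an illustration of how the options above fit together, the following sketch assembles a command line for writing a local file to a remote host and prints it instead of running it. The host name, user name and file names are placeholders, the helper build_dmscp2_cmd is invented for this example, and the order of the options is an assumption; please check the manpage for the authoritative syntax.

```shell
# Assemble a dmscp2 command line from the options described above.
# Arguments: mode (-w or -r), host, user, local file, remote file, port, streams
build_dmscp2_cmd() {
    local mode=$1 host=$2 user=$3 localfile=$4 remotefile=$5 port=$6 streams=$7
    printf '%s\n' "dmscp2 $mode -s $host -u $user -port $port -l $localfile -f $remotefile -streams $streams"
}

# Placeholder values: write results.tar to the remote host using port 24000
# and the default of 4 streams. The command is only printed, not executed.
build_dmscp2_cmd -w remote.example.org jdoe results.tar /scratch/jdoe/results.tar 24000 4
```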

Usually it is best to start with the default configuration and run a few speed tests before transferring huge amounts of data. If the default gives you a speedup compared to -streams 1, try increasing the number of streams until the performance levels off. Please refrain from using an excessive number of streams (more than 32 or so), since this may cause delays for other users.
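Such a series of speed tests could be scripted roughly as follows. This is a sketch under several assumptions: the binary is called dmscp2 (overridable through the DMSCP2 variable, which is invented here), the option order is accepted as written, and a test file of reasonable size already exists. Stream counts above 32 are skipped, in line with the recommendation above.

```shell
# Command to run; override DMSCP2 to point at a different binary.
DMSCP2=${DMSCP2:-dmscp2}

# Time one write transfer per stream count and print the elapsed seconds.
# Arguments: host, user, local file, remote file, then the stream counts to try.
benchmark_streams() {
    local host=$1 user=$2 localfile=$3 remotefile=$4
    shift 4
    local n start end
    for n in "$@"; do
        if [ "$n" -gt 32 ]; then
            echo "skipping $n streams (more than 32)" >&2
            continue
        fi
        start=$(date +%s)
        "$DMSCP2" -w -s "$host" -u "$user" -l "$localfile" -f "$remotefile" -streams "$n"
        end=$(date +%s)
        echo "streams=$n seconds=$((end - start))"
    done
}

# Example call (all values are placeholders):
#   benchmark_streams remote.example.org jdoe test.dat /scratch/jdoe/test.dat 1 4 8 16
```

The resolution is one second, so the test file should be large enough for each transfer to take at least several seconds.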

Using special options, dmscp2 can also transfer entire directory trees, offers various debug settings and can write logfiles on client and server.

Please refer to the manpage or the Using_dmscp2 document for more information and examples.

Whether or not dmscp2 succeeds in increasing the speed of your data transfers is of great interest to us. This page's author (and future users!) would be thankful for any feedback regarding your experiences with this new tool.

Caveat: In order to read from the remote host, the ports specified with dmscp2 also have to be open on the local host! If the open port ranges do not overlap, you have to use dmscp2 -w from the remote host instead. Most firewalls are configured to silently drop packets sent to closed ports. This will cause dmscp2 to hang after the buffer assignment message.

Known Issues

This tool is still in beta, so we advise using it with some care. Please consider the following:

  • In case of an error or a user-requested abort (Ctrl-C), especially when working with older ssh implementations, there may occasionally be lingering processes on the local and/or remote host. These usually terminate on their own after a few minutes. To avoid wasted resources and blocked ports, please run ps -e | grep dmscp some time after an abnormal termination of the program, and remove any remaining dmscp2 processes with kill -9.

  • When writing logfiles on client or server, do not leave the program unattended. We've seen cases where 50 GB(!) of logs were written within a few minutes...
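The cleanup of lingering processes described in the first item can be sketched as a pair of shell functions. The function names are made up for this example, and the sketch assumes that the lingering processes show up with "dmscp" in their command name; it only looks at processes owned by the current user.

```shell
# Read "PID COMMAND" lines on stdin and print the PIDs of all
# processes whose command name contains "dmscp".
filter_dmscp_pids() {
    awk '$2 ~ /dmscp/ { print $1 }'
}

# List the PIDs of the current user's lingering dmscp processes.
list_lingering() {
    ps -u "$(id -un)" -o pid= -o comm= | filter_dmscp_pids
}

# Kill them with signal 9, as recommended above.
kill_lingering() {
    local pids
    pids=$(list_lingering)
    if [ -n "$pids" ]; then
        # Intentionally unquoted: one PID per word
        kill -9 $pids
    fi
}

# Example: run list_lingering first, check the output, then kill_lingering.
```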