Ppss
From BITS wiki
|P|P|S|S| Distributed Parallel Processing Shell Script 2.97 [1] speeds up repeated tasks whose inputs share a common format. The command or script is applied to every single file in the input folder, using the available resources. It is therefore intended for computers with sufficient CPU and I/O capacity and is not meant for a single-CPU computer.
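To make the idea concrete, here is a rough plain-Bash equivalent of what the paragraph describes: one command applied to every file in a directory, with a bounded number of simultaneous jobs. This is not PPSS and does not reproduce its features (logging, daemon or distributed mode); it swaps in `find` plus `xargs -P`, and the helper `process_dir` and the `MAX_JOBS` variable are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# Rough illustration of the idea behind PPSS, NOT PPSS itself: apply one
# command to every file in a directory, running at most MAX_JOBS copies at
# once. Uses find + xargs -P as a stand-in.
set -euo pipefail

# Parallel job count: use MAX_JOBS if set, else the CPU count, else 2.
MAX_JOBS="${MAX_JOBS:-$(nproc 2>/dev/null || echo 2)}"

# process_dir <dir> <command> [args...]   (hypothetical helper)
process_dir() {
  local dir="$1" list
  shift
  list="$(mktemp)"
  # Snapshot the item list first, so files created by the command
  # (e.g. *.gz) are not picked up as new items.
  find "$dir" -type f -print0 > "$list"
  # -0: NUL-separated names (safe for spaces); -n 1: one file per command.
  xargs -0 -n 1 -P "$MAX_JOBS" "$@" < "$list"
  rm -f "$list"
}

# Demo on a throw-away directory: gzip four small files in parallel.
demo_dir="$(mktemp -d)"
for i in 1 2 3 4; do
  printf 'sample %s\n' "$i" > "$demo_dir/file$i.txt"
done
process_dir "$demo_dir" gzip
ls "$demo_dir"
```

Snapshotting the list before starting any job keeps the set of items fixed, which matters whenever the command writes its output next to its input.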
|P|P|S|S| Distributed Parallel Processing Shell Script 2.97
usage: ppss [[ -d <sourcedir> | -f <sourcefile> ]] [[ -c '<command> "$ITEM"' ]] [[ -C <configfile> ]] [[ -j ]] [[ -l <logfile> ]] [[ -p <# jobs> ]] [[ -q ]] [[ -D <delay> ]] [[ -h ]] [[ --help ]] [[ -r ]] [[ --daemon ]]

Examples:
  ppss -d /dir/with/some/files -c 'gzip '
  ppss -d /dir/with/some/files -c 'cp "$ITEM" /tmp' -p 2
  ppss -f <file> -c 'wget -q -P /destination/directory "$ITEM"' -p 10
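The single quotes around the -c argument matter: they stop the calling shell from expanding "$ITEM", so the literal string reaches PPSS, which fills in one item per run. The sketch below imitates that behaviour with a hypothetical `run_items` function. It assumes the command string is evaluated by a shell in which ITEM holds the current item, and that the item is appended as the last argument when "$ITEM" does not appear (as in 'gzip '). It runs items one at a time and is not taken from the PPSS source.

```bash
#!/usr/bin/env bash
# Hypothetical mini-runner mimicking how a PPSS-style -c string could be
# used. NOT PPSS code: it only shows why "$ITEM" must reach the runner
# unexpanded, to be filled in once per item.
set -euo pipefail

# run_items <item-file> <command-string>
# Reads one item per line (like -f) and runs the command once per item,
# sequentially, with ITEM set in the child shell's environment.
run_items() {
  local file="$1" cmd="$2" item
  while IFS= read -r item || [ -n "$item" ]; do
    [ -z "$item" ] && continue            # skip blank lines
    if [[ "$cmd" == *'$ITEM'* ]]; then
      ITEM="$item" bash -c "$cmd"         # command says where the item goes
    else
      # Assumption: otherwise the item is appended as the last argument.
      ITEM="$item" bash -c "$cmd \"\$ITEM\""
    fi
  done < "$file"
}

work="$(mktemp -d)"
mkdir "$work/dest"
touch "$work/a b.txt" "$work/c.txt"
printf '%s\n' 'a b.txt' 'c.txt' > "$work/items.txt"   # one item per line

# Item placed explicitly, as in: ppss -f items.txt -c 'cp "$ITEM" /tmp'
( cd "$work" && run_items items.txt 'cp "$ITEM" dest/' )

# Item appended, as in: ppss -d dir -c 'gzip '
run_items "$work/items.txt" 'echo processing:'
# prints: processing: a b.txt
#         processing: c.txt
```

Had the command been written in double quotes, the calling shell would have replaced $ITEM with an empty string before the runner ever saw it.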
ppss manpage
ppss --help
|P|P|S|S| Distributed Parallel Processing Shell Script 2.97

PPSS is a Bash shell script that executes commands in parallel on a set of items, such as files in a directory or lines in a file.

Usage: /opt/biotools/bin/ppss [[ MODE ]] [[ options ]]

Modes are optional and mainly used for running in distributed mode. Modes are:
  config      Generate a config file based on the supplied option parameters.
  deploy      Deploy PPSS and related files on the specified nodes.
  erase       Erase PPSS and related files from the specified nodes.
  start       Start PPSS on the nodes.
  pause       Pause PPSS on all nodes.
  stop        Stop PPSS on all nodes.
  continue    Continue PPSS on all nodes.
  node        Run PPSS as a node; requires additional options.

Usage: /opt/biotools/bin/ppss [[ options ]]

  --command | -c      Command to execute. Syntax: '<command> ' including the single quotes. Example: -c 'ls -alh '. It is also possible to specify where an item must be inserted: 'cp "$ITEM" /somedir'.
  --sourcedir | -d    Directory that contains the files that must be processed. Individual files are fed as an argument to the command specified with -c.
  --sourcefile | -f   Each line of the supplied file is fed as an item to the command specified with -c. To read items from stdin, use -f -.
  --config | -C       In config mode, a config file with the specified name is generated based on all the options specified. In the other modes, this option makes PPSS read the config file and start processing items based on its settings.
  --disable-ht | -j   Disable hyper-threading (hyper-threading is enabled by default).
  --log | -l          Sets the name of the log file. The default is ppss-log.txt.
  --processes | -p    Start the specified number of processes, ignoring the number of available CPUs.
  --quiet | -q        Shows no output except for a progress indication in percent.
  --delay | -D        Adds an initial random delay to the start of all parallel jobs to spread the load. The delay (in seconds) is only used at the start of all 'threads'.
  --daemon            Daemon mode. Do not exit after items are processed, but keep looking for new items and process them. Read the manual on how to use this! See --help for important additional options regarding daemon mode.
  --disable-inotify   Linux users can use real-time inotify filesystem events in daemon mode. Requires inotify-tools. Enabled by default if available. Automatically disabled if NFS is used as the daemon source directory.
  --no-traversal | -r By default, PPSS uses the regular 'find' command to list all files within the directory specified by the -d option. If you do not want PPSS to process files in subdirectories, use this option: only files directly within the specified directory are processed, and any subdirectories are ignored.
  --email | -e        Send an e-mail when PPSS has finished. An e-mail is also sent if processing of an item fails (configurable, see -h).
  --debug             Enable debugging output to the |P|P|S|S| log file.
  --help              Extended help, including options for distributed mode.

The following options are used for distributed execution of PPSS.

  --master | -m       Specifies the SSH server used for communication between nodes. Via SSH, file locks are created to inform other nodes that an item is locked. If items are files that must be processed, they must reside on this host. SCP is used to transfer files from this host to the nodes for local processing.
  --node | -n         File containing a list of nodes that act as PPSS clients, one IP address or DNS name per line.
  --key | -k          The SSH key that a node uses to connect to the master.
  --known-hosts | -K  The file that contains the server's public key. It can often be found on hosts that have connected to the server before (see ~/.ssh/known_hosts); otherwise, connect manually once and check this file.
  --user | -u         The SSH user name that the node uses when logging in to the master SSH server.
  --script | -S       Specifies the script/program that must be copied to the nodes for execution through PPSS. Only used in deploy mode. If necessary, specify this option when generating a config.
  --download          Each item is downloaded by the node from the server or share to the local node for processing.
  --upload            The output file is copied back to the server; the --outputdir option is mandatory.
  --no-scp | -b       Do not use scp for downloading items; use cp instead. Assumes that a network file system (NFS/SMB) is mounted under a local mount point.
  --outputdir | -o    Directory on the server where processed files are put. For example, if encoding a wav file produces an mp3 file, the mp3 file is put in the directory specified with this option.
  --homedir | -H      Directory in which PPSS is installed on the node. Default is 'ppss-home'.
  --script | -S       Script to run on the node. PPSS must copy this script to the node.
  --randomize | -R    Randomise which items a client processes in distributed mode. With many nodes, this prevents some clients from spending all their time trying to get a lock on an item.

Example: encoding some wav files to mp3 using lame:

  /opt/biotools/bin/ppss -c 'lame ' -d /path/to/wavfiles -j

Running PPSS based on a configuration file:

  /opt/biotools/bin/ppss -C config.cfg

Generating a configuration file. Wavs are converted to mp3. SCP is used for data transfer:

  /opt/biotools/bin/ppss config -C ppss-config.cfg -d /some/dir -o output --download --upload -K known_hosts \
      -k ppss-key.dsa -n nodes.txt -m 10.0.0.100 \
      -c 'lame --quiet "$ITEM" -o "$OUTPUT_DIR/$OUTPUT_FILE".mp3'

Running PPSS on a client as part of a cluster:

  /opt/biotools/bin/ppss node -d /somedir -c 'cp "$ITEM" /some/destination' -m 10.0.0.50 -u ppss -k ppss-key.key
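The --no-traversal (-r) description says PPSS lists items with the regular find command and, with -r, ignores subdirectories. The sketch below approximates that difference with `find -maxdepth 1`; the `list_items` helper is hypothetical, and this models the effect of the switch as described, not PPSS's actual code.

```bash
#!/usr/bin/env bash
# Approximation (not PPSS source) of what -r / --no-traversal changes:
# by default every file below the -d directory becomes an item; with -r
# only files directly inside it do. Modelled here with find -maxdepth 1.
set -euo pipefail

# list_items <dir> [no-traversal]   (hypothetical helper)
list_items() {
  local dir="$1" mode="${2:-}"
  if [ "$mode" = "no-traversal" ]; then
    find "$dir" -maxdepth 1 -type f
  else
    find "$dir" -type f
  fi | sort
}

# Demo tree: one file at the top level, one in a subdirectory.
src="$(mktemp -d)"
mkdir "$src/sub"
touch "$src/top.wav" "$src/sub/nested.wav"

echo "default (recursive):"
list_items "$src"
echo "no traversal (top level only):"
list_items "$src" no-traversal
```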
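The --delay (-D) option spreads the load by adding a random delay to the start of the parallel jobs, used only once per 'thread'. A minimal sketch of that idea follows, assuming each worker waits a whole number of seconds between 0 and the given delay before starting; the `start_workers` helper and the uniform choice of delay are assumptions, not PPSS internals.

```bash
#!/usr/bin/env bash
# Sketch of a one-off random start delay for parallel workers, in the
# spirit of PPSS's -D option. Assumption: each worker waits a random whole
# number of seconds in [0, delay] once, before its first job. Not PPSS code.
set -euo pipefail

# start_workers <n_workers> <max_delay_seconds>   (hypothetical helper)
start_workers() {
  local n="$1" delay="$2" i wait_s
  for ((i = 1; i <= n; i++)); do
    # Pick this worker's delay in the parent shell, then start it in the background.
    wait_s=$(( RANDOM % (delay + 1) ))
    (
      sleep "$wait_s"
      echo "worker $i started after ${wait_s}s"
      # ...a real worker would now loop over its share of the items...
    ) &
  done
  wait   # block until every background worker has finished
}

start_workers 3 2
```

Because the workers run in the background, the order of the printed lines varies from run to run.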
References:
[ Main_Page ]