Home > parsh

parsh

Parsh is a project mainly written in C, it's free.

Parallel Dependency Shell

PARSH README

============ INTRODUCTION

Parsh, the Parallel Dependency Shell, attempts to automate parallel execution of commands whenever possible, without any extra effort from script writers or generation programs.

As a brief architectural summary, Parsh has a central file hash table data structure and three main threads that operate on this structure: a parser, an evaluator, and a reaper. These threads all communicate with a central file hash table structure to determine which commands are runnable and which must wait.

The file hash is a hash table of files that are currently being accessed or will be accessed by processed commands. Each file has a single entry in the table, and each entry contains an "accessor queue," a queue of commands waiting for access to the file. When multiple commands need access to a single file, they are enqueued. The first command in the queue (or commands, if they are all reads) has access to the file and are considered "runnable," while the remaining are blocked.

The parser takes input from the user or a script, parses the commands from them, and adds them to the hash. At this stage, if a new command accesses a file that is not already in the hash table, a new file entry will be created. Otherwise the command will be enqueued in the command queue of an existing file entry.

The evaluator makes use of the concept of a "frontier," which is a list of commands in the file hash table that are runnable. The hash table has taken care of all matters of dependency and control flow, thus the evaluator only needs to mindlessly evaluate the next command in the frontier list.

The reaper catches signals sent by exiting processes, removes the appropriate commands from the file hash, and re-evaluates the dependency status of related commands, possibly marking them as runnable.

============ Dependencies

There are three types of dependencies in Parsh: 1) File dependencies 2) Control flow dependencies 3) Variable dependencies

File Dependencies: Commands attempting to access the same file are considered to collide if at least one is not a read access. The structure for a file dependency has the following fields:

inode #: primary way of identifying files, since with inode numbers there is no need to worry about links. absolute path: for resolving the inode # of a link, and as a secondary way of identifying a file when an inode # is not available (i.e. the file has yet to be created). read/write: access type to determine dependency

For file dependencies, each file is hashed into the file hash table. Each file entry contains the fields listed above. Each command that accesses a file (an "accessor") will enqueue itself in the file entry's accessor queue. In the case of multiple accessors, the head of the queue is runnable, while the rest are blocked. In the case consecutive read accessors are at the head of the queue, they are all simultaneously runnable.

Control Flow Dependencies: Some commands must know another's exit status to determine whether to run or not. These maybe a command inside an IF statement, or a command waiting to see if control flow reaches a BREAK. A simple IF-like dependency (IF, ||, UNTIL and the such) are handled solely within the dependency hash. A CONTINUE or BREAK dependency is more complicated, and thus requires its own data structure.

nest level: the true level the CONTINUE or BREAK statement belongs to. For example, a BREAK 2 nested 4 levels deep acts on other commands at nest level 2 or greater. parent: to follow the nest chain up. iteration: loop iteration. For nested loops, this is the iteration of the innermost loop. If the iteration of an outer loop is needed, the parent pointer should be followed.

Note that in order to establish a dependency, a command must be at or below the same nest level. For CONTINUE, to be at the same iteration is a requirement, while for BREAK the command needs to be at the same or a later iteration in the nest level of the BREAK command.

Control flow dependencies are handled internally in the file hash table. A control flow command will be considered an accessor of all the subcommands in it. Upon evaluating the branch decision, the appropriate subcommands will be inserted into the hash table.

For example, an IF command is considered an accessor of all files accessed in the conditional, if, and else part of the command. When the IF command becomes runnable, the conditional part is evaluated. Depending on the outcome, either the if or else part will become runnable.

Variable Dependencies: In Parsh, variables use static single assignment form, thus there is never a value collision. Rather, the only variable dependencies in Parsh will be a command waiting for a delayed value resolution. This will be explained in more detail in the variables section.

Variable dependencies are handled by the variable module. For the scope of this section, it will be sufficient to say that a variable whose value is not yet ready counts towards a accessor command's dependencies, and upon the value becoming available the accessor will be updated.

Previous:Rat