enable_loadbalancing command

Purpose

Command for dynamic adjustment of the size of the processor sub-domains during the simulation for improved parallel performance.

Note

This command is supported by Aspherix GPU.

Syntax

enable_loadbalancing keyword arg ...
  • zero or more keyword/arg pairs may be appended

  • keyword = check_every_time or target_imbalance or id

check_every_time arg = time
    time = perform dynamic load balancing every this many time units
target_imbalance arg = threshold
    threshold = imbalance threshold that must be exceeded to perform a re-balance
id arg = command_id
    command_id = id of this command

Examples

enable_loadbalancing
enable_loadbalancing check_every_time 0.01 target_imbalance 1.1

Description

This command adjusts the size and shape of processor sub-domains within the simulation box, to attempt to balance the number of particles and thus the computational cost (load) evenly across processors. The load balancing is “dynamic” in the sense that rebalancing is performed periodically during the simulation. To perform “static” balancing, before or between runs, see the balance command. When using this command for a simulation with only a single processor, load balancing will not be enabled (a warning will be shown).

Load-balancing is typically most useful if the particles in the simulation box have a spatially-varying density distribution or where the computational cost varies significantly between different atoms. E.g. a model of a vapor/liquid interface, or a solid with an irregular-shaped geometry containing void regions, or hybrid pair style simulations which combine pair styles with different computational cost. In these cases, the Aspherix® default of dividing the simulation box volume into a regular-spaced grid of 3d bricks, with one equal-volume sub-domain per processor, may assign numbers of particles per processor in a way that the computational effort varies significantly. This can lead to poor performance when the simulation is run in parallel.

On a particular timestep, a load-balancing operation is only performed if the current “imbalance factor” in particles owned by each processor exceeds the specified target_imbalance parameter. The imbalance factor is defined as the maximum number of particles owned by any processor, divided by the average number of particles per processor. Thus an imbalance factor of 1.0 is perfect balance, the default value is 1.1.

As an example, for 10000 particles running on 10 processors, if the most heavily loaded processor has 1200 particles, then the factor is 1.2, meaning there is a 20% imbalance. Note that re-balances can be forced even if the current balance is perfect (1.0) be specifying a target_imbalance < 1.0.

Note

This command attempts to minimize the imbalance factor, as defined above. But depending on the method a perfect balance (1.0) may not be achieved.

Note

The imbalance factor is also an estimate of the maximum speed-up you can hope to achieve by running a perfectly balanced simulation versus an imbalanced one. In the example above, the 10000 particle simulation could run up to 20% faster if it were perfectly balanced, versus when imbalanced. However, computational cost is not strictly proportional to particle count, and changing the relative size and shape of processor sub-domains may lead to additional computational and communication overheads. Thus you should benchmark the run times of a simulation before and after balancing.


The rcb method invokes a “tiled” method for balancing, as described above. It performs a recursive coordinate bisectioning (RCB) of the simulation domain. The basic idea is as follows.

The simulation domain is cut into 2 boxes by an axis-aligned cut in the longest dimension, leaving one new box on either side of the cut. All the processors are also partitioned into 2 groups, half assigned to the box on the lower side of the cut, and half to the box on the upper side. (If the processor count is odd, one side gets an extra processor.) The cut is positioned so that the number of atoms in the lower box is exactly the number that the processors assigned to that box should own for load balance to be perfect. This also makes load balance for the upper box perfect. The positioning is done iteratively, by a bisectioning method. Note that counting atoms on either side of the cut requires communication between all processors at each iteration.

That is the procedure for the first cut. Subsequent cuts are made recursively, in exactly the same manner. The subset of processors assigned to each box make a new cut in the longest dimension of that box, splitting the box, the subset of processors, and the atoms in the box in two. The recursion continues until every processor is assigned a sub-box of the entire simulation domain, and owns the atoms in that sub-box.

When a “tiling” method is specified, the current domain partitioning is ignored, and a new partitioning is computed from scratch. As a result, the settings from the processors command are ignored.

Warning

For optimal load balancing with tiled methods, make sure the number of processors is a power of 2. Otherwise, the target_imbalance can never be reached.


The check_every_time setting determines how often a rebalance is performed. Per default the rebalancing is performed every 1000 time steps. If check_every_time is set to a value larger than 0 in the input script, then rebalancing will occur every check_every_time time units. Each time a rebalance occurs, a reneighboring is triggered, so check_every_time should not be too small. If check_every_time = 0, then rebalancing will be done every time reneighboring normally occurs, as determined by the the neighbor and neigh_modify command settings.

On rebalance steps, rebalancing will only be attempted if the current imbalance factor, as defined above, exceeds the target_imbalance setting.

Restart

No information about this fix is written to binary restart files.

Output

This command computes a global scalar which is the imbalance factor after the most recent rebalance and a global vector of length 3 with additional information about the most recent rebalancing. The 3 values in the vector are as follows (where ID is the id of the command):

  • id_ID[1]: max # of particles per processor

  • id_ID[2]: total # iterations performed in last rebalance

  • id_ID[3]: imbalance factor right before the last rebalance was performed

As explained above, the imbalance factor is the ratio of the maximum number of particles (or total weight) on any processor to the average number of particles (or total weight) per processor.

These quantities can be accessed by various output commands. The scalar and vector values calculated by this fix are “intensive”.

To visualize the processor distribution use the dump decomposition/vtk.

Restrictions

None.

Default

target_imbalance = 1.1, check_every_time = 1000 * simulation_timestep