Tutorial on parallel API usage

Description:

This tutorial shows you how to set up a simulation using the Aspherix(R) C++ API in parallel.

Introduction:

The purpose of this second tutorial is to extend the first basic API tutorial so that the simulation runs in parallel. Again, the test case with three particles does not really benefit from MPI parallelization, but it lets us easily demonstrate the difference between the old serial version and the one that includes the MPI parallelization. The concepts shown here also apply to much more complex scenarios.

If you have not read the basic API tutorial then head over to that page and read through it carefully. Everything that follows here assumes that you are familiar with that tutorial case and we will only discuss the differences between the two.

The parallel API example explained

As in the previous tutorial, exactly the same input script and mesh are used. The CMakeLists.txt is nearly unchanged; only the executable and project name change from basic to basic_mpi. The basic.cpp file is now called basic_mpi.cpp and contains some significant changes that we will review in detail.

The basic_mpi.cpp file:

In the following we will go through the changes in the file compared to the basic.cpp file from the basic API tutorial.

// MPI include
#include <mpi.h>

The only new include required here is for mpi.h so that we can access some MPI specific functions.

// Initialize MPI communication
MPI_Init(&argc, &argv);

Immediately after the start of our main function we have to initialize the MPI library. The MPI_Init function used for this purpose is passed pointers to the command line arguments.

In the previous tutorial we initialized our Aspherix object as follows:

// initialize Aspherix object without any arguments
Aspherix_API::Aspherix asx("");

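This will not work here, because we need to complete the simulation and destroy our Aspherix object before MPI is finalized at the end of main. A stack variable declared in main would only be destroyed when main returns, i.e. after MPI has been finalized. Thus, instead of allocating the variable on the stack we allocate it on the heap and keep a pointer to it: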

// initialize Aspherix object passing all arguments from the command line and the MPI communicator
Aspherix_API::Aspherix *asx = new Aspherix_API::Aspherix(argc, argv, MPI_COMM_WORLD);

Note that this time we pass three arguments to the constructor instead of one. The first two are the arguments from the command line and the last is the MPI communicator that Aspherix(R) will use to communicate in parallel.

The execution part of the code is roughly the same, except that we have to use

asx->executeInputFile("basic.asx");

instead of

asx.executeInputFile("basic.asx");

since asx is now a pointer to an Aspherix object.

Finally, the next interesting part of the code is just before the end.

// Delete our asx variable, i.e. shut down Aspherix
delete asx;

Here we delete the asx pointer, i.e. we are closing Aspherix(R) and with that we are ending the simulation. It is important to perform this before the MPI_Finalize command shown below in order to avoid MPI related errors. This was also the reason why we used the new operator above instead of the previous stack allocation which would not allow us to call delete.

// Finalize MPI communication
MPI_Finalize();

Just before the end of the program, after the simulation has been shut down, MPI_Finalize is called to end the MPI communication.
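The ordering described above (destroy the Aspherix object first, then finalize MPI) can also be obtained without a manual new/delete pair, for example by holding the object in a std::unique_ptr inside a nested scope. The following is only a minimal sketch of that ordering. To keep it compilable without MPI or Aspherix(R), FakeSolver and fake_finalize are hypothetical stand-ins for Aspherix_API::Aspherix and MPI_Finalize; they are not part of either library.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records the order of events so that it can be inspected afterwards.
static std::vector<std::string> events;

// Hypothetical stand-in for Aspherix_API::Aspherix (NOT the real class).
struct FakeSolver {
    FakeSolver() { events.push_back("solver created"); }
    ~FakeSolver() { events.push_back("solver destroyed"); }
    void run() { events.push_back("simulation run"); }
};

// Hypothetical stand-in for MPI_Finalize().
void fake_finalize() { events.push_back("finalize"); }

// Mirrors the structure of main() in basic_mpi.cpp.
std::vector<std::string> run_lifecycle() {
    events.clear();
    {
        // The unique_ptr owns the solver and destroys it at the closing brace.
        std::unique_ptr<FakeSolver> solver(new FakeSolver());
        solver->run();
    } // solver is destroyed here, before the finalize call below
    fake_finalize();
    return events;
}
```

Calling run_lifecycle() records the solver's destruction before the finalize step, which is the same order that the explicit delete asx followed by MPI_Finalize produces in the tutorial code.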

Running the parallel API example

Linux:

After the compilation, which is equivalent to the one shown in our previous tutorial case, we can verify that the executable is where we expect it to be:

$ ls install/bin/
basic_mpi

To execute the code we now use mpirun with two processors:

$ mpirun -np 2 install/bin/basic_mpi

and watch the simulation output:

Aspherix (Version Aspherix 7.1.0, compiled 2026-03-12-13:20:23 by vagrant, git commit 3efa378ae0fe305d4e235e2e5c66d2244884a712)
Checkout of asx_solver OK.
Created orthogonal box = (-0.1 -0.1 -1) to (1 1 1)
  1 by 1 by 2 MPI processor grid

[... more output from the Aspherix(R) simulation ...]

Completed execution of input script. Next, we execute a single command.
Completed execution of input script. Next, we execute a single command.

[... more output from the second Aspherix(R) simulation step ...]

Completed execution of single command. Goodbye
Completed execution of single command. Goodbye

Note that directly after the Created orthogonal box output you can see how the processors divide the domain among themselves. Additionally, the output written with std::cout in our basic_mpi.cpp file appears twice. This is not surprising, as MPI runs this code on each processor separately.
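To make the "1 by 1 by 2 MPI processor grid" line more concrete, the sketch below cuts one axis of the box into equal slabs. It assumes a uniform split, which is a simplification for illustration; how Aspherix(R) actually places the subdomain boundaries and which rank receives which slab is not stated in this tutorial. The helper slab_bounds is hypothetical and not part of the Aspherix(R) API.

```cpp
#include <utility>

// Hypothetical helper: bounds of slab `index` (0-based) when the interval
// [lo, hi] is cut into `count` slabs of equal width (assumed uniform split).
std::pair<double, double> slab_bounds(double lo, double hi, int count, int index) {
    const double width = (hi - lo) / count;
    return {lo + width * index, lo + width * (index + 1)};
}
```

For the box above, the z direction runs from -1 to 1 and is cut in two, so slab_bounds(-1.0, 1.0, 2, 0) gives the lower half and slab_bounds(-1.0, 1.0, 2, 1) the upper half, while x and y are not split. The duplicated std::cout lines are unrelated to this decomposition; they come from every process executing the same main function.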

If you wish to avoid this duplicated output you can have a look below and perform a small change.

Modifying the parallel API example

As noted above the output is duplicated when using std::cout. To avoid this we want to restrict the output command to a single processor. To do so, we first need to know the id, or rank in MPI parlance, which can be obtained by

int rank = 0;
// Get the current processor id (also called rank)
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

which can be done right after the MPI_Init or anywhere else in the code, before we use the rank variable. The integer rank is now equal to 0 for our first processor and 1 for our second. Thus, we modify our usage of std::cout to be

if (rank == 0)
    std::cout << "Completed execution of single command. Goodbye" << std::endl;

We can do the same for our other output. Next, we compile the code again using make and make install and run it. The previously duplicated output should now be gone.

A final note on why we print on the processor with rank 0 and not on the other one: if the program is run in serial mode, only the processor with rank 0 exists, so had we used rank == 1 there would be no output at all in that case.