3.5.2 Parallel execution in Abaqus/Standard

Products: Abaqus/Standard  Abaqus/CAE  

Overview

Parallel execution in Abaqus/Standard:

  • reduces run time for large analyses;

  • is available for shared memory computers and computer clusters for the element operations, direct sparse solver, and iterative linear equation solver; and

  • can use compute-capable GPGPU hardware on shared memory computers for the direct sparse solver.

Parallel equation solution with the default direct sparse solver

The direct sparse solver (“Direct linear equation solver,” Section 6.1.5) supports both shared memory computers and computer clusters for parallelization. On shared memory computers or a single node of a computer cluster, thread-based parallelization is used for the direct sparse solver, and high-end graphics cards that support general processing (GPGPUs) can be used to accelerate the solution. On multiple compute nodes of a computer cluster, a hybrid MPI and thread-based parallelization is used.

The direct sparse solver cannot be used on multiple compute nodes of a computer cluster if:

  • the analysis also includes an eigenvalue extraction procedure, or

  • the analysis requires features for which MPI-based parallel execution of element operations is not supported.

In addition, the direct sparse solver cannot be used on multiple nodes of a computer cluster for analyses that include any of the following:

To execute the parallel direct sparse solver on computer clusters, the environment variable mp_host_list must be set to a list of host machines (see “Using the Abaqus environment settings,” Section 3.3.1). MPI-based parallelization is used between the machines in the host list. Thread-based parallelization is used within a host machine if more than one processor is available on that machine in the host list and if the model does not contain cavity radiation using parallel decomposition (see “Decomposing large cavities in parallel” in “Cavity radiation,” Section 41.1.1). For example, if the environment file has the following:

cpus=8
mp_host_list=[['maple',4],['pine',4]]
Abaqus/Standard will use four processors on each host through thread-based parallelization. A total of two MPI processes (equal to the number of hosts) will be run across the host machines so that all eight processors are used by the parallel direct sparse solver.

Models containing parallel cavity decomposition use only MPI-based parallelization. Therefore, MPI is used on both shared memory parallel computers and distributed memory compute clusters. The number of processes is equal to the number of CPUs requested during job submission. Element operations are executed in parallel using MPI-based parallelization when parallel cavity decomposition is enabled.
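
As an illustration, consider again the environment settings shown above:

cpus=8
mp_host_list=[['maple',4],['pine',4]]

If the model uses parallel cavity decomposition, Abaqus/Standard runs eight MPI processes across the two hosts rather than two MPI processes with four threads each, and the element operations are also executed in parallel across those processes.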

Input File Usage:          Use the following option in conjunction with the command line input to execute the parallel direct sparse solver:
*STEP

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “beam” on two processors:

abaqus job=beam cpus=2 

Abaqus/CAE Usage:   

Step module: step editor: Other: Method: Direct

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n


GPGPU acceleration of the direct sparse solver

The direct sparse solver supports GPGPU acceleration on shared memory computers.

Input File Usage:          Enter the following input on the command line to activate GPGPU direct sparse solver acceleration:

abaqus job=job-name gpus=n
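
For example, a command of the following form (the job name and the processor and GPGPU counts shown here are illustrative, not taken from a specific model) runs a job on four processors and uses one GPGPU to accelerate the direct sparse solver:

abaqus job=beam cpus=4 gpus=1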


Abaqus/CAE Usage:   

Step module: step editor: Other: Method: Direct

Job module: job editor: Parallelization: toggle on Use GPGPU acceleration, and specify the number of GPGPUs


Memory requirements for the parallel direct sparse solver

The parallel direct sparse solver processes multiple fronts in parallel in addition to parallelizing the solution of individual fronts. Therefore, the parallel direct sparse solver requires more memory than the serial solver. The memory requirements cannot be predicted exactly in advance because it is not known a priori which fronts will be processed simultaneously.

Equation ordering for minimum solve time

Direct sparse solvers require the system of equations to be ordered to minimize the floating point operation count. The ordering procedure is performed in parallel when multiple host machines are used on a computer cluster; in a shared memory configuration the ordering is not performed in parallel. The parallel ordering procedure computes different orderings when run on different numbers of host machines, which affects the floating point operation count for the direct solver. Parallel ordering can offer performance improvements, particularly for large models using many host machines, by significantly reducing the time needed to compute the ordering. However, it may degrade performance if the resulting ordering leads to a higher floating point operation count for the direct solver.

The serial ordering procedure can be used when the variability inherent in the parallel ordering procedure is not acceptable. You can deactivate parallel solver ordering from the command line or by using the order_parallel environment file parameter (see “Command line default parameters” in “Using the Abaqus environment settings,” Section 3.3.1).

Input File Usage:          Enter the following input on the command line to deactivate parallel solver ordering:

abaqus job=job-name order_parallel=OFF
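
Alternatively, parallel solver ordering can be deactivated through the order_parallel environment file parameter mentioned above. As a sketch (assuming the environment file conventions described in Section 3.3.1), the following entry in the Abaqus environment file has the same effect as the command line option:

order_parallel=OFF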


Abaqus/CAE Usage:   Deactivation of parallel solver ordering is not supported in Abaqus/CAE.

Parallel equation solution with the iterative solver

The iterative solver (“Iterative linear equation solver,” Section 6.1.6) uses only MPI-based parallelization. Therefore, MPI is used on both shared memory parallel computers and distributed memory compute clusters. To execute the parallel iterative solver, specify the number of CPUs for the job. The number of processes is equal to the number of CPUs requested during job submission. Element operations are executed in parallel using MPI-based parallelization when the parallel iterative solver is used.

Input File Usage:          Use the following option in conjunction with the command line input to execute the parallel iterative solver:
*STEP, SOLVER=ITERATIVE

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “cube” on four processors with the iterative solver:

abaqus job=cube cpus=4

Abaqus/CAE Usage:   

Step module: step editor: Other: Method: Iterative

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n


Parallel execution of the element operations in Abaqus/Standard

Parallel execution of the element operations is the default on all supported platforms. The standard_parallel command line option and environment variable can be used to control the parallel execution of the element operations (see “Using the Abaqus environment settings,” Section 3.3.1, and “Abaqus/Standard, Abaqus/Explicit, and Abaqus/CFD execution,” Section 3.2.2). If parallel execution of the element operations is used, the solvers also run in parallel automatically. For analyses that use the direct sparse solver and do not contain parallel cavity decomposition, thread-based parallelization of the element operations is used on shared memory computers, and a hybrid MPI and thread-based parallel scheme is used on computer clusters. For analyses that use the iterative solver or that enable parallel cavity decomposition, only MPI-based parallelization of the element operations is supported.

When MPI-based parallelization of element operations is used, element sets are created for each domain and can be inspected in Abaqus/CAE. The sets are named STD_PARTITION_n, where n is the domain number.
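
For example, a job whose element operations are split into two MPI domains produces the element sets STD_PARTITION_1 and STD_PARTITION_2.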

Parallel execution of the element operations (thread or MPI-based parallelization) is not supported for the following procedures:

Parallel execution of element operations is available only through MPI-based parallelization for analyses that include any of the following:

Analyses using the direct sparse solver and any of the procedures above that support only MPI-based parallelization of element operations can be run on computer clusters. However, only one processor per compute node is used for the element operations since thread-based parallelization is not supported.

Parallel execution of element operations is available only through thread-based parallelization for:

Finally, parallel execution of the element operations is not supported for analyses that include any of the following:

Input File Usage:          Enter the following input on the command line:

abaqus job=job-name standard_parallel=all cpus=n
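
The same control is available through the environment file. As a sketch (assuming the environment file conventions described in Section 3.3.1), the following entries request parallel execution of the element operations, and therefore of the solvers, on four processors:

standard_parallel=ALL
cpus=4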


Abaqus/CAE Usage:   Control of the parallel execution of the element operations is not supported in Abaqus/CAE.

Memory management with parallel execution of the element operations

When the element operations are executed in parallel in Abaqus/Standard, the upper limit that you specify for the memory that can be used (see “Abaqus/Standard analysis” in “Managing memory and disk use in Abaqus,” Section 3.4.1) applies to each process; that is, it is the maximum amount of memory that each process can allocate.
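
As an illustrative sketch (the values shown are assumptions, not recommendations), suppose the memory limit and the number of processors are set as follows in the environment file:

memory="2 gb"
cpus=4

If the element operations run as four MPI processes, each process can allocate up to 2 GB, so the job as a whole can use up to 8 GB.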

Transverse shear stress output for stacked continuum shells

The output variables CTSHR13 and CTSHR23 are currently not available when the element operations are executed in parallel in Abaqus/Standard. See “Continuum shell element library,” Section 29.6.8.

Consistency of results

Some physical systems (systems that, for example, undergo buckling, material failure, or delamination) can be highly sensitive to small perturbations. For example, it is well known that the experimentally measured buckling loads and final configurations of a set of seemingly identical cylindrical shells can show significant scatter due to small differences in boundary conditions, loads, initial geometries, etc. When simulating such systems, the physical sensitivities seen in an experiment can be manifested as sensitivities to small numerical differences caused by finite precision effects. Finite precision effects can lead to small numerical differences when running jobs on different numbers of processors. Therefore, when simulating physically sensitive systems, you may see differences in the numerical results (reflecting the differences seen in experiments) between jobs run on different numbers of processors. To obtain consistent simulation results from run to run, the number of processors should be constant.
