PARILLU - Parallel simulation of global illumination


Prev: Locality Preserving Dynamic Load Balancing With Provably Small Overhead

Reference Implementation: Hierarchical Radiosity

Let us look back a while and summarize. We analyzed the Hierarchical Radiosity Algorithm using a graph partitioning technique and realized that spatial partitioning is a technique promising well scalability. We then defined a dynamic spatial partitioner and showed that it works well theoretically in the worst case and experimentally for a simple application. These results may be valuable not only for the HRA, but also for other scientific applications.

Now, we apply the spatial partitioner to the HRA problem. Our basis for this is a mapping of elements to 3D space and links to 6D space. We will present solutions for the following important implementation issues:

Asynchronous Processing.
An asynchronous formulation of the HRA is important to avoid unnecessarily expensive barrier synchronizations.
It is absolutely necessary to group elements and links to containers in order to reduce administration overhead.
Data Reference Locality.
Elements that once have been transferred to remote caches should be reused there as often and soon as possible. This can be achieved by choosing the right links for refinement.

The performance of our program is documented by plots. We show plots that illustrate how the two key problems balance and overhead are defeated by our program. Speedup plots (see below) give an impression of scalability. On a Cray T3E the speedup curve is almost linear up to 64 processors. This is better than previously published attempts on massively parallel distributed memory computers. The program was also measured on a network of 8 Linux PCs with satisfying results.

Figure: Speedup on a Cray T3E compared to ideal (dotted) speedup

Prev: Locality Preserving Dynamic Load Balancing With Provably Small Overhead

Robert Garmann, February 22, 2002