PARILLU - Parallel simulation of global illumination


Next: Locality Preserving Dynamic Load Balancing With Provably Small Overhead
Prev: Introduction

On The Partitionability Of Hierarchical Radiosity

We present a detailed experimental analysis of the task access graph, defined on the tasks of the HRA. We give quantitative results underpinning ``common knowlegde'', that HRA is difficult to parallelize. Measurements regarding three important issues of a parallel implementation are presented: load balance, communication, and congestion. The results are rated quantitatively and indicate that there exist several bottlenecks. Different graph partitioners are examined with respect to their ability to cope with these bottlenecks. One main finding is that a spatial partitioning method overcomes most of the problems well.

We do not perform the measurements on a real parallel implementation. Instead we abstract from several possible techniques of dynamic load balancing and communication reduction by analyzing each radiosity iteration in a static manner. However, this approach does not assume a static partitioning strategy. Instead, it allows conclusions regarding the performance characteristics of any possible real dynamic partitioning method.

We use three partitioners: a naive partitioner, a spatial partitioner and an ``optimum'' partitioner. The optimum partitioner cannot be used in a real implementation, since it needs a priori knowledge of the graph, which itself evolves during computations. We consider the optimum partitioner as a generator of a reference solution.

It turns out, that the task access graph is very poorly partitionable. Even the best partitioner achieves a cut size of only 85 percent of all edges running across processor boundaries, when partitioning into 16 parts. The reason of this poor behaviour is the overwhelming amount of communication due to visibility computations. Therefore, in the following experiments we remove the visibility access edges from the definition of the graph. The resulting task access graphs are better partitionable and allow greater insight in the structure of the HRA.

We see, that the HRA is a very dynamic algorithm. The amount of work changes dramatically from iteration to iteration. We also see that load balancing is strictly necessary, since balance of load is poor, when not accounting for load balance at all. But, the changes in the balance of load are relatively small. Hence, one could guess that load balancing in early iterations pays off in later iterations. An important result is that load balancing amortizes, which is confirmed by load continuity matrices. These relate overloaded processor sets of different iterations to each other.

Measured data on the cut size and the connectivity balance for different partitioners is presented. A result is that all partitioners do achieve comparable bad partition qualities. The spatial partitioner seems to be the best choice, because it exploits locality of reference, leading to a reduced chance of network congestion. We discuss this issue based on a new measure, the channel usage.

Next: Locality Preserving Dynamic Load Balancing With Provably Small Overhead
Prev: Introduction

Robert Garmann, February 22, 2002