Impact on Solve Time
Assuming that it takes 4 seconds to communicate the results to the next student, the time to solve a problem
now jumps from 6 seconds to 10 seconds. Compared with the original run, this run accounts for both load
balancing and communication overhead, and the parallel scaling efficiency has dropped to 34.5 percent.
Based on the analogy, we can draw the following conclusions:
1. Load balancing depends on accurately forecasting the workload of each task. Fortunately, this is a
well-studied field, and the methods are often built into the software.
2. Different HPC workloads have different communication requirements: some tasks need to talk to each
other constantly, others rarely.
3. The more parallel tasks, the more communication bandwidth is required. Far more talking has to happen
with 400 people in a room than with 5.
4. There are diminishing returns as core counts continue to increase. The efficiency losses compound as
you scale further and further up (see the sketch after this list).
5. The larger the problem, the greater the benefit of parallelization. If you know the ideal number of
questions per student, you can increase the number of students to match an increase in questions
without hurting efficiency.
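To make conclusions 3 through 5 concrete, here is a minimal Python sketch of the classroom analogy. It assumes roughly 30 seconds of total work per problem (5 students at about 6 seconds each, matching the numbers above) and a fixed 4-second hand-off between students; the function names and numbers are illustrative assumptions, and the model ignores load-imbalance losses, so it will not reproduce the 34.5 percent figure exactly.

```python
# Illustrative model of the classroom analogy (assumed numbers, not measured data).
# Each problem has `work_s` seconds of compute split evenly across `n` students,
# and any run with more than one student pays a fixed `handoff_s` communication cost.

def time_per_problem(work_s, n, handoff_s=4.0):
    """Each student's share of the compute plus the communication hand-off."""
    comm = handoff_s if n > 1 else 0.0
    return work_s / n + comm

def efficiency(work_s, n, handoff_s=4.0):
    """Parallel scaling efficiency: ideal (communication-free) time / actual time."""
    return (work_s / n) / time_per_problem(work_s, n, handoff_s)

if __name__ == "__main__":
    work_s = 30.0  # hypothetical serial solve time for one problem

    # Conclusions 3 and 4: same problem, more students -> efficiency keeps dropping.
    print("Strong scaling (same problem, more students):")
    for n in (1, 5, 50, 400):
        print(f"  n={n:<4} time={time_per_problem(work_s, n):6.2f} s"
              f"  efficiency={efficiency(work_s, n):6.1%}")

    # Conclusion 5: grow the problem with the student count -> efficiency holds steady.
    print("Weak scaling (work grows with the student count):")
    for n in (1, 5, 50, 400):
        scaled_work = work_s * n
        print(f"  n={n:<4} efficiency={efficiency(scaled_work, n):6.1%}")
```

In the strong-scaling loop the fixed hand-off cost dominates as the per-student share of work shrinks, so efficiency falls off quickly; in the weak-scaling loop the work per student stays constant, so efficiency stays flat no matter how many students are added.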
A Benchmark Graph
Below is a theoretical example of a benchmark graph broken out by four different mesh sizes: 2 million nodes,
15 million, 40 million, and 100 million.