Develop a complete genetic algorithm tool set for determining optimal parameters for Linpack runs.
- Translator that decodes GA DNA into a Linpack HPL.dat file
- Fitness measurer, parent chooser/crosser, population variance
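The components listed above could fit together roughly as follows. This is a minimal sketch in Python rather than the Perl and Bash the project plans to use, and every choice in it is an assumption made for illustration: tournament selection, single-point crossover, bit-flip mutation, one-individual elitism, and mean per-bit variance as the population variance measure. The `fitness` callable is hypothetical; in practice it would run HPL with the decoded parameters and return the measured performance.

```python
import random
from typing import Callable, List

DNA_BITS = 35  # length of the bit string described later in this document


def random_dna(rng: random.Random) -> str:
    """Return a random individual as a string of '0' and '1' characters."""
    return "".join(rng.choice("01") for _ in range(DNA_BITS))


def tournament(pop: List[str], scores: List[float], rng: random.Random, k: int = 3) -> str:
    """Parent chooser: pick k random individuals and return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Parent crosser: single-point crossover, head of a joined to tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(dna: str, rng: random.Random, rate: float = 0.02) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < rate else c for c in dna
    )


def diversity(pop: List[str]) -> float:
    """Mean per-bit variance across the population (0.0 means all identical)."""
    total = 0.0
    for i in range(len(pop[0])):
        p = sum(d[i] == "1" for d in pop) / len(pop)
        total += p * (1 - p)
    return total / len(pop[0])


def next_generation(
    pop: List[str], fitness: Callable[[str], float], rng: random.Random
) -> List[str]:
    """Score the population once, keep the best individual, breed the rest."""
    scores = [fitness(d) for d in pop]
    best = pop[max(range(len(pop)), key=lambda i: scores[i])]
    children = [best]  # elitism: the best individual survives unchanged
    while len(children) < len(pop):
        child = crossover(
            tournament(pop, scores, rng), tournament(pop, scores, rng), rng
        )
        children.append(mutate(child, rng))
    return children
```

A falling `diversity` value across generations suggests the population is converging, which is one way to decide when to stop or to raise the mutation rate.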
Current Project Assumptions
- Use Perl and Bash scripts until the tool set is functionally solid, then consolidate into a single script.
- Results at smaller N values are representative of results at larger N values
For a myriad of reasons, clusters are often judged on their ability to
solve dense systems of linear equations. Specifically, the "Top 500"
supercomputers in the world are ranked using the High Performance
Linpack benchmark, or "HPL".
Overall Linpack performance can be thought of as a function of many parameters:
- CPU speed and instruction sets
- memory capacity
- system bus speeds
- interconnect topologies, performance and design
- linear algebra library optimizations
- compiler optimizations
- communication stack and protocol optimizations
- HPL run parameters
Because most cluster hardware is already in place, and compile-time
and communication options tend to change slowly, this investigation
focuses on the Linpack run parameters.
The HPL benchmark contains several tunable parameters.
Linpack Tuning Document
To most cluster engineers (the authors included), the tuning
explanations of the HPL parameters offer little insight into the
underlying effect of varying them. Not everyone
can take a graduate mathematics course in advanced linear algebra in
their free time.
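For orientation, the run parameters in question are the value column of the HPL.dat input file, which HPL reads line by line. The sketch below renders one such file from a dictionary of parameters. It assumes the line order of the sample HPL.dat distributed with HPL, so it should be checked against the copy shipped with your build. The function name `render_hpl_dat` and the dictionary keys are hypothetical, and the threshold, swapping threshold, and memory alignment values are simply the ones commonly seen in the sample file.

```python
from typing import Dict


def render_hpl_dat(p: Dict[str, int]) -> str:
    """Render a single-configuration HPL.dat from a parameter dictionary."""
    rows = [
        ("HPL.out", "output file name (if any)"),
        (6, "device out (6=stdout,7=stderr,file)"),
        (1, "# of problems sizes (N)"),
        (p["N"], "Ns"),
        (1, "# of NBs"),
        (p["NB"], "NBs"),
        (p["PMAP"], "PMAP process mapping (0=Row-,1=Column-major)"),
        (1, "# of process grids (P x Q)"),
        (p["P"], "Ps"),
        (p["Q"], "Qs"),
        (16.0, "threshold"),  # assumed: value used in the sample file
        (1, "# of panel fact"),
        (p["PFACT"], "PFACTs (0=left, 1=Crout, 2=Right)"),
        (1, "# of recursive stopping criterium"),
        (p["NBMIN"], "NBMINs (>= 1)"),
        (1, "# of panels in recursion"),
        (p["NDIV"], "NDIVs"),
        (1, "# of recursive panel fact."),
        (p["RFACT"], "RFACTs (0=left, 1=Crout, 2=Right)"),
        (1, "# of broadcast"),
        (p["BCAST"], "BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)"),
        (1, "# of lookahead depth"),
        (p["DEPTH"], "DEPTHs (>=0)"),
        (p["SWAP"], "SWAP (0=bin-exch,1=long,2=mix)"),
        (64, "swapping threshold"),  # assumed: value used in the sample file
        (p["L1"], "L1 in (0=transposed,1=no-transposed) form"),
        (p["U"], "U  in (0=transposed,1=no-transposed) form"),
        (p["EQUIB"], "Equilibration (0=no,1=yes)"),
        (8, "memory alignment in double (> 0)"),  # assumed: sample value
    ]
    # The first two lines of HPL.dat are free-form comment lines.
    header = ["HPLinpack benchmark input file", "generated for a GA fitness run"]
    body = [f"{str(value):<12} {comment}" for value, comment in rows]
    return "\n".join(header + body) + "\n"
```

Writing the returned string to `HPL.dat` in the directory where the benchmark binary runs would then be one way to test a single candidate configuration.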
Converting Linpack Parameters to a Genetic Algorithm DNA String
- N Problem Size
(fixed at 10,000 for “quick” fitness testing)
- NB Block Size: 0-1023 (10 bits)
- PMAP Process Mapping (row or column): 0-1 (1 bit)
- P & Q Process Grid Rows and Columns
(P x Q = Number of Procs)
For 64 Procs there are 7 possibilities (3 bits)
(1x64, 2x32, 4x16, 8x8, 16x4, 32x2, 64x1)
- PFACT Panel Factorization Method: 0-2 (2 bits)
- NBMIN Minimum Columns: 1-15 (4 bits)
- NDIV Number of Panel Divisions in Recursion: 0-7 (3 bits)
- RFACT Recursive Panel Factor: 0-2 (2 bits)
- BCAST Broadcast Method: 0-5 (3 bits)
- DEPTH Lookahead Depth: 1-3 (2 bits)
- SWAP Swap Algorithm: 0-2 (2 bits)
- L1 Upper Right Transpose Method: 0-1 (1 bit)
- U Panel of Rows U Transpose Method: 0-1 (1 bit)
- EQUIB Equilibration Toggle: 0-1 (1 bit)
- TOTAL 35 bits
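The bit layout above can be decoded with a simple field table. The sketch below is in Python for illustration and is not the project's own implementation. It assumes the fields are packed in the order listed, most significant bit first, and that bit patterns outside a field's valid range are folded back into range with a modulo. None of these choices is specified by the list, so treat them as assumptions. The names `FIELDS`, `grid_shapes`, and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple

# (name, bit width, lowest value, highest value), in the order listed above.
FIELDS: List[Tuple[str, int, int, int]] = [
    ("NB", 10, 0, 1023),
    ("PMAP", 1, 0, 1),
    ("GRID", 3, 0, 7),  # index into the list of P x Q shapes, folded below
    ("PFACT", 2, 0, 2),
    ("NBMIN", 4, 1, 15),
    ("NDIV", 3, 0, 7),
    ("RFACT", 2, 0, 2),
    ("BCAST", 3, 0, 5),
    ("DEPTH", 2, 1, 3),
    ("SWAP", 2, 0, 2),
    ("L1", 1, 0, 1),
    ("U", 1, 0, 1),
    ("EQUIB", 1, 0, 1),
]
DNA_BITS = sum(width for _, width, _, _ in FIELDS)  # 35


def grid_shapes(nprocs: int) -> List[Tuple[int, int]]:
    """All (P, Q) pairs with P * Q == nprocs, ordered by increasing P."""
    return [(p, nprocs // p) for p in range(1, nprocs + 1) if nprocs % p == 0]


def decode(dna: str, nprocs: int = 64, n: int = 10000) -> Dict[str, int]:
    """Decode a string of '0'/'1' characters into HPL run parameters."""
    if len(dna) != DNA_BITS or set(dna) - {"0", "1"}:
        raise ValueError(f"expected {DNA_BITS} binary digits")
    params = {"N": n}  # N is fixed, not part of the DNA
    pos = 0
    for name, width, lo, hi in FIELDS:
        raw = int(dna[pos:pos + width], 2)
        pos += width
        # Assumption: fold out-of-range patterns back into [lo, hi].
        params[name] = lo + raw % (hi - lo + 1)
    shapes = grid_shapes(nprocs)
    params["P"], params["Q"] = shapes[params.pop("GRID") % len(shapes)]
    return params
```

Folding with a modulo makes some values slightly more likely than others (for example, PFACT 0 is reachable from two bit patterns). An alternative would be to assign zero fitness to out-of-range individuals, which would also cover values the benchmark itself rejects, such as a block size of 0.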