
refrain from using a benchmarking script, as it is too inflexible and a huge amount of work to create; manual mode it is. Also create a more concrete plan for the benchmarks and the conclusions to draw from them in benchmark-plan.md (new)

master
Constantin Fürst 1 year ago
parent
commit
854cd6916c
  1. 34
      benchmarks/benchmark-plan.md
  2. 11
      benchmarks/benchmarker.py

34
benchmarks/benchmark-plan.md

@@ -0,0 +1,34 @@
# peak performance
- measure ddr to ddr, intra-node
- measure ddr to hbm, intra-node
- measure ddr to ddr, inter-node
- measure ddr to hbm, inter-node
- measure ddr to ddr, inter-socket
- measure ddr to hbm, inter-socket
All for 1KiB, 4KiB, 1MiB, 1GiB
All for HW and also SW path
--> conclude how much overhead the DSA engine has
--> conclude the size after which using HW makes sense
this point is reached when the submit overhead for
HW execution is smaller than the entire copy time
for SW execution
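
The break-even rule above can be checked against manually collected timings with a few lines of Python. This is only a sketch: the function name `break_even_size`, the dict layout, and all numbers are placeholders for illustration, not measurements.

```python
from typing import Dict, Optional

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def break_even_size(hw_submit_overhead_s: Dict[int, float],
                    sw_copy_time_s: Dict[int, float]) -> Optional[int]:
    """Return the smallest measured size (in bytes) for which the HW submit
    overhead is smaller than the entire SW copy time, or None if no
    measured size satisfies that condition."""
    for size in sorted(hw_submit_overhead_s):
        if hw_submit_overhead_s[size] < sw_copy_time_s[size]:
            return size
    return None


# Placeholder values in seconds (NOT real measurements); replace them with
# the numbers recorded during the manual benchmark runs.
example_hw = {KIB: 2.0e-6, 4 * KIB: 2.0e-6, MIB: 2.0e-6, GIB: 2.0e-6}
example_sw = {KIB: 0.1e-6, 4 * KIB: 0.4e-6, MIB: 100e-6, GIB: 0.1}

print(break_even_size(example_hw, example_sw))
```

With only four sizes in the plan, the result is the first measured size past the break-even point, not the exact crossover; narrowing it down would need additional sizes between the two neighbouring measurements.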
# submit // done
- single submit-and-wait
- multi submit
- batch submit
All with both 1 and 4 engines per WQ
All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
--> conclude which work submission strategy is best for which size
--> conclude whether multiple engines significantly improve batch perf
# MT submit
- multiple threads submit to the same WQ
- use 1,2,4,8,12 threads
All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node
All for 1 vs 4 engines
--> conclude how badly MT submit hurts performance
--> conclude whether multiple engines help mt submit
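
One possible way to quantify how much MT submit hurts is to normalize the throughput at each thread count to the single-thread run. A sketch, assuming throughput is recorded as the aggregate over all threads submitting to the WQ; `relative_throughput` and the sample values are hypothetical, not results.

```python
from typing import Dict


def relative_throughput(throughput_by_threads: Dict[int, float]) -> Dict[int, float]:
    """Normalize aggregate throughput per thread count to the 1-thread run.

    Values below 1.0 mean that submitting from that many threads to the
    same WQ was slower in total than a single submitting thread.
    """
    baseline = throughput_by_threads[1]  # requires a single-thread measurement
    return {
        threads: throughput / baseline
        for threads, throughput in sorted(throughput_by_threads.items())
    }


# Placeholder values in GiB/s (NOT real measurements).
sample = {1: 10.0, 2: 8.0, 4: 5.0}
for threads, ratio in relative_throughput(sample).items():
    print(f"{threads} threads: {ratio:.2f}x of single-thread throughput")
```

Running the same normalization on the 1-engine and 4-engine results side by side gives a direct comparison for the second conclusion.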
# cross copy // done
- compare which is faster: xcopy, copy from source node, copy from dst node
All for both inter-node and inter-socket copy using DDR and 1MiB with 4 engines (4E)
--> conclude where a copy thread should live

11
benchmarks/benchmarker.py

@@ -1,11 +0,0 @@
def main():
    print("This is doing nothing!")
    # test all for sizes 1KiB - 64kKiB / 64MiB increasing exponentially
    # test ddr->ddr, ddr->hbm, hbm->hbm, hbm->ddr
    # test for both-local, one-local, no-local engine
    # test for single thread and multi thread on one engine
    # test for single thread with other thread(s) working on disjoint set of node but possibly overlapping source/destination memory

if __name__ == "__main__":
    main()