|
|
@@ -1,12 +1,9 @@ |
|
|
|
# peak performance |
|
|
|
- measure ddr to ddr, intra-node |
|
|
|
- measure ddr to hbm, intra-node |
|
|
|
- measure ddr to ddr, inter-node |
|
|
|
- measure ddr to hbm, inter-node |
|
|
|
- measure ddr to ddr, inter-socket |
|
|
|
- measure ddr to hbm, inter-socket |
|
|
|
# peak-perf |
|
|
|
- measure ddr to ddr |
|
|
|
- measure ddr to hbm |
|
|
|
All for 1KiB, 4KiB, 1MiB, 1GiB |
|
|
|
All for HW and also SW path |
|
|
|
All for intra-node, inter-node and inter-socket |
|
|
|
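The peak-perf qualifiers above multiply out into a full test matrix. A minimal sketch of enumerating it, assuming nothing about the actual benchmark harness; the names `MEMORY_PAIRS`, `SIZES`, `PATHS`, `LOCALITIES` and `peak_perf_matrix` are hypothetical and only illustrate the combinations listed in the plan:

```python
from itertools import product

# Dimensions taken from the plan; the identifiers themselves are hypothetical.
MEMORY_PAIRS = ("ddr-ddr", "ddr-hbm")
SIZES = {"1KiB": 1 << 10, "4KiB": 4 << 10, "1MiB": 1 << 20, "1GiB": 1 << 30}
PATHS = ("hw", "sw")
LOCALITIES = ("intra-node", "inter-node", "inter-socket")


def peak_perf_matrix():
    """Return one dict per peak-perf configuration, in a fixed order."""
    return [
        {
            "memory": memory,
            "size_label": label,
            "size_bytes": SIZES[label],
            "path": path,
            "locality": locality,
        }
        for memory, label, path, locality in product(
            MEMORY_PAIRS, SIZES, PATHS, LOCALITIES
        )
    ]
```

With 2 memory pairs, 4 sizes, 2 paths and 3 localities this yields 48 configurations.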
--> conclude how much overhead DSA engine has |
|
|
|
--> conclude size after which using HW makes sense |
|
|
|
this point is reached when submit overhead for |
|
|
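One way to derive the size after which HW makes sense from the measurements is a linear cost model. This is an assumption of the sketch, not something the plan prescribes: each path is modelled as a fixed per-copy overhead plus size divided by sustained bandwidth. The function name and parameters are hypothetical.

```python
def break_even_size(hw_overhead_s, sw_overhead_s, hw_bandwidth_bps, sw_bandwidth_bps):
    """Size in bytes at which the HW and SW paths take equal time.

    Assumed model (not from the plan): time(size) = overhead + size / bandwidth.
    Returns None when the HW path is not faster per byte, because this model
    then has no size beyond which HW wins.
    """
    if hw_bandwidth_bps <= 0 or sw_bandwidth_bps <= 0:
        raise ValueError("bandwidths must be positive")
    if hw_bandwidth_bps <= sw_bandwidth_bps:
        return None
    extra_overhead = hw_overhead_s - sw_overhead_s
    if extra_overhead <= 0:
        return 0.0  # HW is ahead at every size
    # Seconds saved per byte by using the HW path.
    per_byte_gain = 1.0 / sw_bandwidth_bps - 1.0 / hw_bandwidth_bps
    return extra_overhead / per_byte_gain
```

The overheads and bandwidths would come from fitting the measured 1KiB to 1GiB timings of each path; copies larger than the returned size favour the HW path under this model.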
@ -17,17 +14,19 @@ All for HW and also SW path |

|
|
|
- multi submit |
|
|
|
- batch submit |
|
|
|
All with both 1 and 4 engines per WQ |
|
|
|
All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node |
|
|
|
All for 1KiB, 4KiB, 1MiB, 1GiB |
|
|
|
All only on DDR and intra-node |
|
|
|
--> conclude which work submission strategy is best for which size |
|
|
|
--> conclude whether multiple engines significantly improve batch perf |
|
|
|
# MT submit |
|
|
|
# mtsubmit // done |
|
|
|
- multiple threads submit to the same WQ |
|
|
|
- use 1,2,4,8,12 threads |
|
|
|
All for 1KiB, 4KiB, 1MiB, 1GiB but only ddr-ddr intra node |
|
|
|
All using DDR and 1MiB |
|
|
|
All for 1 vs 4 engines |
|
|
|
All on DDR and intra-node |
|
|
|
--> conclude how badly mt submit hurts performance |
|
|
|
--> conclude whether multiple engines help mt submit |
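Both mtsubmit conclusions reduce to ratios over the measured throughput. A minimal sketch, assuming the results are available as a mapping from thread count to aggregate throughput (the input format and the function names are hypothetical; the thread counts are the ones listed above):

```python
THREAD_COUNTS = (1, 2, 4, 8, 12)  # thread counts listed in the plan


def _check(measured):
    """Reject result sets that lack one of the planned thread counts."""
    missing = [n for n in THREAD_COUNTS if n not in measured]
    if missing:
        raise ValueError(f"missing thread counts: {missing}")


def relative_throughput(measured):
    """Throughput per thread count, relative to the single-thread run.

    Values below 1.0 mean that submitting from more threads hurt performance.
    """
    _check(measured)
    baseline = measured[1]
    if baseline <= 0:
        raise ValueError("single-thread throughput must be positive")
    return {n: measured[n] / baseline for n in THREAD_COUNTS}


def engine_gain(one_engine, four_engines):
    """Per thread count, throughput with 4 engines divided by 1 engine."""
    _check(one_engine)
    _check(four_engines)
    return {n: four_engines[n] / one_engine[n] for n in THREAD_COUNTS}
```

A gain that stays close to 1.0 across all thread counts would suggest that extra engines do not help mt submit.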
|
|
|
# cross copy // done |
|
|
|
# cross-copy // done |
|
|
|
- compare which is faster: xcopy, copy from source node, copy from dst node |
|
|
|
All for both inter-node and inter-socket copy using DDR and 1MiB on 4E |
|
|
|
--> conclude where a copy thread should live |
|
|
|