Constantin Fürst
|
1b6c60c49b
|
benchmark copy throughput for using 1,2,4,8 dsas and remove brute cpu bench (we steal it from andre)
|
11 months ago |
Constantin Fürst
|
21bbf53e55
|
use local and one remote node for pushpull on intranode
|
11 months ago |
Constantin Fürst
|
9e329d39e4
|
add pushpull benchmark for peak throughput
|
11 months ago |
Constantin Fürst
|
067a31e560
|
modify submission benchmark descriptors to have x10 internal repetitions
|
11 months ago |
Constantin Fürst
|
ed77e57e5f
|
remove benchmark descriptors for unused engine location benchmark
|
11 months ago |
Constantin Fürst
|
8ac601fc07
|
add option for internal repetitions to benchmarks which allows the small copies of 1kib to run long enough for the timings to become usable (goal is about 1s runtime for each iteration)
|
11 months ago |
Constantin Fürst
|
ea978423c0
|
slightly modify the debug benchmark descriptor to contain multiple threads and also a batch
|
11 months ago |
Constantin Fürst
|
cae0fbb56e
|
remove benchmark descriptor handling helper scripts
|
11 months ago |
Constantin Fürst
|
e2c6fd8587
|
set iteration count for submission bench to 10 for the large and 100 for the small, also test 128mib which is over the size of the cache available on xeonmax (instead of 32mib)
|
11 months ago |
Constantin Fürst
|
72fb3764fc
|
modify benchmarker script to require fewer parameters and sit in root dir
|
11 months ago |
Constantin Fürst
|
c8b4f3d624
|
fix issues with benchmark.hpp
|
11 months ago |
Constantin Fürst
|
6e26739eb9
|
give unique name to brute cpu copy benchmark descriptors for identification in results
|
11 months ago |
Constantin Fürst
|
8068fc3437
|
rewrite descriptors to match the format of the rewrite of bench from previous commit
|
11 months ago |
Constantin Fürst
|
d20b6dad93
|
remove all unused descriptors and begin rewriting submission descriptors
|
11 months ago |
Constantin Fürst
|
cf14cf34ac
|
subdivide the descriptors for peak perf allnodes into fromn0 and fromn1to15
|
11 months ago |
Constantin Fürst
|
f2059d4d47
|
use 12 threads for each brute benchmark
|
11 months ago |
Constantin Fürst
|
439ec97c8e
|
remove smart sw copy, add brute sw copy bench
|
11 months ago |
Constantin Fürst
|
e9090fb482
|
redo benchmarks for allnodes cpu due to mistake in modifying the json files
|
11 months ago |
Constantin Fürst
|
317a74d164
|
benchmark software peak performance for smart and brute force from node 0 for thesis
|
11 months ago |
Constantin Fürst
|
791184ff10
|
fix mistake in benchmark descriptors for smart peak performance
|
1 year ago |
Constantin Fürst
|
0748826fcd
|
use transfer size of 512mib for HBM intranode copy and modify plotter accordingly
|
1 year ago |
Constantin Fürst
|
475b2d5b5a
|
provide two benchmarks for peak performance, one that is brute force and one that uses smart node assignment and therefore lower utilization
|
1 year ago |
Constantin Fürst
|
7ced0bce4c
|
fix an issue in the python script that led to references being modified which caused bad node settings
|
1 year ago |
Constantin Fürst
|
6d9002d1e7
|
use different engine configuration depending on whether intra socket (all 4 engines on the socket) or inter socket (src and destination engine, cross copy) is the copy type
|
1 year ago |
Constantin Fürst
|
7f7230197c
|
don't submit multiple copies - test takes too long and this has almost no effect at work size of 1gib
|
1 year ago |
Constantin Fürst
|
e9807df09c
|
use all 8 engines for each copy task, as the engine location does not affect performance
|
1 year ago |
Constantin Fürst
|
5089936f30
|
prepare peak throughput plotter for multi-node results
|
1 year ago |
Constantin Fürst
|
1682b84fb4
|
use a batch size of 8 to check whether multiple engines can increase throughput
|
1 year ago |
Constantin Fürst
|
575ff8cf82
|
turn node -1 into node 7 again - this got messed up by a hastily written modification script
|
1 year ago |
Constantin Fürst
|
cf675e37f5
|
don't pin the thread to hbm nodes but to the hbm src node minus 8 in peak perf benchmark
|
1 year ago |
Constantin Fürst
|
808c8f3ae7
|
run the engine location benchmarks with size of 1gib only 10 times
|
1 year ago |
Constantin Fürst
|
b37968dd3f
|
rename engine location benchmarks, modify plotter to support missing configuration files as the new cases are not universally applicable to all configurations
|
1 year ago |
Constantin Fürst
|
a548d9afe5
|
restructure mtsubmit benchmarks
|
1 year ago |
Constantin Fürst
|
b7cae18b6d
|
restructure the engine location bench, correct and update the plotter to use new total time
|
1 year ago |
Constantin Fürst
|
405166cbe8
|
add peak perf benchmark descriptors
|
1 year ago |
Constantin Fürst
|
9886b20112
|
remove the multiple tests for mtsubmit and ensure that the wq will always receive the same elements to make the test fair
|
1 year ago |
Constantin Fürst
|
3964da0d7a
|
use ms10 instead of ms50 as ms50 with > 2 threads will overfill the wq of size 128 (max) and cause an error that the benchmark does not handle yet
|
1 year ago |
Constantin Fürst
|
f9b00a5b32
|
modify mtsubmit to measure both ssaw and ms50 submit methods
|
1 year ago |
Constantin Fürst
|
d72e28adb8
|
make mtsubmit a multisubmit
|
1 year ago |
Constantin Fürst
|
151ddd79d5
|
set repetition count of tests to 1000 again and replace the 1 GiB tests with 32 MiB
|
1 year ago |
Constantin Fürst
|
21194b7fde
|
fix the benchmarker script by testing for the correct amount of parameters and obtaining the test name correctly
|
1 year ago |
Constantin Fürst
|
d3a09181ce
|
move the existing results into a new folder and modify them to be only benchmark descriptors
|
1 year ago |