3 Commits (25451fa26a21c9d88d06e989b3a625f4a997d9ff)

Author SHA1 Message Date
Constantin Fürst 3bfbeca21f resize source and destination pointer holders properly before use and use path from template and not dml::software for cache flush in benchmark loop 11 months ago
Constantin Fürst 24bdccd1e3 rewrite the benchmarker to not allocate the memory regions each iteration but before the test runs, also flush cache each iteration using dml-operation, also set dsa-device using the parameter to submit and not using libnuma assignment 11 months ago
Constantin Fürst 4f9abc911f make benchmark.hpp a cpp file to make it clear that it will have global variables 11 months ago
Constantin Fürst f905ee77eb wait less for task launch and don't write iterations complete out 11 months ago
Constantin Fürst c8b4f3d624 fix issues with benchmark.hpp 11 months ago
Constantin Fürst fccc255aae rewrite the benchmark to measure timings for the entire run of all threads, doing multiple sync-steps with the launch barrier as done in the qdp bench 11 months ago
Constantin Fürst 405166cbe8 add peak perf benchmark descriptors 1 year ago
Constantin Fürst b44b52b600 measure nanoseconds instead of microseconds 1 year ago
Constantin Fürst e6df656845 use vector for timing - this update got lost somehow 1 year ago
Constantin Fürst bc8c4f8ab3 restructure of directory layout 1 year ago
Constantin Fürst 8065dd4345 collect entire duration vector and don't condense the information down 1 year ago
Constantin Fürst 8c5a061343 remove buggy option for multiple sizes 1 year ago
Constantin Fürst 1bfb1f316c ignore first five runs to reduce influence of warmup, add non-batch-descriptor batch loop for testing, calculate standard deviation for all three measurements 1 year ago
Constantin Fürst 5fa12feb7d add required explicit block_on_fault-option to the task submission so that the DSA can handle page faults 1 year ago
Constantin Fürst 4e9688224b create a custom barrier structure that allows synchronization of each iteration of the measurement loop 1 year ago
Constantin Fürst 80d1b5f543 remove fractional average calculation in favour of std::accumulate and remove option to set thread affinity to core while keeping support for node affinity assignment 1 year ago
Constantin Fürst 659883a765 check the status of the batch operation too - forgot this 1 year ago
Constantin Fürst 22f3ed8956 clean up naming of structs, functions and files 1 year ago
Constantin Fürst cbdf9b3dcf rename execute_move to execute_dml_memcpy and rename the file that contains this function to benchmark-dml-memcpy.hpp 1 year ago
Constantin Fürst 7e8c9acbc3 implement batch operation and add control parameters to the ThreadArgs struct, also add more timing information: now submission and completion will be timed separately 1 year ago
Constantin Fürst 9083ba834f small changes to execute move; remove unused repetition-options, pass args as ref to allow for writing results, don't use numa-node-setting dml submit call 1 year ago
Constantin Fürst b14ca88e03 start implementation of benchmarks code, begin with state from test project, execute-move.hpp contains numa-aware task submit routine which is WIP 1 year ago