Constantin Fürst
|
eb9960924e
|
extend plotter to also provide rawtime for the base configurations of qdp, forgot to add it to a previous commit that should have included it
|
11 months ago |
Constantin Fürst
|
f653c22f88
|
add rawtiming table for the comparison-benchmarks (dram,hbm) in the qdp plotter, update the plots
|
11 months ago |
Constantin Fürst
|
44c220fd5a
|
add table output to qdp result plotter which displays speedup compared to dram as baseline, redo the timing plots with the latest test results, add the speedup table
|
11 months ago |
Constantin Fürst
|
366cd84a1f
|
remove the single-group results, as they do not make the results easier to view
|
11 months ago |
Constantin Fürst
|
6c39d79610
|
add results with only a single group (1/32nd) and 128 MiB task size (1/32nd) for cleaner results for plotting
|
11 months ago |
Constantin Fürst
|
a4dac61730
|
change config to allow 64 threads for stage 1 and 32 for stage 2 in all benchmarks
|
11 months ago |
Constantin Fürst
|
fb2a8e445c
|
increase font size of timing donuts from qdp bench for better readability
|
11 months ago |
Constantin Fürst
|
bd23ae138e
|
finalize plotter script and add timing results
|
11 months ago |
Constantin Fürst
|
567a24f8c0
|
add hbm baseline and reorganize folders again
|
11 months ago |
Constantin Fürst
|
f6c43a6659
|
restructure evaluation results folder again
|
11 months ago |
Constantin Fürst
|
6070f320f5
|
add results for distributed locations prefetching
|
11 months ago |
Constantin Fürst
|
3c7c7852a5
|
remeasure performance for out of cache allocation
|
11 months ago |
Constantin Fürst
|
b710aec5fe
|
restructure evaluation results, add new results with out of cache allocation
|
11 months ago |
Constantin Fürst
|
c5022105cb
|
publish current configuration for testing
|
11 months ago |
Constantin Fürst
|
16e47a862f
|
allocate the correct number of chunks for caching (the run count was missing) and add them to the queue for each run
|
11 months ago |
Constantin Fürst
|
e99bf619c2
|
handle memory allocation outside of the cache, pre-allocate in benchmark and memset to hopefully guarantee no pagefaults will be encountered
|
11 months ago |
Constantin Fürst
|
19ef2df856
|
update perf profile with manually disabled huge pages
|
11 months ago |
Constantin Fürst
|
d4677b3c59
|
measure performance without huge pages on
|
11 months ago |
Constantin Fürst
|
7afcffbefa
|
set correct node in perf recording script
|
11 months ago |
Constantin Fürst
|
c86d517444
|
fix the published results for prefetching
|
11 months ago |
Constantin Fürst
|
94b3576d5a
|
publish measurements from benchmark
|
11 months ago |
Constantin Fürst
|
8999fe4ca3
|
share current config for qdp bench from vampir
|
11 months ago |
Constantin Fürst
|
79a7e9637c
|
fix benchmark by waiting and not dropping barrier in aggrj
|
11 months ago |
Constantin Fürst
|
bc1c3d0096
|
fix block size for access by cacher in scanb
|
11 months ago |
Constantin Fürst
|
99552b3de4
|
add option to force mapping of pages by touching each one with a write at its beginning; required because behaviour somehow changed: the cache was experiencing page fault errors, and handling them by dsa is simply too slow
|
11 months ago |
Constantin Fürst
|
5044b4419c
|
make load balancing thread-local to reduce atomic cost
|
11 months ago |
Constantin Fürst
|
f9d47d3a45
|
add scanb back to the barrier, now other threads will wait for work submission to finish
|
11 months ago |
Constantin Fürst
|
006b856c44
|
resolve issues from the recent reset of qdp benchmark
|
11 months ago |
Constantin Fürst
|
de1de9134b
|
reset benchmark
|
11 months ago |
Constantin Fürst
|
4a587a36e2
|
remove overlap-execution barriers and run for the entire block
|
11 months ago |
Constantin Fürst
|
c393b8eb88
|
improve load balancing node assignment
|
11 months ago |
Constantin Fürst
|
21702d5309
|
remove sub and overchunking for scanb caching, use the per-iteration barriers again
|
11 months ago |
Constantin Fürst
|
7d614769db
|
remove forgotten access to load timer
|
11 months ago |
Constantin Fürst
|
6a4eec37ca
|
remove vector-load timing as it's too expensive
|
11 months ago |
Constantin Fürst
|
93a281fa26
|
improve debug output for relwithdebinfo in qdp, fix filename for record perf script, add perf.svg with better debug info
|
11 months ago |
Constantin Fürst
|
624e8b55ea
|
add script to record perf and make the flame graph
|
11 months ago |
Constantin Fürst
|
942d7be7e9
|
redo benchmarks for qdp
|
11 months ago |
Constantin Fürst
|
a83f208cd2
|
fix time evaluation for qdp bench
|
11 months ago |
Constantin Fürst
|
cc8d203771
|
redo benchmarks for qdp, move previous results to old (folder)
|
11 months ago |
Constantin Fürst
|
94669924c8
|
implement cache in aggrj for qdp
|
11 months ago |
Constantin Fürst
|
c7877ecdf6
|
remove skeleton of now defunct function in qdp
|
11 months ago |
Constantin Fürst
|
20c6e54df7
|
remove broken implementation for non-divisible chunk-group-thread-counts
|
11 months ago |
Constantin Fürst
|
a3a8dff1aa
|
reset some changes to the aggregation and filter functions not quite needed
|
11 months ago |
Constantin Fürst
|
69aec6fa48
|
add plotter for the results of qdp which turns them into a donut-graph
|
11 months ago |
Constantin Fürst
|
122eab35b7
|
modify benchmarking code to measure time spent loading vectors too
|
11 months ago |
Constantin Fürst
|
d1cc3e3b0c
|
modify qdp benchmark: return to per-chunk barrier wait and use a userspace semaphore as one-way barrier from scan_b to aggr_j, as scan_b should submit asap but aggr_j should wait on submission from scan_b; contains TODO for modifying code to support chunkcount not divisible by 2
|
11 months ago |
Constantin Fürst
|
a963406f7c
|
move mode selection to Configuration.hpp, adapt the CopyMethodPolicy-Function to return only src_node for task sizes under 16 MiB, which is now required to avoid a high submission count that slows down small copies
|
11 months ago |
Constantin Fürst
|
ef805244ac
|
use 4 GiB as size and again 1 aggrj thread for qdp bench
|
11 months ago |
Constantin Fürst
|
81527fdb6b
|
commit current vampir config
|
11 months ago |
Constantin Fürst
|
b35f9978ae
|
again, redo the perf-eval with reduced data size and load to prevent missing frames, the second
|
11 months ago |