Constantin Fürst
|
de1de9134b
|
reset benchmark
|
11 months ago |
Constantin Fürst
|
4a587a36e2
|
remove overlap-execution barriers and run for the entire block
|
11 months ago |
Constantin Fürst
|
c393b8eb88
|
improve load balancing node assignment
|
11 months ago |
Constantin Fürst
|
21702d5309
|
remove sub- and overchunking for scanb caching, use the per-iteration barriers again
|
11 months ago |
Constantin Fürst
|
7d614769db
|
remove forgotten access to load timer
|
11 months ago |
Constantin Fürst
|
6a4eec37ca
|
remove vector-load timing as it's too expensive
|
11 months ago |
Constantin Fürst
|
93a281fa26
|
improve debug output for relwithdebinfo in qdp, fix filename for record perf script, add perf.svg with better debug info
|
11 months ago |
Constantin Fürst
|
624e8b55ea
|
add script to record perf and make the flame graph
|
11 months ago |
Constantin Fürst
|
942d7be7e9
|
redo benchmarks for qdp
|
11 months ago |
Constantin Fürst
|
a83f208cd2
|
fix time evaluation for qdp bench
|
11 months ago |
Constantin Fürst
|
cc8d203771
|
redo benchmarks for qdp, move previous results to old (folder)
|
11 months ago |
Constantin Fürst
|
94669924c8
|
implement cache in aggrj for qdp
|
11 months ago |
Constantin Fürst
|
c7877ecdf6
|
remove skeleton of now defunct function in qdp
|
11 months ago |
Constantin Fürst
|
20c6e54df7
|
remove broken implementation for non-divisible chunk-group-thread-counts
|
11 months ago |
Constantin Fürst
|
a3a8dff1aa
|
reset some changes to the aggregation and filter functions not quite needed
|
11 months ago |
Constantin Fürst
|
69aec6fa48
|
add plotter for the results of qdp which turns them into a donut-graph
|
11 months ago |
Constantin Fürst
|
122eab35b7
|
modify benchmarking code to measure time spent loading vectors too
|
11 months ago |
Constantin Fürst
|
d1cc3e3b0c
|
modify qdp benchmark: return to per-chunk barrier wait; use a userspace semaphore as a one-way barrier from scan_b to aggr_j, since scan_b should submit asap while aggr_j should wait on submission from scan_b; contains TODO for modifying code to support a chunk count not divisible by 2
|
11 months ago |
Constantin Fürst
|
a963406f7c
|
move mode selection to Configuration.hpp, adapt the CopyMethodPolicy-Function to return only src_node for task sizes under 16 MiB, which is now required to avoid a high submission count that slows down small copies
|
11 months ago |
Constantin Fürst
|
ef805244ac
|
use 4 GiB as size and again 1 aggrj thread for qdp bench
|
11 months ago |
Constantin Fürst
|
81527fdb6b
|
commit current vampir config
|
11 months ago |
Constantin Fürst
|
b35f9978ae
|
again, redo the perf-eval with reduced data size and load to prevent missing frames, the second
|
11 months ago |
Constantin Fürst
|
18d5e62b80
|
again, redo the perf-eval with reduced data size and load to prevent missing frames
|
11 months ago |
Constantin Fürst
|
d63d8ac547
|
add redone flame graph
|
11 months ago |
Constantin Fürst
|
69a3d2cef4
|
experimental implementation for tc-scanb > tc-aggrj, the second
|
11 months ago |
Constantin Fürst
|
07fba8a5f0
|
experimental implementation for tc-scanb > tc-aggrj
|
11 months ago |
Constantin Fürst
|
d4122ba25a
|
add updated config for prefetch from vampir
|
11 months ago |
Constantin Fürst
|
e4a0030049
|
fix prefetching subchunk indexing and adapt the weak access flag for join
|
11 months ago |
Constantin Fürst
|
f978d6b9b4
|
redo tests for prefetching
|
11 months ago |
Constantin Fürst
|
972440d19f
|
repair flags implementation
|
11 months ago |
Constantin Fürst
|
b3607329a6
|
add a flags-concept to cacher, add the option to select whether to handle pagefaults or not
|
11 months ago |
Constantin Fürst
|
4b0770fc8e
|
add result for try with strong waiting
|
11 months ago |
Constantin Fürst
|
6dd7f80500
|
again, redo the perf flame graph
|
11 months ago |
Constantin Fürst
|
29c49ca5b4
|
redo flame graph with correct stack information
|
11 months ago |
Constantin Fürst
|
4cbe649601
|
generate flame graph for runtime of prefetch
|
11 months ago |
Constantin Fürst
|
57e696297c
|
provide new results for simpleq
|
11 months ago |
Constantin Fürst
|
bb1d20924a
|
fix index clash for thread-and-group unique indexing
|
11 months ago |
Constantin Fürst
|
0eca180e53
|
fix destination indexing in aggrj for happly
|
11 months ago |
Constantin Fürst
|
5e8f3e05e3
|
fix chunk indexing in scanb and refactor result calculation
|
11 months ago |
Constantin Fürst
|
c2b9e6656d
|
fix chunk selection in scanb, use the dataptr in aggrj complex mode, export some functions to src/utils/BenchmarkHelpers.cpp
|
11 months ago |
Constantin Fürst
|
845e812ca7
|
set the correct sum check which was inverted by query type
|
11 months ago |
Constantin Fürst
|
abcb9a4b2e
|
extend modestring to contain query type
|
11 months ago |
Constantin Fürst
|
e4ed4ac5b9
|
correct and minimize subchunking implementation which now is only allowed in scanb
|
11 months ago |
Constantin Fürst
|
50560606a3
|
add complex query as benchmarking option and evaluate results
|
11 months ago |
Constantin Fürst
|
79a7dcead8
|
re-run bench with actually working query
|
11 months ago |
Constantin Fürst
|
3c1606da51
|
init datab correctly as well to fix the benchmark
|
11 months ago |
Constantin Fürst
|
10a791dea1
|
remove the experimental code branches that turned out not to yield any benefit (sched-yield has too high delay and with the new load balancer, subchunking for aggrj is also not needed anymore)
|
11 months ago |
Constantin Fürst
|
a6771287e9
|
add result with the new load balancer
|
11 months ago |
Constantin Fürst
|
881047068c
|
rerun benchmarks for dram baseline and hbm peak
|
11 months ago |
Constantin Fürst
|
a72a26dbee
|
remove cout/cerr output from cache and benchmark to not falsify results
|
11 months ago |