These results are for VIS assembler code that uses block loads and stores.
They are certainly experimental. I assumed 64-byte alignment and vector
lengths that are multiples of 64 bytes.
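
For readers who want to see what the four rows in the tables measure, here is a minimal plain-C sketch of the kernels under the same two assumptions (64-byte aligned vectors, lengths a multiple of 64 bytes). It is not the VIS assembler code that produced these numbers; the kernel definitions follow the standard STREAM benchmark, and the names alloc_vec, stream_copy, stream_scale, stream_vadd and stream_triad are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_BYTES 64                              /* size of one block load/store */
#define BLOCK_ELEMS (BLOCK_BYTES / sizeof(double))  /* 8 doubles per block */

/* Allocate n doubles on a 64-byte boundary.
 * Assumption from the post: n * 8 bytes is a multiple of 64. */
double *alloc_vec(size_t n)
{
    assert(n % BLOCK_ELEMS == 0);
    return aligned_alloc(BLOCK_BYTES, n * sizeof(double));
}

/* Copy:  c = a          (1 load, 1 store per element) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Scale: b = q * c      (1 load, 1 store per element) */
void stream_scale(double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = q * c[i];
}

/* Vadd:  c = a + b      (2 loads, 1 store per element) */
void stream_vadd(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Triad: a = b + q * c  (2 loads, 1 store per element) */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}
```

A block-load/store version would move each vector 64 bytes (8 doubles) at a time, which is why both the base address and the length have to be multiples of 64 bytes.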

1 procs   2048 K elements per proc   10 loop repeats   48 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.103  0.104  0.104     1%     325    325     325    325   394    394  197
Scale:  0.104  0.104  0.105     1%     322    322     322    322   397    397  199
Vadd :  0.175  0.175  0.176     1%     288    288     288    288   333    667  222
Triad:  0.192  0.192  0.192     0%     263    263     263    263   365    731  244
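
The post does not define the columns, so the following is an inferred reading, not the benchmark's own source. The one-processor rows are consistent, to within the rounding of the printed times, with MB/s being bytes moved (8-byte elements, 10^6 bytes per MB) divided by the minimum time, and with the "Time per" columns being nanoseconds per 64-byte block. A small C sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* MB/s (10^6 bytes) for a kernel that reads or writes `arrays` vectors
 * of `elems` 8-byte elements in `seconds`.
 * Assumption: Copy and Scale touch 2 vectors, Vadd and Triad touch 3. */
double stream_mb_per_s(int arrays, size_t elems, double seconds)
{
    assert(seconds > 0.0);
    return (double)arrays * (double)elems * 8.0 / seconds / 1.0e6;
}

/* Nanoseconds per 64-byte block (8 elements) of a single vector.
 * Assumption: this is the unit behind the Load/Store/Mem columns. */
double ns_per_block(size_t elems, double seconds)
{
    assert(elems >= 8);
    return seconds / ((double)elems / 8.0) * 1.0e9;
}
```

With 2048 K elements and the printed Copy minimum of 0.103 s, stream_mb_per_s(2, 2048 * 1024, 0.103) gives roughly 326 MB/s and ns_per_block(2048 * 1024, 0.103) roughly 393 ns, close to the 325 MB/s and 394 ns in the table. The small differences are what one would expect if the program used unrounded times.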

8 procs   2048 K elements per proc   10 loop repeats   384 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.107  0.108  0.109     1%    2499   2499     312    312   410    410  205
Scale:  0.108  0.109  0.109     1%    2491   2491     311    311   411    411  206
Vadd :  0.179  0.179  0.180     1%    2252   2252     281    281   341    682  227
Triad:  0.192  0.192  0.193     0%    2099   2099     262    262   366    732  244

16 procs   2048 K elements per proc   10 loop repeats   768 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.119  0.120  0.120     1%    4527   4527     283    283   452    452  226
Scale:  0.115  0.115  0.117     2%    4669   4669     292    292   439    439  219
Vadd :  0.190  0.191  0.191     1%    4243   4243     265    265   362    724  241
Triad:  0.204  0.205  0.205     1%    3944   3944     246    246   389    779  260

24 procs   2048 K elements per proc   10 loop repeats   1152 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.120  0.121  0.122     2%    6720   6720     280    280   457    457  229
Scale:  0.119  0.120  0.123     3%    6759   6759     282    282   455    455  227
Vadd :  0.196  0.197  0.197     1%    6156   6156     256    256   374    749  250
Triad:  0.206  0.207  0.210     2%    5860   5860     244    244   393    786  262

32 procs   2048 K elements per proc   10 loop repeats   1536 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.136  0.137  0.137     1%    7872   7872     246    246   520    520  260
Scale:  0.134  0.135  0.136     1%    7987   7987     250    250   513    513  256
Vadd :  0.218  0.219  0.220     1%    7377   7377     231    231   416    833  278
Triad:  0.227  0.228  0.228     1%    7092   7092     222    222   433    866  289

40 procs   1024 K elements per proc   10 loop repeats   960 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.072  0.073  0.073     1%    9277   9277     232    232   552    552  276
Scale:  0.072  0.073  0.073     2%    9355   9355     234    234   547    547  274
Vadd :  0.113  0.114  0.115     1%    8877   8877     222    222   433    865  288
Triad:  0.117  0.118  0.118     1%    8594   8594     215    215   447    894  298

48 procs   1024 K elements per proc   10 loop repeats   1152 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.081  0.081  0.082     1%    9938   9938     207    207   618    618  309
Scale:  0.081  0.082  0.082     1%    9917   9917     207    207   620    620  310
Vadd :  0.126  0.126  0.127     1%    9618   9618     200    200   479    958  319
Triad:  0.129  0.129  0.130     1%    9373   9373     195    195   492    983  328

56 procs   1024 K elements per proc   10 loop repeats   1344 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.092  0.092  0.093     1%   10250  10250     183    183   699    699  350
Scale:  0.092  0.093  0.094     2%   10175  10175     182    182   704    704  352
Vadd :  0.141  0.141  0.141     1%   10030  10030     179    179   536   1072  357
Triad:  0.142  0.143  0.143     1%    9910   9910     177    177   542   1085  362

63 procs   1024 K elements per proc   10 loop repeats   1512 MB memory
8192 byte alignment, 8192 byte offset
VIS
          Min    RMS    Max    Max   ---Total----   --Per Proc--  ----Time per----
         time   time   time    Min  Stream  Total  Stream  Total  Load  Store  Mem
          sec    sec    sec  Range    MB/s   MB/s    MB/s   MB/s    ns     ns   ns
Copy :  0.103  0.104  0.110     7%   10307  10307     164    164   782    782  391
Scale:  0.104  0.105  0.110     6%   10180  10180     162    162   792    792  396
Vadd :  0.156  0.157  0.165     6%   10181  10181     162    162   594   1188  396
Triad:  0.157  0.158  0.165     5%   10107  10107     160    160   598   1197  399
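
Across the runs, the Per Proc columns match the Total columns divided by the processor count, so the tables can also be read as a scaling curve. The sketch below is one possible way to summarize that, assuming the one-processor Copy rate of 325 MB/s as the baseline; the function names and the efficiency measure are illustrative choices, not something the post itself reports.

```c
#include <assert.h>

/* Per-processor bandwidth as it appears in the "Per Proc" columns. */
double per_proc_mb_per_s(double total_mb_per_s, int procs)
{
    assert(procs > 0);
    return total_mb_per_s / (double)procs;
}

/* Fraction of ideal scaling: aggregate bandwidth divided by
 * procs times the single-processor bandwidth (hypothetical measure). */
double scaling_efficiency(double total_mb_per_s, int procs, double single_mb_per_s)
{
    assert(procs > 0 && single_mb_per_s > 0.0);
    return total_mb_per_s / ((double)procs * single_mb_per_s);
}
```

For Copy, scaling_efficiency(2499.0, 8, 325.0) is about 0.96, while scaling_efficiency(10307.0, 63, 325.0) is about 0.50: aggregate bandwidth keeps rising up to 63 processors, but each processor gets roughly half of what a single processor achieves alone.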