shibacow/860f89f2b0f3cc5b30e64a97bc1d79e0
Last active February 3, 2020 20:23
root@****:~/prog/hpl-2.0_FERMI_v15/bin/CUDA# more HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
7 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
65536 73728 60000 40000 50000 60000 39007 39000 20960 364160 359424 276480 138240 115200 23040 354432 236160 95040 9600 20737
16129 16128 Ns
3 # of NBs
2048 1536 1024 512 384 640 768 896 960 1024 1152 1280 384 640 960 768 640 256 960 512 768 1152 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
1 Qs
16.0 threshold
1 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 2 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 0 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
192 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
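Each problem size N in the Ns line implies an N x N double-precision matrix, i.e. 8*N^2 bytes before any workspace, so memory grows quadratically with N. A minimal sketch for checking that footprint for sizes that appear in this file and in the logs, and for picking a candidate N rounded down to a multiple of NB. The helper names, the 80% memory fraction (a common rule of thumb, used here as an assumption) and the hypothetical 64 GiB host are not taken from this gist.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL factors the matrix in double precision


def matrix_bytes(n: int) -> int:
    """Storage for the dense N x N coefficient matrix alone (no workspace)."""
    return n * n * BYTES_PER_DOUBLE


def suggest_n(mem_bytes: int, nb: int, fraction: float = 0.8) -> int:
    """Largest N, rounded down to a multiple of NB, whose matrix fits in
    `fraction` of `mem_bytes`. The 0.8 default is an assumed rule of thumb."""
    n = math.isqrt(int(mem_bytes * fraction) // BYTES_PER_DOUBLE)
    return (n // nb) * nb


if __name__ == "__main__":
    for n in (25000, 50000, 65536, 73728):
        print(f"N={n:6d}: {matrix_bytes(n) / 2**30:6.1f} GiB")
    # Hypothetical host with 64 GiB of RAM, NB=1024
    print(suggest_n(64 * 2**30, 1024))
```

For example, N=65536 needs exactly 32 GiB for the matrix and N=73728 needs 40.5 GiB, well above the 16 GB on a single V100, so these sizes only make sense if the matrix itself lives in host memory.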
Linpack NVIDIA Tesla V100
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 768 1 1 112.10 9.293e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044867 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1024 1 1 110.46 9.431e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042883 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1280 1 1 113.27 9.198e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0041744 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 1536 1 1 112.07 9.295e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039500 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 768 1 1 189.78 9.485e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0043544 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1024 1 1 187.45 9.603e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042670 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1280 1 1 190.92 9.429e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047732 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 30000 1536 1 1 278.20 6.471e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044749 ...... PASSED
================================================================================
Finished 8 tests with the following results:
8 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
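Each PASSED/FAILED verdict compares the scaled residual printed on the `||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)` line against the 16.0 threshold set in HPL.dat. A minimal pure-Python sketch of that check on a small random system; the solver (textbook Gaussian elimination with partial pivoting), the matrix size, the [-0.5, 0.5] entry range and eps = 2**-53 are all assumptions for illustration, not HPL's own code.

```python
import random
import sys

# Assumption: eps taken as the unit roundoff of IEEE double, 2**-53.
EPS = sys.float_info.epsilon / 2
THRESHOLD = 16.0  # the "16.0 threshold" value from HPL.dat


def solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (on copies)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def scaled_residual(a, x, b):
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)."""
    n = len(b)
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(a, b)]
    norm_r = max(abs(v) for v in r)
    norm_a = max(sum(abs(v) for v in row) for row in a)  # max absolute row sum
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return norm_r / (EPS * (norm_a * norm_x + norm_b) * n)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    res = scaled_residual(a, solve(a, b), b)
    print(f"{res:.7f}", "PASSED" if res < THRESHOLD else "FAILED")
```

Written as `res < THRESHOLD`, a NaN residual can never pass, because every comparison with NaN is false; that is consistent with the `-nan ...... FAILED` result for N=73728, NB=2048 in the second run.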
10.0.0.0/16
54.244.36.217
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 768 1 1 99.20 8.401e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047704 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1024 1 1 80.09 1.041e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039858 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1280 1 1 84.64 9.846e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0037889 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 50000 1536 1 1 79.82 1.044e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0041055 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 768 1 1 131.37 1.096e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0046351 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1024 1 1 126.63 1.137e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0039147 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1280 1 1 132.89 1.084e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047641 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 60000 1536 1 1 126.48 1.139e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0048663 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 2048 1 1 321.99 8.298e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= -nan ...... FAILED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 1536 1 1 236.61 1.129e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042714 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 73728 1024 1 1 236.96 1.128e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044989 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 65536 1536 1 1 144.04 1.303e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044428 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 65536 1024 1 1 143.62 1.307e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044281 ...... PASSED
================================================================================
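The Gflops column in these logs is consistent with an operation count of 2/3*N^3 + 3/2*N^2 divided by the Time column. A small sketch that parses result rows copied from this gist and recomputes that column; `parse_rows` and `hpl_gflops` are hypothetical helpers, and the parser assumes the 7-field `T/V N NB P Q Time Gflops` layout shown here.

```python
LOG = """\
T/V N NB P Q Time Gflops
WR10L2L2 25000 768 1 1 112.10 9.293e+01
WR10L2L2 30000 1536 1 1 278.20 6.471e+01
WR10L2L2 65536 1024 1 1 143.62 1.307e+03
"""


def hpl_gflops(n: int, seconds: float) -> float:
    """Operation count 2/3*N^3 + 3/2*N^2 over wall time, in Gflops."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def parse_rows(log: str):
    """Yield (tv, n, nb, seconds, gflops) for each result line.

    The header line is skipped because its first field is "T/V", not "W..."."""
    for line in log.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].startswith("W"):
            yield (fields[0], int(fields[1]), int(fields[2]),
                   float(fields[5]), float(fields[6]))


if __name__ == "__main__":
    for tv, n, nb, t, reported in parse_rows(LOG):
        print(f"{tv} N={n} NB={nb}: reported {reported:.4g}, "
              f"recomputed {hpl_gflops(n, t):.4g}")
```

Because the work grows as N^3 while the time only grows with the actual cost, comparing recomputed rates is a quick way to spot an outlier such as the N=30000, NB=1536 row, which took about 278 s instead of roughly 190 s.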
ubuntu@*********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -fp64 -benchmark -numbodies=2048000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 2048000
2048000 bodies, total time for 10 iterations: 224622.891 ms
= 186.726 billion interactions per second
= 5601.794 double-precision GFLOP/s at 30 flops per interaction
ubuntu@***********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=4096000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 4096000
4096000 bodies, total time for 10 iterations: 296521.750 ms
= 565.801 billion interactions per second
= 11316.010 single-precision GFLOP/s at 20 flops per interaction
ubuntu@***********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=8192000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 8192000
8192000 bodies, total time for 10 iterations: 1170863.375 ms
= 573.157 billion interactions per second
= 11463.142 single-precision GFLOP/s at 20 flops per interaction
ubuntu@*********:/usr/local/cuda-9.0/samples/5_Simulations/nbody$ ./nbody -fp64 -benchmark -numbodies=8192000 -device=0
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Tesla V100-SXM2-16GB
> Compute 7.0 CUDA device: [Tesla V100-SXM2-16GB]
number of bodies = 8192000
8192000 bodies, total time for 10 iterations: 3595757.750 ms
= 186.633 billion interactions per second
= 5599.003 double-precision GFLOP/s at 30 flops per interaction
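The nbody summary lines are consistent with a simple all-pairs model: N^2 interactions per iteration, and GFLOP/s equal to interactions per second times the quoted flops per interaction (20 in single precision, 30 in double). A minimal sketch that recomputes the figures above from the body count and total time; `nbody_rates` is a hypothetical helper, not part of the CUDA Samples.

```python
def nbody_rates(bodies: int, iterations: int, total_ms: float,
                flops_per_interaction: int):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs N-body step."""
    seconds = total_ms / 1000.0
    interactions_per_s = bodies * bodies * iterations / seconds
    gflops = interactions_per_s * flops_per_interaction / 1e9
    return interactions_per_s / 1e9, gflops


if __name__ == "__main__":
    runs = [  # (bodies, ms for 10 iterations, flops per interaction), from the runs above
        (2048000, 224622.891, 30),
        (4096000, 296521.750, 20),
        (8192000, 1170863.375, 20),
        (8192000, 3595757.750, 30),
    ]
    for bodies, ms, fpi in runs:
        g_int, gflops = nbody_rates(bodies, 10, ms, fpi)
        print(f"{bodies} bodies: {g_int:.3f} billion interactions/s, "
              f"{gflops:.3f} GFLOP/s")
```

Since the work is quadratic in the body count, doubling from 4,096,000 to 8,192,000 bodies in single precision raised the 10-iteration time from about 297 s to about 1171 s (roughly 3.95x), while the interaction rate stayed near 570 billion per second.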