Estimate FLOPS in Linux?
I am looking for a quick and easy program to estimate FLOPS on my Linux system. I found HPL, but getting it compiled is proving to be irritating. All I need is a ballpark estimate of the FLOPS, without needing to spend a day researching benchmark packages and installing dependent software. Does any such program exist? Would it be sufficient to write a C program that multiplies two floats in a loop?
linux benchmark
asked Nov 25 '09 at 21:37
molecularbear
6 Answers
The question is: what do you mean by FLOPS? If all you care about is how many of the simplest floating-point operations you can do per clock, it is probably 3x your clock speed, but that is about as meaningless as BogoMIPS. Some floating-point ops take a long time (divide, for starters); add and multiply are typically quick (one per FP unit per clock). The next issue is memory performance. There is a reason the last classic Cray had 31 memory banks: ultimately, CPU performance is limited by how fast you can read and write memory, so which level of cache does your problem fit in? Linpack was a real benchmark once; now it fits in cache (L2 if not L1) and is more of a pure theoretical CPU benchmark. And of course, your SSE (etc.) units can add floating-point performance too.
What distro do you run?
These looked like good pointers: http://linuxtoolkit.blogspot.com/2009/04/intel-optimized-linpack-benchmark-for.html and http://onemansjourneyintolinux.blogspot.com/2008/12/show-us-yer-flops.html
http://www.phoronix-test-suite.com/ might be an easier way to install a flops benchmark.
Still, I do wonder why you care: what are you using it for? If you just want a meaningless number, your system's BogoMIPS value is still right there in dmesg.
Phoronix seems to be exactly what I was looking for - thank you! The only reason I wanted this was because I was filling out a survey that asked how many teraflops of computing power I have. The survey wasn't terribly important, so I wasn't concerned about the accuracy of the answer. Still, it would be kind of neat to be able to say, "Our cluster can do X teraflops." Though as you point out, that number doesn't necessarily have much real-world meaning. – molecularbear, Nov 26 '09 at 2:06
Apparently there's a "sysbench" benchmark package and command:
sudo apt-get install sysbench
(or brew install sysbench on OS X)
Run it like this:
sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run
Output, for comparison:
total time: 15.3047s
Reference: http://www.midwesternmac.com/blogs/jeff-geerling/2013-vps-benchmarks-linode
How does this give the FLOPS? – Martin Thoma, Dec 21 '16 at 11:00
Looks like it's more of a generic "CPU benchmark"; see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html – rogerdpack, Aug 20 '18 at 13:47
For ballpark estimates:
Raspberry Pi 2: 299.93 * 10^6 FLOPS (source)
Raspberry Pi 3: 462.07 * 10^6 FLOPS (source)
GTX Titan Black GPU: 5.1 * 10^12 FLOPS (source)
Sunway TaihuLight: 93 * 10^15 FLOPS (source; record holder in 2016)
Linpack
- Download it (link)
- Extract it, then run:
cd benchmarks_2017/linux/mkl/benchmarks/linpack
./runme_xeon64
- Wait for quite a while (more than 1 hour)
On a ThinkPad T460p (Intel i7-6700HQ CPU), it gives:
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
./runme_xeon64: 33: [: -gt: unexpected operator
Mi 21. Dez 11:50:29 CET 2016
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Wed Dec 21 11:50:29 2016
CPU frequency: 3.491 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=9800701024, at the size=35000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.014 46.5838 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 64.7319 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.009 77.3583 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 67.0096 1.165068e-12 3.973181e-02 pass
2000 2000 4 0.064 83.6177 5.001027e-12 4.350281e-02 pass
2000 2000 4 0.063 84.5568 5.001027e-12 4.350281e-02 pass
5000 5008 4 0.709 117.6800 2.474679e-11 3.450740e-02 pass
5000 5008 4 0.699 119.2350 2.474679e-11 3.450740e-02 pass
10000 10000 4 4.895 136.2439 9.069137e-11 3.197870e-02 pass
10000 10000 4 4.904 135.9888 9.069137e-11 3.197870e-02 pass
15000 15000 4 17.260 130.3870 2.052533e-10 3.232773e-02 pass
15000 15000 4 18.159 123.9303 2.052533e-10 3.232773e-02 pass
18000 18008 4 31.091 125.0738 2.611497e-10 2.859910e-02 pass
18000 18008 4 31.869 122.0215 2.611497e-10 2.859910e-02 pass
20000 20016 4 44.877 118.8622 3.442628e-10 3.047480e-02 pass
20000 20016 4 44.646 119.4762 3.442628e-10 3.047480e-02 pass
22000 22008 4 57.918 122.5811 4.714135e-10 3.452918e-02 pass
22000 22008 4 57.171 124.1816 4.714135e-10 3.452918e-02 pass
25000 25000 4 86.259 120.7747 5.797896e-10 3.297056e-02 pass
25000 25000 4 83.721 124.4356 5.797896e-10 3.297056e-02 pass
26000 26000 4 97.420 120.2906 5.615238e-10 2.952660e-02 pass
26000 26000 4 96.061 121.9924 5.615238e-10 2.952660e-02 pass
27000 27000 4 109.479 119.8722 5.956148e-10 2.904520e-02 pass
30000 30000 1 315.697 57.0225 8.015488e-10 3.159714e-02 pass
35000 35000 1 2421.281 11.8061 1.161127e-09 3.370575e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 63.9209 77.3583
2000 2000 4 84.0872 84.5568
5000 5008 4 118.4575 119.2350
10000 10000 4 136.1164 136.2439
15000 15000 4 127.1586 130.3870
18000 18008 4 123.5477 125.0738
20000 20016 4 119.1692 119.4762
22000 22008 4 123.3813 124.1816
25000 25000 4 122.6052 124.4356
26000 26000 4 121.1415 121.9924
27000 27000 4 119.8722 119.8722
30000 30000 1 57.0225 57.0225
35000 35000 1 11.8061 11.8061
Residual checks PASSED
End of tests
Done: Mi 21. Dez 12:58:23 CET 2016
One benchmark that has traditionally been used to measure FLOPS is Linpack. Another common FLOPS benchmark is Whetstone.
More reading: the Wikipedia entries for FLOPS, Whetstone, and Linpack.
I appreciate your answer; however, my goal is to obtain a quick-and-dirty estimate of FLOPS. Whetstone and Linpack have the same problem as HPL: I start reading about them, then get lost in site after site that all look 20 years old. When I do manage to find source code, I can't seem to compile it without installing a bunch of dependent libraries, and even then I run into errors. I could get all this stuff working, but it's not important enough to spend the time. Hopefully there exists some relatively modern software that Just Works for ballparking FLOPS. – molecularbear, Nov 25 '09 at 22:32
Estimate? Then it's about 4*Hz: for a 1 GHz CPU it's about 4 GFLOPS :)) – kolypto, Nov 26 '09 at 1:43
I highly recommend the ready-to-run linpack build from Intel:
http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
Since you mention a cluster: we have used the HPCC suite. It takes a bit of effort to set up and tune, but in our case the point wasn't bragging per se; it was part of the acceptance criteria for the cluster. Some performance benchmarking is IMHO vital to ensure that the hardware works as advertised, everything is cabled together correctly, etc.
Now, if you just want a theoretical peak FLOPS number, that one is easy. Just check an article about the CPU (say, on realworldtech.com or some such) to find how many DP FLOPS a CPU core can do per clock cycle (with current x86 CPUs that's typically 4). Then the total peak FLOPS is just
number of cores * FLOPS/cycle * frequency
Then, for a cluster with an InfiniBand (IB) network, you should be able to hit around 80% of the peak FLOPS on HPL (which, BTW, is one of the benchmarks in HPCC).
Ronald Pottol's answer (Phoronix Test Suite): answered Nov 25 '09 at 22:14, edited Nov 26 '09 at 1:17
rogerdpack's answer (sysbench): answered Mar 14 '14 at 16:25, edited Jun 13 '16 at 17:31
3
How does this give the FLOPS?
– Martin Thoma
Dec 21 '16 at 11:00
Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html
– rogerdpack
Aug 20 '18 at 13:47
add a comment |
3
How does this give the FLOPS?
– Martin Thoma
Dec 21 '16 at 11:00
Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html
– rogerdpack
Aug 20 '18 at 13:47
3
3
How does this give the FLOPS?
– Martin Thoma
Dec 21 '16 at 11:00
How does this give the FLOPS?
– Martin Thoma
Dec 21 '16 at 11:00
Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html
– rogerdpack
Aug 20 '18 at 13:47
Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html
– rogerdpack
Aug 20 '18 at 13:47
add a comment |
For ballpark-estimates:
Raspberry Pi 2: 299.93 * 10^6 FLOPS (source)
Raspberry Pi 3: 462.07 * 10^6 FLOPS (source)
GTX Titan Black GPU: 5.1 * 10^12 FLOPS (source)
Sunway TaihuLight: 93 * 10^15 FLOPS (source, record holder of 2016)
Linpack
- Download it (link)
- Extract it
cd benchmarks_2017/linux/mkl/benchmarks/linpack
./runme_xeon64
- Wait for quite a while (more than 1 hour)
On a Thinkpad T460p (Intel i7-6700HQ CPU), it gives:
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
./runme_xeon64: 33: [: -gt: unexpected operator
Mi 21. Dez 11:50:29 CET 2016
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Wed Dec 21 11:50:29 2016
CPU frequency: 3.491 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=9800701024, at the size=35000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.014 46.5838 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 64.7319 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.009 77.3583 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 67.0096 1.165068e-12 3.973181e-02 pass
2000 2000 4 0.064 83.6177 5.001027e-12 4.350281e-02 pass
2000 2000 4 0.063 84.5568 5.001027e-12 4.350281e-02 pass
5000 5008 4 0.709 117.6800 2.474679e-11 3.450740e-02 pass
5000 5008 4 0.699 119.2350 2.474679e-11 3.450740e-02 pass
10000 10000 4 4.895 136.2439 9.069137e-11 3.197870e-02 pass
10000 10000 4 4.904 135.9888 9.069137e-11 3.197870e-02 pass
15000 15000 4 17.260 130.3870 2.052533e-10 3.232773e-02 pass
15000 15000 4 18.159 123.9303 2.052533e-10 3.232773e-02 pass
18000 18008 4 31.091 125.0738 2.611497e-10 2.859910e-02 pass
18000 18008 4 31.869 122.0215 2.611497e-10 2.859910e-02 pass
20000 20016 4 44.877 118.8622 3.442628e-10 3.047480e-02 pass
20000 20016 4 44.646 119.4762 3.442628e-10 3.047480e-02 pass
22000 22008 4 57.918 122.5811 4.714135e-10 3.452918e-02 pass
22000 22008 4 57.171 124.1816 4.714135e-10 3.452918e-02 pass
25000 25000 4 86.259 120.7747 5.797896e-10 3.297056e-02 pass
25000 25000 4 83.721 124.4356 5.797896e-10 3.297056e-02 pass
26000 26000 4 97.420 120.2906 5.615238e-10 2.952660e-02 pass
26000 26000 4 96.061 121.9924 5.615238e-10 2.952660e-02 pass
27000 27000 4 109.479 119.8722 5.956148e-10 2.904520e-02 pass
30000 30000 1 315.697 57.0225 8.015488e-10 3.159714e-02 pass
35000 35000 1 2421.281 11.8061 1.161127e-09 3.370575e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 63.9209 77.3583
2000 2000 4 84.0872 84.5568
5000 5008 4 118.4575 119.2350
10000 10000 4 136.1164 136.2439
15000 15000 4 127.1586 130.3870
18000 18008 4 123.5477 125.0738
20000 20016 4 119.1692 119.4762
22000 22008 4 123.3813 124.1816
25000 25000 4 122.6052 124.4356
26000 26000 4 121.1415 121.9924
27000 27000 4 119.8722 119.8722
30000 30000 1 57.0225 57.0225
35000 35000 1 11.8061 11.8061
Residual checks PASSED
End of tests
Done: Mi 21. Dez 12:58:23 CET 2016
add a comment |
For ballpark-estimates:
Raspberry Pi 2: 299.93 * 10^6 FLOPS (source)
Raspberry Pi 3: 462.07 * 10^6 FLOPS (source)
GTX Titan Black GPU: 5.1 * 10^12 FLOPS (source)
Sunway TaihuLight: 93 * 10^15 FLOPS (source, record holder of 2016)
Linpack
- Download it (link)
- Extract it
cd benchmarks_2017/linux/mkl/benchmarks/linpack
./runme_xeon64
- Wait for quite a while (more than 1 hour)
On a Thinkpad T460p (Intel i7-6700HQ CPU), it gives:
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
./runme_xeon64: 33: [: -gt: unexpected operator
Mi 21. Dez 11:50:29 CET 2016
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Wed Dec 21 11:50:29 2016
CPU frequency: 3.491 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=9800701024, at the size=35000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.014 46.5838 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 64.7319 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.009 77.3583 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.010 67.0096 1.165068e-12 3.973181e-02 pass
2000 2000 4 0.064 83.6177 5.001027e-12 4.350281e-02 pass
2000 2000 4 0.063 84.5568 5.001027e-12 4.350281e-02 pass
5000 5008 4 0.709 117.6800 2.474679e-11 3.450740e-02 pass
5000 5008 4 0.699 119.2350 2.474679e-11 3.450740e-02 pass
10000 10000 4 4.895 136.2439 9.069137e-11 3.197870e-02 pass
10000 10000 4 4.904 135.9888 9.069137e-11 3.197870e-02 pass
15000 15000 4 17.260 130.3870 2.052533e-10 3.232773e-02 pass
15000 15000 4 18.159 123.9303 2.052533e-10 3.232773e-02 pass
18000 18008 4 31.091 125.0738 2.611497e-10 2.859910e-02 pass
18000 18008 4 31.869 122.0215 2.611497e-10 2.859910e-02 pass
20000 20016 4 44.877 118.8622 3.442628e-10 3.047480e-02 pass
20000 20016 4 44.646 119.4762 3.442628e-10 3.047480e-02 pass
22000 22008 4 57.918 122.5811 4.714135e-10 3.452918e-02 pass
22000 22008 4 57.171 124.1816 4.714135e-10 3.452918e-02 pass
25000 25000 4 86.259 120.7747 5.797896e-10 3.297056e-02 pass
25000 25000 4 83.721 124.4356 5.797896e-10 3.297056e-02 pass
26000 26000 4 97.420 120.2906 5.615238e-10 2.952660e-02 pass
26000 26000 4 96.061 121.9924 5.615238e-10 2.952660e-02 pass
27000 27000 4 109.479 119.8722 5.956148e-10 2.904520e-02 pass
30000 30000 1 315.697 57.0225 8.015488e-10 3.159714e-02 pass
35000 35000 1 2421.281 11.8061 1.161127e-09 3.370575e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 63.9209 77.3583
2000 2000 4 84.0872 84.5568
5000 5008 4 118.4575 119.2350
10000 10000 4 136.1164 136.2439
15000 15000 4 127.1586 130.3870
18000 18008 4 123.5477 125.0738
20000 20016 4 119.1692 119.4762
22000 22008 4 123.3813 124.1816
25000 25000 4 122.6052 124.4356
26000 26000 4 121.1415 121.9924
27000 27000 4 119.8722 119.8722
30000 30000 1 57.0225 57.0225
35000 35000 1 11.8061 11.8061
Residual checks PASSED
End of tests
Done: Mi 21. Dez 12:58:23 CET 2016
edited Dec 21 '16 at 12:04
answered Dec 21 '16 at 11:11
Martin Thoma
One benchmark that has been traditionally used to measure FLOPS is Linpack. Another common FLOPS benchmark is Whetstone.
More reading: the Wikipedia entries for "FLOPS", Whetstone and Linpack.
I appreciate your answer; however, my goal is to obtain a quick n' dirty estimate of FLOPS. Whetstone and Linpack have the same problem as HPL: I start reading about them, then get lost in site after site that all look 20 years old. When I do manage to find source code, I can't seem to compile it without installing a bunch of dependent libraries, and even then I run into errors. I could get all this stuff working, but it's not important enough to spend the time. Hopefully there exists some relatively modern software that Just Works for ballparking FLOPS.
– molecularbear
Nov 25 '09 at 22:32
Estimate? Then it's about 4*Hz: for a 1 GHz CPU it's about 4 GFLOPS :))
– kolypto
Nov 26 '09 at 1:43
answered Nov 25 '09 at 22:00
kolypto
I highly recommend the ready-to-run Linpack build from Intel:
http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
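Intel's build ends each run with a "Performance Summary (GFlops)" table, like the one shown in the answer above. The minimal sketch below reduces that table to a single best number. It assumes you saved the output to a file (the name `linpack.log` is hypothetical). It also assumes the summary keeps the five-column Size / LDA / Align. / Average / Maximal layout shown above. Other versions of the benchmark may format the table differently.

```shell
#!/bin/sh
# best_gflops FILE
# Prints the largest value in the "Maximal" column of the
# "Performance Summary (GFlops)" table of an Intel Linpack log.
# Assumes the five-column layout: Size LDA Align. Average Maximal.
best_gflops() {
    awk '
        /^Performance Summary/ { in_summary = 1; next }
        in_summary && /^Residual checks/ { in_summary = 0 }
        # Skip the header row: only numeric Size values count.
        in_summary && NF == 5 && $1 ~ /^[0-9]+$/ {
            if ($5 + 0 > best) best = $5 + 0
        }
        END { if (best > 0) printf "%.4f\n", best; else exit 1 }
    ' "$1"
}

# Hypothetical usage: save the run output, then extract the peak.
#   ./runme_xeon64 | tee linpack.log
#   best_gflops linpack.log
```

The function exits with a non-zero status if it finds no summary rows. This makes it usable in a script that compares several machines.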
answered Oct 26 '10 at 15:34
bugaboo
Since you mention a cluster: we have used the HPCC suite. It takes a bit of effort to set up and tune, but in our case the point wasn't bragging per se; it was part of the acceptance criteria for the cluster. Some performance benchmarking is IMHO vital to ensure that the hardware works as advertised, that everything is cabled together correctly, etc.
If you just want a theoretical peak FLOPS number, that one is easy. Check an article about the CPU (say, on realworldtech.com or somesuch) to find how many double-precision (DP) FLOPS a core can do per clock cycle. For x86 CPUs current at the time of writing (2010), that's typically 4; newer cores with wider vector units and fused multiply-add do considerably more. Then the total peak FLOPS is just
number of cores * FLOPS/cycle * frequency
For a cluster with an InfiniBand (IB) network, you should then be able to reach around 80% of the peak FLOPS on HPL (which, BTW, is one of the benchmarks in HPCC).
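The peak formula and the roughly 80% HPL rule of thumb can be turned into a quick calculator. This is a minimal sketch with a few caveats:

- The core count, FLOPS/cycle and clock speed in the usage lines are placeholders; look them up for your own CPU.
- `nproc` reports online logical CPUs, which may include hyperthreads rather than only physical cores.
- The default efficiency of 0.8 is the figure quoted above for an InfiniBand cluster, not a guarantee.
- The function names are hypothetical.

```shell
#!/bin/sh
# peak_gflops CORES FLOPS_PER_CYCLE GHZ
# Theoretical peak = cores * FLOPS/cycle * frequency (in GHz, so result is GFLOPS).
peak_gflops() {
    awk -v c="$1" -v f="$2" -v g="$3" 'BEGIN { printf "%.2f\n", c * f * g }'
}

# hpl_target CORES FLOPS_PER_CYCLE GHZ [EFFICIENCY]
# Expected HPL result at a given efficiency (default 0.8, the rule of thumb above).
hpl_target() {
    awk -v c="$1" -v f="$2" -v g="$3" -v e="${4:-0.8}" \
        'BEGIN { printf "%.2f\n", c * f * g * e }'
}

# Placeholder figures: a 4-core, 2.5 GHz CPU doing 4 DP FLOPS/cycle per core.
peak_gflops 4 4 2.5   # prints 40.00
hpl_target 4 4 2.5    # prints 32.00
```

A measured HPL result far below the target is the kind of signal the acceptance testing described above is meant to catch.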
answered Oct 26 '10 at 16:45
janneb