Estimate FLOPS in Linux?

I am looking for a quick and easy program to estimate FLOPS on my Linux system. I found HPL, but getting it compiled is proving to be irritating. All I need is a ballpark estimate of the FLOPS, without spending a day researching benchmark packages and installing dependent software. Does any such program exist? Would it be sufficient to write a C program that multiplies two floats in a loop?

      linux benchmark






asked Nov 25 '09 at 21:37 by molecularbear

          6 Answers

For ballpark estimates:

• Raspberry Pi 2: 299.93 * 10^6 FLOPS (source)
• Raspberry Pi 3: 462.07 * 10^6 FLOPS (source)
• GTX Titan Black GPU: 5.1 * 10^12 FLOPS (source)
• Sunway TaihuLight: 93 * 10^15 FLOPS (source, record holder of 2016)
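The figures above span nine orders of magnitude, so it can help to print them with SI prefixes. A minimal sketch, assuming a hypothetical helper named `format_flops` (not part of any benchmark package), fed with the numbers from the list:

```python
def format_flops(flops: float) -> str:
    """Render a raw FLOPS figure with the largest SI prefix that fits."""
    prefixes = [
        (1e18, "EFLOPS"),
        (1e15, "PFLOPS"),
        (1e12, "TFLOPS"),
        (1e9, "GFLOPS"),
        (1e6, "MFLOPS"),
        (1e3, "kFLOPS"),
    ]
    for scale, unit in prefixes:
        if flops >= scale:
            return f"{flops / scale:.2f} {unit}"
    return f"{flops:.2f} FLOPS"


# Figures copied from the list above.
ballpark = [
    ("Raspberry Pi 2", 299.93e6),
    ("Raspberry Pi 3", 462.07e6),
    ("GTX Titan Black GPU", 5.1e12),
    ("Sunway TaihuLight", 93e15),
]

for name, flops in ballpark:
    print(f"{name}: {format_flops(flops)}")
```

Seen this way, the GPU sits in the TFLOPS range and the supercomputer in the PFLOPS range, which is the scale a "how many teraflops" survey question is asking about.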

          Linpack



          1. Download it (link)

          2. Extract it

          3. cd benchmarks_2017/linux/mkl/benchmarks/linpack

          4. ./runme_xeon64

          5. Wait for quite a while (more than 1 hour)

          On a Thinkpad T460p (Intel i7-6700HQ CPU), it gives:



          This is a SAMPLE run script for SMP LINPACK. Change it to reflect
          the correct number of CPUs/threads, problem input files, etc..
          ./runme_xeon64: 33: [: -gt: unexpected operator
          Mi 21. Dez 11:50:29 CET 2016
          Intel(R) Optimized LINPACK Benchmark data

          Current date/time: Wed Dec 21 11:50:29 2016

          CPU frequency: 3.491 GHz
          Number of CPUs: 1
          Number of cores: 4
          Number of threads: 4

          Parameters are set to:

          Number of tests: 15
          Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
          Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
          Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
          Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1

          Maximum memory requested that can be used=9800701024, at the size=35000

          =================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)   GFlops    Residual     Residual(norm) Check
1000   1000   4      0.014     46.5838   1.165068e-12 3.973181e-02   pass
1000   1000   4      0.010     64.7319   1.165068e-12 3.973181e-02   pass
1000   1000   4      0.009     77.3583   1.165068e-12 3.973181e-02   pass
1000   1000   4      0.010     67.0096   1.165068e-12 3.973181e-02   pass
2000   2000   4      0.064     83.6177   5.001027e-12 4.350281e-02   pass
2000   2000   4      0.063     84.5568   5.001027e-12 4.350281e-02   pass
5000   5008   4      0.709     117.6800  2.474679e-11 3.450740e-02   pass
5000   5008   4      0.699     119.2350  2.474679e-11 3.450740e-02   pass
10000  10000  4      4.895     136.2439  9.069137e-11 3.197870e-02   pass
10000  10000  4      4.904     135.9888  9.069137e-11 3.197870e-02   pass
15000  15000  4      17.260    130.3870  2.052533e-10 3.232773e-02   pass
15000  15000  4      18.159    123.9303  2.052533e-10 3.232773e-02   pass
18000  18008  4      31.091    125.0738  2.611497e-10 2.859910e-02   pass
18000  18008  4      31.869    122.0215  2.611497e-10 2.859910e-02   pass
20000  20016  4      44.877    118.8622  3.442628e-10 3.047480e-02   pass
20000  20016  4      44.646    119.4762  3.442628e-10 3.047480e-02   pass
22000  22008  4      57.918    122.5811  4.714135e-10 3.452918e-02   pass
22000  22008  4      57.171    124.1816  4.714135e-10 3.452918e-02   pass
25000  25000  4      86.259    120.7747  5.797896e-10 3.297056e-02   pass
25000  25000  4      83.721    124.4356  5.797896e-10 3.297056e-02   pass
26000  26000  4      97.420    120.2906  5.615238e-10 2.952660e-02   pass
26000  26000  4      96.061    121.9924  5.615238e-10 2.952660e-02   pass
27000  27000  4      109.479   119.8722  5.956148e-10 2.904520e-02   pass
30000  30000  1      315.697   57.0225   8.015488e-10 3.159714e-02   pass
35000  35000  1      2421.281  11.8061   1.161127e-09 3.370575e-02   pass
          Performance Summary (GFlops)

Size   LDA    Align. Average   Maximal
1000   1000   4      63.9209   77.3583
2000   2000   4      84.0872   84.5568
5000   5008   4      118.4575  119.2350
10000  10000  4      136.1164  136.2439
15000  15000  4      127.1586  130.3870
18000  18008  4      123.5477  125.0738
20000  20016  4      119.1692  119.4762
22000  22008  4      123.3813  124.1816
25000  25000  4      122.6052  124.4356
26000  26000  4      121.1415  121.9924
27000  27000  4      119.8722  119.8722
30000  30000  1      57.0225   57.0225
35000  35000  1      11.8061   11.8061

          Residual checks PASSED

          End of tests

          Done: Mi 21. Dez 12:58:23 CET 2016
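To reduce a run like this to a single ballpark number, one option is to take the best row of the "Performance Summary" table. The sketch below assumes the summary has been pasted into a string; the function name `best_run` is made up for illustration, and the parsing relies only on the five-column layout shown above, not on any documented output format.

```python
SUMMARY = """\
Size LDA Align. Average Maximal
1000 1000 4 63.9209 77.3583
2000 2000 4 84.0872 84.5568
5000 5008 4 118.4575 119.2350
10000 10000 4 136.1164 136.2439
15000 15000 4 127.1586 130.3870
18000 18008 4 123.5477 125.0738
20000 20016 4 119.1692 119.4762
22000 22008 4 123.3813 124.1816
25000 25000 4 122.6052 124.4356
26000 26000 4 121.1415 121.9924
27000 27000 4 119.8722 119.8722
30000 30000 1 57.0225 57.0225
35000 35000 1 11.8061 11.8061
"""


def best_run(summary: str) -> tuple[int, float, float]:
    """Return (size, average GFlops, maximal GFlops) of the fastest row."""
    rows = []
    for line in summary.splitlines():
        fields = line.split()
        # Data rows have five fields and start with the numeric problem size.
        if len(fields) == 5 and fields[0].isdigit():
            size, _lda, _align, average, maximal = fields
            rows.append((int(size), float(average), float(maximal)))
    return max(rows, key=lambda row: row[2])


size, average, maximal = best_run(SUMMARY)
print(f"Best run: size {size}, {maximal} GFlops maximal, {average} GFlops average")
```

For the run above, the peak is the 10000-equation case at roughly 136 GFlops. The two largest problem sizes are far slower, so quoting the maximum rather than the last row gives the more representative figure.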





            One benchmark that has been traditionally used to measure FLOPS is Linpack. Another common FLOPS benchmark is Whetstone.



More reading: the Wikipedia entries for "FLOPS", Whetstone, and Linpack.






I appreciate your answer; however, my goal is to obtain a quick-and-dirty estimate of FLOPS. Whetstone and Linpack have the same problem as HPL: I start reading about them, then get lost in site after site that all look 20 years old. When I do manage to find source code, I can't seem to compile it without installing a bunch of dependent libraries, and even then I run into errors. I could get all this stuff working, but it's not important enough to spend the time. Hopefully there exists some relatively modern software that Just Works for ballparking FLOPS.

              – molecularbear
              Nov 25 '09 at 22:32






Estimate? Then it's about 4 * clock rate: for a 1 GHz CPU it's about 4 GFLOPS :))

              – kolypto
              Nov 26 '09 at 1:43


















            I highly recommend the ready-to-run linpack build from Intel:
            http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/






As you mention a cluster, we have used the HPCC suite. It takes a bit of effort to set up and tune, but in our case the point wasn't bragging per se; it was part of the acceptance criteria for the cluster. Some performance benchmarking is IMHO vital to ensure that the hardware works as advertised, everything is cabled together correctly, etc.

Now if you just want a theoretical peak FLOPS number, that one is easy. Just check out some article about the CPU (say, on realworldtech.com or some such) to get info on how many DP FLOPS a CPU core can do per clock cycle (with current x86 CPUs that's typically 4). Then the total peak FLOPS is just



              number of cores * FLOPS/cycle * frequency



              Then for a cluster with IB network you should be able to hit around 80% of the peak FLOPS on HPL (which BTW is one of the benchmarks in HPCC).
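As a worked example of the formula above, here is a small sketch. The core count, clock rate, and node count are invented for illustration; the 4 DP FLOPS per cycle and the 80% HPL efficiency are the rough figures quoted in this answer, not measured values.

```python
def peak_flops(cores: int, flops_per_cycle: int, frequency_hz: float) -> float:
    """Theoretical peak: number of cores * FLOPS/cycle * frequency."""
    return cores * flops_per_cycle * frequency_hz


# Hypothetical node: 8 cores at 2.5 GHz, 4 DP FLOPS per core per cycle.
node_peak = peak_flops(cores=8, flops_per_cycle=4, frequency_hz=2.5e9)

# Hypothetical cluster of 16 such nodes, assuming HPL reaches about 80% of peak.
nodes = 16
cluster_peak = nodes * node_peak
expected_hpl = 0.8 * cluster_peak

print(f"Node peak:    {node_peak / 1e9:.1f} GFLOPS")
print(f"Cluster peak: {cluster_peak / 1e12:.2f} TFLOPS")
print(f"Expected HPL: {expected_hpl / 1e12:.2f} TFLOPS")
```

With these made-up inputs a node peaks at 80 GFLOPS and the cluster at 1.28 TFLOPS, so an HPL result near 1 TFLOPS would be in line with the 80% rule of thumb. Substitute the real per-cycle figure for your CPU before quoting any number.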






The question is what you mean by FLOPS. If all you care about is how many of the simplest floating-point operations you get per clock, it is probably about 3x your clock speed, but that number is about as meaningless as BogoMIPS. Some floating-point operations take a long time (divide, for starters); add and multiply are typically quick (one per FP unit per clock). The next issue is memory performance. There is a reason the last classic Cray had 31 memory banks: ultimately, CPU performance is limited by how fast you can read from and write to memory, so which level of cache does your problem fit in? Linpack was a real benchmark once; now it fits in cache (L2 if not L1) and is more of a pure theoretical CPU benchmark. And of course, your SSE (etc.) units can add floating-point performance too.
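To make the "which cache level does it fit in" point concrete, here is a rough sketch that sizes a dense double-precision matrix against a set of cache capacities. The capacities are assumptions chosen for illustration (check `lscpu` for your own machine), and the 100 x 100 case stands in for the classic small Linpack problem.

```python
def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Memory needed for a dense n x n matrix of doubles (8 bytes each)."""
    return n * n * bytes_per_element


# Assumed cache capacities in bytes; real values vary by CPU.
CACHE_LEVELS = [
    ("L1d", 32 * 1024),
    ("L2", 256 * 1024),
    ("L3", 6 * 1024 * 1024),
]


def smallest_fit(n: int) -> str:
    """Name the smallest assumed cache level that holds the whole matrix."""
    size = matrix_bytes(n)
    for name, capacity in CACHE_LEVELS:
        if size <= capacity:
            return name
    return "RAM"


for n in (100, 1000, 35000):
    print(f"n = {n}: {matrix_bytes(n)} bytes, fits in {smallest_fit(n)}")
```

Under these assumptions the 100 x 100 matrix needs 80,000 bytes and sits entirely in L2, while a 1000 x 1000 matrix already spills to main memory, which is why the problem size matters so much for what a benchmark actually measures.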



                What distro do you run?



                This looked like a good pointer: http://linuxtoolkit.blogspot.com/2009/04/intel-optimized-linpack-benchmark-for.html



                http://onemansjourneyintolinux.blogspot.com/2008/12/show-us-yer-flops.html



                http://www.phoronix-test-suite.com/ might be an easier way to install a flops benchmark.



Still, I do wonder why you care: what are you using it for? If you just want a meaningless number, your system's BogoMIPS is still right there in dmesg.







answered Nov 25 '09 at 22:14 by Ronald Pottol (edited Nov 26 '09 at 1:17)

                  Phoronix seems to be exactly what I was looking for - thank you! The only reason I wanted this was because I was filling out a survey that asked how many teraflops of computing power I have. The survey wasn't terribly important, so I wasn't concerned about the accuracy of the answer. Still, it would be kind of neat to be able to say, "Our cluster can do X teraflops." Though as you point out, that number doesn't necessarily have much real-world meaning.

                  – molecularbear
                  Nov 26 '09 at 2:06












Apparently there's a "sysbench" benchmark package and command:

sudo apt-get install sysbench (or brew install sysbench on OS X)

Run it like this:

sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run

Output, for comparison:

total time: 15.3047s

ref: http://www.midwesternmac.com/blogs/jeff-geerling/2013-vps-benchmarks-linode







answered Mar 14 '14 at 16:25 by rogerdpack (edited Jun 13 '16 at 17:31)

                  How does this give the FLOPS?

                  – Martin Thoma
                  Dec 21 '16 at 11:00











Looks like it's more of a generic "CPU benchmark"; see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html

                  – rogerdpack
                  Aug 20 '18 at 13:47












                • 3





                  How does this give the FLOPS?

                  – Martin Thoma
                  Dec 21 '16 at 11:00











                • Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html

                  – rogerdpack
                  Aug 20 '18 at 13:47







                3




                3





                How does this give the FLOPS?

                – Martin Thoma
                Dec 21 '16 at 11:00





                How does this give the FLOPS?

                – Martin Thoma
                Dec 21 '16 at 11:00













                Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html

                – rogerdpack
                Aug 20 '18 at 13:47





                Looks like it's more of a generic "cpu benchmark" see also bnikolic.co.uk/blog/hpc-howto-measure-flops.html

                – rogerdpack
                Aug 20 '18 at 13:47











                3














                For ballpark-estimates:



                • Raspberry Pi 2: 299.93 * 10^6 FLOPS (source)


                • Raspberry Pi 3: 462.07 * 10^6 FLOPS (source)



                • GTX Titan Black GPU: 5.1 * 10^12 FLOPS (source)


                • Sunway TaihuLight: 93 * 10^15 FLOPS (source, record holder of 2016)

                Linpack



                1. Download it (link)

                2. Extract it

                3. cd benchmarks_2017/linux/mkl/benchmarks/linpack

                4. ./runme_xeon64

                5. Wait for quite a while (more than 1 hour; the run below took about 68 minutes)

                On a ThinkPad T460p (Intel i7-6700HQ CPU), it gives:



                This is a SAMPLE run script for SMP LINPACK. Change it to reflect
                the correct number of CPUs/threads, problem input files, etc..
                ./runme_xeon64: 33: [: -gt: unexpected operator
                Mi 21. Dez 11:50:29 CET 2016
                Intel(R) Optimized LINPACK Benchmark data

                Current date/time: Wed Dec 21 11:50:29 2016

                CPU frequency: 3.491 GHz
                Number of CPUs: 1
                Number of cores: 4
                Number of threads: 4

                Parameters are set to:

                Number of tests: 15
                Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
                Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
                Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
                Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1

                Maximum memory requested that can be used=9800701024, at the size=35000

                =================== Timing linear equation system solver ===================

                Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
                1000 1000 4 0.014 46.5838 1.165068e-12 3.973181e-02 pass
                1000 1000 4 0.010 64.7319 1.165068e-12 3.973181e-02 pass
                1000 1000 4 0.009 77.3583 1.165068e-12 3.973181e-02 pass
                1000 1000 4 0.010 67.0096 1.165068e-12 3.973181e-02 pass
                2000 2000 4 0.064 83.6177 5.001027e-12 4.350281e-02 pass
                2000 2000 4 0.063 84.5568 5.001027e-12 4.350281e-02 pass
                5000 5008 4 0.709 117.6800 2.474679e-11 3.450740e-02 pass
                5000 5008 4 0.699 119.2350 2.474679e-11 3.450740e-02 pass
                10000 10000 4 4.895 136.2439 9.069137e-11 3.197870e-02 pass
                10000 10000 4 4.904 135.9888 9.069137e-11 3.197870e-02 pass
                15000 15000 4 17.260 130.3870 2.052533e-10 3.232773e-02 pass
                15000 15000 4 18.159 123.9303 2.052533e-10 3.232773e-02 pass
                18000 18008 4 31.091 125.0738 2.611497e-10 2.859910e-02 pass
                18000 18008 4 31.869 122.0215 2.611497e-10 2.859910e-02 pass
                20000 20016 4 44.877 118.8622 3.442628e-10 3.047480e-02 pass
                20000 20016 4 44.646 119.4762 3.442628e-10 3.047480e-02 pass
                22000 22008 4 57.918 122.5811 4.714135e-10 3.452918e-02 pass
                22000 22008 4 57.171 124.1816 4.714135e-10 3.452918e-02 pass
                25000 25000 4 86.259 120.7747 5.797896e-10 3.297056e-02 pass
                25000 25000 4 83.721 124.4356 5.797896e-10 3.297056e-02 pass
                26000 26000 4 97.420 120.2906 5.615238e-10 2.952660e-02 pass
                26000 26000 4 96.061 121.9924 5.615238e-10 2.952660e-02 pass
                27000 27000 4 109.479 119.8722 5.956148e-10 2.904520e-02 pass
                30000 30000 1 315.697 57.0225 8.015488e-10 3.159714e-02 pass
                35000 35000 1 2421.281 11.8061 1.161127e-09 3.370575e-02 pass

                Performance Summary (GFlops)

                Size LDA Align. Average Maximal
                1000 1000 4 63.9209 77.3583
                2000 2000 4 84.0872 84.5568
                5000 5008 4 118.4575 119.2350
                10000 10000 4 136.1164 136.2439
                15000 15000 4 127.1586 130.3870
                18000 18008 4 123.5477 125.0738
                20000 20016 4 119.1692 119.4762
                22000 22008 4 123.3813 124.1816
                25000 25000 4 122.6052 124.4356
                26000 26000 4 121.1415 121.9924
                27000 27000 4 119.8722 119.8722
                30000 30000 1 57.0225 57.0225
                35000 35000 1 11.8061 11.8061

                Residual checks PASSED

                End of tests

                Done: Mi 21. Dez 12:58:23 CET 2016
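The GFlops column can be sanity-checked from the Size and Time(s) columns. This is a sketch that assumes the benchmark uses the conventional LINPACK operation count of 2/3·n³ + 2·n² floating-point operations for solving an n×n system. The output above does not state the formula, and the function name is invented for illustration.

```python
def linpack_gflops(n: int, seconds: float) -> float:
    """Estimate GFLOPS for an n x n dense solve that took `seconds`.

    Assumes the conventional LINPACK operation count 2/3*n^3 + 2*n^2.
    """
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return operations / seconds / 1e9


if __name__ == "__main__":
    # Size and time taken from two rows of the run above.
    for n, seconds in [(10000, 4.895), (25000, 83.721)]:
        print(f"n={n}: {linpack_gflops(n, seconds):.1f} GFLOPS")
```

Under that assumption, both rows come out close to the reported 136.2439 and 124.4356 GFlops. The small difference is consistent with the time being printed to only three decimals.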













                    edited Dec 21 '16 at 12:04

























                    answered Dec 21 '16 at 11:11









                    Martin Thoma

                    152 reputation, 1 silver badge, 12 bronze badges





















                        1














                        One benchmark that has been traditionally used to measure FLOPS is Linpack. Another common FLOPS benchmark is Whetstone.



                        More reading:
                        The Wikipedia "FLOPS" entry,
                        Whetstone entry,
                        Linpack entry
















                        answered Nov 25 '09 at 22:00









                        kolypto

                        6,619 reputation, 7 gold badges, 42 silver badges, 58 bronze badges







                        • 2





                          I appreciate your answer, however my goal is to obtain a quick n' dirty estimate of flops. Whetstone and Linpack have the same problem as HPL - I start reading about it, then get lost in site after site that all look 20 years old. When I do manage to find source code, I can't seem to compile it without installing a bunch of dependent libraries - even then I run into errors. I could get all this stuff working, but it's not important enough to spend the time. Hopefully there exists some relatively modern software that Just Works for ballparking flops.

                          – molecularbear
                          Nov 25 '09 at 22:32






                        • 1





                          Estimate? Then it's about 4*Hz: for 1GHz CPU it's about 4GFLOPS :))

                          – kolypto
                          Nov 26 '09 at 1:43























                        1














                        I highly recommend the ready-to-run Linpack build from Intel:
                        http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
















                            answered Oct 26 '10 at 15:34









                            bugaboo

                            1092 bronze badges





















                                1














                                Since you mention a cluster: we have used the HPCC suite. It takes a bit of effort to set up and tune, but in our case the point wasn't bragging per se; it was part of the acceptance criteria for the cluster. Some performance benchmarking is IMHO vital to ensure that the hardware works as advertised, everything is cabled together correctly, etc.



                                Now if you just want a theoretical peak FLOPS number, that one is easy. Just check out some article about the CPU (say, on realworldtech.com or similar) to find out how many double-precision (DP) FLOPS a CPU core can do per clock cycle (with current x86 CPUs that's typically 4). Then the total peak FLOPS is just



                                number of cores * FLOPS/cycle * frequency



                                Then for a cluster with an InfiniBand (IB) network you should be able to hit around 80% of the peak FLOPS on HPL (which BTW is one of the benchmarks in HPCC).
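The formula above is simple enough to script. This is a minimal sketch with hypothetical hardware figures (4 cores at 2.5 GHz), using the 4 DP FLOPS/cycle and the roughly 80% HPL efficiency quoted in this answer. The function names are illustrative only.

```python
def peak_flops(cores: int, flops_per_cycle: int, frequency_hz: float) -> float:
    """Theoretical peak: number of cores * FLOPS/cycle * frequency."""
    return cores * flops_per_cycle * frequency_hz


def expected_hpl_flops(peak: float, efficiency: float = 0.8) -> float:
    """Rough HPL expectation as a fraction of the theoretical peak."""
    return peak * efficiency


if __name__ == "__main__":
    # Hypothetical machine: 4 cores, 4 DP FLOPS per cycle, 2.5 GHz.
    peak = peak_flops(cores=4, flops_per_cycle=4, frequency_hz=2.5e9)
    print(f"peak: {peak / 1e9:.1f} GFLOPS")
    print(f"HPL estimate: {expected_hpl_flops(peak) / 1e9:.1f} GFLOPS")
```

For a cluster, pass the total core count across all nodes. The FLOPS/cycle value depends on the CPU generation, so look it up for the actual hardware.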
















                                    answered Oct 26 '10 at 16:45









janneb















































































































