
Is it safe to keep the GPU on 100% utilization for a very long time?


I am currently performing number-crunching using CUDA on my GPU, an NVIDIA GeForce GTX 1050 Ti. These operations often take months to complete, and during that time I leave my PC on 24/7.



Is doing so safe? Am I risking potential overheating of my graphics card that might, in the worst case, result in a house fire?

Note that the PC is properly ventilated and there is no obstruction to its airflow.










hardware-failure gpu cuda

asked May 6 at 9:04 by Klangen
Comments:

  • With proper cooling, the only issue will be electricity bills. – montonero, May 6 at 9:14

  • The biggest question is: what level of risk is acceptable to you? You have to consider cases where other system components (e.g. one of the fans) fail while you're unavailable to monitor them. For systems where the tolerated risk is extremely low, companies might spend much more on duplicate cooling systems, automated fire-suppression devices, and monitoring systems. – jpaugh, May 6 at 15:09

  • Also, it might be cheaper to use someone else's hardware (maybe a local university?) than to do it on your own hardware, especially once you factor in your comfort level with the amount of risk involved. – jpaugh, May 6 at 15:22

  • Anecdotally, over the last decade I've run 8 graphics cards at high 24/7 compute loads for at least 3 years each. During that time I've had a single fan failure (AMD 5850) after ~2 years and a single card failure (NVidia 560) after 4(?) years. – Dan Neely, May 6 at 17:52

  • @EricDuminil Is the heat expended by a chip not proportional to the % utilisation? – Iain, May 6 at 23:36

6 Answers


Answer (score 56) – Eugen Rieck, answered May 6 at 9:14

Short answer: This should be safe on well-designed hardware.



Long answer:
The GPU (and its software environment: drivers, OS, daemons) is designed to protect against overheating. The GPU will first spin its fans up to a higher RPM; if that can't keep the temperature at a safe level, the GPU throttles the workload (usually by reducing the clock frequency). This ensures a heat profile that will not damage the GPU, and therefore neither the PC nor the room.



Caveat: There exist cheap knock-off graphics cards whose firmware is specifically designed to sacrifice safety for performance. While I don't think those exist for a 1050, I am not 100% sure. You should also prefer the Nvidia drivers downloaded from their website over "optimized" vendor drivers, which might do the same thing.
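
To sanity-check that this protection is actually working during a months-long run, it is easy to log the card's vitals from userspace. Below is a minimal monitoring sketch (Python); it assumes the standard nvidia-smi CLI that ships with the NVIDIA driver is on the PATH, and the queried fields, interval and log file name are just illustrative choices, not part of the answer above.

    #!/usr/bin/env python3
    """Minimal GPU health logger (sketch).

    Assumes the standard `nvidia-smi` tool from the NVIDIA driver is installed;
    adjust the queried fields, interval and log path to taste.
    """
    import csv
    import subprocess
    import time

    FIELDS = "timestamp,temperature.gpu,utilization.gpu,power.draw,clocks.sm"

    def sample():
        # nvidia-smi prints one CSV line per GPU with the requested fields.
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return [line.strip() for line in out.stdout.splitlines() if line.strip()]

    def main(interval_s=60.0, logfile="gpu_health.csv"):
        with open(logfile, "a", newline="") as f:
            writer = csv.writer(f)
            while True:
                for line in sample():
                    writer.writerow([field.strip() for field in line.split(",")])
                f.flush()
                time.sleep(interval_s)

    if __name__ == "__main__":
        main()

If the logged temperature plateaus at a sane value while utilization stays near 100%, the fan-control/throttling path described above is doing its job.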






Comments:

  • It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads. – TooTea, May 6 at 9:42

  • @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail, they fail in a way that does not create additional damage. – Eugen Rieck, May 6 at 10:02

  • Because crypto-mining is the canonical example of sacrificing safety for performance. – Eugen Rieck, May 6 at 10:13

  • Anecdotal evidence: On my older desktop computer, I managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90 °C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe. – John Dvorak, May 6 at 10:13

  • @JohnDvorak Yes, all non-ancient CPUs employ a similar method. – Eugen Rieck, May 6 at 10:16

Answer (score 9) – TooTea, answered May 6 at 9:55

A house fire is extremely unlikely, but the lifespan of the card may be reduced.



Long-term overheating of the GPU chip probably won't start a fire. The chip may deteriorate and start misbehaving or die completely, but silicon chips aren't very flammable. Bad things usually happen when electrolytic capacitors fail and blow up, but those won't overheat just because the card is doing a lot of crunching, and you hopefully have a metal PC case to contain the hot shrapnel from such a failure anyway.



However, consumer-grade parts aren't in general designed for long-term 24/7 loads, so it is fairly likely that the card will die sooner than if it weren't subject to such loads. It is hard to say how much sooner without more statistics on the given model. Some people in the HPC community advocate using high-end gaming GPUs instead of special HPC compute parts, and there seems to be some economic sense in that: although the commodity parts die in a year or so, it's cheaper to keep replacing them because they're many times cheaper than the alternative.
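
To make the replacement-economics argument concrete, here is a tiny back-of-the-envelope sketch; the prices and lifetimes are purely hypothetical placeholders for illustration, not figures from the answer or from any vendor.

    # Hypothetical numbers for illustration only -- not real prices or lifetimes.
    consumer = {"price": 300.0, "expected_years": 1.5}     # e.g. a gaming card
    datacenter = {"price": 6000.0, "expected_years": 5.0}  # e.g. a dedicated compute card

    def cost_per_year(card):
        return card["price"] / card["expected_years"]

    print(f"consumer card:   ~{cost_per_year(consumer):.0f} per year of 24/7 compute")
    print(f"datacenter card: ~{cost_per_year(datacenter):.0f} per year of 24/7 compute")
    # Even if the cheap card dies every year or two, its cost per year of compute
    # can stay well below the datacenter part (ignoring downtime, licensing and
    # per-card performance, which the comments below discuss).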






Comments:

  • Mechanical stress is the worst when heating up and cooling down, rather than running at a constant temperature. What the OP is planning to do is no worse than playing GPU-intensive games every day for a few months. – Dmitry Grigoryev, May 6 at 11:55

  • Unfortunately, Nvidia's license prevents an HPC data center from using consumer-grade gaming GPUs in their servers. We're required to use higher-end GPUs, and I currently have an order in for P100s when the researchers would actually prefer 1080 Ti cards. – doneal24, May 6 at 16:11

  • @technical_difficulty The EULA for the nVidia driver says that it doesn't apply to datacenter HPC use (that's certainly a concern for large centers, but it doesn't stop people building in-house HPC clusters from consumer parts). There's a decent writeup here: microway.com/knowledge-center-articles/… – TooTea, May 6 at 19:28

  • @user912264 Yes, that's what I meant, although presumably with a poor choice of words on my end. I'm by no means a lawyer or a licensing expert. My point is that you're not allowed to use the driver in such a situation (because you can't rely on the normal free license). – TooTea, May 6 at 20:06

  • Also note that you get more wear on the GPU fans from keeping them at a high speed. – user71659, May 6 at 21:54

Answer (score 6)

Yes, the card is likely to wear out sooner if it is under constant load. At small geometries, electromigration is a significant source of device failures, and devices are typically designed with a specific target lifetime in mind. This might be generous for typical operation (e.g. 5 years of continuous operation), but might not assume 100% of the maximum operating point for all of that time. As soon as you start overclocking, you can expect that target to drop significantly. (Equally, running at only 80% load would maybe double the lifetime due to this failure mechanism.)



There are of course other failures related to running components hot, or to thermal cycling; this is just to point out that modern electronics (and even 1980s electronics, when badly designed) can be susceptible to 'wearing out'.
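
A common first-order model for the electromigration lifetime mentioned above is Black's equation, MTTF ∝ J^(-n) · exp(Ea / kT). The sketch below just evaluates that relative scaling for assumed, illustrative parameter values (n, Ea, relative current density, junction temperature); it is not a lifetime prediction for any particular card.

    import math

    K_B = 8.617e-5  # Boltzmann constant in eV/K

    def relative_mttf(j_rel, temp_c, n=2.0, ea_ev=0.7, ref_temp_c=80.0):
        """Black's equation, expressed relative to a reference operating point.

        j_rel: current density relative to the reference (rough proxy for load/clock/voltage)
        temp_c: junction temperature in Celsius
        n, ea_ev: assumed model parameters (typical textbook values, not vendor data)
        """
        t = temp_c + 273.15
        t_ref = ref_temp_c + 273.15
        return j_rel ** (-n) * math.exp((ea_ev / K_B) * (1.0 / t - 1.0 / t_ref))

    # Relative lifetime at a few hypothetical operating points:
    print(relative_mttf(1.0, 80.0))  # reference point -> 1.0
    print(relative_mttf(0.8, 70.0))  # lighter load, cooler chip -> longer life
    print(relative_mttf(1.2, 90.0))  # overclocked and hotter -> shorter life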






Comments:

  • But would it wear out more doing the same workload over a shorter period of time? In other words, would running at 100% for some period be worse than running it at an average of 50% for double that time? – trlkly, May 8 at 4:34

  • Yes, exactly. Hotter, or higher voltages, mean higher fatigue for the same 'work'. To a first-order approximation, for this effect. – Sean Houlihane, May 8 at 6:19

Answer (score 3)

If your cooling system works OK, and your hardware is of any kind of even vaguely modern design that includes on-chip temperature monitoring and thermal throttling/suspend/shutdown, then it's entirely safe. It can't overheat so long as the cooler keeps running, and if that fails, the chips will throttle back until they're no longer producing more heat than can be passively dissipated (which may mean having to suspend completely, appearing like a hang/crash).



Worst case, if the throttling doesn't kick in fast and hard enough to compensate for accumulated thermal load, some part of the chip may end up melting or burning out, and you'll end up with a dead board, but by that point the throttling circuitry should have slammed into complete emergency shutdown, maybe even tripping a (temporary or permanent) fuse on the power rail, preventing any kind of runaway dumping of the entire input voltage randomly across the die and an actual fire.



Thankfully, the PC platform worked out most of the kinks in that kind of thermal protection 10-15 years ago, after the minor scandal of some mid-generation PIIIs and Athlons proving entirely capable of smoking themselves (and thus being a fire risk) if the cooler failed or fell off while the CPU was running at full tilt. One generation of chips later, it could easily be demonstrated that an overclocked high-end processor barely exceeded its maximum rated temperature at the heat-spreader surface if you tore the heatsink and fan off right in the middle of a heavy benchmark. The computer slowed to a crawl or even suffered a "fatal" crash (fatal to the software, that is; the hardware just needed the heatsink/fan replaced and a reboot), but the chips survived and no risk arose. Hopefully any GPU maker worth their salt isn't a decade and a half behind the curve, especially when their products already run pretty close to their rated temperature limits.



However, that doesn't make this kind of treatment entirely "safe" for the transistors on the chip. Heavyweight "number crunching" (Bitcoin? Protein folding?) using GPUs is by now a rather infamous way of literally wearing out the silicon. The combination of high voltage and current, continual switching billions of times per second, plus sustained high temperatures stress the components quite a bit, both the chips and the support parts like capacitors, so their operating lifetime can be reduced to barely two years in some cases, at least at full speed. They can then run on a bit longer if derated (maximum clock speed limited etc) and employed for less demanding purposes, like last year's games, but are on borrowed time once they start erroring out at maximum speed.



So it's not going to catch on fire, but I wouldn't bank on the card still being reliable past its third birthday in that employment...






Comments:

  • Crypto mining especially tends to operate multiple cards packed onto one mobo with inadequate airflow, resulting in high temperatures. Using one card in a good tower case with proper airflow should be significantly less stressful, although there is wear and tear on the fans. And as Sean's and your answers point out, electromigration from being powered up to full voltage can still be a concern even if temperatures are kept in check. – Peter Cordes, May 7 at 20:34

Answer (score 1)

As you mention, ventilation is good, so there is no need to worry about that risk factor.



As for the GPU itself: it will wear more under 100% load 24/7/365 than under ordinary office use of 8-16 hours a day, so it is unlikely to keep working for 5-10 years or more. You must also consider that a card can have a poorly designed cooler (on the card itself, not just the PC overall), a bad overall design, software and firmware bugs, or poor production quality and defects of varying severity and frequency, from single-unit defects to widespread ones. These factors can worsen heating, cause system failures, shorten the lifetime, cause short circuits, or in the worst case start a fire or give you an electric shock. Some of them depend on the model and revision, some are gradually fixed by software/firmware updates, and some vary from one unit to the next. Prefer models with a proven reliability record and a proper revision (usually the latest available). A bad card can also interfere with other components, for example by generating extra electrical noise. Also, do not forget that thermal paste gradually loses its qualities and makes cooling worse.



The graphics card is not the only component to consider: a PC is a complex system, and its continued operation depends on the state of many components. Any single bad component, even an unnecessary and unused one such as a floppy drive or decorative lighting, can bring the PC down or cause problems similar to those described for the GPU. For example, a faulty power button can cause shutdowns or reboots. In more detail, for the key components:



  • CPU: in your use case it will probably not work harder than during ordinary day-to-day use, and you almost certainly do not need to overclock it. Modern CPUs have protective mechanisms such as throttling and emergency shutdown and are considered quite durable. Keep the cooler and thermal paste in order, and the CPU is very unlikely to be the weakest point of the system.

  • Motherboard: much the same as the CPU, although PCIe (and possibly the disks, network and peripherals) will be used heavily; again, prefer proven models.

  • RAM: extremely unlikely to fail, so this risk is not worth worrying about. Just use good modules.

  • Disks: for tasks that rely heavily on disk access (data mining, data processing, training a neural network from data on disk), an HDD can become the reliability weak point: in servers and data centres it is common to replace a disk within 1-3 years, and disks rarely survive 5 years or more. RAID 1 and backups increase reliability for 24/7/365 use (RAID 0 sacrifices reliability for performance, and other RAID levels can take a long time to rebuild; also, RAID is not a backup, so do not neglect backups if you need them). With an SSD, write-heavy workloads can exhaust the terabytes-written (TBW) endurance and render the drive useless, so prioritize TBW over other features (see the small endurance sketch at the end of this answer). RAID 1 with SSDs protects against the sudden failure of one disk but does not help with TBW wear. HDD or SSD depends on your needs, budget and preference; either way, prefer models with a proven reliability record and a proper revision.

  • Power supply: heavily loaded by the graphics card and therefore wears faster, so choose a model with a proven reliability record, a proper revision, and a rated power at least 1.5x the overall system consumption, or at least 2-2.5x the main consumers (the GPU and the CPU). Be sure to use a good 220 V AC cable, because a bad mains cable can cause short circuits, electric shock or burning (anything from smoke and self-destruction to an actual fire)!

  • Case fans: they may seem insignificant, but they are crucial in such use cases, and a fan failure is a big problem for a 24/7/365 system. In general, install as many as you can, but also consider the size: bigger fans are quieter and more effective, while smaller ones can sometimes be installed in greater numbers, so the failure of any single fan hurts the system less. The choice is yours.

  • Exotic cooling: water cooling is compact and effective for hot, overclocked systems, but a leak can seriously damage the PC's components. Liquid-nitrogen setups are extremely effective but bulky and expensive, and almost certainly not required here.

Professional enterprise 24/7/365 systems and components are designed for this kind of duty: they have redundancy in every component, even CPUs and BIOSes, and support hot-swapping of components or modules. Yet even they do not achieve 100% uptime (close, but not equal). Professional Nvidia cards are also faster for CUDA (especially neural networks), but I do not think that is your use case.



Assembling the system is no less important than the components themselves: do not skip any step and do not make careless mistakes, and everything should be fine.



Make sure no software will forcibly shut down or reboot the PC, or kill your process. If you are a Windows 10 user, you may think there is no way to entirely disable updates, but there are workarounds and tools on the Web for that (warning: this may violate the EULA).



Peripherals can also cause problems, just like the PC's internal components. For example, a bad or worn mouse can register a button press when none occurred.



About the key external circumstances:



  • Electricity: hopefully the mains power in your house is reliable and stable, because an outage can cost you the results of your work. A UPS helps with short interruptions; for longer ones it only buys you time to hibernate the system or save your progress cleanly.

  • Network: if your task relies on an Internet or network connection, check that the cabling, modem and router are OK.

Summing up: there is no firm guarantee that everything will be fine, and you must accept that the risks will never be zero. But with well-chosen components, proper assembly, and no bad luck with defective parts, you can use the PC this way with lower risk than the question initially assumes, unless you plan to do this for years on end and expect reliability for 5, 10 or more years.
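
As a small, hedged illustration of the TBW point from the Disks bullet above: the sketch below estimates how long an SSD's endurance budget lasts for a given sustained write rate. The endurance rating and write rate are hypothetical placeholder values, not figures from this answer.

    # Hypothetical values for illustration only.
    tbw_rating_tb = 200.0      # endurance stated on a drive's datasheet, in TB written
    write_rate_mb_per_s = 5.0  # sustained average write rate of the workload

    seconds_per_year = 365 * 24 * 3600
    tb_written_per_year = write_rate_mb_per_s * seconds_per_year / 1e6
    years_to_exhaust = tbw_rating_tb / tb_written_per_year

    print(f"~{tb_written_per_year:.0f} TB written per year")
    print(f"~{years_to_exhaust:.1f} years until the TBW budget is exhausted")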






Answer (score 0)

    Is it safe to keep the GPU on 100% utilization for a very long time?


Yes. It's actually safer than using it for its intended purpose, that is, playing a game once in a while.



Most of the wear (on the electronics) comes from mechanical stress caused by changing temperature. The components heat up at different rates and have different thermal expansion coefficients, so every heat-up/cool-down cycle results in forces that try to tear the card apart, often causing micro-damage that accumulates and can eventually lead to failure. Don't be alarmed: it's supposed to take decades. (Unlike the infamous 2006 nVidia laptop GPUs, which used the wrong solder, so the failures occurred soon enough to be noticeable within the component's lifetime.)



If you start your computations and keep them running at a constant rate, it's actually less stressful for the card: it warms up and then stays there, without the thermal cycles.



The only parts that will see increased wear are the fans, which are usually easy to replace.



As to your plan to run at an actual 100% utilization: 100% is inefficient. Learn the lesson that cryptominers taught us: as you underclock and undervolt the card, the flops go down, but the power consumed goes down even more. You get more performance per watt, and an even better lifespan.
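
One practical, hedged way to act on the underclock/undervolt advice above on an NVIDIA card is to cap the board power with nvidia-smi. The sketch below (Python, assuming the standard nvidia-smi tool, administrator/root rights, and a driver/GPU that actually supports software power limits) applies a cap and prints the resulting limits; the 60 W value is just a placeholder to tune for your own card, not a recommendation from this answer.

    """Sketch: cap the GPU's board power to trade a little throughput for
    lower power draw and temperatures. Requires admin/root; support for
    software power limits varies by GPU and driver."""
    import subprocess

    TARGET_WATTS = 60  # hypothetical cap; pick a value suited to your card

    def set_power_limit(watts, gpu_index=0):
        # -i selects the GPU, -pl sets the software power limit in watts.
        subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                       check=True)

    def show_power_limits(gpu_index=0):
        out = subprocess.run(
            ["nvidia-smi", "-i", str(gpu_index),
             "--query-gpu=power.limit,power.default_limit,power.draw",
             "--format=csv"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout

    if __name__ == "__main__":
        set_power_limit(TARGET_WATTS)
        print(show_power_limits())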






    share|improve this answer























      Your Answer








      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "3"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f1433515%2fis-it-safe-to-keep-the-gpu-on-100-utilization-for-a-very-long-time%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      6 Answers
      6






      active

      oldest

      votes








      6 Answers
      6






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      56














      Short answer: This should be safe on well-designed hardware.



      Long answer:
      The GPU (and its software environment: drivers, OS, daemons) are designed to protect from overheating - the GPU should first turn the fans to a higher RPM, if that can't keep a safe temperature then the GPU throttles the workload (usually by reducing the clock frequency). This will assure a heat profile that will not damage the GPU and thus not the PC (or the room).



      Caveat: There exist cheap knock-off graphic cards, where the firmware is specifically designed to sacrifice safety for performance. While I don't think those exist for a 1050, I am not 100% sure. You should also prefer the Nvidia drivers downloaded from their website over "optimized" vendor drivers, which might do the same thing.






      share|improve this answer




















      • 23





        It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

        – TooTea
        May 6 at 9:42







      • 2





        @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

        – Eugen Rieck
        May 6 at 10:02






      • 9





        Because crypto-mining is the canonical example of sacrificing safety against performance.

        – Eugen Rieck
        May 6 at 10:13






      • 6





        Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

        – John Dvorak
        May 6 at 10:13






      • 8





        @JohnDvorak Yes, all non-ancient CPUs employ a similar method

        – Eugen Rieck
        May 6 at 10:16















      56














      Short answer: This should be safe on well-designed hardware.



      Long answer:
      The GPU (and its software environment: drivers, OS, daemons) are designed to protect from overheating - the GPU should first turn the fans to a higher RPM, if that can't keep a safe temperature then the GPU throttles the workload (usually by reducing the clock frequency). This will assure a heat profile that will not damage the GPU and thus not the PC (or the room).



      Caveat: There exist cheap knock-off graphic cards, where the firmware is specifically designed to sacrifice safety for performance. While I don't think those exist for a 1050, I am not 100% sure. You should also prefer the Nvidia drivers downloaded from their website over "optimized" vendor drivers, which might do the same thing.






      share|improve this answer




















      • 23





        It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

        – TooTea
        May 6 at 9:42







      • 2





        @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

        – Eugen Rieck
        May 6 at 10:02






      • 9





        Because crypto-mining is the canonical example of sacrificing safety against performance.

        – Eugen Rieck
        May 6 at 10:13






      • 6





        Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

        – John Dvorak
        May 6 at 10:13






      • 8





        @JohnDvorak Yes, all non-ancient CPUs employ a similar method

        – Eugen Rieck
        May 6 at 10:16













      56












      56








      56







      Short answer: This should be safe on well-designed hardware.



      Long answer:
      The GPU (and its software environment: drivers, OS, daemons) are designed to protect from overheating - the GPU should first turn the fans to a higher RPM, if that can't keep a safe temperature then the GPU throttles the workload (usually by reducing the clock frequency). This will assure a heat profile that will not damage the GPU and thus not the PC (or the room).



      Caveat: There exist cheap knock-off graphic cards, where the firmware is specifically designed to sacrifice safety for performance. While I don't think those exist for a 1050, I am not 100% sure. You should also prefer the Nvidia drivers downloaded from their website over "optimized" vendor drivers, which might do the same thing.






      share|improve this answer















      Short answer: This should be safe on well-designed hardware.



      Long answer:
      The GPU (and its software environment: drivers, OS, daemons) are designed to protect from overheating - the GPU should first turn the fans to a higher RPM, if that can't keep a safe temperature then the GPU throttles the workload (usually by reducing the clock frequency). This will assure a heat profile that will not damage the GPU and thus not the PC (or the room).



      Caveat: There exist cheap knock-off graphic cards, where the firmware is specifically designed to sacrifice safety for performance. While I don't think those exist for a 1050, I am not 100% sure. You should also prefer the Nvidia drivers downloaded from their website over "optimized" vendor drivers, which might do the same thing.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited May 8 at 2:43









      Ender

      132




      132










      answered May 6 at 9:14









      Eugen RieckEugen Rieck

      12.2k22731




      12.2k22731







      • 23





        It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

        – TooTea
        May 6 at 9:42







      • 2





        @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

        – Eugen Rieck
        May 6 at 10:02






      • 9





        Because crypto-mining is the canonical example of sacrificing safety against performance.

        – Eugen Rieck
        May 6 at 10:13






      • 6





        Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

        – John Dvorak
        May 6 at 10:13






      • 8





        @JohnDvorak Yes, all non-ancient CPUs employ a similar method

        – Eugen Rieck
        May 6 at 10:16












      • 23





        It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

        – TooTea
        May 6 at 9:42







      • 2





        @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

        – Eugen Rieck
        May 6 at 10:02






      • 9





        Because crypto-mining is the canonical example of sacrificing safety against performance.

        – Eugen Rieck
        May 6 at 10:13






      • 6





        Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

        – John Dvorak
        May 6 at 10:13






      • 8





        @JohnDvorak Yes, all non-ancient CPUs employ a similar method

        – Eugen Rieck
        May 6 at 10:16







      23




      23





      It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

      – TooTea
      May 6 at 9:42






      It's not just "cheap knock-offs". I've seen six(!) completely independent GeForce 7600GS's from a reputable manufacturer die in the same way due to a presumably inadequate cooling design. This was a fanless "super silent" card used for office work or at most light gaming. However, high-end parts will likely be designed to cope with greater thermal abuse, although likely not for 24/7 loads.

      – TooTea
      May 6 at 9:42





      2




      2





      @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

      – Eugen Rieck
      May 6 at 10:02





      @Klangen The PSU differs from the GPU in that it is usually (apart from servers with a BMC) not actively monitored for temperature. That said, PSUs are designed to "fail safe", i.e. if they fail fail in a way, that they do not create additional damage.

      – Eugen Rieck
      May 6 at 10:02




      9




      9





      Because crypto-mining is the canonical example of sacrificing safety against performance.

      – Eugen Rieck
      May 6 at 10:13





      Because crypto-mining is the canonical example of sacrificing safety against performance.

      – Eugen Rieck
      May 6 at 10:13




      6




      6





      Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

      – John Dvorak
      May 6 at 10:13





      Anecdotal evidence: On my older desktop computer, I've managed to fry both available CPU fan power ports (too much dust), so I decided to see if I could run the machine without a CPU fan by keeping a close eye on the CPU temperature. It hit 90C in a couple of minutes, then slowed down significantly. It was a Pentium, I believe.

      – John Dvorak
      May 6 at 10:13




      8




      8





      @JohnDvorak Yes, all non-ancient CPUs employ a similar method

      – Eugen Rieck
      May 6 at 10:16





      @JohnDvorak Yes, all non-ancient CPUs employ a similar method

      – Eugen Rieck
      May 6 at 10:16













      9














      A house fire is extremely unlikely, but the lifespan of the card may be reduced.



      Long-term overheating of the GPU chip probably won't start a fire. The chip may deteriorate and start misbehaving or die completely, but silicon chips aren't too flammable. Bad things usually happen when electrolytic capacitors fail and blow up, but these won't be subject to overheating just because the card is doing a lot of crunching and you also hopefully have a metal PC case to contain the hot shrapnel that results from such failures.



      However, consumer-grade parts aren't in general designed for long-term 24/7 loads. It is thus fairly likely that the card will die sooner than if it wasn't subject to such loads. It is hard to say how much sooner without having some more statistics on a given model. Some people in the HPC community advocate using high-end gaming GPUs instead of special HPC compute parts, and there seems to be some economical sense in that. Although the commodity parts die in a year or so, it's cheaper to keep replacing them because they're many times cheaper than the alternative






      share|improve this answer


















      • 6





        Mechanical stress is the worst when heating up and cooling down, rather than running at a constant temperature. What the OP is planning to do is no worse than playing GPU-intensive games every day for a few months.

        – Dmitry Grigoryev
        May 6 at 11:55






      • 3





        Unfortunately, Nvidia's license prevents a HPC data center from using consumer-grade gaming GPUs in their servers. We're required to use higher-end GPUs and I currently have a order in for P100's when the researchers would actually prefer 1080Ti cards.

        – doneal24
        May 6 at 16:11






      • 4





        @technical_difficulty The EULA for the nVidia driver says that it doesn't apply to datacenter HPC use (that's certainly a concern for large centers, but it doesn't stop people building in-house HPC clusters from consumer parts). There's a decent writeup here: microway.com/knowledge-center-articles/…

        – TooTea
        May 6 at 19:28






      • 4





        @user912264 Yes, that's what I meant, although presumably with a poor choice of words on my end. I'm by no means a lawyer or a licensing expert. My point is that you're not allowed to use the driver in such a situation (because you can't rely on the normal free license).

        – TooTea
        May 6 at 20:06






      • 2





        Also note that you get more wear on the GPU fans from keeping them at a high speed.

        – user71659
        May 6 at 21:54















      9














      A house fire is extremely unlikely, but the lifespan of the card may be reduced.



      Long-term overheating of the GPU chip probably won't start a fire. The chip may deteriorate and start misbehaving or die completely, but silicon chips aren't too flammable. Bad things usually happen when electrolytic capacitors fail and blow up, but these won't be subject to overheating just because the card is doing a lot of crunching and you also hopefully have a metal PC case to contain the hot shrapnel that results from such failures.



      However, consumer-grade parts aren't in general designed for long-term 24/7 loads. It is thus fairly likely that the card will die sooner than if it wasn't subject to such loads. It is hard to say how much sooner without having some more statistics on a given model. Some people in the HPC community advocate using high-end gaming GPUs instead of special HPC compute parts, and there seems to be some economical sense in that. Although the commodity parts die in a year or so, it's cheaper to keep replacing them because they're many times cheaper than the alternative






      share|improve this answer


















      • 6





        Mechanical stress is the worst when heating up and cooling down, rather than running at a constant temperature. What the OP is planning to do is no worse than playing GPU-intensive games every day for a few months.

        – Dmitry Grigoryev
        May 6 at 11:55






      • 3





        Unfortunately, Nvidia's license prevents a HPC data center from using consumer-grade gaming GPUs in their servers. We're required to use higher-end GPUs and I currently have a order in for P100's when the researchers would actually prefer 1080Ti cards.

        – doneal24
        May 6 at 16:11






      • 4





        @technical_difficulty The EULA for the nVidia driver says that it doesn't apply to datacenter HPC use (that's certainly a concern for large centers, but it doesn't stop people building in-house HPC clusters from consumer parts). There's a decent writeup here: microway.com/knowledge-center-articles/…

        – TooTea
        May 6 at 19:28






      • 4





        @user912264 Yes, that's what I meant, although presumably with a poor choice of words on my end. I'm by no means a lawyer or a licensing expert. My point is that you're not allowed to use the driver in such a situation (because you can't rely on the normal free license).

        – TooTea
        May 6 at 20:06






      • 2





        Also note that you get more wear on the GPU fans from keeping them at a high speed.

        – user71659
        May 6 at 21:54













      9












      9








      9







      A house fire is extremely unlikely, but the lifespan of the card may be reduced.



      Long-term overheating of the GPU chip probably won't start a fire. The chip may deteriorate and start misbehaving or die completely, but silicon chips aren't too flammable. Bad things usually happen when electrolytic capacitors fail and blow up, but these won't be subject to overheating just because the card is doing a lot of crunching and you also hopefully have a metal PC case to contain the hot shrapnel that results from such failures.



      However, consumer-grade parts aren't in general designed for long-term 24/7 loads. It is thus fairly likely that the card will die sooner than if it wasn't subject to such loads. It is hard to say how much sooner without having some more statistics on a given model. Some people in the HPC community advocate using high-end gaming GPUs instead of special HPC compute parts, and there seems to be some economical sense in that. Although the commodity parts die in a year or so, it's cheaper to keep replacing them because they're many times cheaper than the alternative






      share|improve this answer













      A house fire is extremely unlikely, but the lifespan of the card may be reduced.



      Long-term overheating of the GPU chip probably won't start a fire. The chip may deteriorate and start misbehaving or die completely, but silicon chips aren't too flammable. Bad things usually happen when electrolytic capacitors fail and blow up, but these won't be subject to overheating just because the card is doing a lot of crunching and you also hopefully have a metal PC case to contain the hot shrapnel that results from such failures.



      However, consumer-grade parts aren't in general designed for long-term 24/7 loads. It is thus fairly likely that the card will die sooner than if it wasn't subject to such loads. It is hard to say how much sooner without having some more statistics on a given model. Some people in the HPC community advocate using high-end gaming GPUs instead of special HPC compute parts, and there seems to be some economical sense in that. Although the commodity parts die in a year or so, it's cheaper to keep replacing them because they're many times cheaper than the alternative







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered May 6 at 9:55









      TooTeaTooTea

      2295




      2295







      • 6





        Mechanical stress is the worst when heating up and cooling down, rather than running at a constant temperature. What the OP is planning to do is no worse than playing GPU-intensive games every day for a few months.

        – Dmitry Grigoryev
        May 6 at 11:55






      • 3





        Unfortunately, Nvidia's license prevents a HPC data center from using consumer-grade gaming GPUs in their servers. We're required to use higher-end GPUs and I currently have a order in for P100's when the researchers would actually prefer 1080Ti cards.

        – doneal24
        May 6 at 16:11






      • 4





        @technical_difficulty The EULA for the nVidia driver says that it doesn't apply to datacenter HPC use (that's certainly a concern for large centers, but it doesn't stop people building in-house HPC clusters from consumer parts). There's a decent writeup here: microway.com/knowledge-center-articles/…

        – TooTea
        May 6 at 19:28






      • 4





        @user912264 Yes, that's what I meant, although presumably with a poor choice of words on my end. I'm by no means a lawyer or a licensing expert. My point is that you're not allowed to use the driver in such a situation (because you can't rely on the normal free license).

        – TooTea
        May 6 at 20:06






      • 2





        Also note that you get more wear on the GPU fans from keeping them at a high speed.

        – user71659
        May 6 at 21:54























      6














      Yes, the card is likely to wear out sooner if it is under constant load. At small geometries, electromigration is a significant source of device failures, and devices are typically designed with a specific target lifetime in mind. That target might be generous for typical operation (e.g. 5 years of continuous use), but it may not assume running at the maximum operating point for all of that time. As soon as you start overclocking, you can expect the target to drop significantly. (Equally, running at only 80% load might roughly double the lifetime as far as this failure mechanism is concerned.)



      There are of course other failure modes related to running components hot or to thermal cycling; the point here is simply that modern electronics (and even 1980s electronics, when badly designed) can be susceptible to 'wearing out'.
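
      To get a rough feel for the scaling, here is a minimal sketch (not part of the original answer) based on Black's equation for electromigration-limited lifetime, MTTF ∝ J^(-n) · exp(Ea / (k·T)). The exponent n = 2, the activation energy Ea = 0.7 eV, and the 80 °C reference junction temperature are illustrative assumptions, not measured values for any particular card.

          import math

          K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

          def relative_mttf(j_ratio, temp_c, ref_temp_c=80.0, n=2.0, ea_ev=0.7):
              """Lifetime relative to a reference operating point, per Black's equation.

              j_ratio    -- current density relative to the reference (1.0 = reference load)
              temp_c     -- junction temperature (Celsius) at the new operating point
              ref_temp_c -- junction temperature at the reference point (assumed 80 C)
              n, ea_ev   -- illustrative Black's-equation parameters (assumptions)
              """
              t = temp_c + 273.15
              t_ref = ref_temp_c + 273.15
              current_term = j_ratio ** (-n)
              thermal_term = math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t - 1.0 / t_ref))
              return current_term * thermal_term

          # Backing off to ~80% load and letting the die drop from 80 C to 70 C:
          print(relative_mttf(j_ratio=0.8, temp_c=70.0))  # ~3x the reference lifetime under these assumptions

      With these assumed numbers, the cooler, lower-current operating point comes out roughly three times better, which is consistent with the answer's point that modest derating buys a disproportionate amount of lifetime.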






      answered May 7 at 8:35
      – Sean Houlihane























      • But would it wear out more doing the same workload over a shorter period of time? In other words, would running at 100% for some period be worse than running it at an average of 50% for double that time?

        – trlkly
        May 8 at 4:34






      • 2





        Yes, exactly. Higher temperatures or higher voltages mean more fatigue for the same amount of 'work', to a first-order approximation, for this effect.

        – Sean Houlihane
        May 8 at 6:19

























      3














      If your cooling system works and your hardware is even vaguely modern, with on-chip temperature monitoring and thermal throttling/suspend/shutdown, then it's entirely safe. It can't overheat so long as the cooler keeps running, and if that fails, the chips will throttle back until they're no longer producing more heat than can be passively dissipated (which may mean suspending completely, which looks like a hang or crash).



      Worst case, if the throttling doesn't kick in fast and hard enough to compensate for accumulated thermal load, some part of the chip may melt or burn out and you'll end up with a dead board. By that point, though, the protection circuitry should have slammed into a complete emergency shutdown, perhaps even tripping a (temporary or permanent) fuse on the power rail, which prevents the entire input voltage from being dumped randomly across the die and starting an actual fire.



      Thankfully, the PC platform worked out most of the kinks in that kind of thermal protection 10-15 years ago, after the minor scandal of some mid-generation PIIIs and Athlons proving entirely capable of smoking themselves (and thus being a fire risk) if the cooler failed or fell off while the CPU was running at full tilt. One generation of chips later, it could easily be demonstrated that an overclocked high-end processor barely exceeded its maximum rated temperature at the heat-spreader surface even if you tore the heatsink and fan off in the middle of a heavy benchmark. The computer slowed to a crawl or suffered a "fatal" crash (fatal only to the software; the hardware just needed the heatsink/fan refitted and a reboot), but the chips survived and no fire risk arose. Hopefully any GPU maker worth their salt isn't a decade and a half behind the curve, especially when their products already run pretty close to their rated temperature limits.



      However, that doesn't make this kind of treatment entirely "safe" for the transistors on the chip. Heavyweight "number crunching" (Bitcoin? Protein folding?) on GPUs is by now a rather infamous way of literally wearing out the silicon. The combination of high voltage and current, continual switching billions of times per second, and sustained high temperatures stresses the components quite a bit, both the chips and support parts like capacitors, so their operating lifetime can be reduced to barely two years in some cases, at least at full speed. They can then run on a bit longer if derated (maximum clock speed limited, etc.) and employed for less demanding purposes, like last year's games, but they are on borrowed time once they start erroring out at maximum speed.



      So it's not going to catch on fire, but I wouldn't bank on the card still being reliable past its third birthday in that employment...
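
      Because the throttling and shutdown mechanisms described above are safety nets rather than something to rely on, it is worth logging the card's behaviour during a months-long run. A minimal sketch, assuming the nvidia-smi tool that ships with the NVIDIA driver is on the PATH and a single GPU (index 0); the query fields are standard --query-gpu properties, but the 30-second interval and the 85 °C alert threshold are arbitrary choices, not recommendations.

          import subprocess
          import time

          QUERY = "temperature.gpu,utilization.gpu,power.draw,clocks.sm"
          ALERT_TEMP_C = 85  # arbitrary; pick something below your card's rated limit

          def sample():
              """Return one CSV line of GPU telemetry from nvidia-smi (GPU 0)."""
              out = subprocess.run(
                  ["nvidia-smi", "-i", "0", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
                  capture_output=True, text=True, check=True,
              )
              return out.stdout.strip()

          while True:
              line = sample()                      # e.g. "71, 100, 68.52, 1721"
              temp_c = int(line.split(",")[0])
              print(time.strftime("%Y-%m-%d %H:%M:%S"), line, flush=True)
              if temp_c >= ALERT_TEMP_C:
                  print("WARNING: GPU is running hot - check cooling, expect throttling", flush=True)
              time.sleep(30)

      Redirect the output to a file and a quick glance at the log will tell you whether clocks are sagging (throttling) or temperatures are creeping up as dust accumulates or the thermal paste ages.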






      answered May 7 at 18:57
      – tahrey























      • Crypto mining especially tends to operate multiple cards packed onto one mobo with inadequate airflow, resulting in high temperatures. Using one card in a good tower case with proper airflow should be significantly less stressful, although there is wear and tear on the fans. And as Sean's and your answers point out, electromigration from being powered up to full voltage can still be a concern even if temperatures are kept in check.

        – Peter Cordes
        May 7 at 20:34


























      1














      As you mention, ventilation is good, so there is no need to worry about that risk factor.



      As for the GPU itself, it will wear faster than it would under ordinary office use of 8-16 hours a day, so at 100% load 24/7/365 it is unlikely to keep working for 5-10 years or more. You must also consider that the card may have a poorly designed cooler (separate from the PC's overall airflow), a bad board design, software and firmware bugs, or poor production quality and manufacturing defects of varying severity and frequency, from isolated units to whole batches. These factors can worsen heating, cause system failures, shorten the lifetime, cause short circuits, or in the worst case even start a fire or give you an electric shock. Some of these issues depend on the model and revision, some are gradually fixed by software/firmware updates, and some vary from one individual unit to another, so prefer models with a proven reliability record and a mature revision (usually the latest available). A bad card can also interfere with other components, for example by generating extra electrical noise. Finally, remember that thermal paste gradually degrades and makes cooling worse.



      The graphics card is not the only component to consider: a PC is a complex system whose successful operation depends on many parts. Even a small, unnecessary, and unused component gone bad, such as a floppy drive or decorative lighting, can bring the PC down or cause problems similar to those described for the GPU. For example, a faulty power button can cause a shutdown or reboot. In more detail, the key components:



      • CPU: in your use case it will likely work no harder than during ordinary day-to-day use, and you almost certainly do not need to overclock it. Modern CPUs have protective mechanisms such as throttling and emergency shutdown and are considered quite durable. Keep the cooler and thermal paste in order and the CPU is very unlikely to be the weakest point of the system.

      • Motherboard: much the same as the CPU, although there will be heavy use of PCIe and perhaps of disks, network, and peripherals; again, prefer proven models.

      • RAM: extremely unlikely to fail, so this risk is not worth worrying about. Just use good modules.

      • Disks: in tasks that rely on disk I/O (data mining, data processing, training a neural network on data stored on disk), an HDD can become the weak point; in servers and data centres it is common to replace a disk within 1-3 years, and disks rarely survive 5 years or more. You can use RAID 1 and backups to increase reliability for 24/7/365 use (RAID 0 trades reliability for performance, and other RAID levels can take a long time to rebuild; also, RAID is not a backup, so don't neglect backups if the data matters). With SSDs, write-heavy workloads can exhaust the terabytes-written (TBW) endurance and render the drive useless, so weight TBW heavily when choosing one; RAID 1 with SSDs protects against the sudden failure of one drive but does not help with TBW. HDD or SSD is a matter of your needs and budget; prefer models with a proven reliability record and a mature revision.

      • Power supply (PSU): heavily loaded by the graphics card and therefore wears faster, so choose a model with a proven reliability record, rated for at least 1.5x the overall system consumption, or at least 2x-2.5x the draw of the main consumers (GPU and CPU); a rough sizing sketch follows this list. Also use a good mains (e.g. 220 V AC) cable, because a bad AC cable can short-circuit, give you an electric shock, or burn (anything from smoke and self-destruction to an actual fire).

      • Case fans: they may seem insignificant, but they are crucial in this kind of use, and their failure is a big problem for a 24/7/365 system. In general, install as many as you reasonably can, and consider the size: bigger fans are quieter and move more air, while smaller ones can sometimes be fitted in greater numbers so that the failure of a single fan hurts less. The choice is yours.

      • Exotic cooling: water cooling is compact and effective for hot, overclocked systems, but a leak can seriously damage components. Liquid-nitrogen setups are extremely effective but bulky and expensive, and almost certainly unnecessary here.
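
      As a rough worked example of the PSU sizing rule above (all of the numbers except the GTX 1050 Ti's nominal 75 W board power are assumptions for illustration):

          # Rough PSU sizing per the rule of thumb above (illustrative numbers, not measurements).
          gpu_tdp_w = 75          # GTX 1050 Ti board power (reference design)
          cpu_tdp_w = 95          # assumed desktop CPU TDP
          rest_of_system_w = 75   # assumed: motherboard, RAM, disks, fans

          main_consumers_w = gpu_tdp_w + cpu_tdp_w              # 170 W
          whole_system_w = main_consumers_w + rest_of_system_w  # 245 W

          print("2x-2.5x of main consumers:", 2.0 * main_consumers_w, "-", 2.5 * main_consumers_w, "W")  # 340-425 W
          print("1.5x of whole system:", 1.5 * whole_system_w, "W")                                      # ~368 W
          # Either rule lands in roughly the 340-425 W range, so a quality 450-500 W unit has comfortable headroom.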

      Professional enterprise 24/7/365 systems are designed for this kind of duty: they have redundancy in every component, even CPUs and BIOSes, and support hot-swapping of parts or whole modules, yet even they do not achieve 100% uptime (close, but not equal). Professional Nvidia cards are also faster for CUDA (especially for neural networks), but I do not think that is your use case.



      Assembling the system matters no less than the components themselves. Don't skip any step, don't rush, and don't cut corners, and everything should be fine.



      Make sure no software will forcibly shut down or reboot the PC, or kill the process. If you are a Windows 10 user, you may think there is no way to disable updates entirely, but there are workarounds and tools on the Web for that (warning: they may violate the EULA).



      Peripherals can cause problems too, just like the PC's internal components. For example, a faulty or worn mouse can register a button press when none happened.



      As for key external circumstances:

      • Electricity: I hope the power in your house is reliable and stable, because an outage can cost you the results of your work. A UPS can ride out short interruptions, but for longer ones it only buys you time to hibernate the system or save your progress cleanly, so checkpoint your work regularly (see the sketch after this list).

      • Network: if your task relies on an Internet or network connection, make sure the cabling, modem, and router are in good shape.
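
      Since a power cut in the middle of a months-long computation can cost you everything since the last save, it pays to checkpoint intermediate results so the job can resume rather than restart. A minimal sketch, not tied to any particular workload; the state dictionary, file names, and 10-minute interval are placeholder assumptions.

          import json
          import os
          import time

          CHECKPOINT = "checkpoint.json"
          CHECKPOINT_EVERY_S = 600  # assumed: save every 10 minutes

          def load_state():
              """Resume from the last checkpoint if one exists, otherwise start fresh."""
              if os.path.exists(CHECKPOINT):
                  with open(CHECKPOINT) as f:
                      return json.load(f)
              return {"iteration": 0, "partial_result": 0.0}

          def save_state(state):
              """Write atomically: dump to a temp file, then rename over the old checkpoint."""
              tmp = CHECKPOINT + ".tmp"
              with open(tmp, "w") as f:
                  json.dump(state, f)
              os.replace(tmp, CHECKPOINT)

          state = load_state()
          last_save = time.monotonic()

          while state["iteration"] < 1_000_000:      # stand-in loop for the real GPU workload
              state["partial_result"] += 1.0         # ... kernel launches / batches would go here ...
              state["iteration"] += 1
              if time.monotonic() - last_save >= CHECKPOINT_EVERY_S:
                  save_state(state)
                  last_save = time.monotonic()

          save_state(state)

      The atomic rename matters: if the power dies mid-write, the previous checkpoint is still intact.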

      Summing up: there is no firm guarantee that everything will go well (literally, only death is guaranteed), and you have to accept that the risks will never be exactly zero. But with well-chosen components, careful assembly, and no bad luck with defective parts, you can run the PC this way at lower risk than the question initially assumed, unless you plan to do it for many years and expect reliability over 5-10 years or more.






      answered May 8 at 8:59
      – bpalij
















































              0















              Is it safe to keep the GPU on 100% utilization for a very long time?




              Yes. It's actually safer than using it for its intended purpose, that is, playing a game once in a while.



              Most of the wear on the electronics comes from mechanical stress caused by changing temperature. The components heat up at different rates and have different thermal expansion coefficients, so every heat-up/cool-down cycle produces forces that try to tear the card apart, leaving micro-damage that accumulates and can eventually lead to failure. Don't be alarmed: it's supposed to take decades. (Unlike the infamous circa-2006 nVidia laptop GPUs that used the wrong solder, so failures showed up soon enough to be noticeable within the component's lifetime.)



              If you start your computation and keep it running at a constant rate, that's actually less stressful for the card: it warms up once and then stays there, without the thermal cycles.



              The only parts that will see increased wear are the fans, which are usually easy to replace.



              As to running at an actual 100% utilization: 100% is inefficient. Learn the lesson the cryptominers taught us: as you underclock and undervolt the card, the FLOPS go down, but power consumption goes down even more, so you get more performance per watt, and a longer lifespan on top of that.
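
              One practical way to act on that, without touching the VBIOS, is to lower the board power limit with nvidia-smi (the -pl option is part of the standard driver tools and needs administrator/root rights; some boards, particularly ones fed only from the PCIe slot, don't allow changing it at all, and the setting does not persist across reboots). The sketch below is a thin wrapper around those commands; the 60 W figure is only an illustration for a 75 W card, not a recommendation.

                  import subprocess

                  def query_power_limits(gpu_index=0):
                      """Show the current, default, and allowed min/max board power limits."""
                      out = subprocess.run(
                          ["nvidia-smi", "-i", str(gpu_index),
                           "--query-gpu=power.limit,power.default_limit,power.min_limit,power.max_limit",
                           "--format=csv"],
                          capture_output=True, text=True, check=True,
                      )
                      return out.stdout

                  def set_power_limit(watts, gpu_index=0):
                      """Cap the board power draw (requires admin/root rights)."""
                      subprocess.run(
                          ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                          check=True,
                      )

                  print(query_power_limits())
                  set_power_limit(60)  # illustrative: ~80% of a 75 W card; stay within the reported min/max

              The same underclock/undervolt idea can also be applied with vendor overclocking tools; the power-limit route just has the advantage of being scriptable.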






              answered May 8 at 18:26
              – Agent_L



























                0















                Is it safe to keep the GPU on 100% utilization for a very long time?




                Yes. It's actually safer than using it for the intended purpose, that is playing a game once in a while.



                The most wear (of the electronics) comes from mechanical stress from changing temperature. The components heat up at different rates, their thermal expansion coefficients are different, therefore every heat up, cool down cycle results in forces that try to tear the card apart, often resulting in micro-damages that accumulate and can eventually lead to failure. Don't be alarmed, it's supposed to take decades. (Unlike the infamous 2006 nVidia laptop GPUs that used wrong solder so the failures occurred soon enough to be noticeable within component's lifetime)



                If you start your computation and keep them at constant rate, it's actually less stressful to the card, as it warms up and then stays there, without the thermal cycles.



                The only parts that will see increased wear are the fans, which are usually easy to replace.



                As to your plan on actual 100% utilization - 100% is inefficient. Learn from the lesson that cryptominers taught us: as you underclock and undervolt the card, the flops go down, but consumed power goes down even more. You'll get more performance per watt. And even better lifespan.






                share|improve this answer

























                  0












                  0








                  0








                answered May 8 at 18:26 by Agent_L