Backup strategy for millions of files in lots of directories



We have millions of files in lots of directories, for example:



0000.txt
0001.pdf
0002.html
... so on
5551231.txt


Backing these up to tape is slow, because backing up data in this form is much slower than backing up a single large file.




The Backup Exec Admin Guide says: "The total number of files on a disk and the relative size of each file impacts backup performance. Fastest backups occur when the disk contains fewer large size files. Slowest backups occur when the disk contains thousands of small files."




Would backup performance increase significantly if we created a virtual hard disk (VHD), stored the data on it once it is mounted, and then backed up the VHD file instead?

I'm unsure whether the data stored inside the VHD would still affect this.

What are the drawbacks of this method?
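Because the guide ties backup speed to the number of files and their average size, it can help to measure both before choosing an approach. The following is a minimal Python sketch, not a Backup Exec feature: `profile_tree` is a hypothetical helper that walks a directory tree and sums the file sizes reported by `os.stat`.

```python
import os
import sys


def profile_tree(root: str) -> tuple[int, int, int]:
    """Return (subdirectory count, file count, total bytes) for the tree under `root`.

    Reads metadata only. Entries that disappear or cannot be stat'ed during
    the walk (for example broken symlinks) are skipped instead of aborting.
    """
    dirs = files = total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            try:
                total += os.stat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
            files += 1
    return dirs, files, total


if __name__ == "__main__" and len(sys.argv) > 1:
    dirs, files, total = profile_tree(sys.argv[1])
    average = total / files if files else 0
    print(f"{files:,} files in {dirs:,} directories, "
          f"{total / 2**30:.2f} GiB in total, {average / 1024:.1f} KiB per file on average")
```

A tree with millions of files averaging a few KiB sits at the slow end the guide describes; a few thousand multi-GiB files sits at the fast end.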







Tags: backup, filesystems, lto






asked Aug 17 '14 at 22:17









Mark Price

  • Most backup software lets you run backups to a disk-based staging pool and then relocate those jobs to tape. The backup archives are created on disk, which is much better suited to this workload, and the resulting large archive files are then written to tape.

    – EEAA
    Aug 17 '14 at 22:24











  • What operating system and filesystem are you writing about?

    – ewwhite
    Aug 17 '14 at 22:54











  • 1. A backup-to-disk job is probably going to be faster than a direct backup-to-tape job. You can then configure and run a duplicate job that copies the backup-to-disk files to tape. 2. Yes, hosting the files on a VHD and backing up the VHD should be faster. You'll need to make sure that the product you use to back up the VHD supports file-level restores from the VHD.

    – joeqwerty
    Aug 17 '14 at 22:57











  • Why would backup products use the hard drive for staging? Surely they would use RAM? I'm only interested in restoring to a point in time, not individual files. I may do an experiment...

    – Mark Price
    Aug 18 '14 at 8:04











  • Is this on Windows? If you had access to ZFS you could send/receive snapshots.

    – ptman
    Aug 18 '14 at 10:42
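The disk-staging approach from the first comment can be approximated by hand: pack the small files into one large archive on a staging disk, then send only that archive to tape. A minimal sketch using Python's standard `tarfile` module; `stage_tree` and the command-line paths are hypothetical, and commercial products such as Backup Exec use their own formats rather than tar.

```python
import os
import sys
import tarfile


def stage_tree(source_dir: str, archive_path: str) -> int:
    """Pack every file under `source_dir` into one uncompressed tar archive.

    Returns the number of files added. Empty directories are not recorded.
    Mode "w" writes without compression, on the assumption that the tape
    drive (LTO drives support hardware compression) handles that step.
    """
    count = 0
    with tarfile.open(archive_path, "w") as archive:
        for dirpath, _dirnames, filenames in os.walk(source_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Store paths relative to source_dir so a restore can go anywhere.
                archive.add(full, arcname=os.path.relpath(full, source_dir), recursive=False)
                count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    added = stage_tree(sys.argv[1], sys.argv[2])
    print(f"Staged {added:,} files into {sys.argv[2]}")
```

Run it as `python stage.py <source_dir> <archive.tar>`, then point the tape job at the single archive file. A single member can later be read back with `tarfile.open(...).extractfile(name)`, although from tape that generally means reading through the archive up to that member.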

















2 Answers

Storing lots of small files in a file system that is itself kept as a single file does have some potential benefits.



If the image format is sparse, backups will initially be faster. However, as time passes and files are created and deleted, the image may not remain sparse. Eventually it may end up much larger than the files it contains, which wastes space on both disk and tape and slows backups down compared with when the image was new.
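The effect can be seen with an ordinary sparse file used as a stand-in for a sparse disk image. This is an assumption (VHD internals differ), and the sketch is POSIX-only because it relies on `st_blocks`. It compares the apparent size with the space actually allocated as data is written and then overwritten with zeros, mimicking a guest file system that no longer needs those blocks. `sparse_demo` and `allocated_bytes` are hypothetical helpers.

```python
import os
import tempfile


def allocated_bytes(path: str) -> int:
    """Space actually allocated on disk; POSIX reports st_blocks in 512-byte units."""
    return os.stat(path).st_blocks * 512


def sparse_demo(path: str, size: int = 64 * 2**20, chunk: int = 8 * 2**20) -> list[tuple[str, int, int]]:
    """Return (stage, apparent bytes, allocated bytes) at each step of an image's life."""
    stages = []

    def record(stage: str) -> None:
        stages.append((stage, os.path.getsize(path), allocated_bytes(path)))

    with open(path, "wb") as f:
        f.truncate(size)  # extend without writing anything: the whole file is a hole
    record("new sparse image")

    with open(path, "r+b") as f:
        f.write(os.urandom(chunk))  # like the guest creating files: real blocks get allocated
        f.flush()
        os.fsync(f.fileno())
    record("after writes")

    with open(path, "r+b") as f:
        f.write(bytes(chunk))  # like the guest freeing that space: zeros are still data
        f.flush()
        os.fsync(f.fileno())
    record("after zeroing")
    return stages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for stage, apparent, allocated in sparse_demo(os.path.join(tmp, "image.bin")):
            print(f"{stage:<17} apparent {apparent / 2**20:6.1f} MiB, "
                  f"allocated {allocated / 2**20:6.1f} MiB")
```

On most file systems the allocated figure rises after the writes and does not fall after the zeroing, because writing zeros does not punch holes; file systems that compress data, such as ZFS, can be an exception.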



Another drawback is that if the image is backed up while anything is writing to the file system inside it, you may end up with a backup whose integrity is not preserved.
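The test in the other answer sidesteps this by detaching the VHD before backing up the file. Where that is not possible, a crude check is to compare the image's size and modification time before and after the copy. The sketch below (`copy_if_unchanged` is a hypothetical name) is only a heuristic: it can flag some concurrent writes but cannot prove the copy is consistent, so a snapshot, such as a Volume Shadow Copy on Windows, remains the more reliable option.

```python
import os
import shutil


def copy_if_unchanged(src: str, dst: str) -> bool:
    """Copy `src` to `dst` and report whether the source looked unchanged throughout.

    Compares size and modification time (in nanoseconds) before and after the
    copy. A False result means the copy should not be trusted; a True result
    is not proof of consistency, since timestamps can be coarse or updated lazily.
    """
    before = os.stat(src)
    shutil.copyfile(src, dst)
    after = os.stat(src)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A backup script could retry the copy, or fall back to detaching the image, whenever this returns False.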







answered Aug 17 '14 at 23:05

kasperd

I decided to test this myself.

For the test, I created a 25 GB VHD on Windows Server 2008 R2 and attached it.

I then populated it with 20 GB of data: 129,000 files in 1,318 directories.

Next, I ran a backup job for the contents of the VHD.
Then I detached the VHD and backed up the VHD file itself.
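A test tree of roughly the same shape can be generated with a script like the one below. It is a minimal sketch; the answer does not say how the test data was produced. `make_test_tree` is hypothetical, every file gets the same size (a simplification), and the contents are random so that compression in the software or drive does not flatter the result.

```python
import os
import sys


def make_test_tree(root: str, n_dirs: int, files_per_dir: int, file_size: int) -> int:
    """Create `n_dirs` numbered directories under `root`, each holding `files_per_dir` files.

    Each file contains `file_size` random bytes. Names follow the numbered
    pattern from the question. Returns the number of files written.
    """
    written = 0
    for d in range(n_dirs):
        directory = os.path.join(root, f"{d:04d}")
        os.makedirs(directory, exist_ok=True)
        for _ in range(files_per_dir):
            with open(os.path.join(directory, f"{written:07d}.dat"), "wb") as f:
                f.write(os.urandom(file_size))
            written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 2:
    # Roughly the shape of the test: 1,318 directories x 98 files = 129,164 files
    # of 160 KiB each, about 19.7 GiB. The target path comes from the command line.
    print(make_test_tree(sys.argv[1], n_dirs=1318, files_per_dir=98, file_size=160 * 1024))
```

Pointing it at the mounted VHD's drive letter would fill the image; running it against a plain folder gives a baseline for the file-by-file backup.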



Below are the results.

Data           Elapsed    Byte Count   Job Rate
VHD            00:09:51   25.0 GB      14,222.00 MB/min
VHD Contents   00:07:38   20.2 GB       9,557.00 MB/min


The elapsed time is longer for the VHD file; however, when scaled up to the actual data sizes I'm dealing with, I'm sure the higher job rate will take over.
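The Job Rate column cannot be reproduced by dividing Byte Count by Elapsed time, presumably because Backup Exec measures the rate over a different interval (an assumption; the table does not say). Doing that wall-clock division for both rows, taking 1 GB = 1024 MB, gives about 2,599 MB/min for the VHD file and about 2,710 MB/min for its contents, so on elapsed time alone the two jobs are close. The helper names below are hypothetical.

```python
def parse_elapsed(hms: str) -> float:
    """Convert an 'HH:MM:SS' elapsed time to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def effective_rate_mb_per_min(size_gb: float, elapsed: str, mb_per_gb: int = 1024) -> float:
    """Data moved divided by wall-clock time, in MB/min (1 GB = mb_per_gb MB)."""
    return size_gb * mb_per_gb / parse_elapsed(elapsed)


# Byte Count and Elapsed values copied from the results table.
results = {
    "VHD": (25.0, "00:09:51"),
    "VHD Contents": (20.2, "00:07:38"),
}

for name, (size_gb, elapsed) in results.items():
    print(f"{name}: {effective_rate_mb_per_min(size_gb, elapsed):,.0f} MB/min")
```

Comparing wall-clock rates like this at a few different data sizes would show whether the VHD job's advantage actually appears as the data grows.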



Also, the VHD Contents job rate seems higher than I would expect. It may have been affected by caching, since the files had only just been created, or by something else, but I can't confirm this right now because the main job is bundled in with other backup data.

I don't have the time or the need to investigate this further at the moment, though I may revisit it in the future.







answered Aug 18 '14 at 10:23

Mark Price
