Backup strategy for millions of files in lots of directories
We have millions of files in lots of directories, for example:
0000.txt
0001.pdf
0002.html
... so on
5551231.txt
Backing these up to tape is slow, because backing up many small files is much slower than backing up a single large file. From the Backup Exec Admin Guide:
"The total number of files on a disk and the relative size of each file impacts backup performance. Fastest backups occur when the disk contains fewer large size files. Slowest backups occur when the disk contains thousands of small files."
Would backup performance increase significantly if I created a virtual hard disk (VHD), hosted the data on it once mounted, and then backed up the VHD file instead?
I'm unsure whether the underlying data within the VHD would affect this.
What are the drawbacks of this method?
backup filesystems lto
asked Aug 17 '14 at 22:17 by Mark Price
Most backup software allows you to run backups to a hard-disk-based staging pool and then relocate those jobs to tape. In this case, the backup archives are created on disk, which is much better suited to this, and then large archive files are written to tape.
– EEAA
Aug 17 '14 at 22:24
What operating system and filesystem are you writing about?
– ewwhite
Aug 17 '14 at 22:54
1. A backup-to-disk job is probably going to be faster than a direct backup-to-tape job. You can then configure and run a duplicate job, which will back up the backup-to-disk files to tape.
2. Yes, hosting the files on a VHD and backing up the VHD should be faster. You'll need to make sure that the backup product you use to back up the VHD allows for file-level restores from the VHD.
– joeqwerty
Aug 17 '14 at 22:57
Why would backup products use the hard drive as a staging file? Surely they would use RAM? I'm only interested in restoring to a point in time, not individual files. I may do an experiment...
– Mark Price
Aug 18 '14 at 8:04
Is this on Windows? If you had access to ZFS you could send/receive snapshots.
– ptman
Aug 18 '14 at 10:42
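The disk-staging idea from the comments above can be illustrated outside any particular backup product. The following is a minimal sketch in Python, assuming the small files live under one source directory and that a staging disk has room for a single archive. The function name `stage_to_archive` and any paths passed to it are hypothetical, and this is not a description of how Backup Exec implements its backup-to-disk jobs.

```python
import os
import tarfile


def stage_to_archive(source_dir, archive_path):
    """Bundle every file under source_dir into one uncompressed tar archive.

    Returns the number of regular files added. The single archive can then
    be written to tape as one large sequential stream.
    Assumption: archive_path lies outside source_dir, otherwise the archive
    would try to include itself. Empty directories are not recorded.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to source_dir so a restore can be
                # unpacked anywhere.
                tar.add(full, arcname=os.path.relpath(full, source_dir))
                count += 1
    return count
```

The per-file overhead is still paid once, when the archive is built on disk, but the tape drive only ever sees one large file.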
2 Answers
Storing lots of small files in a file system that is itself kept as a single file does have some potential benefits.
If the image format is sparse, backups will initially be faster. However, as time passes and files are created and deleted, the image may not remain as sparse. Eventually the image may end up much larger than the files within it, which wastes space on both disk and tape and slows backups down compared to when the image was new.
Another drawback is that if the image is backed up while writes are being performed to the file system inside it, you may end up with a backup whose integrity is not preserved.
answered Aug 17 '14 at 23:05 by kasperd
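One way to watch for the growth described in this answer is to compare the size of the image file with the total size of the files inside it. The following is a minimal sketch in Python, assuming the image is mounted at a path you can walk. The name `image_overhead` is hypothetical, and `os.path.getsize` reports the logical file size, which may differ from what a given backup product actually reads or from the space allocated on disk.

```python
import os


def image_overhead(image_path, mounted_root):
    """Return (image_bytes, content_bytes, ratio) for a mounted disk image.

    ratio is image size divided by the total size of the files found under
    mounted_root; a ratio that keeps growing suggests the image is carrying
    space that the files inside no longer use.
    """
    image_bytes = os.path.getsize(image_path)
    content_bytes = 0
    for root, _dirs, files in os.walk(mounted_root):
        for name in files:
            content_bytes += os.path.getsize(os.path.join(root, name))
    # Avoid dividing by zero when the mounted file system is empty.
    ratio = image_bytes / content_bytes if content_bytes else float("inf")
    return image_bytes, content_bytes, ratio
```

Logging this ratio after each backup would show whether the image is drifting away from the size of its contents.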
I decided to test this myself.
For the test I created a 25GB VHD on Server 2008R2 and attached it.
I then populated it with 20 GB of data: 129,000 files in 1,318 directories.
Then I ran a backup job for the contents of the VHD.
Then I detached the VHD and backed up the VHD file itself.
Below are the results.
Data          Elapsed   Byte Count  Job Rate
VHD           00:09:51  25.0 GB     14,222.00 MB/min
VHD Contents  00:07:38  20.2 GB     9,557.00 MB/min
The elapsed time is longer for the VHD file; however, when scaled up to the actual sizes I'm dealing with, I'm sure the higher job rate will take over.
Also, the VHD Contents job rate seems higher than I would expect. It may be affected by caching from recently creating the files, or by something else, but I can't confirm this right now because the main job is bundled in with other backup data.
I don't have the time or the need to investigate this further at the moment, though I may revisit it sometime in the future.
answered Aug 18 '14 at 10:23 by Mark Price
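As a cross-check on the table in this answer, the end-to-end throughput can be computed from the Byte Count and Elapsed columns alone. The following is a small Python sketch, assuming 1 GB = 1024 MB; the function name `effective_rate_mb_per_min` is hypothetical. The thread does not say how the backup software derives its Job Rate figure, so the two numbers are not necessarily measuring the same thing.

```python
def effective_rate_mb_per_min(gigabytes, elapsed_hms):
    """Convert a byte count in GB and an HH:MM:SS duration into MB/min."""
    hours, minutes, seconds = (int(part) for part in elapsed_hms.split(":"))
    elapsed_minutes = hours * 60 + minutes + seconds / 60
    # Assumption: 1 GB is taken as 1024 MB.
    return gigabytes * 1024 / elapsed_minutes


# Figures copied from the results table above.
for label, gb, elapsed in [("VHD", 25.0, "00:09:51"),
                           ("VHD Contents", 20.2, "00:07:38")]:
    print(f"{label}: {effective_rate_mb_per_min(gb, elapsed):,.0f} MB/min")
```

On these figures both jobs work out to roughly 2,600 to 2,700 MB/min end to end, well below the reported Job Rate values, so a larger test would be needed before drawing conclusions from either column.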