Backup strategy for millions of files in lots of directories

We have millions of files in lots of directories, for example:

0000.txt
0001.pdf
0002.html
... and so on
5551231.txt

Backing these up to tape is slow, because backing up many small files is much slower than backing up a single large file.

"The total number of files on a disk and the relative size of each file impacts backup performance. Fastest backups occur when the disk contains fewer large size files. Slowest backups occur when the disk contains thousands of small files." (Backup Exec Admin Guide)

Would backup performance increase significantly if I created a virtual hard disk (VHD), hosted the data on it once mounted, and then backed up the VHD file instead?

I'm unsure whether the underlying data within the VHD would affect this.

What are the drawbacks of this method?

asked Aug 17 '14 at 22:17

Mark Price


Most backup software allows you to run backups to a hard-disk-based staging pool and then relocate those jobs to tape. In this case, the backup archives are created on disk, which is much better suited to this, and then large archive files are written to tape.

– EEAA
Aug 17 '14 at 22:24

What operating system and filesystem are you writing about?

– ewwhite
Aug 17 '14 at 22:54

1. A backup-to-disk job is probably going to be faster than a direct backup-to-tape job. You can then configure and run a duplicate job, which will back up the backup-to-disk files to tape. 2. Yes, hosting the files on a VHD and backing up the VHD should be faster. You'll need to make sure that the backup product you use to back up the VHD allows file-level restores from the VHD.

– joeqwerty
Aug 17 '14 at 22:57

Why would backup products use the hard drive as a staging area? Surely they would use RAM? I'm only interested in restoring to a point in time, not individual files. I may do an experiment...

– Mark Price
Aug 18 '14 at 8:04

Is this on Windows? If you had access to ZFS you could send/receive snapshots.

– ptman
Aug 18 '14 at 10:42
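
The disk-staging idea raised in the comments can be illustrated with a minimal sketch: pack the many small files into one large archive on disk, then hand that single file to the tape job. This is not how Backup Exec or any other product implements staging; the function name `stage_to_archive`, the demo directory layout, and the choice of an uncompressed tar archive are all assumptions made for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def stage_to_archive(source_dir, archive_path):
    """Pack every regular file under source_dir into one uncompressed tar.

    archive_path must live outside source_dir, otherwise the archive
    would try to include itself. Returns the number of files added.
    """
    source_dir = Path(source_dir)
    count = 0
    # Mode "w" writes a plain tar with no compression, so the result is
    # one large sequential file, which is what tape drives handle best.
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())
                count += 1
    return count


def demo():
    """Build a small hypothetical tree, stage it, and count archive members."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "data"
        for i in range(20):
            sub = source / f"dir{i % 4}"
            sub.mkdir(parents=True, exist_ok=True)
            (sub / f"{i:04d}.txt").write_text(f"file {i}\n")
        archive = root / "staging.tar"
        added = stage_to_archive(source, archive)
        with tarfile.open(archive) as tar:
            members = len(tar.getmembers())
        return added, members
```

The per-file cost of walking the tree is still paid while the archive is built, but it is paid against disk rather than against the tape drive. Restoring a point in time then means restoring one archive, which fits the requirement stated in the comments.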


Tags: backup, filesystems, lto

2 Answers

Storing lots of small files in a file system that is itself kept as a file does have some potential benefits.

If the format of this file is sparse, then the backups will initially be faster. However, as time passes and files are created and deleted, the image may not remain as sparse. Eventually the image may end up being much larger than the files within it, which wastes space on both disk and tape and slows backups down compared to when the image was new.

Another drawback of the image is that if it is being backed up while any writes are being performed to the file system inside the image, you may end up with a backup where integrity is not preserved.

answered Aug 17 '14 at 23:05

kasperd
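
One way to watch for the growth described in this answer is to compare an image's apparent size with the space actually allocated to it. The sketch below is an assumption-laden illustration: it uses a plain sparse file rather than a real VHD, the helper names are hypothetical, and it relies on `st_blocks`, which exists on POSIX systems but not on Windows.

```python
import os
import tempfile


def image_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    st_blocks is reported in 512-byte units on POSIX systems. It is
    not available on Windows, so this sketch is POSIX-only.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_image(path, size_bytes):
    """Create a file of size_bytes without writing any data to it."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def demo():
    """Measure a 64 MiB image before and after 1 MiB is written into it."""
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "image.img")
        make_sparse_image(image, 64 * 1024 * 1024)
        before = image_sizes(image)
        # Simulate the file system inside the image writing some data.
        with open(image, "r+b") as f:
            f.seek(16 * 1024 * 1024)
            f.write(b"x" * (1024 * 1024))
        after = image_sizes(image)
        return before, after
```

On a file system that supports sparse files, the allocated figure starts near zero and grows as data is written, while the apparent size stays fixed. Tracking the allocated figure over time would show whether the image is drifting away from the size of the files it holds.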


I decided to test this myself.

For the test I created a 25 GB VHD on Server 2008 R2 and attached it.

I then populated it with 20 GB of data: 129,000 files in 1,318 directories.

Then I ran a backup job for the contents of the VHD.
Then I detached the VHD and backed up the VHD file itself.

Below are the results.

Data          Elapsed   Byte Count  Job Rate
VHD           00:09:51  25.0 GB     14,222.00 MB/min
VHD Contents  00:07:38  20.2 GB     9,557.00 MB/min

The elapsed time is longer for the VHD file; however, when scaled up to the actual sizes I'm dealing with, I'm sure the higher job rate will take over.

Also, the VHD Contents job rate seems higher than I would expect. It may be affected by caching from recently creating the files, or by something else, but I can't confirm this right now because the main job is bundled in with other backup data.

I don't have the time or the need to investigate this further at the moment, though I may revisit it sometime in the future.

answered Aug 18 '14 at 10:23

Mark Price
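
As a cross-check on the results table in this answer, the end-to-end throughput can be derived from the Elapsed and Byte Count columns alone. The sketch below is a plain arithmetic exercise, not part of the original test; the helper names are hypothetical, and it assumes 1 GB = 1024 MB, which the backup report does not state.

```python
def parse_hms(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def effective_mb_per_min(elapsed, gigabytes, mb_per_gb=1024):
    """End-to-end throughput: data moved divided by total elapsed time.

    mb_per_gb=1024 is an assumption about the units used in the report.
    """
    minutes = parse_hms(elapsed) / 60
    return gigabytes * mb_per_gb / minutes


# Values copied from the results table: (elapsed, GB, reported MB/min).
results = {
    "VHD": ("00:09:51", 25.0, 14222.00),
    "VHD Contents": ("00:07:38", 20.2, 9557.00),
}

for name, (elapsed, gb, reported) in results.items():
    rate = effective_mb_per_min(elapsed, gb)
    print(f"{name:<13} effective {rate:8.1f} MB/min, reported {reported:9.2f} MB/min")
```

Under that unit assumption both jobs come out at roughly 2,600 to 2,700 MB/min end to end, well below the reported job rates. One possible reading, offered only as a guess, is that the elapsed time includes fixed overhead that the job rate excludes; if so, a 20 GB sample may be too small for the job-rate difference to show up in wall-clock time.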
