Faster rsync of huge directory which was not changed



We use rsync to backup servers.



Unfortunately the network to some servers is slow.



It takes up to five minutes for rsync to detect that nothing has changed in huge directories. These huge directory trees contain a lot of small files (about 80k files).



I guess that the rsync client sends data for each of the 80k files.



Since the network is slow, I would like to avoid sending information about each of the 80k files.



Is there a way to tell rsync to compute a hash sum of a subdirectory tree?



This way the rsync client would send only a few bytes for a huge directory tree.
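For illustration only (this is not an rsync feature; the path is an example), such a per-tree fingerprint could be computed from file metadata alone using GNU find:

# hash only metadata (path, size, mtime); if this matches on both sides,
# the whole tree could be skipped
find attachments/200 -type f -printf '%p %s %T@\n' | sort | md5sum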



Update



Up to now my strategy has been to use rsync, but if a different tool fits better here, I am able to switch. Both server and client are under my control.



Update2



There are 80k files in one directory tree. No single directory has more than 2k files or subdirectories.



Update3



Details on the slowness of the network:



time ssh einswp 'cd attachments/200 && ls -lLR' >/tmp/list
real 0m2.645s


Size of the /tmp/list file: 2 MByte



time scp einswp:/tmp/list tmp/
real 0m2.821s


Conclusion: scp has the same speed (no surprise)



time scp einswp:tmp/100MB tmp/
real 1m24.049s


Speed: 1.2MB/s










Tags: rsync, synchronization

asked Jan 4 '16 at 8:53 by guettli, edited Jan 7 '16 at 15:53



















  • You might read up on zsync. I have not used it myself, but from what I read, it pre-renders the metadata on the server side and might just speed up transfers in your case. It might be worth testing anyway. Beyond that, the only other solution I am aware of is the real-time block-level synchronization that comes with some SAN/NAS solutions.

    – Aaron
    Jan 5 '16 at 4:47
4 Answers
Some unrelated points:



80K is a lot of files.



80,000 files in one directory? No operating system or app handles that situation very well by default. You just happen to notice this problem with rsync.



Check your rsync version



Modern rsync handles large directories a lot better than in the past. Be sure you are using the latest version.



Even old rsync handles large directories fairly well over high latency links... but 80k files isn't large...it is huge!



That said, rsync's memory usage is directly proportional to the number of files in a tree. Large directories take a large amount of RAM. The slowness may be due to a lack of RAM on either side. Do a test run while watching memory usage. Linux uses any left-over RAM as a disk cache, so if you are running low on RAM, there is less disk caching. If you run out of RAM and the system starts using swap, performance will be really bad.
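A minimal way to do that test run, assuming standard Linux tools, is to watch free memory and swap on both hosts while rsync is running:

# on each side, while the rsync job runs
watch -n 5 free -m
vmstat 5     # non-zero si/so columns mean the box is swapping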



Make sure --checksum is not being used



--checksum (or -c) requires reading each and every block of every file. You probably can get by with the default behavior of just reading the modification times (stored in the inode).
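For example (paths and host are placeholders), the first command relies on the default quick check, while the second forces full checksums and should be avoided for routine runs:

# default quick check: size + mtime only
rsync -a /data/attachments/ einswp:/backup/attachments/

# -c / --checksum: reads every byte of every file on both sides
rsync -ac /data/attachments/ einswp:/backup/attachments/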



Split the job into small batches.



There are some projects like Gigasync which will "Chop up the workload by using perl to recurse the directory tree, building smallish lists of files to transfer with rsync."



The extra directory scan is going to be a large amount of overhead, but maybe it will be a net win.
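A do-it-yourself version of the same idea, with no extra tools, is one rsync invocation per top-level subdirectory (paths and host are examples):

# each run only has to build and compare a small file list
for d in /data/attachments/*/ ; do
    rsync -a "$d" "einswp:/backup/attachments/$(basename "$d")/"
done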



OS defaults aren't made for this situation.



If you are using Linux/FreeBSD/etc with all the defaults, performance will be terrible for all your applications. The defaults assume smaller directories, so as not to waste RAM on oversized caches.



Tune your filesystem to better handle large directories: Do large folder sizes slow down IO performance?



Look at the "namei cache"



BSD-like operating systems have a cache that accelerates looking up a name to the inode (the "namei" cache). There is a namei cache for each directory. If it is too small, it is more of a hindrance than an optimization. Since rsync does an lstat() on each file, the inode is accessed for every one of the 80k files. That might be blowing your cache. Research how to tune file-directory performance on your system.
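On Linux, the closest equivalent knob is the dentry/inode cache pressure; as an example (the default is 100), a lower value makes the kernel hold on to those caches longer:

# Linux only; favours keeping dentry/inode caches over the page cache
sysctl -w vm.vfs_cache_pressure=50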



Consider a different file system



XFS was designed to handle larger directories. See Filesystem large number of files in a single directory



Maybe 5 minutes is the best you can do.



Consider calculating how many disk blocks are being read, and calculate how fast you should expect the hardware to be able to read that many blocks.



Maybe your expectations are too high. Consider how many disk blocks must be read to do an rsync with no changed files: each server will need to read the directory and read one inode per file. Let's assume nothing is cached because, well, 80k files have probably blown your cache. Let's say that it is 80k blocks to keep the math simple. That's about 40M of data, which should be readable in a few seconds. However, if there needs to be a disk seek between each block, that could take much longer.



So you are going to need to read about 80,000 disk blocks. How fast can your hard drive do that? Considering that this is random I/O, not a long linear read, 5 minutes might be pretty excellent. That's 600 s / 80,000 reads, or a disk read every 7.5 ms. Is that fast or slow for your hard drive? It depends on the model.



Benchmark against something similar



Another way to think about it is this: if no files have changed, ls -Llr does the same amount of disk activity but never reads any file data (just metadata). The time ls -Llr takes to run is your upper bound (a timing sketch follows the two questions below).



  • Is rsync (with no files changed) significantly slower than ls -Llr? Then the options you are using for rsync can be improved. Maybe -c is enabled or some other flag that reads more than just directories and metadata (inode data).


  • Is rsync (with no files changed) nearly as fast as ls -Llr? Then you've tuned rsync as best as you can. You have to tune the OS, add RAM, get faster drives, change filesystems, etc.
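A comparison along these lines, where the paths and host are placeholders and -n makes rsync do the scan without transferring anything:

# metadata-only baseline
time ls -lLR /data/attachments > /dev/null

# rsync doing only its scan/compare work
time rsync -a -n --stats /data/attachments/ einswp:/backup/attachments/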


Talk to your devs



80k files is just bad design. Very few file systems and system tools handle such large directories very well. If the filenames are like abcdefg.txt, consider storing them in abcd/abcdefg.txt (note the repeated prefix). This breaks the directories up into smaller ones, but doesn't require a huge change to the code.
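A rough sketch of that re-sharding (the directory, extension and four-character prefix are only an example):

for f in /data/attachments/*.txt ; do
    name=$(basename "$f")
    prefix=${name:0:4}                      # abcdefg.txt -> abcd
    mkdir -p "/data/attachments/$prefix"
    mv "$f" "/data/attachments/$prefix/$name"
done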



Also.... consider using a database. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database. MariaDB or MySQL or PostgreSQL would be a much better option for storing large amounts of data.



Hey, what's wrong with 5 minutes?



Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However, if 5 minutes is "good enough" for your customers, then it is good enough for you. If you don't have a written SLA, how about having an informal discussion with your users to find out how fast they expect the backups to be?



I assume you wouldn't have asked this question if there weren't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.



Update: After some discussion we determined that the bottleneck is the network. I'm going to recommend 2 things before I give up :-).



  • Try to squeeze more bandwidth out of the pipe with compression. However, compression requires more CPU, so if your CPU is overloaded, it might make performance worse. Try rsync with and without -z, and configure your ssh with and without compression. Time all four combinations to see if any of them perform significantly better than the others (a timing sketch follows this list).

  • Watch network traffic to see if there are any pauses. If there are pauses, you can find what is causing them and optimize there. If rsync is always sending, then you really are at your limit. Your choices are:

    • a faster network

    • something other than rsync

    • move the source and destination closer together. If you can't do that, can you rsync to a local machine then rsync to the real destination? There may be benefits to doing this if the system has to be down during the initial rsync.
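The four combinations could be timed like this (host and paths are placeholders):

time rsync -a  -e 'ssh -o Compression=no'  /data/attachments/ einswp:/backup/attachments/
time rsync -az -e 'ssh -o Compression=no'  /data/attachments/ einswp:/backup/attachments/
time rsync -a  -e 'ssh -o Compression=yes' /data/attachments/ einswp:/backup/attachments/
time rsync -az -e 'ssh -o Compression=yes' /data/attachments/ einswp:/backup/attachments/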






answered Jan 5 '16 at 14:50 by TomOnTime, edited Nov 13 '18 at 11:28 by mkucharski

























  • 80K is a lot of files: There are 80k files in one directory tree. No single directory has more than 2k files/subdirectories.

    – guettli
    Jan 7 '16 at 8:50












  • Check your rsync version: done. Make sure --checksum is not being used: done. Split the job into small batches: thank you, I will have a look at Gigasync. OS defaults aren't made for this situation: done (the bottleneck is the network, not the OS). Look at the "namei cache": done (it is the net, not the OS). Consider a different file system: again the net, not the OS. Maybe 5 minutes is the best you can do: I think it could be much faster. Talk to your devs (use a DB): this would be a giant change. Maybe a filesystem with better backup support would solve it.

    – guettli
    Jan 7 '16 at 8:58











  • 2k files per directory is a lot better. thank you for the update. You hadn't mentioned that the network was slow. Is it low bandwidth, high latency, or both? rsync usually performs well on high latency links (it was developed by someone working on his PhD from Australia while dealing with computers in the U.S.). Try doing that "ls -lLR" over ssh and time how long it takes to transmit the result. "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list". Make sure the /tmp/list gets created on the local host.

    – TomOnTime
    Jan 7 '16 at 13:40











  • Yes, the network is slow. It is a pity.

    – guettli
    Jan 7 '16 at 15:40











  • How slow? If you use "scp" to copy a 100M file, how long does it take? Also, what is the output of "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list"?

    – TomOnTime
    Jan 7 '16 at 15:41
































No, that's not possible with rsync and it would be quite inefficient in another regard:



Normally, rsync only compares file modification dates and file sizes. Your approach would force it to read and checksum the content of all files twice (on the local and remote system) to find changed directories.


























  • AFAIK rsync checks mtime and size. If both match, the file is not transferred again (at least with the default settings). It would be enough to send a hash of the (filename, size, mtime) tuples. There is no need to checksum the content.

    – guettli
    Jan 4 '16 at 10:59











  • Yes, you are correct, but anyway, rsync doesn't do this.

    – Sven
    Jan 4 '16 at 10:59
































For synchronisation of large numbers of files (where little has changed), it is also worth setting noatime on the source and destination partitions. This saves writing access times to the disk for each unchanged file.
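On Linux this looks something like the following (mount point and device are placeholders; the fstab entry makes it permanent):

# one-off remount
mount -o remount,noatime /data

# /etc/fstab entry
/dev/sdb1  /data  ext4  defaults,noatime  0  2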





























  • Yes, the noatime option makes sense. We have been using it for several years. I guess an alternative to rsync is needed.

    – guettli
    Aug 28 '17 at 7:07
































Use rsync in daemon mode at the server end to speed up the listing/checksum process:



  • Rsync daemon: is it really useful?

  • http://giantdorks.org/alain/achieve-faster-file-transfer-times-by-running-rsync-as-a-daemon/

Note that the daemon protocol isn't encrypted, but it may be possible to tunnel it without losing the listing performance improvement.



Also, having rsync do the compression rather than ssh should improve performance.
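A minimal sketch of such a setup (module name, path and host are examples):

# /etc/rsyncd.conf on the server, then start it with: rsync --daemon
[attachments]
    path = /data/attachments
    read only = yes

# on the client: the double colon (or rsync://) selects the daemon protocol
rsync -az einswp::attachments/ /backup/attachments/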































    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "2"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f746551%2ffaster-rsync-of-huge-directory-which-was-not-changed%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    28





    +50









    Some unrelated points:



    80K is a lot of files.



    80,000 files in one directory? No operating system or app handles that situation very well by default. You just happen to notice this problem with rsync.



    Check your rsync version



    Modern rsync handles large directories a lot better than in the past. Be sure you are using the latest version.



    Even old rsync handles large directories fairly well over high latency links... but 80k files isn't large...it is huge!



    That said, rsync's memory usage is directly proportional to the number of files in a tree. Large directories take a large amount of RAM. The slowness may be due to a lack of RAM on either side. Do a test run while watching memory usage. Linux uses any left-over RAM as a disk cache, so if you are running low on RAM, there is less disk caching. If you run out of RAM and the system starts using swap, performance will be really bad.



    Make sure --checksum is not being used



    --checksum (or -c) requires reading each and every block of every file. You probably can get by with the default behavior of just reading the modification times (stored in the inode).



    Split the job into small batches.



    There are some projects like Gigasync which will "Chop up the workload by using perl to recurse the directory tree, building smallish lists of files to transfer with rsync."



    The extra directory scan is going to be a large amount of overhead, but maybe it will be a net win.



    OS defaults aren't made for this situation.



    If you are using Linux/FreeBSD/etc with all the defaults, performance will be terrible for all your applications. The defaults assume smaller directories so-as not to waste RAM on oversized caches.



    Tune your filesystem to better handle large directories: Do large folder sizes slow down IO performance?



    Look at the "namei cache"



    BSD-like operating systems have a cache that accelerates looking up a name to the inode (the "namei" cache"). There is a namei cache for each directory. If it is too small, it is a hindrance more than an optimization. Since rsync is doing a lstat() on each file, the inode is being accessed for every one of the 80k files. That might be blowing your cache. Research how to tune file directory performance on your system.



    Consider a different file system



    XFS was designed to handle larger directories. See Filesystem large number of files in a single directory



    Maybe 5 minutes is the best you can do.



    Consider calculating how many disk blocks are being read, and calculate how fast you should expect the hardware to be able to read that many blocks.



    Maybe your expectations are too high. Consider how many disk blocks must be read to do an rsync with no changed files: each server will need to read the directory and read one inode per file. Let's assume nothing is cached because, well, 80k files has probably blown your cache. Let's say that it is 80k blocks to keep the math simple. That's about 40M of data, which should be readable in a few seconds. However if there needs to be a disk seek between each block, that could take much longer.



    So you are going to need to read about 80,000 disk blocks. How fast can your hard drive do that? Considering that this is random I/O, not a long linear read, 5 minutes might be pretty excellent. That's 1 / (80000 / 600), or a disk read every 7.5ms. Is that fast or slow for your hard drive? It depends on the model.



    Benchmark against something similar



    Another way to think about it is this. If no files have changed, ls -Llr does the same amount of disk activity but never reads any file data (just metadata). The time ls -Llr takes to run is your upper bound.



    • Is rsync (with no files changed) significantly slower than ls -Llr? Then the options you are using for rsync can be improved. Maybe -c is enabled or some other flag that reads more than just directories and metadata (inode data).


    • Is rsync (with no files changed) nearly as fast as ls -Llr? Then you've tuned rsync as best as you can. You have to tune the OS, add RAM, get faster drives, change filesystems, etc.


    Talk to your devs



    80k files is just bad design. Very few file systems and system tools handle such large directories very well. If the filenames are abcdefg.txt, consider storing them in abdc/abcdefg.txt (note the repetition). This breaks the directories up into smaller ones, but doesn't require a huge change to the code.



    Also.... consider using a database. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database. MariaDB or MySQL or PostgreSQL would be a much better option for storing large amounts of data.



    Hey, what's wrong with 5 minutes?



    Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However if 5 minutes is "good enough" for your customers, then it is good enough for you. If you don't have a written SLA, how about an informal discussion with your users to find out how fast they expect the backups to take.



    I assume you didn't ask this question if there wasn't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.



    Update: After some discussion we determined that the bottleneck is the network. I'm going to recommend 2 things before I give up :-).



    • Try to squeeze more bandwidth out of the pipe with compression. However compression requires more CPU, so if your CPU is overloaded, it might make performance worse. Try rsync with and without -z, and configure your ssh with and without compression. Time all 4 combinations to see if any of them perform significantly better than others.

    • Watch network traffic to see if there are any pauses. If there are pauses, you can find what is causing them and optimize there. If rsync is always sending, then you really are at your limit. Your choices are:

      • a faster network

      • something other than rsync

      • move the source and destination closer together. If you can't do that, can you rsync to a local machine then rsync to the real destination? There may be benefits to doing this if the system has to be down during the initial rsync.






    share|improve this answer

























    • 80K is a lot of files.: There are 80k files in one directory tree. Each single directory does not have more than 2k files/subdirectories.

      – guettli
      Jan 7 '16 at 8:50












    • Check your rsync version: done, Make sure --checksum is not being used: done. Split the job into small batches: Thank you I will have a look at gigasync. OS defaults aren't made for this situation: done (the bottleneck is network not OS). Look at the "namei cache": done (it is net, not OS). Consider a different file system: again net, not OS. Maybe 5 minutes is the best you can do.: I think it could be much faster. Talk to your devs (use DB): This would be a giant change. Maybe an filesystem with better backup support would solve it.

      – guettli
      Jan 7 '16 at 8:58











    • 2k files per directory is a lot better. thank you for the update. You hadn't mentioned that the network was slow. Is it low bandwidth, high latency, or both? rsync usually performs well on high latency links (it was developed by someone working on his PhD from Australia while dealing with computers in the U.S.). Try doing that "ls -lLR" over ssh and time how long it takes to transmit the result. "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list". Make sure the /tmp/list gets created on the local host.

      – TomOnTime
      Jan 7 '16 at 13:40











    • yes the network is slow. It is a pitty.

      – guettli
      Jan 7 '16 at 15:40











    • How slow? If you use "scp" to copy a 100M file, how long does it take? Also, what is the output of "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list"?

      – TomOnTime
      Jan 7 '16 at 15:41















    28





    +50









    Some unrelated points:



    80K is a lot of files.



    80,000 files in one directory? No operating system or app handles that situation very well by default. You just happen to notice this problem with rsync.



    Check your rsync version



    Modern rsync handles large directories a lot better than in the past. Be sure you are using the latest version.



    Even old rsync handles large directories fairly well over high latency links... but 80k files isn't large...it is huge!



    That said, rsync's memory usage is directly proportional to the number of files in a tree. Large directories take a large amount of RAM. The slowness may be due to a lack of RAM on either side. Do a test run while watching memory usage. Linux uses any left-over RAM as a disk cache, so if you are running low on RAM, there is less disk caching. If you run out of RAM and the system starts using swap, performance will be really bad.



    Make sure --checksum is not being used



    --checksum (or -c) requires reading each and every block of every file. You probably can get by with the default behavior of just reading the modification times (stored in the inode).



    Split the job into small batches.



    There are some projects like Gigasync which will "Chop up the workload by using perl to recurse the directory tree, building smallish lists of files to transfer with rsync."



    The extra directory scan is going to be a large amount of overhead, but maybe it will be a net win.



    OS defaults aren't made for this situation.



    If you are using Linux/FreeBSD/etc with all the defaults, performance will be terrible for all your applications. The defaults assume smaller directories so-as not to waste RAM on oversized caches.



    Tune your filesystem to better handle large directories: Do large folder sizes slow down IO performance?



    Look at the "namei cache"



    BSD-like operating systems have a cache that accelerates looking up a name to the inode (the "namei" cache"). There is a namei cache for each directory. If it is too small, it is a hindrance more than an optimization. Since rsync is doing a lstat() on each file, the inode is being accessed for every one of the 80k files. That might be blowing your cache. Research how to tune file directory performance on your system.



    Consider a different file system



    XFS was designed to handle larger directories. See Filesystem large number of files in a single directory



    Maybe 5 minutes is the best you can do.



    Consider calculating how many disk blocks are being read, and calculate how fast you should expect the hardware to be able to read that many blocks.



    Maybe your expectations are too high. Consider how many disk blocks must be read to do an rsync with no changed files: each server will need to read the directory and read one inode per file. Let's assume nothing is cached because, well, 80k files has probably blown your cache. Let's say that it is 80k blocks to keep the math simple. That's about 40M of data, which should be readable in a few seconds. However if there needs to be a disk seek between each block, that could take much longer.



    So you are going to need to read about 80,000 disk blocks. How fast can your hard drive do that? Considering that this is random I/O, not a long linear read, 5 minutes might be pretty excellent. That's 1 / (80000 / 600), or a disk read every 7.5ms. Is that fast or slow for your hard drive? It depends on the model.



    Benchmark against something similar



    Another way to think about it is this. If no files have changed, ls -Llr does the same amount of disk activity but never reads any file data (just metadata). The time ls -Llr takes to run is your upper bound.



    • Is rsync (with no files changed) significantly slower than ls -Llr? Then the options you are using for rsync can be improved. Maybe -c is enabled or some other flag that reads more than just directories and metadata (inode data).


    • Is rsync (with no files changed) nearly as fast as ls -Llr? Then you've tuned rsync as best as you can. You have to tune the OS, add RAM, get faster drives, change filesystems, etc.


    Talk to your devs



    80k files is just bad design. Very few file systems and system tools handle such large directories very well. If the filenames are abcdefg.txt, consider storing them in abdc/abcdefg.txt (note the repetition). This breaks the directories up into smaller ones, but doesn't require a huge change to the code.



    Also.... consider using a database. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database. MariaDB or MySQL or PostgreSQL would be a much better option for storing large amounts of data.



    Hey, what's wrong with 5 minutes?



    Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However if 5 minutes is "good enough" for your customers, then it is good enough for you. If you don't have a written SLA, how about an informal discussion with your users to find out how fast they expect the backups to take.



    I assume you didn't ask this question if there wasn't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.



    Update: After some discussion we determined that the bottleneck is the network. I'm going to recommend 2 things before I give up :-).



    • Try to squeeze more bandwidth out of the pipe with compression. However compression requires more CPU, so if your CPU is overloaded, it might make performance worse. Try rsync with and without -z, and configure your ssh with and without compression. Time all 4 combinations to see if any of them perform significantly better than others.

    • Watch network traffic to see if there are any pauses. If there are pauses, you can find what is causing them and optimize there. If rsync is always sending, then you really are at your limit. Your choices are:

      • a faster network

      • something other than rsync

      • move the source and destination closer together. If you can't do that, can you rsync to a local machine then rsync to the real destination? There may be benefits to doing this if the system has to be down during the initial rsync.






    share|improve this answer

























    • 80K is a lot of files.: There are 80k files in one directory tree. Each single directory does not have more than 2k files/subdirectories.

      – guettli
      Jan 7 '16 at 8:50












    • Check your rsync version: done, Make sure --checksum is not being used: done. Split the job into small batches: Thank you I will have a look at gigasync. OS defaults aren't made for this situation: done (the bottleneck is network not OS). Look at the "namei cache": done (it is net, not OS). Consider a different file system: again net, not OS. Maybe 5 minutes is the best you can do.: I think it could be much faster. Talk to your devs (use DB): This would be a giant change. Maybe an filesystem with better backup support would solve it.

      – guettli
      Jan 7 '16 at 8:58











    • 2k files per directory is a lot better. thank you for the update. You hadn't mentioned that the network was slow. Is it low bandwidth, high latency, or both? rsync usually performs well on high latency links (it was developed by someone working on his PhD from Australia while dealing with computers in the U.S.). Try doing that "ls -lLR" over ssh and time how long it takes to transmit the result. "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list". Make sure the /tmp/list gets created on the local host.

      – TomOnTime
      Jan 7 '16 at 13:40











    • yes the network is slow. It is a pitty.

      – guettli
      Jan 7 '16 at 15:40











    • How slow? If you use "scp" to copy a 100M file, how long does it take? Also, what is the output of "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list"?

      – TomOnTime
      Jan 7 '16 at 15:41













    28





    +50







    28





    +50



    28




    +50





    Some unrelated points:



    80K is a lot of files.



    80,000 files in one directory? No operating system or app handles that situation very well by default. You just happen to notice this problem with rsync.



    Check your rsync version



    Modern rsync handles large directories a lot better than in the past. Be sure you are using the latest version.



    Even old rsync handles large directories fairly well over high latency links... but 80k files isn't large...it is huge!



    That said, rsync's memory usage is directly proportional to the number of files in a tree. Large directories take a large amount of RAM. The slowness may be due to a lack of RAM on either side. Do a test run while watching memory usage. Linux uses any left-over RAM as a disk cache, so if you are running low on RAM, there is less disk caching. If you run out of RAM and the system starts using swap, performance will be really bad.



    Make sure --checksum is not being used



    --checksum (or -c) requires reading each and every block of every file. You probably can get by with the default behavior of just reading the modification times (stored in the inode).



    Split the job into small batches.



    There are some projects like Gigasync which will "Chop up the workload by using perl to recurse the directory tree, building smallish lists of files to transfer with rsync."



    The extra directory scan is going to be a large amount of overhead, but maybe it will be a net win.



    OS defaults aren't made for this situation.



    If you are using Linux/FreeBSD/etc with all the defaults, performance will be terrible for all your applications. The defaults assume smaller directories so-as not to waste RAM on oversized caches.



    Tune your filesystem to better handle large directories: Do large folder sizes slow down IO performance?



    Look at the "namei cache"



    BSD-like operating systems have a cache that accelerates looking up a name to the inode (the "namei" cache"). There is a namei cache for each directory. If it is too small, it is a hindrance more than an optimization. Since rsync is doing a lstat() on each file, the inode is being accessed for every one of the 80k files. That might be blowing your cache. Research how to tune file directory performance on your system.



    Consider a different file system



    XFS was designed to handle larger directories. See Filesystem large number of files in a single directory



    Maybe 5 minutes is the best you can do.



    Consider calculating how many disk blocks are being read, and calculate how fast you should expect the hardware to be able to read that many blocks.



    Maybe your expectations are too high. Consider how many disk blocks must be read to do an rsync with no changed files: each server will need to read the directory and read one inode per file. Let's assume nothing is cached because, well, 80k files has probably blown your cache. Let's say that it is 80k blocks to keep the math simple. That's about 40M of data, which should be readable in a few seconds. However if there needs to be a disk seek between each block, that could take much longer.



    So you are going to need to read about 80,000 disk blocks. How fast can your hard drive do that? Considering that this is random I/O, not a long linear read, 5 minutes might be pretty excellent. That's 1 / (80000 / 600), or a disk read every 7.5ms. Is that fast or slow for your hard drive? It depends on the model.



    Benchmark against something similar



    Another way to think about it is this. If no files have changed, ls -Llr does the same amount of disk activity but never reads any file data (just metadata). The time ls -Llr takes to run is your upper bound.



    • Is rsync (with no files changed) significantly slower than ls -Llr? Then the options you are using for rsync can be improved. Maybe -c is enabled or some other flag that reads more than just directories and metadata (inode data).


    • Is rsync (with no files changed) nearly as fast as ls -Llr? Then you've tuned rsync as best as you can. You have to tune the OS, add RAM, get faster drives, change filesystems, etc.


    Talk to your devs



    80k files is just bad design. Very few file systems and system tools handle such large directories very well. If the filenames are abcdefg.txt, consider storing them in abdc/abcdefg.txt (note the repetition). This breaks the directories up into smaller ones, but doesn't require a huge change to the code.



    Also.... consider using a database. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database. MariaDB or MySQL or PostgreSQL would be a much better option for storing large amounts of data.



    Hey, what's wrong with 5 minutes?



    Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However if 5 minutes is "good enough" for your customers, then it is good enough for you. If you don't have a written SLA, how about an informal discussion with your users to find out how fast they expect the backups to take.



    I assume you didn't ask this question if there wasn't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.



    Update: After some discussion we determined that the bottleneck is the network. I'm going to recommend 2 things before I give up :-).



    • Try to squeeze more bandwidth out of the pipe with compression. However compression requires more CPU, so if your CPU is overloaded, it might make performance worse. Try rsync with and without -z, and configure your ssh with and without compression. Time all 4 combinations to see if any of them perform significantly better than others.

    • Watch network traffic to see if there are any pauses. If there are pauses, you can find what is causing them and optimize there. If rsync is always sending, then you really are at your limit. Your choices are:

      • a faster network

      • something other than rsync

      • move the source and destination closer together. If you can't do that, can you rsync to a local machine then rsync to the real destination? There may be benefits to doing this if the system has to be down during the initial rsync.






    share|improve this answer















    Some unrelated points:



    80K is a lot of files.



    80,000 files in one directory? No operating system or app handles that situation very well by default. You just happen to notice this problem with rsync.



    Check your rsync version



    Modern rsync handles large directories a lot better than in the past. Be sure you are using the latest version.



    Even old rsync handles large directories fairly well over high latency links... but 80k files isn't large...it is huge!



    That said, rsync's memory usage is directly proportional to the number of files in a tree. Large directories take a large amount of RAM. The slowness may be due to a lack of RAM on either side. Do a test run while watching memory usage. Linux uses any left-over RAM as a disk cache, so if you are running low on RAM, there is less disk caching. If you run out of RAM and the system starts using swap, performance will be really bad.



    Make sure --checksum is not being used



    --checksum (or -c) requires reading each and every block of every file. You probably can get by with the default behavior of just reading the modification times (stored in the inode).



    Split the job into small batches.



    There are some projects like Gigasync which will "Chop up the workload by using perl to recurse the directory tree, building smallish lists of files to transfer with rsync."



    The extra directory scan is going to be a large amount of overhead, but maybe it will be a net win.



    OS defaults aren't made for this situation.



    If you are using Linux/FreeBSD/etc with all the defaults, performance will be terrible for all your applications. The defaults assume smaller directories so-as not to waste RAM on oversized caches.



    Tune your filesystem to better handle large directories: Do large folder sizes slow down IO performance?



    Look at the "namei cache"



    BSD-like operating systems have a cache that accelerates looking up a name to the inode (the "namei" cache"). There is a namei cache for each directory. If it is too small, it is a hindrance more than an optimization. Since rsync is doing a lstat() on each file, the inode is being accessed for every one of the 80k files. That might be blowing your cache. Research how to tune file directory performance on your system.



    Consider a different file system



    XFS was designed to handle larger directories. See Filesystem large number of files in a single directory



    Maybe 5 minutes is the best you can do.



    Consider calculating how many disk blocks are being read, and calculate how fast you should expect the hardware to be able to read that many blocks.



    Maybe your expectations are too high. Consider how many disk blocks must be read to do an rsync with no changed files: each server will need to read the directory and read one inode per file. Let's assume nothing is cached because, well, 80k files has probably blown your cache. Let's say that it is 80k blocks to keep the math simple. That's about 40M of data, which should be readable in a few seconds. However if there needs to be a disk seek between each block, that could take much longer.



    So you are going to need to read about 80,000 disk blocks. How fast can your hard drive do that? Considering that this is random I/O, not a long linear read, 5 minutes might be pretty excellent. That's 1 / (80000 / 600), or a disk read every 7.5ms. Is that fast or slow for your hard drive? It depends on the model.



    Benchmark against something similar



    Another way to think about it is this. If no files have changed, ls -Llr does the same amount of disk activity but never reads any file data (just metadata). The time ls -Llr takes to run is your upper bound.



    • Is rsync (with no files changed) significantly slower than ls -Llr? Then the options you are using for rsync can be improved. Maybe -c is enabled or some other flag that reads more than just directories and metadata (inode data).


    • Is rsync (with no files changed) nearly as fast as ls -Llr? Then you've tuned rsync as best as you can. You have to tune the OS, add RAM, get faster drives, change filesystems, etc.


    Talk to your devs



    80k files is just bad design. Very few file systems and system tools handle such large directories very well. If the filenames are abcdefg.txt, consider storing them in abdc/abcdefg.txt (note the repetition). This breaks the directories up into smaller ones, but doesn't require a huge change to the code.



    Also.... consider using a database. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database. MariaDB or MySQL or PostgreSQL would be a much better option for storing large amounts of data.



    Hey, what's wrong with 5 minutes?



    Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However if 5 minutes is "good enough" for your customers, then it is good enough for you. If you don't have a written SLA, how about an informal discussion with your users to find out how fast they expect the backups to take.



    I assume you didn't ask this question if there wasn't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.



    Update: After some discussion we determined that the bottleneck is the network. I'm going to recommend 2 things before I give up :-).



    • Try to squeeze more bandwidth out of the pipe with compression. However compression requires more CPU, so if your CPU is overloaded, it might make performance worse. Try rsync with and without -z, and configure your ssh with and without compression. Time all 4 combinations to see if any of them perform significantly better than others.

    • Watch network traffic to see if there are any pauses. If there are pauses, you can find what is causing them and optimize there. If rsync is always sending, then you really are at your limit. Your choices are:

      • a faster network

      • something other than rsync

      • move the source and destination closer together. If you can't do that, can you rsync to a local machine then rsync to the real destination? There may be benefits to doing this if the system has to be down during the initial rsync.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 13 '18 at 11:28









    mkucharski

    32




    32










    answered Jan 5 '16 at 14:50









    TomOnTimeTomOnTime

    5,16532249




    5,16532249












    • 80K is a lot of files.: There are 80k files in one directory tree. Each single directory does not have more than 2k files/subdirectories.

      – guettli
      Jan 7 '16 at 8:50












    • Check your rsync version: done, Make sure --checksum is not being used: done. Split the job into small batches: Thank you I will have a look at gigasync. OS defaults aren't made for this situation: done (the bottleneck is network not OS). Look at the "namei cache": done (it is net, not OS). Consider a different file system: again net, not OS. Maybe 5 minutes is the best you can do.: I think it could be much faster. Talk to your devs (use DB): This would be a giant change. Maybe an filesystem with better backup support would solve it.

      – guettli
      Jan 7 '16 at 8:58











    • 2k files per directory is a lot better. thank you for the update. You hadn't mentioned that the network was slow. Is it low bandwidth, high latency, or both? rsync usually performs well on high latency links (it was developed by someone working on his PhD from Australia while dealing with computers in the U.S.). Try doing that "ls -lLR" over ssh and time how long it takes to transmit the result. "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list". Make sure the /tmp/list gets created on the local host.

      – TomOnTime
      Jan 7 '16 at 13:40











    • yes the network is slow. It is a pitty.

      – guettli
      Jan 7 '16 at 15:40











    • How slow? If you use "scp" to copy a 100M file, how long does it take? Also, what is the output of "time ssh remotehost 'cd /dest && ls -lLR' >/tmp/list"?

      – TomOnTime
      Jan 7 '16 at 15:41

















    2














    No, that's not possible with rsync and it would be quite inefficient in another regard:



    Normally, rsync only compares file modification dates and file sizes. Your approach would force it to read and checksum the content of all files twice (on the local and remote system) to find changed directories.
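
    For illustration, the difference is visible with a dry run (a minimal sketch; /src, /dest and remotehost are placeholders):

        # Default "quick check": only size and mtime are compared; unchanged files are skipped
        rsync -a -n -i /src/ remotehost:/dest/

        # --checksum forces both sides to read and hash every file -- far slower on large trees
        rsync -a -n -i --checksum /src/ remotehost:/dest/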






    edited Jan 4 '16 at 10:55

























    answered Jan 4 '16 at 10:49









    Sven








    • 1





      AFAIK rsync checks mtime and size. If both match, the file is not transferred again (at least with the default settings). It would be enough to send the hash of the tuples (filename, size, mtime). There is no need to checksum the content.

      – guettli
      Jan 4 '16 at 10:59











    • Yes, you are correct, but rsync doesn't do this anyway.

      – Sven
      Jan 4 '16 at 10:59
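
    That idea can be approximated outside of rsync: build a cheap fingerprint of each tree from (path, size, mtime) and only start rsync when the fingerprints differ. A minimal sketch, assuming GNU find and sha256sum are available on both ends (remotehost and /data are placeholders):

        # Metadata-only fingerprint of a tree -- no file contents are read
        tree_hash() {
            find "$1" -printf '%P\t%s\t%T@\n' | sort | sha256sum | cut -d' ' -f1
        }

        local_hash=$(tree_hash /data)
        remote_hash=$(ssh remotehost "find /data -printf '%P\t%s\t%T@\n' | sort | sha256sum | cut -d' ' -f1")

        # Only pay for the full rsync file-list exchange when something changed
        if [ "$local_hash" != "$remote_hash" ]; then
            rsync -a /data/ remotehost:/data/
        fi

    If the fingerprints match, the whole 80k-file list exchange is skipped; note that mtime precision has to match on both filesystems for this to be reliable.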












    1














    For synchronisation of large numbers of files (where little has changed), it is also worth setting noatime on the source and destination partitions. This saves writing access times to the disk for each unchanged file.
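
    For example, noatime can be tried immediately with a remount and made permanent in /etc/fstab (a sketch; the device and mount point are placeholders):

        # Temporary, until the next reboot
        mount -o remount,noatime /data

        # Permanent: add noatime to the options column in /etc/fstab, e.g.
        # /dev/sdb1  /data  ext4  defaults,noatime  0  2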






    answered Aug 26 '17 at 11:08









    Andy Beverley













    • Yes, the noatime option makes sense. We have been using it for several years. I guess an alternative to rsync is needed.

      – guettli
      Aug 28 '17 at 7:07

















    0














    Use rsync in daemon mode at the server end to speed up the listing/checksum process:



    • Rsync daemon: is it really useful?

    • http://giantdorks.org/alain/achieve-faster-file-transfer-times-by-running-rsync-as-a-daemon/

    Note that the daemon protocol isn't encrypted, but it may be possible to tunnel it without losing the listing performance improvement.



    Also, having rsync do the compression (--compress) rather than ssh should improve performance.
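
    A minimal sketch of such a setup (the module name, path and host are assumptions, not taken from the links above):

        # /etc/rsyncd.conf on the server, started with: rsync --daemon
        [backup]
            path = /data
            read only = yes

        # On the client, talk to the daemon directly over the rsync:// protocol (default port 873)
        rsync -az rsync://server/backup/ /local/backup/

    The rsync:// transfer skips ssh entirely, which is where the listing speed-up comes from; wrap it in an ssh port forward if the link has to be encrypted.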






        edited Mar 22 at 21:41

























        answered Mar 22 at 21:29









        Gringo Suave



























