Huge performance difference of the command find with and without using %M option to show permissions

On my CentOS 7.6 system, I created a directory (called many_files) containing 3,000,000 files by running:



for i in {1..3000000}; do echo $i > $i; done


I use find to write information about the files in this directory to a file. This is surprisingly fast:



$ time find many_files -printf '%i %y %p\n' > info_file

real 0m6.970s
user 0m3.812s
sys 0m0.904s


Now if I add %M to get the permissions:



$ time find many_files -printf '%i %y %M %p\n' > info_file

real 2m30.677s
user 0m5.148s
sys 0m37.338s


The command takes much longer. This is very surprising to me, since in a C program a single call filling a struct stat gives both the inode number and the permissions of a file, and in the kernel struct inode stores both pieces of information.



My Questions:



  1. What causes this behavior?

  2. Is there a faster way to get file permissions for so many files?









  • The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.

    – Kusalananda
    Apr 12 at 20:26












  • @Kusalananda, why is it wrong to ask that? If you're faced with an unexpected 20x slowdown, surely you want to know whether it can be avoided? find -perm still needs to look at the permissions, even if it doesn't output them, so would using it affect the slowdown in any way?

    – ilkkachu
    Apr 13 at 12:13











  • @ilkkachu You are correct. I assumed that the slowdown was due to the extra data produced, just like 0xSheepdog initially thought (which does not seem to be the case). I would still not want to get the permissions as text like that if the intention is to process the files based on the permissions, though.

    – Kusalananda
    Apr 13 at 12:19

















linux files permissions find performance






edited Apr 12 at 22:26 by Jeff Schaller
asked Apr 12 at 20:07 by Bahram

1 Answer
The first version only needs to read the directory with readdir(3)/getdents(2), provided it runs on a filesystem that stores the file type in the directory entries (ext4: the filetype feature, shown by tune2fs -l /dev/xxx; xfs: ftype=1, shown by xfs_info /mount/point ...).
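To illustrate why %y needs no stat(2), here is a minimal C sketch, assuming glibc on Linux, that maps the d_type value of a directory entry to the letters find prints for %y. The helper name find_type_letter is hypothetical. It returns '\0' for DT_UNKNOWN, which is what readdir(3) reports on filesystems that do not store the type; only in that case would a stat(2) call be required.

```c
/* Hypothetical helper: d_type (from readdir) -> find's %y letter. */
#define _DEFAULT_SOURCE   /* exposes the DT_* constants in glibc */
#include <dirent.h>

/* Returns the %y letter, or '\0' when the filesystem left d_type as
 * DT_UNKNOWN and the type would have to come from stat(2). */
char find_type_letter(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG:  return 'f';  /* regular file */
    case DT_DIR:  return 'd';  /* directory */
    case DT_LNK:  return 'l';  /* symbolic link */
    case DT_FIFO: return 'p';  /* named pipe */
    case DT_SOCK: return 's';  /* socket */
    case DT_CHR:  return 'c';  /* character device */
    case DT_BLK:  return 'b';  /* block device */
    default:      return '\0'; /* DT_UNKNOWN: type not stored in the entry */
    }
}
```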



The second version additionally has to stat(2) every file. That means an extra inode lookup per entry, and therefore more seeks on the filesystem and the device, which can be much slower on a rotating disk when the inodes are not already cached. This stat is not needed when only the name, inode number and file type are requested, because the directory entry already contains them:




The linux_dirent structure is declared as follows:

struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
};




The same information is available through readdir(3):




struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
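Putting these fields together, the first version can be sketched in C roughly as follows, assuming Linux/glibc. list_dir_entries is a hypothetical helper, not find's implementation: it prints the same "%i %y %p" fields for a single, non-recursive directory using only opendir/readdir, with no stat(2) per entry, and unlike find it does not print the starting directory itself.

```c
/* Hypothetical sketch: readdir-only listing, no stat(2) per entry. */
#define _DEFAULT_SOURCE      /* DT_* constants in glibc */
#include <dirent.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Returns the number of entries printed, or -1 if dirpath can't be opened. */
long list_dir_entries(const char *dirpath, FILE *out)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* find does not report "." and ".." */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        char type;
        switch (ent->d_type) {
        case DT_REG: type = 'f'; break;
        case DT_DIR: type = 'd'; break;
        case DT_LNK: type = 'l'; break;
        default:     type = '?'; break; /* other types or DT_UNKNOWN */
        }
        /* inode number and type both come straight from the entry */
        fprintf(out, "%" PRIuMAX " %c %s/%s\n",
                (uintmax_t)ent->d_ino, type, dirpath, ent->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Because d_type can be DT_UNKNOWN on filesystems without the feature, a real tool has to fall back to lstat(2) in that case; this sketch just prints '?' instead.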



I suspected this, and confirmed it by comparing (on a smaller sample...) the strace output of the two commands:



strace -o v1 find many_files -printf '%i %y %p\n' > info_file
strace -o v2 find many_files -printf '%i %y %M %p\n' > info_file


On my Linux amd64 machine with kernel 5.0.x, the main difference is:



[...]

 getdents(4, /* 0 entries */, 32768) = 0
 close(4) = 0
 fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0

[...]

+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0

[...]
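Each added newfstatat line above is one extra system call per file. A minimal C sketch of that per-entry step, assuming Linux/glibc, might look like the following; the helper names are hypothetical and this is not findutils' actual code. It calls fstatat(2) with AT_SYMLINK_NOFOLLOW relative to the directory descriptor, as in the trace, and then formats st_mode as the ls -l style string that %M prints.

```c
/* Hypothetical sketch: the per-entry stat that %M requires. */
#define _DEFAULT_SOURCE
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>   /* fstatat, struct stat, S_IS* macros */

/* Fill buf with an ls -l style string such as "-rw-r--r--". */
void mode_to_string(mode_t mode, char buf[11])
{
    buf[0] = S_ISDIR(mode)  ? 'd' : S_ISLNK(mode)  ? 'l' :
             S_ISCHR(mode)  ? 'c' : S_ISBLK(mode)  ? 'b' :
             S_ISFIFO(mode) ? 'p' : S_ISSOCK(mode) ? 's' : '-';
    const char rwx[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)          /* 0400, 0200, ..., 01 */
        buf[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';
    /* setuid, setgid and sticky bits replace the matching x position */
    if (mode & S_ISUID) buf[3] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) buf[6] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) buf[9] = (mode & S_IXOTH) ? 't' : 'T';
    buf[10] = '\0';
}

/* One fstatat(2) per directory entry: the extra cost %M adds.
 * Returns 0 on success, -1 (with errno set) if fstatat fails. */
int entry_mode_string(int dirfd, const char *name, char buf[11])
{
    struct stat st;
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return -1;
    mode_to_string(st.st_mode, buf);
    return 0;
}
```

Called once per name returned by readdir(3), with dirfd(3) of the open directory, this reproduces the one-newfstatat-per-entry pattern seen in the trace; for 3,000,000 files that is 3,000,000 extra inode lookups.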






  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    Apr 12 at 21:49












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    Apr 12 at 21:49







  • I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) bug that only triggered when the d_type field was absent, I had to either use minixfs or use GLOB_ALTDIRFUNC.

    – mosvy
    Apr 12 at 22:12






  • Ah yes, CentOS 7's mkfs.xfs man page says ftype=1 is the default.

    – A.B
    Apr 12 at 22:18







  • It really is supported on CentOS 7 + xfs. Just tested it.

    – mosvy
    Apr 12 at 22:33











Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f512167%2fhuge-performance-difference-of-the-command-find-with-and-without-using-m-option%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









11














The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", 0644, st_size=5, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer

























  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    Apr 12 at 21:49












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    Apr 12 at 21:49







  • 1





    I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    Apr 12 at 22:12






  • 1





    Ah yes CentOS7' mkfs.xfs' man tells ftype=1 is the default.

    – A.B
    Apr 12 at 22:18







  • 1





    It really is supported on centos 7 + xfs. Just tested it.

    – mosvy
    Apr 12 at 22:33















11














The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", 0644, st_size=5, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer

























  • Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

    – mosvy
    Apr 12 at 21:49












  • @mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ

    – A.B
    Apr 12 at 21:49







  • 1





    I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.

    – mosvy
    Apr 12 at 22:12






  • 1





    Ah yes CentOS7' mkfs.xfs' man tells ftype=1 is the default.

    – A.B
    Apr 12 at 22:18







  • 1





    It really is supported on centos 7 + xfs. Just tested it.

    – mosvy
    Apr 12 at 22:33













11












11








11







The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




 The linux_dirent structure is declared as follows:

struct linux_dirent
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/




the same informations are available to readdir(3):




struct dirent 
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
;



Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:



strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file


Which on my Linux amd64 kernel 5.0.x just shows as main difference:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", 0644, st_size=5, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]






share|improve this answer















The first version requires only to readdir(3)/getdents(2) the directory, when run on a filesystem supporting this feature (ext4: filetype feature displayed with tune2fs -l /dev/xxx, xfs: ftype=1 displayed with xfs_info /mount/point ...).



The second version in addition also requires to stat(2) each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat is not required when looking only for name, inode and filetype because the directory entry is enough:




The linux_dirent structure is declared as follows:

struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
};




The same information is available through readdir(3):




struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};



I suspected this, and confirmed it (on a smaller sample...) by comparing the strace output of these two commands:



strace -o v1 find many_files -printf '%i %y %p\n' > info_file
strace -o v2 find many_files -printf '%i %y %M %p\n' > info_file


On my Linux amd64 system running kernel 5.0.x, the main difference is:



[...]



 getdents(4, /* 0 entries */, 32768) = 0
 close(4) = 0
 fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", 0644, st_size=5, ..., AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]



+newfstatat(5, "891", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", st_mode=S_IFREG, AT_SYMLINK_NOFOLLOW) = 0


[...]















edited Apr 12 at 22:02

























answered Apr 12 at 21:30









A.B













  • Unfortunately, the d_type field of a directory entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) man page. (Though on Linux it is implemented on most filesystems that matter.)

    – mosvy
    Apr 12 at 21:49












  • @mosvy That's ok, the question is tagged CentOS. But yes, I understand that on other *nix systems the results may differ.

    – A.B
    Apr 12 at 21:49







  • I think it's supported on xfs: when I was making a testcase for a glibc glob(3) bug that only triggered when the d_type field was absent, I had to use either minixfs or GLOB_ALTDIRFUNC.

    – mosvy
    Apr 12 at 22:12






  • Ah yes, CentOS 7's mkfs.xfs man page says ftype=1 is the default.

    – A.B
    Apr 12 at 22:18







  • It really is supported on CentOS 7 + xfs. Just tested it.

    – mosvy
    Apr 12 at 22:33
















