Slow sequential reading in large zfs pool


We are running a ZFS pool as temporary storage for scientific data, with 24 × 10 TB disks in 4 vdevs, each consisting of 6 disks in a raidz2 configuration (recordsize 128K).



~ # zpool status
pool: tank
state: ONLINE
scan: scrub canceled on Mon Jun 3 11:14:39 2019
config:

NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
wwn-0x5000cca25160d3c8 ONLINE 0 0 0
wwn-0x5000cca25165cf30 ONLINE 0 0 0
wwn-0x5000cca2516711a4 ONLINE 0 0 0
wwn-0x5000cca251673b88 ONLINE 0 0 0
wwn-0x5000cca251673b94 ONLINE 0 0 0
wwn-0x5000cca251674214 ONLINE 0 0 0
raidz2-1 ONLINE 0 0 0
wwn-0x5000cca251683628 ONLINE 0 0 0
wwn-0x5000cca25168771c ONLINE 0 0 0
wwn-0x5000cca25168f234 ONLINE 0 0 0
wwn-0x5000cca251692890 ONLINE 0 0 0
wwn-0x5000cca251695484 ONLINE 0 0 0
wwn-0x5000cca2516969b0 ONLINE 0 0 0
raidz2-2 ONLINE 0 0 0
wwn-0x5000c500a774ba03 ONLINE 0 0 0
wwn-0x5000c500a7800c3b ONLINE 0 0 0
wwn-0x5000c500a7800feb ONLINE 0 0 0
wwn-0x5000c500a7802abf ONLINE 0 0 0
wwn-0x5000c500a78033cb ONLINE 0 0 0
wwn-0x5000c500a78039c7 ONLINE 0 0 0
raidz2-3 ONLINE 0 0 0
wwn-0x5000c500a780416b ONLINE 0 0 0
wwn-0x5000c500a7804733 ONLINE 0 0 0
wwn-0x5000c500a7804797 ONLINE 0 0 0
wwn-0x5000c500a7805df3 ONLINE 0 0 0
wwn-0x5000c500a7806a0b ONLINE 0 0 0
wwn-0x5000c500a7807ccf ONLINE 0 0 0

errors: No known data errors


When we set this up a few months ago, performance looked OK, with rates between 500 MB/s and 1 GB/s.
In the meantime we noticed a few performance issues, but assumed they were due to other possible bottlenecks. Now that we want to move our data to its final storage, we find that we can only get around 60 MB/s sequential (file size > 100 GB) from the pool.



  • Data was written in this configuration and should have been distributed automatically across the vdevs (see the per-vdev check sketched after this list)

  • ashift is set to 12

  • We have ruled out that the target is slow: we can write random data to it much faster, and we get the same slow rate when writing to a local SSD tmp directory

  • We have checked the speed of the individual drives, and they look fine (see below)

  • Scrub has been run biweekly (a single run itself takes longer than a week) and was interrupted for these tests

  • The pool is filled to about 80% (USED 113T, AVAIL 23.7T)

  • top does not show anything suspicious; none of the 24 cores is maxed out and the system is mostly idle

  • zpool iostat 10 is in line with the rates measured with rsync --progress and time cp

  • RAM is not used up (see below)

  • Data is backed up on tape, but we would like to avoid downtime due to "experiments"
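
Since one open question is whether reads are actually spread across all four vdevs, a minimal check (sketched here, assuming the pool name tank from the zpool status output above) is to watch per-vdev and per-disk throughput while the slow copy is running:

# Refreshes every 10 s; compare the read bandwidth reported for
# raidz2-0 ... raidz2-3 and for the individual disks during the copy.
~ # zpool iostat -v tank 10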

Test for disk performance



~ # echo 1 > /proc/sys/vm/drop_caches
for i in $(zpool status | grep wwn- | awk '{print $1}'); do
echo $i; dd if=/dev/disk/by-id/$i of=/dev/null status=progress bs=1G count=1 seek=1G; echo; echo; sleep 1
done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.81339 s, 223 MB/s
... (similar rates for all 24 disks)
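
The dd test above reads the raw disks and bypasses ZFS entirely; to measure the pool's own sequential read path without rsync or cp overhead, one could also read one of the large files directly from the dataset. A sketch (the file name is a placeholder for one of the >100 GB files in question):

# /storage/bulk is the dataset mountpoint from "zfs get all" below;
# some_large_file is a placeholder, not an actual file name.
~ # dd if=/storage/bulk/some_large_file of=/dev/null bs=1M status=progress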


RAM:



~ # free -h
total used free shared buff/cache available
Mem: 62G 2.1G 35G 18M 25G 59G
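
free only shows that the box has spare memory; it says nothing about how large the ARC is or whether prefetch is keeping up. A minimal sketch, assuming the kstat files exposed by ZFS on Linux:

# Current ARC size and its configured maximum, in GiB
~ # awk '$1 == "size" || $1 == "c_max" { printf "%-6s %.1f GiB\n", $1, $3/2^30 }' /proc/spl/kstat/zfs/arcstats
# Prefetch (zfetch) hit/miss counters; a high miss ratio during the copy
# would point at the prefetcher rather than the disks.
~ # cat /proc/spl/kstat/zfs/zfetchstats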


All properties of the filesystem we are copying from (another, larger filesystem also exists on the pool):



zfs get all tank/storage/bulk
NAME PROPERTY VALUE SOURCE
tank/storage/bulk type filesystem -
tank/storage/bulk creation Fr Mär 1 9:48 2019 -
tank/storage/bulk used 3.96T -
tank/storage/bulk available 23.7T -
tank/storage/bulk referenced 3.96T -
tank/storage/bulk compressratio 1.17x -
tank/storage/bulk mounted yes -
tank/storage/bulk quota none default
tank/storage/bulk reservation none default
tank/storage/bulk recordsize 128K default
tank/storage/bulk mountpoint /storage/bulk inherited from tank/storage
tank/storage/bulk sharenfs rw=@10.40.20.201,rw=@10.40.20.202 local
tank/storage/bulk checksum on default
tank/storage/bulk compression on inherited from tank
tank/storage/bulk atime off inherited from tank
tank/storage/bulk devices on default
tank/storage/bulk exec off inherited from tank
tank/storage/bulk setuid off inherited from tank
tank/storage/bulk readonly off default
tank/storage/bulk zoned off default
tank/storage/bulk snapdir hidden default
tank/storage/bulk aclinherit restricted default
tank/storage/bulk canmount on default
tank/storage/bulk xattr sa inherited from tank
tank/storage/bulk copies 1 default
tank/storage/bulk version 5 -
tank/storage/bulk utf8only off -
tank/storage/bulk normalization none -
tank/storage/bulk casesensitivity sensitive -
tank/storage/bulk vscan off default
tank/storage/bulk nbmand off default
tank/storage/bulk sharesmb off inherited from tank
tank/storage/bulk refquota none default
tank/storage/bulk refreservation none default
tank/storage/bulk primarycache all default
tank/storage/bulk secondarycache all default
tank/storage/bulk usedbysnapshots 2.40M -
tank/storage/bulk usedbydataset 3.96T -
tank/storage/bulk usedbychildren 0 -
tank/storage/bulk usedbyrefreservation 0 -
tank/storage/bulk logbias latency default
tank/storage/bulk dedup off default
tank/storage/bulk mlslabel none default
tank/storage/bulk sync standard default
tank/storage/bulk refcompressratio 1.17x -
tank/storage/bulk written 0 -
tank/storage/bulk logicalused 4.55T -
tank/storage/bulk logicalreferenced 4.55T -
tank/storage/bulk filesystem_limit none default
tank/storage/bulk snapshot_limit none default
tank/storage/bulk filesystem_count none default
tank/storage/bulk snapshot_count none default
tank/storage/bulk snapdev hidden default
tank/storage/bulk acltype posixacl inherited from tank
tank/storage/bulk context none default
tank/storage/bulk fscontext none default
tank/storage/bulk defcontext none default
tank/storage/bulk rootcontext none default
tank/storage/bulk relatime on inherited from tank
tank/storage/bulk redundant_metadata all default
tank/storage/bulk overlay off default
tank/storage/bulk com.sun:auto-snapshot true local


Versions



srv01 ~ # apt policy zfsutils-linux 
zfsutils-linux:
Installed: 0.6.5.6-0ubuntu27
Candidate: 0.6.5.6-0ubuntu27
Version table:
*** 0.6.5.6-0ubuntu27 500
500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
100 /var/lib/dpkg/status
0.6.5.6-0ubuntu8 500
500 http://de.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
srv01 ~ # uname -a
Linux srv01 4.15.0-50-generic #54~16.04.1-Ubuntu SMP Wed May 8 15:55:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


Disk usage



~ # iostat -x 3 /dev/disk/by-id/wwn-0x????????????????
Linux 4.15.0-50-generic (bbo3102) 04.06.2019 _x86_64_ (48 CPU)

[...]

avg-cpu: %user %nice %system %iowait %steal %idle
3.12 0.00 2.09 0.29 0.00 94.50

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdn 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdr 0.00 0.00 487.00 0.00 10744.00 0.00 44.12 0.56 1.16 1.16 0.00 0.51 24.93
sdt 1.67 0.00 484.33 0.00 12640.00 0.00 52.20 0.52 1.09 1.09 0.00 0.44 21.47
sdu 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdv 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 6.00 6.00 0.00 6.00 0.40
sdw 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdy 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdq 0.00 0.00 472.33 0.00 10812.00 0.00 45.78 0.64 1.35 1.35 0.00 0.59 27.73
sdb 0.00 0.00 469.33 0.00 10908.00 0.00 46.48 0.10 0.22 0.22 0.00 0.14 6.53
sdc 0.00 0.00 192.33 0.00 4696.00 0.00 48.83 0.05 0.25 0.25 0.00 0.12 2.27
sdg 0.33 0.00 281.33 0.00 6978.67 0.00 49.61 0.07 0.27 0.27 0.00 0.15 4.13
sdh 0.67 0.00 449.33 0.00 10524.00 0.00 46.84 0.16 0.36 0.36 0.00 0.17 7.73
sdj 0.00 0.00 271.33 0.00 6580.00 0.00 48.50 0.04 0.13 0.13 0.00 0.09 2.53
sdi 0.00 0.00 183.67 0.00 3928.00 0.00 42.77 0.07 0.36 0.36 0.00 0.23 4.27
sde 0.00 0.00 280.00 0.00 5860.00 0.00 41.86 0.10 0.36 0.36 0.00 0.22 6.27
sdf 0.00 0.00 177.33 0.00 4662.67 0.00 52.59 0.07 0.38 0.38 0.00 0.18 3.20
sdk 0.33 0.00 464.33 0.00 10498.67 0.00 45.22 0.05 0.10 0.10 0.00 0.07 3.47
sdp 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 4.00 4.00 0.00 4.00 0.27
sds 1.00 0.00 489.67 0.00 12650.67 0.00 51.67 0.16 0.34 0.34 0.00 0.16 7.87
sdl 0.00 0.00 464.67 0.00 10200.00 0.00 43.90 0.05 0.11 0.11 0.00 0.08 3.73
sdd 0.00 0.00 268.00 0.00 5509.33 0.00 41.11 0.07 0.26 0.26 0.00 0.18 4.93
sda 0.00 0.00 192.00 0.00 3928.00 0.00 40.92 0.03 0.17 0.17 0.00 0.09 1.73


Any ideas or suggestions for further tests?










ubuntu storage zfs

asked Jun 3 at 12:12 – mcandril
edited Jun 4 at 7:03

  • While you copy, try iostat -x 3 /dev/sd? or similar and see if the %util or await columns on any drives show high numbers. zfs 0.6.5.6 is quite dated (0.8.0 is out) and there have been many performance improvements; an upgrade may be worthwhile.

    – András Korn
    Jun 3 at 20:16











  • %util hardly ever exceeds 50%. It is a two-digit number (into the 30s) on 4-5 drives, around 5 drives show 0, and the rest are in the single-digit range. await is 5-8 on the higher-load disks and <1 on all the others. (Added the full report to the main post.) Wait does not seem to be the problem, but I am a little surprised that so few disks are used.

    – mcandril
    Jun 4 at 6:59












  • sdq, sdr, and sdt all have significantly higher wait times, %util, and svctm values. sds seems a bit of an outlier, too. Either those drives have problems, or you've managed to select data that's concentrated on that one ZFS vdev for some reason, assuming those drives are all part of the same vdev. Can you select other data to read?

    – Andrew Henle
    Jun 4 at 9:44











  • If we are assuming the disks to be the bottleneck, shouldn't we expect %util around 100 somewhere? Anyway, all of these are from raidz2-1. So one of the issues seems to be that the data is not properly striped across the vdevs, even though they all existed when the data was written. The second question is why even the disks of that one vdev are not fully used. I did find out, though, that I can sync another folder at the same time and also get around 50-980MB/s.

    – mcandril
    Jun 4 at 11:30
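
One way to check whether the existing data and free space are evenly spread across the four raidz2 vdevs is to look at per-vdev allocation (a sketch, again assuming the pool name tank):

# A vdev that is much fuller than the others receives fewer new writes
# and can end up serving a disproportionate share of the reads.
~ # zpool list -v tank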











  • What's in your /etc/modprobe.d/zfs.conf config file?

    – ewwhite
    Jun 4 at 13:30
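
For context, /etc/modprobe.d/zfs.conf (if it exists) normally just sets ZFS module parameters. An illustrative example of what such a file can look like; these values are examples only, not the poster's actual settings:

# /etc/modprobe.d/zfs.conf -- example values, not taken from this system
options zfs zfs_arc_max=51539607552      # cap the ARC at 48 GiB
options zfs zfs_prefetch_disable=0       # keep file-level prefetch enabled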
















