Slow sequential reading in large zfs pool


ZFS on FreeBSD: recovery from data corruptionHosting a ZFS server as a virtual guestZFS Configuration adviceZFS configuration opinionsDoes ZFS really stripe across every vdev, even in very large zpools?Advice for building production 140 disk (420 TB) ZFS zpoolZFS pool slow sequential readZFS: good read but poor write speedsZFS vdevs accumulate checksum errors, but individual disks do notZFS SLOG IOPS closer to random IOPS than sequential






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








1















We are running a ZFS pool as temporary storage for scientific data, with 24 × 10 TB disks in 4 vdevs, each consisting of 6 disks in a raidz2 configuration (recordsize 128K).



~ # zpool status
pool: tank
state: ONLINE
scan: scrub canceled on Mon Jun 3 11:14:39 2019
config:

NAME                          STATE     READ WRITE CKSUM
tank                          ONLINE       0     0     0
  raidz2-0                    ONLINE       0     0     0
    wwn-0x5000cca25160d3c8    ONLINE       0     0     0
    wwn-0x5000cca25165cf30    ONLINE       0     0     0
    wwn-0x5000cca2516711a4    ONLINE       0     0     0
    wwn-0x5000cca251673b88    ONLINE       0     0     0
    wwn-0x5000cca251673b94    ONLINE       0     0     0
    wwn-0x5000cca251674214    ONLINE       0     0     0
  raidz2-1                    ONLINE       0     0     0
    wwn-0x5000cca251683628    ONLINE       0     0     0
    wwn-0x5000cca25168771c    ONLINE       0     0     0
    wwn-0x5000cca25168f234    ONLINE       0     0     0
    wwn-0x5000cca251692890    ONLINE       0     0     0
    wwn-0x5000cca251695484    ONLINE       0     0     0
    wwn-0x5000cca2516969b0    ONLINE       0     0     0
  raidz2-2                    ONLINE       0     0     0
    wwn-0x5000c500a774ba03    ONLINE       0     0     0
    wwn-0x5000c500a7800c3b    ONLINE       0     0     0
    wwn-0x5000c500a7800feb    ONLINE       0     0     0
    wwn-0x5000c500a7802abf    ONLINE       0     0     0
    wwn-0x5000c500a78033cb    ONLINE       0     0     0
    wwn-0x5000c500a78039c7    ONLINE       0     0     0
  raidz2-3                    ONLINE       0     0     0
    wwn-0x5000c500a780416b    ONLINE       0     0     0
    wwn-0x5000c500a7804733    ONLINE       0     0     0
    wwn-0x5000c500a7804797    ONLINE       0     0     0
    wwn-0x5000c500a7805df3    ONLINE       0     0     0
    wwn-0x5000c500a7806a0b    ONLINE       0     0     0
    wwn-0x5000c500a7807ccf    ONLINE       0     0     0

errors: No known data errors


When we set this up a few months ago, performance looked fine, with rates between 500 MB/s and 1 GB/s.
Since then we have noticed a few performance issues, but assumed they were due to other possible bottlenecks. Now that we want to move our data to its final storage, we find that we only get around 60 MB/s of sequential read (file sizes > 100 GB) from the pool.
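
As a rough sanity check (back-of-envelope only, ignoring raidz parity/padding overhead, compression and fragmentation), the streaming ceiling of this layout should be far above what we see, assuming each disk sustains roughly the ~223 MB/s measured below:

# very rough ceiling: 4 raidz2 vdevs x (6 - 2) data disks each x ~223 MB/s per disk
echo "$(( 4 * 4 * 223 )) MB/s"   # -> 3568 MB/s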



  • Data was written with this vdev layout already in place, so it should have been distributed automatically across the vdevs (see the per-vdev check sketched after this list)

  • ashift is set to 12

  • We have ruled out a slow target: we can write random data to it much faster, and we see the same slow rate when copying to a tmp directory on a local SSD

  • We have checked the speed of the individual drives, and they look fine (see below).

  • Scrubs run biweekly (a single scrub itself takes longer than a week); the current one was canceled for these tests.

  • Pool is filled to about 80% (USED 113T, AVAIL 23.7T)

  • top does not show anything suspicious, none of the 24 cores is maxed out, system mostly idle


  • zpool iostat 10 reports rates in line with those measured with rsync --progress and time cp

  • RAM is not used up (see below)

  • Data is backed up on tape, but we would like to avoid downtime due to "experiments"
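
To verify the distribution across the vdevs and to watch per-vdev activity while a copy runs, something like the following should work (a sketch; the FRAG column may not be reported by this older ZoL version):

~ # zpool list -v tank        # per-vdev SIZE/ALLOC/FREE (plus FRAG/CAP where supported)
~ # zpool iostat -v tank 10   # per-vdev and per-disk read bandwidth during the copy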

Test for disk performance



~ # echo 1 > /proc/sys/vm/drop_caches
~ # for i in $(zpool status | grep wwn- | awk '{print $1}'); do
        echo $i; dd if=/dev/disk/by-id/$i of=/dev/null status=progress bs=1G count=1 seek=1G; echo; echo; sleep 1
    done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.81339 s, 223 MB/s
... (similar rates for all 24 disks)
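
Note that dd's seek= applies to the output file, so the loop above effectively reads the first 1 GiB of each disk. A variant that reads from a 1 GiB offset and bypasses the page cache would look like this (a sketch):

for i in $(zpool status | grep wwn- | awk '{print $1}'); do
    echo $i
    # skip= offsets the input side; iflag=direct avoids measuring the page cache
    dd if=/dev/disk/by-id/$i of=/dev/null bs=1M count=1024 skip=1024 iflag=direct status=progress
    echo; sleep 1
done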


RAM:



~ # free -h
total used free shared buff/cache available
Mem: 62G 2.1G 35G 18M 25G 59G
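
free does not break out the ZFS ARC; its current size and hit/miss counters can be read from the kstats that ZFS on Linux exposes under /proc/spl (a sketch, assuming the usual kstat layout):

~ # awk '$1 ~ /^(size|c_max|hits|misses)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats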


All properties of the filesystem we are copying from (another, larger one also exists):



zfs get all tank/storage/bulk
NAME PROPERTY VALUE SOURCE
tank/storage/bulk type filesystem -
tank/storage/bulk creation Fri Mar 1 9:48 2019 -
tank/storage/bulk used 3.96T -
tank/storage/bulk available 23.7T -
tank/storage/bulk referenced 3.96T -
tank/storage/bulk compressratio 1.17x -
tank/storage/bulk mounted yes -
tank/storage/bulk quota none default
tank/storage/bulk reservation none default
tank/storage/bulk recordsize 128K default
tank/storage/bulk mountpoint /storage/bulk inherited from tank/storage
tank/storage/bulk sharenfs rw=@10.40.20.201,rw=@10.40.20.202 local
tank/storage/bulk checksum on default
tank/storage/bulk compression on inherited from tank
tank/storage/bulk atime off inherited from tank
tank/storage/bulk devices on default
tank/storage/bulk exec off inherited from tank
tank/storage/bulk setuid off inherited from tank
tank/storage/bulk readonly off default
tank/storage/bulk zoned off default
tank/storage/bulk snapdir hidden default
tank/storage/bulk aclinherit restricted default
tank/storage/bulk canmount on default
tank/storage/bulk xattr sa inherited from tank
tank/storage/bulk copies 1 default
tank/storage/bulk version 5 -
tank/storage/bulk utf8only off -
tank/storage/bulk normalization none -
tank/storage/bulk casesensitivity sensitive -
tank/storage/bulk vscan off default
tank/storage/bulk nbmand off default
tank/storage/bulk sharesmb off inherited from tank
tank/storage/bulk refquota none default
tank/storage/bulk refreservation none default
tank/storage/bulk primarycache all default
tank/storage/bulk secondarycache all default
tank/storage/bulk usedbysnapshots 2.40M -
tank/storage/bulk usedbydataset 3.96T -
tank/storage/bulk usedbychildren 0 -
tank/storage/bulk usedbyrefreservation 0 -
tank/storage/bulk logbias latency default
tank/storage/bulk dedup off default
tank/storage/bulk mlslabel none default
tank/storage/bulk sync standard default
tank/storage/bulk refcompressratio 1.17x -
tank/storage/bulk written 0 -
tank/storage/bulk logicalused 4.55T -
tank/storage/bulk logicalreferenced 4.55T -
tank/storage/bulk filesystem_limit none default
tank/storage/bulk snapshot_limit none default
tank/storage/bulk filesystem_count none default
tank/storage/bulk snapshot_count none default
tank/storage/bulk snapdev hidden default
tank/storage/bulk acltype posixacl inherited from tank
tank/storage/bulk context none default
tank/storage/bulk fscontext none default
tank/storage/bulk defcontext none default
tank/storage/bulk rootcontext none default
tank/storage/bulk relatime on inherited from tank
tank/storage/bulk redundant_metadata all default
tank/storage/bulk overlay off default
tank/storage/bulk com.sun:auto-snapshot true local


Versions



srv01 ~ # apt policy zfsutils-linux 
zfsutils-linux:
Installed: 0.6.5.6-0ubuntu27
Candidate: 0.6.5.6-0ubuntu27
Version table:
*** 0.6.5.6-0ubuntu27 500
500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
100 /var/lib/dpkg/status
0.6.5.6-0ubuntu8 500
500 http://de.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
srv01 ~ # uname -a
Linux srv01 4.15.0-50-generic #54~16.04.1-Ubuntu SMP Wed May 8 15:55:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
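
(The loaded kernel module can differ from the packaged userland tools; a quick cross-check, assuming the module exports its version via sysfs as ZFS on Linux normally does:)

~ # cat /sys/module/zfs/version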


Disk usage (iostat -x while a copy is running)



~ # iostat -x 3 /dev/disk/by-id/wwn-0x????????????????
Linux 4.15.0-50-generic (bbo3102) 04.06.2019 _x86_64_ (48 CPU)

[...]

avg-cpu: %user %nice %system %iowait %steal %idle
3.12 0.00 2.09 0.29 0.00 94.50

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdn 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdr 0.00 0.00 487.00 0.00 10744.00 0.00 44.12 0.56 1.16 1.16 0.00 0.51 24.93
sdt 1.67 0.00 484.33 0.00 12640.00 0.00 52.20 0.52 1.09 1.09 0.00 0.44 21.47
sdu 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdv 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 6.00 6.00 0.00 6.00 0.40
sdw 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdy 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdq 0.00 0.00 472.33 0.00 10812.00 0.00 45.78 0.64 1.35 1.35 0.00 0.59 27.73
sdb 0.00 0.00 469.33 0.00 10908.00 0.00 46.48 0.10 0.22 0.22 0.00 0.14 6.53
sdc 0.00 0.00 192.33 0.00 4696.00 0.00 48.83 0.05 0.25 0.25 0.00 0.12 2.27
sdg 0.33 0.00 281.33 0.00 6978.67 0.00 49.61 0.07 0.27 0.27 0.00 0.15 4.13
sdh 0.67 0.00 449.33 0.00 10524.00 0.00 46.84 0.16 0.36 0.36 0.00 0.17 7.73
sdj 0.00 0.00 271.33 0.00 6580.00 0.00 48.50 0.04 0.13 0.13 0.00 0.09 2.53
sdi 0.00 0.00 183.67 0.00 3928.00 0.00 42.77 0.07 0.36 0.36 0.00 0.23 4.27
sde 0.00 0.00 280.00 0.00 5860.00 0.00 41.86 0.10 0.36 0.36 0.00 0.22 6.27
sdf 0.00 0.00 177.33 0.00 4662.67 0.00 52.59 0.07 0.38 0.38 0.00 0.18 3.20
sdk 0.33 0.00 464.33 0.00 10498.67 0.00 45.22 0.05 0.10 0.10 0.00 0.07 3.47
sdp 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 4.00 4.00 0.00 4.00 0.27
sds 1.00 0.00 489.67 0.00 12650.67 0.00 51.67 0.16 0.34 0.34 0.00 0.16 7.87
sdl 0.00 0.00 464.67 0.00 10200.00 0.00 43.90 0.05 0.11 0.11 0.00 0.08 3.73
sdd 0.00 0.00 268.00 0.00 5509.33 0.00 41.11 0.07 0.26 0.26 0.00 0.18 4.93
sda 0.00 0.00 192.00 0.00 3928.00 0.00 40.92 0.03 0.17 0.17 0.00 0.09 1.73
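
For reading the table: avgrq-sz is reported in 512-byte sectors, so values around 44-52 correspond to roughly 22-26 KiB per request, i.e. rather small reads per disk. To match the busy sdX devices against the wwn-* ids (and hence vdevs) from zpool status, a sketch:

~ # ls -l /dev/disk/by-id/wwn-0x* | grep -v part | awk '{print $(NF-2), "->", $NF}'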


Any ideas or suggestions for further tests?










Tags: ubuntu, storage, zfs

asked Jun 3 at 12:12 by mcandril (edited Jun 4 at 7:03)
  • While you copy, try iostat -x 3 /dev/sd? or similar and see if the %util or await columns on any drives show high numbers. zfs 0.6.5.6 is quite dated (0.8.0 is out) and there have been many performance improvements; an upgrade may be worthwhile.

    – András Korn
    Jun 3 at 20:16











  • %util hardly ever exceeds 50%. It is in the two-digit range (up to the 30s) on 4-5 drives, around 5 drives show 0, and the rest are in the single-digit range. await is 5-8 on the higher-load disks and <1 on all the others. (Full report added to the main post.) Wait time does not seem to be the problem, but I am a little surprised that so few disks are being used.

    – mcandril
    Jun 4 at 6:59












  • sdq, sdr, and sdt all have significantly higher wait times, %util, and svctm values; sds looks like a bit of an outlier too. Either those drives have problems, or you've managed to select data that's concentrated on that one ZFS vdev for some reason, assuming those drives are all part of the same vdev. Can you select other data to read?

    – Andrew Henle
    Jun 4 at 9:44











  • If we assume the disks are the bottleneck, shouldn't we expect %util around 100 somewhere? Anyway, all of these are from raidz2-1. So one issue seems to be that the data is not properly striped across the vdevs, even though they all existed when the data was written. The second question is why even the disks of that one vdev are not fully utilized. I did find, though, that I can sync another folder at the same time and also get around 50-980MB/s.

    – mcandril
    Jun 4 at 11:30











  • What's in your /etc/modprobe.d/zfs.conf config file?

    – ewwhite
    Jun 4 at 13:30
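
(If no /etc/modprobe.d/zfs.conf exists, the module runs with its defaults; the values currently in effect can be dumped from sysfs, as a sketch:)

~ # grep . /sys/module/zfs/parameters/* | sort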
















