ceph shows wrong USED space in a single replicated pool
We are using ceph version 14.2.0. We have 4 hosts with 24 BlueStore OSDs, each 1.8 TB (a 2 TB spinning disk). We have only a single pool with size 2, and I am absolutely sure that we are using more space than what ceph df shows:
[root@blackmirror ~]# ceph osd dump | grep 'replicated size'
pool 2 'one' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 900 pgp_num 900 autoscale_mode warn last_change 37311 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
[root@blackmirror ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 44 TiB 21 TiB 22 TiB 23 TiB 51.61
TOTAL 44 TiB 21 TiB 22 TiB 23 TiB 51.61
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
one 2 2.7 TiB 2.94M 5.5 TiB 28.81 6.7 TiB
Not sure about MAX AVAIL, but I think it's wrong too.
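(For what it's worth, MAX AVAIL is documented as factoring in the fullest OSD and the full ratio rather than simply dividing free raw space by the replica count. A rough back-of-envelope with the figures above, assuming the default full ratio of 0.95, lands close to the reported value:

# back-of-envelope only, not the exact formula Ceph uses
# fullest OSD (osd.0) is 64.10% used -> headroom to full ratio:
#   (0.95 - 0.641) * 1.81 TiB ~= 0.56 TiB per OSD
#   scaled to 24 OSDs ~= 13.4 TiB raw; divided by replication size 2 ~= 6.7 TiB

which is in the same ballpark as the reported 6.7 TiB, so MAX AVAIL may just be reflecting the imbalance.)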
Here's the output of ceph osd df:
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 1.81310 1.00000 1.8 TiB 1.2 TiB 1.2 TiB 152 KiB 2.8 GiB 669 GiB 64.10 1.24 94 up
1 hdd 1.81310 1.00000 1.8 TiB 937 GiB 935 GiB 80 KiB 2.2 GiB 926 GiB 50.31 0.97 72 up
2 hdd 1.81310 1.00000 1.8 TiB 788 GiB 786 GiB 36 KiB 1.9 GiB 1.0 TiB 42.33 0.82 65 up
3 hdd 1.81310 1.00000 1.8 TiB 868 GiB 866 GiB 128 KiB 2.1 GiB 995 GiB 46.59 0.90 69 up
4 hdd 1.81310 1.00000 1.8 TiB 958 GiB 956 GiB 84 KiB 2.3 GiB 904 GiB 51.45 1.00 72 up
5 hdd 1.81879 1.00000 1.8 TiB 1015 GiB 1013 GiB 64 KiB 2.4 GiB 847 GiB 54.50 1.06 77 up
6 hdd 1.81310 1.00000 1.8 TiB 1015 GiB 1012 GiB 32 KiB 2.6 GiB 848 GiB 54.48 1.06 81 up
7 hdd 1.81310 1.00000 1.8 TiB 935 GiB 932 GiB 40 KiB 2.3 GiB 928 GiB 50.18 0.97 70 up
8 hdd 1.81310 1.00000 1.8 TiB 1.0 TiB 1.0 TiB 48 KiB 2.5 GiB 800 GiB 57.05 1.11 83 up
9 hdd 1.81310 1.00000 1.8 TiB 1002 GiB 1000 GiB 96 KiB 2.3 GiB 861 GiB 53.79 1.04 77 up
10 hdd 1.81310 1.00000 1.8 TiB 779 GiB 777 GiB 168 KiB 1.9 GiB 1.1 TiB 41.80 0.81 63 up
11 hdd 1.81310 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 128 KiB 2.6 GiB 768 GiB 58.77 1.14 83 up
12 hdd 1.81310 1.00000 1.8 TiB 798 GiB 796 GiB 120 KiB 1.9 GiB 1.0 TiB 42.85 0.83 67 up
13 hdd 1.81310 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 64 KiB 2.6 GiB 761 GiB 59.12 1.15 89 up
14 hdd 1.81310 1.00000 1.8 TiB 1.2 TiB 1.2 TiB 128 KiB 2.7 GiB 680 GiB 63.51 1.23 88 up
15 hdd 1.81310 1.00000 1.8 TiB 766 GiB 764 GiB 64 KiB 1.9 GiB 1.1 TiB 41.15 0.80 58 up
16 hdd 1.81310 1.00000 1.8 TiB 990 GiB 988 GiB 80 KiB 2.4 GiB 873 GiB 53.15 1.03 81 up
17 hdd 1.81310 1.00000 1.8 TiB 980 GiB 977 GiB 80 KiB 2.3 GiB 883 GiB 52.61 1.02 77 up
18 hdd 1.81310 1.00000 1.8 TiB 891 GiB 890 GiB 68 KiB 1.7 GiB 971 GiB 47.87 0.93 73 up
19 hdd 1.81310 1.00000 1.8 TiB 1.1 TiB 1.1 TiB 60 KiB 2.0 GiB 784 GiB 57.87 1.12 87 up
20 hdd 1.81310 1.00000 1.8 TiB 956 GiB 955 GiB 48 KiB 1.8 GiB 906 GiB 51.37 1.00 73 up
21 hdd 1.81310 1.00000 1.8 TiB 762 GiB 760 GiB 32 KiB 1.6 GiB 1.1 TiB 40.91 0.79 58 up
22 hdd 1.81310 1.00000 1.8 TiB 979 GiB 977 GiB 80 KiB 1.9 GiB 883 GiB 52.60 1.02 72 up
23 hdd 1.81310 1.00000 1.8 TiB 935 GiB 934 GiB 164 KiB 1.8 GiB 927 GiB 50.24 0.97 71 up
TOTAL 44 TiB 23 TiB 22 TiB 2.0 MiB 53 GiB 21 TiB 51.61
MIN/MAX VAR: 0.79/1.24 STDDEV: 6.54
And here is the output of rados df:
[root@blackmirror ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
one 5.5 TiB 2943372 0 5886744 0 0 0 11291297816 114 TiB 24110141554 778 TiB 0 B 0 B
total_objects 2943372
total_used 23 TiB
total_avail 21 TiB
total_space 44 TiB
In reality we are storing around 11 TB of data, so total_used above looks right because our replication size is 2.
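As a hypothetical cross-check (the jq filter and the stored/bytes_used field names are assumptions about the Nautilus JSON output, not something I have verified), comparing the pool's reported usage against stored bytes times the replication size would look roughly like:

# sketch: compare the pool's reported usage with stored * replication size
ceph df --format json | jq '.pools[] | select(.name == "one")
    | {stored: .stats.stored, used: .stats.bytes_used,
       expected_used: (.stats.stored * 2)}'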
This started happening after we changed OSDs 18-23. They were initially 1 TB disks, but we upgraded them to 2 TB to balance the cluster. After we changed the first disk, USED and MAX AVAIL from ceph df dropped to around 1 TB. I thought this was just a matter of time, but even after all recovery operations had finished, we were left with the picture above. I tried forcing a deep scrub on all disks, which nearly killed all applications in the cluster for 12 hours, but in the end it changed nothing. I am clueless as to what to do now. Please help.
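For reference, the deep scrub was triggered per OSD, roughly like this (a sketch of what was run; it did not change the reported numbers):

# iterate over all OSD ids and ask each one to deep-scrub
for osd in $(ceph osd ls); do
    ceph osd deep-scrub "$osd"
done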
ceph
asked May 13 at 19:54 by Jacket, edited May 14 at 7:01