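I am trying to install and deploy a Ceph cluster. Since I don't have enough physical servers, I created 4 virtual machines on OpenStack using the official Ubuntu 14.04 image. I want to deploy a cluster with 1 monitor node and 3 OSD nodes, using Ceph version 0.80.7-0ubuntu0.14.04.1. I followed the steps in the manual deployment document and installed the monitor node successfully. However, after installing the OSD nodes, the OSD daemons appear to be running but do not report to the monitor node correctly. When I run ceph --cluster cephcluster1 osd tree, the OSD tree always shows the OSDs as down.
Below are the commands and corresponding output that may be related to my problem.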
root@monitor:/home/ubuntu# ceph --cluster cephcluster1 osd tree
# id    weight  type name       up/down reweight
-1      3       root default
-2      1               host osd1
0       1                       osd.0   down    1
-3      1               host osd2
1       1                       osd.1   down    1
-4      1               host osd3
2       1                       osd.2   down    1
root@monitor:/home/ubuntu# ceph --cluster cephcluster1 -s
    cluster fd78cbf8-8c64-4b12-9cfa-0e75bc6c8d98
     health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 3/3 in osds are down
     monmap e1: 1 mons at {monitor=172.26.111.4:6789/0}, election epoch 1, quorum 0 monitor
     osdmap e21: 3 osds: 0 up, 3 in
      pgmap v22: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating
The configuration file /etc/ceph/cephcluster1.conf on all nodes:
[global]
fsid = fd78cbf8-8c64-4b12-9cfa-0e75bc6c8d98
mon initial members = monitor
mon host = 172.26.111.4
public network = 10.5.0.0/16
cluster network = 172.26.111.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1

[osd]
osd journal size = 1024

[osd.0]
osd host = osd1

[osd.1]
osd host = osd2

[osd.2]
osd host = osd3
Logs written when I start an OSD daemon with start ceph-osd cluster=cephcluster1 id=x, where x is the OSD ID:
/var/log/ceph/cephcluster1-osd.0.log on OSD node #1:
2015-02-11 09:59:56.626899 7f5409d74800 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-osd, pid 11230
2015-02-11 09:59:56.646218 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-11 09:59:56.646372 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-11 09:59:56.658227 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-11 09:59:56.679515 7f5409d74800 0 filestore(/var/lib/ceph/osd/cephcluster1-0) limited size xattrs
2015-02-11 09:59:56.699721 7f5409d74800 0 filestore(/var/lib/ceph/osd/cephcluster1-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-11 09:59:56.700107 7f5409d74800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-02-11 09:59:56.700454 7f5409d74800 1 journal _open /var/lib/ceph/osd/cephcluster1-0/journal fd 20: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-02-11 09:59:56.704025 7f5409d74800 1 journal _open /var/lib/ceph/osd/cephcluster1-0/journal fd 20: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-02-11 09:59:56.704884 7f5409d74800 1 journal close /var/lib/ceph/osd/cephcluster1-0/journal
2015-02-11 09:59:56.725281 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-11 09:59:56.725397 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-11 09:59:56.736445 7f5409d74800 0 genericfilestorebackend(/var/lib/ceph/osd/cephcluster1-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-11 09:59:56.756912 7f5409d74800 0 filestore(/var/lib/ceph/osd/cephcluster1-0) limited size xattrs
2015-02-11 09:59:56.776471 7f5409d74800 0 filestore(/var/lib/ceph/osd/cephcluster1-0) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-11 09:59:56.776748 7f5409d74800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-02-11 09:59:56.776848 7f5409d74800 1 journal _open /var/lib/ceph/osd/cephcluster1-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-02-11 09:59:56.777069 7f5409d74800 1 journal _open /var/lib/ceph/osd/cephcluster1-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2015-02-11 09:59:56.783019 7f5409d74800 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-11 09:59:56.783584 7f5409d74800 0 osd.0 11 crush map has features 1107558400, adjusting msgr requires for clients
2015-02-11 09:59:56.783645 7f5409d74800 0 osd.0 11 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2015-02-11 09:59:56.783687 7f5409d74800 0 osd.0 11 crush map has features 1107558400, adjusting msgr requires for osds
2015-02-11 09:59:56.783750 7f5409d74800 0 osd.0 11 load_pgs
2015-02-11 09:59:56.783831 7f5409d74800 0 osd.0 11 load_pgs opened 0 pgs
2015-02-11 09:59:56.792167 7f53f9b57700 0 osd.0 11 ignoring osdmap until we have initialized
2015-02-11 09:59:56.792334 7f53f9b57700 0 osd.0 11 ignoring osdmap until we have initialized
2015-02-11 09:59:56.792838 7f5409d74800 0 osd.0 11 done with init, starting boot process
/var/log/ceph/ceph-mon.monitor.log on the monitor node:
2015-02-11 09:59:56.593494 7f24cc41d700 0 mon.monitor@0(leader) e1 handle_command mon_command({"prefix": "osd crush create-or-move", "args": ["host=osd1", "root=default"], "id": 0, "weight": 0.05} v 0) v1
2015-02-11 09:59:56.593955 7f24cc41d700 0 mon.monitor@0(leader).osd e21 create-or-move crush item name 'osd.0' initial_weight 0.05 at location {host=osd1,root=default}
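When I switch the environment from Ubuntu 14.04 to CentOS 6.6 and follow the same installation steps, the Ceph OSDs show up correctly. I would still like to solve this problem, though, because I am more familiar with Ubuntu than with CentOS.
Any suggestions are appreciated. Thanks a lot!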
I ran into the same problem in the same environment. I finally traced it to a bad OSD UUID. What gave it away was the following line in the MON log (not the OSD log!):
... mon.minion-001@0(leader).osd e75 preprocess_boot from osd.0 10.208.66.2:6800/3427 clashes with existing osd: different fsid (ours: 71b33e7f-b464-4ba9-96b3-8c814921fea2 ; theirs: 5401be6f-b4ff-42ef-8531-78ee73772d5b)
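If the monitor log is long, a small script can pick out these lines. The following is only a sketch in Python: the regular expression is written against the single log line quoted above, so the exact message layout in other Ceph versions is an assumption, and the function name find_fsid_clashes is made up for this illustration. As far as I can tell, "ours" is the UUID the monitor has on record for that OSD id and "theirs" is the UUID the booting daemon presented.

```python
import re

# Pattern modelled on the one monitor log line quoted above; other Ceph
# versions may word this message differently (assumption).
CLASH_RE = re.compile(
    r"preprocess_boot from (?P<osd>osd\.\d+) (?P<addr>\S+) "
    r"clashes with existing osd: different fsid "
    r"\(ours: (?P<ours>[0-9a-f-]{36}) ; theirs: (?P<theirs>[0-9a-f-]{36})\)"
)


def find_fsid_clashes(lines):
    """Return one dict per log line that reports an OSD fsid clash."""
    clashes = []
    for line in lines:
        match = CLASH_RE.search(line)
        if match:
            clashes.append(match.groupdict())
    return clashes


# The log line from this answer, split only to keep the source readable.
sample = (
    "... mon.minion-001@0(leader).osd e75 preprocess_boot from osd.0 "
    "10.208.66.2:6800/3427 clashes with existing osd: different fsid "
    "(ours: 71b33e7f-b464-4ba9-96b3-8c814921fea2 ; "
    "theirs: 5401be6f-b4ff-42ef-8531-78ee73772d5b)"
)

for clash in find_fsid_clashes([sample]):
    print(clash["osd"], "monitor has", clash["ours"], "daemon sent", clash["theirs"])
```

To scan a real log, pass an open file object instead of the one-element list, for example the monitor log path mentioned in the question.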
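I fixed it by first removing the OSD manually, destroying its filesystem, and then recreating it manually from scratch. How the problem arose in the first place is something I still have to track down.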
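I did not write down the exact commands I ran, so the following is only a sketch of the removal half, based on the usual manual OSD removal sequence from the Ceph documentation rather than on my shell history. It is a Python helper (the name osd_removal_commands is made up here) that only builds the command strings and does not execute anything; the cluster name and the upstart-style stop command are taken from the question and are assumptions for any other setup.

```python
def osd_removal_commands(cluster, osd_id):
    """Build (but do not run) a manual OSD removal sequence as shell strings.

    Assumption: this follows the commonly documented order (mark out, stop the
    daemon, remove from CRUSH, delete the auth key, remove the OSD id).
    """
    ceph = "ceph --cluster {}".format(cluster)
    return [
        # Mark the OSD out so no data is mapped to it.
        "{} osd out {}".format(ceph, osd_id),
        # Run on the OSD host itself (upstart syntax, as used in the question).
        "stop ceph-osd cluster={} id={}".format(cluster, osd_id),
        # Remove the OSD from the CRUSH map.
        "{} osd crush remove osd.{}".format(ceph, osd_id),
        # Delete the OSD's cephx key.
        "{} auth del osd.{}".format(ceph, osd_id),
        # Finally remove the OSD id from the cluster.
        "{} osd rm {}".format(ceph, osd_id),
    ]


for command in osd_removal_commands("cephcluster1", 0):
    print(command)
```

After that, wiping the OSD data directory and repeating the OSD steps of the manual deployment document gives the OSD a fresh UUID that the monitor and the daemon agree on.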
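Since I use Puppet to set up my OSDs, the cause may well be specific to my environment, which means the problem you are seeing could be a different one. Still, it is worth checking your MON log. You will have to enable debugging on the MON first, with something like this in ceph.conf: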
[mon]
debug mon = 9
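The message in question is logged at level 7, so this setting gives you the extra detail you need without making everything overly chatty.
@LoicDachary: Wouldn't it make sense to log this error/warning message at level 0? If it had been logged at the default level, I would certainly have found the problem sooner.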