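I am upgrading the storage for our MogileFS cluster and am using the rebalance and device drain features to migrate data from one set of devices to another. We have about 55 TB of storage on the old set of devices, and I want to migrate it to new devices totaling 88 TB.
I have the following policy set: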
[ashinn@mogile2 ~]$ sudo mogadm rebalance settings
rebal_policy = from_devices=2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028 to_hosts=5,6,7 leave_in_drain_mode=1
However, it appears to drain/rebalance only one device at a time:
[ashinn@mogile2 ~]$ sudo mogadm rebalance status
Rebalance is running
Rebalance status:
bytes_queued = 250755303323
completed_devs =
fids_queued = 7785000
limit = 0
sdev_current = 2005
sdev_lastfid = 1444986524
sdev_limit = none
source_devs = 2016,2028,2007,2013,2012,2022,2008,2001,2024,2017,2023,2025,2009,2015,2006,2026,2021,2020,2019,2010,2027,2004,2018,2014,2002,2011,2003
time_finished = 0
time_started = 1340960590
time_stopped = 0
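To make the status output easier to read, here is a rough Python sketch that parses the `key = value` lines and reports how many source devices are still waiting. It is not part of mogadm; the function name is my own, and the average-size figure assumes that bytes_queued and fids_queued describe the same set of queued fids, which I have not confirmed.

```python
def parse_rebalance_status(text):
    """Parse 'key = value' lines from `mogadm rebalance status` into a dict.

    Lines without '=' (such as 'Rebalance is running') are skipped.
    """
    status = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        status[key.strip()] = value.strip()
    return status


# Fields copied from the status output above.
STATUS = """\
Rebalance is running
Rebalance status:
bytes_queued = 250755303323
completed_devs =
fids_queued = 7785000
sdev_current = 2005
source_devs = 2016,2028,2007,2013,2012,2022,2008,2001,2024,2017,2023,2025,2009,2015,2006,2026,2021,2020,2019,2010,2027,2004,2018,2014,2002,2011,2003
"""

status = parse_rebalance_status(STATUS)

# Devices still waiting in the source list (the current device is not in it).
pending = [d for d in status["source_devs"].split(",") if d]
# Devices already finished; empty in the output above.
completed = [d for d in status["completed_devs"].split(",") if d]
# Assumption: both counters refer to the same queued fids.
avg_bytes_per_fid = int(status["bytes_queued"]) / int(status["fids_queued"])

print("current device:", status["sdev_current"])
print("pending devices:", len(pending))
print("completed devices:", len(completed))
print("average bytes per queued fid: %.0f" % avg_bytes_per_fid)
```

With a single sdev_current and every other device still listed in source_devs, this matches what I am seeing: one device being worked on while the rest wait.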
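At this rate, it will take 4 months to drain all of the old devices and rebalance onto the new ones! Here is the list of the devices I am trying to drain and the new devices that were added. dev2001 through dev2028 are set to drain and rebalance across all 3 hosts (including the new devices dev2029 through dev2036 on host ID 6):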
[ashinn@mogile2 ~]$ sudo mogadm device list | grep dev20
dev2001: drain  2018.942    731.216   2750.158
dev2002: drain  2022.452    727.706   2750.158
dev2003: drain  2022.311    727.848   2750.158
dev2004: drain  2022.211    727.947   2750.158
dev2005: drain  1472.550   1277.608   2750.158
dev2006: drain  2022.135    728.023   2750.158
dev2007: drain  2022.139    728.020   2750.158
dev2008: drain  2022.246    727.912   2750.158
dev2009: drain  2022.369    727.789   2750.158
dev2010: drain  2022.191    727.967   2750.158
dev2011: drain  2022.694    727.464   2750.158
dev2012: drain  2022.256    727.902   2750.158
dev2013: drain  2022.117    728.041   2750.158
dev2014: drain  2022.271    727.887   2750.158
dev2015: drain  2021.590    728.568   2750.158
dev2016: drain  2021.499    728.659   2750.158
dev2017: drain  2021.712    728.446   2750.158
dev2018: drain  2021.191    728.967   2750.158
dev2019: drain  2020.846    729.312   2750.158
dev2020: drain  2021.758    728.400   2750.158
dev2021: drain  2021.490    728.668   2750.158
dev2022: drain  2021.217    728.941   2750.158
dev2023: drain  2020.922    729.236   2750.158
dev2024: drain  2019.909    730.249   2750.158
dev2025: drain  2020.503    729.655   2750.158
dev2026: drain  2020.807    729.352   2750.158
dev2027: drain  2021.056    729.103   2750.158
dev2028: drain  2020.487    729.671   2750.158
dev2029: alive   182.120  10818.996  11001.116
dev2030: alive   184.549  10816.567  11001.116
dev2031: alive   185.268  10815.849  11001.116
dev2032: alive   182.004  10819.112  11001.116
dev2033: alive   189.295  10811.821  11001.116
dev2034: alive   183.199  10817.917  11001.116
dev2035: alive   178.625  10822.491  11001.116
dev2036: alive   180.549  10820.567  11001.116
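For anyone who wants to check the time estimate, here is a minimal Python sketch of the arithmetic. It assumes each row of the listing above is device, state, used, free, total (in GB), treats 1 GB as 1000 MB, and takes the sustained transfer rate as an input you measure yourself. The function names and the three-row sample are only for illustration.

```python
def used_gb_by_state(listing):
    """Sum the 'used' column per device state.

    Each row is assumed to look like:
        dev2001: drain 2018.942 731.216 2750.158
    i.e. device, state, used GB, free GB, total GB.
    """
    totals = {}
    for row in listing.splitlines():
        parts = row.split()
        if len(parts) != 5 or not parts[0].startswith("dev"):
            continue  # skip the shell prompt, blank lines, anything unexpected
        state, used_gb = parts[1], float(parts[2])
        totals[state] = totals.get(state, 0.0) + used_gb
    return totals


def days_to_drain(used_gb, rate_mb_per_s):
    """Days needed to move used_gb at a sustained rate in MB/s.

    Assumes 1 GB = 1000 MB; rate_mb_per_s is a measured or guessed value.
    """
    seconds = used_gb * 1000.0 / rate_mb_per_s
    return seconds / 86400.0


# Three rows copied from the listing above, as a small sample.
SAMPLE = """\
dev2001: drain 2018.942 731.216 2750.158
dev2005: drain 1472.550 1277.608 2750.158
dev2029: alive 182.120 10818.996 11001.116
"""

totals = used_gb_by_state(SAMPLE)
# 10 MB/s is a placeholder rate, not a measurement from this cluster.
print("GB left on draining devices:", round(totals["drain"], 3))
print("days at 10 MB/s:", round(days_to_drain(totals["drain"], 10.0), 2))
```

Feeding the full listing in place of SAMPLE, together with the rate actually observed on the cluster, gives the overall estimate.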
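We have already tried tuning queue_rate_for_rebal, queue_size_for_rebal, and the replicate workers.
We used to use zones across two data centers, and replication was much faster. We expected rebalance to perform like replication. But at this rate, it seems it would be faster to mark the old devices dead and let the fids re-replicate.
Is there any other way to speed up the rebalance (for example, working on multiple devices at once) without having to mark the devices as dead?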