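I am currently building an infrastructure management tool that provisions bare metal machines, virtual machines, and so on. We have a worker VM that runs commands on remote nodes over SSH (via Ansible).
One of the steps requires rebooting the node to apply some configuration. Once the reboot has finished, the worker has to run more commands on the node, and this must happen synchronously.
My question is: how can I check whether the reboot has completed?
I could add a sleep timer and wait until the reboot should be done, but I think that is a bad solution for many reasons.
Another option is to have the worker try to SSH into the remote node every 5 seconds or so and, if that fails, keep retrying until a connection succeeds.
Is there another way to do this?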
As you mentioned, you are running the commands through Ansible, so here is what I use in a playbook to reboot hosts (I manage Ubuntu 14 / 16.04 machines):
---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    # here comes the juicy part
    - name: "reboot hosts"
      shell: "sleep 2 && shutdown -r now 'Reboot triggered by Ansible'" # sleep 2 is needed, else this task might fail
      async: "1" # run asynchronously
      poll: "0" # don't ask for the status of the command, just fire and forget
      ignore_errors: yes # this command will get cut off by the reboot, so ignore errors

    - name: "wait for hosts to come up again"
      wait_for:
        host: "{{ inventory_hostname }}"
        port: "22" # wait for ssh as this is what is needed for ansible
        state: "started"
        delay: "120" # start checking after this amount of time
        timeout: "360" # give up after this amount of time
      delegate_to: "localhost" # check from the machine executing the playbook
...
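If the worker is not driven by a playbook, the same wait-then-poll idea can be sketched directly in Python. This is only a rough equivalent of the wait_for task above, not what Ansible does internally; the function name wait_for_port and its parameters are made up for illustration.

```python
import socket
import time


def wait_for_port(host, port=22, delay=0.0, timeout=360.0, interval=5.0):
    """Return True once a TCP connect to host:port succeeds, False on timeout.

    Hypothetical helper: 'delay' is an initial pause so that the node has
    time to actually go down before polling starts.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the node is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_port("node1.example", delay=120) would mirror the values in the playbook (the host name is a placeholder). An open port only shows that sshd is listening, so without the initial delay the check could succeed before the node has even gone down.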
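If you want to check the status of a host, its reboot time, and many other parameters, then you should use monitoring software such as Zabbix, Nagios, etc.
The reboot time can be checked through the uptime system parameter, which shows the time elapsed since the last boot. You can get it with the uptime command on a Linux/UNIX host, or remotely over SNMP when the snmpd service is running on the host: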
snmpget -v2c -c public host_name_or_ip_address sysUpTime.0
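To turn that value into a "did it really reboot?" check, one option is to read sysUpTime before triggering the reboot and again afterwards, and compare the two. The sketch below assumes the usual net-snmp output shape, with the tick count in parentheses after "Timeticks:" (hundredths of a second); the function names are hypothetical. Strictly speaking, sysUpTime counts from the start of the SNMP agent rather than the kernel, which after a reboot amounts to nearly the same thing.

```python
import re

# Assumed output shape, e.g.:
# DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (12345) 0:02:03.45
_TICKS = re.compile(r"Timeticks:\s*\((\d+)\)")


def uptime_seconds(snmpget_output):
    """Extract the uptime in seconds from one line of snmpget output."""
    match = _TICKS.search(snmpget_output)
    if match is None:
        raise ValueError("no Timeticks value found in: %r" % snmpget_output)
    # Timeticks are hundredths of a second.
    return int(match.group(1)) / 100.0


def has_rebooted(before_output, after_output):
    """True if the uptime went down between the two readings."""
    return uptime_seconds(after_output) < uptime_seconds(before_output)
```

The worker would capture the snmpget output once before the reboot step and then poll until has_rebooted returns True, treating a failed or unparseable query as "not up yet".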