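Riak "error": "insufficient_vnodes_available"

We have a 4-node Riak installation. The nodes run on Ubuntu 12.04 LTS (Precise) servers. We installed version 1.1.4 on August 1, 2012, and upgraded to 1.2.0 when it became available.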

The server names are:

f1 – 10.10.0.12 – This was the first server installed. We joined the others to this server. It also serves Riak Control.
s2 – 10.10.0.22
s3 – 10.10.0.23
s4 – 10.10.0.24 – This server also serves Riak Control.

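This morning we saw "insufficient nodes" errors in our application logs and restarted all nodes. Three of them became available again; "f1" did not.

Update: While I was preparing this message, the 3 live nodes became unavailable and needed a restart of Riak.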

    wolfiem@f01:~$ sudo /etc/init.d/riak start
    Riak failed to start within 15 seconds,
    see the output of 'riak console' for more information.
    If you want to wait longer, set the environment variable
    WAIT_FOR_ERLANG to the number of seconds to wait.

I tried to set WAIT_FOR_ERLANG to 60 seconds, but I could not get it to take effect.

Adding this line to vm.args did not work:

    -env WAIT_FOR_ERLANG 60

I also tried setting it from the terminal, but that did not work either.

    wolfiem@f01:~$ export WAIT_FOR_ERLANG=60

It still says "Riak failed to start within 15 seconds".
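One possible reason the export had no effect, offered as a sketch rather than a confirmed diagnosis: with sudo's default env_reset policy (the Ubuntu default), variables exported in the calling shell are not passed to the command that sudo runs. The snippet below only simulates that behaviour. `env -i` stands in for sudo, and the small `sh -c` command is a hypothetical stand-in for the start script; neither is part of Riak. The 15-second fallback mirrors the message above.

```shell
#!/bin/sh
# Case 1: the variable is set in the caller's environment, but the launcher
# scrubs the environment before running the command (env -i here, standing
# in for sudo with env_reset). The stand-in script falls back to 15.
scrubbed_timeout() {
    WAIT_FOR_ERLANG=60 env -i /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-15}"'
}

# Case 2: the variable is set on the far side of the launcher, so the
# stand-in script actually sees it.
forwarded_timeout() {
    env -i WAIT_FOR_ERLANG=60 /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-15}"'
}
```

On the real host the analogue of case 2 would be something like `sudo env WAIT_FOR_ERLANG=60 /etc/init.d/riak start`. Whether the init script then hands the variable on to the `riak` script is an assumption that has not been verified here. The vm.args attempt likely failed for a related reason: `-env` sets a variable inside the Erlang VM, while the message suggests the waiting is done by the start script before the VM is up.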

This is the console.log output:

    2012-09-11 10:58:02.532 [info] <0.7.0> Application lager started on node '[email protected]'
    2012-09-11 10:58:02.560 [warning] <0.148.0>@riak_core_ring_manager:reload_ring:231 No ring file available.
    2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320

This is the error.log output:

    2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320

This is the crash.log output:

    2012-09-11 10:58:02 =CRASH REPORT====
      crasher:
        initial call: mochiweb_socket_server:init/1
        pid: <0.164.0>
        registered_name: []
        exception exit: {eaddrnotavail,[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
        ancestors: [riak_core_sup,<0.135.0>]
        messages: []
        links: [<0.136.0>]
        dictionary: []
        trap_exit: true
        status: running
        heap_size: 377
        stack_size: 24
        reductions: 403
      neighbours:

You can find the riak console output below:

    wolfiem@f01:~$ riak console
    Attempting to restart script through sudo -H -u riak
    Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot /usr/lib/riak/releases/1.2.0/riak -embedded -config /etc/riak/app.config -pa /usr/lib/riak/basho-patches -args_file /etc/riak/vm.args -- console
    Root: /usr/lib/riak
    Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64] [kernel-poll:true]
    =INFO REPORT==== 11-Sep-2012::10:44:18 ===
        alarm_handler: {set,{system_memory_high_watermark,[]}}
    ** /usr/lib/riak/lib/observer-1.1/ebin/etop_txt.beam hides /usr/lib/riak/lib/basho-patches/etop_txt.beam
    ** Found 1 name clashes in code paths
    10:44:19.099 [info] Application lager started on node '[email protected]'
    10:44:19.130 [warning] No ring file available.
    10:44:19.158 [error] CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320
    /usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
    =INFO REPORT==== 11-Sep-2012::10:44:19 ===
        alarm_handler: {clear,system_memory_high_watermark}
    Erlang has closed
    {"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}}"}
    Crash dump was written to: /var/log/riak/erl_crash.dump
    Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}})
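The exit reason eaddrnotavail matches the POSIX error EADDRNOTAVAIL, which is typically returned when a socket is bound to an IP address that is not assigned to any local interface. Since the crashing process is a mochiweb_socket_server, one thing worth ruling out is a listener address in the config that this host does not own. The following is a rough sketch of such a check, not a confirmed diagnosis of this cluster; the function names `config_ips` and `unbindable_ips` are made up for illustration.

```shell
#!/bin/sh
# Print the distinct dotted-quad IPv4 addresses mentioned in file $1.
config_ips() {
    grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}

# Print each address from config file $1 that does not appear in the
# whitespace-separated list of local addresses given as $2.
# 0.0.0.0 and 127.0.0.1 are skipped on the assumption that wildcard and
# loopback binds are always possible.
unbindable_ips() {
    # Collapse any spaces or newlines in $2 into single spaces.
    locals=" $(echo $2) "
    for ip in $(config_ips "$1"); do
        case "$ip" in
            0.0.0.0|127.0.0.1) continue ;;
        esac
        case "$locals" in
            *" $ip "*) ;;        # address belongs to this host
            *) echo "$ip" ;;     # address not found on this host
        esac
    done
}
```

On an Ubuntu host this could be called as `unbindable_ips /etc/riak/app.config "$(hostname -I)"`, using the config path shown in the Exec line above. Any address it prints is only a candidate for a closer look, since the pattern also matches dotted quads that are not listener addresses.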

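This post: http://smartcloud.blogspot.hu/2013/01/setting-riak-cluster-in-amazon-ec2-just.html says that the "with 0 neighbours exited with reason" error is caused (at least in part) by an already running riak instance that is holding on to some port or other resource.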

For me, it was an epmd instance, which I found with ps ax | grep riak. After killing it, the problem went away.
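The search described above can be wrapped in a small helper, shown here as a sketch. The function name `riak_pids` is made up for illustration; it reads `ps ax` style output on standard input and prints the PID of every line that mentions riak, skipping the grep or awk command doing the search.

```shell
#!/bin/sh
# Read `ps ax` output on stdin and print the PID (first column) of each
# line mentioning riak, excluding the searching grep/awk processes.
riak_pids() {
    awk '/riak/ && !/grep|awk/ { print $1 }'
}
```

A cautious way to use it is `ps ax | riak_pids`, then inspect each PID with `ps -fp <pid>` before running `kill <pid>` on a leftover process. Killing epmd while a healthy Erlang node on the same host is still registered with it can disrupt that node, so it is worth confirming that Riak is really stopped first.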