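We are unable to mount NFS shares from our Fedora 8 NFS server on our Debian Lenny database / web-app NFS client. Manual mount commands with options and fstab-assisted mounts with options show the same behavior. The machine crashed unexpectedly 6 days ago, but this problem appears to have started 3 days ago. (Yes, it was only reported to me this morning by the staff responsible.)
The same server works fine for all other NFS clients. The NFS client also serves some of its own shares back to other clients and to the NFS server, and that works fine as well.
The processes for these mounts are hung, and have been since the 26th. Cron has been turned off to keep the load average down.
The mount authenticates correctly on the NFS server, according to the "authenticated mount request" messages on the server, but on the client: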
# mount -vvv -t nfs server.example.org:/shared/foo /shared/foo/
mount: fstab path: "/etc/fstab"
mount: lock path: "/etc/mtab~"
mount: temp path: "/etc/mtab.tmp"
mount: spec: "server.example.org:/shared/foo"
mount: node: "/shared/foo/"
mount: types: "nfs"
mount: opts: "(null)"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "server.example.org:/shared/foo"
mount: external mount: argv[2] = "/shared/foo/"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw"
mount.nfs: trying 192.168.xxx.xxx prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 192.168.xxx.xxx prog 100005 vers 3 prot UDP port 51852
There it sits on the screen indefinitely, most likely because of the following:
Mar 28 10:17:14 db kernel: [1299206.229436] mount.nfs D e250c5d5 0 20597 20596
Mar 28 10:17:14 db kernel: [1299206.229439] c0a3cde0 00000086 f7555b00 e250c5d5 0001ca16 c0a3cf6c ce0a9020 0000000d
Mar 28 10:17:14 db kernel: [1299206.229444] 0013bc68 077ffe57 00000003 00000000 00000000 00000000 00000000 00000246
Mar 28 10:17:14 db kernel: [1299206.229447] c0a77c90 00000000 c0a77c98 ce000a7c f8e047c1 c02c93a4 f8e0479c f4518588
Mar 28 10:17:14 db kernel: [1299206.229451] Call Trace:
Mar 28 10:17:14 db kernel: [1299206.229465] [<f8e047c1>] rpc_wait_bit_killable+0x25/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229485] [<c02c93a4>] __wait_on_bit+0x33/0x58
Mar 28 10:17:14 db kernel: [1299206.229490] [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229505] [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229519] [<c02c9428>] out_of_line_wait_on_bit+0x5f/0x67
Mar 28 10:17:14 db kernel: [1299206.229523] [<c0138859>] wake_bit_function+0x0/0x3c
Mar 28 10:17:14 db kernel: [1299206.229528] [<f8e04c06>] __rpc_execute+0xbe/0x1d9 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229543] [<f8dffa72>] rpc_run_task+0x40/0x45 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229557] [<f8dffb00>] rpc_call_sync+0x38/0x52 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229573] [<f8e80351>] nfs3_rpc_wrapper+0x14/0x49 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229591] [<f8e8044f>] do_proc_fsinfo+0x54/0x75 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229607] [<f8e80481>] nfs3_proc_fsinfo+0x11/0x36 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229621] [<f8e70514>] nfs_probe_fsinfo+0x78/0x47f [nfs]
Mar 28 10:17:14 db kernel: [1299206.229634] [<f8dffd1f>] rpc_shutdown_client+0x9d/0xa5 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229647] [<f8dffb58>] rpc_ping+0x3e/0x47 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229662] [<f8e00845>] rpc_bind_new_program+0x69/0x6f [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229677] [<f8e71584>] nfs_create_server+0x37b/0x4fa [nfs]
Mar 28 10:17:14 db kernel: [1299206.229693] [<c01621c1>] __alloc_pages_internal+0xb5/0x34e
Mar 28 10:17:14 db kernel: [1299206.229700] [<c013882c>] autoremove_wake_function+0x0/0x2d
Mar 28 10:17:14 db kernel: [1299206.229703] [<c01e7e3c>] idr_get_empty_slot+0x11c/0x1ed
Mar 28 10:17:14 db kernel: [1299206.229711] [<f8e78fbd>] nfs_get_sb+0x528/0x810 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229724] [<c01e8125>] idr_pre_get+0x21/0x2f
Mar 28 10:17:14 db kernel: [1299206.229729] [<c0180159>] vfs_kern_mount+0x7b/0xed
Mar 28 10:17:14 db kernel: [1299206.229734] [<c0180209>] do_kern_mount+0x2f/0xb8
Mar 28 10:17:14 db kernel: [1299206.229738] [<c019264a>] do_new_mount+0x55/0x89
Mar 28 10:17:14 db kernel: [1299206.229743] [<c0192825>] do_mount+0x1a7/0x1c6
Mar 28 10:17:14 db kernel: [1299206.229747] [<c02ca52a>] error_code+0x72/0x78
Mar 28 10:17:14 db kernel: [1299206.229752] [<c0190895>] copy_mount_options+0x90/0x109
Mar 28 10:17:14 db kernel: [1299206.229756] [<c01928b1>] sys_mount+0x6d/0xa8
Mar 28 10:17:14 db kernel: [1299206.229760] [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 28 10:17:14 db kernel: [1299206.229766] =======================
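The trace shows mount.nfs in state D, waiting inside rpc_wait_bit_killable. A minimal sketch for listing every mount.nfs process in that state follows. The function name stuck_nfs_mounts is hypothetical, and it assumes input rows of the form "PID STAT COMMAND", which is what procps prints for ps -eo pid=,stat=,comm=.

```shell
# Hypothetical helper, not part of any NFS tooling.
# Reads "PID STAT COMMAND" rows on stdin and prints the PID of every
# mount.nfs process whose state starts with D (uninterruptible sleep).
stuck_nfs_mounts() {
    awk '$2 ~ /^D/ && $3 == "mount.nfs" { print $1 }'
}

# Live usage (assumes procps ps):
#   ps -eo pid=,stat=,comm= | stuck_nfs_mounts
```

Reading from stdin rather than calling ps inside the function means the same filter can be run against saved ps output from the time of the hang.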
Networking is working fine: production users of the database web-app front end have seen no service interruption or performance problems.
Memory is fine:
db:/var/log# free -m
             total       used       free     shared    buffers     cached
Mem:         24352      19426       4926          0        281      18283
-/+ buffers/cache:        860      23492
Swap:         7632          0       7632
In /etc/fstab:
server.example.org:/shared/foo /foo nfs defaults 0 0
The relevant line in the server's /etc/exports:
/shared/foo 192.168.xxx.xxx(rw,no_root_squash)
The tcpdump output looks normal. I can post it if anyone wants it, but it is rather large and nothing in it looks obviously wrong.
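A lighter check than reading a full capture is to compare the ports mount.nfs tried (prog 100003 on TCP 2049, prog 100005 on UDP 51852) with what the server's portmapper currently advertises. The sketch below assumes the usual five-column listing printed by rpcinfo -p (program, vers, proto, port, service); rpc_port is a hypothetical helper name.

```shell
# Hypothetical helper: reads `rpcinfo -p` style rows on stdin and prints the
# port registered for the given program number, version and protocol.
# Assumes the column order: program vers proto port service.
rpc_port() {
    awk -v prog="$1" -v vers="$2" -v proto="$3" \
        '$1 == prog && $2 == vers && $3 == proto { print $4 }'
}

# Live usage (run on the client against the NFS server):
#   rpcinfo -p server.example.org | rpc_port 100005 3 udp
```

If the printed port differs from the one in the mount.nfs output, or nothing is printed at all, the registration on one side is probably stale.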
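I ran out of time to troubleshoot and eventually restarted the services, after developers had also fired off a few more mount attempts that hung.
After restarting portmap and the Debian NFS services, I got this working again once the stuck client mount attempts were killed. The NFS service restart restarted the rpc.statd, rpc.idmapd and rpc.mountd processes.
Once the old mount attempts had been killed, new mount requests no longer generated stack traces.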
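For reference, the recovery can be sketched as a dry run that only prints the commands. The init script names (portmap, nfs-common, nfs-kernel-server) are my assumption about a stock Debian Lenny install, and the use of SIGKILL is also an assumption. The function name recovery_plan is hypothetical, and the order shown (restart first, then kill) follows the description above.

```shell
# Hypothetical dry-run helper: prints the recovery commands, runs nothing.
# Arguments are the PIDs of the stuck mount.nfs processes.
# Assumed init scripts: portmap, nfs-common (rpc.statd, rpc.idmapd),
# nfs-kernel-server (rpc.mountd).
recovery_plan() {
    echo "/etc/init.d/portmap restart"
    echo "/etc/init.d/nfs-common restart"
    echo "/etc/init.d/nfs-kernel-server restart"
    for pid in "$@"; do
        echo "kill -KILL $pid"
    done
}

# Example: recovery_plan 20597
```

Printing the plan first gives a chance to review the PID list before anything is killed on a production database host.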