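I'm seeing a strange situation: the OOM killer is being triggered on a system that has swap enabled, yet the swap is evidently not being used.
This is a 32-bit ARM system with 2 GB of RAM (OVH), used for running backups. The OOMs do not occur on every backup run, but they only ever occur during a backup (never while the system is idle).
The OOM is always triggered from filesystem (ext4) kernel code: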
    rm invoked oom-killer: gfp_mask=0x2420848(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), nodemask=0, order=0, oom_score_adj=0
    rm cpuset=/ mems_allowed=0
    CPU: 1 PID: 21300 Comm: rm Tainted: GO 4.9.2-armada375 #1
    Hardware name: Marvell Armada 375 (Device Tree)
    [<c010f7c4>] (unwind_backtrace) from [<c010b2a0>] (show_stack+0x10/0x14)
    [<c010b2a0>] (show_stack) from [<c04bb318>] (dump_stack+0x84/0x98)
    [<c04bb318>] (dump_stack) from [<c01dcc84>] (dump_header+0x98/0x1d8)
    [<c01dcc84>] (dump_header) from [<c0198c80>] (oom_kill_process+0x42c/0x4b4)
    [<c0198c80>] (oom_kill_process) from [<c0199000>] (out_of_memory+0x114/0x304)
    [<c0199000>] (out_of_memory) from [<c019d1fc>] (__alloc_pages_nodemask+0xb60/0xc1c)
    [<c019d1fc>] (__alloc_pages_nodemask) from [<c01943ac>] (pagecache_get_page+0x100/0x2c4)
    [<c01943ac>] (pagecache_get_page) from [<c0211170>] (__getblk_gfp+0x100/0x360)
    [<c0211170>] (__getblk_gfp) from [<c021285c>] (__breadahead+0x18/0x50)
    [<c021285c>] (__breadahead) from [<c024fcf0>] (__ext4_get_inode_loc+0x410/0x45c)
    [<c024fcf0>] (__ext4_get_inode_loc) from [<c025238c>] (ext4_iget+0x58/0xa7c)
    [<c025238c>] (ext4_iget) from [<c025ca3c>] (ext4_lookup+0xa8/0x1f0)
    [<c025ca3c>] (ext4_lookup) from [<c01edd24>] (__lookup_hash+0x58/0x88)
    [<c01edd24>] (__lookup_hash) from [<c01eeb58>] (do_unlinkat+0x10c/0x24c)
    [<c01eeb58>] (do_unlinkat) from [<c01075c0>] (ret_fast_syscall+0x0/0x3c)
The system appears to have plenty of free memory, although not in the "Normal" zone (can I increase that somehow?):
    Mem-Info:
    active_anon:14752 inactive_anon:14221 isolated_anon:0
     active_file:30720 inactive_file:234434 isolated_file:35
     unevictable:0 dirty:0 writeback:0 unstable:0
     slab_reclaimable:178562 slab_unreclaimable:4581
     mapped:3299 shmem:23539 pagetables:91 bounce:0
     free:28732 free_pcp:187 free_cma:0
    Node 0 active_anon:59008kB inactive_anon:56884kB active_file:122964kB inactive_file:937720kB unevictable:0kB isolated(anon):0kB isolated(file):140kB mapped:13196kB dirty:0kB writeback:0kB shmem:94156kB writeback_tmp:0kB unstable:0kB pages_scanned:4288575 all_unreclaimable? yes
    Normal free:3428kB min:3476kB low:4344kB high:5212kB active_anon:0kB inactive_anon:0kB active_file:11544kB inactive_file:164kB unevictable:0kB writepending:0kB present:786432kB managed:757356kB mlocked:0kB slab_reclaimable:714248kB slab_unreclaimable:18324kB kernel_stack:648kB pagetables:364kB bounce:0kB free_pcp:624kB local_pcp:204kB free_cma:0kB
    lowmem_reserve[]: 0 10240 10240
    HighMem free:111500kB min:512kB low:2016kB high:3520kB active_anon:59008kB inactive_anon:56884kB active_file:111392kB inactive_file:937592kB unevictable:0kB writepending:0kB present:1310720kB managed:1310720kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:124kB local_pcp:0kB free_cma:0kB
    lowmem_reserve[]: 0 0 0
    Normal: 519*4kB (U) 169*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3428kB
    HighMem: 5*4kB (UM) 7*8kB (UM) 4*16kB (U) 2*32kB (U) 3*64kB (U) 266*128kB (UM) 265*256kB (UM) 16*512kB (UM) 1*1024kB (U) 0*2048kB 0*4096kB = 111500kB
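To make the "Normal" zone figures easier to read, here is a minimal Python sketch that pulls a few fields out of that zone's line and compares them. The helper name `parse_zone_kb` and the shortened `NORMAL_ZONE` string are my own, for illustration only; the numbers are copied from the report above. It only restates what the dump says and does not by itself explain the OOM.

```python
import re

# Excerpt of the "Normal" zone line from the OOM report (all values in kB).
NORMAL_ZONE = (
    "Normal free:3428kB min:3476kB low:4344kB high:5212kB "
    "present:786432kB managed:757356kB "
    "slab_reclaimable:714248kB slab_unreclaimable:18324kB"
)

def parse_zone_kb(line):
    """Return a {field: kB} dict for every 'name:123kB' pair in the line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", line)}

zone = parse_zone_kb(NORMAL_ZONE)

# Is free lowmem under the zone's min watermark?
below_min = zone["free"] < zone["min"]

# How much of the managed lowmem is accounted as reclaimable slab?
slab_share = zone["slab_reclaimable"] / zone["managed"]

print(f"Normal free below min watermark: {below_min}")
print(f"slab_reclaimable share of managed lowmem: {slab_share:.1%}")
```

So the Normal zone is sitting just under its min watermark, and most of its managed memory is reported as reclaimable slab, while HighMem still has over 100 MB free.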
Moreover, it has barely touched the available swap space:
    288734 total pagecache pages
    4 pages in swap cache
    Swap cache stats: add 40359, delete 40355, find 16093/19606
    Free swap  = 4094944kB
    Total swap = 4094972kB
    524288 pages RAM
    327680 pages HighMem/MovableOnly
    7269 pages reserved
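As a cross-check of these page counts against the zone sizes reported earlier, a short Python sketch follows. It assumes a 4 kB page size, which is my assumption but is consistent with the `4kB` order-0 entries in the free lists; the variable names are mine.

```python
PAGE_KB = 4  # assumed page size, consistent with the "519*4kB" free-list entries

# Figures copied from the report above.
total_pages = 524288
highmem_pages = 327680
free_swap_kb = 4094944
total_swap_kb = 4094972

total_kb = total_pages * PAGE_KB
highmem_kb = highmem_pages * PAGE_KB
lowmem_kb = (total_pages - highmem_pages) * PAGE_KB
swap_used_kb = total_swap_kb - free_swap_kb

print(f"total RAM : {total_kb} kB")
print(f"HighMem   : {highmem_kb} kB")
print(f"lowmem    : {lowmem_kb} kB")
print(f"swap used : {swap_used_kb} kB")
```

Under that assumption the HighMem and lowmem figures match the `present:` values of the HighMem and Normal zones, and only a few tens of kB of the roughly 4 GB of swap are in use.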
For completeness, here is the full process list and the final kill message:
    [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
    [  725]     0   725     2867     1296       8       0        0             0 systemd-journal
    [  968]     0   968     2445      552       5       0        0         -1000 systemd-udevd
    [ 1016]     0  1016     1023      631       5       0        0             0 smartd
    [ 1017]     0  1017     1096      492       5       0        0             0 cron
    [ 1019]     0  1019      669      394       4       0        0             0 systemd-logind
    [ 1025]   104  1025     1144      599       5       0        0          -900 dbus-daemon
    [ 1035]   106  1035     1195      756       4       0        0             0 ntpd
    [ 1065]     0  1065      494      314       3       0        0             0 agetty
    [ 1071]     0  1071      833      400       4       0        0             0 agetty
    [20106]     0 20106     1290      469       5       0        0             0 cron
    [20110]     0 20110      975      564       4       0        0             0 rsnapshot-seque
    [20867]     0 20867      973      516       4       0        0             0 rsnapshot
    [20868]     0 20868     2496     1856       7       0        0             0 rsnapshot
    [20869]     0 20869      974      378       4       0        0             0 rsnapshot
    [20870]     0 20870      901      420       5       0        0             0 grep
    [21300]     0 21300     3769     3371       9       0        0             0 rm
    Out of memory: Kill process 21300 (rm) score 2 or sacrifice child
    Killed process 21300 (rm) total-vm:15076kB, anon-rss:12216kB, file-rss:1268kB, shmem-rss:0kB
    oom_reaper: reaped process 21300 (rm), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
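To put the process list in perspective, this small Python sketch adds up the `rss` column. It assumes the column is counted in 4 kB pages, which is my assumption but agrees with the kill message for `rm`; the list and variable names are mine.

```python
PAGE_KB = 4  # assumed page size; the rss column is taken to be in pages

# rss column of the process list above, in the order the processes are listed.
rss_pages = [1296, 552, 631, 492, 394, 599, 756, 314,
             400, 469, 564, 516, 1856, 378, 420, 3371]

total_rss_kb = sum(rss_pages) * PAGE_KB

# The killed "rm" is the last entry; compare with anon-rss + file-rss
# from the "Killed process" line.
rm_rss_kb = rss_pages[-1] * PAGE_KB
rm_reported_kb = 12216 + 1268

print(f"total userspace rss: {total_rss_kb} kB")
print(f"rm rss: {rm_rss_kb} kB (kill message reports {rm_reported_kb} kB)")
```

All listed processes together hold only a few tens of MB resident, so userspace memory use looks far too small to account for the OOM on a 2 GB machine.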
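I'm completely lost as to what is happening here. Can someone explain why the system is triggering these OOM kills? Is there anything I can do about it?
I'm running kernel 4.9.2 (armv7l GNU/Linux). Here are the vm sysctl settings: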
    # sysctl -a | grep '^vm'
    vm.admin_reserve_kbytes = 8192
    vm.block_dump = 0
    vm.dirty_background_bytes = 0
    vm.dirty_background_ratio = 10
    vm.dirty_bytes = 0
    vm.dirty_expire_centisecs = 3000
    vm.dirty_ratio = 20
    vm.dirty_writeback_centisecs = 500
    vm.dirtytime_expire_seconds = 43200
    vm.drop_caches = 3
    vm.highmem_is_dirtyable = 0
    vm.laptop_mode = 0
    vm.legacy_va_layout = 0
    vm.lowmem_reserve_ratio = 32 32
    vm.max_map_count = 65530
    vm.min_free_kbytes = 3478
    vm.mmap_min_addr = 4096
    vm.mmap_rnd_bits = 8
    vm.nr_pdflush_threads = 0
    vm.oom_dump_tasks = 1
    vm.oom_kill_allocating_task = 0
    vm.overcommit_kbytes = 0
    vm.overcommit_memory = 0
    vm.overcommit_ratio = 50
    vm.page-cluster = 3
    vm.panic_on_oom = 0
    vm.percpu_pagelist_fraction = 0
    vm.stat_interval = 1
    vm.swappiness = 60
    vm.user_reserve_kbytes = 64454
    vm.vfs_cache_pressure = 100
    vm.watermark_scale_factor = 10