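Most long-running commands die instantly on Amazon EC2 (Ubuntu 10.04)

When I run any kind of long-running command in the terminal, the program dies immediately and the terminal prints the text Killed.

Any pointers? Is there perhaps a log file that explains why the commands are being killed?

Update

Here is a snippet from dmesg that will hopefully shed light on what is causing the problem. Another detail that may be useful: this is an Amazon EC2 instance.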

     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184209] Call Trace:
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184218] [<c01e49ea>] dump_header+0x7a/0xb0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184221] [<c01e4a7c>] oom_kill_process+0x5c/0x160
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184224] [<c01e4fe9>] ? select_bad_process+0xa9/0xe0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184227] [<c01e5071>] __out_of_memory+0x51/0xb0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184229] [<c01e5128>] out_of_memory+0x58/0xd0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184232] [<c01e7f16>] __alloc_pages_slowpath+0x416/0x4b0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184235] [<c01e811f>] __alloc_pages_nodemask+0x16f/0x1c0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184238] [<c01ea2ca>] __do_page_cache_readahead+0xea/0x210
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184241] [<c01ea416>] ra_submit+0x26/0x30
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184244] [<c01e3aef>] filemap_fault+0x3cf/0x400
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184247] [<c02329ad>] ? core_sys_select+0x19d/0x240
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184252] [<c01fb65c>] __do_fault+0x4c/0x5e0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184254] [<c01e4161>] ? generic_file_aio_write+0xa1/0xc0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184257] [<c01fd60b>] handle_mm_fault+0x19b/0x510
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184262] [<c05f80d6>] do_page_fault+0x146/0x440
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184265] [<c0232c62>] ? sys_select+0x42/0xc0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184268] [<c05f7f90>] ? do_page_fault+0x0/0x440
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184270] [<c05f53c7>] error_code+0x73/0x78
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.184274] [<c05f007b>] ? setup_local_APIC+0xce/0x33e
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272161] [<c05f0000>] ? setup_local_APIC+0x53/0x33e
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272163] Mem-Info:
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272164] DMA per-cpu:
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272166] CPU 0: hi: 0, btch: 1 usd: 0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272168] Normal per-cpu:
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272169] CPU 0: hi: 186, btch: 31 usd: 50
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272171] HighMem per-cpu:
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272172] CPU 0: hi: 186, btch: 31 usd: 30
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272176] active_anon:204223 inactive_anon:204177 isolated_anon:0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272177] active_file:47 inactive_file:141 isolated_file:0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272178] unevictable:0 dirty:0 writeback:0 unstable:0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272179] free:10375 slab_reclaimable:1650 slab_unreclaimable:1856
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272180] mapped:2127 shmem:3918 pagetables:1812 bounce:0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272186] DMA free:6744kB min:72kB low:88kB high:108kB active_anon:300kB inactive_anon:308kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15812kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:8kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272190] lowmem_reserve[]: 0 702 1670 1670
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272197] Normal free:34256kB min:3352kB low:4188kB high:5028kB active_anon:317736kB inactive_anon:317308kB active_file:144kB inactive_file:16kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:719320kB mlocked:0kB dirty:4kB writeback:0kB mapped:32kB shmem:0kB slab_reclaimable:6592kB slab_unreclaimable:7424kB kernel_stack:2592kB pagetables:7248kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:571 all_unreclaimable? yes
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272201] lowmem_reserve[]: 0 0 7747 7747
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272207] HighMem free:500kB min:512kB low:1668kB high:2824kB active_anon:498856kB inactive_anon:499092kB active_file:44kB inactive_file:548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:991620kB mlocked:0kB dirty:0kB writeback:0kB mapped:8472kB shmem:15672kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:430 all_unreclaimable? yes
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272211] lowmem_reserve[]: 0 0 0 0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272215] DMA: 10*4kB 22*8kB 38*16kB 33*32kB 16*64kB 10*128kB 4*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 6744kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272223] Normal: 476*4kB 1396*8kB 676*16kB 206*32kB 23*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 34256kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272231] HighMem: 1*4kB 2*8kB 28*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 500kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272238] 4108 total pagecache pages
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272240] 0 pages in swap cache
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272242] Swap cache stats: add 0, delete 0, find 0/0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272243] Free swap = 0kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272244] Total swap = 0kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276842] 435199 pages RAM
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276845] 249858 pages HighMem
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276846] 8771 pages reserved
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276847] 23955 pages shared
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.276849] 405696 pages non-shared

You should be able to find out what killed your process by looking at the output of the dmesg command, or at the log files /var/log/kern.log, /var/log/messages, or /var/log/syslog.
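As a rough sketch of that search, the function below scans kernel log text for the traces the OOM killer tends to leave behind. The name `find_oom_events` and the search patterns are assumptions made for this example; the exact wording of kernel messages differs between kernel versions, and which log files exist depends on the distribution.

```shell
# Print lines that suggest the kernel OOM killer was invoked.
# Reads the files given as arguments, or standard input if none are given.
# The patterns are assumptions based on common kernel messages and on the
# function names that show up in OOM call traces; adjust them as needed.
find_oom_events() {
    grep -i -E 'out of memory|out_of_memory|oom[-_]kill' "$@"
}

# Typical use (reading the logs may require root):
#   dmesg | find_oom_events
#   find_oom_events /var/log/kern.log /var/log/syslog
```

If this prints nothing, the kill probably came from somewhere other than the OOM killer, and the other causes are worth checking.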

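There are a number of things that can cause a process to be summarily killed:

  • If it exceeds the hard limit for one of the memory or CPU usage types, which you can check with ulimit -H -a
  • If the system is low on virtual memory, processes can be killed by the kernel OOM killer to free up memory (in your case, probably not this)
  • If the system has SELinux and/or PaX/grsecurity installed, a process can be killed if it tries to do something the security policy does not allow, or if it tries to execute self-modifying code.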
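For the first item, it can help to see the soft and hard values side by side. A minimal sketch, assuming a shell whose ulimit builtin accepts -S, -H, -v and -t (bash and dash do); `show_limit` is a helper name invented here:

```shell
# Show the soft and hard value of one resource limit for the current shell.
# $1 is a label for display, $2 is a ulimit option letter such as v or t.
show_limit() {
    printf '%s: soft=%s hard=%s\n' "$1" "$(ulimit -S -"$2")" "$(ulimit -H -"$2")"
}

show_limit 'virtual memory (kB)' v
show_limit 'cpu time (seconds)' t
```

If both values read unlimited, resource limits are unlikely to be what is killing the process.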

The logs or dmesg should tell you why the process was killed.

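The log you posted in your update shows that your system is running out of memory and that the OOM killer is being invoked to kill processes in order to keep memory free when "all else fails". The OOM killer's selection algorithm may well be favoring your "long-running" processes as victims. See the linked page for a description of the selection algorithm.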
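To put the counters from that log into more familiar units, the page counts can be converted to MiB. This is only a small sketch: it assumes a 4 kB page size, which matches the smallest block size in the log's free-list lines, and `pages_to_mib` is a helper name made up for this example.

```shell
# Convert a count of memory pages to MiB, assuming 4 kB pages.
pages_to_mib() {
    awk -v pages="$1" 'BEGIN { printf "%.0f\n", pages * 4 / 1024 }'
}

pages_to_mib 435199                  # "pages RAM" from the log
pages_to_mib $((204223 + 204177))    # active_anon + inactive_anon
```

With the figures from the log this prints 1700 and 1595, so almost all of the RAM is held by anonymous (process) memory, and with Total swap = 0kB the kernel has nowhere to move it.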

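The obvious solution is more memory, but you may be running out of memory because of a memory leak somewhere, in which case adding more memory will only delay the point at which the OOM killer is invoked. Check the process table for the processes using the most memory with your favorite tool (top, ps, etc.) and go from there.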
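One possible way to do that check with ps is sketched below. It assumes a ps that accepts -e and -o with header-suppressing "name=" fields, as the procps ps on Linux does; `largest_rss` is a hypothetical helper name.

```shell
# Sort "PID RSS COMMAND" lines by RSS (resident memory in kB), largest
# first, and keep the top N lines (default 10).
largest_rss() {
    sort -k2,2nr | head -n "${1:-10}"
}

# Feed it the current process table: pid, resident set size, command name.
ps -eo pid=,rss=,comm= | largest_rss 10
```

Running this a few times while the job is active shows whether one process keeps growing, which would point to a leak rather than to a machine that is simply too small.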

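As others have already explained, you are running out of memory, so the out-of-memory killer is triggered and kills some process.

You can fix this in one of the following ways:

a) Upgrade your EC2 machine to a more powerful one. A "small instance" has about 2.7 times the memory (1.7GB) of a "micro instance" (0.64GB). This costs additional money.

b) Add a swap partition: attach an additional EBS drive, then run mkswap /dev/sdx followed by swapon /dev/sdx. This costs EBS storage and IO fees.

c) Add a swap file: run dd if=/dev/zero of=/swap bs=1M count=500, then mkswap /swap, then swapon /swap. This costs IO fees and free space on the root EBS volume.

Option c) should be enough, but keep in mind that a micro instance is not meant for long-running CPU-intensive tasks because of its CPU limits (only short bursts are allowed).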
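For option c), here is a sketch of the steps as a script. By default it only prints the commands so they can be reviewed first. The `run` and `make_swapfile` helpers and the RUN variable are inventions of this example, and the chmod 600 step is an addition to the commands listed in c).

```shell
# Set up a swap file step by step. By default the commands are only printed;
# set RUN=1 and run as root to actually execute them.
# chmod 600 is an extra step so that other users cannot read the swap file.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# $1 is the swap file path (default /swap), $2 its size in MB (default 500).
make_swapfile() {
    file=${1:-/swap}
    size_mb=${2:-500}
    run dd if=/dev/zero of="$file" bs=1M count="$size_mb" &&
    run chmod 600 "$file" &&
    run mkswap "$file" &&
    run swapon "$file"
}

make_swapfile /swap 500
```

Swap enabled this way is gone after a reboot. The usual way to make it permanent is a line such as `/swap none swap sw 0 0` in /etc/fstab.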

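I had the same problem. My processes were being killed.

I found out that the Ubuntu AMI I was using had no swap space set up. When memory is full and no swap space is available, the kernel starts killing processes unpredictably to protect itself. Swap space prevents that. (This problem is especially relevant to Micro instances because of their small 613 MB of memory.)

To check whether you have swap space set up, type: swapon -s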
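The same check can be scripted by reading the SwapTotal field from /proc/meminfo. A minimal sketch, assuming a Linux system; `swap_total_kb` is a name chosen for this example:

```shell
# Report the total swap in kB as read from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument makes it easy to try the
# function against a saved copy.
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

if [ "$(swap_total_kb)" = "0" ]; then
    echo "no swap configured"
fi
```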

Setting up swap space: http://www.linux.com/news/software/applications/8208-all-about-linux-swap-space

Other resources: http://wiki.sysconfig.org.uk/display/howto/Build+your+own+Core+CentOS+5.x+AMI+for+Amazon+EC2

The log says you are running out of swap/cache memory.

     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272240] 0 pages in swap cache
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272242] Swap cache stats: add 0, delete 0, find 0/0
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272243] Free swap = 0kB
     May 14 20:29:15 ip-10-112-33-63 kernel: [11144050.272244] Total swap = 0kB

Can you split the job/process you are running into batches? Or perhaps you could try running it on its own after stopping other processes?