如何debugging系统中的futex拥塞

我有perlmultithreading脚本(“assp antispam代理”)使用多个perl库。 在我的一台服务器上,大约2个小时后,脚本开始制造大量的消耗大量CPU功耗的futexed。 在其他服务器上这不会发生,所以它可能会在服务器资源的一些竞争条件之后开始发生。 向开发人员询问有关如何debugging此问题的任何信息。

使用strace或sysdig我看到这个Perl脚本开始做大量的futex请求。 我怎样才能debugging,并找出什么文件的所有这些内核futex? 我试过sysdig和oprofile,但没有find任何方法来检查它。

10秒运行strace:

# strace -f -p 6130 -c Process 6130 attached with 8 threads - interrupt to quit % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 57.44 3.265528 16493 198 nanosleep 28.53 1.622291 19313 84 poll 8.54 0.485535 1 787123 169255 futex 5.28 0.300018 75005 4 restart_syscall 0.21 0.012001 2400 5 select 0.00 0.000209 0 35751 sched_yield ------ ----------- ----------- --------- --------- ---------------- 100.00 5.685582 823355 169273 total 

这些futexes的小例子:

  75853 13:09:36.167879875 2 perl (6248) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743049 75855 13:09:36.167880152 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75856 13:09:36.167880436 0 perl (6154) < futex res=0 75857 13:09:36.167880702 0 perl (6154) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75858 13:09:36.167881606 0 perl (6154) < futex res=1 75863 13:09:36.167882554 2 perl (6248) < futex res=0 75865 13:09:36.167882949 2 perl (6248) > futex addr=7F51EF8E0A20 op=128(FUTEX_PRIVATE_FLAG) val=2 75866 13:09:36.167883026 0 perl (6154) > futex addr=7F51EF8E0A58 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75869 13:09:36.167883226 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75870 13:09:36.167883231 0 perl (6154) < futex res=0 75873 13:09:36.167883503 0 perl (6154) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75877 13:09:36.167884305 0 perl (6154) < futex res=1 75885 13:09:36.167885569 2 perl (6248) < futex res=0 75887 13:09:36.167885810 2 perl (6248) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75888 13:09:36.167885945 2 perl (6248) < futex res=0 75890 13:09:36.167886138 2 perl (6248) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743051 75891 13:09:36.167886404 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75892 13:09:36.167886585 0 perl (6154) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75894 13:09:36.167887449 0 perl (6154) < futex res=1 75896 13:09:36.167888690 2 perl (6248) < futex res=0 75897 13:09:36.167888969 2 perl (6248) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75899 13:09:36.167889092 2 perl (6248) < futex res=0 75900 13:09:36.167889194 0 perl (6154) > futex addr=7F51EF8E0A58 op=128(FUTEX_PRIVATE_FLAG) val=2 75902 13:09:36.167889326 2 perl (6248) > futex addr=7F51EF8E0A58 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75903 13:09:36.167889510 2 perl (6248) < futex res=0 75904 13:09:36.167889619 0 perl (6154) < futex res=-11(EAGAIN) 75905 13:09:36.167889716 2 perl (6248) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743053 75906 13:09:36.167889945 0 perl (6154) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75907 13:09:36.167889977 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75908 13:09:36.167890792 0 perl (6154) < futex res=1 75910 13:09:36.167892302 2 perl (6248) < futex res=0 75911 13:09:36.167892572 2 perl (6248) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75912 13:09:36.167892712 2 perl (6248) < futex res=0 75913 13:09:36.167892744 0 perl (6154) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743055 75915 13:09:36.167893152 2 perl (6248) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75916 13:09:36.167893352 0 perl (6154) > switch next=0 pgft_maj=2 pgft_min=224301 vm_size=990864 vm_rss=669216 vm_swap=0 75917 13:09:36.167894180 2 perl (6248) < futex res=1 75921 13:09:36.167894912 0 perl (6154) < futex res=0 75922 13:09:36.167895247 0 perl (6154) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75924 13:09:36.167895461 0 perl (6154) < futex res=0 75925 13:09:36.167895929 2 perl (6248) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743057 75926 13:09:36.167895988 0 perl (6154) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75927 13:09:36.167896092 2 perl (6248) < futex res=-11(EAGAIN) 75929 13:09:36.167896259 0 perl (6154) < futex res=0 75930 13:09:36.167896312 2 perl (6248) > futex addr=7F51EF8E0A20 op=128(FUTEX_PRIVATE_FLAG) val=2 75931 13:09:36.167896547 0 perl (6154) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75933 13:09:36.167896605 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75939 13:09:36.167897494 0 perl (6154) < futex res=1 75944 13:09:36.167899107 2 perl (6248) < futex res=0 75945 13:09:36.167899365 2 perl (6248) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75946 13:09:36.167899498 2 perl (6248) < futex res=0 75947 13:09:36.167899541 0 perl (6154) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743059 75950 13:09:36.167900094 0 perl (6154) > switch next=0 pgft_maj=2 pgft_min=224301 vm_size=990864 vm_rss=669216 vm_swap=0 75952 13:09:36.167900811 2 perl (6248) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75953 13:09:36.167901474 2 perl (6248) < futex res=1 75958 13:09:36.167903232 0 perl (6154) < futex res=0 75959 13:09:36.167903727 2 perl (6248) > futex addr=7F51EF8E0A20 op=128(FUTEX_PRIVATE_FLAG) val=2 75960 13:09:36.167903788 0 perl (6154) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75961 13:09:36.167903875 2 perl (6248) < futex res=-11(EAGAIN) 75963 13:09:36.167904047 0 perl (6154) < futex res=0 75964 13:09:36.167904106 2 perl (6248) > futex addr=7F51EF8E0A58 op=128(FUTEX_PRIVATE_FLAG) val=2 75967 13:09:36.167904348 0 perl (6154) > futex addr=7F51EF8E0A58 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75968 13:09:36.167904374 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 75971 13:09:36.167905247 0 perl (6154) < futex res=1 75973 13:09:36.167905490 0 perl (6154) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743061 75978 13:09:36.167905988 0 perl (6154) > switch next=0 pgft_maj=2 pgft_min=224301 vm_size=990864 vm_rss=669216 vm_swap=0 75983 13:09:36.167906745 2 perl (6248) < futex res=0 75985 13:09:36.167906986 2 perl (6248) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 75990 13:09:36.167907588 2 perl (6248) < futex res=1 75992 13:09:36.167907788 2 perl (6248) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 75993 13:09:36.167907927 2 perl (6248) < futex res=0 75998 13:09:36.167909046 0 perl (6154) < futex res=0 76002 13:09:36.167909432 0 perl (6154) > futex addr=7F51EF8E0A20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 76004 13:09:36.167909584 2 perl (6248) > futex addr=7F51EF8E0A58 op=128(FUTEX_PRIVATE_FLAG) val=2 76005 13:09:36.167909674 0 perl (6154) < futex res=0 76008 13:09:36.167909846 2 perl (6248) > switch next=0 pgft_maj=0 pgft_min=40961 vm_size=990864 vm_rss=669216 vm_swap=0 76009 13:09:36.167909969 0 perl (6154) > futex addr=7F51EF8E0A58 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1 76011 13:09:36.167910704 0 perl (6154) < futex res=1 76012 13:09:36.167910946 0 perl (6154) > futex addr=7F51EF8E0A5C op=128(FUTEX_PRIVATE_FLAG) val=115743063 76015 13:09:36.167911454 0 perl (6154) > switch next=0 pgft_maj=2 pgft_min=224301 vm_size=990864 vm_rss=669216 vm_swap=0 76020 13:09:36.167912232 2 perl (6248) < futex res=0 76022 13:09:36.167912491 2 perl (6248) > futex addr=7F51EF8E0A5C op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 

在GNU / Linux上你可以试试systemtap 。 事先需要一点准备。 请查阅这方面的在线文件 。

你会发现一个 快速的用户空间互斥系统调用的例子 ,下面转载完整版。

值得注意的是(从上面引用的两个资料来看):

从维基百科的文章:

一个正确编程的基于futex的锁将不会使用系统调用,除非该锁被争用; 因为大多数操作不需要在进程之间进行仲裁,所以这在大多数情况下不会发生

systemtap文档:

简而言之,当多个进程试图同时访问同一个锁variables时,会发生futex争用。 这可能会导致性能较差,因为锁序列化执行; 一个进程获得锁,而其他进程必须等待锁variables再次变为可用。

来自systemtap例子的脚本:

 #! /usr/bin/env stap # This script tries to identify contended user-space locks by hooking # into the futex system call. global FUTEX_WAIT = 0 /*, FUTEX_WAKE = 1 */ global FUTEX_PRIVATE_FLAG = 128 /* linux 2.6.22+ */ global FUTEX_CLOCK_REALTIME = 256 /* linux 2.6.29+ */ global lock_waits # long-lived stats on (tid,lock) blockage elapsed time global process_names # long-lived pid-to-execname mapping probe syscall.futex.return { if (($op & ~(FUTEX_PRIVATE_FLAG|FUTEX_CLOCK_REALTIME)) != FUTEX_WAIT) next process_names[pid()] = execname() elapsed = gettimeofday_us() - @entry(gettimeofday_us()) lock_waits[pid(), $uaddr] <<< elapsed } probe end { foreach ([pid+, lock] in lock_waits) printf ("%s[%d] lock %p contended %d times, %d avg us\n", process_names[pid], pid, lock, @count(lock_waits[pid,lock]), @avg(lock_waits[pid,lock])) } 

你也可以尝试在你的perl程序中使用Devel :: NYTProf来查找那些行为不当的代码。