We recently had a sharp rise in traffic, and although it is still only moderate in scale, it caused haproxy to max out one of the CPU cores (and the server became unresponsive). I suspect I'm doing something inefficient in my configuration, so I'd like to ask the haproxy experts here whether they would critique my configuration file, mainly from a performance standpoint.

The configuration is meant to distribute load across a set of HTTP application servers, a set of servers handling WebSocket connections (with several independent processes on different ports), and a static-file web server. Apart from the performance problem, it works well. (Some details have been redacted.)

Any guidance you can offer would be greatly appreciated!
HAProxy v1.4.8
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    daemon
    maxconn 100000
    log 127.0.0.1 local0 notice

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    log global
    mode http
    option httplog
    option httpclose  #http://serverfault.com/a/104782/52811
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 5h  #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';

#---------------------------------------------------------------------
# FRONTEND
#---------------------------------------------------------------------
frontend public
    bind *:80
    maxconn 100000
    reqidel ^X-Forwarded-For:.*  #Remove any x-forwarded-for headers
    option forwardfor            #Set the forwarded for header (needs option httpclose)
    default_backend app
    redirect prefix http://xxxxxxxxxxxxxxxxx code 301 if { hdr(host) -i www.xxxxxxxxxxxxxxxxxxx }
    timeout client 5h  #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';

    # ACLs
    ##########
    acl static_request hdr_beg(host) -i i.
    acl static_request hdr_beg(host) -i static.
    acl static_request path_beg /favicon.ico /robots.txt
    acl test_request hdr_beg(host) -i test.
    acl ws_request hdr_beg(host) -i ws
    # ws11
    acl ws11x1_request hdr_beg(host) -i ws11x1
    acl ws11x2_request hdr_beg(host) -i ws11x2
    acl ws11x3_request hdr_beg(host) -i ws11x3
    acl ws11x4_request hdr_beg(host) -i ws11x4
    acl ws11x5_request hdr_beg(host) -i ws11x5
    acl ws11x6_request hdr_beg(host) -i ws11x6
    # ws12
    acl ws12x1_request hdr_beg(host) -i ws12x1
    acl ws12x2_request hdr_beg(host) -i ws12x2
    acl ws12x3_request hdr_beg(host) -i ws12x3
    acl ws12x4_request hdr_beg(host) -i ws12x4
    acl ws12x5_request hdr_beg(host) -i ws12x5
    acl ws12x6_request hdr_beg(host) -i ws12x6

    # Which backend....
    ###################
    use_backend static if static_request
    #ws11
    use_backend ws11x1 if ws11x1_request
    use_backend ws11x2 if ws11x2_request
    use_backend ws11x3 if ws11x3_request
    use_backend ws11x4 if ws11x4_request
    use_backend ws11x5 if ws11x5_request
    use_backend ws11x6 if ws11x6_request
    #ws12
    use_backend ws12x1 if ws12x1_request
    use_backend ws12x2 if ws12x2_request
    use_backend ws12x3 if ws12x3_request
    use_backend ws12x4 if ws12x4_request
    use_backend ws12x5 if ws12x5_request
    use_backend ws12x6 if ws12x6_request

#---------------------------------------------------------------------
# BACKEND - APP
#---------------------------------------------------------------------
backend app
    timeout server 50000ms  #To counter the WS default
    mode http
    balance roundrobin
    option httpchk HEAD /upchk.txt
    server app1 app1:8000 maxconn 100000 check
    server app2 app2:8000 maxconn 100000 check
    server app3 app3:8000 maxconn 100000 check
    server app4 app4:8000 maxconn 100000 check

#---------------------------------------------------------------------
# BACKENDs - WS
#---------------------------------------------------------------------
#Server ws11
backend ws11x1
    server ws11 ws11:8001 maxconn 100000
backend ws11x2
    server ws11 ws11:8002 maxconn 100000
backend ws11x3
    server ws11 ws11:8003 maxconn 100000
backend ws11x4
    server ws11 ws11:8004 maxconn 100000
backend ws11x5
    server ws11 ws11:8005 maxconn 100000
backend ws11x6
    server ws11 ws11:8006 maxconn 100000

#Server ws12
backend ws12x1
    server ws12 ws12:8001 maxconn 100000
backend ws12x2
    server ws12 ws12:8002 maxconn 100000
backend ws12x3
    server ws12 ws12:8003 maxconn 100000
backend ws12x4
    server ws12 ws12:8004 maxconn 100000
backend ws12x5
    server ws12 ws12:8005 maxconn 100000
backend ws12x6
    server ws12 ws12:8006 maxconn 100000

#---------------------------------------------------------------------
# BACKEND - STATIC
#---------------------------------------------------------------------
backend static
    server static1 static1:80 maxconn 40000
100,000 connections is a lot... are you actually pushing that many? If so... you might split the frontend so that it binds one IP for static content and another IP for application content, and then run the static and app variants as separate haproxy processes (assuming you have a second core/CPU on the server)...

If nothing else, it would narrow the saturation down to either the app traffic or the static traffic...
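Roughly sketched, the split might look like this (file names and IPs are placeholders, not from the original config; each file is started as its own haproxy process):

```
# /etc/haproxy/haproxy-static.cfg (hypothetical) - process 1, static only
frontend static_public
    bind 192.0.2.11:80
    default_backend static

# /etc/haproxy/haproxy-app.cfg (hypothetical) - process 2, app + WS
frontend app_public
    bind 192.0.2.10:80
    default_backend app
```

Each process then gets its own core, and you can see at a glance which traffic class is burning CPU.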
If I remember my Networking 101 class correctly, HAProxy shouldn't be able to open 100,000 connections to ws12:8001, or to any other backend host:port, because the ephemeral port range on most systems allows something closer to 28,232 outbound ports (`cat /proc/sys/net/ipv4/ip_local_port_range`). You may be exhausting your local ports, which could cause the CPU to spin while it waits for ports to be freed.

Perhaps lowering the max connections for each backend to something closer to 28,000 would alleviate the problem? Or change the local port range to accommodate more?
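On Linux the ephemeral range can be widened via sysctl; as a configuration sketch (the exact range here is an example, not a tuned recommendation for this setup):

```shell
# Check the current ephemeral (local) port range - two numbers: low high
cat /proc/sys/net/ipv4/ip_local_port_range

# /etc/sysctl.conf fragment: widen the range so each backend host:port
# can receive more outbound connections from the proxy
# net.ipv4.ip_local_port_range = 1024 65535

# Apply without reboot:
sysctl -p
```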
Take a look at the nbproc setting and see whether using more than one core helps. For most hardware load balancers, the amount of traffic you can handle is limited by the CPU/memory of the load balancer.

From the HAProxy documentation:
1.5) Increasing the overall processing power
--------------------------------------------
On multi-processor systems, it may seem to be a shame to use only one
processor, eventhough the load needed to saturate a recent processor is far
above common usage. Anyway, for very specific needs, the proxy can start
several processes between which the operating system will spread the
incoming connections. The number of processes is controlled by the 'nbproc'
parameter in the 'global' section. It defaults to 1, and obviously works
only in 'daemon' mode. One typical usage of this parameter has been to
workaround the default per-process file-descriptor limit that Solaris
imposes to user processes.

Example :
---------

    global
        daemon
        quiet
        nbproc  2
Beyond the haproxy configuration itself, it would also help to do some network tuning.

One specific thing that may help is to make sure your network interfaces are not pinned to a single CPU (assuming you use multiple interfaces). If you're running haproxy on Linux, you can check the balance like this:
egrep CPU\|eth /proc/interrupts
For example, this shows that interrupts for eth0 and eth1 are being handled by different CPUs:

    $ egrep CPU\|eth /proc/interrupts
                CPU0       CPU1       CPU2       CPU3
     103: 3515635238          0          0          0   IR-PCI-MSI-edge   eth0
     104:          0 1976927064          0          0   IR-PCI-MSI-edge   eth1
Whereas this shows they are being handled by the same CPU:

    $ egrep CPU\|eth /proc/interrupts
                CPU0       CPU1       CPU2       CPU3
     272: 1526254507          0          0          0   Dynamic-irq   eth0
     273:    4877925          0          0          0   Dynamic-irq   eth1
You will want to set the SMP affinity for those interfaces so their interrupts land on different CPUs. For the example above, you could do:

    echo 010 > /proc/irq/272/smp_affinity
    echo 020 > /proc/irq/273/smp_affinity
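The `smp_affinity` file takes a hexadecimal bitmask of the CPUs allowed to service that IRQ. As a quick illustration (this helper function is hypothetical, not part of any tool), you can compute the mask for a given CPU index like so:

```shell
# cpu_mask N prints the hex smp_affinity bitmask that selects CPU N only.
# (1 shifted left by N, formatted as hex.)
cpu_mask() { printf '%x\n' $(( 1 << $1 )); }

cpu_mask 4   # -> 10  (i.e. the "010" mask above: CPU4 only)
cpu_mask 1   # -> 2   (CPU1 only)
```

Writing different masks to each interface's IRQ, as in the echo commands above, is what actually spreads the interrupt load across cores.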