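Unicorn workers timing out intermittently

I'm getting intermittent timeouts from Unicorn workers for seemingly no reason, and I'd like some help debugging the actual problem. What makes it worse is that it handles about 10 to 20 requests fine, then one times out, then another 10 to 20 requests succeed and the same thing happens again.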

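I've created a dev environment to illustrate this particular problem, so there is no traffic except mine.

The stack is Ubuntu 14.04, Rails 3.2.21, PostgreSQL 9.3.4, Unicorn 4.8.3, Nginx 1.6.2.

The problem

I'll describe in detail what happens when it fails.

I request a URL through the browser.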

 Started GET "/offers.xml?q%5bupdated_at_greater_than_or_equal_to%5d=2014-12-28T18:01:16Z&q%5bupdated_at_less_than_or_equal_to%5d=2014-12-28T19:30:21Z" for 127.0.0.1 at 2014-12-30 15:58:59 +0000
 Completed 200 OK in 10.3ms (Views: 0.0ms | ActiveRecord: 2.1ms)

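As you can see, the request completed successfully with a 200 response status in 10.3ms.

However, the browser hangs for about 30 seconds and Unicorn kills the worker: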

 E, [2014-12-30T15:59:30.267605 #13678] ERROR -- : worker=0 PID:14594 timeout (31s > 30s), killing
 E, [2014-12-30T15:59:30.279000 #13678] ERROR -- : reaped #<Process::Status: pid 14594 SIGKILL (signal 9)> worker=0
 I, [2014-12-30T15:59:30.355085 #23533] INFO -- : worker=0 ready

And the following error appears in the Nginx log:

 2014/12/30 15:59:30 [error] 23463#0: *27 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET /offers.xml?q%5bupdated_at_greater_than_or_equal_to%5d=2014-12-28T18:01:16Z&q%5bupdated_at_less_than_or_equal_to%5d=2014-12-28T19:30:21Z HTTP/1.1", upstream: "http://unix:/app/shared/tmp/sockets/unicorn.sock:/offers.xml?q%5bupdated_at_greater_than_or_equal_to%5d=2014-12-28T18:01:16Z&q%5bupdated_at_less_than_or_equal_to%5d=2014-12-28T19:30:21Z", host: "localhost", referrer: "http://localhost/offers.xml?q%5bupdated_at_greater_than_or_equal_to%5d=2014-12-28T18:01:16Z&q%5bupdated_at_less_than_or_equal_to%5d=2014-12-28T19:30:21Z"

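Again, there is no load on the server at all. The only requests are my own, and roughly one in every 10 to 20 requests hits this same problem at random.

It doesn't look like Unicorn is eating up memory. I know this because I'm running watch -n 0.5 free -m, and this is the result: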

              total       used       free     shared    buffers     cached
 Mem:          1995        765       1229          0         94        405
 -/+ buffers/cache:        264       1730
 Swap:          511          0        511

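So the server isn't running out of memory.

Is there anything further I can do to debug this issue? Or any insight into what's happening?

I got a bit of help from the Unicorn folks.

The problem stemmed from a query that ran as part of a custom middleware. That's why it didn't show up in any logs.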
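
For anyone chasing a similar hidden delay, one option is to time the whole middleware stack from the outside. The following is only a sketch: RequestTimer is a hypothetical class name, not part of the original application, and it assumes it is placed near the top of the stack so that the time spent in every middleware beneath it is included.

```ruby
# Hypothetical Rack middleware (illustration only): logs the wall-clock time
# spent in everything beneath it in the stack, including other middleware.
class RequestTimer
  def initialize(app, logger = $stderr)
    @app = app
    @logger = logger
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @app.call(env)
  ensure
    # Runs whether the downstream call returned or raised.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    @logger.puts(format("%s %s took %.1fms", env["REQUEST_METHOD"], env["PATH_INFO"], elapsed_ms))
  end
end
```

This only measures the time spent inside call, not the time spent streaming the response body. A request that Rails reports as finished in 10ms but that this timer reports as taking many seconds would point at code running outside the controller.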

This is the offending code:

 connection = PG::Connection.open(db_info)
 query_result = connection.exec(sql)

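This code opened a connection to the database and executed an SQL query, but never closed the connection. I use PgBouncer as a connection pool, with a maximum of 20 connections.

Because the middleware kept opening new connections but never closed them, PgBouncer considered all the connections in use and blocked any more from being opened. As a result, requests hung waiting for a database connection.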
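
The exhaustion can be illustrated with a toy model. This is not PgBouncer itself, just a fixed-size queue standing in for a 20-slot pool; leaky_request is a made-up name for a handler that takes a connection and never returns it.

```ruby
require "thread"

# Toy stand-in for a pool capped at 20 connections (illustration only).
POOL_SIZE = 20
pool = Queue.new
POOL_SIZE.times { |i| pool << "conn-#{i}" }

# Hypothetical leaky handler: takes a connection and never gives it back.
def leaky_request(pool)
  pool.pop(true) # non-blocking pop; raises ThreadError once the pool is empty
end

served = 0
begin
  loop do
    leaky_request(pool)
    served += 1
  end
rescue ThreadError
  # A real client would not get an error here; it would block waiting
  # for a free slot, which is what the hung request looked like.
end

puts "served #{served} requests before the pool ran dry"
```

Once every slot has been handed out and none returned, the next caller has nothing to take, no matter how little load the server is under.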

I refactored the code and added the following lines to close the connection, and now everything runs smoothly.

 connection.flush
 connection.finish
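
One caveat with closing the connection on the last line is that an exception raised by the query would skip it and leak the connection anyway. A possible hardening, sketched below, is to close it in an ensure clause. with_connection and FakeConnection are hypothetical names introduced only for this illustration; the fake exists so the sketch runs without a database.

```ruby
# Hypothetical helper (not from the original middleware): yields a connection
# built by `factory` and always closes it, even if the block raises.
def with_connection(factory)
  connection = factory.call
  yield connection
ensure
  # `finish` is the PG::Connection method that closes the backend connection.
  connection.finish if connection
end

# Intended use with the pg gem (assumes db_info and sql as in the code above):
#   rows = with_connection(-> { PG::Connection.open(db_info) }) do |conn|
#     conn.exec(sql).to_a
#   end

# Stand-in connection so the sketch can run without a database.
class FakeConnection
  attr_reader :finished

  def exec(_sql)
    raise "query failed"
  end

  def finish
    @finished = true
  end
end

fake = FakeConnection.new
begin
  with_connection(-> { fake }) { |conn| conn.exec("SELECT 1") }
rescue RuntimeError
  # The error still propagates, but the connection was closed first.
end
puts fake.finished
```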

No more timeouts.