清漆随机和零星的503错误

编辑1 :我尝试了在我的preprod环境build议没有成功的设置。 由于我可以在另一个环境中重现这个问题,我试图寻找相似之处。

我禁用了logrotate清漆没有积极的结果。 唯一的共同点是cron.hourly。 哪个是空的

这是我在/ var / log / cron中的

Feb 3 14:01:01 SPRX0032 CROND[32006]: (root) CMD (run-parts /etc/cron.hourly) Feb 3 14:01:01 SPRX0032 run-parts(/etc/cron.hourly)[32006]: starting 0anacron Feb 3 14:01:01 SPRX0032 run-parts(/etc/cron.hourly)[32020]: finished 0anacron 

在同一时刻,我的探针:

 03-02-14_14:01:01 - - - /!\ WARNING /!\|HTTP code 503| website1 SERVER IS DOWN booh! /!\ WARNING /!\ 03-02-14_14:01:01 - - - /!\ WARNING /!\|HTTP code 503| website2 SERVER IS DOWN booh! /!\ WARNING /!\ 03-02-14_14:01:01 - - - /!\ WARNING /!\|HTTP code 503| website3 SERVER IS DOWN booh! /!\ WARNING /!\ 03-02-14_14:01:01 - - - /!\ WARNING /!\|HTTP code 503| website4 SERVER IS DOWN booh! /!\ WARNING /!\ 

原文:

我有一个2 varnish(3.0.4)群集与3 Apache(2.2.15)前面的故障转移主任,我遇到偶发的503错误。 这种情况完全是随机的,在很多页面上,持续一分钟或更less。 为了find罪魁祸首,我设置了3个探测器,包括一个curl脚本,每分钟获取页面的HTTP状态代码。 我的探针设置为从以下页面获取页面:

  • 个人联系。
  • 局域网打光油。
  • 局域网击中Apache。

请求Apache的那个没有错误。 来自外部的两个探针和一个打上清漆的探针都会产生错误。

  29-01-14_06:17:01 - - - |HTTP code 200||response time:0.254|website1 is UP woot! 29-01-14_06:17:02 - - - |HTTP code 200||response time:0.264|website2 is UP woot! 29-01-14_06:17:02 - - - |HTTP code 200||response time:0.477|website3 is UP woot! 29-01-14_06:17:03 - - - |HTTP code 200||response time:0.283|website4 is UP woot! 29-01-14_06:17:04 - - - |HTTP code 200||response time:0.782|website5 is UP woot! 29-01-14_06:18:28 - - - |HTTP code 200||response time:0.167|website1 is UP woot! 29-01-14_06:18:28 - - - /!\ WARNING /!\|HTTP code 503| website2 SERVER IS DOWN booh! /!\ WARNING /!\ 29-01-14_06:18:28 - - - /!\ WARNING /!\|HTTP code 503| website3 SERVER IS DOWN booh! /!\ WARNING /!\ 29-01-14_06:18:29 - - - /!\ WARNING /!\|HTTP code 503| website4 SERVER IS DOWN booh! /!\ WARNING /!\ 29-01-14_06:18:29 - - - /!\ WARNING /!\|HTTP code 503| website5 SERVER IS DOWN booh! /!\ WARNING /!\ 29-01-14_06:19:01 - - - |HTTP code 200||response time:0.243|website1 is UP woot! 29-01-14_06:19:02 - - - |HTTP code 200||response time:0.313|website2 is UP woot! 29-01-14_06:19:03 - - - |HTTP code 200||response time:0.552|website3 is UP woot! 29-01-14_06:19:03 - - - |HTTP code 200||response time:0.348|website4 is UP woot! 29-01-14_06:19:05 - - - |HTTP code 200||response time:0.704|website5 is UP woot! 

这是我的后端

 backend srv1 { .host = "srv1"; .port = "80"; .first_byte_timeout = 300s; .connect_timeout = 5s; .between_bytes_timeout = 60s; .probe = { .url = "/"; .interval = 5s; .timeout = 2s; .window = 5; .threshold = 3; } } backend srv2{ .host = "srv2"; .port = "80"; .first_byte_timeout = 300s; .connect_timeout = 5s; .between_bytes_timeout = 60s; .probe = { .url = "/"; .interval = 5s; .timeout = 2s; .window = 5; .threshold = 3; } } backend srv3 { .host = "srv3"; .port = "80"; .first_byte_timeout = 300s; .connect_timeout = 5s; .between_bytes_timeout = 60s; .probe = { .url = "/"; .interval = 5s; .timeout = 2s; .window = 5; .threshold = 3; } } 

Varnish说我的后端在503时刻是健康的:

 varnishadm debug.health Backend srv1 is Healthy Current states good: 5 threshold: 3 window: 5 Average responsetime of good probes: 0.070728 Oldest Newest ================================================================ 4444444444444444444444444444444444444444444444444444444444444444 Good IPv4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy Backend srv2 is Healthy Current states good: 5 threshold: 3 window: 5 Average responsetime of good probes: 0.089797 Oldest Newest ================================================================ 4444444444444444444444444444444444444444444444444444444444444444 Good IPv4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy Backend srv3 is Healthy Current states good: 5 threshold: 3 window: 5 Average responsetime of good probes: 0.068935 Oldest Newest ================================================================ 4444444444444444444444444444444444444444444444444444444444444444 Good IPv4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy 

我运行多个网站(约200个域名),没有其他问题。

Varnishstat报告连接后端的一些错误

清漆#1

  13+23:09:28 Hitrate ratio: 3 3 3 Hitrate avg: 0.7669 0.7669 0.7669 [...] 8 0.00 0.00 backend_fail - Backend conn. failures 

清漆#2

 13+05:56:44 Hitrate ratio: 4 4 4 Hitrate avg: 0.8783 0.8783 0.8783 [...] 16 0.00 0.00 backend_fail - Backend conn. failures 

错误可能发生在POST或GET上

用FetchError c后端写入错误:0(成功)

  23 SessionOpen c xxx 40518 :80 23 ReqStart c xxx 40518 1750308108 23 RxRequest c POST 23 RxURL c /index.php?option=com_jce&task=plugin&plugin=imgmanager&file=imgmanager&method=form&cid=20&6bc427c8a7981f4fe1f5ac65c1246b5f=cf6dd3cf1923c950586d0dd595c8e20b 23 RxProtocol c HTTP/1.1 23 RxHeader c Reverse-Via: xxxx 23 RxHeader c Host: xxxxx 23 RxHeader c Content-Type: multipart/form-data; boundary=---------------------------41184676334 23 RxHeader c User-Agent: BOT/0.1 (BOT for JCE) 23 RxHeader c Connection: Keep-Alive 23 RxHeader c Content-Length: 5000 23 VCL_call c recv pass 23 VCL_call c hash 23 Hash c /index.php?option=com_jce&task=plugin&plugin=imgmanager&file=imgmanager&method=form&cid=20&6bc427c8a7981f4fe1f5ac65c1246b5f=cf6dd3cf1923c950586d0dd595c8e20b 23 Hash c xxxx 23 VCL_return c hash 23 VCL_call c pass pass 23 Backend c 46 cluster srv2 23 FetchError c backend write error: 0 (Success) 23 VCL_call c error deliver 23 VCL_call c deliver deliver 23 TxProtocol c HTTP/1.1 23 TxStatus c 503 23 TxResponse c Service Unavailable 23 TxHeader c Server: Varnish 23 TxHeader c Content-Type: text/html; charset=utf-8 23 TxHeader c Retry-After: 5 23 TxHeader c Content-Length: 419 23 TxHeader c Accept-Ranges: bytes 23 TxHeader c Date: Tue, 28 Jan 2014 22:18:53 GMT 23 TxHeader c X-Varnish: 1750308108 23 TxHeader c Age: 0 23 TxHeader c Via: 1.1 varnish 23 TxHeader c Connection: close 23 TxHeader c X-Cache: MISS from Varnish 23 Length c 419 23 ReqEnd c 1750308108 1390947533.416723251 1390947533.822463989 0.000115156 0.405680656 0.000060081 

使用FetchError进行GET c没有后端连接

  52 ReqStart c xxx 57491 1750328476 52 RxRequest c GET 52 RxURL c /filestore/retriever.flv 52 RxProtocol c HTTP/1.1 52 RxHeader c Reverse-Via: xxx 52 RxHeader c Host: xxxx 52 RxHeader c Cookie: ISAWPLB{2722D8D3-4E35-4741-92E0-22ADE0EA8C7F}={FD7F277E-829D-4B70-8BCA-9DCB3390060F} 52 RxHeader c Referer: xxxx 52 RxHeader c User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MAARJS; rv:11.0) like Gecko 52 RxHeader c Accept: */* 52 RxHeader c Accept-Language: nb-NO 52 RxHeader c x-flash-version: 12,0,0,38 52 RxHeader c Cache-Control: no-cache 52 RxHeader c Connection: Keep-Alive 52 VCL_call c recv lookup 52 VCL_call c hash 52 Hash c /filestore/retriever.flv 52 Hash c xxx 52 VCL_return c hash 52 VCL_call c miss fetch 52 FetchError c no backend connection 52 VCL_call c error deliver 52 VCL_call c deliver deliver 52 TxProtocol c HTTP/1.1 52 TxStatus c 503 52 TxResponse c Service Unavailable 52 TxHeader c Server: Varnish 52 TxHeader c Content-Type: text/html; charset=utf-8 52 TxHeader c Retry-After: 5 52 TxHeader c Content-Length: 419 52 TxHeader c Accept-Ranges: bytes 52 TxHeader c Date: Tue, 28 Jan 2014 23:00:53 GMT 52 TxHeader c X-Varnish: 1750328476 52 TxHeader c Age: 0 52 TxHeader c Via: 1.1 varnish 52 TxHeader c Connection: close 52 TxHeader c X-Cache: MISS from Varnish 52 Length c 419 52 ReqEnd c 1750328476 1390950053.502151966 1390950053.502464533 1.025068998 0.000263214 0.000049353 

我有更多的条目

 FetchError c http first read error: -1 0 (Success) FetchError c Could not get storage FetchError c http first read error: -1 11 (Resource temporarily unavailable) 

Apache超时设置为300秒,保持连接被禁用。

清漆如何确定服务器错误而探针

我希望尽可能清楚。 如果你需要更多的信息,请告诉我。

谢谢。