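Please forgive the naivety of this question; this is not a topic I know much about yet.
My company currently runs Kubernetes-managed fluentd processes that push logs to logstash. These fluentd processes start, fail immediately after startup, then start again, and so on.
The fluentd processes run in Docker containers on CoreOS AWS instances.
When I look at the logs of any of the 15 running fluentd nodes, they all show the same thing. Below is a trimmed-down version of those logs, with some repetition and the timestamps removed: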
Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
process finished code=9
fluentd main process died unexpectedly. restarting.
starting fluentd-0.12.29
gem 'fluent-mixin-config-placeholders' version '0.4.0'
gem 'fluent-mixin-plaintextformatter' version '0.2.6'
gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
gem 'fluent-plugin-elasticsearch' version '1.5.0'
gem 'fluent-plugin-kafka' version '0.3.1'
gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
gem 'fluent-plugin-mongo' version '0.7.15'
gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
gem 'fluent-plugin-s3' version '0.7.1'
gem 'fluent-plugin-scribe' version '0.10.14'
gem 'fluent-plugin-td' version '0.10.29'
gem 'fluent-plugin-td-monitoring' version '0.2.2'
gem 'fluent-plugin-webhdfs' version '0.4.2'
gem 'fluentd' version '0.12.29'
adding match pattern="fluent.**" type="null"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.*" type="parser"
adding filter pattern="kubernetes.**" type="kubernetes_metadata"
adding match pattern="**" type="elasticsearch"
adding source type="tail"
adding source type="tail"
adding source type="tail"
...
using configuration file: <ROOT>
  <match fluent.**>
    type null
  </match>
  <source>
    type tail
    path /var/log/containers/*.log
    pos_file /var/log/es-containers.log.pos
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag kubernetes.*
    format json
    read_from_head true
  </source>
  <filter kubernetes.*>
    @type parser
    format json
    key_name log
    reserve_data true
    suppress_parse_error_log true
  </filter>
  ...
  ...
  <match **>
    type elasticsearch
    log_level info
    include_tag_key true
    host elasticsearch-logging
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 32
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit
    num_threads 8
  </match>
</ROOT>
following tail of /var/log/containers/node-exporter-rqwwn_prometheus_node-exporter-78027c5c818ab42a143fdd684ce2e71bf15cc22e085cfb4f0155854d2248d572.log
following tail of /var/log/containers/fluentd-elasticsearch-0qc6r_kube-system_fluentd-elasticsearch-fccf8db40a19df4a84575c77ac845921386db098d96ef27d1f565da1d928c336.log
following tail of /var/log/containers/node-exporter-rqwwn_prometheus_POD-65ed0741bb78a32e6e129ebc9a96b56284f32d81aba0d66c129df02c9e05fb5b.log
following tail of /var/log/containers/alertmanager-1407110495-s8j6k_prometheus_POD-1807d1ab9c99ce2c4da81fcd5b589e604f4c0dc85cc85a351706b52dc747d21b.log
...
following tail of /var/log/containers/rail-prod-v071-n0zgz_prod_rail-a301220a36cf2a2a537668db44197e2c029f9cc1c60c345218909cd86a84e717.log
Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}
process finished code=9
fluentd main process died unexpectedly. restarting.
starting fluentd-0.12.29
...
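My guess is that not enough memory is configured, or something along those lines, and that this is what makes the service restart immediately after startup? Does the message "process finished code=9" point to a specific problem?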
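To make the "code=9" part of the question concrete, here is a minimal sketch of how such a value would decode, assuming (this is an assumption, not something the logs confirm) that fluentd's supervisor prints the raw POSIX wait status of the child process. The helper name `describe_wait_status` is hypothetical and only for illustration. Under that assumption, a status of 9 would mean the process was terminated by signal 9 (SIGKILL) rather than exiting on its own. The kernel's out-of-memory killer is one common sender of SIGKILL, but nothing here proves that is the cause.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status (Unix only).

    Assumption: the "code" in fluentd's "process finished code=9" line
    is a raw wait status, not a plain exit code.
    """
    if os.WIFSIGNALED(status):
        # The low bits of the status carry the terminating signal number.
        sig = signal.Signals(os.WTERMSIG(status))
        return f"killed by signal {sig.value} ({sig.name})"
    if os.WIFEXITED(status):
        # A normal exit stores the exit code in the high byte.
        return f"exited normally with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"


if __name__ == "__main__":
    print(describe_wait_status(9))
```

If that reading is right, checking the node's kernel log and the container's memory limit would be the next thing to look at.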
If anyone has seen something like this before, please help with a comment. Thanks.