I noticed that CoreDNS inside k8s resolves names unreliably: repeated lookups of the same name often come back with no answer.

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Non-authoritative answer:

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40
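
To put a number on how often lookups fail, the manual nslookup runs above can be repeated in a loop. The following is a minimal sketch, assuming Python is available in the pod; the function name `probe_dns` and the `resolve` parameter are illustrative, and the system resolver may behave differently from nslookup (for example because of caching or search domains).

```python
import socket
from typing import Callable, Dict


def probe_dns(name: str, attempts: int = 20,
              resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, int]:
    """Resolve `name` repeatedly and count successes and failures."""
    counts = {"ok": 0, "failed": 0}
    for _ in range(attempts):
        try:
            resolve(name)
            counts["ok"] += 1
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError
            counts["failed"] += 1
    return counts


# Example call (requires a working resolver, so it is left commented out):
# print(probe_dns("kubernetes-dashboard.kube-system.svc.cluster.local"))
```

A non-zero "failed" count for a name that is known to exist would confirm the intermittent behaviour seen above.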

While using fluentd to ship data to Kafka, delivery never worked and I hit a series of errors. The versions were: fluentd 1.2.5, fluent-plugin-kafka 0.7.8, Kafka 0.9. The first error was:

2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:06 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:06 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=2 next_retry_seconds=2018-09-05 01:42:06 +0000 chunk="57515e0ef787da843836cc864f9d1581" error_class=Kafka::UnknownTopicOrPartition error="unknown topic "
  2018-09-05 01:42:06 +0000 [warn]: plugin/output.rb:1157:rescue in try_flush: suppressed same stacktrace
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: 61 messages send.
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:09 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=3 next_retry_seconds=2018-09-05 01:42:09 +0000 

This happens because default_topic was not configured. Specifying the topic in the Kafka output configuration fixes it.

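Configuring a maintenance page in nginx

When releasing a new version, we often need to put up a maintenance page for users to see, while still being able to reach the site ourselves.

In other words, during maintenance, designated IPs must still have access.

Below is an example of an nginx maintenance-page configuration.

In it:

/weihu/ is the URL of the maintenance page. Create a directory named weihu under /data/www and put the maintenance page index.html in it.

103.214.84.224|101.231.194.4|180.168.251.235 are the IP addresses that are allowed through.

End result: when a user visits a real URL, they are redirected to /weihu/.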
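
The intended behaviour can be modelled as a small decision function. This is only a sketch of the rule in Python, not the nginx configuration itself; the function name `route` is illustrative, and the check that avoids redirecting requests already under /weihu/ is an assumption about how a redirect loop would be prevented.

```python
# IPs allowed through during maintenance (taken from the text above)
ALLOWED_IPS = {"103.214.84.224", "101.231.194.4", "180.168.251.235"}
MAINTENANCE_PREFIX = "/weihu/"


def route(client_ip: str, path: str) -> str:
    """Return the path a client ends up on while maintenance mode is active."""
    if client_ip in ALLOWED_IPS:
        return path  # whitelisted clients see the real site
    if path.startswith(MAINTENANCE_PREFIX):
        return path  # already on the maintenance page: do not redirect again
    return MAINTENANCE_PREFIX  # everyone else is sent to the maintenance page
```

For instance, `route("101.231.194.4", "/order")` keeps the original path, while any other client requesting `/order` is sent to `/weihu/`.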

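Docker's swarm mode already supports multi-host overlay networks, and in my testing so far it is very convenient to install and configure, considerably easier than k8s.

1. Test environment

Two virtual machines are used for testing. The operating system is ubuntu 14.04.4, which ships with kernel 4.2. Note that overlay networking requires kernel 3.16 or later.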

Hostname   IP              Role
ubuntu1    192.168.11.21   manager
ubuntu2    192.168.11.22   worker
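
The kernel requirement can be checked on each host before installing anything. Below is a minimal sketch, assuming a Linux host where the release string starts with "major.minor" (for example "4.2.0-27-generic"); the helper names are illustrative.

```python
import platform
import re

# Minimum kernel version for overlay networking, as stated above
MIN_OVERLAY_KERNEL = (3, 16)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.2.0-27-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_overlay(release: str) -> bool:
    """Return True if the given kernel release is at least 3.16."""
    return parse_kernel_release(release) >= MIN_OVERLAY_KERNEL


def current_kernel_supports_overlay() -> bool:
    """Check the kernel of the host this script runs on."""
    return kernel_supports_overlay(platform.release())
```

Comparing (major, minor) tuples avoids the trap of comparing version strings, where "3.2" would sort after "3.16".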

2. Installing docker

Install docker on all hosts, using the official APT repository.

# Remove the docker packages shipped with the system
apt-get remove docker docker-engine docker.io

# Install the extra kernel modules
apt-get install \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

# Download and install the GPG key for the Docker APT repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88

# Add the APT repository, using the Aliyun mirror
add-apt-repository \
   "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu/ \
   $(lsb_release -cs) \
   stable"

# Install docker
apt-get update
apt-get install docker-ce

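Recently, while configuring nginx, I ran into an issue with nginx's configuration test.

Take the following nginx configuration, where the upstream has not been defined: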

    location /frontend-gateway/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        set $globalTicket $pid--$remote_addr-$request_length-$connection;
        proxy_set_header globalTicket $globalTicket;

        proxy_pass http://o2o-frontend-gateway/;
    }

Running nginx -t shows that the test passes.

[root@sh-o2o-nginx-router-online-04 vhost.d]# /usr/local/nginx/sbin/nginx -t
the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
configuration file /usr/local/nginx/conf/nginx.conf test is successful

By rights this should have failed. My first suspicion is that nginx treated o2o-frontend-gateway as a hostname.
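
Since nginx -t does not flag this case, a separate check can catch it. The sketch below scans configuration text for proxy_pass targets that have no matching upstream block. It is a rough regex-based heuristic, not a real nginx parser: it assumes that dotted names are genuine hostnames or IPs and skips them, so a legitimate single-label host such as localhost would be reported too. The function name is illustrative.

```python
import re

# "upstream NAME {" declarations
UPSTREAM_RE = re.compile(r"^\s*upstream\s+(\S+)\s*\{", re.MULTILINE)
# host part of "proxy_pass http://HOST..." directives
PROXY_PASS_RE = re.compile(r"^\s*proxy_pass\s+https?://([^/:;\s]+)", re.MULTILINE)


def undefined_upstreams(config_text: str) -> set:
    """Return proxy_pass targets that look like upstream names but are not defined."""
    defined = set(UPSTREAM_RE.findall(config_text))
    referenced = set(PROXY_PASS_RE.findall(config_text))
    # Skip variables ($...) and dotted names, which are probably real hosts or IPs.
    return {
        name for name in referenced - defined
        if "." not in name and "$" not in name
    }
```

The text passed in should be the concatenation of nginx.conf and every included file, because an upstream may be defined in a different file from the location that uses it.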

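1. k8s cluster system planning

1.1. Dependencies of kubernetes 1.10

k8s v1.10 is not supported on, or tested against, every version of related packages such as etcd and docker. The recommended versions are:

  • docker: 1.11.2 to 1.13.1 and 17.03.x
  • etcd: 3.1.12

The full list is as follows (reference: External Dependencies):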

  • The supported etcd server version is 3.1.12, as compared to 3.0.17 in v1.9 (#60988)
  • The validated docker versions are the same as for v1.9: 1.11.2 to 1.13.1 and 17.03.x (ref)
  • The Go version is go1.9.3, as compared to go1.9.2 in v1.9. (#59012)
  • The minimum supported go is the same as for v1.9: go1.9.1. (#55301)
  • CNI is the same as v1.9: v0.6.0 (#51250)
  • CSI is updated to 0.2.0 as compared to 0.1.0 in v1.9. (#60736)
  • The dashboard add-on has been updated to v1.8.3, as compared to 1.8.0 in v1.9. (#57326)
  • Heapster has is the same as v1.9: v1.5.0. It will be upgraded in v1.11. (ref)
  • Cluster Autoscaler has been updated to v1.2.0. (#60842, @mwielgus)
  • Updates kube-dns to v1.14.8 (#57918, @rramkumar1)
  • Influxdb is unchanged from v1.9: v1.3.3 (#53319)
  • Grafana is unchanged from v1.9: v4.4.3 (#53319)
  • CAdvisor is v0.29.1 (#60867)
  • fluentd-gcp-scaler is v0.3.0 (#61269)
  • Updated fluentd in fluentd-es-image to fluentd v1.1.0 (#58525, @monotek)
  • fluentd-elasticsearch is v2.0.4 (#58525)
  • Updated fluentd-gcp to v3.0.0. (#60722)
  • Ingress glbc is v1.0.0 (#61302)
  • OIDC authentication is coreos/go-oidc v2 (#58544)
  • Updated fluentd-gcp updated to v2.0.11. (#56927, @x13n)
  • Calico has been updated to v2.6.7 (#59130, @caseydavenport)

1.2. Test server preparation and environment planning

Server name                 IP               Role                     Installed services
sh-saas-cvmk8s-master-01    10.12.96.3       master                   master, etcd
sh-saas-cvmk8s-master-02    10.12.96.5       master                   master, etcd
sh-saas-cvmk8s-master-03    10.12.96.13      master                   master, etcd
sh-saas-cvmk8s-node-01      10.12.96.2       node                     node
sh-saas-cvmk8s-node-02      10.12.96.4       node                     node
sh-saas-cvmk8s-node-03      10.12.96.6       node                     node
bs-ops-test-docker-dev-04   172.21.248.242   private image registry   harbor
VIP                         10.12.96.100     master vip               netmask: 255.255.255.0

All netmasks are 255.255.255.0.

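All test servers run the latest release of CentOS Linux 7.4.

The VIP 10.12.96.100 is only used for testing keepalived. This article actually uses Tencent Cloud LB + haproxy, and the Tencent Cloud LB VIP is 10.12.16.101.

Container network range: 10.254.0.0/16

The container network range must not conflict with any of the following:

  • the cluster network CIDRs of other clusters in the same VPC
  • the CIDR of the VPC itself
  • the CIDRs of the subnet routes in the VPC
  • the CIDRs of every route table shown by route-ctl list

Do not create the container network range inside the VPC, and keep it out of the VPC route tables. Use a network that does not exist in the VPC.

k8s service cluster network: 10.254.255.0/24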
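
The conflict rules above amount to a CIDR overlap test, which can be sketched with Python's standard ipaddress module. The function name and the "peer cluster" range are hypothetical. 10.12.96.0/24 is inferred from the server IPs and netmask listed above.

```python
import ipaddress


def find_conflicts(candidate: str, existing: dict) -> list:
    """Return the names of existing networks that overlap the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [
        name for name, cidr in existing.items()
        if cand.overlaps(ipaddress.ip_network(cidr))
    ]


# Networks the container range must stay clear of.
existing_networks = {
    "node subnet": "10.12.96.0/24",                  # from the server table above
    "hypothetical peer cluster": "10.254.128.0/17",  # made up for illustration
}

conflicts = find_conflicts("10.254.0.0/16", existing_networks)
```

Here `conflicts` contains only the hypothetical peer cluster, and the node subnet is clear. Run against the two ranges quoted above, the same test also reports that 10.254.255.0/24 lies inside 10.254.0.0/16, so it may be worth confirming that this is intended.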

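An ActiveMQ instance that had been running for quite a long time exited on its own when it was stopped and started again. The log showed the following errors: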

2018-06-11 17:54:21,483 | WARN  | Some journal files are missing: [17496, 17495, 17494, 11811, 11807, 11793] | org.apache.activemq.store.kahadb.MessageDatabase | main
2018-06-11 17:54:21,704 | ERROR | [0:ExceptionDLQ.ActivityResultPostProcess] references corrupt locations. 10 messages affected. | org.apache.activemq.store.kahadb.MessageDatabase | main
2018-06-11 17:54:21,706 | ERROR | Failed to start Apache ActiveMQ ([localhost, null], java.io.IOException: Detected missing/corrupt journal files referenced by:[0:ExceptionDLQ.ActivityResultPostProcess] 10 messages affected.) | org.apache.activemq.broker.BrokerService | main
2018-06-11 17:54:21,711 | INFO  | Apache ActiveMQ 5.13.3 (localhost, null) is shutting down | org.apache.activemq.broker.BrokerService | main
2018-06-11 17:54:21,715 | INFO  | Connector openwire stopped | org.apache.activemq.broker.TransportConnector | main
2018-06-11 17:54:21,718 | INFO  | Connector amqp stopped | org.apache.activemq.broker.TransportConnector | main
2018-06-11 17:54:21,721 | INFO  | Connector stomp stopped | org.apache.activemq.broker.TransportConnector | main
2018-06-11 17:54:21,724 | INFO  | Connector mqtt stopped | org.apache.activemq.broker.TransportConnector | main
2018-06-11 17:54:21,727 | INFO  | Connector ws stopped | org.apache.activemq.broker.TransportConnector | main

Solution:

In conf/activemq.xml, change the following configuration from:

        <persistenceAdapter>
            <kahaDB directory="${activemq.data}/kahadb"/>
        </persistenceAdapter>

to:

        <persistenceAdapter>
            <kahaDB directory="${activemq.data}/kahadb"
                    ignoreMissingJournalfiles="true"
                    checkForCorruptJournalFiles="true"
                    checksumJournalFiles="true"/>
        </persistenceAdapter>

Save the file and exit.

阿辉

Research on container technology, container clusters, and other distributed systems

Head of container platform

Shanghai