SkyWalking cluster deployment and service onboarding

2019/04/22 skywalking

Capacity planning

IP           zk         es       sw
xx.xx.xx.64  zookeeper  node-64  OAPServer, skywalking-webapp
xx.xx.xx.65  zookeeper  node-65  OAPServer
xx.xx.xx.66  zookeeper  node-66

Install ZooKeeper

Download

Download ZooKeeper from the official site; this guide uses zookeeper-3.5.4-beta.tar.gz.

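Extract

# Extract straight into /data/zookeeper, the path used in the rest of this guide
mkdir -p /data/zookeeper
tar -zxvf zookeeper-3.5.4-beta.tar.gz -C /data/zookeeper --strip-components=1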

Configure (on one node first)

# Create the zoo.cfg configuration file from the sample
cd /data/zookeeper/conf
mv zoo_sample.cfg zoo.cfg

# Edit the configuration file (zoo.cfg)
dataDir=/data/zookeeper/data

server.1=xx.xx.xx.64:2888:3888
server.2=xx.xx.xx.65:2888:3888
server.3=xx.xx.xx.66:2888:3888

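# In dataDir (/data/zookeeper/data), create a myid file containing the N of this node's server.N entry (for server.2 the content is 2)
mkdir -p /data/zookeeper/data
echo "1" > /data/zookeeper/data/myid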

# Copy the configured ZooKeeper directory to the other nodes

# Note: myid must be changed on each of the other nodes
On xx.xx.xx.65, set myid to 2 (echo "2" > /data/zookeeper/data/myid)
On xx.xx.xx.66, set myid to 3 (echo "3" > /data/zookeeper/data/myid)
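
Since every node gets the same zoo.cfg, the per-node myid can also be derived from it instead of being typed by hand. The following is a minimal sketch, not part of ZooKeeper: myid_for is a hypothetical helper that looks up which server.N line carries a given IP.

```shell
#!/bin/sh
# myid_for <path to zoo.cfg> <ip of this node>
# Hypothetical helper: prints the N of the "server.N=<ip>:2888:3888" line
# whose host part equals the given IP; prints nothing if no line matches.
myid_for() {
    awk -F= -v ip="$2" '
        /^server\.[0-9]+=/ {
            split($2, addr, ":")          # addr[1] is the host part
            if (addr[1] == ip) {
                sub(/^server\./, "", $1)  # keep only N
                print $1
            }
        }' "$1"
}
```

On a node it could be used as myid_for /data/zookeeper/conf/zoo.cfg xx.xx.xx.65 > /data/zookeeper/data/myid, assuming the IP passed in is the one written in zoo.cfg.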

Start the cluster

Start ZooKeeper on each node:
/data/zookeeper/bin/zkServer.sh start

Check the status (the sample output below was captured on a 3.4.6 installation, so its paths differ):
/data/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /data/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader

Connect

/data/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 2] ls /skywalking/sw/remote
[81d77f0b-da8e-44ac-815f-b3bdc389877a, e7094b12-7b76-4c8e-8ae0-f4adbb28264e

Install Elasticsearch

Download

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz

Create the user

Use the same user, es, on all three machines.

[root@es1 ~]# useradd es
You have new mail in /var/spool/mail/root
[root@es1 ~]# passwd es
Changing password for user es.
New password: 
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@es1 ~]#  mkdir /home/es
mkdir: cannot create directory `/home/es': File exists
[root@es1 ~]#  ll /home/   # check that the owner and group are both es
total 4
drwx------ 3 es es 4096 Feb 25 03:51 es

On all three machines, create the /data/elasticsearch directory as the es user; it holds the Elasticsearch package and its data.

[root@es1 ~]# su es
[es@es1 ~]$ mkdir -p /data/elasticsearch

Repeat on the other two machines (omitted here).

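Extract

On all three machines, extract the downloaded package elasticsearch-6.5.4.tar.gz (stored in /home/es/elasticsearch) into /data/elasticsearch.

Extract without the top-level directory so that the later paths (/data/elasticsearch/bin, /data/elasticsearch/config) match:

tar -zxvf /home/es/elasticsearch/elasticsearch-6.5.4.tar.gz -C /data/elasticsearch --strip-components=1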

Fix ownership

On all three machines, change the owner of the Elasticsearch files to the es user.

Change the ownership as root:
[es@es1 ~]$ su root
Password: 
[root@es1 es]# chown -R es:es /data/elasticsearch/

Repeat on the other two machines (omitted here).

On all three machines, create the data and log directories as the es user.

[root@es1 es]# su es
[es@es1 ~]$ mkdir -p /data/elasticsearch/data/
[es@es1 ~]$ mkdir -p /data/elasticsearch/logs/

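Repeat on the other two machines (omitted here).

Edit the Elasticsearch configuration

Edit the configuration on all three machines: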

vim /data/elasticsearch/config/elasticsearch.yml

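# Cluster name
cluster.name: skywalking-es

# Node name, unique per node (assumed from the output shown later in this guide: node-64, node-65, node-66)
node.name: node-64

# Data path (assumed: the data directory created above) and log path
path.data: /data/elasticsearch/data/
path.logs: /data/elasticsearch/logs/

# Bind address (assumed: this node's own IP, so that the other nodes and clients can reach it)
network.host: xx.xx.xx.64

# HTTP port
http.port: 9200

# Cluster discovery: IPs or hostnames of the cluster nodes
discovery.zen.ping.unicast.hosts: ["xx.xx.xx.64", "xx.xx.xx.65", "xx.xx.xx.66"]

# Minimum number of master-eligible nodes that must be visible before a master is elected.
# The default is 1; to avoid split brain, set it to a majority of the master-eligible nodes (N / 2 + 1), which is 2 for this three-node cluster.
discovery.zen.minimum_master_nodes: 2

# The next two lines enable CORS for the elasticsearch-head plugin; they are identical on all three servers.
http.cors.enabled: true
http.cors.allow-origin: "*"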
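
The value 2 for discovery.zen.minimum_master_nodes follows the majority rule for master-eligible nodes. As a small sketch of that arithmetic (quorum is a hypothetical helper name, used only for illustration):

```shell
#!/bin/sh
# quorum <number of master-eligible nodes>
# Prints the majority (N / 2 + 1, integer division) to use for
# discovery.zen.minimum_master_nodes.
quorum() {
    echo $(( $1 / 2 + 1 ))
}
```

For the three nodes in this guide, quorum 3 gives 2; if the cluster later grows to five master-eligible nodes, the setting would need to become 3.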

Adjust system settings

On all three machines, edit /etc/security/limits.conf.

Raise the limit on open file descriptors by adding or changing:
* soft nofile 262144
* hard nofile 262144

Allow the es user to lock memory by adding or changing:
es soft memlock unlimited
es hard memlock unlimited

The end of the file then looks like this:
# End of file
* soft nofile 262144
* hard nofile 262144
es soft memlock unlimited
es hard memlock unlimited

On all three machines, edit /etc/security/limits.d/90-nproc.conf to raise the maximum number of threads; add or change the following (es here is the es user):

vim /etc/security/limits.d/90-nproc.conf
*          soft    nproc     unlimited
es         soft    nproc     unlimited

On all three machines, edit /etc/sysctl.conf. Raise the maximum number of memory map areas a process may have (vm.max_map_count) and reduce swapping to a minimum (vm.swappiness) by adding or changing:

vim /etc/sysctl.conf

vm.max_map_count = 262144
vm.swappiness = 1

Apply the changes with sysctl -p.

Start

[es@es1]$ /data/elasticsearch/bin/elasticsearch -d
[es@es2]$ /data/elasticsearch/bin/elasticsearch -d
[es@es3]$ /data/elasticsearch/bin/elasticsearch -d

Result

  • http://xx.xx.xx.64:9200/
{
    "name": "node-64",
    "cluster_name": "skywalking-es",
    "cluster_uuid": "2JLf_nL6RnixLkrQ4pD03Q",
    "version": {
        "number": "6.5.4",
        "build_flavor": "default",
        "build_type": "tar",
        "build_hash": "d2ef93d",
        "build_date": "2018-12-17T21:17:40.758843Z",
        "build_snapshot": false,
        "lucene_version": "7.5.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}
  • http://xx.xx.xx.64:9200/_cat/nodes?v
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xx.xx.xx.66           34          21   3    0.96    1.01     1.04 mdi       -      node-66
xx.xx.xx.65           36          98   0    0.87    0.63     0.42 mdi       -      node-65
xx.xx.xx.64           39          45   0    0.14    0.13     0.14 mdi       *      node-64
  • http://xx.xx.xx.64:9200/_cat/shards?v
index                                      shard prirep state   docs   store ip          node
sw_instance_jvm_memory_heap_max_month      1     p      STARTED    1   4.1kb xx.xx.xx.65 node-65
sw_instance_jvm_memory_heap_max_month      0     p      STARTED    2     8kb xx.xx.xx.66 node-66
sw_endpoint_relation_cpm                   1     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_endpoint_relation_cpm                   0     p      STARTED    0    261b xx.xx.xx.64 node-64
sw_instance_jvm_old_gc_time_hour           1     p      STARTED    3  12.1kb xx.xx.xx.66 node-66
sw_instance_jvm_old_gc_time_hour           0     p      STARTED    3  11.8kb xx.xx.xx.64 node-64
sw_service_p90_month                       1     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_service_p90_month                       0     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_service_instance_sla_month              1     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_service_instance_sla_month              0     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_service_p90                             1     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_service_p90                             0     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_endpoint_p95_month                      1     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_endpoint_p95_month                      0     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_endpoint_cpm_month                      1     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_endpoint_cpm_month                      0     p      STARTED    0    261b xx.xx.xx.66 node-66
sw_service_instance_cpm                    1     p      STARTED    0    261b xx.xx.xx.64 node-64
sw_service_instance_cpm                    0     p      STARTED    0    261b xx.xx.xx.65 node-65
sw_service_p95                             1     p      STARTED    0    261b xx.xx.xx.66 node-66
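
To check the _cat/nodes output without reading it by eye, it can be piped through a small filter. This is a minimal sketch under the assumption that the output has the column layout shown above; summarize_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_nodes
# Reads `_cat/nodes?v` output on stdin and prints
# "<number of nodes> <name of the elected master>".
summarize_nodes() {
    awk '
        NR == 1 {                         # header row: find the "master" column
            for (i = 1; i <= NF; i++) if ($i == "master") m = i
            next
        }
        NF {                              # one non-empty row per node
            n++
            if (m && $m == "*") name = $NF   # "*" marks the master; name is the last column
        }
        END { print n + 0, name }'
}
```

A possible invocation is curl -s 'http://xx.xx.xx.64:9200/_cat/nodes?v' | summarize_nodes; for the cluster above it would be expected to report three nodes with node-64 as master.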

Install SkyWalking

Download

wget https://www.apache.org/dyn/closer.cgi/incubator/skywalking/6.0.0-GA/apache-skywalking-apm-incubating-6.0.0-GA.tar.gz

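Extract

# Extract straight into /data/skywalking, the path used in the rest of this guide
mkdir -p /data/skywalking
tar -zxvf /data/software/apache-skywalking-apm-incubating-6.0.0-GA.tar.gz -C /data/skywalking --strip-components=1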

Edit the configuration

/data/skywalking/config/application.yml

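In this file, point cluster.zookeeper at the ZooKeeper ensemble and storage.elasticsearch at the Elasticsearch cluster.

Start

# On xx.xx.xx.64, start the OAPServer and skywalking-webapp processes
/data/skywalking/bin/startup.sh
# On xx.xx.xx.65, start only the OAPServer process
/data/skywalking/bin/oapService.sh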

Verify

Open http://xx.xx.xx.64:8080 and log in with admin/admin.

Onboarding a service

Copy the agent

cp -r /data/skywalking/agent/ /data/sky_agent/

Edit the configuration

/data/sky_agent/agent/config/agent.config

agent.service_name=${SW_AGENT_NAME:predictor-serving}
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:xx.xx.xx.64:11800,xx.xx.xx.65:11800}

Sampling

agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:1000}

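SkyWalking instruments applications automatically through its agent; no application code changes are required.

The agent is lightweight: for a single instance handling 5000 TPS with every trace collected, it adds only about 10% CPU overhead. Expressed as a sampling value, full collection corresponds to SAMPLE_N_PER_3_SECS = 15000 (5000 * 3).

Setting SAMPLE_N_PER_3_SECS = 1500 is expected to add roughly 1% CPU overhead.

The right value depends on the concurrency of the system or service. In a test environment, trying values within [500, 1500] is usually enough to find a workable setting.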
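
The arithmetic behind these numbers can be sketched as follows. sample_n is a hypothetical helper, and the percentage argument is an illustrative input rather than an agent setting.

```shell
#!/bin/sh
# sample_n <requests per second> <percent of traces to keep>
# Prints a candidate value for agent.sample_n_per_3_secs:
# the traffic seen in a 3-second window, scaled by the percentage to keep.
sample_n() {
    echo $(( $1 * 3 * $2 / 100 ))
}
```

With the figures from the text, sample_n 5000 100 gives 15000 (full collection at 5000 TPS) and sample_n 5000 10 gives 1500.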

Manual instrumentation

  • Add the dependencies
<!-- manual instrumentation -->
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-trace</artifactId>
    <version>6.0.0-GA</version>
</dependency>
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-opentracing</artifactId>
    <version>6.0.0-GA</version>
</dependency>
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-log4j-1.x</artifactId>
    <version>6.0.0-GA</version>
</dependency>
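<!-- manual instrumentation -->
  • Annotate the methods you want traced with @Trace (Trace, ActiveSpan and TraceContext come from the org.apache.skywalking.apm.toolkit.trace package)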
    @RequestMapping("/hello")
    public String index() throws InterruptedException {
        functionA();
        functionB();
        functionC();
        return "Hello World";
    }

    @Trace
    private void functionA() throws InterruptedException {
        long rangeLong = new RandomDataGenerator().nextLong(100, 200);
        Thread.sleep(rangeLong);
        LOGGER.info("functionA traceId:{} use:{} ms", TraceContext.traceId(), rangeLong);
        // Add a custom tag inside the traced method.
        ActiveSpan.tag("functionA_tag", "exec functionA");
        functionA_1();
    }

    @Trace
    private void functionA_1() throws InterruptedException {
        long rangeLong = new RandomDataGenerator().nextLong(100, 200);
        Thread.sleep(rangeLong);
        LOGGER.info("functionA_1 traceId:{} use:{} ms", TraceContext.traceId(), rangeLong);
        ActiveSpan.tag("functionA_1_tag", "exec functionA_1");

        functionA_1_1();
        functionA_1_2();
    }

Start the agent

Add the agent to the JVM startup arguments:

-javaagent:/data/sky_agent/agent/skywalking-agent.jar
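
Settings in agent.config can also be overridden per service with system properties of the form skywalking.<key>, which avoids keeping one copy of the agent per application. A minimal sketch, where agent_opts is a hypothetical helper and app.jar below is a placeholder for the real application:

```shell
#!/bin/sh
# agent_opts <service name> <comma-separated OAP gRPC addresses>
# Prints JVM options that attach the agent copied earlier and override
# agent.service_name and collector.backend_service for this process.
agent_opts() {
    echo "-javaagent:/data/sky_agent/agent/skywalking-agent.jar -Dskywalking.agent.service_name=$1 -Dskywalking.collector.backend_service=$2"
}
```

It could then be used as java $(agent_opts predictor-serving xx.xx.xx.64:11800,xx.xx.xx.65:11800) -jar app.jar.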
