Fork me on GitHub

Spring Boot 2.x监控数据可视化(Actuator + Prometheus + Grafana手把手)

目录

TIPS

本文基于Spring Boot 2.1.4,理论支持Spring Boot 2.x所有版本

众所周知,Spring Boot有个子项目Spring Boot Actuator,它为应用提供了强大的监控能力。从Spring Boot 2.0开始,Actuator将底层改为Micrometer,提供了更强、更灵活的监控能力。Micrometer是一个监控门面,可以类比成监控界的 Slf4j

借助Micrometer,应用能够对接各种监控系统,例如:

下面演示如何对接 Prometheus ,并使用 Grafana 实现数据的可视化。

TIPS

童鞋们对Prometheus或Grafana不熟悉也没关系,本文是手把手文章,按步骤操作即可。

编码

编写代码

  • 加依赖

    1
    2
    3
    4
    5
    6
    7
    8
    <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

    这里,我们为应用引入了 micrometer-registry-prometheus ,事实上,你想对接上文列表中的哪款监控系统,就写啥。例如想对接 Influx ,则需添加依赖 micrometer-registry-influx

  • 写配置

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    server:
    port: 8080
    spring:
    application:
    name: prometheus-test
    management:
    endpoints:
    web:
    exposure:
    include: 'prometheus'
    metrics:
    tags:
    application: ${spring.application.name}

    如配置所示,指定应用名为 prometheus-test ,并将 Actuator/actuator/prometheus 端点暴露出来; management.metrics.tags.application=prometheus-test 作用是为指标设置一个名为application="prometheus-test" 的Tag,Tag是Prometheus提供的一种能力,从而实现更加灵活的筛选。

测试

  • 启动应用

  • 访问 http://localhost:8080/actuator/prometheus 可获得类似如下的结果:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    88
    89
    90
    91
    92
    93
    94
    95
    96
    97
    98
    99
    100
    101
    102
    103
    104
    105
    106
    107
    108
    109
    110
    111
    112
    113
    114
    115
    116
    117
    118
    119
    120
    121
    122
    123
    124
    125
    126
    127
    128
    129
    130
    131
    132
    133
    134
    135
    136
    137
    138
    139
    140
    141
    142
    143
    144
    145
    146
    147
    148
    149
    150
    151
    152
    153
    154
    155
    156
    157
    158
    159
    160
    161
    162
    163
    164
    165
    # HELP jvm_memory_used_bytes The amount of used memory
    # TYPE jvm_memory_used_bytes gauge
    jvm_memory_used_bytes{application="prometheus-test",area="heap",id="PS Old Gen",} 2.1193976E7
    jvm_memory_used_bytes{application="prometheus-test",area="nonheap",id="Metaspace",} 3.8791688E7
    jvm_memory_used_bytes{application="prometheus-test",area="heap",id="PS Survivor Space",} 0.0
    jvm_memory_used_bytes{application="prometheus-test",area="nonheap",id="Compressed Class Space",} 5303976.0
    jvm_memory_used_bytes{application="prometheus-test",area="heap",id="PS Eden Space",} 8.2574816E7
    jvm_memory_used_bytes{application="prometheus-test",area="nonheap",id="Code Cache",} 8693824.0
    # HELP tomcat_global_received_bytes_total
    # TYPE tomcat_global_received_bytes_total counter
    tomcat_global_received_bytes_total{application="prometheus-test",name="http-nio-8080",} 0.0
    # HELP jvm_threads_daemon_threads The current number of live daemon threads
    # TYPE jvm_threads_daemon_threads gauge
    jvm_threads_daemon_threads{application="prometheus-test",} 20.0
    # HELP tomcat_sessions_alive_max_seconds
    # TYPE tomcat_sessions_alive_max_seconds gauge
    tomcat_sessions_alive_max_seconds{application="prometheus-test",} 0.0
    # HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
    # TYPE jvm_buffer_memory_used_bytes gauge
    jvm_buffer_memory_used_bytes{application="prometheus-test",id="mapped",} 0.0
    jvm_buffer_memory_used_bytes{application="prometheus-test",id="direct",} 90112.0
    # HELP jvm_threads_states_threads The current number of threads having NEW state
    # TYPE jvm_threads_states_threads gauge
    jvm_threads_states_threads{application="prometheus-test",state="runnable",} 9.0
    jvm_threads_states_threads{application="prometheus-test",state="new",} 0.0
    jvm_threads_states_threads{application="prometheus-test",state="terminated",} 0.0
    jvm_threads_states_threads{application="prometheus-test",state="blocked",} 0.0
    jvm_threads_states_threads{application="prometheus-test",state="waiting",} 12.0
    jvm_threads_states_threads{application="prometheus-test",state="timed-waiting",} 3.0
    # HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
    # TYPE process_cpu_usage gauge
    process_cpu_usage{application="prometheus-test",} 0.0030590633504868434
    # HELP logback_events_total Number of error level events that made it to the logs
    # TYPE logback_events_total counter
    logback_events_total{application="prometheus-test",level="info",} 7.0
    logback_events_total{application="prometheus-test",level="warn",} 0.0
    logback_events_total{application="prometheus-test",level="trace",} 0.0
    logback_events_total{application="prometheus-test",level="debug",} 0.0
    logback_events_total{application="prometheus-test",level="error",} 0.0
    # HELP tomcat_global_sent_bytes_total
    # TYPE tomcat_global_sent_bytes_total counter
    tomcat_global_sent_bytes_total{application="prometheus-test",name="http-nio-8080",} 195356.0
    # HELP process_files_max_files The maximum file descriptor count
    # TYPE process_files_max_files gauge
    process_files_max_files{application="prometheus-test",} 10240.0
    # HELP tomcat_threads_busy_threads
    # TYPE tomcat_threads_busy_threads gauge
    tomcat_threads_busy_threads{application="prometheus-test",name="http-nio-8080",} 1.0
    # HELP process_files_open_files The open file descriptor count
    # TYPE process_files_open_files gauge
    process_files_open_files{application="prometheus-test",} 101.0
    # HELP tomcat_sessions_active_current_sessions
    # TYPE tomcat_sessions_active_current_sessions gauge
    tomcat_sessions_active_current_sessions{application="prometheus-test",} 0.0
    # HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
    # TYPE jvm_classes_unloaded_classes_total counter
    jvm_classes_unloaded_classes_total{application="prometheus-test",} 2.0
    # HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
    # TYPE jvm_memory_committed_bytes gauge
    jvm_memory_committed_bytes{application="prometheus-test",area="heap",id="PS Old Gen",} 1.5466496E8
    jvm_memory_committed_bytes{application="prometheus-test",area="nonheap",id="Metaspace",} 4.1418752E7
    jvm_memory_committed_bytes{application="prometheus-test",area="heap",id="PS Survivor Space",} 1.6252928E7
    jvm_memory_committed_bytes{application="prometheus-test",area="nonheap",id="Compressed Class Space",} 5767168.0
    jvm_memory_committed_bytes{application="prometheus-test",area="heap",id="PS Eden Space",} 1.73539328E8
    jvm_memory_committed_bytes{application="prometheus-test",area="nonheap",id="Code Cache",} 8716288.0
    # HELP http_server_requests_seconds
    # TYPE http_server_requests_seconds summary
    http_server_requests_seconds_count{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 16.0
    http_server_requests_seconds_sum{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.326299973
    http_server_requests_seconds_count{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/**/favicon.ico",} 3.0
    http_server_requests_seconds_sum{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/**/favicon.ico",} 0.028434427
    # HELP http_server_requests_seconds_max
    # TYPE http_server_requests_seconds_max gauge
    http_server_requests_seconds_max{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.120627904
    http_server_requests_seconds_max{application="prometheus-test",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/**/favicon.ico",} 0.015596512
    # HELP tomcat_global_error_total
    # TYPE tomcat_global_error_total counter
    tomcat_global_error_total{application="prometheus-test",name="http-nio-8080",} 0.0
    # HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
    # TYPE jvm_gc_max_data_size_bytes gauge
    jvm_gc_max_data_size_bytes{application="prometheus-test",} 2.863661056E9
    # HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
    # TYPE jvm_gc_live_data_size_bytes gauge
    jvm_gc_live_data_size_bytes{application="prometheus-test",} 2.1193976E7
    # HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
    # TYPE system_load_average_1m gauge
    system_load_average_1m{application="prometheus-test",} 3.9423828125
    # HELP process_uptime_seconds The uptime of the Java virtual machine
    # TYPE process_uptime_seconds gauge
    process_uptime_seconds{application="prometheus-test",} 173.424
    # HELP tomcat_sessions_expired_sessions_total
    # TYPE tomcat_sessions_expired_sessions_total counter
    tomcat_sessions_expired_sessions_total{application="prometheus-test",} 0.0
    # HELP jvm_gc_pause_seconds Time spent in GC pause
    # TYPE jvm_gc_pause_seconds summary
    jvm_gc_pause_seconds_count{action="end of minor GC",application="prometheus-test",cause="Metadata GC Threshold",} 1.0
    jvm_gc_pause_seconds_sum{action="end of minor GC",application="prometheus-test",cause="Metadata GC Threshold",} 0.012
    jvm_gc_pause_seconds_count{action="end of major GC",application="prometheus-test",cause="Metadata GC Threshold",} 1.0
    jvm_gc_pause_seconds_sum{action="end of major GC",application="prometheus-test",cause="Metadata GC Threshold",} 0.045
    # HELP jvm_gc_pause_seconds_max Time spent in GC pause
    # TYPE jvm_gc_pause_seconds_max gauge
    jvm_gc_pause_seconds_max{action="end of minor GC",application="prometheus-test",cause="Metadata GC Threshold",} 0.012
    jvm_gc_pause_seconds_max{action="end of major GC",application="prometheus-test",cause="Metadata GC Threshold",} 0.045
    # HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
    # TYPE jvm_gc_memory_promoted_bytes_total counter
    jvm_gc_memory_promoted_bytes_total{application="prometheus-test",} 1.06878E7
    # HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
    # TYPE jvm_gc_memory_allocated_bytes_total counter
    jvm_gc_memory_allocated_bytes_total{application="prometheus-test",} 7.8713648E7
    # HELP tomcat_global_request_seconds
    # TYPE tomcat_global_request_seconds summary
    tomcat_global_request_seconds_count{application="prometheus-test",name="http-nio-8080",} 19.0
    tomcat_global_request_seconds_sum{application="prometheus-test",name="http-nio-8080",} 0.432
    # HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
    # TYPE jvm_threads_live_threads gauge
    jvm_threads_live_threads{application="prometheus-test",} 24.0
    # HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
    # TYPE jvm_buffer_count_buffers gauge
    jvm_buffer_count_buffers{application="prometheus-test",id="mapped",} 0.0
    jvm_buffer_count_buffers{application="prometheus-test",id="direct",} 11.0
    # HELP process_start_time_seconds Start time of the process since unix epoch.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds{application="prometheus-test",} 1.556017153576E9
    # HELP tomcat_threads_config_max_threads
    # TYPE tomcat_threads_config_max_threads gauge
    tomcat_threads_config_max_threads{application="prometheus-test",name="http-nio-8080",} 200.0
    # HELP system_cpu_count The number of processors available to the Java virtual machine
    # TYPE system_cpu_count gauge
    system_cpu_count{application="prometheus-test",} 8.0
    # HELP tomcat_threads_current_threads
    # TYPE tomcat_threads_current_threads gauge
    tomcat_threads_current_threads{application="prometheus-test",name="http-nio-8080",} 10.0
    # HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
    # TYPE jvm_memory_max_bytes gauge
    jvm_memory_max_bytes{application="prometheus-test",area="heap",id="PS Old Gen",} 2.863661056E9
    jvm_memory_max_bytes{application="prometheus-test",area="nonheap",id="Metaspace",} -1.0
    jvm_memory_max_bytes{application="prometheus-test",area="heap",id="PS Survivor Space",} 1.6252928E7
    jvm_memory_max_bytes{application="prometheus-test",area="nonheap",id="Compressed Class Space",} 1.073741824E9
    jvm_memory_max_bytes{application="prometheus-test",area="heap",id="PS Eden Space",} 1.395654656E9
    jvm_memory_max_bytes{application="prometheus-test",area="nonheap",id="Code Cache",} 2.5165824E8
    # HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
    # TYPE jvm_threads_peak_threads gauge
    jvm_threads_peak_threads{application="prometheus-test",} 29.0
    # HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
    # TYPE jvm_classes_loaded_classes gauge
    jvm_classes_loaded_classes{application="prometheus-test",} 7936.0
    # HELP tomcat_sessions_active_max_sessions
    # TYPE tomcat_sessions_active_max_sessions gauge
    tomcat_sessions_active_max_sessions{application="prometheus-test",} 0.0
    # HELP tomcat_global_request_max_seconds
    # TYPE tomcat_global_request_max_seconds gauge
    tomcat_global_request_max_seconds{application="prometheus-test",name="http-nio-8080",} 0.161
    # HELP tomcat_sessions_created_sessions_total
    # TYPE tomcat_sessions_created_sessions_total counter
    tomcat_sessions_created_sessions_total{application="prometheus-test",} 0.0
    # HELP tomcat_sessions_rejected_sessions_total
    # TYPE tomcat_sessions_rejected_sessions_total counter
    tomcat_sessions_rejected_sessions_total{application="prometheus-test",} 0.0
    # HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
    # TYPE jvm_buffer_total_capacity_bytes gauge
    jvm_buffer_total_capacity_bytes{application="prometheus-test",id="mapped",} 0.0
    jvm_buffer_total_capacity_bytes{application="prometheus-test",id="direct",} 90112.0
    # HELP system_cpu_usage The "recent cpu usage" for the whole system
    # TYPE system_cpu_usage gauge
    system_cpu_usage{application="prometheus-test",} 0.2523364485981308

    该端点返回的数据结构是Prometheus需要使用的。对照注释,相信大家能看懂每一项的含义。例如:

    1
    2
    3
    # HELP jvm_memory_used_bytes The amount of used memory
    # TYPE jvm_memory_used_bytes gauge
    jvm_memory_used_bytes{application="prometheus-test",area="heap",id="PS Old Gen",} 2.1193976E7

    表示:prometheus-test 应用堆内存中的 PS Old Gen 区域占用的空间是 2.1193976E7 字节。

对接Prometheus

Prometheus是一款开源的监控 + 时序数据库 + 报警软件,由SoundCloud公司开发的。目前在CNCF基金会托管,并已成功孵化。

官方网站

安装Prometheus

TIPS

简单期间,本文使用Docker方式安装,其他方式安装请详见 Prometheus安装说明

  • 准备配置文件,命名为prometheus.yml

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    scrape_configs:
    # 任意写,建议英文,不要包含特殊字符
    - job_name: 'spring'
    # 多久采集一次数据
    scrape_interval: 15s
    # 采集时的超时时间
    scrape_timeout: 10s
    # 采集的路径是啥
    metrics_path: '/actuator/prometheus'
    # 采集服务的地址,设置成上面Spring Boot应用所在服务器的具体地址。
    static_configs:
    - targets: ['30.1.229.89:8080']

    这样配置后,Prometheus服务器就会自动每隔15秒请求 http://30.1.229.89:8080/actuator/prometheus 。更多配置项详见 Prometheus Configuration官方文档

  • prometheus.yml 文件所在目录执行如下命令,即可启动Prometheus

    1
    2
    3
    docker run -d -p 9090:9090 \
    -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus --config.file=/etc/prometheus/prometheus.yml

测试

  • 访问http://localhost:9090 ,可看到类似如下的界面

    Prometheus-1

  • 点击 Insert metric at cursor ,即可选择监控指标;点击 Graph ,即可让指标以图表方式展示;点击Execute 按钮,即可看到类似下图的结果:

    Prometheus-2

Grafana可视化

至此,已经用Prometheus实现了监控数据的可视化,然而使用体验并不好。下面来用Grafana实现更友好、更贴近生产的监控可视化。

Grafana是一个开源的跨平台度量分析和可视化 + 告警工具。

官方网站

安装Grafana

TIPS

简单期间,本文使用Docker方式安装,其他方式安装请详见 Grafana安装说明

  • 启动

    1
    docker run -d -p 3000:3000 grafana/grafana

配置

  • 登录:访问 http://localhost:3000/login ,初始账号/密码为:admin/admin ,登录后,即可看到类似如下的界面

    Grafana-1

  • 点击Add data source ,即可看到类似如下的界面:

    Grafana-2

  • 点击Prometheus,即可看到类似如下界面:

    Grafana-3

  • 填写Prometheus的首页地址,并点击 Save & Test

创建监控Dashboard

  • 点击导航栏上的 + 按钮,并点击Dashboard,将会看到类似如下的界面:Grafana-4

  • 点击 Add Query ,即可看到类似如下的界面:

    Grafana-5

    在白框标记的位置输入指标,指标的取值详见Spring Boot应用的 /actuator/prometheus 端点,例如jvm_memory_used_bytesjvm_threads_states_threadsjvm_threads_live_threads 等,Grafana会给你较好的提示,并且支持较为复杂的计算,例如聚合、求和、平均等。如果想要绘制多个线条,可点击Add Query 按钮,如上图所示,笔者为图表绘制了两条线,分别代表daemon以及peak线程。

  • 点击下图的按钮,并填入Title,即可设置图表标题。

    Grafana-6

  • 按下图步骤操作,即可为Dashboard添加新的图表。

    Grafana-7

Dashboard市场

至此,已可用Grafana实现较为丰富的图表展示——可将关心的监控指标放置到Dashboard上,并且非常灵活!然而,这个配置的操作虽然不难,但还是挺费时间的……

那么是否有配置好的又强大、又通用、拿来即用的Dashboard呢?答案是肯定的!前往 Grafana Lab - Dashboards ,输入关键词即可搜索指定Dashboard。

Grafana-8

如上图所示,可以找到若干款以 Prometheus 作为数据源,支持Micrometer的Dashboard!下面,来演示如何使用 JVM(Micrometer) 这个Dashboard。

其他Dashboard使用方式大同小异,建议使用前先点击了解下详情,一般Dashboard中都会有说明。

  • 点击 JVM(Micrometer) 进入Dashboard详情介绍页,如下图所示:

    Grafana-9

    如图已详细描述了该Dashboard的特性、配置。其中的management.metrics.tags.application ,前面已经配置过了;本例也不打算演示 Process memory 特性,所以 micrometer-jvm-extras 怎么用也不打算引入了(这是该Dashboard作者为Micrometer提供的增强包,有兴趣的童鞋可自行点击过去研究)。

    细心的童鞋会发现,该页的右上角有个 4701 这是一个非常重要的数字!

  • 访问http://localhost:3000 ,按下图步骤操作:

    Grafana-10

  • 输入后即可看到类似如下的界面,选择数据源,并点击Import。

    Grafana-11

  • 此时,即可看到类似如下的界面,如图所示,我们常关心的指标该Dashboard均已支持!

    Grafana-12

比较好用的Dashboard

以上三款是笔者亲测较为全面、好用的Dashboard,仅供参考。

告警

Grafana支持的告警渠道非常丰富,例如邮件、钉钉、Slack、Webhook等,非常实用!比较简单,各位看官自行研究下吧。

配套代码

其中 prometheus+grafana 目录已提供Docker一键部署Prometheus + Grafana的Shell。

相关文章

评论系统未开启,无法评论!