Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.
The features that distinguish Prometheus from other metrics and monitoring systems are:
A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
PromQL, a powerful and flexible query language to leverage this dimensionality
No dependency on distributed storage; single server nodes are autonomous
An HTTP pull model for time series collection
Pushing time series is supported via an intermediary gateway for batch jobs
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support
Support for hierarchical and horizontal federation
Configuring Prometheus

Download

wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-amd64.tar.gz
tar zxf prometheus-2.14.0.linux-amd64.tar.gz
mv prometheus-2.14.0.linux-amd64 /usr/local/prometheus

Create the user

groupadd --system prometheus
useradd --system -g prometheus -s /sbin/nologin -c "Prometheus Monitoring System" prometheus

Set ownership

chown -R prometheus:prometheus /usr/local/prometheus

Create the data directory

mkdir -p /data/prometheus
chown -R prometheus:prometheus /data/prometheus
Create the Prometheus service

cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --storage.tsdb.retention=30d \
  --storage.tsdb.retention.size=512MB \
  --web.enable-admin-api \
  --web.enable-lifecycle \
  --web.external-url=http://monitor.example.com
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
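If Type is set to notify, the service keeps restarting, so leave it as simple.

--storage.tsdb.path is optional; by default, data is stored in the ./data directory under the working directory.

--storage.tsdb.retention sets how long data is kept.

--storage.tsdb.retention.size is the maximum number of bytes that storage blocks may use (note that this does not include the WAL, which can be large). The oldest data is deleted first. The default is 0, meaning disabled. This flag is experimental and may change in future releases. Supported units: B, KB, MB, GB, TB, PB, EB. Example: "512MB".

--web.enable-admin-api enables access to the admin API.

--web.enable-lifecycle enables reloading the configuration (and shutting Prometheus down) over HTTP.

--web.external-url=http://localhost:9090/ is the externally reachable address of the Prometheus host. If it is omitted, the GeneratorURL in alerts will be wrong.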
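The size units are easy to misread. As a rough illustration, and assuming the flag treats its units as powers of 1024 (so KB means 1024 bytes), the helper below converts a value such as 512MB into bytes. The name to_bytes is made up for this sketch and is not part of Prometheus.

```shell
# Convert a size such as "512MB" to bytes.
# Assumption: units are powers of 1024 (KB = 1024 bytes).
to_bytes() (
  value=${1%%[A-Za-z]*}   # numeric part, e.g. 512
  unit=${1#"$value"}      # unit part, e.g. MB
  case $unit in
    B|"") exp=0 ;;
    KB) exp=1 ;;
    MB) exp=2 ;;
    GB) exp=3 ;;
    TB) exp=4 ;;
    PB) exp=5 ;;
    *) echo "unsupported unit: $unit" >&2; exit 1 ;;
  esac
  bytes=$value
  # multiply by 1024 once per unit step
  while [ "$exp" -gt 0 ]; do
    bytes=$((bytes * 1024))
    exp=$((exp - 1))
  done
  echo "$bytes"
)
```

For example, to_bytes 512MB prints 536870912, which is the block size limit configured in the service file above.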
Creating alert rule files

A recommended site with many ready-made alert rules: https://awesome-prometheus-alerts.grep.to/
Linux server liveness alerts

groups:
- name: Host status alerts
  rules:
  - alert: HostDown
    expr: up == 0
    for: 1m
    labels:
      status: critical
    annotations:
      summary: "{{ $labels.instance }}: server is down"
      description: "{{ $labels.instance }}: server has been unreachable for more than 1 minute"
  - alert: CPUUsage
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
    for: 1m
    labels:
      status: warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage is too high"
      description: "{{ $labels.instance }}: CPU usage is above 60% (current: {{ $value }}%)"
  - alert: MemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: memory usage is too high"
      description: "{{ $labels.instance }}: memory usage is above 80% (current: {{ $value }}%)"
  - alert: DiskIO
    # node_disk_io_time_seconds_total counts busy time, so its rate * 100 is the utilization in percent
    expr: avg(irate(node_disk_io_time_seconds_total[1m])) by (instance) * 100 > 60
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: disk I/O utilization is too high"
      description: "{{ $labels.instance }}: disk I/O utilization is above 60% (current: {{ $value }}%)"
  - alert: NetworkReceive
    # the rate in bytes/s is divided by 100, so 102400 corresponds to 10,240,000 bytes/s
    expr: ((sum(rate(node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: inbound network bandwidth is too high"
      description: "{{ $labels.instance }}: inbound network traffic has stayed above the threshold for more than 1 minute. RX value: {{ $value }}"
  - alert: NetworkTransmit
    # same scaling as NetworkReceive
    expr: ((sum(rate(node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: outbound network bandwidth is too high"
      description: "{{ $labels.instance }}: outbound network traffic has stayed above the threshold for more than 1 minute. TX value: {{ $value }}"
  - alert: TCPSessions
    expr: node_netstat_Tcp_CurrEstab > 1000
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: too many established TCP connections"
      description: "{{ $labels.instance }}: more than 1000 TCP connections are in ESTABLISHED state (current: {{ $value }})"
  - alert: DiskSpace
    expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.mountpoint }}: disk partition usage is too high"
      description: "{{ $labels.mountpoint }}: disk partition usage is above 80% (current: {{ $value }}%)"
Windows server liveness alerts

groups:
- name: Windows host status alerts
  rules:
  - alert: WindowsServerCollectorError
    expr: windows_exporter_collector_success == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Windows Server collector Error (instance {{ $labels.instance }})
      description: "Collector {{ $labels.collector }} was not successful\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: WindowsServerServiceStatus
    expr: windows_service_status{status="ok"} != 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Windows Server service Status (instance {{ $labels.instance }})
      description: "Windows Service state is not OK\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: WindowsServerCpuUsage
    expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[2m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Windows Server CPU Usage (instance {{ $labels.instance }})
      description: "CPU Usage is more than 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: WindowsServerMemoryUsage
    expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Windows Server memory Usage (instance {{ $labels.instance }})
      description: "Memory usage is more than 90%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: WindowsServerDiskSpaceUsage
    expr: 100.0 - 100 * ((windows_logical_disk_free_bytes / 1024 / 1024) / (windows_logical_disk_size_bytes / 1024 / 1024)) > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Windows Server disk Space Usage (instance {{ $labels.instance }})
      description: "Disk usage is more than 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: WindowsNetworkReceive
    # bytes/s * 8 / 1000 is kbit/s, so 5120 is roughly 5 Mbit/s
    expr: (irate(windows_net_bytes_received_total{nic!~'isatap.*|VPN.*'}[5m]) * 8 / 1000) > 5120
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: inbound (download) network bandwidth is too high"
      description: "{{ $labels.instance }}: inbound (download) bandwidth has stayed above about 5 Mbit/s for more than 1 minute. RX rate: {{ $value }} kbit/s"
  - alert: WindowsNetworkTransmit
    expr: (irate(windows_net_bytes_sent_total{nic!~'isatap.*|VPN.*'}[5m]) * 8 / 1000) > 5120
    for: 1m
    labels:
      status: severe
    annotations:
      summary: "{{ $labels.instance }}: outbound (upload) network bandwidth is too high"
      description: "{{ $labels.instance }}: outbound (upload) bandwidth has stayed above about 5 Mbit/s for more than 1 minute. TX rate: {{ $value }} kbit/s"
HTTP monitoring alerts

groups:
- name: blackbox_network_stats
  rules:
  - alert: blackbox_network_stats
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      description: 'Site/endpoint {{ $labels.instance }} in job {{ $labels.job }} has been down for more than one minute.'
      summary: 'Site/endpoint {{ $labels.instance }} is down!'
  - alert: BlackboxProbeHttpFailure
    expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
      description: "HTTP status code is not 200-399\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
      description: "SSL certificate expires in 30 days\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
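The certificate rule compares the remaining lifetime in seconds with 86400 * 30, that is, 30 days. The sketch below redoes that arithmetic in shell. The names days_left and expires_soon are hypothetical helpers, and the timestamps are passed in by hand rather than read from Prometheus.

```shell
# Days left before a certificate expires.
# $1: expiry as a Unix timestamp (the kind of value probe_ssl_earliest_cert_expiry holds)
# $2: current Unix timestamp (defaults to now)
days_left() (
  expiry=$1
  now=${2:-$(date +%s)}
  echo $(( (expiry - now) / 86400 ))
)

# Same condition as the alert expression: fewer than 86400 * 30 seconds remain.
expires_soon() (
  expiry=$1
  now=${2:-$(date +%s)}
  [ $(( expiry - now )) -lt $(( 86400 * 30 )) ]
)
```

With a current time of 0, days_left 2592000 0 prints 30, and expires_soon succeeds only for expiry values below 2592000.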
Check that the rules are valid

./promtool check rules rules/basis.yml
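With several rule files it can be handy to check them all in one go. A minimal sketch, assuming the files live in a rules directory and promtool sits in the current directory as above; check_rules is a made-up helper name.

```shell
# Run "promtool check rules" on every .yml file in a directory.
# $1: rules directory (default: rules)
# $2: promtool binary (default: ./promtool)
check_rules() (
  dir=${1:-rules}
  promtool=${2:-./promtool}
  status=0
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue                  # skip when the glob matched nothing
    "$promtool" check rules "$f" || status=1 # remember failures, keep checking
  done
  exit "$status"
)
```

Calling check_rules without arguments checks rules/*.yml and returns non-zero if any file fails, which makes it usable as a guard before reloading Prometheus.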
Update the Prometheus configuration

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

rule_files:
  - "rules/*.yml"

Start

systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
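Configuring an nginx proxy with HTTP Basic Auth

Prometheus 2.14 does not provide any authentication support of its own. With Nginx as a reverse proxy in front of it, however, HTTP Basic Auth is easy to add.

In /usr/local/nginx/conf/ (adjust the path if your Nginx configuration directory is elsewhere), use the htpasswd tool provided by apache2-utils to create a user file. The user name is given on the command line and the password is prompted for:

htpasswd -c /usr/local/nginx/conf/.htpasswd admin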
Configure nginx

server {
    listen 80;
    server_name monitor.example.com;
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file ".htpasswd";
        proxy_pass http://localhost:9090/;
    }
}

Visit http://monitor.example.com and enter the user name and password to log in.
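A quick way to confirm that the proxy really enforces authentication is to compare the status codes with and without credentials. This is only a sketch: http_status is a hypothetical helper, and the host name is the server_name from the nginx block above.

```shell
# Print the HTTP status code returned for a URL.
# $1: URL, $2: optional "user:password" for Basic Auth
http_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}
```

If Basic Auth is active, http_status http://monitor.example.com/ should report 401, while passing admin and its password as the second argument (admin:yourpassword) should no longer do so.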
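Prometheus service discovery

Service discovery makes it easy to add or remove monitored resources dynamically. Zabbix, for example, can automatically discover hosts and resources to monitor. Prometheus, a monitoring system on par with Zabbix, naturally has its own discovery mechanisms.

file_sd_configs

file_sd_configs can be used to add and remove targets dynamically.

Edit the Prometheus configuration file:

- job_name: 'node'
  file_sd_configs:
  - refresh_interval: 1m
    files:
    - targets/nodes/*.yml

Create the file to be scanned, nodes.yml, under targets/nodes/:

- targets:
  - '172.19.179.239:9100'
  - '172.19.179.240:9100'
  - '172.19.179.244:9100'
  - '172.19.179.253:9100'
  - '172.19.179.254:9100'
  labels:
    server: linux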
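Because Prometheus rereads these files on its own, the target list can be generated by a script. A minimal sketch, assuming every host runs node_exporter on port 9100 and carries the same server: linux label as above; write_targets is a made-up helper name.

```shell
# Write a file_sd target file for a list of hosts.
# $1: output file, remaining arguments: host addresses
write_targets() (
  out=$1
  shift
  {
    echo "- targets:"
    for host in "$@"; do
      echo "    - '$host:9100'"
    done
    echo "  labels:"
    echo "    server: linux"
  } > "$out.tmp"
  # rename at the end so a half-written file is never picked up
  mv "$out.tmp" "$out"
)
```

For example, write_targets targets/nodes/nodes.yml 172.19.179.239 172.19.179.240 rewrites the file with those two hosts, and Prometheus picks up the change without a restart.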
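consul_sd_configs

Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration and discovery, health checks, key/value storage, multi-datacenter support, and distributed consistency guarantees. With the setup so far, adding a new target means changing configuration on the server; even with file_sd_configs you still have to log in to the server and edit the corresponding target file, which is tedious. Prometheus officially supports several kinds of automatic service discovery, and Consul is one of them.

Using consul_sd_configs requires a running Consul service.

Edit the Prometheus configuration file:

- job_name: 'consul-prometheus'
  consul_sd_configs:
  - server: '172.30.12.167:8500'
    services: []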
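For Prometheus to discover anything, services first have to be registered in Consul. The sketch below registers a node_exporter instance through the Consul agent's HTTP API (PUT /v1/agent/service/register). The service name, the ID scheme, and both function names are assumptions made for illustration.

```shell
# Build the JSON body for Consul's agent service registration endpoint.
# $1: service name, $2: address, $3: port
consul_service_json() {
  printf '{"ID": "%s-%s", "Name": "%s", "Address": "%s", "Port": %s}\n' \
    "$1" "$2" "$1" "$2" "$3"
}

# Register the service with the Consul agent.
# $1: Consul address (host:port), then the arguments of consul_service_json
consul_register() (
  server=$1
  shift
  consul_service_json "$@" |
    curl -s -X PUT --data-binary @- "http://$server/v1/agent/service/register"
)
```

For example, consul_register 172.30.12.167:8500 node-exporter 172.19.179.239 9100 registers one host. Since services: [] in the job above means all services, the new instance should then appear as a target.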
Running in a container

docker run -d \
  -p 9090:9090 \
  -v "/prom/prometheus.yml:/etc/prometheus/prometheus.yml" \
  -v "/prom/rules:/etc/prometheus/rules" \
  -v "/prom/targets:/etc/prometheus/targets" \
  prom/prometheus
Monitoring hosts with node_exporter

node_exporter is an exporter for machine metrics that runs on *NIX and Linux systems.

Its main job is to expose metrics to Prometheus, including CPU load, memory usage, network statistics, and so on.

Download

wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar zxf node_exporter-0.18.1.linux-amd64.tar.gz
mv node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter

Create the node_exporter service

cat > /usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
EOF

Start

systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service

Configure prometheus.yml

Add node_exporter under scrape_configs, then restart Prometheus.

- job_name: 'node'
  static_configs:
  - targets:
    - '172.19.179.239:9100'
    - '172.19.179.240:9100'
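Before pointing Prometheus at a host, it helps to confirm that the exporter actually answers. A small sketch, assuming the exporter listens on the default port 9100 used above; node_metric_count is a hypothetical helper.

```shell
# Count the node_* samples exposed by a node_exporter instance.
# $1: host:port of the exporter
node_metric_count() {
  curl -s "http://$1/metrics" | grep -c '^node_'
}
```

Running node_metric_count 172.19.179.239:9100 should print a non-zero number; 0 (or no output at all) suggests the exporter is not reachable from the Prometheus host.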
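Running in a container

The exporter flags must come after the image name; otherwise docker tries to interpret them itself. With host networking the port does not need to be published.

docker run -d \
  --net="host" \
  -v "/:/host:ro,rslave" \
  prom/node-exporter \
  --path.rootfs=/host \
  --collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($|/)"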
Configuring Grafana

Download

wget https://dl.grafana.com/oss/release/grafana-6.5.2.linux-amd64.tar.gz
tar -zxf grafana-6.5.2.linux-amd64.tar.gz
mv grafana-6.5.2 /usr/local/grafana

Create the Grafana service

cat > /usr/lib/systemd/system/grafana-server.service <<EOF
[Unit]
Description=Grafana
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Start

systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server.service
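Configure the data source

Add a data source: click Add data source, select Prometheus, enter http://localhost:9090 in the URL field, and click Save & Test. A green confirmation message means the configuration works; otherwise the address, port, or another setting is probably wrong and needs to be corrected.

Download a template

Download https://grafana.com/grafana/dashboards/9276 or https://grafana.com/grafana/dashboards/8919, then import the template into Grafana.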
Configure nginx

Add the Nginx configuration below. The proxy_pass URL must end with "/" so that the matched /grafana/ prefix is stripped before the request is passed on.

server {
    listen 80;
    server_name localhost;

    location /grafana/ {
        proxy_pass http://localhost:3000/;
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        access_log off;
    }
}

Edit the Grafana configuration (grafana.ini) and uncomment these settings by removing the leading ";". Replace your.domain with your own domain name.

[server]
domain = your.domain
root_url = %(protocol)s://%(domain)s/grafana/
Running in a container

docker run -d \
  -p 3000:3000 \
  -e TZ=Asia/Shanghai \
  -e GF_DATABASE_TYPE=mysql \
  -e GF_DATABASE_HOST=127.0.0.1:3306 \
  -e GF_DATABASE_NAME=grafana \
  -e GF_DATABASE_USER=root \
  -e GF_DATABASE_PASSWORD=root \
  -e GF_PLUGINS_ENABLE_ALPHA=true \
  -e "GF_INSTALL_PLUGINS=grafana-piechart-panel,grafana-simple-json-datasource" \
  grafana/grafana
Configuring Alertmanager

Download

wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
tar zxf alertmanager-0.20.0.linux-amd64.tar.gz
mv alertmanager-0.20.0.linux-amd64 /usr/local/prometheus/alertmanager

Download the DingTalk alerting plugin

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar zxf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-1.4.0.linux-amd64 /usr/local/prometheus/alertmanager/prometheus-webhook-dingtalk
Configure config.yml

targets:
  webhook:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    secret: SEC000000000000000000000
template: contrib/templates/legacy/dingtalk.tmpl

Configure the message template

A few templates are available here:
https://github.com/bwcxyk/tools_file/tree/master/prometheus/alertmanager/dingtalk/templates
Create the service

cat > /usr/lib/systemd/system/prometheus-webhook-dingtalk.service <<'EOF'
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
  --config.file=/usr/local/prometheus/alertmanager/prometheus-webhook-dingtalk/config.yml

[Install]
WantedBy=multi-user.target
EOF

Start the DingTalk alerting plugin

systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk.service
systemctl start prometheus-webhook-dingtalk.service
Update the Alertmanager configuration

global:
  resolve_timeout: 5m

route:
  receiver: webhook
  group_by: [alertname]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  routes:
  - receiver: webhook
    group_wait: 10s

receivers:
- name: webhook
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook/send
    send_resolved: true

inhibit_rules:
- equal: ['alertname', 'cluster', 'service']
  source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
Create the Alertmanager service

cat > /usr/lib/systemd/system/alertmanager.service <<EOF
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/alertmanager/alertmanager --web.external-url=http://example.com:9093 --config.file=/usr/local/prometheus/alertmanager/alertmanager.yml --storage.path=/data/prometheus/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Start the service

systemctl daemon-reload
systemctl enable alertmanager.service
systemctl start alertmanager.service
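To check the whole chain from Alertmanager to DingTalk without waiting for a real alert, a test alert can be posted by hand. A minimal sketch, assuming Alertmanager listens on localhost:9093 and serves its v2 API there; the function names and the alert name TestAlert are made up for illustration.

```shell
# JSON body for Alertmanager's v2 alerts endpoint: a list with one alert.
# $1: alert name, $2: value of the severity label
test_alert_json() {
  printf '[{"labels": {"alertname": "%s", "severity": "%s"}, "annotations": {"summary": "manual test alert"}}]\n' \
    "$1" "$2"
}

# POST the test alert to Alertmanager.
# $1: Alertmanager base URL, then the arguments of test_alert_json
send_test_alert() (
  base=$1
  shift
  test_alert_json "$@" |
    curl -s -X POST -H 'Content-Type: application/json' --data-binary @- "$base/api/v2/alerts"
)
```

For example, send_test_alert http://localhost:9093 TestAlert warning should produce a DingTalk message once the group_wait period has passed, if the webhook receiver is wired up correctly.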
Configure nginx

server {
    listen 80;
    server_name monitor.example.com;
    location / {
        auth_basic "Prometheus";
        auth_basic_user_file ".htpasswd";
        proxy_pass http://localhost:9093/;
    }
}
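Running in a container

The image reads its configuration from /etc/alertmanager/alertmanager.yml by default, so the file is mounted under that name.

docker run -d \
  -p 9093:9093 \
  -v "/prom/alertmanager.yaml:/etc/alertmanager/alertmanager.yml" \
  prom/alertmanager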
Blackbox_exporter

Download

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.17.0/blackbox_exporter-0.17.0.linux-amd64.tar.gz
tar zxf blackbox_exporter-0.17.0.linux-amd64.tar.gz
mv blackbox_exporter-0.17.0.linux-amd64 /usr/local/prometheus/blackbox_exporter
Configuration

Edit the blackbox.yml file:

modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      valid_status_codes: [200]
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      method: POST
      preferred_ip_protocol: "ip4"
Update the Prometheus configuration by adding jobs that use file-based service discovery. metrics_path defaults to /metrics, so remember to change it to /probe.

- job_name: "blackbox-http"
  metrics_path: /probe
  params:
    module: [http_2xx]
  file_sd_configs:
  - refresh_interval: 1m
    files:
    - targets/blackbox/http_2xx.yml
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9115

- job_name: "blackbox-http-post"
  metrics_path: /probe
  params:
    module: [http_post_2xx]
  file_sd_configs:
  - refresh_interval: 1m
    files:
    - targets/blackbox/http_post_2xx.yml
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9115
Create the file targets/blackbox/http_2xx.yml:

- targets:
  - https://baidu.com
Create the system service

cat > /usr/lib/systemd/system/blackbox.service <<EOF
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Start the service

systemctl daemon-reload
systemctl start blackbox.service
systemctl enable blackbox.service
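Once the exporter is running, a probe can be triggered by hand, which is the same request Prometheus sends after relabeling. A small sketch, assuming the exporter listens on 127.0.0.1:9115 as in the scrape configuration; probe_success here is just a hypothetical shell helper named after the metric it extracts.

```shell
# Ask blackbox_exporter to probe a target and print the probe_success value.
# $1: module name, $2: target URL, $3: exporter address (default 127.0.0.1:9115)
probe_success() {
  curl -s -G "http://${3:-127.0.0.1:9115}/probe" \
    --data-urlencode "module=$1" \
    --data-urlencode "target=$2" |
    awk '$1 == "probe_success" { print $2 }'
}
```

For example, probe_success http_2xx https://baidu.com prints 1 when the probe passes and 0 when it fails, which helps to tell a broken module definition apart from a broken scrape job.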
Reload Prometheus

curl -X POST "http://127.0.0.1:9090/-/reload"
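It is usually safer to validate the configuration before reloading it. A minimal sketch that only calls the reload endpoint when promtool check config succeeds; safe_reload is a made-up name, and the default paths follow the ones used in this post.

```shell
# Reload Prometheus only if the configuration passes "promtool check config".
# $1: config file
# $2: promtool binary (default: ./promtool)
# $3: Prometheus base URL (default: http://127.0.0.1:9090)
safe_reload() {
  "${2:-./promtool}" check config "$1" || {
    echo "config check failed, not reloading" >&2
    return 1
  }
  curl -s -X POST "${3:-http://127.0.0.1:9090}/-/reload"
}
```

For example, safe_reload /usr/local/prometheus/prometheus.yml /usr/local/prometheus/promtool leaves the running instance untouched when the file has an error. The reload endpoint itself only works because --web.enable-lifecycle was set.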
Grafana dashboard

Import https://grafana.com/grafana/dashboards/9965
Alert configuration

Create the file rules/blackbox_exporter.yml. Its contents are the blackbox_network_stats rule group already listed in full under the HTTP monitoring alerts in the alert rule files section above (the alerts blackbox_network_stats, BlackboxProbeHttpFailure, and BlackboxSslCertificateWillExpireSoon).
Other notes

To delete the metric data of certain jobs or instances, use the following commands (they require --web.enable-admin-api):

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={job="kubernetes"}'
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="10.244.2.158:9090"}'
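Deleting series only marks the data as deleted; the space is reclaimed later. To release it right away, the admin API also offers a clean_tombstones endpoint. The sketch below chains the two calls; purge_series is a hypothetical helper, and the default URL is the one used above.

```shell
# Delete all series matching a selector, then clean tombstones
# so the disk space is released.
# $1: series selector, e.g. {job="kubernetes"}
# $2: Prometheus base URL (default: http://localhost:9090)
purge_series() (
  base=${2:-http://localhost:9090}
  # -G moves the url-encoded match[] parameter into the query string
  curl -s -g -X POST -G "$base/api/v1/admin/tsdb/delete_series" \
    --data-urlencode "match[]=$1" &&
  curl -s -X POST "$base/api/v1/admin/tsdb/clean_tombstones"
)
```

For example, purge_series '{job="kubernetes"}' removes that job's data and then triggers the cleanup.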
Reference: "Prometheus 删除数据指标" (deleting metric data in Prometheus)

Grafana templates

ES Nginx Logs