Monitor JiHu GitLab Runner usage

Tier: Basic, Premium, Ultimate
Offering: JihuLab.com, Self-managed

JiHu GitLab Runner can be monitored using Prometheus.

Embedded Prometheus metrics

History

The embedded HTTP statistics server with Prometheus metrics was introduced in JiHu GitLab Runner 1.8.0.

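JiHu GitLab Runner is instrumented with native Prometheus metrics. These can be exposed through an embedded HTTP server on the /metrics path. When the server is enabled, the Prometheus monitoring system can scrape it, and any other HTTP client can access it.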

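The exposed information includes:

Runner business logic metrics (for example, the number of currently running jobs)
Go-specific process metrics (for example, garbage collection statistics, goroutines, and memory statistics)
General process metrics (memory usage, CPU usage, file descriptor usage, and so on)
Build version information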

The metrics format is documented in the Prometheus Exposition formats specification.

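These metrics give operators a way to monitor and gain insight into their runners. For example, you might want to know whether an increase in load average on the runner host is related to an increase in processed jobs. Or you might run a cluster of machines and want to track build trends so you can plan changes to your infrastructure.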

Learn more about Prometheus

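To set up a Prometheus server that scrapes this HTTP endpoint and uses the collected metrics, see the Prometheus Getting started guide. For more details on configuring Prometheus, see the Configuration section. For more details on configuring alerts, see Alerting rules and Alertmanager.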
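As a starting point, the following sketch prints a minimal prometheus.yml with one scrape job for a single runner. The job name `gitlab-runner` and the target host are illustrative assumptions. `scrape_configs`, `job_name`, `static_configs`, and `targets` are standard Prometheus configuration keys. `metrics_path` is omitted because Prometheus defaults it to /metrics, which is the path the runner serves.

```shell
#!/bin/sh
# Print a minimal prometheus.yml that scrapes one runner's metrics endpoint.
# Assumptions: the job name "gitlab-runner" is arbitrary, and $1 is the
# runner's host:port (defaults to localhost:9252).
prometheus_scrape_config() {
  target="${1:-localhost:9252}"
  cat <<EOF
scrape_configs:
  - job_name: gitlab-runner
    # metrics_path is left at its default, /metrics
    static_configs:
      - targets: ['$target']
EOF
}

# Usage (runner-01.example.com is a placeholder host):
#   prometheus_scrape_config runner-01.example.com:9252 > prometheus.yml
```

To scrape several runners, add further entries to `targets`, or generate one job per runner.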

Available metrics

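To get a full list of all available metrics, configure and enable the metrics server, then curl its endpoint. For example, for a local runner listening on port 9252: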

```shell
$ curl -s "http://localhost:9252/metrics" | grep -E "# HELP"

# HELP gitlab_runner_api_request_statuses_total The total number of api requests, partitioned by runner, endpoint and status.
# HELP gitlab_runner_autoscaling_machine_creation_duration_seconds Histogram of machine creation time.
# HELP gitlab_runner_autoscaling_machine_states The current number of machines per state in this provider.
# HELP gitlab_runner_concurrent The current value of concurrent setting
# HELP gitlab_runner_errors_total The number of caught errors.
# HELP gitlab_runner_limit The current value of limit setting
# HELP gitlab_runner_request_concurrency The current number of concurrent requests for a new job
# HELP gitlab_runner_request_concurrency_exceeded_total Count of excess requests above the configured request_concurrency limit
# HELP gitlab_runner_version_info A metric with a constant '1' value labeled by different build stats fields.
...
```
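The `grep` above prints whole HELP lines. To work with the output programmatically, you can parse the exposition format with a couple of small helpers. This is a sketch that handles only two cases:

- names taken from `# HELP` lines;
- values of unlabeled samples. Labeled samples such as `name{runner="..."} 1` are deliberately not matched.

The function names are hypothetical.

```shell
#!/bin/sh
# Helpers for reading Prometheus exposition-format text on stdin.

# Print every metric name that has a "# HELP" line.
metric_names() {
  awk '$1 == "#" && $2 == "HELP" { print $3 }'
}

# Print the value of an unlabeled sample whose name is $1.
# Labeled samples (name{label="..."} value) are not matched by this sketch.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage against a local runner (assumes the metrics server listens on :9252):
#   curl -s "http://localhost:9252/metrics" | metric_names
#   curl -s "http://localhost:9252/metrics" | metric_value gitlab_runner_concurrent
```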

This list includes Go-specific process metrics. For a list of available metrics that excludes Go-specific process metrics, see Monitoring runners.

pprof HTTP endpoints

History

The pprof integration was introduced in JiHu GitLab Runner 1.9.0.

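Metrics about the internal state of the JiHu GitLab Runner process are valuable, but in some cases you must examine the running process in real time. That is why the pprof HTTP endpoints were introduced.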

The pprof endpoints are available through the embedded HTTP server on the /debug/pprof/ path.

You can read more about using pprof in its documentation.
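For a quick look at a live process, you can build profile URLs under /debug/pprof/ and fetch them with curl or `go tool pprof`. This sketch makes three assumptions:

- the runner serves the standard profiles of Go's net/http/pprof package (`heap`, `goroutine`, and so on);
- the metrics server listens on localhost:9252;
- `RUNNER_ADDR` and `pprof_url` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: the metrics server listens on localhost:9252; override with RUNNER_ADDR.
RUNNER_ADDR="${RUNNER_ADDR:-localhost:9252}"

# Build the URL of a profile served under /debug/pprof/.
pprof_url() {
  printf 'http://%s/debug/pprof/%s\n' "$RUNNER_ADDR" "$1"
}

# Usage (requires a running runner; the second command also needs a Go toolchain):
#   curl -s "$(pprof_url 'goroutine?debug=2')"   # plain-text stacks of all goroutines
#   go tool pprof "$(pprof_url heap)"            # interactive heap profile
```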

Configuration of the metrics HTTP server

The metrics server exports data about the internal state of the JiHu GitLab Runner process and should not be publicly available!

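Configure the metrics HTTP server by using one of the following methods:

Use the listen_address global configuration option in the config.toml file.
For the run command, use the --listen-address command-line option.
For runners installed with the Helm chart, in values.yaml: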

Configure the metrics option:

```yaml
## Configure integrated Prometheus metrics exporter
##
## ref: https://gitlab.cn/docs/runner/monitoring/#configuration-of-the-metrics-http-server
##
metrics:
  enabled: true

  ## Define a name for the metrics port
  ##
  portName: metrics

  ## Provide a port number for the integrated Prometheus metrics exporter
  ##
  port: 9252

  ## Configure a prometheus-operator serviceMonitor to allow autodetection of
  ## the scraping target. Requires enabling the service resource below.
  ##
  serviceMonitor:
    enabled: true

    ...
```

Configure the service resource so that the serviceMonitor can retrieve the configured metrics:

```yaml
## Configure a service resource to allow scraping metrics by using
## prometheus-operator serviceMonitor
service:
  enabled: true

  ## Provide additional labels for the service
  ##
  labels: {}

  ## Provide additional annotations for the service
  ##
  annotations: {}

  ...
```

If you add the address to your config.toml file, you must restart the runner process to start the metrics HTTP server.
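A common mistake is placing listen_address inside a [[runners]] section instead of at the global level. In TOML, global keys must appear before the first table header. The sketch below relies on that rule: it prints the global listen_address of a config.toml and stops scanning at the first line that starts with `[`.

It is a simplified check, not a full TOML parser, so trailing comments and unusual quoting are not handled. The helper name and the config path in the usage note are assumptions; the path varies by installation.

```shell
#!/bin/sh
# Print the global listen_address from a config.toml, if one is set.
# TOML global keys must come before the first table header such as
# [[runners]], so the scan stops at the first line starting with "[".
# Sketch only: trailing comments and unusual quoting are not handled.
global_listen_address() {
  awk '
    /^[ \t]*\[/ { exit }
    /^[ \t]*listen_address[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      gsub(/"/, "")
      print
      exit
    }
  ' "$1"
}

# Usage (the config path is an assumption; it differs between installations):
#   global_listen_address /etc/gitlab-runner/config.toml
#   sudo gitlab-runner restart                     # pick up a changed listen_address
#   gitlab-runner run --listen-address ":9252"     # or set it on the command line
```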

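In both cases (the config.toml option and the --listen-address flag), the option accepts a string in the format [host]:<port>, where:

host can be an IP address or a host name.
port is a valid TCP port or a symbolic service name (like http). You should use port 9252, which is already allocated to this exporter in the Prometheus default port allocations list.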

If the listen address does not contain a port, it defaults to 9252.

Examples of addresses:

:9252 listens on all interfaces on port 9252.
localhost:9252 listens on the loopback interface on port 9252.
[2001:db8::1]:http listens on the IPv6 address [2001:db8::1] on the HTTP port 80.
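The defaulting rule can be expressed as a short shell function that appends :9252 when an address has no port. This is an illustrative sketch of the documented behavior, not the runner's own parser. It assumes IPv6 literals are always written in brackets, as in the example above; a bare, unbracketed IPv6 address would be misread as already having a port. The function name is hypothetical.

```shell
#!/bin/sh
# Append the default port 9252 when a [host]:<port> listen address has no port.
# Sketch of the documented rule, not the runner's own parser; IPv6 literals
# are assumed to be bracketed, as in [2001:db8::1]:http.
normalize_listen_address() {
  case "$1" in
    \[*\]:*) printf '%s\n' "$1" ;;        # bracketed IPv6 with a port
    \[*\])   printf '%s:9252\n' "$1" ;;   # bracketed IPv6 without a port
    *:*)     printf '%s\n' "$1" ;;        # host:port or :port
    *)       printf '%s:9252\n' "$1" ;;   # host only
  esac
}
```

For example, `normalize_listen_address localhost` prints `localhost:9252`, while `:9252` and `[2001:db8::1]:http` are returned unchanged.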

Remember that to listen on ports below 1024, at least on Linux/Unix systems, you need root/administrator privileges.
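Before starting the runner, you can check whether a chosen address falls under this restriction. The sketch extracts the port from an address that already includes one and tests it against the 1024 threshold. It does not resolve symbolic service names such as http (port 80). For those it returns a distinct status of 2 rather than guessing. Both helper names are hypothetical.

```shell
#!/bin/sh
# Print the port part of a [host]:<port> address (the text after the last colon).
# Expects an address that includes a port.
listen_port() {
  printf '%s\n' "${1##*:}"
}

# Exit 0 if a numeric port is below 1024 (privileged on Linux/Unix), 1 if not,
# and 2 for symbolic names such as "http", which this sketch does not resolve.
is_privileged_port() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$1" -lt 1024 ]
}

# Usage:
#   port="$(listen_port ':9252')"
#   if is_privileged_port "$port"; then echo "root privileges required"; fi
```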

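The HTTP server opens on the selected host:port without any authorization. If you bind the metrics server to a public interface, restrict access with a firewall or put an HTTP proxy in front of it for authorization and access control.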
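As one way to apply the firewall advice, this sketch prints iptables rules that accept traffic to the metrics port only from a single Prometheus server and drop it from everyone else. Several assumptions apply:

- the host uses iptables;
- 192.0.2.10 is a placeholder from the documentation address range;
- the helper name is hypothetical.

Appended rules take effect only if no earlier INPUT rule already accepts the traffic, so adapt them to your existing rule set. The function prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Print iptables rules that allow only one Prometheus server to reach the
# metrics port and drop everyone else. Assumptions: the host uses iptables,
# $1 is the Prometheus server address, and $2 is the port (default 9252).
metrics_firewall_rules() {
  prometheus_ip="$1"
  port="${2:-9252}"
  # ACCEPT is appended before DROP so the allowed source matches first.
  printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$prometheus_ip"
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}

# Usage: review the output first, then apply it as root, for example:
#   metrics_firewall_rules 192.0.2.10 | sudo sh
```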