Troubleshooting the JiHu GitLab agent for Kubernetes

When you use the JiHu GitLab agent for Kubernetes, you might run into issues that require troubleshooting.

You can start by viewing the service logs:

kubectl logs -f -l=app=gitlab-agent -n gitlab-agent
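To narrow the log stream to entries that usually matter when troubleshooting, a plain `grep` on the `level` field can be enough. The following is a minimal sketch, assuming that the agent writes each log entry as a single-line JSON object and that a POSIX shell is available. `filter_agent_problems` is a hypothetical helper name, not part of kubectl or the agent.

```shell
# Hypothetical helper: keep only log entries whose level is warn or error.
# Assumes each log entry is a single-line JSON object read from stdin.
filter_agent_problems() {
  grep -E '"level":[[:space:]]*"(warn|error)"'
}

# Usage (requires access to the cluster):
#   kubectl logs -l=app=gitlab-agent -n gitlab-agent | filter_agent_problems
```

Because the filter reads from standard input, it can also be applied to a saved log file.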

If you are a JiHu GitLab administrator, you can also view the JiHu GitLab agent server logs.

Transport: Error while dialing failed to WebSocket dial

{
  "level": "warn",
  "time": "2020-11-04T10:14:39.368Z",
  "msg": "GetConfiguration failed",
  "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://gitlab-kas:443/-/kubernetes-agent\\\": dial tcp: lookup gitlab-kas on 10.60.0.10:53: no such host\""
}

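This error occurs when there are connectivity issues between the kas-address and your agent pod. In the log entry above, the cluster DNS server cannot resolve the host name gitlab-kas. To fix this issue, make sure the kas-address is accurate.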
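One way to confirm a DNS problem like the one in the log entry above is to resolve the host from inside the cluster. The sketch below is an illustration under assumptions: `kas_host` and `check_kas_dns` are hypothetical helper names, and the throwaway pod name and the `busybox` image are placeholders for any image that ships `nslookup`.

```shell
# Hypothetical helper: print the host part of a kas-address or URL.
# Does not handle bracketed IPv6 literals.
kas_host() {
  rest="${1#*://}"              # drop the scheme, for example wss://
  rest="${rest%%/*}"            # drop the path
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Hypothetical helper: resolve the host from a throwaway pod in the agent namespace.
# The pod name and the busybox image are assumptions.
check_kas_dns() {
  kubectl run kas-dns-check --rm -i --restart=Never \
    --image=busybox -n gitlab-agent -- nslookup "$(kas_host "$1")"
}

# Usage (requires access to the cluster):
#   check_kas_dns wss://gitlab-kas:443/-/kubernetes-agent/
```

If the lookup fails in the pod but succeeds on your workstation, the address is probably only resolvable outside the cluster.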

{
  "level": "error",
  "time": "2021-06-25T21:15:45.335Z",
  "msg": "Reverse tunnel",
  "mod_name": "reverse_tunnel",
  "error": "Connect(): rpc error: code = Unavailable desc = connection error: desc= \"transport: Error while dialing failed to WebSocket dial: expected handshake response status code 101 but got 301\""
}

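This error occurs when the kas-address doesn't include a trailing slash. To fix this issue, make sure that the wss or ws URL ends with a trailing slash, like wss://GitLab.host.tld:443/-/kubernetes-agent/ or ws://GitLab.host.tld:80/-/kubernetes-agent/.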
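A quick check for the trailing slash can be scripted before the address is written into the agent manifest. This is a minimal sketch with a hypothetical helper name, `check_kas_address`. It assumes that an address without any path, such as wss://kas.host.tld:443, is acceptable and only flags ws or wss addresses whose path does not end with a slash.

```shell
# Hypothetical helper: flag ws/wss addresses whose path lacks a trailing slash.
check_kas_address() {
  case "$1" in
    ws://*|wss://*) ;;
    *) echo "not a ws/wss URL"; return 0 ;;
  esac
  rest="${1#*://}"   # host[:port][/path]
  case "$rest" in
    */)  echo "ok" ;;                                # path ends with a slash
    */*) echo "missing trailing slash"; return 1 ;;  # path without a trailing slash
    *)   echo "ok" ;;                                # no path at all
  esac
}

# Usage:
#   check_kas_address wss://GitLab.host.tld:443/-/kubernetes-agent
```

The function returns a non-zero status only for the missing-slash case, so it can be used as a guard in a deployment script.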

ValidationError(Deployment.metadata)

{
  "level": "info",
  "time": "2020-10-30T08:56:54.329Z",
  "msg": "Synced",
  "project_id": "root/kas-manifest001",
  "resource_key": "apps/Deployment/kas-test001/nginx-deployment",
  "sync_result": "error validating data: [ValidationError(Deployment.metadata): unknown field \"replicas\" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta, ValidationError(Deployment.metadata): unknown field \"selector\" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta, ValidationError(Deployment.metadata): unknown field \"template\" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta]"
}

This error occurs when a manifest file is malformed and Kubernetes can't create the specified objects. Make sure that your manifest files are valid.

For additional troubleshooting, try to use the manifest files to create objects in Kubernetes directly.
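In the log entry above, Kubernetes rejects `replicas`, `selector`, and `template` because they appear under `metadata` instead of `spec`. The sketch below writes a corrected example manifest and validates it against the API server without persisting anything. The helper names, the `app: nginx` label, and the `nginx` image are illustrative assumptions; only the object name and namespace are taken from the `resource_key` in the log.

```shell
# Hypothetical helper: write a corrected example Deployment to the given path.
write_example_manifest() {
  cat > "$1" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: kas-test001
spec:                    # replicas, selector, and template belong here
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
EOF
}

# Hypothetical helper: ask the API server to validate a manifest without applying it.
validate_manifest() {
  kubectl apply --dry-run=server -f "$1"
}

# Usage (requires access to the cluster):
#   write_example_manifest nginx-deployment.yaml
#   validate_manifest nginx-deployment.yaml
```

A server-side dry run reports the same validation errors that the agent would encounter, which makes it easier to separate manifest problems from agent problems.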

Error while dialing failed to WebSocket dial: failed to send handshake request

{
  "level": "warn",
  "time": "2020-10-30T09:50:51.173Z",
  "msg": "GetConfiguration failed",
  "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://GitLabhost.tld:443/-/kubernetes-agent\\\": net/http: HTTP/1.x transport connection broken: malformed HTTP response \\\"\\\\x00\\\\x00\\\\x06\\\\x04\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x05\\\\x00\\\\x00@\\\\x00\\\"\""
}

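This error occurs when you configured wss as the kas-address scheme on the agent side, but the agent server is not available at wss. To fix this issue, make sure the same scheme is configured on both sides.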

Decompressor is not installed for grpc-encoding

{
  "level": "warn",
  "time": "2020-11-05T05:25:46.916Z",
  "msg": "GetConfiguration.Recv failed",
  "error": "rpc error: code = Unimplemented desc = grpc: Decompressor is not installed for grpc-encoding \"gzip\""
}

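This error occurs when the version of the agent is newer than the version of the agent server (KAS). To fix it, make sure that agentk and the agent server are the same version.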
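To compare versions, you can read the agentk image tag from the running pod and compare it with the agent server version that your administrator reports. This is a rough sketch with hypothetical helper names. It assumes the image reference ends in `:<tag>` and that the `app=gitlab-agent` label from the log command at the top of this page selects the agent pod.

```shell
# Hypothetical helper: print the tag of an image reference such as repo/agentk:v1.2.3.
# Assumes the reference ends in :<tag>; untagged or digest references are not handled.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical helper: print the image of the first container in the first agent pod.
agentk_image() {
  kubectl get pods -n gitlab-agent -l app=gitlab-agent \
    -o jsonpath='{.items[0].spec.containers[0].image}'
}

# Hypothetical helper: succeed if two versions are equal, ignoring a leading "v".
versions_match() {
  [ "${1#v}" = "${2#v}" ]
}

# Usage (requires access to the cluster; replace <kas-version> with the server version):
#   versions_match "$(image_tag "$(agentk_image)")" "<kas-version>" || echo "version mismatch"
```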

Certificate signed by unknown authority

{
  "level": "error",
  "time": "2021-02-25T07:22:37.158Z",
  "msg": "Reverse tunnel",
  "mod_name": "reverse_tunnel",
  "error": "Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://GitLabhost.tld:443/-/kubernetes-agent/\\\": x509: certificate signed by unknown authority\""
}

This error occurs when your instance uses a certificate signed by an internal certificate authority that is unknown to the agent.

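To fix this issue, you can present the CA certificate file to the agent by using a Kubernetes configmap, and mount the file in the agent's /etc/ssl/certs directory, from where it is picked up automatically.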

For example, if your internal CA certificate is myCA.pem:

kubectl -n gitlab-agent create configmap ca-pemstore --from-file=myCA.pem

Then in resources.yml:

    spec:
      serviceAccountName: gitlab-agent
      containers:
      - name: agent
        image: "registry.gitlab.com/gitlab-org/cluster-integration/gitlab-agent/agentk:<version>"
        args:
        - --token-file=/config/token
        - --kas-address
        - wss://kas.host.tld:443 # replace this line with the line below if using Omnibus GitLab or GitLab.com.
        # - wss://gitlab.host.tld:443/-/kubernetes-agent/
        # - wss://kas.gitlab.com # for GitLab.com users, use this KAS.
        # - grpc://host.docker.internal:8150 # use this attribute when connecting from Docker.
        volumeMounts:
        - name: token-volume
          mountPath: /config
        - name: ca-pemstore-volume
          mountPath: /etc/ssl/certs/myCA.pem
          subPath: myCA.pem
      volumes:
      - name: token-volume
        secret:
          secretName: gitlab-agent-token
      - name: ca-pemstore-volume
        configMap:
          name: ca-pemstore
          items:
          - key: myCA.pem
            path: myCA.pem

Alternatively, you can mount the certificate file at a different location and specify it with the --ca-cert-file agent parameter:

      containers:
      - name: agent
        image: "registry.gitlab.com/gitlab-org/cluster-integration/gitlab-agent/agentk:<version>"
        args:
        - --ca-cert-file=/tmp/myCA.pem
        - --token-file=/config/token
        - --kas-address
        - wss://kas.host.tld:443 # replace this line with the line below if using Omnibus GitLab or GitLab.com.
        # - wss://gitlab.host.tld:443/-/kubernetes-agent/
        # - wss://kas.gitlab.com # for GitLab.com users, use this KAS.
        # - grpc://host.docker.internal:8150 # use this attribute when connecting from Docker.
        volumeMounts:
        - name: token-volume
          mountPath: /config
        - name: ca-pemstore-volume
          mountPath: /tmp/myCA.pem
          subPath: myCA.pem
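With either mounting approach, it can help to confirm from your workstation that the CA file is a PEM certificate and that it validates the server's certificate chain. The following is a sketch under assumptions: the helper names are hypothetical, `openssl` must be installed locally, and the host must be reachable from where you run the check.

```shell
# Hypothetical helper: succeed if the file contains at least one PEM certificate block.
is_pem_certificate() {
  grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Hypothetical helper: attempt a TLS handshake, verifying the server against the CA file.
# Fails if the chain cannot be verified with that file.
check_tls_with_ca() {
  openssl s_client -connect "$1" -CAfile "$2" -verify_return_error </dev/null
}

# Usage (requires network access to the instance):
#   is_pem_certificate myCA.pem && check_tls_with_ca GitLabhost.tld:443 myCA.pem
```

If the handshake fails locally with the same CA file, the problem is likely the certificate itself rather than how it is mounted in the pod.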

Project not found

{
  "level": "error",
  "time": "2022-01-05T15:18:11.331Z",
  "msg": "GetObjectsToSynchronize.Recv failed",
  "mod_name": "gitops",
  "error": "rpc error: code = NotFound desc = project not found"
}

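This error occurs when the project where you keep your manifests is not public. To fix it, make sure your project is public or your manifest files are stored in the repository where the agent is configured.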

Failed to perform vulnerability scan on workload: jobs.batch already exists

{
  "level": "error",
  "time": "2022-06-22T21:03:04.769Z",
  "msg": "Failed to perform vulnerability scan on workload",
  "mod_name": "starboard_vulnerability",
  "error": "running scan job: creating job: jobs.batch \"scan-vulnerabilityreport-b8d497769\" already exists"
}

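The JiHu GitLab agent performs vulnerability scans by creating a job to scan each workload. If a scan is interrupted, these jobs may be left behind and need to be cleaned up before more jobs can run. You can clean up these jobs by running: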

kubectl delete jobs -l app.kubernetes.io/managed-by=starboard -n gitlab-agent
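Before deleting, you might want to see which jobs the selector matches. The sketch below wraps the same label and namespace as the command above in two hypothetical helpers: one lists the matching jobs, and the other previews the deletion with a client-side dry run.

```shell
# Label taken from the cleanup command above.
SCAN_JOB_SELECTOR='app.kubernetes.io/managed-by=starboard'

# Hypothetical helper: list leftover scan jobs in the agent namespace.
list_scan_jobs() {
  kubectl get jobs -l "$SCAN_JOB_SELECTOR" -n gitlab-agent
}

# Hypothetical helper: show what the cleanup would delete, without deleting anything.
preview_scan_job_cleanup() {
  kubectl delete jobs -l "$SCAN_JOB_SELECTOR" -n gitlab-agent --dry-run=client
}

# Usage (requires access to the cluster):
#   list_scan_jobs
#   preview_scan_job_cleanup
```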