Kube prometheus


Last updated 1 year ago


prometheus

A tool for monitoring server metrics.

It is usually used together with Grafana to monitor server metrics.

what is kube-prometheus

A tool for monitoring metrics in k8s.

One installation per Kubernetes cluster is recommended, because service discovery has to be able to reach the pods inside the cluster. Monitoring targets outside the cluster is possible, but there seems to be little reason to do so.

Let's use this only for monitoring the k8s cluster itself.

By default the data appears to be kept only in memory, so extra work is needed if you want to persist state.

Installing on the cluster

A utility called jb is required.

brew install jsonnet-bundler # for jb command
cd ~/Desktop
git clone https://github.com/prometheus-operator/kube-prometheus.git

cd kube-prometheus

I use argocd to manage my k8s configuration, so I want to keep this in the argocd git repo.

Copy three files into the argocd repo.

cp build.sh ~/Desktop/argocd-repo/core/prometheus
cp jsonnetfile.json ~/Desktop/argocd-repo/core/prometheus
cp cluster.jsonnet ~/Desktop/argocd-repo/core/prometheus/cluster.jsonnet
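Before building, it can help to confirm that the three build inputs actually landed in the target directory. The following is only a minimal sketch: the helper name check_build_inputs is hypothetical, and the file list (build.sh, jsonnetfile.json, cluster.jsonnet) is taken from the build steps in this section.

```shell
#!/bin/sh
# check_build_inputs DIR: report which of the three build inputs are missing.
# Hypothetical helper; it returns 0 only when all three files exist.
check_build_inputs() {
  dir=$1
  missing=0
  for f in build.sh jsonnetfile.json cluster.jsonnet; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_build_inputs ~/Desktop/argocd-repo/core/prometheus
```

Running it right after the copy step gives a quick signal before jb install and the docker build are attempted.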

Let's build.

cd ~/Desktop/argocd-repo/core/prometheus

In jsonnetfile.json, change the version to the one you want. The versions you can use depend on your k8s version.

Check the compatibility matrix here: https://github.com/prometheus-operator/kube-prometheus#compatibility

Build.

jb install # creates the vendor folder
docker run --rm -v $(pwd):$(pwd) --workdir $(pwd) quay.io/coreos/jsonnet-ci ./build.sh cluster.jsonnet

A manifests folder is created. Put it in the argocd repo and register it as an app in argocd, and you can watch it deploy.

Rename cluster.jsonnet to suit your own environment.

Editing this file and rebuilding regenerates the manifests accordingly.

Verify

You can check with port forwarding.

kubectl -n monitoring port-forward svc/prometheus-k8s 9090
kubectl -n monitoring port-forward svc/alertmanager-main 9093
kubectl -n monitoring port-forward svc/grafana 3000

prometheus: http://localhost:9090

alertmanager: http://localhost:9093

grafana: http://localhost:3000/login

customize manifest

Edit cluster.jsonnet to change the generated manifests.

The grafana/prometheus/alertmanager services are currently ClusterIP; let's change them to NodePort.

You only need to uncomment the kube-prometheus/addons/node-ports.libsonnet line.

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  (import 'kube-prometheus/addons/node-ports.libsonnet') +
  {

Rebuild and commit.

docker run --rm -v $(pwd):$(pwd) --workdir $(pwd) quay.io/coreos/jsonnet-ci ./build.sh cluster.jsonnet

After committing, check the services and you will see that the service type has changed to NodePort.

Now comment the line out again, then rebuild, commit, and push, and you will see everything return to the original state.

Making it reachable on a public domain through ingress

Edit cluster.jsonnet to follow the ingress example (https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/ingress.jsonnet) and change the domain.

Using basic auth

You can see that the example references a file named auth.

Create it as follows.

USER=admin; PASSWORD=XXXXX; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" > auth

A file named auth has been created. Copy it into the same directory as the cluster.jsonnet file.
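Since the auth file ends up embedded in the generated manifests, a quick format check before building may save a round trip. This is a rough sketch only: the helper name auth_file_ok is hypothetical, and it assumes every line should be an apr1 entry of the form user:$apr1$salt$hash, which is what the openssl command above produces.

```shell
#!/bin/sh
# auth_file_ok FILE: succeed only if FILE is non-empty and every line
# looks like "user:$apr1$salt$hash". It checks the shape, not the password.
auth_file_ok() {
  [ -s "$1" ] && ! grep -Evq '^[^:]+:\$apr1\$[^$]+\$[^$]+$' "$1"
}

# Example: auth_file_ok auth && echo "auth file looks fine"
```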

When you open the domain, a basic auth login prompt appears; enter the ID and password you created and you are logged in.

After doing this, though, grafana ends up asking for an ID/password twice.

It looks fine to leave grafana out of basic auth, so I changed the grafana entry. Instead of using the ingress function, the Ingress object is written out directly.

grafana: {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'Ingress',
  metadata: {
    name: 'grafana',
    namespace: $.values.common.namespace,
  },
  spec: {
    rules: [{
      host: 'grafana.c3',
      http: {
        paths: [{
          path: '/',
          pathType: 'Prefix',
          backend: {
            service: {
              name: 'grafana',
              port: {
                name: 'http',
              },
            },
          },
        }],
      },
    }],
  },
},

etcd monitoring

brew install cfssl # install cfssl

cd core/kube-prometheus

scp master1-eqix-sv5:/etc/ssl/etcd/ssl/ca.pem etcd/
scp master1-eqix-sv5:/etc/ssl/etcd/ssl/ca-key.pem etcd/

chmod 755 etcd/*.pem

vi etcd/client.json

cat <<EOF > etcd/client.json
{
  "CN": "etcd-ca",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{}]
}
EOF
# Generate the client certificate (run from core/kube-prometheus; all files live in etcd/)
cfssl gencert -ca etcd/ca.pem -ca-key etcd/ca-key.pem etcd/client.json | cfssljson -bare etcd/etcd-client

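jsonnet configuration

See the etcd example for reference: https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/etcd.jsonnet

List every etcd IP you use; the server name can be left blank. insecureSkipVerify should be false (note that the snippet below still has it set to true).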

etcd+: {
  ips: ['172.16.3.11', '172.16.3.12', '172.16.3.13'],
  clientCA: importstr 'etcd/ca.pem',
  clientKey: importstr 'etcd/etcd-client-key.pem',
  clientCert: importstr 'etcd/etcd-client.pem',
  // serverName: 'etcd.kube-system.svc.cluster.local',
  serverName: '',
  insecureSkipVerify: true,
},

Update the manifests.

docker run --rm -v $(pwd):$(pwd) --workdir $(pwd) quay.io/coreos/jsonnet-ci ./build.sh c4.jsonnet

Open the prometheus web UI and search for etcd_cluster_version; if results come back, it is working.

Preventing two instances from running on the same node

Right now two alertmanager-main instances are running on node05. Let's spread them across different nodes.

Following the anti-affinity example (https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/anti-affinity.jsonnet), I only had to uncomment one line.

I confirmed that they are now deployed on different nodes.

alert

I want to receive alerts in Slack.

First, create a Slack channel.

Test that messages are actually delivered.

mkdir alertmanager

cat <<'EOF' > alertmanager/config.yaml
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/T/B01P/ddd'
route:
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#kube'
        send_resolved: true
        icon_url: 'https://avatars3.githubusercontent.com/u/3380462'
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Monitoring Event Notification'
        text: |-
          {{ range .Alerts }}
            *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.description }}
            *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
            *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
EOF
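Before relying on Alertmanager, the webhook itself can be exercised by hand. The sketch below only builds the JSON body; the helper name slack_payload and the SLACK_WEBHOOK_URL variable are hypothetical, and it assumes a standard Slack incoming webhook that accepts a JSON object with a text field.

```shell
#!/bin/sh
# slack_payload TEXT: print a minimal incoming-webhook JSON body.
# Only backslashes and double quotes are escaped; multi-line text is not handled.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}\n' "$escaped"
}

# Hypothetical usage; SLACK_WEBHOOK_URL stands for your own webhook URL.
# curl -X POST -H 'Content-type: application/json' \
#   --data "$(slack_payload 'test from kube-prometheus')" "$SLACK_WEBHOOK_URL"
```

If the test message shows up in the channel, any later delivery problem is more likely in the Alertmanager configuration than in the webhook.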

Fill in the webhook URL and adjust the rest as needed.

Edit the jsonnet file.

values+:: {
  ...

  // Change Alertmanager configuration
  alertmanager+: {
    config: importstr 'alertmanager/config.yaml',
  },

Compile and push.

When something goes wrong, the alert arrives in Slack as expected.

KubeSchedulerDown-alert

The KubeSchedulerDown alert keeps firing.

values+:: {
  common+: {
    namespace: 'monitoring',
    platform: 'kubespray',
  },

Adding this is said to make the error go away (https://github.com/prometheus-operator/kube-prometheus/issues/1165), so let's try it.

It goes away.

CPUThrottlingHigh-alert

The CPUThrottlingHigh alert keeps firing. It says that node-exporter's CPU is being throttled heavily.

Let's look at the details. The manifest file contains the following.

    - alert: CPUThrottlingHigh
      annotations:
        description: '{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.'
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/cputhrottlinghigh
        summary: Processes experience elevated CPU throttling.
      expr: |
        sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace)
          /
        sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)
          > ( 25 / 100 )
      for: 15m
      labels:
        severity: info

The alert is sent when the throttled ratio goes above 25%. Let's think about how to deal with it.

  1. If going above 25% is acceptable, for example if you do not want alerts until 50%, wouldn't it be enough to change 25 to 50?

  2. Shouldn't the pod be given more resources?
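To get a feel for what the threshold in the expression means, the ratio can be reproduced by hand. This is an illustrative sketch with hypothetical helper names (throttle_pct, would_alert); it mirrors only the division and the comparison from the rule and ignores the for: 15m condition.

```shell
#!/bin/sh
# throttle_pct THROTTLED TOTAL: percentage of CFS periods that were throttled.
throttle_pct() {
  awk -v t="$1" -v n="$2" \
    'BEGIN { if (n == 0) { print "0.0" } else { printf "%.1f\n", 100 * t / n } }'
}

# would_alert THROTTLED TOTAL THRESHOLD_PCT: succeed when the ratio is above the threshold.
would_alert() {
  awk -v t="$1" -v n="$2" -v th="$3" \
    'BEGIN { exit !(n > 0 && 100 * t / n > th) }'
}

# Example: 30 throttled periods out of 100 in the window.
# throttle_pct 30 100
# would_alert 30 100 25 && echo "above a 25% threshold"
# would_alert 30 100 50 || echo "below a 50% threshold"
```

With these example numbers the same container trips a 25% threshold but not a 50% one, which is the trade-off behind option 1.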

The following part of node-exporter-daemonset.yaml has to be changed.

resources:
  limits:
    cpu: 250m
    memory: 180Mi
  requests:
    cpu: 102m
    memory: 180Mi

For now, set the cpu request to 250m and see whether the alert still comes. (CFS throttling is governed by the CPU limit, so the limit may need raising as well.)

The percentage did go down compared with before.

The alert still fires whenever it goes above 25, so change the threshold to 50 and test.

The alerts decreased.

Now these numbers have to be changed at compile time.

values+:: {
  ...
  kubernetesControlPlane+: {
    mixin+: {
      _config+: {
        cpuThrottlingPercent: 60,
      },
    },
  },

}

Do this, then compile and push.

api error burn rate

This alert fired, and when I checked, the node was logging the following error.

Search Line limits were exceeded, some search paths have been omitted, the applied search line

/etc/resolv.conf had several entries on its search line. After removing all of them, both the error and the alert disappeared.
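A quick way to see how many search domains a node carries is to count them. The sketch below uses a hypothetical helper, count_search_domains. It assumes the long-standing resolver limit of 6 search domains and that the kubelet adds the cluster's own search domains on top of the node's; both may differ by version, so treat the number only as a hint.

```shell
#!/bin/sh
# count_search_domains FILE: number of domains on the last "search" line of FILE.
# Prints 0 when the file has no search line.
count_search_domains() {
  awk '$1 == "search" { n = NF - 1 } END { print n + 0 }' "$1"
}

# Example: count_search_domains /etc/resolv.conf
```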

grafana customize

Add the prom/loki DataSources by default, and set the id/password.

If no datasource is defined at all, the prometheus datasource is added automatically. But once loki is added as below, the prometheus datasource no longer seems to be created automatically, so I added it explicitly as shown below.

For the URLs below, the service name alone is enough when the service is in the same namespace. From a different namespace, use servicename.namespace.svc.cluster.local, or servicename.namespace.svc.

grafana+:: {
  // Add DataSource
  datasources+: [
    {
      name: 'prometheus',
      type: 'prometheus',
      access: 'proxy',
      orgId: 1,
      url: 'http://prometheus-k8s:9090',
      editable: false,
    },
    {
      name: 'loki',
      type: 'loki',
      access: 'proxy',
      orgId: 2,
      url: 'http://core-loki-stack:3100',
      editable: false,
    },
  ],

  config+: {
    sections+: {
      'security': {
        admin_user: 'admin',
        admin_password: 'yourpassword'
      },

      server+: {
        root_url: 'https://grafana.yourdomain/',
      },
    },
  },
},
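The datasource URLs above rely on Kubernetes service DNS names. As a small illustration, the helper below builds either form; svc_url is a hypothetical name, the logging namespace in the example is made up, and cluster.local is assumed to be the cluster domain.

```shell
#!/bin/sh
# svc_url NAME PORT [NAMESPACE]: build an in-cluster HTTP URL for a service.
# Without NAMESPACE the short name is used, which only resolves within the same namespace.
svc_url() {
  name=$1
  port=$2
  ns=${3:-}
  if [ -n "$ns" ]; then
    printf 'http://%s.%s.svc.cluster.local:%s\n' "$name" "$ns" "$port"
  else
    printf 'http://%s:%s\n' "$name" "$port"
  fi
}

# Example:
# svc_url prometheus-k8s 9090
# svc_url core-loki-stack 3100 logging
```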

Monitoring ingress-nginx

The getting-started guide (https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) describes ServiceMonitor. Creating one makes a service monitorable in Kubernetes.

ingress-nginx already ships one, though, so you only need to enable it.

- name: controller.metrics.enabled
  value: 'true'
- name: controller.metrics.serviceMonitor.enabled
  value: 'true'

That is all. Prometheus collects the data and grafana displays it.

In grafana, import the JSON from the following address to view the dashboard: https://github.com/kubernetes/ingress-nginx/tree/main/deploy/grafana/dashboards

It can differ between versions, so use the tag that matches yours.

You can see that it has been added to the targets.

The prometheusRule section of the helm chart lets you add alert rules.

prometheusRule:
  enabled: true
  additionalLabels: {}
  # namespace: ""
  rules:
    # []
    # # These are just examples rules, please adapt them to your needs
    - alert: NGINXConfigFailed
      expr: count(nginx_ingress_controller_config_last_reload_successful == 0) > 0
      for: 1s
      labels:
        severity: critical
      annotations:
        description: bad ingress config - nginx config test failed
        summary: uninstall the latest ingress changes to allow config reloads to resume
    - alert: NGINXCertificateExpiry
      expr: (avg(nginx_ingress_controller_ssl_expire_time_seconds) by (host) - time()) < 604800
      for: 1s
      labels:
        severity: critical
      annotations:
        description: ssl certificate(s) will expire in less then a week
        summary: renew expiring certificates to avoid downtime
    - alert: NGINXTooMany500s
      expr: 100 * ( sum( nginx_ingress_controller_requests{status=~"5.+"} ) / sum(nginx_ingress_controller_requests) ) > 5
      for: 1m
      labels:
        severity: warning
      annotations:
        description: Too many 5XXs
        summary: More than 5% of all requests returned 5XX, this requires your attention
    - alert: NGINXTooMany400s
      expr: 100 * ( sum( nginx_ingress_controller_requests{status=~"4.+"} ) / sum(nginx_ingress_controller_requests) ) > 5
      for: 1m
      labels:
        severity: warning
      annotations:
        description: Too many 4XXs
        summary: More than 5% of all requests returned 4XX, this requires your attention

Adding a serviceMonitor

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
    - name: web
      port: 8080

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web

upgrade

Update the version number in jsonnetfile.json.

"version": "release-0.10"

Build.

jb update
docker run --rm -v $(pwd):$(pwd) --workdir $(pwd) quay.io/coreos/jsonnet-ci ./build.sh cluster.jsonnet
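The version bump can also be scripted. The following is a sketch under stated assumptions: the helper names kp_version and set_kp_version are hypothetical, jsonnetfile.json is assumed to use standard JSON double quotes, and every string-valued "version" field is rewritten, so it only suits a file with a single dependency. The new version must not contain a slash.

```shell
#!/bin/sh
# kp_version FILE: print the string-valued "version" fields (the dependency versions).
# The numeric top-level "version": 1 is not matched.
kp_version() {
  sed -n 's/.*"version": *"\([^"]*\)".*/\1/p' "$1"
}

# set_kp_version FILE NEW: rewrite those fields through a temp file,
# which avoids the differing "sed -i" syntax across platforms.
set_kp_version() {
  tmp=$(mktemp) || return 1
  sed 's/\("version": *"\)[^"]*"/\1'"$2"'"/' "$1" > "$tmp" &&
    cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example:
# kp_version jsonnetfile.json
# set_kp_version jsonnetfile.json release-0.10 && jb update
```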

Removing the limit settings from resources

This is needed in special cases.

(import 'kube-prometheus/addons/strip-limits.libsonnet') +

Add it and build. The result is as follows.

You can see that the limits have been removed.
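One way to double-check the result is to search the generated manifests for any remaining limits key. This is a rough sketch: files_with_limits is a hypothetical helper, and a file that still shows up is not necessarily a problem, since the addon may not cover every container.

```shell
#!/bin/sh
# files_with_limits DIR: list files under DIR that still contain a "limits:" key.
# Prints nothing when no file matches.
files_with_limits() {
  grep -rlE '^[[:space:]]*limits:' "$1" | sort
}

# Example: files_with_limits manifests
```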

Adding the nginx grafana dashboard

Create a grafana folder and add the files to it.

Download the two JSON files from https://github.com/kubernetes/ingress-nginx/tree/controller-v1.1.0/deploy/grafana/dashboards and save them. I use ingress-nginx 1.1.0; use the version that matches yours.

Add the following code to cluster.jsonnet.

grafana+:: {
  dashboards+: {
    'nginx.json': (import 'grafana/nginx.json'),
    'nginx-request-handling-performance.json': (import 'grafana/request-handling-performance.json'),
  },
},

After building and applying, it looks like the following.

Done.

Changing the node exporter listen address

By default node exporter is configured to listen only on 127.0.0.1. Changing this to 0.0.0.0 makes the metrics reachable from outside the cluster.

So it has to be changed.

values+:: {
  ....
  // Add this part.
  nodeExporter+: {
    listenAddress: '0.0.0.0',
  },

In short, if you open vendor/kube-prometheus/main.libsonnet, you will see the following.

This is where node exporter is configured.

You can also see that it references vendor/kube-prometheus/component/node-exporter.libsonnet.

Open that file.

The variable is set like this and used later.

So adding the code above to cluster.jsonnet pushes the value into the configuration and overrides the default.

Notes when using kubespray

If you installed Kubernetes with kubespray, add the following.

values+:: {
  common+: {
    namespace: 'monitoring',
    platform: 'kubespray'
  },

For the available platform types, see: https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizations/platform-specific.md

Configure the webhook settings.

https://github.com/prometheus
https://github.com/prometheus-operator/kube-prometheus
https://github.com/prometheus-operator/kube-prometheus#compatibility
http://localhost:9090
http://localhost:9093
http://localhost:3000/login
https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/ingress.jsonnet
https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/etcd.jsonnet
https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/anti-affinity.jsonnet
https://api.slack.com/messaging/webhooks
https://prometheus.io/docs/alerting/latest/notification_examples/
https://github.com/prometheus-operator/kube-prometheus/issues/1165
https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
https://github.com/kubernetes/ingress-nginx/tree/main/deploy/grafana/dashboards
https://github.com/kubernetes/ingress-nginx/tree/controller-v1.1.0/deploy/grafana/dashboards
https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizations/platform-specific.md