diff -y eks-oneclick3.yaml eks-oneclick4.yaml
The differences between the week-4 one-click template and the week-5 one:
- the ebs-csi-driver is not installed
- t3 instance types are used
- X-Ray is set to true
As always, set up the lab environment.
- Amazon EKS (myeks) one-click deployment & basic configuration
- Basic configuration
```bash
# apply the default namespace
kubectl ns default

# (option) rename the context
NICK=<your nickname>
NICK=gasida
kubectl ctx
kubectl config rename-context admin@myeks.ap-northeast-2.eksctl.io $NICK@myeks

# ExternalDNS
MyDomain=<your domain>
echo "export MyDomain=<your domain>" >> /etc/profile
MyDomain=gasida.link
echo "export MyDomain=gasida.link" >> /etc/profile
MyDnzHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text)
echo $MyDomain, $MyDnzHostedZoneId
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/aews/externaldns.yaml
MyDomain=$MyDomain MyDnzHostedZoneId=$MyDnzHostedZoneId envsubst < externaldns.yaml | kubectl apply -f -

# kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl patch svc -n kube-system kube-ops-view -p '{"spec":{"type":"LoadBalancer"}}'
kubectl annotate service kube-ops-view -n kube-system "external-dns.alpha.kubernetes.io/hostname=kubeopsview.$MyDomain"
echo -e "Kube Ops View URL = http://kubeopsview.$MyDomain:8080/#scale=1.5"

# AWS LB Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller

# check the node security group ID and allow ingress from the working EC2
NGSGID=$(aws ec2 describe-security-groups --filters Name=group-name,Values='*ng1*' --query "SecurityGroups[*].[GroupId]" --output text)
aws ec2 authorize-security-group-ingress --group-id $NGSGID --protocol '-1' --cidr 192.168.1.100/32
```
- Install Prometheus & Grafana (admin / prom-operator); recommended dashboards: 15757, 17900, 15172
```bash
# check the certificate ARN for the region in use
CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`
echo $CERT_ARN

# add the repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# create the values file
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 5d
    retentionSize: "10GiB"

  verticalPodAutoscaler:
    enabled: true

  ingress:
    enabled: true
    ingressClassName: alb
    hosts:
      - prometheus.$MyDomain
    paths:
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator

  ingress:
    enabled: true
    ingressClassName: alb
    hosts:
      - grafana.$MyDomain
    paths:
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

defaultRules:
  create: false
kubeControllerManager:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
alertmanager:
  enabled: false
EOT

# deploy
kubectl create ns monitoring
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 45.27.2 \
  --set prometheus.prometheusSpec.scrapeInterval='15s' --set prometheus.prometheusSpec.evaluationInterval='15s' \
  -f monitor-values.yaml --namespace monitoring

# deploy Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
- Install EKS Node Viewer: shows each node's allocatable capacity and the requested (request) resources, not actual pod resource usage - link
- It displays the scheduled pod resource requests vs the allocatable capacity on the node. It does not look at the actual pod resource usage.
```bash
# install go
yum install -y go

# install EKS Node Viewer : takes a while on the current EC2 spec (2+ minutes)
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest

# check the binary and run it
tree ~/go/bin
cd ~/go/bin
./eks-node-viewer
3 nodes (875m/5790m) 15.1% cpu ██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ $0.156/hour | $113.880/month
20 pods (0 pending 20 running 20 bound)

ip-192-168-3-196.ap-northeast-2.compute.internal cpu ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░ 22% (7 pods) t3.medium/$0.0520 On-Demand
ip-192-168-1-91.ap-northeast-2.compute.internal  cpu ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 12% (6 pods) t3.medium/$0.0520 On-Demand
ip-192-168-2-185.ap-northeast-2.compute.internal cpu ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 12% (7 pods) t3.medium/$0.0520 On-Demand
Press any key to quit

# command samples
# Standard usage
./eks-node-viewer
# Display both CPU and Memory Usage
./eks-node-viewer --resources cpu,memory
# Karpenter nodes only
./eks-node-viewer --node-selector "karpenter.sh/provisioner-name"
# Display extra labels, i.e. AZ
./eks-node-viewer --extra-labels topology.kubernetes.io/zone
# Specify a particular AWS profile and region
AWS_PROFILE=myprofile AWS_REGION=us-west-2

# default options
# select only Karpenter managed nodes
node-selector=karpenter.sh/provisioner-name
# display both CPU and memory
resources=cpu,memory
```
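A small convenience not in the original steps: putting ~/go/bin on PATH so eks-node-viewer runs from any directory. A sketch, assuming the same root shell and /etc/profile used above:

```bash
# optional sketch: make eks-node-viewer callable without cd-ing into ~/go/bin
export PATH=$PATH:~/go/bin
echo 'export PATH=$PATH:~/go/bin' >> /etc/profile
eks-node-viewer --resources cpu,memory
```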
An overview of HPA, VPA, and CA:
HPA - scale in/out (pod replicas); KEDA extends this to event-driven metrics
VPA - scale up/down (rarely used in practice)
CA - node scale out/in
Starting the HPA hands-on.
# Run and expose php-apache server
curl -s -O https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/php-apache.yaml
cat php-apache.yaml | yh
kubectl apply -f php-apache.yaml
# check
kubectl exec -it deploy/php-apache -- cat /var/www/html/index.php
...
# monitoring : use two terminals
watch -d 'kubectl get hpa,pod;echo;kubectl top pod;echo;kubectl top node'
kubectl exec -it deploy/php-apache -- top
# access
PODIP=$(kubectl get pod -l run=php-apache -o jsonpath={.items[0].status.podIP})
curl -s $PODIP; echo
The current node usage is as shown above.
# Create the HorizontalPodAutoscaler : requests.cpu=200m - [algorithm](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details)
# Since each pod requests 200 milli-cores by kubectl run, this means an average CPU usage of 100 milli-cores.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl describe hpa
...
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 0% (1m) / 50%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
...
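For reference, the algorithm linked above boils down to desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A quick sketch with assumed numbers (not taken from this cluster):

```bash
# HPA scaling math, illustrative values only
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
current_replicas=1
current_cpu_pct=200   # assumed: pods averaging 200% of their 200m CPU request
target_cpu_pct=50
echo $(( (current_replicas * current_cpu_pct + target_cpu_pct - 1) / target_cpu_pct ))   # prints 4
```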
# check the HPA settings
kubectl krew install neat
kubectl get hpa php-apache -o yaml
kubectl get hpa php-apache -o yaml | kubectl neat | yh
spec:
  minReplicas: 1    # [4] or shrink back down to a minimum of 1 replica
  maxReplicas: 10   # [3] scale the pods out to at most 10 replicas
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache          # [1] based on the resource usage of php-apache
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # [2] when average CPU utilization goes above 50%
# repeated access 1 (hit pod 1 by IP) >> stop after confirming the scale-out
while true;do curl -s $PODIP; sleep 0.5; done
# repeated access 2 (hit the service by its DNS name) >> confirm the increase (how many replicas, and why?) then stop >> about 5 minutes after stopping, confirm the replica count drops back down
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Usage is climbing.
As shown above, you can confirm the pods were autoscaled.
Next up: KEDA.
HPA scales in/out based on CPU and memory metrics, whereas KEDA decides whether to scale based on specific events.
The list of supported metric sources (scalers) is huge.
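Cron is the trigger used below, but the same ScaledObject shape works for any scaler. As an illustration only, a minimal sketch of a Prometheus trigger; the server address, metric name, query, and threshold here are assumptions, not part of this lab:

```bash
# sketch only: what a Prometheus-based ScaledObject for php-apache could look like (values assumed)
cat <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-prometheus-scaled
spec:
  scaleTargetRef:
    name: php-apache
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kube-prometheus-stack-prometheus.monitoring:9090  # assumed service name
      metricName: http_requests_total
      query: sum(rate(http_requests_total{deployment="php-apache"}[2m]))
      threshold: "100"
EOF
```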
- KEDA with Helm : pod autoscaling driven by specific events (cron, etc.) - Chart Grafana Cron
```bash
# install KEDA
cat <<EOT > keda-values.yaml
metricsServer:
  useHostNetwork: true

prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  operator:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  webhooks:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus webhooks
      enabled: true
EOT

kubectl create namespace keda
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --version 2.10.2 --namespace keda -f keda-values.yaml

# verify the KEDA install
kubectl get-all -n keda
kubectl get all -n keda
kubectl get crd | grep keda

# create a deployment in the keda namespace
kubectl apply -f php-apache.yaml -n keda
kubectl get pod -n keda

# create a ScaledObject policy : cron
cat <<EOT > keda-cron.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  minReplicaCount: 0
  maxReplicaCount: 2
  pollingInterval: 30
  cooldownPeriod: 300
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"
EOT
kubectl apply -f keda-cron.yaml -n keda

# add the Grafana dashboard

# monitoring
watch -d 'kubectl get ScaledObject,hpa,pod -n keda'
kubectl get ScaledObject -w

# check
kubectl get ScaledObject,hpa,pod -n keda
kubectl get hpa -o jsonpath={.items[0].spec} -n keda | jq
...
"metrics": [
    {
      "external": {
        "metric": {
          "name": "s0-cron-Asia-Seoul-00,15,30,45xxxx-05,20,35,50xxxx",
          "selector": {
            "matchLabels": {
              "scaledobject.keda.sh/name": "php-apache-cron-scaled"
            }
          }
        },
        "target": {
          "averageValue": "1",
          "type": "AverageValue"
        }
      },
      "type": "External"
    }

# delete KEDA, the deployment, and the namespace
kubectl delete -f keda-cron.yaml -n keda && kubectl delete deploy php-apache -n keda && helm uninstall keda -n keda
kubectl delete namespace keda
```
Install complete; a replica should be created at minute 30 and removed at minute 35.
The state at minute 29.
At minute 30, you can see the php-apache pod has been created.
Starting the CA (Cluster Autoscaler) hands-on.
HPA scales pods; CA scales nodes.
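CA finds node groups through ASG tags (k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>), which EKS managed node groups add to their ASGs automatically. A quick way to confirm them; the query shape below is my own sketch, not from the study material:

```bash
# sketch: list ASGs and any cluster-autoscaler discovery tags they carry
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[].{Name:AutoScalingGroupName,CATags:Tags[?starts_with(Key,'k8s.io/cluster-autoscaler')].Key}" \
  --output json
```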
# check the current autoscaling (ASG) info
# aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='<cluster-name>']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
  --output table
-----------------------------------------------------------------
| DescribeAutoScalingGroups |
+------------------------------------------------+----+----+----+
| eks-ng1-44c41109-daa3-134c-df0e-0f28c823cb47 | 3 | 3 | 3 |
+------------------------------------------------+----+----+----+
# raise MaxSize to 6
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 3 --desired-capacity 3 --max-size 6
# check
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
-----------------------------------------------------------------
| DescribeAutoScalingGroups |
+------------------------------------------------+----+----+----+
| eks-ng1-c2c41e26-6213-a429-9a58-02374389d5c3 | 3 | 6 | 3 |
+------------------------------------------------+----+----+----+
# Deploy the Cluster Autoscaler (CA)
curl -s -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
sed -i "s/<YOUR CLUSTER NAME>/$CLUSTER_NAME/g" cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml
# check
kubectl get pod -n kube-system | grep cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler
# (option) keep the worker node running the cluster-autoscaler pod from being evicted
kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"
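A quick check that the annotation landed where kubectl annotate puts it (on the Deployment's metadata); a sketch, the jsonpath escaping is the only notable part:

```bash
# verify the safe-to-evict annotation on the cluster-autoscaler Deployment
kubectl -n kube-system get deploy cluster-autoscaler \
  -o jsonpath='{.metadata.annotations.cluster-autoscaler\.kubernetes\.io/safe-to-evict}{"\n"}'
```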
You can see there are currently 3 EC2 instances.
# monitoring
kubectl get nodes -w
while true; do kubectl get node; echo "------------------------------" ; date ; sleep 1; done
while true; do aws ec2 describe-instances --query "Reservations[*].Instances[*].{PrivateIPAdd:PrivateIpAddress,InstanceName:Tags[?Key=='Name']|[0].Value,Status:State.Name}" --filters Name=instance-state-name,Values=running --output text ; echo "------------------------------"; date; sleep 1; done
# Deploy a Sample App
# We will deploy an sample nginx application as a ReplicaSet of 1 Pod
cat <<EoF> nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EoF
kubectl apply -f nginx.yaml
kubectl get deployment/nginx-to-scaleout
# Scale our ReplicaSet
# Let’s scale out the replicaset to 15
kubectl scale --replicas=15 deployment/nginx-to-scaleout && date
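Why this forces a scale-out, using the numbers eks-node-viewer showed earlier (875m used of 5790m allocatable across the 3 t3.medium nodes):

```bash
# rough capacity check, numbers taken from the eks-node-viewer output above
echo $(( 15 * 500 ))    # 7500m CPU requested by 15 replicas
echo $(( 5790 - 875 ))  # 4915m CPU still allocatable, so pods go Pending and CA has to add nodes
```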
# check
kubectl get pods -l app=nginx -o wide --watch
kubectl -n kube-system logs -f deployment/cluster-autoscaler
# confirm the nodes scale out automatically
kubectl get nodes
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
  --output table
./eks-node-viewer
42 pods (0 pending 42 running 42 bound)
ip-192-168-3-196.ap-northeast-2.compute.internal cpu ███████████████████████████████████ 100% (10 pods) t3.medium/$0.0520 On-Demand
ip-192-168-1-91.ap-northeast-2.compute.internal cpu ███████████████████████████████░░░░ 89% (9 pods) t3.medium/$0.0520 On-Demand
ip-192-168-2-185.ap-northeast-2.compute.internal cpu █████████████████████████████████░░ 95% (11 pods) t3.medium/$0.0520 On-Demand
ip-192-168-2-87.ap-northeast-2.compute.internal cpu █████████████████████████████░░░░░░ 84% (6 pods) t3.medium/$0.0520 On-Demand
ip-192-168-3-15.ap-northeast-2.compute.internal cpu █████████████████████████████░░░░░░ 84% (6 pods) t3.medium/$0.0520 On-Demand
# delete the deployment
kubectl delete -f nginx.yaml && date
# node scale-down : by default nodes scale down after 10 minutes; the flags below can change that >> so wait 10 minutes after deleting the deployment before checking!
# By default, cluster autoscaler will wait 10 minutes between scale down operations,
# you can adjust this using the --scale-down-delay-after-add, --scale-down-delay-after-delete,
# and --scale-down-delay-after-failure flag.
# E.g. --scale-down-delay-after-add=5m to decrease the scale down delay to 5 minutes after a node has been added.
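If you want a shorter wait for the lab, the flags above go on the cluster-autoscaler container args; a sketch only, the 5m value is illustrative:

```bash
# sketch: open the Deployment and append a scale-down flag to the container's command/args
kubectl -n kube-system edit deployment cluster-autoscaler
# ...then add, for example:
#   - --scale-down-delay-after-add=5m
```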
# terminal 1
watch -d kubectl get node
When you run the deployment above,
instances are created; in other words, the node group scales out.
Next up: CPA (Cluster Proportional Autoscaler).
It horizontally autoscales an application (container/pod) whose required capacity grows in proportion to the number of nodes.
#
helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
# the CPA rules must be configured and the helm chart released
helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler
# deploy the nginx deployment
cat <<EOT > cpa-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            cpu: "100m"
            memory: "64Mi"
          requests:
            cpu: "100m"
            memory: "64Mi"
        ports:
        - containerPort: 80
EOT
kubectl apply -f cpa-nginx.yaml
# configure the CPA rules
cat <<EOF > cpa-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [1, 1]
      - [2, 2]
      - [3, 3]
      - [4, 3]
      - [5, 5]
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF
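Besides the ladder mode used here, cluster-proportional-autoscaler also supports a linear mode that derives the replica count from cores and node counts. A minimal sketch for comparison only; the numbers are illustrative, not part of this lab:

```bash
# sketch: linear-mode alternative to the ladder config above (values are illustrative)
cat <<EOF
config:
  linear:
    coresPerReplica: 2
    nodesPerReplica: 1
    min: 1
    max: 10
    preventSinglePointFailure: true
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF
```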
# monitoring
watch -d kubectl get pod
# helm upgrade
helm upgrade --install cluster-proportional-autoscaler -f cpa-values.yaml cluster-proportional-autoscaler/cluster-proportional-autoscaler
# scale the nodes up to 5
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 5 --desired-capacity 5 --max-size 5
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
# scale the nodes down to 4
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 4 --desired-capacity 4 --max-size 4
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
Finally, Karpenter.
Karpenter works more intelligently than CA:
- Watches for pods that the Kubernetes scheduler has marked as unschedulable (the sketch after this list shows a quick way to spot them).
- Evaluates the pods' requested resource requirements, node selectors, affinities, tolerations, and topology spread constraints.
- Provisions nodes that meet the pods' requirements.
- Schedules the pods onto the new nodes.
- Removes the nodes when they are no longer needed.
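A quick way to see the unschedulable pods Karpenter reacts to; a sketch of my own, not from the study material:

```bash
# sketch: list Pending pods and recent FailedScheduling events
kubectl get pods -A --field-selector=status.phase=Pending
kubectl get events -A --field-selector=reason=FailedScheduling --sort-by=.lastTimestamp | tail -n 5
```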
The Karpenter hands-on is done on a freshly deployed cluster.
# download the YAML file
curl -O https://s3.ap-northeast-2.amazonaws.com/cloudformation.cloudneta.net/K8S/karpenter-preconfig.yaml
# deploy the CloudFormation stack
Example) aws cloudformation deploy --template-file karpenter-preconfig.yaml --stack-name myeks2 --parameter-overrides KeyName=kp-gasida SgIngressSshCidr=$(curl -s ipinfo.io/ip)/32 MyIamUserAccessKeyID=AKIA5... MyIamUserSecretAccessKey='CVNa2...' ClusterBaseName=myeks2 --region ap-northeast-2
# after the CloudFormation stack finishes deploying, print the working EC2 IP
aws cloudformation describe-stacks --stack-name myeks2 --query 'Stacks[*].Outputs[0].OutputValue' --output text
# SSH into the working EC2 instance
ssh -i ~/.ssh/kp-gasida.pem ec2-user@$(aws cloudformation describe-stacks --stack-name myeks2 --query 'Stacks[*].Outputs[0].OutputValue' --output text)
# check the IP address : using the 172.30.1.0/24 range within the 172.30.0.0/16 VPC CIDR
ip -br -c addr
# install EKS Node Viewer : takes a while on the current EC2 spec (2+ minutes)
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest
# [terminal 1] check the binary and use it
tree ~/go/bin
cd ~/go/bin
./eks-node-viewer -h
./eks-node-viewer # run this after the EKS deployment completes
```bash
# check the environment variables
export | egrep 'ACCOUNT|AWS_|CLUSTER' | egrep -v 'SECRET|KEY'
# set environment variables
export KARPENTER_VERSION=v0.27.5
export TEMPOUT=$(mktemp)
echo $KARPENTER_VERSION $CLUSTER_NAME $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID $TEMPOUT
# create the IAM Policy, Role, and EC2 Instance Profile via a CloudFormation stack : takes about 3 minutes
curl -fsSL https://karpenter.sh/"${KARPENTER_VERSION}"/getting-started/getting-started-with-karpenter/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
--stack-name "Karpenter-${CLUSTER_NAME}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
# create the cluster : creating the myeks2 EKS cluster takes about 19 minutes
eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "1.24"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: karpenter
      namespace: karpenter
    roleName: ${CLUSTER_NAME}-karpenter
    attachPolicyARNs:
    - arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
    roleOnly: true
iamIdentityMappings:
- arn: "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${CLUSTER_NAME}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10
  iam:
    withAddonPolicies:
      externalDNS: true
## Optionally run on fargate
# fargateProfiles:
# - name: karpenter
#   selectors:
#   - namespace: karpenter
EOF
# verify the EKS deployment
eksctl get cluster
eksctl get nodegroup --cluster $CLUSTER_NAME
eksctl get iamidentitymapping --cluster $CLUSTER_NAME
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
eksctl get addon --cluster $CLUSTER_NAME
# [terminal 1] eks-node-viewer
cd ~/go/bin && ./eks-node-viewer
# verify k8s
kubectl cluster-info
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
kubectl get pod -n kube-system -owide
kubectl describe cm -n kube-system aws-auth
...
mapRoles:
----
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::911283464785:role/KarpenterNodeRole-myeks2
  username: system:node:{{EC2PrivateDNSName}}
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::911283464785:role/eksctl-myeks2-nodegroup-myeks2-ng-NodeInstanceRole-1KDXF4FLKKX1B
  username: system:node:{{EC2PrivateDNSName}}
...
# set and check the environment variables for the Karpenter install
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
echo $CLUSTER_ENDPOINT $KARPENTER_IAM_ROLE_ARN
# create/verify the service-linked role needed for EC2 Spot Fleet : if it already exists, the error output below is expected!
# If the role has already been successfully created, you will see:
# An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForEC2Spot has been taken in this account, please try a different suffix.
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
# docker logout : Logout of docker to perform an unauthenticated pull against the public ECR
docker logout public.ecr.aws
# install karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
--set settings.aws.clusterName=${CLUSTER_NAME} \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
--set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
# check
kubectl get-all -n karpenter
kubectl get all -n karpenter
kubectl get cm -n karpenter karpenter-global-settings -o jsonpath={.data} | jq
kubectl get crd | grep karpenter
```
Run the installation as above.
Then continue.
#
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF
# check
kubectl get awsnodetemplates,provisioners
Create the Provisioner and AWSNodeTemplate as shown above.
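ttlSecondsAfterEmpty: 30 means a Karpenter node left with only daemonset pods is removed about 30 seconds later. A quick sanity check on what was just created; the exact commands are a sketch of my own:

```bash
# sketch: inspect the Provisioner TTL and the AWSNodeTemplate selectors
kubectl get provisioner default -o jsonpath='{.spec.ttlSecondsAfterEmpty}{"\n"}'
kubectl get awsnodetemplate default -o yaml | grep -A1 -E 'subnetSelector|securityGroupSelector'
```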
# guarantee each pause pod at least 1 CPU via requests
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: inflate
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
# check the spot instances!
aws ec2 describe-spot-instance-requests --filters "Name=state,Values=active" --output table
kubectl get node -l karpenter.sh/capacity-type=spot -o jsonpath='{.items[0].metadata.labels}' | jq
kubectl get node --label-columns=eks.amazonaws.com/capacityType,karpenter.sh/capacity-type,node.kubernetes.io/instance-type
NAME STATUS ROLES AGE VERSION CAPACITYTYPE CAPACITY-TYPE INSTANCE-TYPE
ip-192-168-165-220.ap-northeast-2.compute.internal Ready <none> 2m37s v1.24.13-eks-0a21954 spot c5a.2xlarge
ip-192-168-57-91.ap-northeast-2.compute.internal Ready <none> 13m v1.24.13-eks-0a21954 ON_DEMAND m5.large
ip-192-168-75-253.ap-northeast-2.compute.internal Ready <none> 13m v1.24.13-eks-0a21954 ON_DEMAND m5.large
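Why a single c5a.2xlarge spot node covers this: the five inflate replicas request 5 vCPU in total, and a c5a.2xlarge provides 8 vCPU, so one node absorbs all of them. Rough arithmetic only, ignoring daemonset overhead:

```bash
# rough check: total CPU requested by the scaled-up inflate deployment
echo $(( 5 * 1000 ))   # 5000m requested; a c5a.2xlarge (8 vCPU) has room for all five pods
```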
While going through the hands-on,
the error shown above occurred.
https://github.com/aws/karpenter/issues/2902
Whether it is the same error as this issue needs further investigation.
#
kubectl delete provisioners default
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
  labels:
    type: karpenter
  limits:
    resources:
      cpu: 1000
      memory: 1000Gi
  providerRef:
    name: default
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c5.large
    - m5.large
    - m5.xlarge
EOF
#
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: inflate
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            cpu: 1
EOF
kubectl scale deployment inflate --replicas 12
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
# check the instances
# This changes the total memory request for this deployment to around 12Gi,
# which when adjusted to account for the roughly 600Mi reserved for the kubelet on each node means that this will fit on 2 instances of type m5.large:
kubectl get node -l type=karpenter
kubectl get node --label-columns=eks.amazonaws.com/capacityType,karpenter.sh/capacity-type
kubectl get node --label-columns=node.kubernetes.io/instance-type,topology.kubernetes.io/zone
# Next, scale the number of replicas back down to 5:
kubectl scale deployment inflate --replicas 5
# The output will show Karpenter identifying specific nodes to cordon, drain and then terminate:
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
2023-05-17T07:02:00.768Z INFO controller.deprovisioning deprovisioning via consolidation delete, terminating 1 machines ip-192-168-14-81.ap-northeast-2.compute.internal/m5.xlarge/on-demand {"commit": "d7e22b1-dirty"}
2023-05-17T07:02:00.803Z INFO controller.termination cordoned node {"commit": "d7e22b1-dirty", "node": "ip-192-168-14-81.ap-northeast-2.compute.internal"}
2023-05-17T07:02:01.320Z INFO controller.termination deleted node {"commit": "d7e22b1-dirty", "node": "ip-192-168-14-81.ap-northeast-2.compute.internal"}
2023-05-17T07:02:39.283Z DEBUG controller deleted launch template {"commit": "d7e22b1-dirty", "launch-template": "karpenter.k8s.aws/9547068762493117560"}
# Next, scale the number of replicas back down to 1
kubectl scale deployment inflate --replicas 1
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
2023-05-17T07:05:08.877Z INFO controller.deprovisioning deprovisioning via consolidation delete, terminating 1 machines ip-192-168-145-253.ap-northeast-2.compute.internal/m5.xlarge/on-demand {"commit": "d7e22b1-dirty"}
2023-05-17T07:05:08.914Z INFO controller.termination cordoned node {"commit": "d7e22b1-dirty", "node": "ip-192-168-145-253.ap-northeast-2.compute.internal"}
2023-05-17T07:05:09.316Z INFO controller.termination deleted node {"commit": "d7e22b1-dirty", "node": "ip-192-168-145-253.ap-northeast-2.compute.internal"}
2023-05-17T07:05:25.923Z INFO controller.deprovisioning deprovisioning via consolidation replace, terminating 1 machines ip-192-168-48-2.ap-northeast-2.compute.internal/m5.xlarge/on-demand and replacing with on-demand machine from types m5.large, c5.large {"commit": "d7e22b1-dirty"}
2023-05-17T07:05:25.940Z INFO controller.deprovisioning launching machine with 1 pods requesting {"cpu":"1125m","pods":"4"} from types m5.large, c5.large {"commit": "d7e22b1-dirty", "provisioner": "default"}
2023-05-17T07:05:26.341Z DEBUG controller.deprovisioning.cloudprovider created launch template {"commit": "d7e22b1-dirty", "provisioner": "default", "launch-template-name": "karpenter.k8s.aws/9547068762493117560", "launch-template-id": "lt-036151ea9df7d309f"}
2023-05-17T07:05:28.182Z INFO controller.deprovisioning.cloudprovider launched instance {"commit": "d7e22b1-dirty", "provisioner": "default", "id": "i-0eb3c8ff63724dc95", "hostname": "ip-192-168-144-98.ap-northeast-2.compute.internal", "instance-type": "c5.large", "zone": "ap-northeast-2b", "capacity-type": "on-demand", "capacity": {"cpu":"2","ephemeral-storage":"20Gi","memory":"3788Mi","pods":"29"}}
2023-05-17T07:06:12.307Z INFO controller.termination cordoned node {"commit": "d7e22b1-dirty", "node": "ip-192-168-48-2.ap-northeast-2.compute.internal"}
2023-05-17T07:06:12.856Z INFO controller.termination deleted node {"commit": "d7e22b1-dirty", "node": "ip-192-168-48-2.ap-northeast-2.compute.internal"}
# check the instances
kubectl get node -l type=karpenter
kubectl get node --label-columns=eks.amazonaws.com/capacityType,karpenter.sh/capacity-type
kubectl get node --label-columns=node.kubernetes.io/instance-type,topology.kubernetes.io/zone
# delete
kubectl delete deployment inflate
Last up is Consolidation.
Recreate the provisioner as above,
and scale-out happens almost in real time. Nice!
This time, scaling down as above, the nodes shrink right away too.