istio-3-2

시스템엔지니어 2025. 4. 27. 00:34

2025. 4. 27. 00:34

내용이 너무 길어서 오늘은 2개로 나눈다.

실습환경 구성은 다음과 같다.

#
git clone https://github.com/AcornPublishing/istio-in-action
cd istio-in-action/book-source-code-master
pwd # 각자 자신의 pwd 경로
code .

# 아래 extramounts 생략 시, myk8s-control-plane 컨테이너 sh/bash 진입 후 직접 git clone 가능
kind create cluster --name myk8s --image kindest/node:v1.23.17 --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000 # Sample Application (istio-ingrssgateway) HTTP
    hostPort: 30000
  - containerPort: 30001 # Prometheus
    hostPort: 30001
  - containerPort: 30002 # Grafana
    hostPort: 30002
  - containerPort: 30003 # Kiali
    hostPort: 30003
  - containerPort: 30004 # Tracing
    hostPort: 30004
  - containerPort: 30005 # Sample Application (istio-ingrssgateway) HTTPS
    hostPort: 30005
  - containerPort: 30006 # TCP Route
    hostPort: 30006
  - containerPort: 30007 # kube-ops-view
    hostPort: 30007
  extraMounts: # 해당 부분 생략 가능
  - hostPath: /Users/gasida/Downloads/istio-in-action/book-source-code-master # 각자 자신의 pwd 경로로 설정
    containerPath: /istiobook
networking:
  podSubnet: 10.10.0.0/16
  serviceSubnet: 10.200.1.0/24
EOF

# 설치 확인
docker ps

# 노드에 기본 툴 설치
docker exec -it myk8s-control-plane sh -c 'apt update && apt install tree psmisc lsof wget bridge-utils net-tools dnsutils tcpdump ngrep iputils-ping git vim -y'

# (옵션) kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set service.main.type=NodePort,service.main.ports.http.nodePort=30007 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view

## kube-ops-view 접속 URL 확인
open "http://localhost:30007/#scale=1.5"
open "http://localhost:30007/#scale=1.3"

# (옵션) metrics-server
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm install metrics-server metrics-server/metrics-server --set 'args[0]=--kubelet-insecure-tls' -n kube-system
kubectl get all -n kube-system -l app.kubernetes.io/instance=metrics-server



# myk8s-control-plane 진입 후 설치 진행
docker exec -it myk8s-control-plane bash
-----------------------------------
# (옵션) 코드 파일들 마운트 확인
tree /istiobook/ -L 1
혹은
git clone ... /istiobook

# istioctl 설치
export ISTIOV=1.17.8
echo 'export ISTIOV=1.17.8' >> /root/.bashrc

curl -s -L https://istio.io/downloadIstio | ISTIO_VERSION=$ISTIOV sh -
cp istio-$ISTIOV/bin/istioctl /usr/local/bin/istioctl
istioctl version --remote=false

# default 프로파일 컨트롤 플레인 배포
istioctl install --set profile=default -y

# 설치 확인 : istiod, istio-ingressgateway, crd 등
kubectl get istiooperators -n istio-system -o yaml
kubectl get all,svc,ep,sa,cm,secret,pdb -n istio-system
kubectl get cm -n istio-system istio -o yaml
kubectl get crd | grep istio.io | sort

# 보조 도구 설치
kubectl apply -f istio-$ISTIOV/samples/addons
kubectl get pod -n istio-system

# 빠져나오기
exit
-----------------------------------

# istio-proxy 로그 출력 설정 : configmap 에 mesh 바로 아래에 accessLogFile 부분 추가
KUBE_EDITOR="nano"  kubectl edit cm -n istio-system istio
...
  mesh: |-
    accessLogFile: /dev/stdout
...

# 실습을 위한 네임스페이스 설정
kubectl create ns istioinaction
kubectl label namespace istioinaction istio-injection=enabled
kubectl get ns --show-labels

# istio-ingressgateway 서비스 : NodePort 변경 및 nodeport 지정 변경 , externalTrafficPolicy 설정 (ClientIP 수집)
kubectl patch svc -n istio-system istio-ingressgateway -p '{"spec": {"type": "NodePort", "ports": [{"port": 80, "targetPort": 8080, "nodePort": 30000}]}}'
kubectl patch svc -n istio-system istio-ingressgateway -p '{"spec": {"type": "NodePort", "ports": [{"port": 443, "targetPort": 8443, "nodePort": 30005}]}}'
kubectl patch svc -n istio-system istio-ingressgateway -p '{"spec":{"externalTrafficPolicy": "Local"}}'
kubectl describe svc -n istio-system istio-ingressgateway

# NodePort 변경 및 nodeport 30001~30003으로 변경 : prometheus(30001), grafana(30002), kiali(30003), tracing(30004)
kubectl patch svc -n istio-system prometheus -p '{"spec": {"type": "NodePort", "ports": [{"port": 9090, "targetPort": 9090, "nodePort": 30001}]}}'
kubectl patch svc -n istio-system grafana -p '{"spec": {"type": "NodePort", "ports": [{"port": 3000, "targetPort": 3000, "nodePort": 30002}]}}'
kubectl patch svc -n istio-system kiali -p '{"spec": {"type": "NodePort", "ports": [{"port": 20001, "targetPort": 20001, "nodePort": 30003}]}}'
kubectl patch svc -n istio-system tracing -p '{"spec": {"type": "NodePort", "ports": [{"port": 80, "targetPort": 16686, "nodePort": 30004}]}}'

# Prometheus 접속 : envoy, istio 메트릭 확인
open http://127.0.0.1:30001

# Grafana 접속
open http://127.0.0.1:30002

# Kiali 접속 1 : NodePort
open http://127.0.0.1:30003

# (옵션) Kiali 접속 2 : Port forward
kubectl port-forward deployment/kiali -n istio-system 20001:20001 &
open http://127.0.0.1:20001

# tracing 접속 : 예거 트레이싱 대시보드
open http://127.0.0.1:30004


# 접속 테스트용 netshoot 파드 생성
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: netshoot
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF

6.1 Building resilience into the application

마이크로서비스는 분산 환경에서 불가피한 장애에 대비해, 재시도, 타임아웃, 서킷 브레이커 등 복원력 패턴을 일관되게 적용하여 서비스 연쇄 장애를 방지하고 전체 시스템의 안정성과 가용성을 높여야 합니다.

서비스 메시 기술이 등장하기 전에는, 개발자들이 각 애플리케이션 코드에 직접 복원력 패턴(재시도, 타임아웃, 서킷 브레이커 등)을 구현해야 했고, 트위터 Finagle, 넷플릭스 Hystrix·Ribbon 같은 오픈소스 프레임워크가 등장했지만, 언어와 프레임워크마다 구현 방식이 달라 유지보수와 일관성 확보에 어려움이 있었습니다.

이스티오의 서비스 프록시는 각 애플리케이션 옆에 사이드카로 배포되어 모든 네트워크 트래픽을 가로채고, 애플리케이션 코드 수정 없이 재시도, 타임아웃, 서킷 브레이킹, 클라이언트 측 로드 밸런싱 등 다양한 복원력 패턴을 프록시 레벨에서 일관되게 적용할 수 있게 해줍니다.

이스티오는 애플리케이션 인스턴스 옆에 배치된 사이드카 프록시를 통해 복원력 패턴(재시도/서킷브레이커 등)을 중앙 게이트웨이 없이 분산 처리하며, 기존의 중앙 집중식 하드웨어/미들웨어 방식보다 동적 클라우드 환경에 적합한 유연성과 확장성을 제공합니다.

6.2 Client-side load balancing 클라이언트 측 로드 밸런싱 (실습)

Server-side 로드 밸런싱은 중앙 집중식 장치가 트래픽을 분배하는 방식이고, client-side 로드 밸런싱은 클라이언트(또는 프록시)가 엔드포인트 정보를 직접 받아 분산 처리하는 방식으로, 분산성과 유연성은 client-side가 높지만, 관리와 보안은 server-side가 더 용이합니다.

실습시작

kubectl delete gw,vs,deploy,svc,destinationrule --all -n istioinaction

# (옵션) kiali 에서 simple-backend-1,2 버전 확인을 위해서 labels 설정 : ch6/simple-backend.yaml
open ch6/simple-backend.yaml
...
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-backend
    version: v1
  name: simple-backend-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-backend
  template:
    metadata:
      labels:
        app: simple-backend
        version: v1
...
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-backend
    version: v2
  name: simple-backend-2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simple-backend
  template:
    metadata:
      labels:
        app: simple-backend
        version: v2
...

# 예제 서비스 2개 배포
kubectl apply -f ch6/simple-backend.yaml -n istioinaction
kubectl apply -f ch6/simple-web.yaml -n istioinaction

# 확인
kubectl get deploy,pod,svc,ep -n istioinaction -o wide
NAME                               READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS       IMAGES                                 SELECTOR
deployment.apps/simple-backend-1   1/1     1            1           105m   simple-backend   nicholasjackson/fake-service:v0.17.0   app=simple-backend
deployment.apps/simple-backend-2   2/2     2            2           105m   simple-backend   nicholasjackson/fake-service:v0.17.0   app=simple-backend
deployment.apps/simple-web         1/1     1            1           105m   simple-web       nicholasjackson/fake-service:v0.17.0   app=simple-web
...

# gw,vs 배포
cat ch6/simple-web-gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: simple-web-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "simple-web.istioinaction.io"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-web-vs-for-gateway
spec:
  hosts:
  - "simple-web.istioinaction.io"
  gateways:
  - simple-web-gateway
  http:
  - route:
    - destination:
        host: simple-web
        
kubectl apply -f ch6/simple-web-gateway.yaml -n istioinaction

# 확인
kubectl get gw,vs -n istioinaction

docker exec -it myk8s-control-plane istioctl proxy-status
NAME                                                  CLUSTER        CDS        LDS        EDS        RDS        ECDS         ISTIOD                      VERSION
istio-ingressgateway-996bc6bb6-ztcx5.istio-system     Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-7df6ffc78d-xmjbj     1.17.8
simple-backend-1-7449cc5945-d9zmc.istioinaction       Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-7df6ffc78d-xmjbj     1.17.8
simple-backend-2-6876494bbf-vdttr.istioinaction       Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-7df6ffc78d-xmjbj     1.17.8
simple-backend-2-6876494bbf-zn6v9.istioinaction       Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-7df6ffc78d-xmjbj     1.17.8
simple-web-7cd856754-tjdv6.istioinaction              Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-7df6ffc78d-xmjbj     1.17.8


# 도메인 질의를 위한 임시 설정 : 실습 완료 후에는 삭제 해둘 것
echo "127.0.0.1       simple-web.istioinaction.io" | sudo tee -a /etc/hosts
cat /etc/hosts | tail -n 3

# 호출
curl -s http://simple-web.istioinaction.io:30000
open http://simple-web.istioinaction.io:30000

# 신규 터미널 : 반복 접속 실행 해두기
while true; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body" ; date "+%Y-%m-%d %H:%M:%S" ; sleep 1; echo; done


# 로그 확인
kubectl stern -l app=simple-web -n istioinaction
kubectl stern -l app=simple-web -n istioinaction -c istio-proxy
kubectl stern -l app=simple-web -n istioinaction -c simple-web
kubectl stern -l app=simple-backend -n istioinaction
kubectl stern -l app=simple-backend -n istioinaction -c istio-proxy
kubectl stern -l app=simple-backend -n istioinaction -c simple-backend


# (옵션) proxy-config
# proxy-config : simple-web
docker exec -it myk8s-control-plane istioctl proxy-config listener deploy/simple-web.istioinaction

docker exec -it myk8s-control-plane istioctl proxy-config routes deploy/simple-web.istioinaction
docker exec -it myk8s-control-plane istioctl proxy-config routes deploy/simple-web.istioinaction | grep backend
80                                                            simple-backend, simple-backend.istioinaction + 1 more...     /* 

docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local
SERVICE FQDN                                       PORT     SUBSET     DIRECTION     TYPE     DESTINATION RULE
simple-backend.istioinaction.svc.cluster.local     80       -          outbound      EDS

docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json
...
       "name": "outbound|80||simple-backend.istioinaction.svc.cluster.local",
        "type": "EDS",
        "edsClusterConfig": {
            "edsConfig": {
                "ads": {},
                "initialFetchTimeout": "0s",
                "resourceApiVersion": "V3"
            },
            "serviceName": "outbound|80||simple-backend.istioinaction.svc.cluster.local"
        },
        "connectTimeout": "10s",
        "lbPolicy": "LEAST_REQUEST",
...

docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local'
ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.14:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.15:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.16:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local

docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' -o json
...

이스티오 DestinationRule 리소스로 simple-backend 서비스를 호출하는 모든 클라이언트의 로드 밸런싱을 ROUND_ROBIN으로 설정하자.
**DestinationRule**는 특정 목적지를 호출하는 메시 내 클라이언트들에 정책을 지정한다.
simple-backend 용 첫 DestinationRule는 다음과 같다.

# cat ch6/simple-backend-dr-rr.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN # 엔드포인트 결정을 '순서대로 돌아가며'
      
      
# DestinationRule 적용 : ROUND_ROBIN
cat ch6/simple-backend-dr-rr.yaml
kubectl apply -f ch6/simple-backend-dr-rr.yaml -n istioinaction

# 확인 : DestinationRule 단축어 dr
kubectl get dr -n istioinaction
NAME                HOST                                             AGE
simple-backend-dr   simple-backend.istioinaction.svc.cluster.local   11s

kubectl get destinationrule simple-backend-dr -n istioinaction \
 -o jsonpath='{.spec.trafficPolicy.loadBalancer.simple}{"\n"}'
ROUND_ROBIN

# 호출 : 이 예시 서비스 집합에서는 호출 체인을 보여주는 JSON 응답을 받느다
## simple-web 서비스는 simple-backend 서비스를 호출하고, 우리는 궁극적으로 simple-backend-1 에서 온 응답 메시지 Hello를 보게 된다.
## 몇 번 더 반복하면 simple-backend-1 과 simple-backend-2 에게 응답을 받는다. 
curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"

# 반복 호출 확인 : 파드 비중은 backend-2가 2개임
for in in {1..10}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done
for in in {1..50}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done | sort | uniq -c | sort -nr


# 로그 확인 : backend 요청을 하면 요청을 처리할 redirect 주소를 응답 (301), 전달 받은 redirect(endpoint)로 다시 요청
kubectl stern -l app=simple-web -n istioinaction -c istio-proxy
## simpleweb → simple-backend (301) redirect 응답 수신
simple-web-7cd856754-tjdv6 istio-proxy [2025-04-20T04:22:24.317Z] "GET // HTTP/1.1" 301 - via_upstream - "-" 0 36 3 3 "172.18.0.1" "curl/8.7.1" "ee707715-7e7c-42c3-a404-d3ee22f79d11" "simple-backend:80" "10.10.0.16:8080" outbound|80||simple-backend.istioinaction.svc.cluster.local 10.10.0.17:46590 10.200.1.161:80 172.18.0.1:0 - default
## simpleweb → simple-backend (200)
simple-web-7cd856754-tjdv6 istio-proxy [2025-04-20T04:22:24.324Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 278 156 156 "172.18.0.1" "curl/8.7.1" "ee707715-7e7c-42c3-a404-d3ee22f79d11" "simple-backend:80" "10.10.0.14:8080" outbound|80||simple-backend.istioinaction.svc.cluster.local 10.10.0.17:38336 10.200.1.161:80 172.18.0.1:0 - default
## simpleweb → 외부 curl 응답(200)
simple-web-7cd856754-tjdv6 istio-proxy [2025-04-20T04:22:24.307Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 889 177 177 "172.18.0.1" "curl/8.7.1" "ee707715-7e7c-42c3-a404-d3ee22f79d11" "simple-web.istioinaction.io:30000" "10.10.0.17:8080" inbound|8080|| 127.0.0.6:40981 10.10.0.17:8080 172.18.0.1:0 outbound_.80_._.simple-web.istioinaction.svc.cluster.local default

kubectl stern -l app=simple-backend -n istioinaction -c istio-proxy
## simple-backend → (응답) simpleweb (301)
simple-backend-2-6876494bbf-zn6v9 istio-proxy [2025-04-20T04:22:45.209Z] "GET // HTTP/1.1" 301 - via_upstream - "-" 0 36 3 3 "172.18.0.1" "curl/8.7.1" "71ba286a-a45f-41bc-9b57-69710ea576d7" "simple-backend:80" "10.10.0.14:8080" inbound|8080|| 127.0.0.6:54105 10.10.0.14:8080 172.18.0.1:0 outbound_.80_._.simple-backend.istioinaction.svc.cluster.local default
## simple-backend → (응답) simpleweb (200)
simple-backend-1-7449cc5945-d9zmc istio-proxy [2025-04-20T04:22:45.216Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 278 152 152 "172.18.0.1" "curl/8.7.1" "71ba286a-a45f-41bc-9b57-69710ea576d7" "simple-backend:80" "10.10.0.15:8080" inbound|8080|| 127.0.0.6:43705 10.10.0.15:8080 172.18.0.1:0 outbound_.80_._.simple-backend.istioinaction.svc.cluster.local default

#
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json
...
       "name": "outbound|80||simple-backend.istioinaction.svc.cluster.local",
        "type": "EDS",
        "edsClusterConfig": {
            "edsConfig": {
                "ads": {},
                "initialFetchTimeout": "0s",
                "resourceApiVersion": "V3"
            },
            "serviceName": "outbound|80||simple-backend.istioinaction.svc.cluster.local"
        },
        "connectTimeout": "10s",
        "lbPolicy": "LEAST_REQUEST", # RR은 기본값(?)이여서, 해당 부분 설정이 이전과 다르게 없다
...

# 
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' -o json

위와같이 라운드로빈에 의해 29대21정도로 접속되는것을 확인할 수 있다. 이는 백앤드2 서비스가 파드가 더 많아서 그렇다.

부하 생성기를 사용해 simple-backend 서비스 지연 시간을 변화시키는 어느 정도 현실적인 시나리오를 살펴보자.
그러면 이런 상황에서 어떤 이스티오의 로드 밸런싱 전략이 가장 적합한지 선택하는 데 도움이 될 것이다.

우리는 Fortio 라는 CLI 부하 생성 도구를 사용해 서비스를 실행하고 클라이언트 측 로드 밸런싱의 차이를 관찰할 것이다.

# mac 설치
brew install fortio
fortio -h
fortio server
open http://127.0.0.1:8080/fortio

# windows 설치
1. 다운로드 https://github.com/fortio/fortio/releases/download/v1.69.3/fortio_win_1.69.3.zip
2. 압축 풀기
3. Windows Command Prompt : fortio.exe server
4. Once fortio server is running, you can visit its web UI at http://localhost:8080/fortio/

이제 Fortio 로드 테스트 클라이언트를 사용할 준비가 됐으므로 사용 사례를 살펴보자.
Fortio를 사용해서 60초 동안 10개의 커넥션을 통해 초당 1000개의 요청을 보낼 것이다.
- Fortio to send 1,000 rps (requests per seconds) for 60 seconds through 10 connections
Fortio는 각 호출의 지연 시간을 추적하고 지연 시간 백분위수 분석과 함께 히스토그램에 표시한다.
테스트를 하기 전에 지연 시간을 1초까지 늘린 simple-backend-1 서비스를 도입할 것이다.
이는 엔드포인트 중 하나에 긴 가비지 컬렉션 이벤트 또는 기타 애플리케이션 지연 시간이 발생한 상황을 시뮬레이션한다.
우리는 로드 밸런싱 전략을 라운드 로빈, 랜덤, 최소 커넥션으로 바꿔가면서 차이점을 관찰할 것이다.

#
cat ch6/simple-backend-delayed.yaml
...
      - env:
        - name: "LISTEN_ADDR"
          value: "0.0.0.0:8080"
        - name: "SERVER_TYPE"
          value: "http"                      
        - name: "NAME"
          value: "simple-backend"      
        - name: "MESSAGE"
          value: "Hello from simple-backend-1"                     
        - name: "TIMING_VARIANCE"
          value: "10ms"                              
        - name: "TIMING_50_PERCENTILE"
          value: "1000ms"                                      
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: nicholasjackson/fake-service:v0.17.0
...
kubectl apply -f ch6/simple-backend-delayed.yaml -n istioinaction
kubectl rollout restart deployment -n istioinaction simple-backend-1

# 확인???
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- env | grep TIMING
TIMING_VARIANCE=10ms
TIMING_50_PERCENTILE=150ms # ???

kubectl exec -it deploy/simple-backend-2 -n istioinaction -- env | grep TIMING
TIMING_VARIANCE=10ms
TIMING_50_PERCENTILE=150ms

# 직접 deploy 편집 수정???
KUBE_EDITOR="nano" kubectl edit deploy/simple-backend-1 -n istioinaction
...
      - name: TIMING_50_PERCENTILE
          value: 1000ms
...

kubectl rollout restart deployment -n istioinaction simple-backend-1
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- env | grep TIMING
TIMING_VARIANCE=10ms
TIMING_50_PERCENTILE=150ms # ???


# 동작 중 파드에 env 직접 수정..
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
-----------------------------------
export TIMING_50_PERCENTILE=1000ms
exit
-----------------------------------

#
kubectl describe pod -n istioinaction -l app=simple-backend | grep TIMING_50_PERCENTILE:
      TIMING_50_PERCENTILE:  1000ms
      TIMING_50_PERCENTILE:  150ms
      TIMING_50_PERCENTILE:  150ms

# 테스트
curl -s http://simple-web.istioinaction.io:30000 | grep duration  
curl -s http://simple-web.istioinaction.io:30000 | grep duration 
curl -s http://simple-web.istioinaction.io:30000 | grep duration              
  "duration": "1.058699s",
      "duration": "1.000934s",

설정중 특이사항은 정상적으로 TIMING_50_PERCENTILE 반영이 안된다. 그래서 파드에 직접 접속해서 변수를 지정하는 방법을 사용한다.

이제 fortio를 다음과 같이 설정한다.

Title : roundrobin
URL : http://simple-web.istioinaction.io:30000
QPS : 1000
Duration : 60s
Threads : 10
Jitter: Check
No Catch-up : Uncheck
Extra Headers
- User-Agent: fortio
Timeout : 2000ms

⇒ Start 클릭

여기서 특이사항은 실습에선 URL부분에 http를 넣었는데, 나는 넣으면 에러난다. 그래서 빼고 했다.

이제 로드밸런싱 알고리즘을 리스트 커넥션으로 바꿔본다.

#
cat ch6/simple-backend-dr-least-conn.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN

kubectl apply -f ch6/simple-backend-dr-least-conn.yaml -n istioinaction

# 확인
kubectl get destinationrule simple-backend-dr -n istioinaction \
 -o jsonpath='{.spec.trafficPolicy.loadBalancer.simple}{"\n"}'

#
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep lbPolicy
"lbPolicy": "LEAST_REQUEST",

사용자경험이 좋아졌다.

6.3 Locality-aware load balancing 지역 인식 로드 밸런싱 (실습)

이스티오 컨트롤 플레인은 서비스 토폴로지 분석을 기반으로 동일 리전/가용 영역 내 서비스 호출을 우선시하는 지능형 로드 밸런싱을 자동화하여 네트워크 지연 시간과 비용을 최소화합니다.

이스티오는 쿠버네티스 노드의 리전/영역 레이블(예: topology.kubernetes.io/region)을 기반으로 지역 인식 로드 밸런싱을 수행하며, 단일 노드 환경에서는 파드에 istio-locality 레이블을 직접 설정해 테스트할 수 있습니다.

# cat ch6/simple-service-locality.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-web
  name: simple-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-web
  template:
    metadata:
      labels:
        app: simple-web
        istio-locality: us-west1.us-west1-a
...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-backend
  name: simple-backend-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-backend
  template:
    metadata:
      labels:
        app: simple-backend
        istio-locality: us-west1.us-west1-a
        version: v1 # 추가해두자!
...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-backend
  name: simple-backend-2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simple-backend
  template:
    metadata:
      labels:
        app: simple-backend
        istio-locality: us-west1.us-west1-b
        version: v2 # 추가해두자!
...

simple-backend 서비스를 배포할 때 지역 레이블을 다양하게 지정할 것이다.
simple-web과 같은 지역인 us-west1-a 에 simple-backend-1을 배포한다.
그리고 us-west1-b 에 simple-backend-2 를 배포한다. 이 경우, 리전은 동일하지만 영역이 다르다.
지역 간에 로드 밸런싱을 수행할 수 있는 이스티오의 기능에는 리전, 영역, 심지어는 더 세밀한 하위 영역 subzone 도 포함된다.

#
kubectl apply -f ch6/simple-service-locality.yaml -n istioinaction

# 확인
## simple-backend-1 : us-west1-a  (same locality as simple-web)
kubectl get deployment.apps/simple-backend-1 -n istioinaction \
-o jsonpath='{.spec.template.metadata.labels.istio-locality}{"\n"}'
us-west1.us-west1-a

## simple-backend-2 : us-west1-b
kubectl get deployment.apps/simple-backend-2 -n istioinaction \
-o jsonpath='{.spec.template.metadata.labels.istio-locality}{"\n"}'
us-west1.us-west1-b

이스티오의 지역 인식 로드 밸런싱은 기본적으로 활성화되어 동일 지역의 서비스로 트래픽을 우선 분배하지만, 인스턴스 수 불균형 등 실제 부하 특성에 따라 설정을 세밀하게 튜닝하는 것이 중요합니다.

호출테스트 1

# 신규 터미널 : 반복 접속 실행 해두기
while true; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body" ; date "+%Y-%m-%d %H:%M:%S" ; sleep 1; echo; done

# 호출 : 이 예시 서비스 집합에서는 호출 체인을 보여주는 JSON 응답을 받느다
curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"

# 반복 호출 확인 : 파드 비중은 backend-2가 2개임
for in in {1..10}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done
for in in {1..50}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done | sort | uniq -c | sort -nr

위와같이 1,2 왔다갔다 한다. 즉 다른 리전으로도 트래픽이 흐르는것.

헬스체크를 설정해서 올바르게 트래픽이 흐르도록(동일 리전으로) 설정해보자.

#
cat ch6/simple-backend-dr-outlier.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 5s
      baseEjectionTime: 30s
      maxEjectionPercent: 100

kubectl apply -f ch6/simple-backend-dr-outlier.yaml -n istioinaction

# 확인
kubectl get dr -n istioinaction simple-backend-dr -o jsonpath='{.spec}' | jq


# 반복 호출 확인 : 파드 비중 확인
for in in {1..10}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done
for in in {1..50}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done | sort | uniq -c | sort -nr


# proxy-config : simple-web 에서 simple-backend 정보 확인
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local        
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json
...
        },
        "outlierDetection": {
            "consecutive5xx": 1,
            "interval": "5s",
            "baseEjectionTime": "30s",
            "maxEjectionPercent": 100,
            "enforcingConsecutive5xx": 100,
            "enforcingSuccessRate": 0
        },
...

docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local'
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' -o json
...
                "healthStatus": {
                    "edsHealthStatus": "HEALTHY"
                },
                "weight": 1,
                "priority": 1,
                "locality": {
                    "region": "us-west1",
                    "zone": "us-west1-b"
                }
            
...

# 로그 확인
kubectl logs -n istioinaction -l app=simple-backend -c istio-proxy -f
kubectl stern -l app=simple-backend -n istioinaction
...

위와같이 지역인식 로드밸런싱이 되는것을 알 수 있다.

호출 테스트 2 ⇒ 오동작 주입 후 확인
- 트래픽이 가용 영역을 넘어가는 것을 보기 위해 simple-backend-1 서비스를 오동작 상태로 만들어보자.
- simple-web 에서 simple-backend-1 호출하면 항상 HTTP 500 오류를 발생하게 하자

# HTTP 500 에러를 일정비율로 발생
cat ch6/simple-service-locality-failure.yaml
...
        - name: "ERROR_TYPE"
          value: "http_error"           
        - name: "ERROR_RATE"
          value: "1"                              
        - name: "ERROR_CODE"
          value: "500"  
...
kubectl apply -f ch6/simple-service-locality-failure.yaml -n istioinaction

# simple-backend-1- Pod 가 Running 상태로 완전히 배포된 후에 호출 확인

# 반복 호출 확인 : 파드 비중 확인
for in in {1..10}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done
for in in {1..50}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done | sort | uniq -c | sort -nr


# 확인
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local'        
ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.23:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.24:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.25:8080     HEALTHY     FAILED            outbound|80||simple-backend.istioinaction.svc.cluster.local

# simple-backend-1 500에러 리턴으로 outliercheck 실패 상태로 호출에서 제외됨
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' -o json
...
                "healthStatus": {
                    "failedOutlierCheck": true,
                    "edsHealthStatus": "HEALTHY"
                },
                ...
                "healthStatus": {
                    "edsHealthStatus": "HEALTHY"
                },

원래는 1번으로만 트래픽이 흐르다가 헬스체크에 의해 2번으로만 트래픽이 흐르는것을 알 수 있다...(와우 !)

이는 다음과 같이 24번 아이피(동일리전)에 문제가 발생해서이다.

6.3.2 More control over locality load balancing with weighted distribution : 가중치 분포로 지역 인식 LB 제어 강화

이스티오의 지역 인식 로드 밸런싱은 기본적으로 동일 지역에 우선 트래픽을 보내지만, 필요에 따라 여러 지역에 트래픽을 가중치로 분산(지역 가중 분포)하여 과부하를 예방할 수 있습니다.

특정 영역에서 리전이 처리 할 수 없는 부하가 들어온다고 해보자.
트래픽의 70%가 최인접 지역으로 가고, 30%가 인접 지역으로 가길 원한다.
앞선 예제를 따라 simple-backend 서비스로 가는 트래픽 70%를 us-west1-a로, 30%를 us-west1-b로 보낼 것이다.

#
cat ch6/simple-backend-dr-outlier-locality.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    loadBalancer: # 로드 밸런서 설정 추가
      localityLbSetting:
        distribute:
        - from: us-west1/us-west1-a/* # 출발지 영역
          to:
            "us-west1/us-west1-a/*": 70 # 목적지 영역
            "us-west1/us-west1-b/*": 30 # 목적지 영역
    connectionPool:      
      http:
        http2MaxRequests: 10
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 5s
      baseEjectionTime: 30s
      maxEjectionPercent: 100

kubectl apply -f ch6/simple-backend-dr-outlier-locality.yaml -n istioinaction


# 반복 호출 확인 : 파드 비중 확인
for in in {1..10}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done
for in in {1..50}; do curl -s http://simple-web.istioinaction.io:30000 | jq ".upstream_calls[0].body"; done | sort | uniq -c | sort -nr


# endpoint 에 weight 는 모두 1이다. 위 70/30 비중은 어느곳의 envoy 에 설정 되는 걸까?...
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' -o json
docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local'
ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.23:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.24:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.26:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local

설정한것처럼 대략 7:3정도로 로드밸런싱이 된다.

6.4 Transparent timeouts and retries (실습)

이스티오는 다양한 타임아웃과 재시도 설정을 통해 네트워크 지연과 실패 같은 신뢰성 문제를 극복하여 분산 시스템의 안정성을 높입니다.

분산 시스템에서 타임아웃 계층화는 연쇄 장애 방지를 위해 외부 서비스(edge)에서 긴 타임아웃, 내부 서비스(backend)로 갈수록 짧은 타임아웃을 설정하는 전략입니다. 이스티오는 VirtualService를 통해 서비스 호출별 타임아웃을 세밀하게 제어하며, 상위 서비스의 타임아웃이 하위 서비스보다 우선 적용되어 시스템 전반의 안정성을 확보합니다.

실습을 진행하기 위해 다음 코드를 실행한다.

kubectl apply -f ch6/simple-web.yaml -n istioinaction
kubectl apply -f ch6/simple-backend.yaml -n istioinaction
kubectl delete destinationrule simple-backend-dr -n istioinaction


# 호출 테스트 : 보통 10~20ms 이내 걸림
curl -s http://simple-web.istioinaction.io:30000 | jq .code
time curl -s http://simple-web.istioinaction.io:30000 | jq .code
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done

# simple-backend-1를 1초 delay로 응답하도록 배포
cat ch6/simple-backend-delayed.yaml
kubectl apply -f ch6/simple-backend-delayed.yaml -n istioinaction

kubectl exec -it deploy/simple-backend-1 -n istioinaction -- env | grep TIMING
TIMING_VARIANCE=10ms
TIMING_50_PERCENTILE=150ms

# 동작 중 파드에 env 직접 수정..
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
-----------------------------------
export TIMING_50_PERCENTILE=1000ms
exit
-----------------------------------

# 호출 테스트 : simple-backend-1로 로드밸런싱 될 경우 1초 이상 소요 확인
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...
curl -s http://simple-web.istioinaction.io:30000  0.01s user 0.01s system 6% cpu 0.200 total
jq .code  0.00s user 0.00s system 3% cpu 0.199 total
500
...


# 
cat ch6/simple-backend-vs-timeout.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    timeout: 0.5s
    
kubectl apply -f ch6/simple-backend-vs-timeout.yaml -n istioinaction

#
kubectl get vs -n istioinaction
NAME                        GATEWAYS                 HOSTS                             AGE
simple-backend-vs                                    ["simple-backend"]                14s
simple-web-vs-for-gateway   ["simple-web-gateway"]   ["simple-web.istioinaction.io"]   6h11m


# 호출 테스트 : 0.5s 이상 걸리는 호출은 타임아웃 발생 (500응답)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...
curl -s http://simple-web.istioinaction.io:30000  0.01s user 0.01s system 2% cpu 0.537 total
jq .code  0.00s user 0.00s system 0% cpu 0.535 total
500
...

# istio-proxy config 에서 위 timeout 적용 envoy 설정 부분 찾아 두자.

500에러시 재시도 하는 실습을 진행하기 위해 기존 설정 초기화

kubectl apply -f ch6/simple-web.yaml -n istioinaction
kubectl apply -f ch6/simple-backend.yaml -n istioinaction

그리고 다음과 같이 이스티오의 기본 설정을 종료

#
docker exec -it myk8s-control-plane bash
----------------------------------------
# Retry 옵션 끄기 : 최대 재시도 0 설정
istioctl install --set profile=default --set meshConfig.defaultHttpRetryPolicy.attempts=0
y
exit
----------------------------------------

# 확인
kubectl get istiooperators -n istio-system -o yaml
...
    meshConfig:
      defaultConfig:
        proxyMetadata: {}
      defaultHttpRetryPolicy:
        attempts: 0
      enablePrometheusMerge: true
...

# istio-proxy 에서 적용 부분 찾아보자

이제 503 발생시 재시도 하도록 설정한다.

#
cat ch6/simple-backend-periodic-failure-503.yaml
...
        - name: "ERROR_TYPE"
          value: "http_error"           
        - name: "ERROR_RATE"
          value: "0.75"                              
        - name: "ERROR_CODE"
          value: "503"  
...

#
kubectl apply -f ch6/simple-backend-periodic-failure-503.yaml -n istioinaction

#
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- env | grep ERROR

#
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
---------------------------------------------------------------
export ERROR_TYPE=http_error
export ERROR_RATE=0.75
export ERROR_CODE=503
exit
---------------------------------------------------------------

# 호출테스트 : simple-backend-1 호출 시 failures (500) 발생
# simple-backend-1 --(503)--> simple-web --(500)--> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...
curl -s http://simple-web.istioinaction.io:30000  0.01s user 0.01s system 6% cpu 0.200 total
jq .code  0.00s user 0.00s system 3% cpu 0.199 total
500
...

# app, istio-proxy log 에서 500, 503 로그 확인해보자.

이제 이스티오의 재시도 정책을 다시 바꾼다.

#
cat ch6/simple-backend-enable-retry.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    retries:
      attempts: 2

kubectl apply -f ch6/simple-backend-enable-retry.yaml -n istioinaction

#
docker exec -it myk8s-control-plane istioctl proxy-config routes deploy/simple-web.istioinaction --name 80
docker exec -it myk8s-control-plane istioctl proxy-config routes deploy/simple-web.istioinaction --name 80 -o json
...
               "name": "simple-backend.istioinaction.svc.cluster.local:80",
                "domains": [
                    "simple-backend.istioinaction.svc.cluster.local",
                    "simple-backend",
                    "simple-backend.istioinaction.svc",
                    "simple-backend.istioinaction",
                    "10.200.1.161"
                ],
                "routes": [
                    {
                        "match": {
                            "prefix": "/"
                        },
                        "route": {
                            "cluster": "outbound|80||simple-backend.istioinaction.svc.cluster.local",
                            "timeout": "0s",
                            "retryPolicy": {
                                "retryOn": "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes",
                                "numRetries": 2,
...

# 호출테스트 : 모두 성공!
# simple-backend-1 --(503, retry 후 정상 응답)--> simple-web --> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done

# app, istio-proxy log 에서 503 로그 확인해보자.

500에러가 없어졌다.

사용자 경험이 좋아지고 있다.

503에러가 아닌 500 에러에 대해서는 ?

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    retries:
      attempts: 2 # 최대 재시도 횟수
      retryOn: gateway-error,connect-failure,retriable-4xx # 다시 시도해야 할 오류
      perTryTimeout: 300ms # 타임 아웃
      retryRemoteLocalities: true # 재시도 시 다른 지역의 엔드포인트에 시도할지 여부
      

# 500 에러 코드 리턴
cat ch6/simple-backend-periodic-failure-500.yaml
...
        - name: "ERROR_TYPE"
          value: "http_error"           
        - name: "ERROR_RATE"
          value: "0.75"                              
        - name: "ERROR_CODE"
          value: "500"
...

kubectl apply -f ch6/simple-backend-periodic-failure-500.yaml -n istioinaction

#
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
---------------------------------------------------------------
export ERROR_TYPE=http_error
export ERROR_RATE=0.75
export ERROR_CODE=500
exit
---------------------------------------------------------------

# envoy 설정 확인 : 재시도 동작(retryOn) 에 retriableStatusCodes 는 503만 있음.
docker exec -it myk8s-control-plane istioctl proxy-config route deploy/simple-web.istioinaction --name 80 -o json
...
                       "route": {
                            "cluster": "outbound|80||simple-backend.istioinaction.svc.cluster.local",
                            "timeout": "0s",
                            "retryPolicy": {
                                "retryOn": "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes",
                                "numRetries": 2,
                                "retryHostPredicate": [
                                    {
                                        "name": "envoy.retry_host_predicates.previous_hosts",
                                        "typedConfig": {
                                            "@type": "type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate"
                                        }
                                    }
                                ],
                                "hostSelectionRetryMaxAttempts": "5",
                                "retriableStatusCodes": [
                                    503
                                ]
                            },
                            "maxGrpcTimeout": "0s"
                        },
...


# 호출테스트 : Retry 동작 안함.
# simple-backend-1 --(500, retry 안함)--> simple-web --(500)> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...
curl -s http://simple-web.istioinaction.io:30000  0.01s user 0.01s system 30% cpu 0.036 total
jq .code  0.00s user 0.00s system 14% cpu 0.035 total
200
curl -s http://simple-web.istioinaction.io:30000  0.00s user 0.01s system 5% cpu 0.184 total
jq .code  0.00s user 0.00s system 2% cpu 0.183 total
500
...

위와같이 500 에러가 발생하도록 하고...

#
cat ch6/simple-backend-vs-retry-500.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    retries:
      attempts: 2
      retryOn: 5xx # HTTP 5xx 모두에 재시도

kubectl apply -f ch6/simple-backend-vs-retry-500.yaml -n istioinaction

# 호출테스트 : 모두 성공!
# simple-backend-1 --(500, retry)--> simple-web --(200)> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done

RetryOn을 5xx로 했떠니 전부 200 응답이 나는것을 확인할 수 있다.

그럼 타임아웃에 따른 재시도는 ?

재시도 정책 설정 시
(시도별 타임아웃 * 재시도 횟수) + (백오프 지연 * (재시도 횟수-1)) < 전체 타임아웃 조건을 반드시 충족시켜야 의도한 재시도 동작을 보장합니다.

이스티오의 재시도 동작은 다음과 같이 작동합니다:

기본 재시도 정책: HTTP 요청 실패 시 최대 2회 재시도 (총 3회 시도), 재시도 간 백오프는 25ms부터 시작해 지수적으로 증가 (예: 25ms → 50ms → 100ms).
Thundering Herd 위험: 다중 계층 서비스 체인에서 재시도 시 요청 수가 기하급수적으로 증가 (예: 5계층 시 최대 32회 요청 발생.
안전하지 않은 기본 설정: 503 오류 자동 재시도로 인해 중복 처리 위험 (결제 중복 등), retryOn 설정을 connect-failure 등으로 제한 권장.
지역 제한 재시도: 기본적으로 동일 지역 엔드포인트만 재시도하지만, retryRemoteLocalities: true 설정 시 다른 리전으로 재시도 확장 가능.
세부 제어 필요: VirtualService에서 attempts, perTryTimeout, retryOn 조건을 서비스별로 명시적 설정해야 안전성 보장.

※ 재시도 정책은 VirtualService 리소스에서 retries 필드로 제어되며, 백오프 알고리즘은 현재 커스터마이징 불가.

6.4.3 Advanced retries : Istio Extension API (EnvoyFilter)

이스티오는 기본적으로 HTTP 503 오류에 대해 25ms 백오프 시간으로 재시도를 수행하지만, 상세 설정(재시도 조건, 백오프 시간)은 EnvoyFilter API를 통해 엔보이 프록시 설정을 직접 수정해야 커스터마이징이 가능합니다.

※ 주의: EnvoyFilter 사용은 호환성 문제를 유발할 수 있으므로 Istio 버전과 Envoy 문서를 반드시 확인해야 합니다.

# cat ch6/simple-backend-ef-retry-status-codes.yaml                                            
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: simple-backend-retry-status-codes
  namespace: istioinaction
spec:
  workloadSelector:
    labels:
      app: simple-web
  configPatches:
  - applyTo: HTTP_ROUTE
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: "simple-backend.istioinaction.svc.cluster.local:80"          
    patch:
      operation: MERGE
      value:
        route:
          retry_policy: # 엔보이 설정에서 직접 나온다?
            retry_back_off: 
              base_interval: 50ms # 기본 간격을 늘린다
            retriable_status_codes: # 재시도할 수 있는 코드를 추가한다
            - 408
            - 400
            

# 408 에러코드를 발생
kubectl apply -f ch6/simple-backend-periodic-failure-408.yaml -n istioinaction

# 파드 정상 기동 후 수정
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
---------------------------------------------------------------
export ERROR_TYPE=http_error
export ERROR_RATE=0.75
export ERROR_CODE=408
exit
---------------------------------------------------------------

# 호출테스트 : 408 에러는 retryOn: 5xx 에 포함되지 않으므로 에러를 리턴.
# simple-backend-1 --(408)--> simple-web --(500)--> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...

이제 408 에러도 재시도 하도록 적용한다.

#
cat ch6/simple-backend-ef-retry-status-codes.yaml
...
    patch:
      operation: MERGE
      value:
        route:
          retry_policy: # 엔보이 설정에서 직접 나온다?
            retry_back_off: 
              base_interval: 50ms # 기본 간격을 늘린다
            retriable_status_codes: # 재시도할 수 있는 코드를 추가한다
            - 408
            - 400

kubectl apply -f ch6/simple-backend-ef-retry-status-codes.yaml -n istioinaction

# 확인
kubectl get envoyfilter -n istioinaction -o json
kubectl get envoyfilter -n istioinaction
NAME                                AGE
simple-backend-retry-status-codes   45s

# VirtualService 에도 재시도 할 대상 코드 추가
cat ch6/simple-backend-vs-retry-on.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    retries:
      attempts: 2
      retryOn: 5xx,retriable-status-codes # retryOn 항목에 retriable-status-codes 를 추가

kubectl apply -f ch6/simple-backend-vs-retry-on.yaml -n istioinaction

# envoy 설정 확인 : 재시도 동작(retryOn) 에 5xx 과 retriableStatusCodes 는 408,400 있음.
docker exec -it myk8s-control-plane istioctl proxy-config route deploy/simple-web.istioinaction --name 80 -o json
...
                        "route": {
                            "cluster": "outbound|80||simple-backend.istioinaction.svc.cluster.local",
                            "timeout": "0s",
                            "retryPolicy": {
                                "retryOn": "5xx,retriable-status-codes",
                                "numRetries": 2,
                                "retryHostPredicate": [
                                    {
                                        "name": "envoy.retry_host_predicates.previous_hosts",
                                        "typedConfig": {
                                            "@type": "type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate"
                                        }
                                    }
                                ],
                                "hostSelectionRetryMaxAttempts": "5",
                                "retriableStatusCodes": [
                                    408,
                                    400
                                ],
                                "retryBackOff": {
                                    "baseInterval": "0.050s"
                                }
                            },
                            "maxGrpcTimeout": "0s"
                        }
...

# 호출테스트 : 성공
# simple-backend-1 --(408, retry 성공)--> simple-web --> curl(외부)
for in in {1..10}; do time curl -s http://simple-web.istioinaction.io:30000 | jq .code; done
...

408에러도 200에러로 응답되는것을 알 수 있다.

6.5 Circuit breaking with Istio

서킷 브레이커 구현 방식: 이스티오는 DestinationRule의 connectionPool (최대 연결/요청 수 제한)과 outlierDetection (이상 엔드포인트 감지) 설정을 통해 서킷 브레이커 패턴을 구현합니다.
과부하 방지:
- connectionPool은 동시 연결 수(http1MaxPendingRequests)와 대기 요청 수(maxRequestsPerConnection)를 제한해 과부하 시 503 오류로 즉시 실패 처리하여 연쇄 장애를 차단합니다.
장애 엔드포인트 격리:
- outlierDetection은 연속 오류(consecutive5xxErrors) 발생 시 해당 엔드포인트를 일시적으로 풀에서 제거(ejection)해 시스템 회복 시간을 확보합니다.

서킷 브레이커를 2가지 방식으로 제공을 하는데..

6.5.1 Guarding against slow services wih connection-pool control* : 커넥션 풀 제어로 느린 서비스에 대응하기

# tracing.sampling=100
docker exec -it myk8s-control-plane bash
----------------------------------------
istioctl install --set profile=default --set meshConfig.accessLogFile=/dev/stdout --set meshConfig.defaultConfig.tracing.sampling=100 --set meshConfig.defaultHttpRetryPolicy.attempts=0
y
exit
----------------------------------------

# 확인
kubectl describe cm -n istio-system istio
...
defaultConfig:
  discoveryAddress: istiod.istio-system.svc:15012
  proxyMetadata: {}
  tracing:
    sampling: 100.0
    zipkin:
      address: zipkin.istio-system:9411
...

# 적용 : rollout 
kubectl rollout restart deploy -n istio-system istiod
kubectl rollout restart deploy -n istio-system istio-ingressgateway
kubectl rollout restart deploy -n istioinaction simple-web
kubectl rollout restart deploy -n istioinaction simple-backend-1



# 현재 적용되어 있는 상태
kubectl apply -f ch6/simple-web.yaml -n istioinaction
kubectl apply -f ch6/simple-web-gateway.yaml -n istioinaction
kubectl apply -f ch6/simple-backend-vs-retry-on.yaml -n istioinaction

# destinationrule 삭제
kubectl delete destinationrule --all -n istioinaction

# simple-backend-2 제거
kubectl scale deploy simple-backend-2 --replicas=0 -n istioinaction

# 응답지연(1초)을 발생하는 simple-backend-1 배포
kubectl apply -f ch6/simple-backend-delayed.yaml -n istioinaction

# 동작 중 파드에 env 직접 수정..
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
-----------------------------------
export TIMING_50_PERCENTILE=1000ms
exit
-----------------------------------

# 테스트
curl -s http://simple-web.istioinaction.io:30000 | grep duration              
  "duration": "1.058699s",
      "duration": "1.000934s",

위와같이 실습환경을 구성하고..

fortio로 테스트를 해본다.

# cat ch6/simple-backend-dr-conn-limit.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1 # 커넥션 총 개수 Total number of connections
      http:
        http1MaxPendingRequests: 1 # 대기 중인 요청 Queued requests
        maxRequestsPerConnection: 1 # 커넥션당 요청 개수 Requests per connection
        maxRetries: 1 # Maximum number of retries that can be outstanding to all hosts in a cluster at a given time.
        http2MaxRequests: 1 # 모든 호스트에 대한 최대 동시 요청 개수 Maximum concurrent requests to all hosts

# DestinationRule 적용 (connection-limiting) 
kubectl apply -f ch6/simple-backend-dr-conn-limit.yaml -n istioinaction
kubectl get dr -n istioinaction

#
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction | egrep 'RULE|backend'
SERVICE FQDN                                            PORT      SUBSET     DIRECTION     TYPE             DESTINATION RULE
                                                        8080      -          inbound       ORIGINAL_DST     simple-backend-dr.istioinaction
simple-backend.istioinaction.svc.cluster.local          80        -          outbound      EDS              simple-backend-dr.istioinaction

# 설정 적용 확인
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json
...
        "connectTimeout": "10s",
        "lbPolicy": "LEAST_REQUEST",
        "circuitBreakers": {
            "thresholds": [
                {
                    "maxConnections": 1, # tcp.maxConnections, 커넥션 총 개수 Total number of connections
                    "maxPendingRequests": 1, # http.http1MaxPendingRequests, 대기 중인 요청 Queued requests
                    "maxRequests": 1, # http.http2MaxRequests, 모든 호스트에 대한 최대 동시 요청 개수 
                    "maxRetries": 1, # http.maxRetries
                    "trackRemaining": true
                }
            ]
        },
        "typedExtensionProtocolOptions": {
            "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
                "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
                "commonHttpProtocolOptions": { 
                    "maxRequestsPerConnection": 1 # http.maxRequestsPerConnection, 커넥션당 요청 개수
...

# (참고) 기본값?
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/istio-ingressgateway.istio-system --fqdn simple-web.istioinaction.svc.cluster.local -o json
...
        "connectTimeout": "10s",
        "lbPolicy": "LEAST_REQUEST",
        "circuitBreakers": {
            "thresholds": [
                {
                    "maxConnections": 4294967295,
                    "maxPendingRequests": 4294967295,
                    "maxRequests": 4294967295,
                    "maxRetries": 4294967295,
                    "trackRemaining": true
...

이스티오의 DestinationRule에서 maxConnections는 대상 호스트당 HTTP1/TCP 커넥션 최대 수를 제한하고, http1MaxPendingRequests는 사용 가능한 커넥션이 없을 때 대기 중인 요청 수를 제한하며, http2MaxRequests(이름과 달리 HTTP1.1/HTTP2 모두 적용)는 호스트별 동시 요청 수를 제어합니다.

이스티오는 기본적으로 프록시 메트릭 카디널리티를 줄이기 위해 통계를 제한하지만, 특정 서비스(예: simple-web)에 대해 상세 통계 수집을 활성화해 서킷 브레이커 동작과 업스트림 서비스(simple-backend) 장애를 명확히 구분할 수 있습니다.

좀더 명확한 테스트를 위해 다음과 같이 설정한다.

# simple-web 디플로이먼트에 sidecar.istio.io/statsInclusionPrefixes="cluster.<이름>" 애너테이션 추가하자
## sidecar.istio.io/statsInclusionPrefixes: cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local
cat ch6/simple-web-stats-incl.yaml | grep statsInclusionPrefixes 
        sidecar.istio.io/statsInclusionPrefixes: "cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local"        
kubectl apply -f ch6/simple-web-stats-incl.yaml -n istioinaction

# 정확한 확인을 위해 istio-proxy stats 카운터 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# simple-web 에 istio-proxy 의 stats 조회
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep overflow
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_pool_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_pending_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_overflow: 0

kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

위와같이 초기화를 진행하고

커넥션 개수와 초당 요청 수를 2로 늘리면 어떨까? 2개의 커넥션에서 요청을 초당 하나씩 보내기 시작해보자.
- 커넥션과 요청이 지정한 임계값(병렬 요청이 너무 많거나 요청이 너무 많이 쌓임)을 충분히 넘겨 서킷 브레이커를 동작시켰음을 확인.

# 2개의 커넥션에서 요청을 초당 하나씩 보내기 : 요청이 17개 실패한 것으로 반환됐다(HTTP 5xx)
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 19 (for perfect keepalive, would be 2)
Code 200 : 30 (63.8 %)
Code 500 : 17 (36.2 %)
All done 47 calls (plus 2 warmup) 925.635 ms avg, 1.5 qps
...

# 로그 확인 : simple-web
kubectl logs -n istioinaction -l app=simple-web -c istio-proxy -f
...
# 오류 요청 (503 Service Unavailable, UO 플래그
## HTTP 503: 서비스가 일시적으로 사용 불가능. Envoy가 업스트림 서버(simple-backend:80)에 요청을 전달하지 못함.
## UO 플래그: "Upstream Overflow"로, Envoy의 서킷 브레이커가 트리거되었거나 최대 연결/요청 제한에 도달했음을 의미.
## upstream_reset_before_response_started{overflow}: 업스트림 서버가 응답을 시작하기 전에 연결이 리셋되었으며, 이는 오버플로우(리소스 제한)로 인함.
[2025-04-22T03:17:24.830Z] "GET // HTTP/1.1" 503 UO upstream_reset_before_response_started{overflow} - "-" 0 81 4 - ...

# 오류 요청 (500 Internal Server Error) : 최종 사용자에게 500 에러 리턴
[2025-04-22T03:17:24.825Z] "GET / HTTP/1.1" 500 - via_upstream - "-" 0 687 11 11 ...
## simple-web 서비스에서 backend 정보를 가져오지 못하여 애플리케이션 레벨 오류 발생
## HTTP 500: 서버 내부 오류. 업스트림 서버(simple-web:30000)가 요청을 처리하는 중 예기치 않은 오류 발생.
## via_upstream: 오류가 Envoy가 아니라 업스트림 서버에서 발생했음을 나타냄.
...

# 통계 확인 : 18개로 +/- 1개 정도는 무시하고 보자. 성능 테스트 실패 갯수(17개)와 아래 통계값이 일치 한다(18-1).
# 큐 대기열이 늘어나 결국 서킷 브레이커를 발동함. 
# fail-fast 동작은 이렇게 보류 중 혹은 병행 요청 갯수가 서킷 브레이커 임계값을 넘어 수행된다.
# The fail-fast behavior comes from those pending or parallel requests exceeding the circuit-breaking thresholds. 
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep overflow
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_overflow: 45
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_pool_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_pending_overflow: 18
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_overflow: 0

kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

병렬로 발생하는 요청(현재 로드테스트 동시 요청 2)을 더 처리하고자 http2MaxRequests(parallel requests)를 늘려보자

# 설정 전 확인
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep maxRequests 
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep maxRequests 
                    "maxRequests": 1,
                    "maxRequestsPerConnection": 1

# http2MaxRequests 조정: 1 → 2, '동시요청 처리개수'를 늘림
kubectl patch destinationrule simple-backend-dr -n istioinaction \
-n istioinaction --type merge --patch \
'{"spec": {"trafficPolicy": {"connectionPool": {"http": {"http2MaxRequests": 2}}}}}'

# 설정 후 확인
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep maxRequests 
                    "maxRequests": 2,
                    "maxRequestsPerConnection": 1

# istio-proxy stats 카운터 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# 로그 확인 : simple-web >> 아래 500(503) 발생 로그 확인
kubectl logs -n istioinaction -l app=simple-web -c istio-proxy -f
... 
## jaeger 에서 Tags 필터링 찾기 : guid:x-request-id=3e1789ba-2fa4-94b6-a782-cfdf0a405e13
[2025-04-22T03:55:22.424Z] "GET / HTTP/1.1" 503 UO upstream_reset_before_response_started{overflow} - "-" 0 81 0 - "172.18.0.1" "fortio.org/fortio-1.69.3" "3e1789ba-2fa4-94b6-a782-cfdf0a405e13" "simple-backend:80" "-" outbound|80||simple-backend.istioinaction.svc.cluster.local - 10.200.1.137:80 172.18.0.1:0 - -
[2025-04-22T03:55:22.410Z] "GET / HTTP/1.1" 500 - via_upstream - "-" 0 688 15 15 "172.18.0.1" "fortio.org/fortio-1.69.3" "3e1789ba-2fa4-94b6-a782-cfdf0a405e13" "simple-web.istioinaction.io:30000" "10.10.0.18:8080" inbound|8080|| 127.0.0.6:43259 10.10.0.18:8080 172.18.0.1:0 outbound_.80_._.simple-web.istioinaction.svc.cluster.local default
...

# 로그 확인 : simple-backend >> 503 에러가 발생하지 않았다??? 
kubectl logs -n istioinaction -l app=simple-backend -c istio-proxy -f


# 2개의 커넥션에서 요청을 초당 하나씩 보내기 : 동시요청 처리개수가 기존 1 에서 2로 증가되어서 거의 대부분 처리했다. >> 참고로 모두 성공 되기도함.
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 3 (for perfect keepalive, would be 2)
Code 200 : 33 (97.1 %)
Code 500 : 1 (2.9 %)
All done 34 calls (plus 2 warmup) 1789.433 ms avg, 1.1 qps
...

# 'cx_overflow: 40' 대비 'rq_pending_overflow: 1' 가 현저히 낮아짐을 확인
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep overflow
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_overflow: 40
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_pool_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_pending_overflow: 1
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_overflow: 0

kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

맥스 리퀘스트를 1개에서 2개로 늘려주니 503 에러가 확연히 줄어들었다.

이번엔 보류 대기열 댑스를 2로 늘리고 실행해본다.

# http1MaxPendingRequests : 1 → 2, 'queuing' 개수를 늘립니다
kubectl patch destinationrule simple-backend-dr \
-n istioinaction --type merge --patch \
'{"spec": {"trafficPolicy": {"connectionPool": {"http": {"http1MaxPendingRequests": 2}}}}}'

#
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep maxPendingRequests 
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json | grep maxPendingRequests           
                    "maxPendingRequests": 2,
                    
# istio-proxy stats 카운터 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters


# 2개의 커넥션에서 요청을 초당 하나씩 보내기 : 모두 성공!
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 2 (for perfect keepalive, would be 2) # 큐 길이 증가 덕분에, 소켓을 2개만 사용했다.
Code 200 : 33 (100.0 %)
All done 33 calls (plus 2 warmup) 1846.745 ms avg, 1.1 qps
...

# 'cx_overflow가 45이 발생했지만, upstream_rq_pending_overflow 는 이다!
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep overflow
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_overflow: 45
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_cx_pool_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_pending_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_overflow: 0

kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

이제 500에러는 발생하지 않는다.

Istio 서킷 브레이커가 발동되면 x-envoy-overloaded 헤더를 통해 요청 실패 원인이 서킷 브레이커 임계값 초과임을 식별할 수 있으며, 이를 통해 애플리케이션/네트워크 장애와 구분합니다.

# 
kubectl patch destinationrule simple-backend-dr \
-n istioinaction --type merge --patch \
'{"spec": {"trafficPolicy": {"connectionPool": {"http": {"http1MaxPendingRequests": 1}}}}}'

kubectl patch destinationrule simple-backend-dr -n istioinaction \
-n istioinaction --type merge --patch \
'{"spec": {"trafficPolicy": {"connectionPool": {"http": {"http2MaxRequests": 1}}}}}'

# 설정 적용 확인
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-backend-1.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json

# istio-proxy stats 카운터 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# 로드 테스트
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000

# 로드 테스트 하는 상태에서 아래 curl 접속 
curl -v http://simple-web.istioinaction.io:30000
{
  "name": "simple-web",
  "uri": "/",
  "type": "HTTP",
  "ip_addresses": [
    "10.10.0.18"
  ],
  "start_time": "2025-04-22T04:23:50.468693",
  "end_time": "2025-04-22T04:23:50.474941",
  "duration": "6.247ms",
  "body": "Hello from simple-web!!!",
  "upstream_calls": [
    {
      "uri": "http://simple-backend:80/",
      "headers": {
        "Content-Length": "81",
        "Content-Type": "text/plain",
        "Date": "Tue, 22 Apr 2025 04:23:50 GMT",
        "Server": "envoy",
        "X-Envoy-Overloaded": "true" # Header indication
      },
      "code": 503,
      "error": "Error processing upstream request: http://simple-backend:80//, expected code 200, got 503"
    }
  ],
  "code": 500
}

6.5.2 Guarding against unhealthy services with outlier detection* : 이상값 감지로 비정상 서비스에 대응하기

Istio는 엔보이의 이상값 감지(Outlier Detection) 기능을 활용해 오동작하는 호스트를 서비스에서 자동 제거하여 시스템 안정성을 유지합니다.

- 실습 환경 초기화
    - 동작을 살펴보기 위해 이스티오의 **기본 재시도 메커니즘도 비활성화** 한다.
    - 재시도와 이상값 감지는 잘 어울리지만, 이 예제에서는 **이상값 감지 기능을 고립**시키려고 한다.
    - 재시도는 마지막에 추가해서 이상값 감지와 재시도가 서로 어떻게 보완하는지 확인해본다.

#
kubectl delete destinationrule --all -n istioinaction
kubectl delete vs simple-backend-vs -n istioinaction

# disable retries (default) : 이미 적용 되어 있음
docker exec -it myk8s-control-plane bash
----------------------------------------
istioctl install --set profile=default --set meshConfig.defaultHttpRetryPolicy.attempts=0
y
exit
----------------------------------------

#
kubectl apply -f ch6/simple-backend.yaml -n istioinaction
kubectl apply -f ch6/simple-web-stats-incl.yaml -n istioinaction # 통계 활성화

# istio-proxy stats 카운터 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# 호출 테스트 : 모두 성공
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000

# 확인
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream
 
 
 
 
 
#
kubectl apply -n istioinaction -f ch6/simple-backend-periodic-failure-500.yaml
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- env | grep ERROR

#
kubectl exec -it deploy/simple-backend-1 -n istioinaction -- sh
---------------------------------------------------------------
export ERROR_TYPE=http_error
export ERROR_RATE=0.75
export ERROR_CODE=500
exit
---------------------------------------------------------------

# 정보 확인
kubectl get deploy,pod -n istioinaction -o wide
NAME                               READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS       IMAGES                                 SELECTOR
deployment.apps/simple-backend-1   1/1     1            1           20h   simple-backend   nicholasjackson/fake-service:v0.14.1   app=simple-backend
deployment.apps/simple-backend-2   2/2     2            2           20h   simple-backend   nicholasjackson/fake-service:v0.17.0   app=simple-backend
deployment.apps/simple-web         1/1     1            1           21h   simple-web       nicholasjackson/fake-service:v0.17.0   app=simple-web

NAME                                   READY   STATUS    RESTARTS   AGE     IP           NODE                  NOMINATED NODE   READINESS GATES
pod/simple-backend-1-bdb6c7ff8-rqqlr   2/2     Running   0          2m25s   10.10.0.30   myk8s-control-plane   <none>           <none>
pod/simple-backend-2-6799f8bf-d4b6t    2/2     Running   0          11m     10.10.0.27   myk8s-control-plane   <none>           <none>
pod/simple-backend-2-6799f8bf-dk78j    2/2     Running   0          11m     10.10.0.29   myk8s-control-plane   <none>           <none>
pod/simple-web-865f4949ff-56kbq        2/2     Running   0          3h32m   10.10.0.18   myk8s-control-plane   <none>           <none>


# 로드 테스트 실행 : 재시도를 끄고, backend-1 엔드포인트에 주기적인 실패를 설정했으니, 테스트 일부는 실패
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 19 (for perfect keepalive, would be 2)
Code 200 : 43 (71.7 %)
Code 500 : 17 (28.3 %)
All done 60 calls (plus 2 warmup) 134.138 ms avg, 2.0 qps
...

# 통계 확인
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

설정에서처럼 약 75프로 가량 실패하는것을 확인할 수 있다.

정기적으로 실패하는 서비스에 요청을 보내고 있는데 서비스의 다른 엔드포인트들은 실패하지 않고 있다면, 해당 엔드포인트가 과부하됐거나 어떤 이유로든 성능이 저하된 상태일 수 있으므로 당분간 그 엔드포인트로 트래픽을 전송하는 것을 멈춰야 한다.
이상값 감지를 설정해보자 : 기존 오류율 대비 극적으로 감소. 오동작하는 엔드포인트를 잠시 제거했기 때문이다. - Docs
- consecutive5xxErrors: 잘못된 요청이 하나만 발생해도 이상값 감지가 발동. 기본값 5, 연속적인 에러 횟수 threshold
- interval: 이스티오 서비스 프록시가 체크하는 주기. 기본값 10초. Time interval between ejection sweep analysis
- baseEjectionTime: 서비스 엔드포인트에서 제거된다면, 제거 시간은 n(해당 엔드포인트가 쫓겨난 횟수) * baseEjectionTime.
  - 해당 시간이 지나면 로드 밸런싱 풀에 다시 추가됨. 기본값 30초.
- maxEjectionPercent: 로드 밸런싱 풀에서 제거 가능한 호스트 개수(%).
  - 100% 설정 시모든 호스트가 오동작하면 어떤 요청도 통과 못함(회로가 열린 것과 같다). 기본값 10%

#
cat ch6/simple-backend-dr-outlier-5s.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-backend-dr
spec:
  host: simple-backend.istioinaction.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 1 # 잘못된 요청이 하나만 발생해도 이상값 감지가 발동. 기본값 5
      interval: 5s # 이스티오 서비스 프록시가 체크하는 주기. 기본값 10초. Time interval between ejection sweep analysis
      baseEjectionTime: 5s # 서비스 엔드포인트에서 제거된다면, 제거 시간은 n(해당 엔드포인트가 쫓겨난 횟수) * baseEjectionTime. 해당 시간이 지나면 로드 밸런싱 풀에 다시 추가됨. 기본값 30초. 
      maxEjectionPercent: 100 # 로드 밸런싱 풀에서 제거 가능한 호스트 개수(%). 모든 호스트가 오동작하면 어떤 요청도 통과 못함(회로가 열린 것과 같다). 기본값 10%

kubectl apply -f ch6/simple-backend-dr-outlier-5s.yaml -n istioinaction
kubectl get dr -n istioinaction

#
docker exec -it myk8s-control-plane istioctl proxy-config cluster deploy/simple-web.istioinaction --fqdn simple-backend.istioinaction.svc.cluster.local -o json
...
        "outlierDetection": {
            "consecutive5xx": 1,
            "interval": "5s",
            "baseEjectionTime": "5s",
            "maxEjectionPercent": 100,
            "enforcingConsecutive5xx": 100,
            "enforcingSuccessRate": 0
        },
...

docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local'
ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.27:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.29:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.30:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local



# 통계 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# 엔드포인트 모니터링 먼저 해두기 : 신규 터미널
	while true; do docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' ; date; sleep 1; echo; done
	ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.27:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.29:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.30:8080     HEALTHY     FAILED            outbound|80||simple-backend.istioinaction.svc.cluster.local


# 로드 테스트 실행 : 기존 오류율 대비 극적으로 감소. 오동작하는 엔드포인트를 잠시 제거했기 때문이다.
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 5 (for perfect keepalive, would be 2)
Code 200 : 58 (96.7 %)
Code 500 : 2 (3.3 %)
All done 60 calls (plus 2 warmup) 166.592 ms avg, 2.0 qps
...

# 통계 확인
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream

# 엔드포인트 이상 감지 전에 3번 실패했고, 이상 상태가 되고 나면 로드 밸런서 풀에서 제거되어서 이후 부터는 정상 엔드포인트로 호출 응답됨.
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep outlier
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_active: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_consecutive_5xx: 3
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_consecutive_5xx: 3
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_consecutive_gateway_failure: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_consecutive_local_origin_failure: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_failure_percentage: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_local_origin_failure_percentage: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_local_origin_success_rate: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_detected_success_rate: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_consecutive_5xx: 3
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_consecutive_gateway_failure: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_consecutive_local_origin_failure: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_failure_percentage: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_local_origin_failure_percentage: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_local_origin_success_rate: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_success_rate: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_enforced_total: 3
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_success_rate: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.outlier_detection.ejections_total: 3


# 5초 후(baseEjectionTime: 5s) 다시 엔드포인트 모니터링
ENDPOINT            STATUS      OUTLIER CHECK     CLUSTER
10.10.0.27:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.29:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local
10.10.0.30:8080     HEALTHY     OK                outbound|80||simple-backend.istioinaction.svc.cluster.local

위와같이 오류율이 확 떨어진것을 알 수 있다.

오류율이 2개가 뜨는 이유는 ? 처음 2개 실패 후 이후부터 성공한것이기 때문이다.

오류율을 더 떨어트리려면 ?

#
cat ch6/simple-backend-vs-retry-500.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-backend-vs
spec:
  hosts:
  - simple-backend
  http:
  - route:
    - destination:
        host: simple-backend
    retries:
      attempts: 2
      retryOn: 5x

kubectl apply -f ch6/simple-backend-vs-retry-500.yaml -n istioinaction

# 통계 초기화
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
-- curl -X POST localhost:15000/reset_counters

# 엔드포인트 모니터링 먼저 해두기 : 신규 터미널
while true; do docker exec -it myk8s-control-plane istioctl proxy-config endpoint deploy/simple-web.istioinaction --cluster 'outbound|80||simple-backend.istioinaction.svc.cluster.local' ; date; sleep 1; echo; done

# 로드 테스트 실행 : 모두 성공!
fortio load -quiet -jitter -t 30s -c 2 -qps 2 --allow-initial-errors http://simple-web.istioinaction.io:30000
...
Sockets used: 2 (for perfect keepalive, would be 2)
Code 200 : 60 (100.0 %)
All done 60 calls (plus 2 warmup) 173.837 ms avg, 2.0 qps
...

# 엔드포인트 이상 감지 전에 3번 실패했지만, 재시도 retry 덕분에 결과적으로 모두 성공!
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend | grep outlier

# 통계 확인
kubectl exec -it deploy/simple-web -c istio-proxy -n istioinaction \
 -- curl localhost:15000/stats | grep simple-backend.istioinaction.svc.cluster.local.upstream | grep retry
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry: 4
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_backoff_exponential: 4
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_backoff_ratelimited: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_limit_exceeded: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_overflow: 0
cluster.outbound|80||simple-backend.istioinaction.svc.cluster.local.upstream_rq_retry_success: 4

6장 요약

로드 밸런싱 알고리즘: DestinationRule을 통해 ROUND_ROBIN, RANDOM, LEAST_CONN 등의 알고리즘 선택 가능하며, 기본값은 LEAST_CONN(최소 연결 우선)입니다.
지역 기반 트래픽 제어:
- Locality-aware LB: 동일 리전/영역 내 엔드포인트 우선 라우팅 (outlierDetection 활성화 필수).
- 가중치 분배: DestinationRule의 distribute 설정으로 특정 지역에 트래픽 비율 할당 가능(예: 70% 로컬, 30% 타 영역).
복원력 설정:
- 재시도/타임아웃: VirtualService에서 retries.attempts, timeout 지정.
- 서킷 브레이커: DestinationRule의 connectionPool+outlierDetection으로 연결 수/에러 임계값 관리.
- EnvoyFilter: Istio 미지원 기능(예: 백오프 시간 커스텀)을 Envoy 프록시 레벨에서 확장 적용.

저작자표시 (새창열림)

안녕하세요.

istio-3-2

6.1 Building resilience into the application

6.2 Client-side load balancing 클라이언트 측 로드 밸런싱 (실습)

6.3 Locality-aware load balancing 지역 인식 로드 밸런싱 (실습)

+ Recent posts

티스토리툴바