[Cloud Computing] Kubernetes Basics and Practice: From Deployment to Operations

南屹川 · Published 2026-05-24 20:32:44

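Introduction

Kubernetes (K8s for short) is the benchmark technology in container orchestration and has become the de facto standard for deploying modern cloud-native applications. It grew out of Google's experience running its internal Borg system in production for many years. Google open-sourced Kubernetes in 2014; version 1.0 followed in 2015, when the project was donated to the newly formed CNCF (Cloud Native Computing Foundation). This article covers Kubernetes core concepts, architecture, core resource objects, scheduling mechanisms, and operations practice, with the goal of taking readers from no prior experience to deploying and operating a production cluster on their own.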

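1. Kubernetes Overview

1.1 What Is Kubernetes

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. Its core features include:

Self-healing: restarts failed containers, and replaces and reschedules Pods when their node becomes unavailable
Horizontal scaling: scales a Deployment with a single command, or automatically based on CPU utilization
Service discovery and load balancing: gives containers a stable network identity and distributes traffic across them
Automatic bin packing: places containers on suitable nodes based on their resource requirements
Configuration and secret management: manages configuration and sensitive data so that it does not leak into images
Storage orchestration: automatically mounts storage systems such as local storage, NFS, and cloud storage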

1.2 Kubernetes Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                        │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                       Control Plane                        │  │
│  │  ┌────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────┐  │  │
│  │  │ API Server │ │ Scheduler │ │ Controller │ │   etcd   │  │  │
│  │  │            │ │           │ │  Manager   │ │          │  │  │
│  │  └────────────┘ └───────────┘ └────────────┘ └──────────┘  │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                 │
│  ┌─────────────────────────────┴──────────────────────────────┐  │
│  │                         Data Plane                         │  │
│  │  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐  │  │
│  │  │     Node 1     │ │     Node 2     │ │     Node 3     │  │  │
│  │  │ ┌────────────┐ │ │ ┌────────────┐ │ │ ┌────────────┐ │  │  │
│  │  │ │  kubelet   │ │ │ │  kubelet   │ │ │ │  kubelet   │ │  │  │
│  │  │ │ kube-proxy │ │ │ │ kube-proxy │ │ │ │ kube-proxy │ │  │  │
│  │  │ └─────┬──────┘ │ │ └─────┬──────┘ │ │ └─────┬──────┘ │  │  │
│  │  │       │        │ │       │        │ │       │        │  │  │
│  │  │ ┌─────▼──────┐ │ │ ┌─────▼──────┐ │ │ ┌─────▼──────┐ │  │  │
│  │  │ │ Container  │ │ │ │ Container  │ │ │ │ Container  │ │  │  │
│  │  │ │  Runtime   │ │ │ │  Runtime   │ │ │ │  Runtime   │ │  │  │
│  │  │ └────────────┘ │ │ └────────────┘ │ │ └────────────┘ │  │  │
│  │  └────────────────┘ └────────────────┘ └────────────────┘  │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

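1.3 Core Components in Detail

Control plane components:

kube-apiserver: the single entry point to the cluster; handles all RESTful API requests
etcd: a highly available key-value store that holds all cluster state
kube-scheduler: schedules Pods by assigning each one to a suitable node
kube-controller-manager: runs the controllers that continuously drive the cluster toward its desired state

Node (worker node) components:

kubelet: the node agent; manages the lifecycle of the containers on its node
kube-proxy: the network proxy; maintains the node's network rules for Services
Container Runtime: the container runtime, such as containerd or CRI-O (since v1.24, Docker Engine needs the cri-dockerd adapter)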
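
Controllers work as reconciliation loops: each one compares the state that should exist with the state that does exist and issues the difference as actions. The following is a minimal Python sketch of that idea, not actual controller code; the `reconcile` function and the Pod names are hypothetical and chosen only for illustration.

```python
def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return the actions needed to move from the observed state to the desired one."""
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few Pods: create the missing ones.
        start = len(running)
        return [f"create pod-{i}" for i in range(start, start + diff)]
    if diff < 0:
        # Too many Pods: delete the surplus, taken from the end of the list.
        return [f"delete {name}" for name in running[diff:]]
    return []  # The observed state already matches the desired state.


if __name__ == "__main__":
    print(reconcile(3, ["pod-0"]))                    # two Pods are missing
    print(reconcile(1, ["pod-0", "pod-1", "pod-2"]))  # two Pods are surplus
```

A real controller repeats this comparison every time the watched objects change, which is why deleting a Pod that belongs to a Deployment simply causes a replacement to appear.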

2. Core Resource Objects

2.1 Pod - the Smallest Schedulable Unit

# Basic Pod definition
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
    environment: production
spec:
  containers:
  - name: nginx
    image: nginx:1.24
    ports:
    - containerPort: 80
      name: http
      protocol: TCP
    - containerPort: 443
      name: https
      protocol: TCP
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"
        cpu: "500m"
    # The stock nginx image serves no /healthz or /ready endpoint,
    # so both probes request "/" here.
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
    env:
    - name: NGINX_HOST
      value: "localhost"
    - name: NGINX_PORT
      value: "80"

---
# Multi-container Pod - sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-log-collector
  labels:
    app: web-app
spec:
  containers:
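  # Main application container
  - name: web-app
    image: myapp:latest
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  # Sidecar log collector (the fluent/fluentd image reads its configuration
  # from /fluentd/etc, using the file named by FLUENTD_CONF)
  - name: log-collector
    image: fluent/fluentd:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
    - name: fluentd-config
      mountPath: /fluentd/etc
    env:
    - name: FLUENTD_CONF
      value: "fluent.conf"  # must be a key of the fluentd-config ConfigMap

  # Sidecar proxy container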
  - name: envoy-proxy
    image: envoyproxy/envoy:v1.20
    ports:
    - containerPort: 15001
    env:
    - name: ENVOY_EDGE_STATS
      value: "true"
  
  volumes:
  - name: shared-logs
    emptyDir: {}
  - name: fluentd-config
    configMap:
      name: fluentd-config
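
The resource fields of `nginx-pod` above use Kubernetes quantity notation: `250m` is 250 millicores (a quarter of one CPU core) and `128Mi` is 128 mebibytes. As a rough illustration, the Python sketch below converts the most common suffixes. `parse_cpu` and `parse_memory` are hypothetical helpers that cover only a subset of the notation Kubernetes accepts.

```python
from decimal import Decimal

# Power-of-two and power-of-ten suffixes used for memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # A plain number is already in bytes.


if __name__ == "__main__":
    print(parse_cpu("250m"), "cores requested")
    print(parse_memory("128Mi"), "bytes requested")
```

The scheduler places a Pod using its requests, while the limits cap what a running container may consume.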

2.2 ReplicaSet and Deployment

# ReplicaSet definition
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80

---
# Deployment definition - recommended for production
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  labels:
    app: web-application
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
        version: v1.0.0
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: web-app
        image: myorg/web-app:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 8443
          name: https
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        - name: REDIS_HOST
          value: "redis-service"
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: log-level
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

2.3 Service and Ingress

# ClusterIP Service - in-cluster access
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  labels:
    app: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: grpc
    port: 50051
    targetPort: 50051
    protocol: TCP

---
# NodePort Service - reachable on a fixed port of every node
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
  - name: http
    port: 80
    targetPort: 3000
    nodePort: 30080
  - name: https
    port: 443
    targetPort: 3001
    nodePort: 30443

---
# LoadBalancer Service - cloud-provider load balancer
apiVersion: v1
kind: Service
metadata:
  name: web-service
  annotations:
    # AWS annotations; the "nlb" type requests a Network Load Balancer, not an ALB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:xxx"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - name: https
    port: 443
    targetPort: 8080

---
# Ingress - HTTP/HTTPS entry point
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
  - hosts:
    - www.example.com
    - api.example.com
    secretName: tls-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-frontend
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-gateway
            port:
              number: 8080
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
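
With `pathType: Prefix`, a path is matched element by element after splitting on `/`, and when several paths match, the longest one takes precedence. The Python sketch below applies that rule to the two paths defined for `www.example.com` above. `prefix_matches` and `pick_backend` are hypothetical names, and a real ingress controller handles many more cases (hosts, exact paths, controller-specific extensions).

```python
from typing import Optional


def _elements(path: str) -> list[str]:
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """True when every element of rule_path is a leading element of request_path."""
    rule = _elements(rule_path)
    return _elements(request_path)[: len(rule)] == rule


def pick_backend(rules: list[tuple[str, str]], request_path: str) -> Optional[str]:
    """Return the service of the longest matching rule, or None if nothing matches."""
    matching = [(path, service) for path, service in rules
                if prefix_matches(path, request_path)]
    if not matching:
        return None
    return max(matching, key=lambda item: len(_elements(item[0])))[1]


# The (path, service) pairs of www.example.com from the manifest above.
WWW_RULES = [("/", "web-frontend"), ("/api", "api-gateway")]

if __name__ == "__main__":
    for path in ("/api/v1/users", "/apiary", "/index.html"):
        print(path, "->", pick_backend(WWW_RULES, path))
```

Because matching works on whole path elements, `/apiary` does not match the `/api` rule and falls through to `/`.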

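2.4 ConfigMap and Secret

# ConfigMap - application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # Single key-value entry, read by web-deployment through configMapKeyRef
  log-level: "info"
  # Properties format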
  database.properties: |
    db.host=postgres-service
    db.port=5432
    db.name=appdb
    db.pool.size=20
  # JSON format
  config.json: |
    {
      "logLevel": "info",
      "features": {
        "newUI": true,
        "betaAPI": false
      },
      "rateLimit": {
        "requests": 100,
        "window": "1m"
      }
    }

---
# Secret - sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
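data:
  # Values under data must be Base64-encoded. Base64 is an encoding,
  # not encryption: anyone who can read the Secret can decode it.
  # echo -n "password123" | base64
  db-password: cGFzc3dvcmQxMjM=
  api-key: c29tZS1hcGkta2V5LWJhc2U2NC1lbmNvZGVk
  # The certificate and key values are truncated here
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t...
stringData:
  # Plain-text values; the API server Base64-encodes them on write
  username: admin
  # Placeholder value for the key that web-deployment reads through secretKeyRef
  database-url: "postgres://app:password123@postgres-service:5432/appdb"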
2.5 PersistentVolume and PersistentVolumeClaim

# PersistentVolume - NFS storage
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
  labels:
    type: nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: nfs-server.example.com
    path: /exported/path

---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-storage
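spec:
  # An empty class disables dynamic provisioning, so the claim binds to a
  # pre-created PV such as nfs-pv rather than to the default StorageClass
  storageClassName: ""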
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  selector:
    matchLabels:
      type: nfs

---
# Pod that uses the PVC
apiVersion: v1
kind: Pod
metadata:
  name: app-with-storage
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: app-data
      mountPath: /data
  volumes:
  - name: app-data
    persistentVolumeClaim:
      claimName: app-storage
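
Binding pairs a claim with a volume that satisfies it. Roughly, the volume must be at least as large as the request, offer every requested access mode, carry the labels named in the claim's selector, and have the same storage class. The Python sketch below checks those conditions for the `nfs-pv` and `app-storage` objects above. It is a simplification with hypothetical dictionary fields, not the binder that the control plane runs, and it treats a missing storage class as the empty string.

```python
def pv_satisfies(pv: dict, pvc: dict) -> bool:
    """Check whether a PersistentVolume could be bound to a claim."""
    return (
        pv["capacity_gi"] >= pvc["request_gi"]
        and set(pvc["access_modes"]) <= set(pv["access_modes"])
        and all(pv["labels"].get(key) == value
                for key, value in pvc["selector"].items())
        and pv.get("storage_class", "") == pvc.get("storage_class", "")
    )


# Simplified views of the manifests above.
NFS_PV = {
    "capacity_gi": 100,
    "access_modes": ["ReadWriteMany"],
    "labels": {"type": "nfs"},
}
APP_STORAGE = {
    "request_gi": 50,
    "access_modes": ["ReadWriteMany"],
    "selector": {"type": "nfs"},
}

if __name__ == "__main__":
    print(pv_satisfies(NFS_PV, APP_STORAGE))
```

Binding is one-to-one: the whole 100Gi volume is bound to the 50Gi claim, and no other claim can use the remainder.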

3. Core Concepts in Detail

3.1 Namespaces

# Namespace definition
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    team: platform
---
# A resource placed in the namespace
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: myorg/web:v1

# Namespace operations
kubectl get namespaces
kubectl create namespace staging
kubectl delete namespace unused-namespace

# List resources in a specific namespace
kubectl get pods -n production
kubectl get all -n production

# Set the default namespace for the current context
kubectl config set-context --current --namespace=production

3.2 Labels and Selectors

# Example resource labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  labels:
    app: api
    version: v2.1.0
    tier: backend
    environment: production
    team: backend
    managed-by: kubectl
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v2.1.0
        tier: backend
        environment: production
    spec:
      containers:
      - name: api
        image: myorg/api:v2.1.0

# Label selectors
kubectl get pods -l "app=api"
kubectl get pods -l "app=api,version=v2.1.0"
kubectl get pods -l "app in (api,web)"
kubectl get pods -l "app notin (api,web)"
kubectl get pods -l "environment=production,tier=backend"
# Single quotes keep an interactive shell from treating "!" as history expansion
kubectl get deployments -l '!release'

# Modify labels
kubectl label pods nginx-pod environment=production
kubectl label pods nginx-pod version=v2 --overwrite
kubectl label pods -l "app=api" team=backend --overwrite
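
A selector is a list of requirements that must all hold for an object's labels. The Python sketch below evaluates the four set-based operators against the Pod template labels of `api-deployment`. The `matches` function and the tuple format are hypothetical, and the equality form `app=api` is written here as `("app", "In", ["api"])`.

```python
def matches(labels: dict[str, str], requirements: list[tuple[str, str, list[str]]]) -> bool:
    """Return True when the labels satisfy every (key, operator, values) requirement."""
    for key, operator, values in requirements:
        if operator == "In":
            ok = labels.get(key) in values
        elif operator == "NotIn":
            # An object that lacks the key also satisfies NotIn.
            ok = labels.get(key) not in values
        elif operator == "Exists":
            ok = key in labels
        elif operator == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {operator}")
        if not ok:
            return False
    return True


API_POD_LABELS = {
    "app": "api",
    "version": "v2.1.0",
    "tier": "backend",
    "environment": "production",
}

if __name__ == "__main__":
    # Equivalent of: -l "app in (api,web)"
    print(matches(API_POD_LABELS, [("app", "In", ["api", "web"])]))
    # Equivalent of: -l '!release'
    print(matches(API_POD_LABELS, [("release", "DoesNotExist", [])]))
```

The same matching logic is what ties a Service or a Deployment to its Pods, which is why a selector that no Pod template satisfies leaves a workload with zero endpoints.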

3.3 Annotations

# Annotations store non-identifying metadata
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  annotations:
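    # Build information
    kubernetes.io/change-cause: "Deployment updated to v2.1.0"
    last-modified-by: "devops-team"
    # Configuration information
    config.example.com/owner: "backend-team"
    config.example.com/support-email: "backend@example.com"
    # Monitoring information (annotation-based Prometheus discovery usually
    # reads these from the Pod template rather than from the Deployment)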
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  # ...

4. Resource Scheduling and Scaling

4.1 HPA - Horizontal Pod Autoscaling

# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # CPU utilization, as a percentage of the Pods' CPU requests
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory utilization, as a percentage of the Pods' memory requests
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom per-Pod metric (requires a custom metrics adapter in the cluster)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

# Check HPA status
kubectl get hpa
kubectl describe hpa web-hpa

# Scale manually
kubectl scale deployment web-deployment --replicas=5
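
For each metric, the HPA controller computes the desired replica count as `ceil(currentReplicas * currentMetricValue / targetMetricValue)`, leaves the count unchanged when that ratio is within a tolerance of 1.0 (0.1 by default), and takes the largest result when several metrics are configured. The Python sketch below applies the formula to a single metric with the bounds of `web-hpa`. It ignores the `behavior` policies, the stabilization windows, and Pods that are not ready; the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Apply the HPA scaling formula for one metric and clamp to the allowed range."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # Close enough to the target: do not scale.
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against the 70% target
    print(desired_replicas(4, 90, 70))
    # 4 replicas averaging 72% CPU: inside the tolerance band
    print(desired_replicas(4, 72, 70))
```

With 4 replicas at 90% average CPU, the ratio is about 1.29, so the controller asks for `ceil(5.14) = 6` replicas.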

4.2 VPA - Vertical Pod Autoscaling

# VerticalPodAutoscaler (a separately installed add-on, not part of core Kubernetes)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Auto"  # one of: Auto, Recreate, Initial, Off
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

4.3 Resource Quotas and Limits

# ResourceQuota - namespace-level resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "10"
    persistentvolumeclaims: "20"

---
# LimitRange - per-container defaults and bounds
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - max:
      cpu: "2"
      memory: 2Gi
    min:
      cpu: 50m
      memory: 64Mi
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 200m
      memory: 256Mi
    type: Container
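
When a container omits resource fields, the LimitRange fills them in at admission time: a missing limit takes the `default` value, and a missing request takes the container's own limit if it set one, otherwise `defaultRequest`. The Python sketch below mirrors that defaulting for `default-limits`. `apply_defaults` is a hypothetical helper and does not enforce the `min` and `max` bounds.

```python
# Values copied from the default-limits LimitRange above.
DEFAULT = {"cpu": "500m", "memory": "512Mi"}
DEFAULT_REQUEST = {"cpu": "200m", "memory": "256Mi"}


def apply_defaults(resources: dict) -> dict:
    """Fill in missing requests and limits the way LimitRange defaulting does."""
    limits = dict(resources.get("limits", {}))
    requests = dict(resources.get("requests", {}))
    for name in DEFAULT:
        had_own_limit = name in limits
        limits.setdefault(name, DEFAULT[name])
        if name not in requests:
            # A container that set only a limit gets a request equal to that limit.
            requests[name] = limits[name] if had_own_limit else DEFAULT_REQUEST[name]
    return {"requests": requests, "limits": limits}


if __name__ == "__main__":
    print(apply_defaults({}))                         # nothing specified
    print(apply_defaults({"limits": {"cpu": "1"}}))   # only a CPU limit specified
```

Defaulting matters for quotas too: once a ResourceQuota covers `requests.cpu` or `limits.memory` in a namespace, Pods without those fields are rejected unless a LimitRange supplies them.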

5. Operations Practice

5.1 Rolling Updates and Rollbacks

# Rolling update
kubectl set image deployment/web-deployment web=myorg/web:v2.0.0
kubectl rollout status deployment/web-deployment

# View rollout history
kubectl rollout history deployment/web-deployment
kubectl rollout history deployment/web-deployment --revision=3

# Roll back to the previous revision
kubectl rollout undo deployment/web-deployment

# Roll back to a specific revision
kubectl rollout undo deployment/web-deployment --to-revision=2

# Deployment update strategy in detail
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
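      maxSurge: 1          # Pods allowed above the desired replica count during the update
      maxUnavailable: 0    # Pods allowed to be unavailable (0 keeps full capacity throughout)

  # These settings also shape the rollout
  minReadySeconds: 30      # how long a new Pod must stay Ready before it counts as available
  progressDeadlineSeconds: 600  # seconds without progress before the rollout is reported as failed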
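
`maxSurge` and `maxUnavailable` accept either an absolute number or a percentage of the desired replica count; a percentage is rounded up for `maxSurge` and down for `maxUnavailable`. The Python sketch below turns both settings into the highest total Pod count and the lowest available Pod count permitted during a rollout. The helper names are hypothetical.

```python
import math
from typing import Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Resolve an absolute number or a percentage string such as '25%' to a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int, max_surge: Union[int, str],
                   max_unavailable: Union[int, str]) -> tuple[int, int]:
    """Return (maximum total Pods, minimum available Pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(5, 1, 0))             # the web-deployment settings
    print(rollout_bounds(10, "25%", "25%"))    # the Kubernetes defaults on 10 replicas
```

With `maxSurge: 1` and `maxUnavailable: 0` on 5 replicas, the rollout never drops below 5 available Pods and never exceeds 6 in total, so it replaces one Pod at a time.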

5.2 Taints and Tolerations

# Taint nodes
kubectl taint nodes node1 dedicated=gpu:NoSchedule
kubectl taint nodes node1 special=true:PreferNoSchedule
kubectl taint nodes node1 maintenance=true:NoExecute --overwrite

# View taints
kubectl describe node node1 | grep -A5 Taints

# Tolerations that allow Pods onto the node tainted above
# (dedicated=gpu:NoSchedule and special=true:PreferNoSchedule)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-training
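  template:
    metadata:
      labels:
        app: ml-training  # must match spec.selector
    spec:
      tolerations:
      # Matches the taint dedicated=gpu:NoSchedule exactly
      - key: "dedicated"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      # Matches a "dedicated" taint with any value
      - key: "dedicated"
        operator: "Exists"
        effect: "NoSchedule"
      # Matches a "special" taint with any effect
      - key: "special"
        operator: "Exists"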
      nodeSelector:
        gpu: "true"
      containers:
      - name: training
        image: ml-training:latest
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
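
A toleration matches a taint when the keys are equal, the effects are equal (a toleration without an effect matches every effect), and either the operator is `Exists` or the values are equal. A Pod can be scheduled onto a node only if each of the node's `NoSchedule` and `NoExecute` taints is matched by at least one toleration; `PreferNoSchedule` is a soft preference. The Python sketch below encodes those rules with hypothetical function names and is not the scheduler's own code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True when a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key", "")
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists matches every taint.
        return key == "" or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations: list[dict], taints: list[dict]) -> bool:
    """Return True when every hard taint on the node is tolerated."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


# Two of the tolerations from the ml-training Deployment above.
TOLERATIONS = [
    {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"},
    {"key": "special", "operator": "Exists"},
]
GPU_TAINT = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
SPECIAL_TAINT = {"key": "special", "value": "true", "effect": "PreferNoSchedule"}
MAINTENANCE_TAINT = {"key": "maintenance", "value": "true", "effect": "NoExecute"}

if __name__ == "__main__":
    print(can_schedule(TOLERATIONS, [GPU_TAINT, SPECIAL_TAINT]))
    print(can_schedule(TOLERATIONS, [GPU_TAINT, MAINTENANCE_TAINT]))
```

Tolerations only permit scheduling onto a tainted node; the `nodeSelector` in the manifest is what actually steers the Pod toward the GPU nodes.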

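5.3 Affinity and Anti-Affinity

# Pod anti-affinity - spread replicas across nodes
# (at most one replica per node, so 6 replicas need at least 6 schedulable nodes)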
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cluster
spec:
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values: ["redis"]
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379

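---
# Pod affinity - co-locate with web-app Pods on the same node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: logging-agent
  template:
    metadata:
      labels:
        app: logging-agent
    spec: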
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values: ["web-app"]
            topologyKey: "kubernetes.io/hostname"
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values: ["logging-agent"]
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: fluentd
        image: fluent/fluentd:latest
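
A required anti-affinity rule with `topologyKey: kubernetes.io/hostname` allows at most one matching Pod per node, so the `redis-cluster` Deployment above can run only as many replicas as there are eligible nodes, and the remaining Pods stay Pending. The Python sketch below makes that consequence concrete. It is a toy placement loop with hypothetical names, not the scheduler's algorithm.

```python
def place_one_per_node(replicas: int, nodes: list[str]) -> tuple[dict[str, str], list[str]]:
    """Assign each replica to its own node; replicas without a free node stay pending."""
    free_nodes = list(nodes)
    placed: dict[str, str] = {}
    pending: list[str] = []
    for index in range(replicas):
        pod = f"redis-{index}"
        if free_nodes:
            placed[pod] = free_nodes.pop(0)
        else:
            pending.append(pod)  # Required anti-affinity cannot be satisfied.
    return placed, pending


if __name__ == "__main__":
    placed, pending = place_one_per_node(6, ["node1", "node2", "node3", "node4"])
    print("placed:", placed)
    print("pending:", pending)
```

Switching to `preferredDuringSchedulingIgnoredDuringExecution`, as the `logging-agent` example does for its anti-affinity term, lets the surplus Pods share nodes instead of waiting.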

5.4 Scheduler Configuration

# Pod priority and preemption
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "High priority for production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: true
description: "Default priority for batch jobs"
---
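# Using a priority class
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-service
spec:
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec: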
      priorityClassName: high-priority
      containers:
      - name: app
        image: myapp:latest

# Start the scheduler with a configuration file
kube-scheduler --config=/etc/kubernetes/scheduler-config.yaml

# Label nodes so that Pods can target them with nodeSelector or node affinity
kubectl label nodes node1 zone=primary
kubectl label nodes node2 zone=secondary
kubectl label nodes node1 disk-type=ssd
kubectl label nodes node2 disk-type=hdd

5.5 Cluster Monitoring and Logging

# Prometheus monitoring (ServiceMonitor is a Prometheus Operator custom resource)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  labels:
    team: platform
spec:
  selector:
    matchLabels:
      app: web
  endpoints:
  - port: metrics
    path: /metrics
    interval: 15s
  namespaceSelector:
    matchNames:
    - production

---
# Log collection - Fluentd configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    
    <filter kubernetes.**>
      @type kubernetes_metadata
      @id kubernetes_metadata
    </filter>
    
    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc
      port 9200
      logstash_format true
      logstash_prefix kubernetes
    </match>

6. Production Best Practices

6.1 High-Availability Architecture

# High-availability Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
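  template:
    metadata:
      labels:
        app: web  # must match spec.selector
    spec:
      affinity:
        # Anti-affinity keeps every replica on a different node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
        # Node affinity prefers nodes in these zones; it does not by itself
        # spread replicas evenly across them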
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - zone-a
                - zone-b
                - zone-c
      containers:
      - name: web
        image: myorg/web:v1
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
      terminationGracePeriodSeconds: 60

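6.2 Disaster Recovery

# Backup strategy
# 1. etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 2. Restore the snapshot into a new data directory, then point etcd at it
#    (etcd 3.5 and later deprecate this subcommand in favor of etcdutl snapshot restore)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd/restore

# 3. Export resources ("get all" covers only common workload types, so
#    ConfigMaps, Secrets, and other kinds must be exported explicitly)
kubectl get all --all-namespaces -o yaml > all-resources.yaml
kubectl get configmaps -n production -o yaml > configmaps.yaml
# Exported Secrets are only Base64-encoded; store this file securely
kubectl get secrets -n production -o yaml > secrets.yaml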

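Summary

As core infrastructure of the cloud-native era, Kubernetes provides powerful container orchestration capabilities. Starting from the core concepts, this article covered the core resource objects (Pod, Deployment, Service, ConfigMap, Secret, and others) along with scheduling mechanisms, operations practice, and recommended configurations.

Mastering Kubernetes takes both theory and practice. Readers are encouraged to:

Practice hands-on: set up a local cluster (Minikube or kind) and experiment
Dig into the principles: understand the design philosophy and architecture of Kubernetes
Focus on production: learn operational skills such as high-availability deployment, monitoring, and alerting
Keep learning: follow the CNCF ecosystem and new Kubernetes releases

This article is intended to give readers a systematic guide for their Kubernetes learning journey.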
