[k8s source code analysis][kube-scheduler] scheduler/algorithm: predicates

1. Preface

If you repost this article, please credit the original source and respect the author's work!

This article analyzes the predicate (filtering) functions of the scheduler. The relevant files are pkg/scheduler/algorithm/predicates/predicates.go and pkg/scheduler/algorithm/types.go.
Source: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)

2. Predicate definition

type FitPredicate func(pod *v1.Pod, meta PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []PredicateFailureReason, error)
// PredicateMetadataProducer is a function that computes predicate metadata for a given pod.
type PredicateMetadataProducer func(pod *v1.Pod, nodeNameToInfo map[string]*schedulercache.NodeInfo) PredicateMetadata

// PredicateMetadata interface represents anything that can access a predicate metadata.
type PredicateMetadata interface {
    ShallowCopy() PredicateMetadata
    AddPod(addedPod *v1.Pod, nodeInfo *schedulercache.NodeInfo) error
    RemovePod(deletedPod *v1.Pod) error
}

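As the definitions show, FitPredicate is the predicate function type.
pod: the pod to be scheduled.
meta: a PredicateMetadata. It can be ignored for now because it does not affect how the predicates work; it is analyzed later.
nodeInfo: information about the candidate node.
The function returns whether the pod fits on the node described by nodeInfo. If it does not fit, the failure reasons are returned as well.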
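To make the contract concrete, here is a minimal sketch of how a list of predicates could be run against one node. It is not the scheduler's code: Pod, NodeInfo, matchNodeName and podFits are hypothetical, simplified stand-ins, the metadata argument is dropped, and failure reasons are plain strings.

```go
package main

import "fmt"

// Pod and NodeInfo are hypothetical stand-ins for *v1.Pod and
// *schedulercache.NodeInfo; they carry only what this sketch needs.
type Pod struct {
	Name     string
	NodeName string
}

type NodeInfo struct {
	Name string
}

// FitPredicate follows the shape of the real type, without the metadata
// argument and with strings instead of PredicateFailureReason.
type FitPredicate func(pod *Pod, node *NodeInfo) (bool, []string, error)

// matchNodeName is a toy predicate: the pod fits if it names no node
// or names exactly this node.
func matchNodeName(pod *Pod, node *NodeInfo) (bool, []string, error) {
	if pod.NodeName == "" || pod.NodeName == node.Name {
		return true, nil, nil
	}
	return false, []string{"node name does not match"}, nil
}

// podFits runs every predicate against one node and collects all
// failure reasons. An error from any predicate aborts the check.
func podFits(pod *Pod, node *NodeInfo, preds []FitPredicate) (bool, []string, error) {
	var reasons []string
	for _, p := range preds {
		fit, rs, err := p(pod, node)
		if err != nil {
			return false, nil, err
		}
		if !fit {
			reasons = append(reasons, rs...)
		}
	}
	return len(reasons) == 0, reasons, nil
}

func main() {
	pod := &Pod{Name: "demo", NodeName: "node-a"}
	for _, n := range []*NodeInfo{{Name: "node-a"}, {Name: "node-b"}} {
		fit, reasons, err := podFits(pod, n, []FitPredicate{matchNodeName})
		fmt.Println(n.Name, fit, reasons, err)
	}
}
```

The point of the sketch is the three-value return: a boolean verdict, the reasons for a negative verdict, and an error reserved for cases where the check itself could not be performed.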

Next we look at a few common predicates. They were already touched on in "[k8s source code analysis][kube-scheduler] scheduler/algorithmprovider: registering default-scheduler", so here we simply pick a few simple ones from pkg/scheduler/algorithmprovider/defaults/defaults.go.

3. Predicates

3.1 PodFitsHostPorts

Checks whether the host ports already in use on the node conflict with the host ports requested by the pod.

func PodFitsHostPorts(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
    var wantPorts []*v1.ContainerPort
    // If meta can be asserted to *predicateMetadata, take the ports from it.
    // meta needs little attention here: when it is non-nil it only holds values precomputed from this pod.
    if predicateMeta, ok := meta.(*predicateMetadata); ok {
        wantPorts = predicateMeta.podPorts
    } else {
        // We couldn't parse metadata - fallback to computing it.
        wantPorts = schedutil.GetContainerPorts(pod)
    }
    if len(wantPorts) == 0 {
        return true, nil, nil
    }

    // Get the host ports already in use on this node.
    existingPorts := nodeInfo.UsedPorts()

    // Check whether existingPorts and wantPorts conflict.
    if portsConflict(existingPorts, wantPorts) {
        return false, []algorithm.PredicateFailureReason{ErrPodNotFitsHostPorts}, nil
    }

    return true, nil, nil
}
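The body of portsConflict is not shown in this article. The following is a simplified sketch of the idea, under stated assumptions: hostPort, anyIP, conflicts and portsConflictSketch are hypothetical names, used ports are kept in a plain slice, and two entries are assumed to conflict when protocol and port are equal and the IPs are either equal or one of them is the wildcard address 0.0.0.0.

```go
package main

import "fmt"

// hostPort is a hypothetical, simplified description of one host port.
type hostPort struct {
	IP       string
	Protocol string
	Port     int32
}

// anyIP is the wildcard bind address; an empty IP is treated the same way
// in this sketch.
const anyIP = "0.0.0.0"

func normalizeIP(ip string) string {
	if ip == "" {
		return anyIP
	}
	return ip
}

// conflicts reports whether want collides with any entry in used.
func conflicts(used []hostPort, want hostPort) bool {
	if want.Port <= 0 {
		return false // the container does not request a host port
	}
	wantIP := normalizeIP(want.IP)
	for _, u := range used {
		if u.Protocol != want.Protocol || u.Port != want.Port {
			continue
		}
		uIP := normalizeIP(u.IP)
		if uIP == wantIP || uIP == anyIP || wantIP == anyIP {
			return true
		}
	}
	return false
}

// portsConflictSketch returns true as soon as one wanted port collides.
func portsConflictSketch(used, want []hostPort) bool {
	for _, w := range want {
		if conflicts(used, w) {
			return true
		}
	}
	return false
}

func main() {
	used := []hostPort{{IP: "0.0.0.0", Protocol: "TCP", Port: 8080}}
	want := []hostPort{{IP: "10.0.0.1", Protocol: "TCP", Port: 8080}}
	fmt.Println(portsConflictSketch(used, want))
}
```

In this sketch a single collision is enough to reject the node, which matches how PodFitsHostPorts returns ErrPodNotFitsHostPorts on the first conflict.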

3.2 PodFitsResources

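Checks whether the node has enough resources for the pod. The comparison is based on resource requests: the pod's request plus the requests already placed on the node must not exceed the node's allocatable amount.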

func PodFitsResources(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
    node := nodeInfo.Node()
    if node == nil {
        return false, nil, fmt.Errorf("node not found")
    }

    var predicateFails []algorithm.PredicateFailureReason
    allowedPodNumber := nodeInfo.AllowedPodNumber()
    // Record a failure if adding this pod would exceed the number of pods the node allows.
    if len(nodeInfo.Pods())+1 > allowedPodNumber {
        predicateFails = append(predicateFails, NewInsufficientResourceError(v1.ResourcePods, 1, int64(len(nodeInfo.Pods())), int64(allowedPodNumber)))
    }

    // No extended resources should be ignored by default.
    ignoredExtendedResources := sets.NewString()

    var podRequest *schedulercache.Resource
    if predicateMeta, ok := meta.(*predicateMetadata); ok {
        podRequest = predicateMeta.podRequest
        if predicateMeta.ignoredExtendedResources != nil {
            ignoredExtendedResources = predicateMeta.ignoredExtendedResources
        }
    } else {
        // We couldn't parse metadata - fallback to computing it.
        podRequest = GetResourceRequest(pod)
    }
    if podRequest.MilliCPU == 0 &&
        podRequest.Memory == 0 &&
        podRequest.EphemeralStorage == 0 &&
        len(podRequest.ScalarResources) == 0 {
        return len(predicateFails) == 0, predicateFails, nil
    }

    allocatable := nodeInfo.AllocatableResource()
    // Check CPU; all comparisons are based on requests.
    if allocatable.MilliCPU < podRequest.MilliCPU+nodeInfo.RequestedResource().MilliCPU {
        predicateFails = append(predicateFails, NewInsufficientResourceError(v1.ResourceCPU, podRequest.MilliCPU, nodeInfo.RequestedResource().MilliCPU, allocatable.MilliCPU))
    }
    // Check memory; all comparisons are based on requests.
    if allocatable.Memory < podRequest.Memory+nodeInfo.RequestedResource().Memory {
        predicateFails = append(predicateFails, NewInsufficientResourceError(v1.ResourceMemory, podRequest.Memory, nodeInfo.RequestedResource().Memory, allocatable.Memory))
    }
    if allocatable.EphemeralStorage < podRequest.EphemeralStorage+nodeInfo.RequestedResource().EphemeralStorage {
        predicateFails = append(predicateFails, NewInsufficientResourceError(v1.ResourceEphemeralStorage, podRequest.EphemeralStorage, nodeInfo.RequestedResource().EphemeralStorage, allocatable.EphemeralStorage))
    }

    // Check scalar resources, including extended resources such as those registered through a device plugin.
    for rName, rQuant := range podRequest.ScalarResources {
        if v1helper.IsExtendedResourceName(rName) {
            // If this resource is one of the extended resources that should be
            // ignored, we will skip checking it.
            if ignoredExtendedResources.Has(string(rName)) {
                continue
            }
        }
        if allocatable.ScalarResources[rName] < rQuant+nodeInfo.RequestedResource().ScalarResources[rName] {
            predicateFails = append(predicateFails, NewInsufficientResourceError(rName, podRequest.ScalarResources[rName], nodeInfo.RequestedResource().ScalarResources[rName], allocatable.ScalarResources[rName]))
        }
    }

    if klog.V(10) {
        if len(predicateFails) == 0 {
            // We explicitly don't do klog.V(10).Infof() to avoid computing all the parameters if this is
            // not logged. There is visible performance gain from it.
            klog.Infof("Schedule Pod %+v on Node %+v is allowed, Node is running only %v out of %v Pods.",
                podName(pod), node.Name, len(nodeInfo.Pods()), allowedPodNumber)
        }
    }
    return len(predicateFails) == 0, predicateFails, nil
}
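The core comparison in PodFitsResources can be reduced to a small worked example. This is a sketch only: the resource type and the insufficient function are hypothetical, and only CPU and memory are modelled.

```go
package main

import "fmt"

// resource is a hypothetical, trimmed-down counterpart of
// schedulercache.Resource with only CPU and memory.
type resource struct {
	MilliCPU int64
	Memory   int64
}

// insufficient lists the resources for which
// allocatable < podRequest + requested, the same inequality the
// predicate applies to each resource.
func insufficient(allocatable, requested, podRequest resource) []string {
	var fails []string
	if allocatable.MilliCPU < podRequest.MilliCPU+requested.MilliCPU {
		fails = append(fails, "cpu")
	}
	if allocatable.Memory < podRequest.Memory+requested.Memory {
		fails = append(fails, "memory")
	}
	return fails
}

func main() {
	allocatable := resource{MilliCPU: 4000, Memory: 8 << 30} // 4 cores, 8 GiB
	requested := resource{MilliCPU: 3500, Memory: 2 << 30}   // already requested by pods on the node

	// 1000m + 3500m exceeds 4000m, so CPU is insufficient; memory is fine.
	fmt.Println(insufficient(allocatable, requested, resource{MilliCPU: 1000, Memory: 1 << 30}))

	// 500m + 3500m equals 4000m exactly, which still fits.
	fmt.Println(insufficient(allocatable, requested, resource{MilliCPU: 500, Memory: 1 << 30}))
}
```

Note that the comparison uses a strict less-than, so a pod whose request fills the node exactly is still accepted, and that several resources can fail at once, which is why the predicate accumulates reasons instead of returning on the first failure.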

3.3 PodFitsHost (HostNamePred = "HostName")

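Checks whether pod.Spec.NodeName matches the name of the current node.
If pod.Spec.NodeName is empty or equals the current node's name, it returns true.
Otherwise it returns false with the reason ErrPodNotMatchHostName.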

func PodFitsHost(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
    if len(pod.Spec.NodeName) == 0 {
        return true, nil, nil
    }
    node := nodeInfo.Node()
    if node == nil {
        return false, nil, fmt.Errorf("node not found")
    }
    if pod.Spec.NodeName == node.Name {
        return true, nil, nil
    }
    return false, []algorithm.PredicateFailureReason{ErrPodNotMatchHostName}, nil
}

4. Summary

This article briefly covered a few common predicates: PodFitsHostPorts, PodFitsResources, and PodFitsHost (HostName). The goal is simply to understand what kind of work a predicate does.
