
Multi-slb related bug fixes #7432

Open: nilo19 wants to merge 2 commits into master from fix/multi-slb/endpointslice


@nilo19 (Contributor) commented Oct 29, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

  1. All EndpointSlices of a local service should be included in the local backend pool updater, instead of only the first EndpointSlice.
  2. In some rare cases, a migration from a NIC-based to an IP-based LB can be left in an intermediate state where the NIC references have been removed but the corresponding IPConfigs in the backend pool have not. In this case, we should manually exclude those IPConfigs from the request body.
  3. localServiceOwnsBackendPool should compare the full backend pool name, not just a prefix, because two service names can share the same prefix.
  4. There is a corner case when the cluster is being updated to multi-SLB from a classic NIC-based single LB rather than from an IP-based cluster. In this case, if the service being reconciled is local, the cloud provider tries to update a NIC-based pool to an IP-based pool directly, which is not allowed. We should skip adding IPs to a NIC-based pool in multi-SLB mode.
  5. There is a bug in ReconcileBackendPools, where we mistakenly parse the LB name from the backend pool ID and use it as the backend pool name.

Which issue(s) this PR fixes:

Fixes #7113
Fixes #7200
Fixes #6980

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix: several bugs related to the multiple standard load balancers (multi-SLB) mode.
1. All EndpointSlices of a local service should be included in the local backend pool updater, instead of only the first EndpointSlice.
2. In some rare cases, a migration from a NIC-based to an IP-based LB can be left in an intermediate state where the NIC references have been removed but the corresponding IPConfigs in the backend pool have not. In this case, we should manually exclude those IPConfigs from the request body.
3. localServiceOwnsBackendPool should compare the full backend pool name, not just a prefix, because two service names can share the same prefix.
4. There is a corner case when the cluster is being updated to multi-SLB from a classic NIC-based single LB rather than from an IP-based cluster. In this case, if the service being reconciled is local, the cloud provider tries to update a NIC-based pool to an IP-based pool directly, which is not allowed. We should skip adding IPs to a NIC-based pool in multi-SLB mode.
5. There is a bug in ReconcileBackendPools, where we mistakenly parse the LB name from the backend pool ID and use it as the backend pool name.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the kind/bug, do-not-merge/release-note-label-needed, and cncf-cla: yes labels on Oct 29, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved, size/L, and release-note labels and removed the do-not-merge/release-note-label-needed label on Oct 29, 2024
```go
activeNodes = bi.getLocalServiceEndpointsNodeNames(service)
}

if isNICPool(backendPool) {
```

@nilo19 (Contributor Author): fix no. 4
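A minimal sketch of the fix no. 4 guard, assuming simplified, hypothetical `BackendPool` and `BackendAddress` types in place of the Azure SDK structs. It follows the NIC-pool heuristic visible later in this PR's diff (an address with an empty IP address means the pool is NIC-based); `addNodeIPs` is a hypothetical helper, not the cloud provider's code.

```go
package main

// BackendAddress is a hypothetical, simplified stand-in for an Azure LB backend
// address. An empty IPAddress means the entry references a NIC IP configuration.
type BackendAddress struct {
	Name      string
	IPAddress string
}

// BackendPool is a hypothetical, simplified stand-in for an Azure LB backend pool.
type BackendPool struct {
	Name      string
	Addresses []BackendAddress
}

// isNICPool reports whether the pool holds at least one non-IP (NIC-based) address.
func isNICPool(bp BackendPool) bool {
	for _, addr := range bp.Addresses {
		if addr.IPAddress == "" {
			return true
		}
	}
	return false
}

// addNodeIPs appends node IPs to the pool unless that would turn a NIC-based
// pool into an IP-based one in multi-SLB mode, which is not allowed.
// It returns true if the pool was modified.
func addNodeIPs(bp *BackendPool, nodeIPs []string, multiSLB bool) bool {
	if multiSLB && isNICPool(*bp) {
		// Skip: leave the NIC-based pool untouched.
		return false
	}
	for _, ip := range nodeIPs {
		bp.Addresses = append(bp.Addresses, BackendAddress{Name: ip, IPAddress: ip})
	}
	return len(nodeIPs) > 0
}
```

In multi-SLB mode a NIC-based pool is left as-is, while the same call on an empty or IP-based pool appends the node IPs.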

@@ -886,7 +889,13 @@ func removeNodeIPAddressesFromBackendPool(

```go
if addresses[i].LoadBalancerBackendAddressPropertiesFormat != nil {
ipAddress := ptr.Deref((*backendPool.LoadBalancerBackendAddresses)[i].IPAddress, "")
if ipAddress == "" {
klog.V(4).Infof("removeNodeIPAddressFromBackendPool: LoadBalancerBackendAddress %s is not IP-based, skipping", ptr.Deref(addresses[i].Name, ""))
if isNodeIP {
```

@nilo19 (Contributor Author): fix no. 2
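The filtering behind fix no. 2 can be sketched as follows, assuming a simplified `BackendAddress` type and a hypothetical `removeNodeIPs` helper that builds the address list for the update request body. An entry with an empty IP address is treated as a leftover IPConfig reference; when the pool is IP-based (the `isNodeIP` flag is named after the snippet above, but its semantics here are an assumption), such entries are dropped instead of being sent back.

```go
package main

// BackendAddress is a hypothetical, simplified backend pool entry.
// An empty IPAddress marks a NIC (IPConfig) reference.
type BackendAddress struct {
	Name      string
	IPAddress string
}

// removeNodeIPs returns the addresses to send in the update request body:
// the node IPs in ipsToRemove are removed and, for an IP-based pool
// (isNodeIP == true), stale IPConfig references are excluded as well.
func removeNodeIPs(addrs []BackendAddress, ipsToRemove []string, isNodeIP bool) []BackendAddress {
	remove := make(map[string]struct{}, len(ipsToRemove))
	for _, ip := range ipsToRemove {
		remove[ip] = struct{}{}
	}
	kept := make([]BackendAddress, 0, len(addrs))
	for _, a := range addrs {
		if a.IPAddress == "" {
			// Leftover IPConfig from a half-finished NIC-to-IP migration.
			if isNodeIP {
				continue // exclude it from the request body
			}
			kept = append(kept, a)
			continue
		}
		if _, ok := remove[a.IPAddress]; ok {
			continue // this node IP is being removed
		}
		kept = append(kept, a)
	}
	return kept
}
```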

@@ -633,7 +634,7 @@ func (bi *backendPoolTypeNodeIP) ReconcileBackendPools(ctx context.Context, clus

```diff
 if isMigration && bi.EnableMigrateToIPBasedBackendPoolAPI {
 var backendPoolNames []string
 for _, id := range lbBackendPoolIDsSlice {
-name, err := getLBNameFromBackendPoolID(id)
+name, err := getBackendPoolNameFromBackendPoolID(id)
```

@nilo19 (Contributor Author) commented Oct 29, 2024: fix no. 5
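Azure backend pool IDs have the form `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Network/loadBalancers/{lbName}/backendAddressPools/{poolName}`, so the LB name and the pool name are different path segments. The sketch below uses hypothetical helpers (`lbNameFromPoolID`, `poolNameFromPoolID`), not the repository's own functions, to show why fix no. 5 matters: reading the wrong segment silently yields the LB name where a pool name is expected.

```go
package main

import (
	"fmt"
	"strings"
)

// nameAfterSegment returns the path segment that follows key in an Azure
// resource ID, matching key case-insensitively.
func nameAfterSegment(id, key string) (string, error) {
	parts := strings.Split(strings.Trim(id, "/"), "/")
	for i := 0; i+1 < len(parts); i++ {
		if strings.EqualFold(parts[i], key) && parts[i+1] != "" {
			return parts[i+1], nil
		}
	}
	return "", fmt.Errorf("segment %q not found in resource ID %q", key, id)
}

// lbNameFromPoolID extracts the load balancer name (hypothetical helper).
func lbNameFromPoolID(id string) (string, error) {
	return nameAfterSegment(id, "loadBalancers")
}

// poolNameFromPoolID extracts the backend pool name (hypothetical helper).
func poolNameFromPoolID(id string) (string, error) {
	return nameAfterSegment(id, "backendAddressPools")
}
```

For an ID ending in `loadBalancers/lb-1/backendAddressPools/pool-a`, the first helper returns `lb-1` and the second `pool-a`; only the latter is a valid backend pool name.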

@@ -446,8 +445,10 @@ func (az *Cloud) getLocalServiceBackendPoolID(serviceName string, lbName string,

```go
// localServiceOwnsBackendPool checks if a backend pool is owned by a local service.
func localServiceOwnsBackendPool(serviceName, bpName string) bool {
prefix := strings.Replace(serviceName, "/", "-", -1)
return strings.HasPrefix(strings.ToLower(bpName), strings.ToLower(prefix))
if strings.HasSuffix(strings.ToLower(bpName), consts.IPVersionIPv6StringLower) {
```

@nilo19 (Contributor Author): fix no. 3
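Why a prefix match is wrong (fix no. 3): with services `default/svc` and `default/svc1`, the pool `default-svc1` starts with `default-svc`, so the old `HasPrefix` check attributes it to both. A minimal sketch of an exact comparison follows. It assumes the IPv6 pool of a dual-stack service is named with a `-ipv6` suffix, which is a guess about the convention behind `consts.IPVersionIPv6StringLower`; `ownsBackendPool` is a hypothetical stand-in for `localServiceOwnsBackendPool`.

```go
package main

import "strings"

// ipv6Suffix is an ASSUMED naming convention for the IPv6 pool of a dual-stack service.
const ipv6Suffix = "-ipv6"

// ownsBackendPool reports whether bpName belongs to the local service
// serviceName ("namespace/name"). The pool name must equal the service key with
// "/" replaced by "-", optionally followed by ipv6Suffix; a shared prefix is not enough.
func ownsBackendPool(serviceName, bpName string) bool {
	want := strings.ToLower(strings.ReplaceAll(serviceName, "/", "-"))
	got := strings.ToLower(bpName)
	return got == want || got == want+ipv6Suffix
}
```

Comparing against both `want` and `want + ipv6Suffix`, rather than trimming the suffix first, keeps a service whose own name ends in `-ipv6` matching its IPv4 pool.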

```go
ep = endpointSlice
foundInCache = true
return false
eps = append(eps, endpointSlice)
```

@nilo19 (Contributor Author): fix no. 1
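In the diff above, the early `return false` (which stopped the iteration at the first matching EndpointSlice) appears to be replaced by an append, so every slice of the service is collected. A Service can be backed by more than one EndpointSlice, so stopping early can miss nodes that host endpoints. The sketch below assumes simplified, hypothetical `EndpointSlice` and `Endpoint` types (the real `NodeName` field is a pointer); only the `kubernetes.io/service-name` label key is the one Kubernetes actually sets.

```go
package main

import "sort"

// serviceNameLabel is the label Kubernetes puts on EndpointSlices to link them to a Service.
const serviceNameLabel = "kubernetes.io/service-name"

// Endpoint and EndpointSlice are hypothetical, simplified stand-ins for discovery/v1 types.
type Endpoint struct{ NodeName string }

type EndpointSlice struct {
	Namespace string
	Labels    map[string]string
	Endpoints []Endpoint
}

// slicesForService returns every slice that belongs to namespace/name,
// instead of stopping at the first match.
func slicesForService(all []EndpointSlice, namespace, name string) []EndpointSlice {
	var eps []EndpointSlice
	for _, s := range all {
		if s.Namespace == namespace && s.Labels[serviceNameLabel] == name {
			eps = append(eps, s)
		}
	}
	return eps
}

// nodeNames returns the sorted, de-duplicated node names across all slices.
func nodeNames(eps []EndpointSlice) []string {
	seen := map[string]struct{}{}
	var out []string
	for _, s := range eps {
		for _, e := range s.Endpoints {
			if e.NodeName == "" {
				continue
			}
			if _, ok := seen[e.NodeName]; ok {
				continue
			}
			seen[e.NodeName] = struct{}{}
			out = append(out, e.NodeName)
		}
	}
	sort.Strings(out)
	return out
}
```

Node names are de-duplicated because several slices, or several endpoints in one slice, can point at the same node.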

@nilo19 force-pushed the fix/multi-slb/endpointslice branch from 2b510ac to 5084147 on October 29, 2024 04:30
```go
client := fake.NewSimpleClientset(&svc)
// if tc.existingEPS != nil {
// client = fake.NewSimpleClientset(&svc, tc.existingEPS)
// } else {
```

Reviewer (Member): nit: delete the unused code.

@nilo19 (Contributor Author): removed

```go
bp.LoadBalancerBackendAddresses != nil {
for _, addr := range *bp.LoadBalancerBackendAddresses {
if ptr.Deref(addr.IPAddress, "") == "" {
logger.Info("The load balancer backend address has empty ip address, assuming it is a NIC pool",
```

Reviewer (Member): change to V(4)

@nilo19 (Contributor Author): changed

@feiskyer (Member)

/retest

@nilo19 (Contributor Author) commented Nov 11, 2024

/retest

@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-capz

@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-vmss-capz

@feiskyer (Member)

Thanks for the fixes
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 12, 2024
@nilo19 (Contributor Author) commented Nov 12, 2024

/test pull-cloud-provider-azure-e2e-ccm-capz

@nilo19 (Contributor Author) commented Nov 13, 2024

/retest

@nilo19 (Contributor Author) commented Nov 13, 2024

/retest

@MartinForReal (Contributor)

/retest

@k8s-ci-robot (Contributor)

@nilo19: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz | 1731376 | link | true | /test pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@MartinForReal (Contributor)

/retest

Labels: approved, cncf-cla: yes, kind/bug, lgtm, release-note, size/L
4 participants