Failed to check Cincinnati for updates #691

Closed
gongx opened this issue Dec 1, 2021 · 6 comments

gongx commented Dec 1, 2021

Bug Report

I am setting up Zincati with the fleet_lock strategy, but I keep getting this error:

[INFO  zincati::update_agent::actor] reached steady state, periodically polling for updates
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)

I did not set any configuration for Cincinnati, so I expect it to follow the default behavior.

[INFO  zincati::cli::agent] starting update agent (zincati 0.0.23)
[INFO  zincati::strategy::fleet_lock] remote fleet_lock reboot manager: http://fleetlock.fleetlock.svc.cluster.local:8080/
[INFO  zincati::cincinnati] Cincinnati service: https://updates.coreos.fedoraproject.org
[INFO  zincati::cli::agent] agent running on node '9dbff4370da742d1a76c7193ce119158', in update group 'kubelet'
[INFO  zincati::update_agent::actor] registering as the update driver for rpm-ostree
[INFO  zincati::update_agent::actor] initialization complete, auto-updates logic enabled
[INFO  zincati::strategy] update strategy: fleet_lock

Environment

Using Fedora CoreOS: fedora:fedora/aarch64/coreos/testing-devel

Expected Behavior

Zincati can connect to the Cincinnati service and periodically check whether an update is available.

Actual Behavior

failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)

Reproduction Steps

Start the zincati service; the error appears continuously.

Other Information

The host is in AWS. I am not sure whether this is related to AWS network settings or something else.
On the stable stream of Fedora CoreOS (fedora:fedora/x86_64/coreos/stable), I see the same error: "failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)"

lucab (Contributor) commented Dec 1, 2021

Thanks for the report. Can you please attach the full logs (with timestamps) coming from journalctl -b 0 -u zincati.service?

My guess is that you are seeing sporadic errors, spaced over time, from the Fedora infra. Looking at my own nodes, I can also see a few transient hiccups logged today.

Overall, it shouldn't be a problem, as the agent re-checks for updates after a few minutes. You can check the current agent status with systemctl status zincati.service | grep Status, or you can record its metrics on an ongoing basis.

lucab (Contributor) commented Dec 1, 2021

Actually, the comment above is valid only for your stable machines.

The testing-devel stream does not support auto-updates, so update checks cannot work there. The default Zincati configuration in that case even disables the update logic. Did you manually override that?
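
To illustrate the point above, here is a minimal sketch of a pre-flight check that flags a stream without auto-update support before the update logic is enabled on a node. The set of production streams (stable, testing, next) follows the FCOS update-streams documentation. The helper names `parse_ref`, `supports_auto_updates` and `graph_url` are hypothetical, and the `/v1/graph` path with `basearch` and `stream` query parameters is an assumption about the Cincinnati backend, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Production update streams per the FCOS update-streams documentation.
# Development streams such as testing-devel are not served auto-updates.
PRODUCTION_STREAMS = frozenset({"stable", "testing", "next"})


def parse_ref(ref):
    """Split an ostree ref like 'fedora:fedora/aarch64/coreos/testing-devel'
    into (basearch, stream). Assumes the '<remote>:fedora/<arch>/coreos/<stream>' layout."""
    _, _, path = ref.partition(":")
    parts = path.split("/")
    if len(parts) != 4 or parts[2] != "coreos":
        raise ValueError(f"unexpected ref layout: {ref!r}")
    return parts[1], parts[3]


def supports_auto_updates(stream):
    """Return True only for streams that are expected to have an update graph."""
    return stream in PRODUCTION_STREAMS


def graph_url(base_url, basearch, stream):
    """Build the graph query URL (path and parameter names are assumptions)."""
    query = urlencode({"basearch": basearch, "stream": stream})
    return f"{base_url.rstrip('/')}/v1/graph?{query}"


if __name__ == "__main__":
    arch, stream = parse_ref("fedora:fedora/aarch64/coreos/testing-devel")
    if not supports_auto_updates(stream):
        print(f"stream {stream!r} does not support auto-updates; checks are expected to fail")
    print(graph_url("https://updates.coreos.fedoraproject.org", arch, stream))
```

Running a check like this against the ref reported in this issue would flag testing-devel up front, instead of leaving the agent to poll and log HTTP 500 errors.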

gongx (Author) commented Dec 1, 2021

Yes, I manually enabled it on the testing-devel CoreOS node. Thank you for confirming that the testing-devel stream does not support auto-updates.

lucab (Contributor) commented Dec 1, 2021

For reference, these are all the FCOS update streams: https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/

gongx (Author) commented Dec 1, 2021

Thank you, @lucab. I checked the metrics on the CoreOS stable node:
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
It looks like there are some sporadic errors.

And for the node that is running testing-devel CoreOS:
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074

Because the testing-devel stream does not support auto-updates, is this failure expected?

lucab (Contributor) commented Dec 7, 2021

> because testing-devel stream does not support auto-updates, failure is expected?

Yes, as you can see from the 100% failure rate.
By comparison, your other node saw about 2% of its requests fail temporarily, which is a reasonable SLI.
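
The two ratios above can be reproduced from the counters posted earlier in this thread. The following is a rough sketch, assuming the metrics are available as Prometheus text-format lines with no trailing timestamps; `parse_metrics` and `update_check_error_rate` are hypothetical helper names, not part of Zincati.

```python
def parse_metrics(text):
    """Parse Prometheus text-format samples into {(name, labels): value}.

    Assumes one sample per line and no trailing timestamp field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


def update_check_error_rate(text):
    """Return failed update checks divided by total update checks."""
    samples = parse_metrics(text)
    total = sum(v for (name, _), v in samples.items()
                if name == "zincati_cincinnati_update_checks_total")
    errors = sum(v for (name, _), v in samples.items()
                 if name == "zincati_cincinnati_update_checks_errors_total")
    return errors / total if total else 0.0


STABLE = """\
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
"""

TESTING_DEVEL = """\
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074
"""

if __name__ == "__main__":
    print(f"stable:        {update_check_error_rate(STABLE):.1%}")
    print(f"testing-devel: {update_check_error_rate(TESTING_DEVEL):.1%}")
```

With the numbers from this issue, the stable node comes out at 5/204 (roughly 2%) and the testing-devel node at 2074/2074 (100%).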

I'm going ahead and closing this ticket. There are some improvements that could be made on the backend to report back to clients that an invalid arch+stream combination has been requested; coreos/fedora-coreos-cincinnati#64 tracks that.

lucab closed this as completed Dec 7, 2021