Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates/strategy: local timezone aware reboot windows #301

Closed
lucab opened this issue Jun 4, 2020 · 20 comments · Fixed by #524
Closed

updates/strategy: local timezone aware reboot windows #301

lucab opened this issue Jun 4, 2020 · 20 comments · Fixed by #524
Assignees

Comments

@lucab
Copy link
Contributor

lucab commented Jun 4, 2020

Zincati 0.0.11 gained a new periodic strategy for scheduling reboot windows on a weekly calendar.

Its design decisions are noted here and the configuration options here.

There are three main points that pinned the periodic strategy to use UTC times and dates:

  • avoiding ambiguities for human observers
  • avoiding duplicating/skipping a reboot window on Daylight Saving Time (DST) bumps
  • guaranteeing the length of the reboot windows even when DST/TZ change

However this comes with a few usability drawbacks:

  • people may not be used to handle/reason about UTC times (especially in places far from GMT)
  • some users may be more interested about guaranteeing local-time invariants, accepting occasional duplication/skips on time changes

One idea that was floated is to add a new strategy which uses systemd time labels as documented in https://www.freedesktop.org/software/systemd/man/systemd.time.html.
That format however is meant for edge-triggers (i.e. precise points in time) instead of level-triggers (i.e. time windows). It also results in possibly weird windows definitions like yesterday and weekly.

Another idea is to add low-level strategy (e.g. the filesystem-based one at #245) so that high-level components could plug into it. In this case a systemd timer+service unit would handle the filesystem entry based on local time.

@jlebon
Copy link
Member

jlebon commented Jun 8, 2020

Heh, was going to file a similar issue. I think for the "single pet FCOS server" case, it's really annoying to have to define things in UTC time. And it also means that the time window shifts wrt your local time when going from e.g. EST to EDT or vice versa.

@bh7cw bh7cw added the jira for syncing to jira label Sep 1, 2020
@bh7cw bh7cw self-assigned this Sep 1, 2020
@lucab
Copy link
Contributor Author

lucab commented Sep 9, 2020

On this topic, I still think it's worthy exploring a path where a timer unit does most of the time-related work (i.e. expose a flexible configuration), and Zincati only checks its status in order to get a go/no-go signal.

@bh7cw bh7cw removed their assignment Oct 26, 2020
@lucab
Copy link
Contributor Author

lucab commented Mar 1, 2021

Expanding a bit more on the timer units, from an high level perspective the optimal UX would be something like:

  • via Ignition, the user defines one or more systemd timer units covering all the allowed reboot timespans (e.g. wed-night.timer and sunday-morning.timer).
  • via Ignition, the user defines a Zincati fragment using this new timer-based strategy and referencing the above units, like strategy.nametbd.timers = ["wed-night", "sunday-morning"].
  • when trying to finalize, Zincati polls the state of all the specified timers, and it's allowed to finalize/reboot if any of those is on.

This however may require investigating/implementing that systemd timer units can be used:

  1. as level-triggers (i.e. being on for a whole timespan, not continuously going off->on->off->on->off)
  2. as freestanding units, decoupled from any service unit (i.e. we don't want to start/stop any service, we are interested in the state of the timer unit on its own)

@jlebon
Copy link
Member

jlebon commented Mar 1, 2021

From a UX perspective at least IMO it'd be more coherent to work this on top of Zincati's existing periodic strategy which already knows about calendars and times. E.g. we could just add a timezone key to override the UTC timezone.

@kelvinfan001
Copy link
Member

kelvinfan001 commented Mar 1, 2021

IIUC, if we are able to use systemd timer units (particularly useful is systemd.time's "Calendar Events"), this new to-be-named "timer strategy" should be able to do everything that the current periodic strategy can do, in addition to timezone support and much more flexibility. This is mainly so that we can outsource all the timezone-related issues and corner cases (e.g. guaranteeing local-time invariants, accepting occasional duplication/skips on time changes) to systemd.

@kelvinfan001
Copy link
Member

kelvinfan001 commented Mar 2, 2021

I believe systemd timer units can only be "edge-triggered". The Calendar Events' range(..) syntax is essentially shorthand for multiple discrete values, where the smallest division of time we can get is a second. For example, *-*-* 05..23:*:* EST would mean the timer would go off at every second of every minute of every day between 5am and 11pm EST. Thus, having users create their own timer unit using Calendar Events syntax would seem a bit unintuitive to me, since when users specify a strategy, they probably worry about a range of continuous values, rather than discrete values. Also not sure if there's a good way to sample the "state" of edge-triggered timers.
Additionally, it looks like every timer unit must have an associated service unit.

I think what we really want out of the timer units is the Calendar Events specification format. Perhaps we could sort of create continuous ranges out of discrete values by saying that the entire second after any Calendar Event value is considered "OK to reboot". E.g. *-*-* 05..23:*:* EST would mean that we allow reboots during every second of every minute of every day between 5am and 11pm EST. We can use systemd analyze calendar to determine when the next event is and check whether that event is within one second of our current time (a bit of shifting might be required here). This way, we don't need users to even bring their own timer unit, we're just making use of the Calendar Events format.

@kelvinfan001
Copy link
Member

kelvinfan001 commented Apr 9, 2021

One simple, non-intrusive solution would be to include a time zone field (that defaults to UTC if not specified) in the periodic strategy's configuration. Using the chrono_tz crate, whenever we need to check whether we're currently in an allowed reboot window, we convert the current time to a DateTime with the configured time zone, and plug this DateTime into the existing contains_datetime() function.
Some drawbacks:

  • It allows for longer and shorter weekly windows during daylight savings changes
  • Corollary of the above, reboot windows may be completely skipped
  • Time remaining till next window is no longer accurate (the method we currently use to calculate time remaining would not work, and if we want to give accurate time remaining, we might need to be very careful and deal with many edge cases).
  • chrono_tz does not update itself automatically with data from the IANA database; each release of the chrono_tz crate is tied to a specific release of the IANA database. So Zincati wouldn't automatically pick up on policy changes. Whenever there are policy changes, we would need to wait for chrono_tz to build itself with those changes, and do a new release of Zincati, as well.

I believe the first two points are inevitable, but I think we can potentially take action on the latter two points.

/cc @lucab @jlebon @travier for thoughts.

@kelvinfan001
Copy link
Member

kelvinfan001 commented Apr 9, 2021

Possible solutions to the fourth point (alternatives to using chrono_tz):

  1. Parse /usr/share/zoneinfo ourselves to get the current time in the time zone configured by the user.
  2. Like 1., but shell out to zdump.
  3. Use the tzfile crate. It is like chrono_tz, except it reads from the local /usr/share/zoneinfo.
  4. Do not support configuring specific time zones; instead, have a single knob that tells the periodic strategy to either use local system time or UTC time.

@jlebon
Copy link
Member

jlebon commented Apr 9, 2021

Do not support configuring specific time zones; instead, have a single knob that tells the periodic strategy to either use local system time or UTC time.

That sounds like a nice compromise to me! If you actually care about timezones, then you should probably use UTC anyway.

@travier
Copy link
Member

travier commented Apr 12, 2021

According to me, I'm not sure that's the best approach as it would suggest doing the wrong thing, which is using a non UTC timezone for the whole system.

The use case I think we should focus on is users keeping UTC for the system (which is the only correct option) and making it easier for them to specify the reboot windows without having to do the time conversion manually or taking care of DST shifts.

For example, I'm in Europe/Paris, and I want my server to reboot between 2 AM / 3AM my time, whatever the current DST is. I keep my server in UTC but I don't want to have my server reboot at different time during the year depending on which DST we are and this is currently not possible with the options nor would it be possible with a local timezone option.

@lucab
Copy link
Contributor Author

lucab commented Apr 12, 2021

I agree that encouraging more people to use non-UTC localtime in order to use this feature is a bit of an unfortunate drawback.

On a different topic, it seems that there is an implicit consensus that a single modifier for the whole strategy is fine, i.e. that we are NOT interested in being able to define multiple reboot windows according to different timezones (for example one in UTC and another one in Paris time).
It sounds reasonable to decide NOT to support this feature (and explicitly state so in the devdocs).
Am I reading the room correctly on this?

@kelvinfan001
Copy link
Member

On a different topic, it seems that there is an implicit consensus that a single modifier for the whole strategy is fine.

I agree that it doesn't seem like a very common use case to be able to define multiple reboot windows according to different time zones.

@jlebon
Copy link
Member

jlebon commented Apr 12, 2021

According to me, I'm not sure that's the best approach as it would suggest doing the wrong thing, which is using a non UTC timezone for the whole system.

The use case I think we should focus on is users keeping UTC for the system (which is the only correct option) and making it easier for them to specify the reboot windows without having to do the time conversion manually or taking care of DST shifts.

For example, I'm in Europe/Paris, and I want my server to reboot between 2 AM / 3AM my time, whatever the current DST is. I keep my server in UTC but I don't want to have my server reboot at different time during the year depending on which DST we are and this is currently not possible with the options nor would it be possible with a local timezone option.

OK, I might've misunderstood, because that's exactly what I want too. :)

@jlebon
Copy link
Member

jlebon commented Apr 12, 2021

3. Use  the [`tzfile`](https://crates.io/crates/tzfile) crate. It is like `chrono_tz`, except it reads from the local `/usr/share/zoneinfo`.

4. Do not support configuring specific time zones; instead, have a single knob that tells the `periodic` strategy to either use local system time or UTC time.

I think I'm proposing a hybrid of those two, where it uses tzfile so it's in sync with the system database, but the knob itself allows you to just say "use the system configured timezone" instead of having to duplicate the /etc/localtime setting.

@kelvinfan001
Copy link
Member

According to me, I'm not sure that's the best approach as it would suggest doing the wrong thing, which is using a non UTC timezone for the whole system.

Ah, I wasn't aware that servers should only use UTC as their system time (but now that I think about it, it makes total sense). In this case, it looks like we have no way of getting around reading from /usr/share/zoneinfo somehow.

@travier
Copy link
Member

travier commented Apr 13, 2021

What I have in mind for the UX:

[updates]
strategy = "periodic"
timezone = "Europe/Paris"

[[updates.periodic.window]]
days = [ "Sat", "Sun" ]
start_time = "23:30"
length_minutes = 60

What I have in mind for Zincati:

  1. Read the corresponding TZ info from the local /usr/share/zoneinfo.
  2. Compute the UTC time for those window (this is the tricky part if the window is during a DST change). I think we should favor starting no sooner than the configured time (earlier is not OK, later is fine) and keeping the length as much as possible.
  3. Keep the computed UTC time for scheduling.

It sounds reasonable to decide NOT to support this feature (and explicitly state so in the devdocs).
Am I reading the room correctly on this?

Agree. I don't see a use case for that.

but the knob itself allows you to just say "use the system configured timezone" instead of having to duplicate the /etc/localtime setting.

I don't understand how you could do that without changing the system timezone.

@kelvinfan001
Copy link
Member

kelvinfan001 commented Apr 13, 2021

@travier Thanks for the suggestions! I agree with the UX. Currently, I've also added a localtime option that looks at /etc/localtime.
However, in my opinion, I think computing the UTC time into the future for configured windows will require significant change/additions in the current code base.

Gain:

  • By computing the UTC time for configured windows into the future, we would indeed be able to choose to keep the scheduled length unchanged, but then we have reboot windows spanning clock/naive times that aren't usually allowed to reboot.

Pain:

  • The way Zincati's Periodic strategy currently stores reboot windows is by keeping a WeeklyCalendar data type. The WeeklyCalendar does not change and is not aware of actual dates, it is only aware of week days and time (I see it as simply a machine-friendly view into the reboot window snippets in the .toml files). For UTC, this data type works very well. If we wish to precompute the UTC time for windows, we must include information about the date. This means we cannot use WeeklyCalendar (which makes up the majority of the periodic strategy's logic) or we'll need to rewrite/augment it quite significantly.
  • If there is a policy change, i.e. change in the IANA database, we will need to recalculate these UTC windows, so Zincati will need some logic to detect database changes. Edit: database updates come from rpm-ostree commits, so there'll always be a reboot where Zincati re-loads all the configuration anyway. This point isn't a concern.

I think the above "pains" aren't too bad, but they make the solution a bit less elegant; overall, it seems to me that the cost is not worth the gain. But this could be because I am totally overlooking the value of having invariant lengths over invariant time.

@kelvinfan001 kelvinfan001 self-assigned this Apr 13, 2021
@kelvinfan001
Copy link
Member

After some offline discussion with @lucab, we agreed that because having invariant naive time seems more intuitive/natural over having invariant length, we will pursue the invariant naive time route for now. But if there are factors in favour of having invariant length that we overlooked, please let me know!

@kelvinfan001
Copy link
Member

kelvinfan001 commented Apr 21, 2021

Couple notes regarding keeping invariant length:

There are a couple points that can make this approach significantly more complex than keeping invariant clock time (naive time). Mainly the trickiness and edge cases stem from having to make some arbitrary decisions around mapping clock time to UTC time. Essentially, we cannot easily compute the UTC time for a given clock time in a time zone.

For example, say a reboot window configuration is 2:30 AM for 60 minutes and the time zone is US/Eastern. There is no problem on most days of the year, except for "Spring Forward Day" and "Fall Back Day". On "Spring Forward Day", 2:30 AM does not exist, so we'll need to make an arbitrary decision like changing the reboot window to 3:00 AM for 60 minutes. A similar situation exists for "Fall Back Day"; we will need to make the arbitrary decision to use allow a 60 minute reboot window starting at the second occurrence of 2:30 AM (like @travier suggested).
A plausible rule for DST changes could be always changing a start time to the nearest existing time in the future if the start time does not exist, and always using the final occurrence of a time as the start time if the time occurs multiple times in a day. On the other hand, with the case of cron, from the manpage, "If time was adjusted one hour forward, those jobs that would have run in the interval that has been skipped will be run immediately. Conversely, if time was adjusted backward, running the same job twice is avoided.", which is essentially favouring an earlier start time when facing ambiguity.
In addition to having to make such arbitrary decisions, there are probably additional edge cases where it may be difficult to map a given clock time to a unique UTC datetime, especially with policy changes where entire days can be skipped.
Nevertheless, if we make some arbitrary decisions as described above, it is probably possible to map every reboot window start time to a UTC time.

However, keeping track of this will not be as simple as having something like a WeeklyCalendar struct (as we use right now for UTC), as reboot windows' clock time in UTC vary throughout the year depending on e.g. whether the current date is in "daylight time" or "standard time", in addition to the "special mappings" needed to handle the special days like "Fall Back Day" and "Spring Forward Day". One approach could be to store allowed reboot windows in UTC time, along with their date, and calculate all allowed reboot windows n amount of time into the future at initialization. This feels sub-optimal, though, as we're going to store this relatively large amount of information somewhere and do a non insignificant amount of computing to get the reboot windows. Alternatively, we can compute whether we are in a reboot window only when needed (i.e. when Zincati calls can_finalize() on the strategy). In this case, we only need to compute "nearby" reboot windows and see if we are in one. We will need to figure out what "nearby" means and how far back in time we need to look for start times and compute reboot windows. With time zone policy changes, where entire days can be skipped, it can be seen that a range of 7 days (one week) before is not guaranteed to be enough. So it seems like keeping track of when reboot windows are allowed if we guarantee a certain invariant length could get quite complex; but there could be smarter implementations that could be simpler that I hadn't thought of at the moment.

To me, it is simpler and more intuitive for the reboot windows to always obey the clock time for a time zone, and sacrifice guaranteeing an invariant reboot window length. Additionally, if I understood correctly, systemd timer units also work in a similar way, where jobs can be skipped if the specified CalendarEvent doesn't exist.

P.S. just a thought, but we can potentially just say we don't support windows with start times at "special" times (this would be between 2:00 AM to 3:00 AM for US/Eastern), so we don't need to deal with having to make arbitrary decisions. But this makes the window configurations very coupled with the specified time zone and also very vulnerable to policy changes.

@travier
Copy link
Member

travier commented Oct 28, 2021

https://blog.healthchecks.io/2021/10/how-debian-cron-handles-dst-transitions/ > For reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants