Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-400560: Support tilde in RPM version/release comparison #6092

Merged

Conversation

changlei-li
Copy link
Contributor

@changlei-li changlei-li commented Oct 30, 2024

  1. Fix version segment division error.
  2. Support tilde in RPM version/release comparison.

*)
match (Astring.String.cut ~sep:"~" s1, Astring.String.cut ~sep:"~" s2) with
| None, None ->
compare_no_tilde_version_strings s1 s2
Copy link
Member

@minglumlu minglumlu Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general approach of parsing is preferred than the specific functions.

For example of 1.2.3~rc1~beta "<" 1.2.3~rc1, the segment lists would be:
[Int 1; Int 2; Int 3; Tilde; Str "rc"; Int 1; Tilde; Str "beta"] and
[Int 1; Int 2; Int 3; Tilde; Str "rc"; Int 1].

It would be easy to update the pattern matching in compare_segments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accept

Copy link
Contributor

@lindig lindig Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support the proposal to parse a string into tagged components and to define order over these lists. The reason is that order defined over numbers and strings is different; so it matters if a sequence of characters is treated as a number or a string; this extends to special cases like tilde.

:ring3 $ echo "1 10 2" | tr ' ' \\012 | sort -n
1
2
10
:ring3 $ echo "1 10 2" | tr ' ' \\012 | sort 
1
10
2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment, I split the version string to segment of Int, Str and Tilde and fix some errors. I split the commits to fix segment division and support tilde. Because there is a big difference from the initial version, I push -f directly for not affecting the review experience.

Copy link
Contributor

@lindig lindig Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the tilde could be considered just a character in a Str and you would get the desired order.

:ring3 $ echo "ab abc ~abc" | tr ' ' \\012 | sort 
ab
~abc
abc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindig, in your example, bash command sort thinks ~abc < abc because the locale settings. For example in the en_US.UTF-8 locale, the tilde character is often treated as less than alphanumeric characters.
In man sort

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

And in OCaml, String.compare uses lexicographical order.

utop # List.sort String.compare ["ab"; "abc"; "~abc"];;
- : string list = ["ab"; "abc"; "~abc"]

Tilde is more complicated than just a character in a Str. It may be along with letters or numbers, and even used separately. You can find more examples in the ut.
Generally, tilde (and the part after it) in the version string stands for pre-release:

  1. Compare the formal version part first then compare the pre-release version part.
  2. Version with tilde is earlier than the same version without tilde.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It the end, a total order needs to be defined between two arbitrary versions. Hence, we can't rely on "same versions without a tilde" because this is just a special case. So I would like to see this total order being defined over lists of Str, Int and Tilde or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. It's just a general statement to help comprehension. The function compare_segments in the code define how to compare segment list.
Function compare_version_segment define how to compare single segment.

  type order = LT | EQ | GT

  type version_segment = Int of int | Str of string | Tilde

  let order_of_int = function 0 -> EQ | r when r > 0 -> GT | _ -> LT

  let rec compare_segments l1 l2 =
    match (l1, l2) with
    | c1 :: t1, c2 :: t2 -> (
      match compare_version_segment c1 c2 with
      | EQ ->
          compare_segments t1 t2
      | r ->
          r
    )
    | Tilde :: _, [] ->
        LT
    | [], Tilde :: _ ->
        GT
    | _ :: _, [] ->
        GT
    | [], _ :: _ ->
        LT
    | [], [] ->
        EQ

  let compare_version_segment s1 s2 =
    match (s1, s2) with
    | Int i1, Int i2 ->
        Int.compare i1 i2 |> order_of_int
    | Str s1, Str s2 ->
        String.compare s1 s2 |> order_of_int
    | Tilde, Tilde ->
        EQ
    | Int _, Str _ ->
        GT
    | Str _, Int _ ->
        LT
    | Tilde, _ ->
        LT
    | _, Tilde ->
        GT

I think the code is clear to specify the order over segment lists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is clear; this needs to be tested against the examples to show it has the desired order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is just like code :)
(1) Compare two segments:
For different kinds of segments:
Int > Str > Tilde
If they are the same kind, just compare the value (string or int)

(2) Compare two segment lists:
Compare segments in the two lists in order as (1).
If the order is determined (Not Eq) -> end.
If they are equal, compare the next segments from the lists recursively.
If one list is empty, then Str/Int > Empty > Tilde -> end

ocaml/xapi/rpm.ml Outdated Show resolved Hide resolved
@changlei-li changlei-li force-pushed the private/changleli/CA-400560 branch from d5bd6ed to def777e Compare October 30, 2024 11:43
For example "1.2.3a" will be divided to [Int 1; Int 2; Str 3a]
and "1.2.3" is divided to [Int 1; Int 2; Int 3]. It leads to
"1.2.3" > "1.2.3a" which is incorrect.
After fix, "1.2.3a" will be divided to [Int 1; Int 2; Int 3;
Str a], then we can get the right compare result.

Signed-off-by: Changlei Li <[email protected]>
@changlei-li changlei-li force-pushed the private/changleli/CA-400560 branch from def777e to ec44b29 Compare October 30, 2024 11:50
ocaml/xapi/rpm.ml Outdated Show resolved Hide resolved
GT

let split_version_string =
let r = Re.Posix.compile_pat {|[a-zA-Z]+|[0-9]+|~|} in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to detect when a version string contains illegal characters? For example, the above regex does not allow a underscore to be part of the string.

Copy link
Contributor Author

@changlei-li changlei-li Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All characters expect for [alphabetic]+, [digit]+ and ~ are acted as delimiter and omitted in the results:

1.2#3_45abc -> ["1"; "2"; "3"; "45"; "abc"]
1.2.3α45 -> ["1"; "2"; "3"; "45"]
12汉字45 -> ["12"; "45"]

| "~" ->
Tilde
| s -> (
try Int (int_of_string s) with _ -> Str s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware that int_of_string "0xabc" does not throw an exception as you might expect.,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. No need to test the string is all digits when convert string to Int as the original implementation, because the split method has promised the string is all digits or all alphabetic. There is no situation like "0b11" may mislead int_of_string. So just use int_of_string exception to distinguish Str and Int.
Due to the function has precondition that the input is from the split result, it is not a complete function to convert string to version_segment. So I just put it in the normalize function scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to use String.sscanf "%d". But you are correct that the problem does not arise because the split makes sure this is only digtits or only letters.

@changlei-li
Copy link
Contributor Author

changlei-li commented Nov 1, 2024

Hi maintainer, I need clean up my patches before merge.
=> done

Tilde `~` used in RPM version stands for pre-release. So version
with `~` is earlier than the same version without `~`.
For example:
1.2.3~beta < 1.2.3
1.xs8 > 1.xs8~2_1

Signed-off-by: Changlei Li <[email protected]>
@changlei-li changlei-li force-pushed the private/changleli/CA-400560 branch from b138c57 to a6187d5 Compare November 4, 2024 01:19
@minglumlu minglumlu added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@minglumlu minglumlu added this pull request to the merge queue Nov 5, 2024
@minglumlu minglumlu removed this pull request from the merge queue due to a manual request Nov 5, 2024
@minglumlu minglumlu added this pull request to the merge queue Nov 5, 2024
Merged via the queue into xapi-project:master with commit 8e89449 Nov 5, 2024
15 checks passed
@changlei-li changlei-li deleted the private/changleli/CA-400560 branch November 7, 2024 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants