A natural-sort comparator for strings
in Clojure/Script. Treats embedded digits as integers, so strings like ["v12" "v2"]
will receive a natural sort, as opposed to a lexical sort:
(def v ["v12" "v2"])
(sort v)
;;=> ("v12" "v2") ;; lexical sort
(sort natural-compare v)
;;=> ("v2" "v12") ;; natural sort
deps.edn
wevre/natural-compare {:mvn/version "0.0.10"}
project.clj
[wevre/natural-compare "0.0.10"]
Or, and this might be the easiest, just copy the body of
impl/natural_compare_parse.cljc
directly into your project.
(require '[wevre.natural-compare :refer [natural-compare])
(def ss ["t3" "t1" "t10" "t12" "t2" "t27"])
(sort natural-compare ss)
;;=> ("t1" "t2" "t3" "t10" "t12" "t27")
(sort (comp - natural-compare) ss)
;;=> ("t27" "t12" "t10" "t3" "t2" "t1")
natural-compare
operates on strings. Each is split into a sequence of
alternating text and integer elements (always starting with a text element) and
then the two sequences are compared element by element. The elements are padded
as necessary with values that always sort lower than 'legit' values, to ensure
shorter strings sort first.
See test cases for more examples.
Adapted from a gist (and subsequent comments) by Wilker Lúcio -- Alphabetical/Natural sorting in Clojure/Clojurescript
Reading those comments, you'll see there is some discussion about limitations (overflow) and performance. This library actually contains a few different implementations taken from that discussion. Based on my rudimentary benchmarks so far, I have drawn these conclusions:
-
For shorter strings, the implementations based on parsing integers are fastest, but they have the potential limitation to overflow on (string representations of) large integer inputs.
-
The "lazy" implementation outperforms the others on large strings where it can stop early without needing to split and parse the entire string.
-
It depends on your use case, but I feel that most situations where a natural sort order is desired are about small strings (like filenames or labels), so the concerns about overflow or laziness are (in my experience) not as important.
-
The default implemenation used in this library ("parse") is based on parsing integers and will overflow. If you need something different for your use case, check out one of the other implementations.
Something I would consider useful, but haven't needed yet in my own projects: properly sort strings that are date names, like "MON", "TUE", …; "JAN", "FEB", ….
Copyright © Mike Weaver
Licensed under the terms of Ecplise Public License 2.0, see license.txt.