Is-Sorted Detection and Binary Search #34

Open
wants to merge 4 commits into
base: main
16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -12,7 +12,21 @@ package updates, you can specify your package dependency using

## [Unreleased]

*No changes yet.*
### Additions

- The `sortedEndIndex(by:)` and `sortedEndIndex()` methods check where a
collection stops being sorted. The `rampedEndIndex(by:)` and
`rampedEndIndex()` methods are variants that check for strict increases in
rank, instead of non-decreases. The `firstVariance(by:)` and
`firstVariance()` methods are variants in the other direction, checking for a
run with no changes in value.
- The `sortedRange(for: by:)` and `sortedRange(for:)` methods perform a binary
search for a value within an already-sorted collection. To optimize time in
some circumstances, an isolated phase of the binary-search procedure can be
done via the `someSortedPosition(of: by:)`, `lowerSortedBound(around: by:)`,
and `upperSortedBound(around: by:)` methods, each of which has a
defaulted-comparison overload (`someSortedPosition(of:)`,
`lowerSortedBound(around:)`, and `upperSortedBound(around:)`).

---

67 changes: 67 additions & 0 deletions Guides/BinarySearch.md
@@ -0,0 +1,67 @@
# Binary Search

[[Source](../Sources/Algorithms/BinarySearch.swift) |
[Tests](../Tests/SwiftAlgorithmsTests/BinarySearchTests.swift)]

Methods that locate a given value within a collection, halving the candidate range in each round. The collection must already be sorted according to the given predicate, or in simple non-decreasing order if the predicate defaults to the standard less-than operator.

As many data structures need to internally store their elements in order, the pre-sorted requirement usually isn't onerous.

(To-Do: put better explanation here.)

## Detailed Design

The core methods are declared as extensions to `Collection`. The versions that default comparison to the less-than operator are constrained to collections where the element type conforms to `Comparable`.

```swift
extension Collection {
    func someSortedPosition(
        of target: Element,
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> (index: Index, isMatch: Bool)

    func lowerSortedBound(
        around match: Index,
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Index

    func upperSortedBound(
        around match: Index,
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Index

    func sortedRange(
        for target: Element,
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Range<Index>
}

extension Collection where Element: Comparable {
    func someSortedPosition(of target: Element) -> (index: Index, isMatch: Bool)
    func lowerSortedBound(around match: Index) -> Index
    func upperSortedBound(around match: Index) -> Index
    func sortedRange(for target: Element) -> Range<Index>
}
```

Generally, only `sortedRange(for:)` or `sortedRange(for: by:)` is needed to perform a binary search. These methods are wrappers around calls to the other three. Use those other methods directly if you need only one phase of the search process and want to save time.

Note that `sortedRange` and `someSortedPosition` work with a target value, which may not actually be present in the collection. The `lowerSortedBound` and `upperSortedBound` methods instead work with a target index; that index must point to a known-good match, such as the first result from `someSortedPosition` (if the second result from that same call is `true`).
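The following is a minimal sketch of how the three phases could fit together. It is not the implementation from this pull request: the `sketch`-prefixed names are hypothetical stand-ins, and the code assumes a random-access collection with `Comparable` elements.

```swift
extension RandomAccessCollection where Element: Comparable {
    /// Hypothetical stand-in for `someSortedPosition(of:)`: stops at the first
    /// match it finds, or reports the insertion point when there is none.
    func sketchSomePosition(of target: Element) -> (index: Index, isMatch: Bool) {
        var low = startIndex, high = endIndex
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if self[mid] < target {
                low = index(after: mid)
            } else if target < self[mid] {
                high = mid
            } else {
                return (mid, true)
            }
        }
        return (low, false)
    }

    /// Hypothetical stand-in for `lowerSortedBound(around:)`: `match` must
    /// index an element; returns the start of its equal-valued run.
    func sketchLowerBound(around match: Index) -> Index {
        var low = startIndex, high = match
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if self[mid] < self[match] {
                low = index(after: mid)
            } else {
                high = mid
            }
        }
        return low
    }

    /// Hypothetical stand-in for `upperSortedBound(around:)`: returns the
    /// index just past the equal-valued run that contains `match`.
    func sketchUpperBound(around match: Index) -> Index {
        var low = index(after: match), high = endIndex
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if self[match] < self[mid] {
                high = mid
            } else {
                low = index(after: mid)
            }
        }
        return low
    }
}

let values = [1, 3, 3, 3, 7, 9]
let found = values.sketchSomePosition(of: 3)
// Only a confirmed match may be handed to the bound-finding phases.
let run: Range<Int>? = found.isMatch
    ? values.sketchLowerBound(around: found.index)
        ..< values.sketchUpperBound(around: found.index)
    : nil
let missing = values.sketchSomePosition(of: 4)
```

Here `run` covers the three elements equal to `3` (`1..<4`), while `missing.isMatch` is `false` and `missing.index` is the position where a `4` could be inserted.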

### Complexity

Each round halves the remaining range, so the search takes O(log _n_) rounds, where _n_ is the length of the collection. When the collection supports O(1) traversal, _i.e._ random access, the search takes O(log _n_) operations overall. Collections without random-access traversal can still be searched, but index movement then dominates and the search time worsens to O(_n_).
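As a rough illustration of where the two costs come from, the sketch below runs a halving search over any `Collection` and counts predicate calls. The helper `sketchPartitionPoint(where:)` is a hypothetical name used only for this example; the index arithmetic it performs is O(1) on random-access collections and linear otherwise.

```swift
extension Collection {
    /// Hypothetical helper: returns the first index whose element does not
    /// satisfy `isBefore`, assuming all satisfying elements come first.
    func sketchPartitionPoint(where isBefore: (Element) -> Bool) -> Index {
        var low = startIndex
        var remaining = count                      // O(n) unless random-access
        while remaining > 0 {
            let half = remaining / 2
            let mid = index(low, offsetBy: half)   // O(half) unless random-access
            if isBefore(self[mid]) {
                low = index(after: mid)
                remaining -= half + 1
            } else {
                remaining = half
            }
        }
        return low
    }
}

// Random access: about log2(n) predicate calls and O(1) index moves.
var comparisons = 0
let sortedNumbers = Array(0..<1024)
let position = sortedNumbers.sketchPartitionPoint { value in
    comparisons += 1
    return value < 700
}

// No random access (String): the same number of rounds, but each index
// move walks the characters, so the total work is linear.
let letters = "abcdefgh"
let letterIndex = letters.sketchPartitionPoint { $0 < "e" }
```

The number of predicate calls stays logarithmic in both cases; only the cost of moving between indices differs.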

### Comparison with other languages

**C++:** The `<algorithm>` library defines `binary_search` as an analog to `someSortedPosition`. The C++ function returns only an existence check; you cannot exploit the result, either success or failure, without calling a related function. Since the computation ends up with the location anyway, the Swift method bundles the existence check with the index where the qualifying element was found. The returned index helps even on failure, as it's the best place to insert a matching element.

Of course, immediately using only the `isMatch` member from a call to `someSortedPosition` acts as a direct counterpart to `binary_search`.

Some implementations of `binary_search` may punt to `lower_bound`, but `someSortedPosition` stops at the first discovered match, without unnecessarily taking extra time searching for the border. The trade-off is that `someSortedPosition` needs to do up to two comparisons per round instead of one.
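The trade-off can be made concrete with a small counting experiment. This is only a sketch over `[Int]`; both functions are hypothetical illustrations of the two strategies, not code from this pull request or from any C++ standard library.

```swift
/// Early-exit strategy: up to two comparisons per round, stops on a match.
func sketchEarlyExit(_ values: [Int], _ target: Int, comparisons: inout Int) -> Bool {
    var low = 0, high = values.count
    while low < high {
        let mid = low + (high - low) / 2
        comparisons += 1
        if values[mid] < target { low = mid + 1; continue }
        comparisons += 1
        if target < values[mid] { high = mid; continue }
        return true
    }
    return false
}

/// Lower-bound strategy: one comparison per round, always runs until the
/// range is empty, then needs one final check.
func sketchViaLowerBound(_ values: [Int], _ target: Int, comparisons: inout Int) -> Bool {
    var low = 0, high = values.count
    while low < high {
        let mid = low + (high - low) / 2
        comparisons += 1
        if values[mid] < target { low = mid + 1 } else { high = mid }
    }
    guard low < values.count else { return false }
    comparisons += 1
    return !(target < values[low])
}

let repeated = Array(repeating: 5, count: 1_000)
var earlyCount = 0, boundCount = 0
let earlyFound = sketchEarlyExit(repeated, 5, comparisons: &earlyCount)
let boundFound = sketchViaLowerBound(repeated, 5, comparisons: &boundCount)
```

With many equal elements the early-exit version finishes after its first round (`earlyCount` is 2), while the lower-bound version keeps halving until it reaches the border. For a target that is absent, the early-exit version can instead spend up to twice as many comparisons.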

The same library defines `lower_bound` and `upper_bound` as analogs to `lowerSortedBound` and `upperSortedBound`. The C++ functions match `binary_search` in that they search for a target value, while the Swift methods take a known-good target index. This difference in the Swift methods is meant to segregate functionality.

The same C++ library defines `equal_range` as an analog to `sortedRange`.

(To-Do: Put other languages here.)
49 changes: 49 additions & 0 deletions Guides/SortedPrefix.md
@@ -0,0 +1,49 @@
# Sorted Prefix

[[Source](../Sources/Algorithms/SortedPrefix.swift) |
[Tests](../Tests/SwiftAlgorithmsTests/SortedPrefixTests.swift)]

Methods that measure how far from its start a collection stays sorted, either according to a given predicate or by the standard less-than operator by default, with variants for strictly increasing and steady-state (constant-valued) runs.

(To-Do: put better explanation here.)

## Detailed Design

The core methods are declared as extensions to `Collection`. The versions that default comparison to the less-than operator are constrained to collections where the element type conforms to `Comparable`.

```swift
extension Collection {
    func sortedEndIndex(
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Index

    func rampedEndIndex(
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Index

    func firstVariance(
        by areEquivalent: (Element, Element) throws -> Bool
    ) rethrows -> Index
}

extension Collection where Element: Comparable {
    func sortedEndIndex() -> Index
    func rampedEndIndex() -> Index
}

extension Collection where Element: Equatable {
    func firstVariance() -> Index
}
```

Checking whether the entire collection is sorted (or strictly increasing, or steady-state) can be done by comparing the result of the corresponding method to `endIndex`.
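A minimal sketch of that check follows. The helper `sketchPrefixEnd(whileAdjacent:)` is a hypothetical name invented for this example; it only approximates the behavior described above by testing each adjacent pair, and the three closures stand in for the sorted, strictly increasing, and steady-state variants.

```swift
extension Collection {
    /// Hypothetical helper: returns the index of the first element whose
    /// pairing with its predecessor fails `holds`, or `endIndex` if none does.
    func sketchPrefixEnd(whileAdjacent holds: (Element, Element) -> Bool) -> Index {
        guard !isEmpty else { return endIndex }
        var previous = startIndex
        var current = index(after: previous)
        while current != endIndex {
            guard holds(self[previous], self[current]) else { return current }
            previous = current
            formIndex(after: &current)
        }
        return endIndex
    }
}

let data = [1, 2, 2, 5, 4, 6]
let sortedEnd = data.sketchPrefixEnd { !($1 < $0) }   // non-decreasing
let rampedEnd = data.sketchPrefixEnd { $0 < $1 }      // strictly increasing
let varianceAt = data.sketchPrefixEnd { $0 == $1 }    // constant value

// Comparing against `endIndex` answers the whole-collection question.
let isFullySorted = sortedEnd == data.endIndex
let isConstant = [7, 7, 7].sketchPrefixEnd { $0 == $1 } == 3
```

For `data`, the sorted prefix ends at index 4 (the `4` after the `5`), the strictly increasing prefix ends at index 2 (the repeated `2`), and the first variance is at index 1.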

### Complexity

These methods walk the collection from the start until a non-match is found, so they all work in O(_n_) operations in the worst case, where _n_ is the length of the collection.

### Comparison with other languages

**C++:** The `<algorithm>` library defines `is_sorted` and `is_sorted_until`, the latter of which functions like `sortedEndIndex`.

(To-Do: Put other languages here.)
10 changes: 10 additions & 0 deletions README.md
@@ -28,6 +28,16 @@ Read more about the package, and the intent behind it, in the [announcement on s
- [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
- [`uniqued()`, `uniqued(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Unique.md): The unique elements of a collection, preserving their order.

#### Sorted-Collection operations

- [`sortedEndIndex(by:)`, `sortedEndIndex()`](./Guides/SortedPrefix.md): Reports where a collection stops being sorted.
- [`rampedEndIndex(by:)`, `rampedEndIndex()`](./Guides/SortedPrefix.md): Reports where a collection stops being strictly increasing.
- [`firstVariance(by:)`, `firstVariance()`](./Guides/SortedPrefix.md): Reports where a collection stops being at a constant value.
- [`someSortedPosition(of: by:)`, `someSortedPosition(of:)`](./Guides/BinarySearch.md): Locates if and where a target value is within a sorted collection.
- [`lowerSortedBound(around: by:)`, `lowerSortedBound(around:)`](./Guides/BinarySearch.md): Reports the lower bound for the equal-valued subsequence within a sorted collection that covers the targeted element.
- [`upperSortedBound(around: by:)`, `upperSortedBound(around:)`](./Guides/BinarySearch.md): Reports the upper bound for the equal-valued subsequence within a sorted collection that covers the targeted element.
- [`sortedRange(for: by:)`, `sortedRange(for:)`](./Guides/BinarySearch.md): Locates the subsequence within a sorted collection that contains all the elements matching a target value.

#### Other useful operations

- [`chunked(by:)`, `chunked(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes.