Make uniqued() lazy by default #71

Merged 3 commits on Apr 7, 2021
21 changes: 14 additions & 7 deletions Guides/Unique.md
@@ -5,7 +5,7 @@

 Methods to strip repeated elements from a sequence or collection.

-The `uniqued()` method returns an array, dropping duplicate elements
+The `uniqued()` method returns a sequence, dropping duplicate elements
 from a sequence. The `uniqued(on:)` method does the same, using
 the result of the given closure to determine the "uniqueness" of each
 element.
@@ -14,29 +14,36 @@ element.
 let numbers = [1, 2, 3, 3, 2, 3, 3, 2, 2, 2, 1]

 let unique = numbers.uniqued()
-// unique == [1, 2, 3]
+// Array(unique) == [1, 2, 3]
 ```

 ## Detailed Design

 Both methods are available for sequences, with the simplest limited to
 when the element type conforms to `Hashable`. Both methods preserve
-the relative order of the elements.
+the relative order of the elements. `uniqued(on:)` has a matching lazy
+version that is added to `LazySequenceProtocol`.

 ```swift
 extension Sequence where Element: Hashable {
-    func uniqued() -> [Element]
+    func uniqued() -> Uniqued<Self, Element>
 }

 extension Sequence {
-    func uniqued<T>(on: (Element) throws -> T) rethrows -> [Element]
-        where T: Hashable
+    func uniqued<Subject>(on projection: (Element) throws -> Subject) rethrows -> [Element]
+        where Subject: Hashable
 }
+
+extension LazySequenceProtocol {
+    func uniqued<Subject>(on projection: @escaping (Element) -> Subject) -> Uniqued<Self, Subject>
+        where Subject: Hashable
+}
 ```

 ### Complexity

-The `uniqued` methods are O(_n_) in both time and space complexity.
+The eager `uniqued(on:)` method is O(_n_) in both time and space complexity.
+The lazy versions are O(_1_).

Contributor

I think it's a very salient consideration mentioned by @kylemacomber that most uses seem to expect an Array, in which case a .lazy.uniqued design would make more sense.

Additionally, this discrepancy here, where uniqued(on:) isn't lazy by default but uniqued would be, seems like it invites confusion. I certainly would not expect that supplying a custom predicate would change the complexity or behavior so fundamentally, and I can see that issue arising when people make changes to their code and encounter this difference. Therefore, if it makes sense to make uniqued lazy, then I think the same change should be applied to uniqued(on:).

Contributor Author

Do people expect uniqued() to return an array, or do implementations usually simply return an array because it's easier to implement that way? I genuinely don't know the answer to this.

I share your concern about the discrepancy between uniqued() and uniqued(on:), but I do think that viewing them in isolation, this change is consistent with other sequence operations in the standard library. Operations that can be lazy without having to compromise (except perhaps on the return type) typically are, even when called on a sequence not conforming to LazySequenceProtocol. Collection's joined() and reversed() fall into this category, and I argue that uniqued() does as well. uniqued(on:) does not, because making it lazy would require the closure to be escaping and non-throwing.
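
To illustrate the precedent cited here, the following standard-library-only sketch checks the static types that `reversed()` and `joined()` return when called on plain arrays, with no `.lazy` involved. The variable names are illustrative and not part of this PR.

```swift
let numbers = [1, 2, 3]

// On a BidirectionalCollection, reversed() returns a ReversedCollection view
// over the original storage rather than a new array.
let reversedView = numbers.reversed()
let reversedIsWrapper = type(of: reversedView) == ReversedCollection<[Int]>.self

// joined() on a sequence of sequences returns a FlattenSequence wrapper.
let nested = [[1, 2], [3]]
let joinedView = nested.joined()
let joinedIsWrapper = type(of: joinedView) == FlattenSequence<[[Int]]>.self

// Going from the wrapper to an eager result is one Array initializer call.
let reversedArray = Array(reversedView)
let joinedArray = Array(joinedView)
```

Neither call takes a closure, which is why neither has to trade away `throws` or non-escaping semantics to be lazy.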

Contributor

Do any other algorithms in this library or the standard library differ in laziness depending on the presence of a custom predicate?

Contributor Author

I believe the pair of operations that comes closest is joined() being lazy and flatMap { $0 } being eager while effectively doing the same thing. Of course they don't have similar names like uniqued() and uniqued(on:) do.

This is a fairly unique situation because other potential pairs lack one of the variants. chunked(by: ==) and compactMap { $0 } have no corresponding closure-less version, and operations like zip, reversed, and combinations have no versions that do take a closure.

Ya I think the joined()/flatMap { $0 } and compacted()/compactMap { $0 } (see #112) serve as good precedent for similar lazy/eager pairs.

To try to articulate the philosophy:

  1. Anything that can efficiently be lazy should be lazy, because (i) it can avoid extra work (e.g. unnecessary computation or allocation) and (ii) it's easy to go from lazy to eager by constructing an Array, but it's impossible to go the other way.
  2. Require an explicit .lazy for algorithms that take a closure to emphasize that (i) the closure will not run immediately and (ii) it may run more than once per element in the collection, which can introduce a surprising vector for error.
  3. An algorithm shouldn't be lazy if its being lazy would pessimize its runtime complexity. For example, a lazy reversed adapter over a plain Collection would be absurd because looping over all of its indices would be O(n²).
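
Points 1 and 2 above can be sketched with a reduced stand-in for the lazy wrapper. `SketchUniqued` and `CallCounter` are hypothetical names used only for illustration; the type added in this PR is `Uniqued`, and this sketch is not its implementation. It counts how often the projection closure runs.

```swift
// Hypothetical, reduced stand-in for a lazy unique wrapper.
// It acts as its own iterator to keep the sketch short.
struct SketchUniqued<Base: Sequence, Subject: Hashable>: Sequence, IteratorProtocol {
  var base: Base.Iterator
  let projection: (Base.Element) -> Subject
  var seen: Set<Subject> = []

  // The closure is stored for later use, so it has to be @escaping.
  init(_ base: Base, on projection: @escaping (Base.Element) -> Subject) {
    self.base = base.makeIterator()
    self.projection = projection
  }

  mutating func next() -> Base.Element? {
    while let element = base.next() {
      if seen.insert(projection(element)).inserted { return element }
    }
    return nil
  }
}

// Hypothetical helper that records how many times the closure is called.
final class CallCounter { var value = 0 }
let calls = CallCounter()

let animals = ["dog", "pig", "cat", "ox", "dog", "cat"]
let lazyUnique = SketchUniqued(animals) { (animal: String) -> String in
  calls.value += 1
  return animal
}
let callsAfterCreation = calls.value    // the closure has not run yet

let firstPass = Array(lazyUnique)       // lazy to eager: one Array initializer
let callsAfterFirstPass = calls.value

let secondPass = Array(lazyUnique)      // a second pass runs the closure again
let callsAfterSecondPass = calls.value
```

In this sketch the closure runs zero times at creation and once per source element on each pass, so a closure with side effects behaves differently from its eager counterpart. That is the surprise an explicit `.lazy` is meant to flag.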

Now that we have OrderedSet via the Swift Collections packages, I think it's even clearer to me that uniqued should be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating an OrderedSet
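
A rough sketch of what partial computation buys, using only the standard library. `PullCounter` and `lazilyUniqued` are hypothetical names, and the lazy uniquing here is built on `sequence(state:next:)` as an assumption for illustration, not on this PR's `Uniqued` type.

```swift
// Hypothetical counter used to observe how many source elements are pulled.
final class PullCounter { var value = 0 }
let pulls = PullCounter()

// A large source whose values repeat every 1,000 elements.
let source = (0..<1_000_000).lazy.map { (n: Int) -> Int in
  pulls.value += 1
  return n % 1_000
}

// Stand-in for a lazy uniqued(): the state is the source iterator plus the
// set of elements seen so far.
let lazilyUniqued = sequence(
  state: (source.makeIterator(), Set<Int>())
) { state -> Int? in
  while let element = state.0.next() {
    if state.1.insert(element).inserted { return element }
  }
  return nil
}

// Only as much of the source is consumed as the prefix needs.
let firstThree = Array(lazilyUniqued.prefix(3))
let pulledCount = pulls.value
```

Building an ordered set from the same source would have to visit all one million elements before the first three unique values became available.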

Contributor

> Now that we have OrderedSet via the Swift Collections packages, I think it's even clearer to me that uniqued should be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating an OrderedSet

+1


### Comparison with other languages

89 changes: 82 additions & 7 deletions Sources/Algorithms/Unique.swift
@@ -9,25 +9,82 @@
 //
 //===----------------------------------------------------------------------===//

+/// A sequence wrapper that leaves out duplicate elements of a base sequence.
+public struct Uniqued<Base: Sequence, Subject: Hashable> {
+  /// The base collection.
+  @usableFromInline
+  internal let base: Base
+
+  /// The projection function.
+  @usableFromInline
+  internal let projection: (Base.Element) -> Subject
+
+  @usableFromInline
+  internal init(base: Base, projection: @escaping (Base.Element) -> Subject) {
+    self.base = base
+    self.projection = projection
+  }
+}
+
+extension Uniqued: Sequence {
+  /// The iterator for a `Uniqued` sequence.
+  public struct Iterator: IteratorProtocol {
+    @usableFromInline
+    internal var base: Base.Iterator
+
+    @usableFromInline
+    internal let projection: (Base.Element) -> Subject
+
+    @usableFromInline
+    internal var seen: Set<Subject> = []
+
+    @usableFromInline
+    internal init(
+      base: Base.Iterator,
+      projection: @escaping (Base.Element) -> Subject
+    ) {
+      self.base = base
+      self.projection = projection
+    }
+
+    @inlinable
+    public mutating func next() -> Base.Element? {
+      while let element = base.next() {
+        if seen.insert(projection(element)).inserted {
+          return element
+        }
+      }
+      return nil
+    }
+  }
+
+  @inlinable
+  public func makeIterator() -> Iterator {
+    Iterator(base: base.makeIterator(), projection: projection)
+  }
+}
+
+extension Uniqued: LazySequenceProtocol where Base: LazySequenceProtocol {}
+
 //===----------------------------------------------------------------------===//
 // uniqued()
 //===----------------------------------------------------------------------===//

 extension Sequence where Element: Hashable {
-  /// Returns an array with only the unique elements of this sequence, in the
+  /// Returns a sequence with only the unique elements of this sequence, in the
   /// order of the first occurrence of each unique element.
   ///
   ///     let animals = ["dog", "pig", "cat", "ox", "dog", "cat"]
   ///     let uniqued = animals.uniqued()
-  ///     print(uniqued)
+  ///     print(Array(uniqued))
   ///     // Prints '["dog", "pig", "cat", "ox"]'
   ///
-  /// - Returns: An array with only the unique elements of this sequence.
+  /// - Returns: A sequence with only the unique elements of this sequence.
   ///  .
-  /// - Complexity: O(*n*), where *n* is the length of the sequence.
+  /// - Complexity: O(1).
   @inlinable
-  public func uniqued() -> [Element] {
-    uniqued(on: { $0 })
+  public func uniqued() -> Uniqued<Self, Element> {
+    Uniqued(base: self, projection: { $0 })
   }
 }

@@ -40,7 +97,7 @@ extension Sequence {
   /// first characters:
   ///
   ///     let animals = ["dog", "pig", "cat", "ox", "cow", "owl"]
-  ///     let uniqued = animals.uniqued(on: {$0.first})
+  ///     let uniqued = animals.uniqued(on: { $0.first })
   ///     print(uniqued)
   ///     // Prints '["dog", "pig", "cat", "ox"]'
   ///
@@ -67,3 +124,21 @@ extension Sequence {
     return result
   }
 }
+
+//===----------------------------------------------------------------------===//
+// lazy.uniqued()
+//===----------------------------------------------------------------------===//
+
+extension LazySequenceProtocol {
+  /// Returns a lazy sequence with the unique elements of this sequence (as
+  /// determined by the given projection), in the order of the first occurrence
+  /// of each unique element.
+  ///
+  /// - Complexity: O(1).
+  @inlinable
+  public func uniqued<Subject: Hashable>(
+    on projection: @escaping (Element) -> Subject
+  ) -> Uniqued<Self, Subject> {
+    Uniqued(base: self, projection: projection)
+  }
+}
23 changes: 21 additions & 2 deletions Tests/SwiftAlgorithmsTests/UniqueTests.swift
@@ -17,10 +17,13 @@ final class UniqueTests: XCTestCase {
     let a = repeatElement(1...10, count: 15).joined().shuffled()
     let b = a.uniqued()
     XCTAssertEqual(b.sorted(), Set(a).sorted())
-    XCTAssertEqual(10, b.count)
+    XCTAssertEqual(10, Array(b).count)

     let c: [Int] = []
-    XCTAssertEqual(c.uniqued(), [])
+    XCTAssertEqualSequences(c.uniqued(), [])
+
+    let d = Array(repeating: 1, count: 10)
+    XCTAssertEqualSequences(d.uniqued(), [1])
   }

   func testUniqueOn() {
@@ -30,5 +33,21 @@ final class UniqueTests: XCTestCase {

     let c: [Int] = []
     XCTAssertEqual(c.uniqued(on: { $0.bitWidth }), [])
+
+    let d = Array(repeating: "Andromeda", count: 10)
+    XCTAssertEqualSequences(d.uniqued(on: { $0.first }), ["Andromeda"])
+  }
+
+  func testLazyUniqueOn() {
+    let a = ["Albemarle", "Abeforth", "Astrology", "Brandywine", "Beatrice", "Axiom"]
+    let b = a.lazy.uniqued(on: { $0.first })
+    XCTAssertEqualSequences(b, ["Albemarle", "Brandywine"])
+    XCTAssertLazySequence(b)
+
+    let c: [Int] = []
+    XCTAssertEqualSequences(c.lazy.uniqued(on: { $0.bitWidth }), [])
+
+    let d = Array(repeating: "Andromeda", count: 10)
+    XCTAssertEqualSequences(d.lazy.uniqued(on: { $0.first }), ["Andromeda"])
   }
 }