Commit 328631d

Add grouped(by:) and keyed(by:) (#197)
* Add Grouped and Keyed
* Add `Key` to `combine` closure
* Split `keyed(by:)` into two overloads
* Rename uniquingKeysWith → resolvingConflictsWith
* Keep latest instead of throwing on collision
1 parent 0ebed14 commit 328631d

7 files changed

+372
-0
lines changed

Guides/Grouped.md

+68
@@ -0,0 +1,68 @@
# Grouped

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Grouped.swift) |
 [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/GroupedTests.swift)]

Groups up elements of a sequence into a new Dictionary, whose values are Arrays of grouped elements, each keyed by the result of the given closure.

```swift
let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Avocado", "Coconut"]
let fruitsByLetter = fruits.grouped(by: { $0.first! })
// Results in:
// [
//   "B": ["Banana"],
//   "A": ["Apricot", "Apple", "Avocado"],
//   "C": ["Cherry", "Coconut"],
// ]
```

If you wish to achieve a similar effect but for single values (instead of Arrays of grouped values), see [`keyed(by:)`](Keyed.md).

## Detailed Design

The `grouped(by:)` method is declared as a `Sequence` extension returning
`[GroupKey: [Element]]`.

```swift
extension Sequence {
  public func grouped<GroupKey>(
    by keyForValue: (Element) throws -> GroupKey
  ) rethrows -> [GroupKey: [Element]]
}
```
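
Because the implementation in `Sources/Algorithms/Grouped.swift` forwards to the standard library's `Dictionary(grouping:by:)`, the two spellings should produce equal dictionaries. The following is a minimal sketch that uses only the standard library so it runs without the package; `groupedSketch(by:)` is a hypothetical local stand-in, named differently to avoid clashing with the real method.

```swift
extension Sequence {
  // Hypothetical stand-in that mirrors the body of `grouped(by:)` in this commit.
  func groupedSketch<GroupKey: Hashable>(
    by keyForValue: (Element) throws -> GroupKey
  ) rethrows -> [GroupKey: [Element]] {
    try Dictionary(grouping: self, by: keyForValue)
  }
}

let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Avocado", "Coconut"]
let viaMethod = fruits.groupedSketch(by: { $0.first! })
let viaInitializer = Dictionary(grouping: fruits, by: { $0.first! })

// Both spellings build the same dictionary.
print(viaMethod == viaInitializer)
// Elements within a group keep their relative order from the input.
print(viaMethod["A"] ?? [])
```

The method form mainly helps readability at the end of a chain of sequence operations, where wrapping the whole chain in an initializer call would read inside-out.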

### Complexity

Calling `grouped(by:)` is an O(_n_) operation.

### Comparison with other languages

| Language      | Grouping API |
|---------------|--------------|
| Java          | [`groupingBy`](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/stream/Collectors.html#groupingBy(java.util.function.Function)) |
| Kotlin        | [`groupBy`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/group-by.html) |
| C#            | [`GroupBy`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.groupby?view=net-7.0#system-linq-enumerable-groupby) |
| Rust          | [`group_by`](https://doc.rust-lang.org/std/primitive.slice.html#method.group_by) |
| Ruby          | [`group_by`](https://ruby-doc.org/3.2.2/Enumerable.html#method-i-group_by) |
| Python        | [`groupby`](https://docs.python.org/3/library/itertools.html#itertools.groupby) |
| PHP (Laravel) | [`groupBy`](https://laravel.com/docs/10.x/collections#method-groupby) |

#### Naming

All the surveyed languages name this operation with a variant of "grouped" or "grouping". The past tense `grouped(by:)` best fits [Swift's API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/).

#### Customization points

Java and C# are interesting in that they provide multiple overloads with several points of customization:

1. Changing the type of the groups.
    1. E.g. the groups can be Sets instead of Arrays.
    1. Akin to calling `.mapValues { group in Set(group) }` on the resultant dictionary, but avoiding the intermediate allocation of Arrays of each group.
2. Picking which elements end up in the groupings.
    1. The default is the elements of the input sequence, but can be changed.
    2. Akin to calling `.mapValues { group in group.map(someTransform) }` on the resultant dictionary, but avoiding the intermediate allocation of Arrays of each group.
3. Changing the type of the outermost collection.
    1. E.g. using an `OrderedDictionary`, `SortedDictionary` or `TreeDictionary` instead of the default (hashed, unordered) `Dictionary`.
    2. There's no great way to achieve this with `grouped(by:)`. One could wrap the resultant dictionary in an initializer to one of the other dictionary types, but that isn't sufficient: once the `Dictionary` loses the ordering, there's no way to get it back when constructing one of the ordered dictionary variants.

It is not clear which of these points of customization are worth supporting, or what the best way to express them might be.
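
The first two customization points can be approximated today with the standard library. The sketch below is illustrative only (variable names are made up for the example): it shows the `mapValues(_:)` route, which builds an Array per group first, next to a single-pass `reduce(into:)` that inserts into Sets directly.

```swift
let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Avocado", "Coconut"]

// 1. Groups as Sets: an Array is built per group, then converted.
let setsByLetter = Dictionary(grouping: fruits, by: { $0.first! })
  .mapValues { Set($0) }

// 2. Transformed elements in each group: again via intermediate Arrays.
let lengthsByLetter = Dictionary(grouping: fruits, by: { $0.first! })
  .mapValues { group in group.map { $0.count } }

// Single pass that inserts into Sets directly, with no intermediate Arrays.
let setsInOnePass = fruits.reduce(into: [Character: Set<String>]()) { result, fruit in
  result[fruit.first!, default: []].insert(fruit)
}

print(setsByLetter == setsInOnePass)
print(lengthsByLetter["A"] ?? [])
```

Neither form addresses the third point: every result here is still an unordered `Dictionary`.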

Guides/Keyed.md

+78
@@ -0,0 +1,78 @@
# Keyed

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Keyed.swift) |
 [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/KeyedTests.swift)]

Stores the elements of a sequence as the values of a Dictionary, keyed by the result of the given closure.

```swift
let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Blackberry", "Avocado", "Coconut"]
let fruitByLetter = fruits.keyed(by: { $0.first! })
// Results in:
// [
//   "A": "Avocado",
//   "B": "Blackberry",
//   "C": "Coconut",
// ]
```

On a key-collision, the latest element is kept by default. Alternatively, you can provide a closure which specifies which value to keep:

```swift
let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Blackberry", "Avocado", "Coconut"]
let fruitsByLetter = fruits.keyed(
  by: { $0.first! },
  resolvingConflictsWith: { key, old, new in old } // Always pick the first fruit
)
// Results in:
// [
//   "A": "Apricot",
//   "B": "Banana",
//   "C": "Cherry",
// ]
```
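
The resolving closure is not limited to picking one of the two values. As a hedged sketch, the example below combines colliding values and, separately, treats any collision as an error. To stay runnable without the package it uses `keyedSketch(by:resolvingConflictsWith:)`, a hypothetical local copy of the method body from this commit, and `DuplicateKey`, an error type invented for the example.

```swift
extension Sequence {
  // Hypothetical local copy of `keyed(by:resolvingConflictsWith:)` from this commit.
  func keyedSketch<Key: Hashable>(
    by keyForValue: (Element) throws -> Key,
    resolvingConflictsWith resolve: (Key, Element, Element) throws -> Element
  ) rethrows -> [Key: Element] {
    var result = [Key: Element]()
    for element in self {
      let key = try keyForValue(element)
      // `updateValue` returns the previous value if the key already existed.
      if let oldValue = result.updateValue(element, forKey: key) {
        result[key] = try resolve(key, oldValue, element)
      }
    }
    return result
  }
}

// Error type made up for this example.
struct DuplicateKey: Error { let key: Character }

let fruits = ["Apricot", "Banana", "Apple", "Cherry"]

// Combine colliding values instead of choosing one of them.
let joined = fruits.keyedSketch(
  by: { $0.first! },
  resolvingConflictsWith: { _, old, new in "\(old)+\(new)" }
)
print(joined["A"] ?? "")

// Treat any collision as an error.
do {
  _ = try fruits.keyedSketch(
    by: { $0.first! },
    resolvingConflictsWith: { key, _, _ in throw DuplicateKey(key: key) }
  )
} catch {
  print("collision:", error)
}
```

Because the method is `rethrows`, the first call needs no `try`; only the call whose closure can throw does.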

## Detailed Design

The `keyed(by:)` and `keyed(by:resolvingConflictsWith:)` methods are declared in a `Sequence` extension, both returning `[Key: Element]`.

```swift
extension Sequence {
  public func keyed<Key>(
    by keyForValue: (Element) throws -> Key
  ) rethrows -> [Key: Element]

  public func keyed<Key>(
    by keyForValue: (Element) throws -> Key,
    resolvingConflictsWith resolve: (Key, Element, Element) throws -> Element
  ) rethrows -> [Key: Element]
}
```
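
For comparison, a similar result can be had from the standard library's `Dictionary(_:uniquingKeysWith:)` by first pairing each element with its key. This is only a sketch of the equivalence, not how the commit implements `keyed(by:)`; note that the standard library's closure receives just the two values, not the key.

```swift
let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Blackberry", "Avocado", "Coconut"]

// Pair each element with its key, then let the initializer resolve collisions.
let keepLatest = Dictionary(
  fruits.map { ($0.first!, $0) },
  uniquingKeysWith: { _, new in new }
)
let keepFirst = Dictionary(
  fruits.map { ($0.first!, $0) },
  uniquingKeysWith: { old, _ in old }
)

print(keepLatest["A"] ?? "", keepFirst["A"] ?? "")
```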

### Complexity

Calling `keyed(by:)` is an O(_n_) operation.

### Comparison with other languages

| Language             | "Keying" API |
|----------------------|--------------|
| Java                 | [`toMap`](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/stream/Collectors.html#toMap(java.util.function.Function,java.util.function.Function)) |
| Kotlin               | [`associateBy`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/associate-by.html) |
| C#                   | [`ToDictionary`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.todictionary?view=net-7.0#system-linq-enumerable-todictionary) |
| Ruby (ActiveSupport) | [`index_by`](https://rubydoc.info/gems/activesupport/7.0.5/Enumerable#index_by-instance_method) |
| PHP (Laravel)        | [`keyBy`](https://laravel.com/docs/10.x/collections#method-keyby) |

#### Rejected alternative names

1. Java's `toMap` refers to `Map`/`HashMap`, Java's naming for Dictionaries and other associative collections. It's easy to confuse with the transformation function, `Sequence.map(_:)`.
2. C#'s `toXXX()` naming doesn't suit Swift well, which tends to prefer `Foo.init` over `toFoo()` methods.
3. Ruby's `index_by` naming doesn't fit Swift well, where "index" is a specific term (e.g. the `associatedtype Index` on `Collection`). There is also an [`index(by:)`](Index.md) method in swift-algorithms, which is specifically to do with matching elements up with their indices, and not any arbitrary derived value.

#### Alternative names

Kotlin's `associateBy` naming is a good alternative. Put into the past tense favored by [Swift's API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/), we'd perhaps spell it `associated(by:)`.

#### Customization points

Java and C# are interesting in that they provide overloads that let you customize the type of the outermost collection, e.g. using an `OrderedDictionary` instead of the default (hashed, unordered) `Dictionary`.
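
To illustrate why the outer collection type matters, the sketch below keeps keys in first-seen order by tracking them in a separate array next to a plain `Dictionary`. It uses only the standard library and made-up variable names; an ordered dictionary type would do this bookkeeping itself.

```swift
let fruits = ["Banana", "Apricot", "Cherry", "Blackberry", "Avocado"]

var keysInFirstSeenOrder: [Character] = []
var fruitByLetter: [Character: String] = [:]

for fruit in fruits {
  let key = fruit.first!
  // `updateValue` returns nil when the key was not present before.
  if fruitByLetter.updateValue(fruit, forKey: key) == nil {
    keysInFirstSeenOrder.append(key)
  }
}

// Latest value per key, listed in the order the keys first appeared.
let ordered = keysInFirstSeenOrder.map { (key: $0, value: fruitByLetter[$0]!) }
for pair in ordered {
  print(pair.key, pair.value)
}
```

This ordering has to be captured while iterating; it cannot be recovered from the finished `Dictionary` afterwards.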

README.md

+2
@@ -45,8 +45,10 @@ Read more about the package, and the intent behind it, in the [announcement on s
 - [`adjacentPairs()`](https://github.com/apple/swift-algorithms/blob/main/Guides/AdjacentPairs.md): Lazily iterates over tuples of adjacent elements.
 - [`chunked(by:)`, `chunked(on:)`, `chunks(ofCount:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes or chunks of a given count.
 - [`firstNonNil(_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/FirstNonNil.md): Returns the first non-`nil` result from transforming a sequence's elements.
+- [`grouped(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Grouped.md): Group up elements using the given closure, returning a Dictionary of those groups, keyed by the results of the closure.
 - [`indexed()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Indexed.md): Iterate over tuples of a collection's indices and elements.
 - [`interspersed(with:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Intersperse.md): Place a value between every two elements of a sequence.
+- [`keyed(by:)`, `keyed(by:resolvingConflictsWith:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Keyed.md): Returns a Dictionary that associates elements of a sequence with the keys returned by the given closure.
 - [`partitioningIndex(where:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): Returns the starting index of the partition of a collection that matches a predicate.
 - [`reductions(_:)`, `reductions(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Reductions.md): Returns all the intermediate states of reducing the elements of a sequence or collection.
 - [`split(maxSplits:omittingEmptySubsequences:whereSeparator)`, `split(separator:maxSplits:omittingEmptySubsequences)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Split.md): Lazy versions of the Standard Library's eager operations that split sequences and collections into subsequences separated by the specified separator element.

Sources/Algorithms/Grouped.swift

+25
@@ -0,0 +1,25 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2021 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

extension Sequence {
  /// Groups up elements of `self` into a new Dictionary,
  /// whose values are Arrays of grouped elements,
  /// each keyed by the group key returned by the given closure.
  /// - Parameters:
  ///   - keyForValue: A closure that returns a key for each element in
  ///     `self`.
  /// - Returns: A dictionary containing grouped elements of self, keyed by
  ///     the keys derived by the `keyForValue` closure.
  @inlinable
  public func grouped<GroupKey>(by keyForValue: (Element) throws -> GroupKey) rethrows -> [GroupKey: [Element]] {
    try Dictionary(grouping: self, by: keyForValue)
  }
}

Sources/Algorithms/Keyed.swift

+65
@@ -0,0 +1,65 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

extension Sequence {
  /// Creates a new Dictionary from the elements of `self`, keyed by the
  /// results returned by the given `keyForValue` closure.
  ///
  /// If the key derived for a new element collides with an existing key from a previous element,
  /// the latest value will be kept.
  ///
  /// - Parameters:
  ///   - keyForValue: A closure that returns a key for each element in `self`.
  @inlinable
  public func keyed<Key>(
    by keyForValue: (Element) throws -> Key
  ) rethrows -> [Key: Element] {
    return try self.keyed(by: keyForValue, resolvingConflictsWith: { _, old, new in new })
  }

  /// Creates a new Dictionary from the elements of `self`, keyed by the
  /// results returned by the given `keyForValue` closure. As the dictionary is
  /// built, this method calls the `resolve` closure with the key and the
  /// current and new values for any duplicate keys. Pass a closure as `resolve`
  /// that returns the value to use in the resulting dictionary: The closure can
  /// choose between the two values, combine them to produce a new value, or
  /// even throw an error.
  ///
  /// - Parameters:
  ///   - keyForValue: A closure that returns a key for each element in `self`.
  ///   - resolve: A closure that is called with the values for any duplicate
  ///     keys that are encountered. The closure returns the desired value for
  ///     the final dictionary.
  @inlinable
  public func keyed<Key>(
    by keyForValue: (Element) throws -> Key,
    resolvingConflictsWith resolve: (Key, Element, Element) throws -> Element
  ) rethrows -> [Key: Element] {
    var result = [Key: Element]()

    for element in self {
      let key = try keyForValue(element)

      if let oldValue = result.updateValue(element, forKey: key) {
        let valueToKeep = try resolve(key, oldValue, element)

        // This causes a second look-up for the same key. The standard library can avoid that
        // by calling `mutatingFind` to get access to the bucket where the value will end up,
        // and updating in place.
        // Swift Algorithms doesn't have access to that API, so we make do.
        // When this gets merged into the standard library, we should optimize this.
        result[key] = valueToKeep
      }
    }

    return result
  }
}

Tests/SwiftAlgorithmsTests/GroupedTests.swift

+46
@@ -0,0 +1,46 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

import XCTest
import Algorithms

final class GroupedTests: XCTestCase {
  private class SampleError: Error {}

  // Based on https://github.com/apple/swift/blob/4d1d8a9de5ebc132a17aee9fc267461facf89bf8/validation-test/stdlib/Dictionary.swift#L1974-L1988

  func testGroupedBy() {
    let r = 0..<10

    let d1 = r.grouped(by: { $0 % 3 })
    XCTAssertEqual(3, d1.count)
    XCTAssertEqual(d1[0]!, [0, 3, 6, 9])
    XCTAssertEqual(d1[1]!, [1, 4, 7])
    XCTAssertEqual(d1[2]!, [2, 5, 8])

    let d2 = r.grouped(by: { $0 })
    XCTAssertEqual(10, d2.count)

    let d3 = (0..<0).grouped(by: { $0 })
    XCTAssertEqual(0, d3.count)
  }

  func testThrowingFromKeyFunction() {
    let input = ["Apple", "Banana", "Cherry"]
    let error = SampleError()

    XCTAssertThrowsError(
      try input.grouped(by: { (_: String) -> Character in throw error })
    ) { thrownError in
      XCTAssertIdentical(error, thrownError as? SampleError)
    }
  }
}

Tests/SwiftAlgorithmsTests/KeyedTests.swift

+88
@@ -0,0 +1,88 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

import XCTest
import Algorithms

final class KeyedTests: XCTestCase {
  private class SampleError: Error {}

  func testUniqueKeys() {
    let d = ["Apple", "Banana", "Cherry"].keyed(by: { $0.first! })
    XCTAssertEqual(d.count, 3)
    XCTAssertEqual(d["A"]!, "Apple")
    XCTAssertEqual(d["B"]!, "Banana")
    XCTAssertEqual(d["C"]!, "Cherry")
    XCTAssertNil(d["D"])
  }

  func testEmpty() {
    let d = EmptyCollection<String>().keyed(by: { $0.first! })
    XCTAssertEqual(d.count, 0)
  }

  func testNonUniqueKeys() throws {
    let d = ["Apple", "Avocado", "Banana", "Cherry"].keyed(by: { $0.first! })
    XCTAssertEqual(d.count, 3)
    XCTAssertEqual(d["A"]!, "Avocado", "On a key-collision, keyed(by:) should take the latest value.")
    XCTAssertEqual(d["B"]!, "Banana")
    XCTAssertEqual(d["C"]!, "Cherry")
  }

  func testNonUniqueKeysWithMergeFunction() {
    var resolveCallHistory = [(key: Character, current: String, new: String)]()
    let expectedCallHistory = [
      (key: "A", current: "Apple", new: "Avocado"),
      (key: "C", current: "Cherry", new: "Coconut"),
    ]

    let d = ["Apple", "Avocado", "Banana", "Cherry", "Coconut"].keyed(
      by: { $0.first! },
      resolvingConflictsWith: { key, older, newer in
        resolveCallHistory.append((key, older, newer))
        return "\(older)-\(newer)"
      }
    )

    XCTAssertEqual(d.count, 3)
    XCTAssertEqual(d["A"]!, "Apple-Avocado")
    XCTAssertEqual(d["B"]!, "Banana")
    XCTAssertEqual(d["C"]!, "Cherry-Coconut")
    XCTAssertNil(d["D"])

    XCTAssertEqual(
      resolveCallHistory.map(String.init(describing:)), // quick/dirty workaround: tuples aren't Equatable
      expectedCallHistory.map(String.init(describing:))
    )
  }

  func testThrowingFromKeyFunction() {
    let input = ["Apple", "Banana", "Cherry"]
    let error = SampleError()

    XCTAssertThrowsError(
      try input.keyed(by: { (_: String) -> Character in throw error })
    ) { thrownError in
      XCTAssertIdentical(error, thrownError as? SampleError)
    }
  }

  func testThrowingFromCombineFunction() {
    let input = ["Apple", "Avocado", "Banana", "Cherry"]
    let error = SampleError()

    XCTAssertThrowsError(
      try input.keyed(by: { $0.first! }, resolvingConflictsWith: { _, _, _ in throw error })
    ) { thrownError in
      XCTAssertIdentical(error, thrownError as? SampleError)
    }
  }
}
