Commit f724af4

milseman, natecook1000, finagolfin, ole, amartini51
authored
Update swift main (#674) (#676)
* Atomically load the lowered program (#610)

  Since we're atomically initializing the compiled program in `Regex.Program`, we need to pair that with an atomic load.

  Resolves #609.

* Add tests for line start/end word boundary diffs (#616)

  The `default` and `simple` word boundaries have different behaviors at the start and end of strings/lines. These tests validate that we have the correct behavior implemented.

  Related to issue #613.

* Add tweaks for Android

* Fix documentation typo (#615)

* Fix abstract for Regex.dotMatchesNewlines(_:). (#614)

  The old version looks like it was accidentally duplicated from anchorsMatchLineEndings(_:) just below it.

* Remove `RegexConsumer` and fix its dependencies (#617)

  * Remove `RegexConsumer` and fix its dependencies

    This eliminates the RegexConsumer type and rewrites its users to call through to other, existing functionality on Regex or in the Algorithms implementations. RegexConsumer doesn't take account of the dual subranges required for matching, so it can produce results that are inconsistent with matches(of:) and ranges(of:), which were rewritten earlier.

    rdar://102841216

  * Remove remaining from-end algorithm methods

    This removes methods that are left over from when we were considering from-end algorithms. These aren't tested and may not have the correct semantics, so it's safer to remove them entirely.

* Improve StringProcessing and RegexBuilder documentation (#611)

  This includes documentation improvements for core types/methods, RegexBuilder types along with their generated variadic initializers, and adds some curation. It also includes tests of the documentation code samples.

* Set availability for inverted character class test (#621)

  This feature depends on running with a Swift 5.7 stdlib, and fails when that isn't available.

* Add type annotations in RegexBuilder tests

  These changes work around a change to the way result builders are compiled that removes the ability for result builder closure outputs to affect the overload resolution elsewhere in an expression.

  Workarounds for rdar://104881395 and rdar://104645543

* Workaround for fileprivate array issue

  A recent compiler change results in fileprivate arrays sometimes not keeping their buffers around long enough. This change avoids that issue by removing the fileprivate annotations from the affected type.

* Fix an issue where named character classes weren't getting converted in the result builder. <rdar://104480703>

* Stop at end of search string in TwoWaySearcher (#631)

  When searching for a substring that doesn't exist, it was possible for TwoWaySearcher to advance beyond the end of the search string, causing a crash. This change adds a `limitedBy:` parameter to that index movement, avoiding the invalid movement.

  Fixes rdar://105154010

* Correct misspelling in DSL renderer (#627)

  vertial -> vertical

  rdar://104602317

* Fix output type mismatch with RegexBuilder (#626)

  Some regex literals (and presumably other `Regex` instances) lose their output type information when used in a RegexBuilder closure due to the way the concatenating builder calls are overloaded. In particular, any output type with labeled tuples or where the sum of tuple components in the accumulated and new output types is greater than 10 will be ignored.

  Regex internals don't make this distinction, however, so there ends up being a mismatch between what a `Regex.Match` instance tries to produce and the output type of the outermost regex. For example, this code results in a crash, because `regex` is a `Regex<Substring>` but the match tries to produce a `(Substring, number: Substring)`:

      let regex = Regex {
          ZeroOrMore(.whitespace)
          /:(?<number>\d+):/
          ZeroOrMore(.whitespace)
      }
      let match = try regex.wholeMatch(in: " :21: ")
      print(match!.output)

  To fix this, we add a new `ignoreCapturesInTypedOutput` DSLTree node to mark situations where the output type is discarded. This status is propagated through the capture list into the match's storage, which lets us produce the correct output type. Note that we can't just drop the capture groups when building the compiled program because (1) different parts of the regex might reference the capture group and (2) all capture groups are available if a developer converts the output to `AnyRegexOutput`.

      let anyOutput = AnyRegexOutput(match)
      // anyOutput[1] == "21"
      // anyOutput["number"] == Optional("21")

  Fixes #625. rdar://104823356

  Note: Linux seems to crash on different tests when the two customTest overloads have `internal` visibility or are called. Switching one of the functions to be generic over a RegexComponent works around the issue.

* Revert "Merge pull request #628 from apple/result_builder_changes_workaround"

  This reverts commit 7e059b7, reversing changes made to 3ca8b13.

* Use `some` syntax in variadics

  This supports a type checker fix after the change in how result builder closure parameters are type-checked.

* Type checker workaround: adjust test

* Further refactor to work around type checker regression

* Align availability macro with OS versions (#641)

* Speed up general character class matching (#642)

  Short-circuit Character.isASCII checks inside built in character class matching.

  Also, make benchmark try a few more times before giving up.

* Test for \s matching CRLF when scalar matching (#648)

* General ascii fast paths for character classes (#644)

  General ASCII fast-paths for builtin character classes

* Remove the unsupported `anyScalar` case (#650)

  We decided not to support the `anyScalar` character class, which would match a single Unicode scalar regardless of matching mode. However, its representation was still included in the various character class types in the regex engine, leading to unreachable code and unclear requirements when changing or adding new code. This change removes that representation where possible.

  The `DSLTree.Atom.CharacterClass` enum is left unchanged, since it is marked `@_spi(RegexBuilder) public`. Any use of that enum case is handled with a `fatalError("Unsupported")`, and it isn't produced on any code path.

* Fix range-based quantification fast path (#653)

  The fast path for quantification incorrectly discards the last save position when the quantification used up all possible trips, which is only possible with range-based quantifications (e.g. `{0,3}`). This bug shows up when a range-based quantifier matches the maximum - 1 repetitions of the preceding pattern.

  For example, the regex `/a{0,2}a/` should succeed as a full match any of the strings "aa", "aaa", or "aaaa". However, the pattern fails to match "aaa", since the save point allowing a single "a" to match the first `a{0,2}` part of the regex is discarded.

  This change only discards the last save position when advancing the quantifier fails due to a failure to match, not maxing out the number of trips.

* Add in ASCII fast-path for anyNonNewline (#654)

* Avoid long expression type checks (#657)

  These changes remove several seconds of type-checking time from the RegexBuilder test cases, bringing all expressions under 150ms (on the tested computer).

* Processor cleanup (#655)

  Clean up and refactor the processor

  * Simplify instruction fetching
  * Refactor metrics out, and void their storage in release builds
  * Put operations onto String

* Fix `firstRange(of:)` search (#656)

  Calls to `ranges(of:)` and `firstRange(of:)` with a string parameter actually use two different string searching algorithms. `ranges(of:)` uses the "z-searcher" algorithm, while `firstRange(of:)` uses a two-way search. Since it's better to align on a single path for these searches, the z-searcher has lower requirements, and the two-way search implementation has a correctness bug, this change removes the two-way search algorithm and uses z-search for `firstRange(of:)`.

  The correctness bug in `firstRange(of:)` appears only when searching for the second (or later) occurrence of a substring, which you have to be fairly deliberate about. In the example below, the substring at offsets `7..<12` is missed:

      let text = "ADACBADADACBADACB"
      //          =====  -----=====
      let pattern = "ADACB"
      let firstRange = text.firstRange(of: pattern)!
      // firstRange ~= 0..<5
      let secondRange = text[firstRange.upperBound...].firstRange(of: pattern)!
      // secondRange ~= 12..<17

  This change also removes some unrelated, unused code in Split.swift, in addition to removing an (unused) usage of `TwoWaySearcher`.

  rdar://92794248

* Bug fix and hot path for quantified `.` (#658)

  Bug fix in newline hot path, and apply hot path to quantified dot

* Run scalar-semantic benchmark variants (#659)

  Run scalar semantic benchmarks

* Refactor operations to be on String (#664)

  Finish refactoring logic onto String

* Provide unique generic method parameter names (#669)

  This is getting warned on in the 5.9 compiler, will be an error starting in Swift 6.

* Enable quantification optimizations for scalar semantics (#671)

  * Quantified scalar semantic matching
  * Remove redundant test

---------

Co-authored-by: Nate Cook <natecook@apple.com>
Co-authored-by: Butta <repo@butta.fastem.com>
Co-authored-by: Ole Begemann <ole@oleb.net>
Co-authored-by: Alex Martini <amartini@apple.com>
Co-authored-by: Alejandro Alonso <alejandro_alonso@apple.com>
Co-authored-by: David Ewing <dewing@apple.com>
Co-authored-by: Dave Ewing <96321608+DaveEwing@users.noreply.github.com>
1 parent 02f6b71 commit f724af4
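
The range-based quantification fix (#653) in the commit message concerns a greedy quantifier that has to fall back to a saved position. A minimal sketch of the affected pattern follows, assuming a Swift 5.7+ toolchain that already includes the fix; the helper name `firstMatchText` is hypothetical and not part of this commit.

```swift
// Sketch only: exercises `a{0,2}a`, the pattern named in the commit message.
// `firstMatchText` is a hypothetical helper, not an API from swift-experimental-string-processing.
func firstMatchText(_ pattern: String, in input: String) -> String? {
  // Compile the pattern at run time; treat a malformed pattern as "no match".
  guard let regex = try? Regex(pattern) else { return nil }
  guard let match = input.firstMatch(of: regex) else { return nil }
  return String(input[match.range])
}

// The greedy `a{0,2}` first takes as many "a"s as it can (up to two). When the
// trailing `a` then has nothing left to match, the engine must back up to a
// saved position with one fewer repetition.
for input in ["a", "aa", "aaa", "aaaa"] {
  print(input, firstMatchText("a{0,2}a", in: input) ?? "no match")
}
```

The inputs "a" and "aa" are the ones that depend on backtracking into the quantifier, because the quantifier alone can consume the whole string.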

31 files changed, +823 -642 lines changed

Sources/RegexBenchmark/Benchmark.swift

Lines changed: 35 additions & 57 deletions
@@ -71,6 +71,13 @@ struct NSBenchmark: RegexBenchmark {
   enum NSMatchType {
     case allMatches
     case first
+
+    init(_ type: Benchmark.MatchType) {
+      switch type {
+      case .whole, .first: self = .first
+      case .allMatches: self = .allMatches
+      }
+    }
   }

   func run() {
@@ -126,7 +133,7 @@ struct CrossBenchmark {
   /// The base name of the benchmark
   var baseName: String

-  /// The string to compile in differnet engines
+  /// The string to compile in different engines
   var regex: String

   /// The text to search
@@ -143,57 +150,32 @@ struct CrossBenchmark {
   /// Whether or not to do firstMatch as well or just allMatches
   var includeFirst: Bool = false

+  /// Whether to also run scalar-semantic mode
+  var alsoRunScalarSemantic: Bool = true
+
   func register(_ runner: inout BenchmarkRunner) {
-    let swiftRegex = try! Regex(regex)
-    let nsRegex: NSRegularExpression
     if isWhole {
-      nsRegex = try! NSRegularExpression(pattern: "^" + regex + "$")
+      runner.registerCrossBenchmark(
+        nameBase: baseName,
+        input: input,
+        pattern: regex,
+        .whole,
+        alsoRunScalarSemantic: alsoRunScalarSemantic)
     } else {
-      nsRegex = try! NSRegularExpression(pattern: regex)
-    }
+      runner.registerCrossBenchmark(
+        nameBase: baseName,
+        input: input,
+        pattern: regex,
+        .allMatches,
+        alsoRunScalarSemantic: alsoRunScalarSemantic)

-    if isWhole {
-      runner.register(
-        Benchmark(
-          name: baseName + "Whole",
-          regex: swiftRegex,
-          pattern: regex,
-          type: .whole,
-          target: input))
-      runner.register(
-        NSBenchmark(
-          name: baseName + "Whole" + CrossBenchmark.nsSuffix,
-          regex: nsRegex,
-          type: .first,
-          target: input))
-    } else {
-      runner.register(
-        Benchmark(
-          name: baseName + "All",
-          regex: swiftRegex,
-          pattern: regex,
-          type: .allMatches,
-          target: input))
-      runner.register(
-        NSBenchmark(
-          name: baseName + "All" + CrossBenchmark.nsSuffix,
-          regex: nsRegex,
-          type: .allMatches,
-          target: input))
       if includeFirst || runner.includeFirstOverride {
-        runner.register(
-          Benchmark(
-            name: baseName + "First",
-            regex: swiftRegex,
-            pattern: regex,
-            type: .first,
-            target: input))
-        runner.register(
-          NSBenchmark(
-            name: baseName + "First" + CrossBenchmark.nsSuffix,
-            regex: nsRegex,
-            type: .first,
-            target: input))
+        runner.registerCrossBenchmark(
+          nameBase: baseName,
+          input: input,
+          pattern: regex,
+          .first,
+          alsoRunScalarSemantic: alsoRunScalarSemantic)
       }
     }
   }
@@ -209,20 +191,16 @@ struct CrossInputListBenchmark {

   /// The list of strings to search
   var inputs: [String]
+
+  /// Also run in scalar-semantic mode
+  var alsoRunScalarSemantic: Bool = true

   func register(_ runner: inout BenchmarkRunner) {
-    let swiftRegex = try! Regex(regex)
-    runner.register(InputListBenchmark(
+    runner.registerCrossBenchmark(
       name: baseName,
-      regex: swiftRegex,
+      inputList: inputs,
       pattern: regex,
-      targets: inputs
-    ))
-    runner.register(InputListNSBenchmark(
-      name: baseName + CrossBenchmark.nsSuffix,
-      regex: regex,
-      targets: inputs
-    ))
+      alsoRunScalarSemantic: alsoRunScalarSemantic)
   }
 }
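
The new `alsoRunScalarSemantic` flag registers a second variant of each benchmark that matches by Unicode scalar instead of by grapheme cluster. As a rough illustration of why the two modes can behave differently on the same pattern, here is a small sketch; it assumes a Swift 5.7+ standard library, and the helper `matchesWhole` is hypothetical.

```swift
// Sketch only: the same pattern under the default (grapheme-cluster) semantics
// and under `.unicodeScalar` semantics. `matchesWhole` is a hypothetical helper.
func matchesWhole(_ input: String, pattern: String, scalarSemantics: Bool) -> Bool {
  guard let base = try? Regex(pattern) else { return false }
  // `matchingSemantics(_:)` returns a new regex that uses the requested level.
  let regex = scalarSemantics ? base.matchingSemantics(.unicodeScalar) : base
  return input.wholeMatch(of: regex) != nil
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one Character, two Unicode scalars.
let decomposed = "e\u{301}"

// With grapheme semantics `.` consumes a whole Character; with scalar
// semantics it consumes a single Unicode scalar.
print("grapheme:", matchesWhole(decomposed, pattern: ".", scalarSemantics: false))
print("scalar:  ", matchesWhole(decomposed, pattern: ".", scalarSemantics: true))
print("scalar ..:", matchesWhole(decomposed, pattern: "..", scalarSemantics: true))
```

Because the engine walks different units in each mode, timing both variants for the same pattern and input gives two separate data points.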

Sources/RegexBenchmark/BenchmarkRunner.swift

Lines changed: 141 additions & 2 deletions
@@ -4,6 +4,16 @@ import Foundation
 /// The number of times to re-run the benchmark if results are too varying
 private var rerunCount: Int { 3 }

+extension Benchmark.MatchType {
+  fileprivate var nameSuffix: String {
+    switch self {
+    case .whole: return "_Whole"
+    case .first: return "_First"
+    case .allMatches: return "_All"
+    }
+  }
+}
+
 struct BenchmarkRunner {
   let suiteName: String
   var suite: [any RegexBenchmark] = []
@@ -16,12 +26,141 @@ struct BenchmarkRunner {

   // Forcibly include firstMatch benchmarks for all CrossBenchmarks
   let includeFirstOverride: Bool
+
+  // Register a cross-benchmark
+  mutating func registerCrossBenchmark(
+    nameBase: String,
+    input: String,
+    pattern: String,
+    _ type: Benchmark.MatchType,
+    alsoRunScalarSemantic: Bool = true
+  ) {
+    let swiftRegex = try! Regex(pattern)
+    let nsRegex: NSRegularExpression
+    if type == .whole {
+      nsRegex = try! NSRegularExpression(pattern: "^" + pattern + "$")
+    } else {
+      nsRegex = try! NSRegularExpression(pattern: pattern)
+    }
+    let nameSuffix = type.nameSuffix
+
+    register(
+      Benchmark(
+        name: nameBase + nameSuffix,
+        regex: swiftRegex,
+        pattern: pattern,
+        type: type,
+        target: input))
+    register(
+      NSBenchmark(
+        name: nameBase + nameSuffix + CrossBenchmark.nsSuffix,
+        regex: nsRegex,
+        type: .init(type),
+        target: input))
+
+    if alsoRunScalarSemantic {
+      register(
+        Benchmark(
+          name: nameBase + nameSuffix + "_Scalar",
+          regex: swiftRegex.matchingSemantics(.unicodeScalar),
+          pattern: pattern,
+          type: type,
+          target: input))
+      register(
+        NSBenchmark(
+          name: nameBase + nameSuffix + "_Scalar" + CrossBenchmark.nsSuffix,
+          regex: nsRegex,
+          type: .init(type),
+          target: input))
+    }
+  }
+
+  // Register a cross-benchmark list
+  mutating func registerCrossBenchmark(
+    name: String,
+    inputList: [String],
+    pattern: String,
+    alsoRunScalarSemantic: Bool = true
+  ) {
+    let swiftRegex = try! Regex(pattern)
+    register(InputListBenchmark(
+      name: name,
+      regex: swiftRegex,
+      pattern: pattern,
+      targets: inputList
+    ))
+    register(InputListNSBenchmark(
+      name: name + CrossBenchmark.nsSuffix,
+      regex: pattern,
+      targets: inputList
+    ))
+
+    if alsoRunScalarSemantic {
+      register(InputListBenchmark(
+        name: name + "_Scalar",
+        regex: swiftRegex.matchingSemantics(.unicodeScalar),
+        pattern: pattern,
+        targets: inputList
+      ))
+      register(InputListNSBenchmark(
+        name: name + "_Scalar" + CrossBenchmark.nsSuffix,
+        regex: pattern,
+        targets: inputList
+      ))
+    }
+
+  }
+
+  // Register a swift-only benchmark
+  mutating func register(
+    nameBase: String,
+    input: String,
+    pattern: String,
+    _ swiftRegex: Regex<AnyRegexOutput>,
+    _ type: Benchmark.MatchType,
+    alsoRunScalarSemantic: Bool = true
+  ) {
+    let nameSuffix = type.nameSuffix
+
+    register(
+      Benchmark(
+        name: nameBase + nameSuffix,
+        regex: swiftRegex,
+        pattern: pattern,
+        type: type,
+        target: input))
+
+    if alsoRunScalarSemantic {
+      register(
+        Benchmark(
+          name: nameBase + nameSuffix + "_Scalar",
+          regex: swiftRegex,
+          pattern: pattern,
+          type: type,
+          target: input))
+    }
+  }

-  mutating func register(_ benchmark: some RegexBenchmark) {
+  private mutating func register(_ benchmark: NSBenchmark) {
     suite.append(benchmark)
   }

-  mutating func register(_ benchmark: some SwiftRegexBenchmark) {
+  private mutating func register(_ benchmark: Benchmark) {
+    var benchmark = benchmark
+    if enableTracing {
+      benchmark.enableTracing()
+    }
+    if enableMetrics {
+      benchmark.enableMetrics()
+    }
+    suite.append(benchmark)
+  }
+
+  private mutating func register(_ benchmark: InputListNSBenchmark) {
+    suite.append(benchmark)
+  }
+
+  private mutating func register(_ benchmark: InputListBenchmark) {
     var benchmark = benchmark
     if enableTracing {
       benchmark.enableTracing()
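
The naming scheme that `registerCrossBenchmark(nameBase:input:pattern:_:alsoRunScalarSemantic:)` applies can be sketched in isolation. The snippet below is a self-contained approximation, not the runner itself: `MatchType` mirrors the three cases used in the diff, `crossBenchmarkNames` is a hypothetical helper, and `"_NS"` is only an assumed stand-in for `CrossBenchmark.nsSuffix`, whose value does not appear in this diff.

```swift
// Sketch only: reproduces how benchmark names are composed, without registering anything.
enum MatchType {
  case whole, first, allMatches

  // Same suffixes as the `nameSuffix` extension added in this commit.
  var nameSuffix: String {
    switch self {
    case .whole: return "_Whole"
    case .first: return "_First"
    case .allMatches: return "_All"
    }
  }
}

// Assumption: placeholder for CrossBenchmark.nsSuffix (actual value not shown in the diff).
let assumedNSSuffix = "_NS"

// Hypothetical helper returning the names in the order the runner registers them:
// Swift regex, NSRegularExpression, then the two "_Scalar" variants if requested.
func crossBenchmarkNames(
  nameBase: String,
  _ type: MatchType,
  alsoRunScalarSemantic: Bool = true
) -> [String] {
  let base = nameBase + type.nameSuffix
  var names = [base, base + assumedNSSuffix]
  if alsoRunScalarSemantic {
    names += [base + "_Scalar", base + "_Scalar" + assumedNSSuffix]
  }
  return names
}

print(crossBenchmarkNames(nameBase: "Example", .whole))
print(crossBenchmarkNames(nameBase: "Example", .allMatches, alsoRunScalarSemantic: false))
```

One consequence visible in the diff is that the suffixes changed from `"Whole"`, `"All"`, and `"First"` to underscore-prefixed forms, so benchmark names recorded before this commit will not line up with the new ones.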

Sources/RegexBenchmark/Suite/CustomCharacterClasses.swift

Lines changed: 37 additions & 35 deletions
@@ -12,53 +12,55 @@ extension BenchmarkRunner {

     let input = Inputs.graphemeBreakData

-    register(Benchmark(
-      name: "BasicCCC",
-      regex: try! Regex(basic),
+    // TODO: Which of these can be cross-benchmarks?
+
+    register(
+      nameBase: "BasicCCC",
+      input: input,
       pattern: basic,
-      type: .allMatches,
-      target: input))
+      try! Regex(basic),
+      .allMatches)

-    register(Benchmark(
-      name: "BasicRangeCCC",
-      regex: try! Regex(basicRange),
+    register(
+      nameBase: "BasicRangeCCC",
+      input: input,
       pattern: basicRange,
-      type: .allMatches,
-      target: input))
+      try! Regex(basicRange),
+      .allMatches)

-    register(Benchmark(
-      name: "CaseInsensitiveCCC",
-      regex: try! Regex(caseInsensitive),
+    register(
+      nameBase: "CaseInsensitiveCCC",
+      input: input,
       pattern: caseInsensitive,
-      type: .allMatches,
-      target: input))
+      try! Regex(caseInsensitive),
+      .allMatches)

-    register(Benchmark(
-      name: "InvertedCCC",
-      regex: try! Regex(inverted),
+    register(
+      nameBase: "InvertedCCC",
+      input: input,
       pattern: inverted,
-      type: .allMatches,
-      target: input))
+      try! Regex(inverted),
+      .allMatches)

-    register(Benchmark(
-      name: "SubtractionCCC",
-      regex: try! Regex(subtraction),
+    register(
+      nameBase: "SubtractionCCC",
+      input: input,
       pattern: subtraction,
-      type: .allMatches,
-      target: input))
+      try! Regex(subtraction),
+      .allMatches)

-    register(Benchmark(
-      name: "IntersectionCCC",
-      regex: try! Regex(intersection),
+    register(
+      nameBase: "IntersectionCCC",
+      input: input,
       pattern: intersection,
-      type: .allMatches,
-      target: input))
+      try! Regex(intersection),
+      .allMatches)

-    register(Benchmark(
-      name: "symDiffCCC",
-      regex: try! Regex(symmetricDifference),
+    register(
+      nameBase: "symDiffCCC",
+      input: input,
       pattern: symmetricDifference,
-      type: .allMatches,
-      target: input))
+      try! Regex(symmetricDifference),
+      .allMatches)
   }
 }

Sources/RegexBenchmark/Utils/Stats.swift

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@ import Foundation
 enum Stats {}

 extension Stats {
-  // Maximum allowed standard deviation is 5% of the median runtime
-  static let maxAllowedStdev = 0.05
+  // Maximum allowed standard deviation is 7.5% of the median runtime
+  static let maxAllowedStdev = 0.075

   static func tTest(_ a: Measurement, _ b: Measurement) -> Bool {
     // Student's t-test
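
The diff does not show where `maxAllowedStdev` is consumed, so the following is only a sketch of what a check of "standard deviation at most 7.5% of the median runtime" could look like. The function names `median`, `sampleStdev`, and `isTooVariable` are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```swift
// Sketch only: a relative-variance gate in the spirit of Stats.maxAllowedStdev.
let maxAllowedStdev = 0.075  // 7.5% of the median, as in the updated constant

// Median of a non-empty list of samples.
func median(_ samples: [Double]) -> Double {
  let sorted = samples.sorted()
  let mid = sorted.count / 2
  return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Sample standard deviation (divides by n - 1); assumes at least two samples.
func sampleStdev(_ samples: [Double]) -> Double {
  let mean = samples.reduce(0, +) / Double(samples.count)
  let squaredError = samples.reduce(0) { $0 + ($1 - mean) * ($1 - mean) }
  return (squaredError / Double(samples.count - 1)).squareRoot()
}

// True when the spread exceeds the allowed fraction of the median runtime.
func isTooVariable(_ samples: [Double]) -> Bool {
  guard samples.count > 1 else { return false }
  return sampleStdev(samples) > maxAllowedStdev * median(samples)
}

print(isTooVariable([100, 102, 98, 101, 99]))   // spread of about 1.6 around a median of 100
print(isTooVariable([100, 120, 80, 110, 90]))   // spread of about 15.8 around a median of 100
```

Under this reading, raising the threshold from 0.05 to 0.075 means a run whose spread falls between 5% and 7.5% of the median is now accepted where it previously would have been flagged.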
