Description
I propose to add to package strings:
// Cut cuts s around the first instance of sep,
// returning the text before and after sep.
// The found result reports whether sep appears in s.
// If sep does not appear in s, cut returns s, "", false.
func Cut(s, sep string) (before, after string, found bool) {
	if i := Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}
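To make the documented cases concrete, here is a minimal sketch that exercises the function. It copies the proposed implementation into a local helper named cut (a name chosen only so the example does not depend on the proposal being accepted), and the input strings are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut, so that this
// sketch does not depend on the proposal having been accepted.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

func main() {
	// Illustrative inputs: separator present once, present twice,
	// absent, and present at the very end.
	for _, s := range []string{"key=value", "a=b=c", "novalue", "trailing="} {
		before, after, found := cut(s, "=")
		fmt.Printf("cut(%q, \"=\") = %q, %q, %v\n", s, before, after, found)
	}
}
```

Only the first instance of the separator matters, so "a=b=c" yields "a" and "b=c". When the separator is absent, the whole input comes back as before and found is false.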
I similarly propose to add to package bytes:
// Cut cuts s around the first instance of sep,
// returning the text before and after sep.
// The found result reports whether sep appears in s.
// If sep does not appear in s, cut returns s, nil, false.
func Cut(s, sep []byte) (before, after []byte, found bool)
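The bytes signature above is given without a body. A plausible body, assuming it simply mirrors the strings version with bytes.Index, might look like the sketch below. The local name cut and the sample inputs are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
)

// cut sketches a body for the proposed bytes.Cut, assuming it
// mirrors the strings version. This is an illustration, not the
// proposal's own code.
func cut(s, sep []byte) (before, after []byte, found bool) {
	if i := bytes.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, nil, false
}

func main() {
	before, after, found := cut([]byte("name: gopher"), []byte(": "))
	fmt.Printf("%q %q %v\n", before, after, found)

	// In this sketch the results are subslices of the input, not
	// copies, so writing through one of them changes the input.
	line := []byte("k=v")
	k, _, _ := cut(line, []byte("="))
	k[0] = 'K'
	fmt.Printf("%s\n", line)
}
```

Because the sketch returns subslices rather than copies, a caller that needs to modify a result independently of the input would have to copy it first.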
Cut is a single function that replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once (Index, IndexByte, IndexRune, and SplitN). It has also been invented twice in the main repo. For these reasons, I propose to add Cut to the standard library.
These were previously proposed as part of #40135, which turned into quite the bikeshed discussion about compiler optimization and so on. There were also various other functions proposed, which made the discussion even more wide-ranging.
This issue is a replacement for #40135, to start a new and I hope more focused discussion. Thanks very much to @ainar-g for starting the discussion on #40135.
Edit, May 24: Renamed the third result to found to avoid confusion with the comma-ok form for map access, channel receive, and so on, which always return zero values with ok = false.
The Unreasonable Effectiveness of Cut
To attempt to cut off the bikeshedding this time, let me present data showing why Cut is worth adding.
Anecdotally, for a while after the discussion on #40135, I copied the suggested implementation of Cut into every program I wrote that could use it, and that turned out to be just about every program I wrote that did anything with strings. That made me think there is something here.
To check that belief, I recently searched for uses of strings.Index, strings.IndexByte, or strings.IndexRune in the main repo and converted the ones that could use strings.Cut instead. Calling those all “Index” for the sake of discussion (any call with a constant sep should use Index anyway), there were:
- 311 Index calls outside examples and testdata.
  - 20 should have been Contains
  - 2 should have been 1 call to IndexAny
  - 2 should have been 1 call to ContainsAny
  - 1 should have been TrimPrefix
  - 1 should have been HasSuffix
That leaves 285 calls. Of those, 221 were better written as Cut, leaving 64 that were not.
That is, 77% of Index calls are more clearly written using Cut. That's incredible!
A typical rewrite is to replace:
// The first equals separates the key from the value.
eq := strings.IndexByte(rec, '=')
if eq == -1 {
	return "", "", s, ErrHeader
}
k, v = rec[:eq], rec[eq+1:]
with
// The first equals separates the key from the value.
k, v, ok = strings.Cut(rec, "=")
if !ok {
	return "", "", s, ErrHeader
}
I have long believed that this line from the first version is an elegant Go idiom:
k, v = rec[:eq], rec[eq+1:]
However, it is undeniably more elegant not to write that line at all, as in the second version. More complex examples became quite a lot more readable by working with strings instead of indexes. It seems to me that indexes are the equivalent of C pointers. Working with actual strings instead, as returned by Cut, lets the code raise the level of abstraction significantly.
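As a rough illustration of the "more complex examples" point, the sketch below parses a made-up "user@host:port" format twice, once with indexes and once with a local copy of the proposed Cut. The format, the function names parseIndex and parseCut, and the inputs are all hypothetical and are not taken from the main repo.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// parseIndex splits a hypothetical "user@host:port" address by
// tracking indexes into the string.
func parseIndex(addr string) (user, host, port string) {
	rest := addr
	if at := strings.Index(rest, "@"); at >= 0 {
		user, rest = rest[:at], rest[at+1:]
	}
	if colon := strings.Index(rest, ":"); colon >= 0 {
		host, port = rest[:colon], rest[colon+1:]
	} else {
		host = rest
	}
	return user, host, port
}

// parseCut does the same work using only strings, never indexes.
func parseCut(addr string) (user, host, port string) {
	rest := addr
	if u, r, ok := cut(rest, "@"); ok {
		user, rest = u, r
	}
	host, port, _ = cut(rest, ":")
	return user, host, port
}

func main() {
	for _, addr := range []string{"alice@example.com:22", "example.com:22", "example.com"} {
		u1, h1, p1 := parseIndex(addr)
		u2, h2, p2 := parseCut(addr)
		fmt.Printf("%q: index=(%q, %q, %q) cut=(%q, %q, %q)\n", addr, u1, h1, p1, u2, h2, p2)
	}
}
```

Both functions return the same results for these inputs. The Cut version has no offset arithmetic such as at+1 to get wrong.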
As noted in #40135, Cut is very much like SplitN with n=2, but without the awkwardness and potential inefficiency of the slice. A typical rewrite is to replace:
elts := strings.SplitN(flag, "=", 2)
name := elts[0]
value := ""
if len(elts) == 2 {
	value = elts[1]
}
with
name, value, _ := strings.Cut(flag, "=")
In the discussion on #40135, the point was raised that maybe we could make SplitN with count 2 allocate the result slice on the caller's stack, so that the first version would be as efficient as the second. I am sure we can with mid-stack inlining, and we probably should to help existing code, but making SplitN with n=2 more efficient does not make the code any less awkward to write. The reason to adopt Cut is clarity of code, not efficiency.
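The two forms in the SplitN rewrite can be checked side by side. The sketch below wraps each in a small function (viaSplitN and viaCut are names invented here) and compares them on a few made-up flag strings, using a local copy of the proposed Cut.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// viaSplitN is the original form: split into at most two
// elements, then check how many came back.
func viaSplitN(flag string) (name, value string) {
	elts := strings.SplitN(flag, "=", 2)
	name = elts[0]
	if len(elts) == 2 {
		value = elts[1]
	}
	return name, value
}

// viaCut is the rewritten form.
func viaCut(flag string) (name, value string) {
	name, value, _ = cut(flag, "=")
	return name, value
}

func main() {
	for _, flag := range []string{"-v=1", "-v", "-x=a=b", ""} {
		n1, v1 := viaSplitN(flag)
		n2, v2 := viaCut(flag)
		fmt.Printf("%q: SplitN=(%q, %q) Cut=(%q, %q) same=%v\n",
			flag, n1, v1, n2, v2, n1 == n2 && v1 == v2)
	}
}
```

The results agree on these inputs. The difference is in the calling code: no intermediate slice and no length check.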
Of the 33 calls to SplitN in the main repo (excluding examples), 24 were using count 2 and were more clearly written using Cut instead.
That is, 72% of SplitN calls are also more clearly written using Cut. That's also incredible!
Looking in my public Go corpus, I found 88,230 calls to strings.SplitN. Of these, 77,176 (87%) use a fixed count 2. I expect that essentially all of them would be more clearly written using Cut.
It is also worth noting that something like Cut had been reinvented in two different packages as an unexported function: golang.org/x/mod/sumdb/note's chop (7 uses) and net/url's split (4 uses). Clearly, slicing out the separator is an incredibly common thing to do with the result of strings.Index.
The conversions described here can be viewed in CL 322210.
As noted in #40135, Cut is similar to Python's str.partition(sep), which returns (before, sep, after) if sep is found and (str, "", "") if not. The boolean result seems far more idiomatic in Go, and it was used directly as a boolean in over half the Cut calls I introduced in the main repo. That is, the fact that str.partition is useful in Python is added evidence for Cut, but we need not adopt the Pythonic signature. (The more idiomatic Go signature was first suggested by @nightlyone.)
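For comparison, here is a sketch of what a partition-style signature would look like in Go next to Cut. The function partition is hypothetical and exists only to show the difference at the call site. It does not try to reproduce Python's handling of an empty separator.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// partition is a hypothetical Go rendering of the Python-style
// signature: it returns the separator itself as the middle result.
func partition(s, sep string) (before, match, after string) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], sep, s[i+len(sep):]
	}
	return s, "", ""
}

func main() {
	for _, rec := range []string{"path=/tmp", "path"} {
		// With partition, the caller infers "found" from the middle string.
		_, match, _ := partition(rec, "=")
		// With cut, the boolean is available directly.
		_, _, found := cut(rec, "=")
		fmt.Printf("%q: partition found=%v, cut found=%v\n", rec, match != "", found)
	}
}
```

The boolean result can be used directly in an if statement, whereas the partition form needs a string comparison to answer the same question.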
Again, Cut is a single function that replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once (Index, IndexByte, IndexRune, and SplitN), and it has been invented twice in the main repo. That seems above the bar for the standard library.
Why not other related functions?
A handful of other functions were suggested in #40135 as well. Here is some more data addressing those.
LastCut. This function would be
// LastCut cuts s around the last instance of sep,
// returning the text before and after sep.
// The found result reports whether sep appears in s.
// If sep does not appear in s, cut returns s, "", false.
func LastCut(s, sep string) (before, after string, found bool) {
	if i := LastIndex(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}
There are 107 calls to strings.LastIndex and strings.LastIndexByte in the main repo (compared to 285 for Index/IndexByte/IndexRune). I expect a substantial fraction of them would be made simpler using LastCut but have not investigated except for looking at a couple dozen random examples. I considered including LastCut in this proposal, but it seemed easier to limit it to the single function Cut.
It may be worth adding LastCut as well.
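A small sketch of where LastCut and Cut differ, using local copies of both functions and made-up file names. When the separator appears more than once, the two cut at different places; otherwise they agree.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// lastCut is a local copy of the LastCut described above.
func lastCut(s, sep string) (before, after string, found bool) {
	if i := strings.LastIndex(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

func main() {
	for _, name := range []string{"archive.tar.gz", "notes.txt", "README"} {
		b1, a1, _ := cut(name, ".")
		b2, a2, _ := lastCut(name, ".")
		fmt.Printf("%q: cut=(%q, %q) lastCut=(%q, %q)\n", name, b1, a1, b2, a2)
	}
}
```

For "archive.tar.gz", cut yields "archive" and "tar.gz", while lastCut yields "archive.tar" and "gz".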
Until. This function would be
// Until returns the prefix of s until the first instance of sep,
// or the whole string s if sep does not appear.
func Until(s, sep string) string {
	s, _, _ = Cut(s, sep)
	return s
}
Of the 261 calls to strings.Cut I ended up with in the main repo, 51 (20%) were of this form. Having the single-result form would further simplify uses by allowing it to appear inside other expressions. As with LastCut, I considered including Until in the proposal, but it seemed easier to limit it to the single function Cut.
It may be worth adding Until as well.
(Until was also suggested as PrefixUntil.)
SuffixAfter. This function was suggested as meaning
func SuffixAfter(s, sep string) string {
	_, s, _ = strings.LastCut(s, sep)
	return s
}
Similarly, we could consider a variant using Cut instead of LastCut. In the 261 calls to strings.Cut I ended up with in the main repo, only 6 (2%) were of the form _, x, _ = strings.Cut. So the SuffixAfter form using the result of Cut is definitely not worth adding. It is possible that the data would come out differently for LastCut, but in the absence of that evidence, it is not worth proposing to add SuffixAfter. (Adding the LastCut form would require first adding LastCut as well.)
SplitFirst. This function was suggested as meaning
func SplitFirst(s, sep string) (before, after string) {
	before, after, _ = strings.Cut(s, sep)
	return before, after
}
Of the 261 calls to Cut I added to the main repo, only 106 (41%) discard the final result. Although the result can be reconstructed as len(before) != len(s), it seems wrong to force the majority of uses to do that, especially since there are multiple results either way.
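Here is a sketch of what that reconstruction would look like for callers of a two-result function. The helpers splitFirst and foundFromLen are names invented for this example, and cut is a local copy of the proposed Cut. One detail that falls out of the implementation shown earlier: when both s and sep are empty, Index returns 0, so Cut reports found while the length comparison does not.

```go
package main

import (
	"fmt"
	"strings"
)

// cut is a local copy of the proposed strings.Cut.
func cut(s, sep string) (before, after string, found bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// splitFirst is a local copy of the suggested two-result form.
func splitFirst(s, sep string) (before, after string) {
	before, after, _ = cut(s, sep)
	return before, after
}

// foundFromLen is a hypothetical helper that reconstructs the
// dropped boolean using the length comparison from the text.
func foundFromLen(s, before string) bool {
	return len(before) != len(s)
}

func main() {
	cases := []struct{ s, sep string }{
		{"k=v", "="},
		{"kv", "="},
		{"", ""}, // edge case: both empty
	}
	for _, c := range cases {
		_, _, found := cut(c.s, c.sep)
		before, _ := splitFirst(c.s, c.sep)
		fmt.Printf("s=%q sep=%q: found=%v reconstructed=%v\n",
			c.s, c.sep, found, foundFromLen(c.s, before))
	}
}
```

Every caller that needs the boolean would have to repeat the comparison, which is the awkwardness described above.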
Why not other unrelated functions?
There was concern in #40135, even among those who thought Cut was worth adding, about where the bar was for standard library additions and whether adding Cut meant opening the floodgates. I am comfortable saying that the bar is somewhere near “replaces and simplifies the overwhelming majority of the usage of four different standard library functions at once” and similarly confident that establishing such a bar would not open any floodgates.