Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

String split return type #49635

Closed
Forlini91 opened this issue Jun 22, 2022 · 8 comments
Closed

String split return type #49635

Forlini91 opened this issue Jun 22, 2022 · 8 comments
Labels
Working as Intended The behavior described is the intended behavior; this is not a bug

Comments

@Forlini91
Copy link

lib Update Request

Configuration Check

My compilation target is ES2020 and my lib is ES2020.

Missing / Incorrect Definition

Method split of String currently has this signature

split(separator: string | RegExp, limit?: number): string[];

This method, when called with a limit argument = 0, returns an empty array, but, when called without limit argument or with a non 0 limit argument, returns an array with at least one element (the initial string).

This means the signature could be changed to acknowledge at least the "missing limit argument" case, where we are sure the array contains at least one element

split(separator: string | RegExp): [string, ...string[]];
split(separator: string | RegExp, limit: number): string[];

Sample Code

Current situation:

const strArr = "123".split("=");  // strArr has type string[]
const item = strArr[0];   // item has type string | undefined

Expected:

const strArr = "123".split("=");  // strArr has type [string, ...string[]]
const item = strArr[0];   // item has type string

I called split without "limit", so I'm sure to find at least 1 element in the returned array

Documentation Link

https://262.ecma-international.org/11.0/#sec-string.prototype.split

@Josh-Cena
Copy link
Contributor

"".split("")

@RyanCavanaugh RyanCavanaugh added the Working as Intended The behavior described is the intended behavior; this is not a bug label Jun 22, 2022
@RyanCavanaugh
Copy link
Member

RyanCavanaugh commented Jun 22, 2022

Yeah, that 👆

@RyanCavanaugh RyanCavanaugh closed this as not planned Won't fix, can't repro, duplicate, stale Jun 22, 2022
@helmturner
Copy link

helmturner commented Jul 8, 2022

@RyanCavanaugh @Josh-Cena I do think there is actually a bug here, if not the one OP is pointing to. Shouldn't the return type be (string | undefined)[] - making the elements of type string | undefined? Seems the undefined case is missed by TS.

Here is the behavior I'm getting:

const test1 = 'test'; // type: 'test'
const splitresult1 = test1.split(':'); // type: string[]
const firstelement1 = splitresult1[0]; // type: string
const secondelement1 = splitresult1[1]; // type: string

console.log(test1) // test
console.log(splitresult1) // [ 'test' ]
console.log(firstelement1) // test
console.log(secondelement1) // undefined

const splitresult2 = "".split(""); // type: string[]
const firstelement2 = splitresult2[0]; // type: string
const secondelement2 = splitresult2[1]; // type: string

console.log(splitresult2) // []
console.log(firstelement2) // undefined
console.log(secondelement2) // undefined

@Josh-Cena
Copy link
Contributor

@AlecVision If you want that behavior please turn on noUncheckedIndexAccess. It's a general problem with index-accessing arrays.

MicahZoltu added a commit to MicahZoltu/TypeScript that referenced this issue May 7, 2023
Fixes microsoft#49635
Behavior similar to microsoft#49682

microsoft#49635 was originally rejected because JavaScript has a stupid edge case where `''.split('')` and `''.split(new RegExp(''))` return an empty array so we cannot assert that the first element is *always* a string.  However, given that the vast majority of use cases for `split` include a literal separator, we can create a set of overloads that results in the non-empty literal separator returning a more narrow type that has a string for the first element.

```ts
''.split('') // []; type= string[]
''.split(new RegExp('')) // []; type= string[]
''.split('a') // ['']; type= [string, ...string[]]
''.split(new RegExp('a')) // ['']; type= string[]
```

Unfortunately, I don't know of a way to differentiate a regular expression literal and a regular expression so that case still always returns an empty first element even though I don't even think it is possible to have an empty regular expression literal.

I don't *believe* there are any other edge cases where split can return an empty array, but if there are please let me know and I can try to update, or we can close the PR and just leave a comment on microsoft#49635 for the next person.
@MicahZoltu
Copy link
Contributor

I'm not sure if the team monitors closed issues or if I should open a new issue, but I believe this JS edge case can be handled with:

split(separator: '', limit?: number): string[];
split(separator: string, limit?: number): [string, ...string[]];
split(separator: string | RegExp, limit?: number): string[];
// test cases
const a = ''.split('') // []; type= string[]
const b = ''.split(new RegExp('')) // []; type= string[]
const c = ''.split('a') // ['']; type= [string, ...string[]]
const d = ''.split(new RegExp('a')) // ['']; type= string[]

It makes it so if a non-empty string literal is provided then the first element is a string, otherwise it behaves as it does today. I don't believe there are any other edge cases where split can return an empty array.

I don't know of a way to differentiate a regular expression literal and a RegExp so that case still always returns an empty first element even though I don't even think it is possible to have an empty regular expression literal (you can do new RegExp('') though).

@ivanhofer
Copy link

I don't believe there are any other edge cases where split can return an empty array.

It can return an empty array if you set the limit to 0.
In a few applications I'm using the following patch that handles that caswe too:

interface String {
	split<Separator extends string | RegExp, Limit extends number>(separator: Separator, limit?: Limit): Limit extends 0
		? []
		: Separator extends ''
			? string[]
			: [string, ...string[]]
}

@Josh-Cena
Copy link
Contributor

@MicahZoltu Consider the following:

function doIt(str: string, sep: string) {
  str.split(sep)[0].toUpperCase();
}

This compiles with your overload, but doIt("", "") causes an error. In general having a wider input type map to a narrower return type is likely unsound.

An empty regex literal is /(?:)/. (This is what you get when you do new RegExp().toString().) Also, you can get an empty array with regexes as long as the regex is able to match empty strings. For this behavior see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/@@split. For example:

"".split(/?:/);
"".split(/a?/);

So I don't think this can be helped with in a generic way; you just have to assert for each individual use case. Alternatively, if we have negated types, you may be able to do split(sep: string \ "", limit?: number): [string, ...string[]].

@MicahZoltu
Copy link
Contributor

Ah, you are totally right, I don't know why I thought this would work. 😬

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Working as Intended The behavior described is the intended behavior; this is not a bug
Projects
None yet
Development

No branches or pull requests

6 participants