Skip to content

go/token: add IsIdentifier, IsKeyword, and IsExported funcs #30064

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
mvdan opened this issue Feb 2, 2019 · 13 comments
Closed

go/token: add IsIdentifier, IsKeyword, and IsExported funcs #30064

mvdan opened this issue Feb 2, 2019 · 13 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Feb 2, 2019

We already have go/ast.IsExported, which is helpful; it makes the operation simple and expressive, and it avoids the simple mistake of only checking ASCII and ignoring Unicode.

Checking if a string is a valid identifier as per the spec (https://golang.org/ref/spec#Identifiers) falls under the same category, I believe. It seems intuitive, so developers tend to write a simple implementation by hand without much effort. However, there are many gotchas which are easy to get wrong:

  • Unicode
  • Digits are allowed, but not at the beginning
  • _ is allowed, even as a first character
  • Empty strings are not valid identifiers

For example, take this implementation I just came across this morning:

for _, c := range str {
	switch {
	case c >= '0' && c <= '9':
	case c >= 'a' && c <= 'z':
	case c >= 'A' && c <= 'Z':
	case c == '_':
	default:
		return false
	}
}
return true

It gets three of the gotchas wrong; it would return true on "" and "123", and return false on "αβ". I imagine that a correct implementation would be like:

if len(str) == 0 {
	return false
}
for i, c := range str {
	if unicode.IsLetter(c) || c == '_' || (i > 0 && unicode.IsDigit(c)) {
		continue
	}
	return false
}
return true

However, I'm still not 100% confident that this is correct. I'd probably compare it to implementations in go/* and x/tools before using the code, just in case.

The standard library implements variants of this in multiple places; go/scanner does it on a per-rune basis while scanning, go/build implements it in its importReader, and cmd/doc has an isIdentifier func. The same applies to other official repos like x/tools, where I've found godoc.isIdentifier, semver.isIdentChar, analysis.validIdent, and rename.isValidIdentifier.

I think we should have a canonical implementation with a proper godoc and tests. My proposal is to add this to go/ast, as it would already be useful to other packages within the standard library. I also think this is a fairly low-level API, and it's commonly needed whenever one is modifying Go syntax trees, creating new Go files, or otherwise tinkering with Go syntax.

If adding API to go/ast is a problem, perhaps golang.org/x/tools/go/ast/astutil. However, in my mind that package contains much higher-level functions, so it feels a bit out of place.

I realise this could have been a proposal, but I thought the proposed change wouldn't be controversial enough to warrant one. Feel free to retitle it as a proposal if preferred.

/cc @griesemer @josharian @alandonovan @dominikh

@mvdan mvdan added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 2, 2019
@adonovan
Copy link
Member

adonovan commented Feb 2, 2019

I think this is a fine idea, but the function belongs in go/token, not go/ast.

@mvdan
Copy link
Member Author

mvdan commented Feb 2, 2019

go/token sounds fine; I just suggested go/ast as that's where IsExported is.

@ianthehat pointed out keywords, which I was forgetting about. The spec mentions:

The following keywords are reserved and may not be used as identifiers.

So I presume that IsIdentifier should return false for all Go keywords.

Ian also mentions that go/scanner could be used for this by the user, checking that the entire string scans as a single identifier. I've implemented what I think is a correct way to do it below:

func isIdentifier(str string) bool {
        src := []byte(str)
        var s scanner.Scanner
        fset := token.NewFileSet()
        file := fset.AddFile("", fset.Base(), len(src))
        s.Init(file, src, nil, scanner.ScanComments)
        _, tok, lit := s.Scan()
        return tok == token.IDENT && lit == str
}

However, I still think that an IsIdentifier func in the standard library would be better. Compared to using go/scanner directly, it would be a single line and require zero allocations.

@josharian
Copy link
Contributor

We might consider IsKeyword in go/token as well. In addition to being convenient, the implementation could use a perfect hash for faster checks (as does cmd/compile/internal/syntax).

@mvdan mvdan changed the title go/ast: add func IsIdentifier(string) bool go/token: add func IsIdentifier(string) bool Feb 2, 2019
@adonovan
Copy link
Member

adonovan commented Feb 2, 2019

I'd be happy for us to add IsIdentifier, IsKeyword, and IsExported to go/token.

@mvdan
Copy link
Member Author

mvdan commented Feb 3, 2019

IsKeyword sounds like a great idea; I imagine that IsIdentifier would still return false for inputs like for, though. That is, IsIdentifier would use IsKeyword as part of its implementation.

and IsExported to go/token

Not sure if this is strictly necessary, since it's already part of go/ast. I assume you're suggesting it for the sake of consistency.

@adonovan
Copy link
Member

adonovan commented Feb 3, 2019

I assume you're suggesting it for the sake of consistency.

Exactly. The go/ast function would delegate to the go/token implementation.

@griesemer
Copy link
Contributor

I'm ok with @adonovan said.

@griesemer griesemer added this to the Go1.13 milestone Feb 5, 2019
@griesemer griesemer added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 5, 2019
@griesemer
Copy link
Contributor

Please send CLs to me. Thanks.

@mvdan mvdan self-assigned this Feb 5, 2019
@mvdan mvdan changed the title go/token: add func IsIdentifier(string) bool go/token: add IsIdentifier, IsKeyword, and IsExported funcs Feb 5, 2019
@Skarlso
Copy link
Contributor

Skarlso commented Feb 19, 2019

Hi @mvdan @adonovan @griesemer. Can I work on this one? I don't see a CL attached.

@josharian
Copy link
Contributor

I’m pretty sure @mvdan is working on this; he assigned it to himself.

@Skarlso
Copy link
Contributor

Skarlso commented Feb 19, 2019

@josharian Cool, thanks. :)

@mvdan
Copy link
Member Author

mvdan commented Feb 28, 2019

I will get to it soon - I just have a moderately sized backlog of stuff to go through, now that the tree is open again :)

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/169018 mentions this issue: go/token: add IsIdentifier, IsKeyword, and IsExported

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants