Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

(c) Declaration of struct variable triggers class title mode #2736

Closed
klmr opened this issue Oct 5, 2020 · 10 comments · Fixed by #3078
Closed

(c) Declaration of struct variable triggers class title mode #2736

klmr opened this issue Oct 5, 2020 · 10 comments · Fixed by #3078
Labels
bug help welcome Could use help from community language

Comments

@klmr
Copy link
Contributor

klmr commented Oct 5, 2020

Describe the issue

When declaring a variable of type struct foobar, this triggers the rule for a struct/class title, which leads to the whole declaration being highlighted as a title. See https://meta.stackexchange.com/a/354753/1968.

Which language seems to have the issue?

c/cpp

Are you using highlight or highlightAuto?

highlight

Sample Code to Reproduce

struct List *newnode = (struct List *)malloc(size * sizeof(struct List));

This is rendered as

<class><keyword>struct</keyword> <title>List</title> *<title>newnode</title> = (<title>struct</title> <title>List</span> *)<span class="hljs-title">malloc</title>(<title>size</title> * <title>sizeof</title>(<title>struct</title> <title>List</title>));</class>

Expected behavior

Should render as

<keyword>struct</keyword> List *newnode = (<keyword>struct</keyword> List *)<built_in>malloc</built_in>( * <keyword>sizeof</keyword>(<keyword>struct</keyword> List));

Additional context

Without the initial struct the highlighter doesn’t trigger class/title mode. It still doesn’t correctly render the struct keywords in the rest of the initialisation but that’s already addressed (incidentally) by #2728.

@klmr klmr added bug help welcome Could use help from community language labels Oct 5, 2020
@joshgoebel
Copy link
Member

joshgoebel commented Oct 6, 2020

I think the struct class-like rule could be enhanced to ONLY match struct IDENTIFIER { (splitting it out from the class rule), no? That would resolve this I think?

@klmr
Copy link
Contributor Author

klmr commented Oct 6, 2020

Yes, that was my initial thought as well. However, since C and C++ are currently the same definition we need to do some extra work since the following are all also valid:

struct C;
struct C : A { …
struct C final : A { …
struct C : A<B<C>> { …
template <> struct C<void> { …
struct [[using CC: opt(1), debug]] [[foo([[]])]] C { …
struct N::N1::C { …
template struct N::C<char*>::D<void> { …
struct alignas(8) C { …
struct __declspec(dllexport) C { …
struct __attribute__((__packed__)) C { …

And of course any and all parts in the above may be hidden behind macros.

Damn I hate C++ syntax …

@klmr
Copy link
Contributor Author

klmr commented Oct 6, 2020

As a best-effort heuristic, we might consider something like this (not sure how easy even that is to implement, since it backtracks):

After a struct, class or union, find the next {; then, find the last identifier before that { which is not final, appears before :, and does not appear inside <>. No guarantee this will work everywhere (in particular, it will fail for anonymous classes which have attributes) but for named classes I can’t currently think of a case that will break it.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 6, 2020

Will it greatly simplify things if the grammars were split? I presume that the C one would be much simpler but would it reduce the complexity of the C++ problem?

@klmr
Copy link
Contributor Author

klmr commented Oct 6, 2020

It would for C, but since I’m also interested in good C++ support that doesn’t help me personally. ;-) (Since virtually all relevant C code is also legal C++, the split won’t simplify the C++ grammar.)

That said, yes, I think now’s the time to split the two grammars (#2146). I’m happy to work on this.

@joshgoebel
Copy link
Member

If seems right now we're at the same fidelity as GitHub which seems to highlight the second term after struct always as the name... If GitHub can't find a simple way to do better I'm not sure we should invest a lot of time into this.

@ztane
Copy link

ztane commented Oct 8, 2020

@joshgoebel it is notable that in C the word following struct would be the struct tag. In C you cannot even put _Alignas immediately after the struct keyword, grammar does not allow for it. If there is any word following the word struct it always will be a tag in C. It seems like Github syntax colouring follows the C rule:

C++:

struct tag *foo = malloc(sizeof (struct tag));

Strangely enough, that is not used for C on Github:

struct tag *foo = malloc(sizeof (struct tag));

@joshgoebel
Copy link
Member

Yes, the C rule will be much easier to get right when we split the grammars. I'm not opposed to looking at what it might take to get it perfect for C++ also, but I'm worried it'll be too complex.

@joshgoebel
Copy link
Member

This list you provided in #2736 (comment)

If we split it up between C and C++ what would those two sep lists look like?

joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Mar 26, 2021
@klmr
Copy link
Contributor Author

klmr commented Mar 26, 2021

C

struct C;
typedef struct { … } C;
struct __declspec(dllexport) C { …
struct __attribute__((packed, aligned(4))) C { …
typedef struct C { … } __attribute__((packed)) C;

Furthermore, union and enum can occur in place of struct in most of the above.

@ztane made the point that C structs can have nothing between the struct keyword and its name. That’s correct in standard C; however, common compiler extensions allow attributes there, as shown above. These are in extremely widespread use, and should ideally not trip up the lexer.

C++

All of the above, plus

struct C : A { …
struct C final : A { …
struct C : private A { …
struct C : A<B<C>> { …
template <typename T> struct C { …
template <template <typename> class T> struct C { …
template <> struct C<void> { …
struct [[using CC: opt(1), debug]] [[foo([[]])]] C { …
struct N::N1::C { …
template struct N::C<char*>::D<void> { …
struct alignas(8) C { …

Furthermore, C++ also allows class in to appear in place of struct, as well as union, enum and enum class. C++20 adds concept and requires to the mix, but their syntax is very different (and I don’t know it exactly), and I wouldn’t attempt to handle it with the same rule as structs.

joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Mar 27, 2021
joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Apr 4, 2021
joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Apr 4, 2021
joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Apr 6, 2021
joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Apr 12, 2021
joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Apr 13, 2021
joshgoebel added a commit that referenced this issue Apr 13, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug help welcome Could use help from community language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
@joshgoebel @klmr @ztane and others