This issue was moved to a discussion.

Possible to support different grammars for the same language? #95

Closed
sogaiu opened this issue Jan 11, 2021 · 5 comments
Labels
documentation (Improvements or additions to documentation), question (Not a bug report or feature request)

Comments

@sogaiu
Contributor

sogaiu commented Jan 11, 2021

TLDR:

I would like to know whether it's technically feasible for multiple grammars for a single programming language (so the same major mode) to be integrated into emacs-tree-sitter, at least so that a user might choose which one to use.


The long version:

I made a grammar for the Janet programming language last year and discovered that at about the same time someone else had also made one.

It turns out we took somewhat different approaches in what we supported.

The approach I've taken is to try to recognize the bare minimum. One advantage of this is that the parse results appear to be more accurate: adding more constructs yielded more errors during testing, and the resulting grammar was also harder to get tree-sitter to accept. A drawback is that more work may be needed for highlighting (and possibly other features) because less is recognized.

The other grammar tries to recognize more constructs, but at least from my testing I think it can misrecognize certain things (though that doesn't necessarily show up as an error) which the "simpler" grammar does not. It may also be less suited to structural editing because it recognizes certain forms specifically (e.g. there is a def node), so such forms are not identifiable via the grammar as lists (in Janet these are actually called "tuples").

I suspect that which one is preferable depends on what one wants to achieve. It didn't occur to me until recently that it might be possible for both to be available in a single editor, but after talking with the nvim-treesitter folks, it seems it might be possible there.

This brings me to the TLDR point:

I would like to know whether it's technically feasible for multiple grammars for a single programming language (so the same major mode) to be integrated into emacs-tree-sitter, at least so that a user might choose which one to use.

Is it currently possible?

I looked at: https://github.com/ubolonton/emacs-tree-sitter/blob/master/langs/tree-sitter-langs.el#L81-L107 and started to wonder whether it's not going to work to put a couple of things in there that both start with janet-mode.

I've changed the name of the grammar I worked on to differ from the other one so at least adding to the following section seems like it could work out ok: https://github.com/ubolonton/emacs-tree-sitter/blob/master/langs/tree-sitter-langs-build.el#L81-L104

Thanks for your consideration.

@shackra added the labels feature request and unexplored (Need more research) on Jan 11, 2021
@ubolonton
Collaborator

> I would like to know whether it's technically feasible for multiple grammars for a single programming language (so the same major mode) to be integrated into emacs-tree-sitter, at least so that a user might choose which one to use.

> Is it currently possible?

> I looked at: https://github.com/ubolonton/emacs-tree-sitter/blob/master/langs/tree-sitter-langs.el#L81-L107 and started to wonder whether it's not going to work to put a couple of things in there that both start with janet-mode.

> I've changed the name of the grammar I worked on to differ from the other one so at least adding to the following section seems like it could work out ok: https://github.com/ubolonton/emacs-tree-sitter/blob/master/langs/tree-sitter-langs-build.el#L81-L104

If you can change the major mode, you don't actually need to put it in tree-sitter-major-mode-language-alist. That list is mainly intended for major modes that are not aware of tree-sitter.

You can simply set tree-sitter-language in the major mode's initialization code. Then it's up to the major mode to select the grammar. You should still have different names for the grammars though, to make it work with tree-sitter-require. (It's possible to use the same name, but you would have to use tsc--load-language directly, and it would be more confusing.) If you don't have control over the major mode, you can still put the grammar-selecting logic in the mode's hook.

If you want 2 parse trees in the same buffer instead, you would need to define an advice for tree-sitter--do-parse, as well as additional buffer-local variables for the secondary grammar.
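As a rough illustration of the hook-based selection described above, here is a minimal sketch. The names `my-janet-grammar` and `my-janet-select-grammar` are hypothetical, and it assumes the second grammar is installed under the name `janet` with the default file and native symbol names. The file and symbol names for the simpler grammar are the ones that appear later in this thread. Enabling `tree-sitter-mode` itself is left to the user's existing setup.

```elisp
;; Sketch only: `my-janet-grammar' and `my-janet-select-grammar' are
;; hypothetical names, not part of emacs-tree-sitter or janet-mode.
(require 'tree-sitter)

(defcustom my-janet-grammar 'janet-simple
  "Which tree-sitter grammar to use in Janet buffers."
  :type '(choice (const janet-simple) (const janet))
  :group 'tree-sitter)

(defun my-janet-select-grammar ()
  "Set `tree-sitter-language' in the current buffer from `my-janet-grammar'."
  (setq-local tree-sitter-language
              (pcase my-janet-grammar
                ('janet-simple
                 ;; Explicit file and native symbol names, as used
                 ;; later in this thread.
                 (tree-sitter-require 'janet-simple
                                      "janet_simple"
                                      "tree_sitter_janet_simple"))
                ;; Assumption: the other grammar is available under the
                ;; default names derived from its language symbol.
                (_ (tree-sitter-require my-janet-grammar)))))

;; The grammars keep distinct names so that `tree-sitter-require'
;; can tell them apart.
(add-hook 'janet-mode-hook #'my-janet-select-grammar)
```

With something like this in place, switching grammars would amount to customizing `my-janet-grammar` and reopening the buffer.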

@ubolonton added the labels documentation (Improvements or additions to documentation) and question (Not a bug report or feature request), and removed the labels feature request and unexplored (Need more research), on Jan 13, 2021
@sogaiu
Contributor Author

sogaiu commented Jan 14, 2021

Thanks again for your considered response!

I took the path of trying to use an existing mode, janet-mode, and utilized janet-mode-hook to set tree-sitter-language. It seemed that setting tree-sitter-hl-default-patterns was helpful in getting things to work as well.

For future reference, I ended up with something like:

;;; janet-mode
(straight-use-package
 '(janet-mode :host github
              :repo "ALSchwalm/janet-mode"
              :files ("*.el")))

(use-package janet-mode
  :straight t
  :config
  (add-hook 'janet-mode-hook
            (lambda ()
              (setq tree-sitter-language
                    (tree-sitter-require 'janet
                                         "janet_simple"
                                         "tree_sitter_janet_simple"))
              (setq tree-sitter-hl-default-patterns
                    (condition-case nil
                        (with-temp-buffer
                          (insert-file-contents
                           ;; XXX: hardcoded path to the local
                           ;; emacs-tree-sitter checkout; adjust as needed
                           (expand-file-name
                            (concat
                             "~/.emacs.d/straight/repos/emacs-tree-sitter"
                             "/langs/queries/janet-simple/highlights.scm")))
                          (goto-char (point-max))
                          (insert "\n")
                          (buffer-string))
                      (file-missing nil))))))

Thank you also for the hints on having 2 parse trees in the same buffer. I haven't tried this yet, but am glad to hear it seems doable as I would like to give it a try at some point.

@sogaiu
Contributor Author

sogaiu commented Jan 15, 2021

Continuing along these lines, I experimented with making a major mode that works with tree-sitter to begin with.

I noticed there was a sketch of necessary bits for a major mode here: #70 (comment)

Are the details still up-to-date?

There is at least one thing that seemed different in that example from what was mentioned earlier in this issue. I got the impression that it was not necessary to set tree-sitter-major-mode-language-alist if one is defining a major mode appropriately, but the other issue's sample code seems to be setting it.

I am asking because I did not have much luck with getting the highlighting to function. It seems like an appropriate grammar is loaded and tree-sitter-hl-default-patterns is set. I am also able to see the tree via the debug mode and see highlighted results via the query builder.

The major mode code does basically:

(require 'tree-sitter)
(require 'tree-sitter-hl)
;;; skipping lines
(define-derived-mode a-janet-mode prog-mode "a-janet"
  "Major mode for the Janet language"
  ;; skipping lines
  (setq tree-sitter-hl-default-patterns
        "(keyword) @type\n")
  (setq tree-sitter-language
        (tree-sitter-require 'janet-simple
                             "janet_simple"
                             "tree_sitter_janet_simple"))
  (tree-sitter-hl-mode))

Any hints?

Maybe I should start a new issue or append to the other one?

@ubolonton
Collaborator

> I noticed there was a sketch of necessary bits for a major mode here: #70 (comment)

> Are the details still up-to-date?

I think so, but you should probably look at csharp-tree-sitter.el instead.

> I am asking because I did not have much luck with getting the highlighting to function. It seems like an appropriate grammar is loaded and tree-sitter-hl-default-patterns is set. I am also able to see the tree via the debug mode and see highlighted results via the query builder.

Do you have the grammar and the full code for the major mode somewhere? You can also check #87 on how to debug highlighting issues.

@sogaiu
Contributor Author

sogaiu commented Jan 17, 2021

@ubolonton

The csharp-tree-sitter.el code was very helpful. With this bit from it, highlighting is now working here:

  ;; https://github.com/ubolonton/emacs-tree-sitter/issues/84
  (unless font-lock-defaults
    (setq font-lock-defaults '(nil)))

Thanks also for pointing out #87. After examining the symbols and values mentioned there, I noticed that some were not as expected here. With that in mind, the above portion of csharp-tree-sitter.el stood out as possibly relevant (though of course the comment should have been a big hint).

Thanks again!
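For future readers, the pieces from this thread might be combined roughly as follows. This is only a sketch that merges the earlier a-janet-mode snippet with the font-lock-defaults workaround from csharp-tree-sitter.el. It assumes the janet-simple grammar is installed where tree-sitter-require can find it, and the one-line highlighting pattern is just the placeholder used earlier.

```elisp
;; Sketch only: combines the snippets posted in this thread.
(require 'tree-sitter)
(require 'tree-sitter-hl)

(define-derived-mode a-janet-mode prog-mode "a-janet"
  "Major mode for the Janet language."
  ;; Workaround taken from csharp-tree-sitter.el; see
  ;; https://github.com/ubolonton/emacs-tree-sitter/issues/84
  (unless font-lock-defaults
    (setq font-lock-defaults '(nil)))
  ;; Placeholder query; a real mode would load a full highlights.scm.
  (setq-local tree-sitter-hl-default-patterns "(keyword) @type\n")
  ;; The major mode selects the grammar itself, so no entry in
  ;; `tree-sitter-major-mode-language-alist' is used here.
  (setq-local tree-sitter-language
              (tree-sitter-require 'janet-simple
                                   "janet_simple"
                                   "tree_sitter_janet_simple"))
  (tree-sitter-hl-mode))
```

The only change relative to the earlier non-working version is the font-lock-defaults guard placed before tree-sitter-hl-mode is enabled.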

@emacs-tree-sitter locked and limited conversation to collaborators on Mar 19, 2021
