Run ispell on text nodes using tree-sitter

tree-sitter is a great tool for keeping an incremental syntax tree of our code. In Emacs it can replace the regular regex-based highlighting system with highlighting driven by the syntax tree, but there are other use cases too. Here we are going to use it to build a simple tool that runs ispell on the content of a text node. Of course we could select the text manually, but it is easier (and fancier) to do it programmatically using the syntax tree generated by tree-sitter.

The tree-sitter package used here is not bundled with Emacs, so we need to install it. The following code does that using use-package:

(use-package tree-sitter
  :ensure t
  :hook
  ;; enable highlighting using tree-sitter instead of the regex-based system
  ;; (use-package appends "-hook" to the hook name)
  (tree-sitter-after-on . tree-sitter-hl-mode))

(use-package tree-sitter-langs
  :ensure t)

To build this tool we need two things:

  • Find a way to check whether point is currently inside a text node: a literal string, a multi-line string, a comment, etc.

  • Call ispell programmatically using the position of a tree-sitter node.

Using tree-sitter to get the node at the current position

The tree-sitter package has some functions we can use. tree-sitter-node-at-pos gives us the smallest node of a given type at a position, so (tree-sitter-node-at-pos 'string (point)) returns a string node if there is one at point and nil otherwise. Using this we can check for every kind of "string" value: a programming language can have strings, comments and other elements whose text needs a spell check.

Because tree-sitter uses a specific grammar for every programming language, a "string" element can have different names. For example, in Python it is string, but in Go it is interpreted_string_literal. We can check these names by running M-x tree-sitter-debug-mode in a buffer of the language whose "text" node names we want to know.
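As a lighter alternative to the debug buffer, a small command can echo the type of the node under the cursor. This is only a sketch: the name my/show-node-type-at-point is made up for illustration, and it assumes the tree-sitter package is loaded and tree-sitter-mode is enabled in the current buffer.

```elisp
;; Hypothetical helper (not part of the tree-sitter package).
;; Assumes tree-sitter is loaded and `tree-sitter-mode' is on in this buffer.
(defun my/show-node-type-at-point ()
  "Echo the type of the smallest named tree-sitter node at point."
  (interactive)
  ;; :named asks for the smallest named node, whatever its type is
  (let ((node (tree-sitter-node-at-pos :named)))
    (if node
        (message "Node type: %s" (tsc-node-type node))
      (message "No named node at point"))))
```

Running M-x my/show-node-type-at-point inside a string or a comment should show the name to put in the mapping for that language.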

Now we define a mapping from each major mode we want to support to the node types that hold text in its grammar:

(setq tree-sitter-text-grammar-mapping '((python-mode . (string comment))
                                         (go-mode . (interpreted_string_literal comment))
                                         (js-mode . (string template_string comment))
                                         (elixir-mode . (string comment))))

Here we only support 4 languages, but it is easy to add more; these are the ones I use most often.
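For instance, adding one more language could look like the sketch below. The rust-mode entry and its node type names are assumptions about the Rust grammar, not something tested in this post, so check them with M-x tree-sitter-debug-mode before relying on them.

```elisp
;; No-op if the mapping is already defined; it is only here so that the
;; snippet can be evaluated on its own.
(defvar tree-sitter-text-grammar-mapping nil
  "Alist mapping major modes to tree-sitter node types that hold text.")

;; ASSUMPTION: these node type names match the installed Rust grammar;
;; verify them with M-x tree-sitter-debug-mode in a Rust buffer.
(add-to-list 'tree-sitter-text-grammar-mapping
             '(rust-mode . (string_literal raw_string_literal
                            line_comment block_comment)))
```

A wrong name in the list should be harmless as long as the lookup passes a non-nil IGNORE-INVALID-TYPE argument to tree-sitter-node-at-pos, since an unknown type then yields nil instead of an error.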

With the following function we can use the previously defined mapping to extract a valid text node at the current position:

(require 'seq)

(defun get-text-node-at-point ()
  "Get a valid text node at point for the current major mode.
The node types come from `tree-sitter-text-grammar-mapping'."
  (let* ((types (alist-get major-mode tree-sitter-text-grammar-mapping))
         ;; look for a node of each text type at point; the final t makes
         ;; unknown type names return nil instead of signaling an error
         (matches (seq-map (lambda (x) (tree-sitter-node-at-pos x (point) t)) types))
         ;; drop the types that had no node at point
         (filtered-matches (seq-filter #'identity matches)))
    ;; get first valid match, or nil when there is none
    (car filtered-matches)))

Call ispell using a tree-sitter text node

We can use ispell-region to run ispell over a specific region. This function receives the start and end positions of the region, so we need to extract those values from our tree-sitter text node.

tsc-node-start-position and tsc-node-end-position can be used for this:

(defun run-ispell-on-node (node)
  "Run ispell over the text of NODE."
  (ispell-region (tsc-node-start-position node) (tsc-node-end-position node)))
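Before handing a node to ispell, it can help to see exactly which text the node covers (for a string node this normally includes the quotes). A minimal sketch, using only the two position functions above plus the built-in buffer-substring-no-properties; the name my/node-text-preview is hypothetical, and the tree-sitter package is assumed to be loaded already:

```elisp
;; Hypothetical debugging helper; assumes NODE belongs to the current buffer
;; and that the tree-sitter package (which provides tsc) is already loaded.
(defun my/node-text-preview (node)
  "Return the buffer text covered by NODE, without text properties."
  (buffer-substring-no-properties (tsc-node-start-position node)
                                  (tsc-node-end-position node)))
```

Evaluating (my/node-text-preview (tree-sitter-node-at-pos 'comment (point) t)) inside a comment would then return the text that ispell-region is going to scan.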

Putting it all together

Now we can combine these two functions into a command and bind it to a key:

(defun run-ispell-at-point ()
  "Run ispell at current point if there is a text node."
  ;; `interactive' makes this a command, which a key binding requires
  (interactive)
  (let ((node (get-text-node-at-point)))
    (when node
      (run-ispell-on-node node))))

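;; Note: C-x C-s runs `save-buffer' by default, so this binding shadows it;
;; choose another key (for example C-c s) if you want to keep the default.
(global-set-key (kbd "C-x C-s") #'run-ispell-at-point)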

Now when we press C-x C-s while point is in a text node, ispell runs and checks the spelling of that node.