Custom Font Lock configuration in Emacs

Font Lock is the builtin Emacs minor mode used to highlight textual elements in buffers. Major modes usually configure it to detect various syntaxic constructions and attach faces to them.

The reason I ended up deep into Font Lock is because I was not satisfied with the way it is configured for lisp-mode, the major mode used for both Common Lisp and Emacs Lisp code. This forced me to get acquainted with various aspects of Font Lock in order to change its configuration. If you want to change highlighting for your favourite major mode, you will find this article useful.

Common Lisp highlighting done wrong

The core issue of Common Lisp highlighting in Emacs is that a lot of it is arbitrary and inconsistent:

The mode highlights what it calls “definers” and “keywords”, but it does not really make sense in Common Lisp. Why would WITH-OUTPUT-TO-STRING be listed as a keyword, but not CLASS-OF?
SIGNAL uses font-lock-warning-face. Why would it be a warning? Even stranger, why would you use this warning face for CHECK-TYPE?
Keywords and uninterned symbols are all highlighted with font-lock-builtin-face. But they are not functions or variables. They are not even special in any way, and their syntax already indicates clearly their nature. Having so many yellow symbols everywhere is really distracting.
All symbols starting with & are highlighted using font-lock-type-face. But lambda list arguments are not types, and symbols starting with & are not always lambda list arguments.
All symbols preceded by ( whose name starts with DO- or WITH- are highlighted as keywords. There is even a comment by RMS stating that it is too general. He is right.

Beyond these issues, the mode sadly uses default Font Lock faces instead of defining semantically appropriate faces and mapping them to existing ones as default values.

The chances of successfully driving this kind of large and disruptive change directly into Emacs are incredibly low. Even if it was to be accepted, the result would not be available until the next release, which could mean months. Fortunately, Emacs is incredibly flexible and we can change all of this ourselves.

Note that you may not agree with the list of issues above, and this is fine. The point of this article is to show you how you can change the way Emacs highlights content in order to match your preferences. And you can do that for all major modes!

Font Lock configuration

Font Lock always felt a bit magic and it took me some time to find the motivation to read the documentation. As is turned out, it can be used for very complex highlighting schemes, but basic features are not that hard to use.

The main configuration of Font Lock is stored in the font-lock-defaults buffer-local variable. It is a simple list containing the following entries:

A list of symbols containing the value to use for font-lock-keywords at each level, the first symbol being the default value.
The value used for font-lock-keywords-only. If it is nil, it enables syntaxic highlighting (strings and comments) in addition of search-based (keywords) highlighting.
The value used for font-lock-keywords-case-fold-search. If true, highlighting is case insensitive.
The value used for font-lock-syntax-table, the association list controlling syntaxic highlighting. If it is nil, Font Lock uses the syntax table configured with set-syntax-table. In lisp-mode this would mean lisp-mode-syntax-table.
All remaining values are bindings using the form (VARIABLE-NAME . VALUE) used to set buffer-local values for other Font Lock variables.

The part we are interested about is search-based highlighting which uses regular expressions to find specific text fragments and attach faces to them.

Values used for font-lock-keywords are also lists. Each element is a construct used to specify one or more keywords to highlight. While these constructs can have multiple forms for more complex use cases, we will only use the two simplest ones:

(REGEXP . FACE) tells Font Lock to use FACE for text fragments which match REGEXP. For example, you could use ("\\_<-?[0-9]+\\_>" . font-lock-constant-face) to highlight integers as constants (note the use of \_< and \_> to match the start and end of a symbol; see the regexp documentation for more information).
(REGEXP (GROUP FACE)…) is a bit more advanced. When REGEXP matches a subset of the buffer, Font Lock assigns faces to the capture group identified by their number. You could use this construction to detect a complex syntaxic element and highlight some of its parts with different faces.

Simplified Common Lisp highlighting

We are going to configure keyword highlighting for the following types of values:

Character literals, e.g. #\Space.
Function names in the context of a function call for standard Common lisp functions.
Standard Common Lisp values such as *STANDARD-OUTPUT* or PI.

Additionally, we want to keep the default syntaxic highlighting configuration which recognizes character strings, documentation strings and comments.

Faces

Let us start by defining new faces for the different values we are going to match:

(defface g-cl-character-face
  '((default :inherit font-lock-constant-face))
  "The face used to highlight Common Lisp character literals.")

(defface g-cl-standard-function-face
  '((default :inherit font-lock-keyword-face))
  "The face used to highlight standard Common Lisp function symbols.")

(defface g-cl-standard-value-face
  '((default :inherit font-lock-variable-name-face))
  "The face used to highlight standard Common Lisp value symbols.")

Nothing complicated here, we simply inherit from default Font Lock faces. You can then configure these faces in your color theme without affecting other modes using Font Lock.

Keywords

To detect standard Common Lisp functions and values, we are going to need a regular expression. The first step is to build a list of strings for both functions and values. Easy to do with a bit of Common Lisp code!

(defun standard-symbol-names (predicate)
  (let ((symbols nil))
    (do-external-symbols (symbol :common-lisp)
      (when (funcall predicate symbol)
        (push (string-downcase (symbol-name symbol)) symbols)))
    (sort symbols #'string<)))
    
(standard-symbol-names #'fboundp)
(standard-symbol-names #'boundp)

The STANDARD-SYMBOL-NAMES build a list of symbols exported from the :COMMON-LISP package which satisfy a predicate. The first call gives us the name of all symbols bound to a function, and the second all which are bound to a value.

The astute reader will immediately wonder about symbols which are bound both a function and a value. They are easy to find by calling INTERSECTION on both sets of names: +, /, *, -. It is not really a problem: we can highlight function calls by matching function names preceded by (, making sure that these symbols will be correctly identified as either function symbols or value symbols depending on the context.

We store these lists of strings in the g-cl-function-names and g-cl-value-names (the associated code is not reproduced here: these lists are quite long; but I posted them as a Gist).

With this lists, we can use the regexp-opt Emacs Lisp function to build optimized regular expressions matching them:

(defvar g-cl-font-lock-keywords
  (let* ((character-re (concat "#\\\\" lisp-mode-symbol-regexp "\\_>"))
         (function-re (concat "(" (regexp-opt g-cl-function-names t) "\\_>"))
         (value-re (regexp-opt g-cl-value-names 'symbols)))
    `((,character-re . 'g-cl-character-face)
      (,function-re
       (1 'g-cl-standard-function-face))
      (,value-re . 'g-cl-standard-value-face))))

Characters literals are reasonably easy to match.

Functions are a bit more complicated since we want to match the function name when it is preceded by an opening parenthesis. We use a capture capture (see the last argument of regexp-opt) for the function name and highlight it separately.

Values are always matched as full symbols: we do not want to highlight parts of a symbol, for example MAP in a symbol named MAPPING.

Final configuration

Finally we can define the variable which will be used for font-lock-defaults in the initialization hook; we copy the original value from lisp-mode, and change the keyword list for what is going to be our own configuration:

(defvar g-cl-font-lock-defaults
  '((g-cl-font-lock-keywords)
    nil                                 ; enable syntaxic highlighting
    t                                   ; case insensitive highlighting
    nil                                 ; use the lisp-mode syntax table
    (font-lock-mark-block-function . mark-defun)
    (font-lock-extra-managed-props help-echo)
	(font-lock-syntactic-face-function
	 . lisp-font-lock-syntactic-face-function)))

To configure font-lock-defaults, we simply set it in the initialization hook of lisp-mode:

(defun g-init-lisp-font-lock ()
  (setq font-lock-defaults g-cl-font-lock-defaults))
  
(add-hook 'lisp-mode-hook 'g-init-lisp-font-lock)

Comparison

Let us compare highlighting for a fragment of code before and after our changes:

Before After

The differences are subtle but important:

All standard functions are highlighted, helping to distinguish them from user-defined functions.
Standard values such as *ERROR-OUTPUT* are highlighted.
Character literals are highlighted the same way as character strings.
Keywords are not highlighted anymore, avoiding the confusion with function names.

Conclusion

That was not easy; but as always, the effort of going through the documentation and experimenting with different Emacs components was very rewarding. Font Lock does not feel like a black box anymore, opening the road for the customization of other major modes.

In the future, I will work on a custom color scheme to use more subtle colors, with the hope of reducing the rainbow effect of so many major modes, including lisp-mode.