Publish post 'Add a Pygments Lexer to Chroma' #2

Merged
ccm merged 6 commits from ccm-chroma-post into trunk 2025-06-22 16:56:46 +00:00
2 changed files with 288 additions and 0 deletions

1
html/.gitignore vendored

@@ -2,6 +2,7 @@ app.css
code.html
index.html
pandoc.css
posts/add-a-pygments-lexer-to-chroma.html
posts/build-a-neovim-qt-appimage-from-source.html
posts/build-static-website-generator-part-1.html
posts/deploy-elixir-generated-html-with-docker-on-digitalocean.html


@@ -0,0 +1,287 @@
{
title: "Add a Pygments Lexer to Chroma"
blurb: "[Pygments][4] and [Chroma][5] are syntax highlighting libraries
written in [Python][6] and [Go][7], respectively. Chroma is missing a
language we like, which Pygments already supports. We add support for our
language to Chroma by converting the existing lexer from Pygments.
[4]: https://github.com/pygments/pygments
[5]: https://github.com/alecthomas/chroma
[6]: https://www.python.org/
[7]: https://go.dev/"
}
$index
## Introduction
[Gitea][8] uses [Chroma][9] for syntax highlighting. Chroma is based on the
Python syntax highlighter, [Pygments][10], and includes a [script][11] to help
convert Pygments lexers for use with Chroma. We describe how below.
[8]: https://github.com/go-gitea/gitea
[9]: https://github.com/alecthomas/chroma
[10]: https://github.com/pygments/pygments
[11]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/_tools/pygments2chroma_xml.py
## Setup
We're going to be using the `python` and `golang` [Docker][3] images. Docker
Desktop is _not_ required.
```console
$ docker pull python
$ docker pull golang
```
Let's set up some aliases to make running the commands easier.
```console
$ alias docker-run='docker run --rm -it -w /opt -v $PWD:/opt'
$ alias docker-run-go='docker-run golang'
$ alias docker-run-py='docker-run python'
```
[3]: https://docs.docker.com/engine/
## Convert a Pygments lexer to a Chroma lexer with `pygments2chroma_xml.py`
```console
$ git clone https://github.com/alecthomas/chroma.git
$ cd chroma
```
In the Chroma root directory, we run:
```console
$ docker-run-py bash -c \
"pip install pystache pygments && \
python _tools/pygments2chroma_xml.py \
pygments.lexers.scripting.LuaLexer > lexers/embedded/lua.xml && \
pip list"
```
We should see this in the output:
```
Package Version
-------- -------
pip 25.0.1
Pygments 2.19.2
pystache 0.6.8
```
The `pip list` output records which version of Pygments we generated our lexer from.
The file `lexers/embedded/lua.xml` should now contain all the tokenization
rules for the [Lua](https://www.lua.org) language.
::: filename-for-code-block
`lexers/embedded/lua.xml`
:::
```xml
<lexer>
<config>
<name>Lua</name>
...
```
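To get a quick overview of a generated lexer without reading the whole file, a short script can list its states and count the rules in each. This is a minimal sketch using only the Python standard library. It assumes the `config`/`rules`/`state`/`rule` element layout of Chroma's XML lexers; the inline `SAMPLE` is a hand-written, abbreviated stand-in for the real `lua.xml` (its patterns are illustrative, not copied from Chroma), and `summarize_lexer` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for lexers/embedded/lua.xml.
# The patterns are illustrative only (assumption, not Chroma's real rules).
SAMPLE = """<lexer>
  <config>
    <name>Lua</name>
    <alias>lua</alias>
    <filename>*.lua</filename>
  </config>
  <rules>
    <state name="root">
      <rule pattern="--.*$">
        <token type="CommentSingle"/>
      </rule>
      <rule pattern="\\s+">
        <token type="Text"/>
      </rule>
    </state>
    <state name="string">
      <rule pattern=".">
        <token type="LiteralString"/>
      </rule>
    </state>
  </rules>
</lexer>
"""


def summarize_lexer(xml_text):
    """Return (lexer name, {state name: number of rules}) for an XML lexer."""
    root = ET.fromstring(xml_text)
    name = root.findtext("config/name")
    states = {
        state.get("name"): len(state.findall("rule"))
        for state in root.iterfind("rules/state")
    }
    return name, states


if __name__ == "__main__":
    name, states = summarize_lexer(SAMPLE)
    print(name)
    for state, count in states.items():
        print(f"{state}: {count} rule(s)")
```

To run it against the generated file instead of the sample, pass the contents of `lexers/embedded/lua.xml` to `summarize_lexer`.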
## Highlight some code with a Chroma lexer
Chroma provides a [simple example test file][1] we can modify to see what syntax
highlighting with our new lexer looks like. First, though, we need to create a
new Go module by running `go mod init`:
```console
$ cd ..
$ docker-run-go go mod init main
go: creating new go.mod: module main
go: to add module requirements and sums:
go mod tidy
```
We will need the required modules, so let's go ahead and run `go mod tidy` as
the output suggests.
```console
$ docker-run-go go mod tidy
```
We should now have two additional files, `go.mod` and `go.sum`. `go.sum` holds
checksums for the required modules, while `go.mod` should look like this:
::: filename-for-code-block
`go.mod`
:::
```
module main
go 1.25
require github.com/alecthomas/chroma/v2 v2.18.0
require github.com/dlclark/regexp2 v1.11.5 // indirect
```
Now we can create a `main.go` file and copy over the code from Chroma's example
test file, with two changes: the `code` variable holds some Lua,
`print("hello")`, and the lexer passed to the `Highlight` function is `"lua"`:
::: filename-for-code-block
`main.go`
:::
```go
package main

import (
	"log"
	"os"

	"github.com/alecthomas/chroma/v2/quick"
)

func main() {
	// Lua source to highlight.
	code := `print("hello")`

	// Arguments: output writer, source, lexer, formatter, style.
	err := quick.Highlight(os.Stdout, code, "lua", "html", "monokai")
	if err != nil {
		log.Fatal(err)
	}
}
```
Now we can try running our `main.go` like this:
```console
$ docker-run-go go run main.go
go: downloading github.com/alecthomas/chroma/v2 v2.18.0
go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
```
That prints the markup (and styles) for highlighting our line of Lua code to
the console. Notice, though, that Go downloaded the Chroma package from the
GitHub repo, so this run did not use our local changes. To use our local copy
of Chroma, we add a [`replace` directive][2] that points the module at our
local directory:
```console
$ docker-run-go go mod edit -replace \
github.com/alecthomas/chroma/v2@v2.18.0=./chroma
```
This adds the following line to our `go.mod` file:
::: filename-for-code-block
`go.mod`
:::
```
...
replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
```
Now, when we run `main.go`, we should no longer see Chroma being downloaded,
because Go is using our local copy:
```console
$ docker-run-go go run main.go
go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
```
We should also see a list of styles followed by the HTML markup for
highlighting our Lua code (formatted for legibility):
```html
<pre class="chroma">
<code>
<span class="line">
<span class="cl">
<span class="n">print</span>
<span class="p">(</span>
<span class="s2">&#34;hello&#34;</span>
<span class="p">)</span>
</span>
</span>
</code>
</pre>
```
[1]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/quick/example_test.go
[2]: https://go.dev/ref/mod#go-mod-file-replace
## Add test data
If we want to add our lexer to Chroma, we will need to create some test data
for it. We can create a file in `lexers/testdata` called `lua.actual` and
fill it with sample Lua source code that exercises as many of the language's
tokens as possible.
## Record test output
Once we have test data, we need to record the expected output. We create
another file called `lexers/testdata/lua.expected`. This is the file we
will record to by running the following command from the Chroma root directory:
```console
$ docker-run -e RECORD=true golang go test ./lexers
```
Once test output is recorded in `lexers/testdata/lua.expected`, we should
visually inspect and verify that the expected data is correct.
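Part of that inspection can be automated. The sketch below assumes the recorded file is a JSON array of objects with `type` and `value` keys. It checks that the token values, joined together, reproduce the input, and it counts tokens per type so that any `Error` tokens stand out. The inline sample data and the helper name `check_tokens` are hypothetical; a real lexer may legitimately normalize newlines, so treat a mismatch as a prompt to look closer rather than as proof of a bug.

```python
import json
from collections import Counter

# Hypothetical stand-ins for lua.actual and lua.expected (assumption:
# the recorded file is a JSON array of {"type", "value"} objects).
ACTUAL = 'print("hello")\n'
EXPECTED = r'''
[
  {"type": "Name", "value": "print"},
  {"type": "Punctuation", "value": "("},
  {"type": "LiteralStringDouble", "value": "\"hello\""},
  {"type": "Punctuation", "value": ")"},
  {"type": "Text", "value": "\n"}
]
'''


def check_tokens(source, expected_json):
    """Return (values rebuild the source?, Counter of token types)."""
    tokens = json.loads(expected_json)
    rebuilt = "".join(tok["value"] for tok in tokens)
    counts = Counter(tok["type"] for tok in tokens)
    return rebuilt == source, counts


if __name__ == "__main__":
    ok, counts = check_tokens(ACTUAL, EXPECTED)
    print("round-trips:", ok)
    print("error tokens:", counts.get("Error", 0))
    for token_type, count in counts.most_common():
        print(f"{token_type}: {count}")
```

To check real data, read `lexers/testdata/lua.actual` and `lexers/testdata/lua.expected` from disk and pass their contents in place of the samples.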
## Run tests
As a final confirmation, we can run the tests to make sure we have not broken
anything:
```console
$ docker-run-go go test ./lexers
```
## Conclusion
If we followed all these steps correctly, our lexer should be ready to push to
a `git` repo, and we can open a pull request!
## Bonus!: Use local Pygments with `pygments2chroma_xml.py`
These lines in `pygments2chroma_xml.py`,
```python
import pystache
from pygments import lexer as pygments_lexer
from pygments.token import _TokenType
```
import Pygments, which in the steps above came from the
[Python Package Index](https://pypi.org/). But if we want to convert a Pygments
lexer from a local `git` repo, we can import it simply by running the
`pygments2chroma_xml.py` script from the repo root directory, because Python
searches the script's directory before the installed packages.
```console
$ git clone https://github.com/pygments/pygments.git
$ cd pygments
$ docker-run \
-v ../chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
python bash -c \
"pip install pystache && \
python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer && \
pip list"
```
We should see the lexer output followed by
```console
Package Version
-------- -------
pip 25.0.1
pystache 0.6.8
```
which indicates no remote `pygments` package was installed.
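The local copy wins because Python searches the entries of `sys.path` in order, and when it runs a script it puts the script's directory at the front of that list, ahead of `site-packages`. The sketch below reproduces that lookup order with a throwaway package in a temporary directory. The package name `fakepkg` and the helper `import_from` are hypothetical and unrelated to Pygments.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from(directory, package):
    """Import `package` with `directory` at the front of the search path."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(package)
    finally:
        sys.path.remove(str(directory))


def demo():
    """Create a throwaway package and show it is imported from its directory."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "fakepkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text('ORIGIN = "local checkout"\n')

        module = import_from(tmp, "fakepkg")
        loaded_from = Path(module.__file__).resolve().parent
        return module.ORIGIN, loaded_from == pkg_dir.resolve()


if __name__ == "__main__":
    origin, is_local = demo()
    print("imported from:", origin)
    print("loaded from the temporary directory:", is_local)
```

The same check works for the real case: printing `pygments.__file__` inside the container shows whether the package came from `/opt` or from `site-packages`.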