2025-06-22 16:56:46 +00:00
2 changed files with 288 additions and 0 deletions
--- a/html/.gitignore
+++ b/html/.gitignore
@@ -2,6 +2,7 @@ app.css
 code.html
 index.html
 pandoc.css
+posts/add-a-pygments-lexer-to-chroma.html
 posts/build-a-neovim-qt-appimage-from-source.html
 posts/build-static-website-generator-part-1.html
 posts/deploy-elixir-generated-html-with-docker-on-digitalocean.html
--- a/posts/2025-06-20-add-a-pygments-lexer-to-chroma.md
+++ b/posts/2025-06-20-add-a-pygments-lexer-to-chroma.md
@@ -0,0 +1,287 @@
+{
+  title: "Add a Pygments Lexer to Chroma"
+  blurb: "[Pygments][4] and [Chroma][5] are syntax highlighting libraries
+  written in [Python][6] and [Go][7], respectively. Chroma is missing a
+  language we like, which Pygments already supports. We add support for our
+  language to Chroma by converting the existing lexer from Pygments.
+
+  [4]: https://github.com/pygments/pygments
+  [5]: https://github.com/alecthomas/chroma
+  [6]: https://www.python.org/
+  [7]: https://go.dev/"
+}
+$index
+
+## Introduction
+
+[Gitea][8] uses [Chroma][9] for syntax highlighting. Chroma is based on the
+Python syntax highlighter, [Pygments][10], and includes a [script][11] to help
+convert Pygments lexers for use with Chroma. We describe how to use it below.
+
+[8]: https://github.com/go-gitea/gitea
+[9]: https://github.com/alecthomas/chroma
+[10]: https://github.com/pygments/pygments
+[11]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/_tools/pygments2chroma_xml.py
+
+## Setup
+
+We're going to be using the `python` and `golang` [Docker][3] images. Docker
+Desktop is _not_ required.
+
+```console
+$ docker pull python
+$ docker pull golang
+```
+
+Let's set up some aliases to make running the commands easier.
+
+```console
+$ alias docker-run='docker run --rm -it -w /opt -v $PWD:/opt'
+$ alias docker-run-go='docker-run golang'
+$ alias docker-run-py='docker-run python'
+```
+
+[3]: https://docs.docker.com/engine/
+
+## Convert a Pygments lexer to a Chroma lexer with `pygments2chroma_xml.py`
+
+```console
+$ git clone https://github.com/alecthomas/chroma.git
+$ cd chroma
+```
+
+In the Chroma root directory, we run:
+
+```console
+$ docker-run-py bash -c \
+ "pip install pystache pygments && \
+  python _tools/pygments2chroma_xml.py \
+    pygments.lexers.scripting.LuaLexer > lexers/embedded/lua.xml && \
+  pip list"
+```
+
+We should see this in the output:
+
+```
+Package  Version
+-------- -------
+pip      25.0.1
+Pygments 2.19.2
+pystache 0.6.8
+```
+
+The `pip list` output tells us which version of Pygments our lexer was
+generated from. The file `lexers/embedded/lua.xml` should now contain all the
+tokenization rules for the [Lua](https://www.lua.org) language.
+
+::: filename-for-code-block
+`lexers/embedded/lua.xml`
+:::
+
+```xml
+<lexer>
+  <config>
+    <name>Lua</name>
+    ...
+```
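+
+A quick way to sanity-check the generated file is to list its states and count
+the rules in each one. The following is a minimal sketch, not part of Chroma's
+tooling. It assumes the file follows Chroma's XML layout of `<state>` elements
+containing `<rule>` elements, and the `SAMPLE` fragment is a hypothetical,
+heavily shortened stand-in for the real `lua.xml`.
+
+```python
+import xml.etree.ElementTree as ET
+
+# Hypothetical, heavily shortened lexer definition used only for illustration;
+# the real lua.xml generated above is much longer.
+SAMPLE = """
+<lexer>
+  <config>
+    <name>Lua</name>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="--.*$"><token type="CommentSingle"/></rule>
+      <rule pattern="[a-z]+"><token type="Name"/></rule>
+    </state>
+    <state name="string">
+      <rule pattern="[a-z ]+"><token type="LiteralString"/></rule>
+    </state>
+  </rules>
+</lexer>
+"""
+
+
+def summarize(xml_text):
+    """Return the lexer name and a {state name: rule count} mapping."""
+    root = ET.fromstring(xml_text)
+    name = root.findtext("config/name")
+    states = {
+        state.get("name"): len(state.findall("rule"))
+        for state in root.iter("state")
+    }
+    return name, states
+
+
+name, states = summarize(SAMPLE)
+print(name, states)
+```
+
+To inspect the real file, we could pass the contents of
+`lexers/embedded/lua.xml` to `summarize` instead of `SAMPLE`.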
+
+## Highlight some code with a Chroma lexer
+
+Chroma provides a [simple example test file][1] we can modify to see what syntax
+highlighting with our new lexer looks like. First, though, we need to create a
+new Go module by running `go mod init`:
+
+```console
+$ cd ..
+$ docker-run-go go mod init main
+go: creating new go.mod: module main
+go: to add module requirements and sums:
+	go mod tidy
+```
+
+Our program will depend on other modules, so let's go ahead and run
+`go mod tidy` as the output suggests.
+
+```console
+$ docker-run-go go mod tidy
+```
+
+We should now have two additional files, `go.mod` and `go.sum`. `go.sum`
+contains checksums for the required modules, while `go.mod` should look like
+this:
+
+::: filename-for-code-block
+`go.mod`
+:::
+
+```
+module main
+
+go 1.25
+
+require github.com/alecthomas/chroma/v2 v2.18.0
+
+require github.com/dlclark/regexp2 v1.11.5 // indirect
+```
+
+Now we can create a `main.go` file and copy over the code from Chroma's example
+test file, with two changes: the `code` variable holds some Lua,
+`print("hello")`, and the lexer we pass to the `Highlight` function is `"lua"`:
+
+::: filename-for-code-block
+`main.go`
+:::
+
+```go
+package main
+
+import (
+	"log"
+	"os"
+
+	"github.com/alecthomas/chroma/v2/quick"
+)
+
+func main() {
+	code := `print("hello")`
+
+	err := quick.Highlight(os.Stdout, code, "lua", "html", "monokai")
+	if err != nil {
+		log.Fatal(err)
+	}
+}
+```
+
+Now we can try running our `main.go` like this:
+
+```console
+$ docker-run-go go run main.go
+go: downloading github.com/alecthomas/chroma/v2 v2.18.0
+go: downloading github.com/dlclark/regexp2 v1.11.5
+<html>
+<style type="text/css">
+...
+```
+
+That should print the markup (and styles) for highlighting that block of Lua
+code to the console. Notice, however, that Go downloads the Chroma package
+from the GitHub repo. If we want to use our local version of Chroma instead,
+we have to add a [`replace` directive][2] that points to our local directory:
+
+```console
+$ docker-run-go go mod edit -replace \
+github.com/alecthomas/chroma/v2@v2.18.0=./chroma
+```
+
+This adds the following line to our `go.mod` file:
+
+::: filename-for-code-block
+`go.mod`
+:::
+
+```
+...
+
+replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
+```
+
+Now, when we run `main.go`, we should no longer see Chroma being downloaded,
+because Go is using our local copy:
+
+```console
+$ docker-run-go go run main.go
+go: downloading github.com/dlclark/regexp2 v1.11.5
+<html>
+<style type="text/css">
+...
+```
+
+We should also see a list of styles followed by the HTML markup for
+highlighting our Lua code (formatted for legibility):
+
+```html
+<pre class="chroma">
+  <code>
+    <span class="line">
+      <span class="cl">
+        <span class="n">print</span>
+        <span class="p">(</span>
+        <span class="s2">&#34;hello&#34;</span>
+        <span class="p">)</span>
+      </span>
+    </span>
+  </code>
+</pre>
+```
+
+[1]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/quick/example_test.go
+[2]: https://go.dev/ref/mod#go-mod-file-replace
+
+## Add test data
+
+If we want to add our lexer to Chroma, we will need to create some test data
+for it. We can create a file in `lexers/testdata` called `lua.actual` and
+fill it with Lua source code that exercises the tokens of the language.
+
+## Record test output
+
+Once we have test data, we need to record the expected output. We create
+another file called `lexers/testdata/lua.expected`, then record to it by
+running the following command from the Chroma root directory:
+
+```console
+$ docker-run -e RECORD=true golang go test ./lexers
+```
+
+Once test output is recorded in `lexers/testdata/lua.expected`, we should
+visually inspect and verify that the expected data is correct.
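+
+Part of that inspection can be automated. The sketch below is not part of
+Chroma. It assumes the recorded file is a JSON array of objects with `type`
+and `value` keys, that joining the values should reproduce the input, and that
+unmatched input shows up as tokens of type `Error`. The token types in the
+sample are hypothetical.
+
+```python
+import json
+
+
+def check_tokens(source, expected_json):
+    """Return a list of problems found in recorded tokens (empty if none)."""
+    tokens = json.loads(expected_json)
+    problems = []
+    # Assumption: the lexer accounts for every character of its input.
+    if "".join(t["value"] for t in tokens) != source:
+        problems.append("token values do not reproduce the source")
+    # Assumption: an Error token means no rule matched at that position.
+    for t in tokens:
+        if t["type"] == "Error":
+            problems.append("Error token: " + repr(t["value"]))
+    return problems
+
+
+# Hypothetical sample data standing in for lua.actual and lua.expected.
+source = 'print("hello")'
+expected = json.dumps([
+    {"type": "Name", "value": "print"},
+    {"type": "Punctuation", "value": "("},
+    {"type": "LiteralStringDouble", "value": '"hello"'},
+    {"type": "Punctuation", "value": ")"},
+])
+
+print(check_tokens(source, expected))
+```
+
+For the real files, we could pass the contents of `lexers/testdata/lua.actual`
+and `lexers/testdata/lua.expected` instead. An empty list is a good sign, but
+it does not replace reading the recorded tokens ourselves.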
+
+## Run tests
+
+As a final confirmation, we can run the tests to make sure we have not broken
+anything:
+
+```console
+$ docker-run-go go test ./lexers
+```
+
+## Conclusion
+
+If we followed all these steps correctly, our lexer should be ready to push to
+a `git` repo, and we can open a pull request!
+
+## Bonus: Use a local Pygments with `pygments2chroma_xml.py`
+
+These lines in `pygments2chroma_xml.py`,
+
+```python
+import pystache
+from pygments import lexer as pygments_lexer
+from pygments.token import _TokenType
+```
+
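+import the Pygments package that `pip` installed from the
+[Python Package Index](https://pypi.org/). If we want to convert a lexer from
+a local `git` checkout of Pygments instead, we can place the
+`pygments2chroma_xml.py` script in the repo root directory and run it from
+there. Python puts the script's own directory first on its module search
+path, so `import pygments` picks up the local package.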
+
+```console
+$ git clone https://github.com/pygments/pygments.git
+$ cd pygments
+$ docker-run \
+-v ../chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
+python bash -c \
+ "pip install pystache && \
+  python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer && \
+  pip list"
+```
+
+We should see the lexer output followed by
+
+```console
+Package  Version
+-------- -------
+pip      25.0.1
+pystache 0.6.8
+```
+
+which indicates no remote `pygments` package was installed.
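+
+To double-check which copy of a package Python would load, we can ask the
+import system where it resolves the name. This is a minimal sketch.
+`module_origin` is a hypothetical helper, and `json` is only a standard
+library stand-in so the example runs anywhere.
+
+```python
+import importlib.util
+
+
+def module_origin(name):
+    """Return the file a top-level module would be loaded from, or None."""
+    spec = importlib.util.find_spec(name)
+    return spec.origin if spec is not None else None
+
+
+# "json" is a stand-in from the standard library; inside the container we
+# would pass "pygments" and expect a path under /opt.
+print(module_origin("json"))
+```
+
+Run from a script in the Pygments repo root, `module_origin("pygments")`
+should point into the local checkout rather than into `site-packages`.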