Add a new lexer to Chroma
Introduction
Gitea uses Chroma for syntax highlighting. Chroma is based on the Python syntax highlighter, Pygments, and includes a script to help convert Pygments lexers for use with Chroma. This post describes that process.
Convert a Pygments lexer to a Chroma lexer with pygments2chroma_xml.py
In the Chroma root directory, we run:
$ docker run --rm -it -w /opt -v $PWD:/opt python bash -c \
"pip install pystache pygments && pip list \
&& python _tools/pygments2chroma_xml.py \
pygments.lexers.scripting.LuaLexer > lexers/embedded/lua.xml"
As output, we should see this in our terminal:
Package Version
-------- -------
pip 25.0.1
Pygments 2.19.2
pystache 0.6.8
The pip list output records which version of Pygments we generated our lexer
from. The file lexers/embedded/lua.xml should now contain all the tokenization
rules for the Lua language.
::: filename-for-code-block
lexers/embedded/lua.xml
:::
<lexer>
<config>
<name>Lua</name>
...
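A quick way to confirm that the generated file is well-formed XML is to decode it and read back the lexer name. The sketch below is a minimal check using only Go's standard library; it models just the `<lexer>`, `<config>`, and `<name>` elements visible in the excerpt above, and the `lexerFile` type and `lexerName` helper are hypothetical names introduced for illustration. The sample string stands in for the real file contents.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// lexerFile models only the elements visible in the excerpt:
// <lexer><config><name>. Every other element is ignored by the decoder.
type lexerFile struct {
	XMLName xml.Name `xml:"lexer"`
	Name    string   `xml:"config>name"`
}

// lexerName returns the <name> value from an XML lexer definition, or an
// error if the document is malformed or has no name.
func lexerName(doc string) (string, error) {
	var lf lexerFile
	if err := xml.Unmarshal([]byte(doc), &lf); err != nil {
		return "", err
	}
	if lf.Name == "" {
		return "", fmt.Errorf("no <config><name> element found")
	}
	return lf.Name, nil
}

func main() {
	// Hypothetical stand-in for the contents of lexers/embedded/lua.xml.
	sample := `<lexer><config><name>Lua</name></config></lexer>`

	name, err := lexerName(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(name)
}
```

To check the real file, the sample string could be replaced with the result of reading lexers/embedded/lua.xml via os.ReadFile.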
Highlight some code with our new lexer
Chroma provides a simple example test file we can modify to see what syntax
highlighting with our new lexer looks like. First, though, we need to create a
new Go module by running go mod init:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go mod init main
go: creating new go.mod: module main
go: to add module requirements and sums:
go mod tidy
We will need the required modules, so let's go ahead and run go mod tidy as the
output suggests:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go mod tidy
We should now have two additional files, go.mod and go.sum. go.sum holds
package hashes, while go.mod should look like this:
::: filename-for-code-block
go.mod
:::
module main
go 1.25
require github.com/alecthomas/chroma/v2 v2.18.0
require github.com/dlclark/regexp2 v1.11.5 // indirect
Now we can create a main.go file and copy over the code from Chroma's example
test file, changing the code variable and the lexer we pass into the Highlight
function to Lua:
::: filename-for-code-block
main.go
:::
package main
import (
"log"
"os"
"github.com/alecthomas/chroma/v2/quick"
)
func main() {
code := `print("hello")`
err := quick.Highlight(os.Stdout, code, "lua", "html", "monokai")
if err != nil {
log.Fatal(err)
}
}
Now we can try running our main.go like this:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
go: downloading github.com/alecthomas/chroma/v2 v2.18.0
go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
That should print markup (and styles) for highlighting that block of Lua code
to the console. Notice, though, that Go is downloading the Chroma package from
the GitHub repo. If we want to use a local version of Chroma, we have to add a
replace directive that points the import at our local directory:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go mod edit -replace github.com/alecthomas/chroma/v2@v2.18.0=./chroma
This adds the following line to our go.mod file:
::: filename-for-code-block
go.mod
:::
...
replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
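Before running the program again, it can help to confirm that go.mod really redirects the module. The sketch below is a minimal check using only the standard library; `replaceTarget` is a hypothetical helper, and it handles only the single-line directive form shown above, not the parenthesized `replace ( ... )` block form. For anything more robust, a proper go.mod parser would be the better choice.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceTarget scans go.mod text for a single-line replace directive for
// module and returns whatever follows "=>" (a local path or another module).
// It does not handle the parenthesized block form.
func replaceTarget(goMod, module string) (string, bool) {
	for _, line := range strings.Split(goMod, "\n") {
		fields := strings.Fields(line)
		// Expected shape: replace <module> [version] => <target> [version]
		if len(fields) < 4 || fields[0] != "replace" || fields[1] != module {
			continue
		}
		for i, f := range fields {
			if f == "=>" && i+1 < len(fields) {
				return fields[i+1], true
			}
		}
	}
	return "", false
}

func main() {
	// The go.mod contents shown in this post, including the new directive.
	goMod := `module main

go 1.25

require github.com/alecthomas/chroma/v2 v2.18.0

replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
`
	target, ok := replaceTarget(goMod, "github.com/alecthomas/chroma/v2")
	fmt.Println(target, ok)
}
```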
Now, when we run main.go, we should no longer see Chroma being downloaded,
because Go is using our local copy:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
We should also see a list of styles followed by the HTML markup for highlighting our Lua code (formatted for legibility):
<pre class="chroma">
<code>
<span class="line">
<span class="cl">
<span class="n">print</span>
<span class="p">(</span>
<span class="s2">"hello"</span>
<span class="p">)</span>
</span>
</span>
</code>
</pre>
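Reading the short CSS classes by eye gets tedious for longer samples. The sketch below pulls the text-only spans out of the markup and labels each one with a token type name. It uses only the standard library; `leafSpans` and `classNames` are hypothetical names, and the class-to-type table covers just the three classes seen above, on the assumption that they follow Chroma's standard class names (`n` for Name, `p` for Punctuation, `s2` for a double-quoted string literal).

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// classNames maps the short CSS classes from the output above to token type
// names. Assumption: these follow Chroma's standard class table.
var classNames = map[string]string{
	"n":  "Name",
	"p":  "Punctuation",
	"s2": "LiteralStringDouble",
}

// leafSpan matches a span that contains text only, with no nested tags, so
// the wrapping "line" and "cl" spans are skipped.
var leafSpan = regexp.MustCompile(`<span class="([^"]+)">([^<]*)</span>`)

type span struct{ Class, Text string }

// leafSpans returns the innermost spans in document order, with HTML
// entities in their text decoded.
func leafSpans(markup string) []span {
	var out []span
	for _, m := range leafSpan.FindAllStringSubmatch(markup, -1) {
		out = append(out, span{Class: m[1], Text: html.UnescapeString(m[2])})
	}
	return out
}

func main() {
	markup := `<span class="line"><span class="cl">` +
		`<span class="n">print</span><span class="p">(</span>` +
		`<span class="s2">&#34;hello&#34;</span><span class="p">)</span>` +
		`</span></span>`

	for _, s := range leafSpans(markup) {
		fmt.Printf("%-20s %q\n", classNames[s.Class], s.Text)
	}
}
```

A regular expression is enough here because the innermost spans hold plain text; it is not a general HTML parser.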
Add test data
If we want to contribute our lexer to Chroma, we will need to create some test
data for it. We can create a file in lexers/testdata called lua.actual and add
sample Lua code to it that exercises the language's tokens.
Record test output
Once we have test data, we need to record the expected output. We create
another file called lexers/testdata/lua.expected. This is the file we will
record to by running the following command from the Chroma root directory:
$ docker run --rm -it -w /opt -v $PWD:/opt -e RECORD=true golang:tip-bookworm \
go test ./lexers
Once test output is recorded in lexers/testdata/lua.expected, we should
visually inspect it and verify that the expected data is correct.
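Part of that inspection can be automated. The sketch below assumes the recorded file is a JSON array of objects with `type` and `value` string fields, and that tokens the lexer could not match are recorded with the type `Error`; both are assumptions about the file format, so it is worth confirming them against an existing .expected file first. The `review` helper is a hypothetical name, and the sample data is illustrative rather than real recorded output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// token mirrors one recorded entry. Assumption: each entry is a JSON object
// with "type" and "value" string fields.
type token struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

// review decodes recorded tokens and returns the source text they cover plus
// the values of any tokens whose type is "Error".
func review(expected string) (string, []string, error) {
	var toks []token
	if err := json.Unmarshal([]byte(expected), &toks); err != nil {
		return "", nil, err
	}
	var src strings.Builder
	var bad []string
	for _, t := range toks {
		src.WriteString(t.Value)
		if t.Type == "Error" {
			bad = append(bad, t.Value)
		}
	}
	return src.String(), bad, nil
}

func main() {
	// Illustrative sample only; not copied from a real lua.expected file.
	sample := `[
	  {"type":"Name","value":"print"},
	  {"type":"Punctuation","value":"("},
	  {"type":"LiteralStringDouble","value":"\"hello\""},
	  {"type":"Punctuation","value":")"}
	]`

	src, bad, err := review(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("covered source: %s\n", src)
	fmt.Printf("error tokens: %d\n", len(bad))
}
```

Comparing the covered source against lua.actual, and looking closely at any error tokens, points to the rules that most need a second look.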
Run tests
As a final confirmation, we can run the tests to make sure we have not broken anything:
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go test ./lexers
Conclusion
If we followed all these steps correctly, our lexer should be ready to push to
a git repo, and we can open a pull request!
Bonus: Use local Pygments with pygments2chroma_xml.py
These lines in pygments2chroma_xml.py,
import pystache
from pygments import lexer as pygments_lexer
from pygments.token import _TokenType
import whichever Pygments package Python finds, which in the steps above was
the one installed from the Python Package Index. But if we are working on a
Pygments lexer locally, we might want to convert it to a Chroma lexer for
testing. We can make pygments2chroma_xml.py import a local version of Pygments
by running the following from the Pygments root directory:
$ docker run --rm -it -w /opt -v $PWD:/opt \
-v path/to/chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
python bash -c "pip install pystache && pip list \
&& python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer"
We should see
Package Version
-------- -------
pip 25.0.1
pystache 0.6.8
which indicates that no Pygments package was installed from the index, so the script imported the local copy mounted alongside it. Following that, we should also see the lexer markup output:
<lexer>
<config>
...