Publish post 'Add a Pygments Lexer to Chroma' #2

Merged
ccm merged 6 commits from ccm-chroma-post into trunk 2025-06-22 16:56:46 +00:00
Showing only changes of commit 4274618525

@ -1,62 +1,74 @@
{ {
blurb: "Add a new lexer to chroma" blurb: "Add a new lexer to Chroma"
} }
$index $index
## Intro ## Introduction
Gitea uses Chroma for syntax highlighting. Chroma doesn't have a MoonScript [Gitea](https://github.com/go-gitea/gitea) uses [Chroma](https://github.com/alecthomas/chroma) for syntax highlighting. Chroma is based on the Python
lexer. It does has a Python script that can convert Pygments lexers, though, syntax highlighter, [Pygments](https://github.com/pygments/pygments), and includes a [script](https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/_tools/pygments2chroma_xml.py) to help convert Pygments
and Pygments has a MoonScript lexer. lexers for use with Chroma. This post describes that process.
## Run MoonScript lexer generation script ## Convert a Pygments lexer to a Chroma lexer with `pygments2chroma_xml.py`
To create the lexer, in the Chroma root directory run: In the Chroma root directory, we run:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt python bash -c \ $ docker run --rm -it -w /opt -v $PWD:/opt python bash -c \
"pip install pystache pygments \ "pip install pystache pygments && pip list \
&& python _tools/pygments2chroma_xml.py \ && python _tools/pygments2chroma_xml.py \
pygments.lexers.scripting.MoonScriptLexer > lexers/embedded/moonscript.xml \ pygments.lexers.scripting.LuaLexer > lexers/embedded/lua.xml"
&& pip list"
``` ```
## Use the Chroma MoonScript lexer to highlight some code As output, we should see this in our terminal:
Create a file like this: ```
Package Version
-------- -------
pip 25.0.1
Pygments 2.19.2
pystache 0.6.8
```
This just helps us know what version of Pygments we generated our lexer from.
The file `lexers/embedded/lua.xml` should now contain all the tokenization
rules for the [Lua](https://www.lua.org) language.
::: filename-for-code-block ::: filename-for-code-block
`main.go` `lexers/embedded/lua.xml`
::: :::
```go ```xml
package main <lexer>
<config>
import ( <name>Lua</name>
"fmt" ...
"os"
"github.com/alecthomas/chroma/v2/quick"
)
func main() {
code := `package main
func main() { }
`
fmt.Println(quick.Highlight(os.Stdout, code, "go", "html", "monokai"))
}
``` ```
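As a quick sanity check on the generated file, we could parse it and read back the lexer name. The sketch below is only an illustration: it uses Python's standard `xml.etree.ElementTree`, the helper name `lexer_name` is made up for this example, and `sample` is a trimmed stand-in for the real `lua.xml`, not its actual contents.

```python
import xml.etree.ElementTree as ET


def lexer_name(xml_text):
    """Return the text of <config><name> in a Chroma XML lexer, or None.

    `lexer_name` is a hypothetical helper for this post, not part of Chroma.
    """
    root = ET.fromstring(xml_text)
    # The excerpt above shows <lexer> as the root element.
    if root.tag != "lexer":
        return None
    return root.findtext("config/name")


# Trimmed stand-in for lexers/embedded/lua.xml (assumption: the real file
# holds much more than this).
sample = "<lexer><config><name>Lua</name></config></lexer>"
print(lexer_name(sample))
```

To check the real file, we could pass the contents of `lexers/embedded/lua.xml` to `lexer_name` instead of `sample`.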
I did one of these: ## Highlight some code with our new lexer
Chroma provides a [simple example test file][1] we can modify to see what syntax
highlighting with our new lexer looks like. First, though, we need to create a
new Go module by running `go mod init`:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \ $ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go mod init main go mod init main
go: creating new go.mod: module main
go: to add module requirements and sums:
go mod tidy
``` ```
Which gave me the `go.mod` file. We still need the required modules, so let's go ahead and run `go mod tidy` as the
output suggests.
```console
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go mod tidy
```
We should now have 2 additional files, `go.mod` and `go.sum`. `go.sum` has some
package hashes while `go.mod` should look like this:
::: filename-for-code-block ::: filename-for-code-block
`go.mod` `go.mod`
@ -67,23 +79,55 @@ module main
go 1.25 go 1.25
require ( require github.com/alecthomas/chroma/v2 v2.18.0
github.com/alecthomas/chroma/v2 v2.18.0 // indirect
github.com/dlclark/regexp2 v1.11.5 // indirect require github.com/dlclark/regexp2 v1.11.5 // indirect
)
``` ```
Then I did one of these: Now we can create a `main.go` file and copy over the code from Chroma's example
test file, but we update the `code` variable and the lexer we pass into the
`Highlight` function for Lua:
::: filename-for-code-block
`main.go`
:::
```go
package main
import (
"log"
"os"
"github.com/alecthomas/chroma/v2/quick"
)
func main() {
code := `print("hello")`
err := quick.Highlight(os.Stdout, code, "lua", "html", "monokai")
if err != nil {
log.Fatal(err)
}
}
```
Now we can try running our `main.go` like this:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \ $ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
go run main.go go: downloading github.com/alecthomas/chroma/v2 v2.18.0
go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
``` ```
And that should output markup (and styles) for highlighting that block of Go And that should output markup (and styles) for highlighting that block of Lua
code to the console. Notice, though, that it downloads the Chroma package from code to the console. Notice, though, that it downloads the Chroma package from
the GitHub repo. We want to use our local version of chroma, so we use `go mod the GitHub repo. If we want to use a local version of Chroma, we have to use a
edit` to [replace the chroma import with our local version](https://go.dev/ref/mod#go-mod-file-replace): [`replace` directive][2] to import Chroma from our local directory:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \ $ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
@ -102,49 +146,76 @@ Which adds this line to our `go.mod` file:
replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
``` ```
Now we can put some MoonScript in `main.go`. Now, when we run `main.go`, we should no longer see Chroma being downloaded,
because Go is using our local copy:
```go
code := `print "Hello, #{@name}!"`
fmt.Println(quick.Highlight(os.Stdout, code, "moonscript", "html", "monokai"))
```
And we have it:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \ $ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
go run main.go
go: downloading github.com/dlclark/regexp2 v1.11.5 go: downloading github.com/dlclark/regexp2 v1.11.5
<html>
<style type="text/css">
...
``` ```
That should output syntax highlighting using our local version of chroma. We should also see a list of styles followed by the HTML markup for
highlighting our Lua code (formatted for legibility):
## Create testdata ```html
<pre class="chroma">
<code>
<span class="line">
<span class="cl">
<span class="n">print</span>
<span class="p">(</span>
<span class="s2">&#34;hello&#34;</span>
<span class="p">)</span>
</span>
</span>
</code>
</pre>
```
Create a file in `lexers/testdata` called `moonscript.actual`. Add the tokens [1]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/quick/example_test.go
from the language in this file. [2]: https://go.dev/ref/mod#go-mod-file-replace
## Add test data
If we want to add our lexer to Chroma, we will need to create some test data
for it. We can create a file in `lexers/testdata` called `lua.actual` and
fill it with sample Lua source code that exercises the language's tokens.
## Record test output ## Record test output
Create another file called `lexers/testdata/moonscript.expected`. This is the Once we have test data, we need to record the expected output. We create
file we will record to. another file called `lexers/testdata/lua.expected`. This is the file we
will record to by running the following command from the Chroma root directory:
```console ```console
$ RECORD=true go test ./lexers $ docker run --rm -it -w /opt -v $PWD:/opt -e RECORD=true golang:tip-bookworm \
go test ./lexers
``` ```
Visually inspect and verify that the expected data is correct. Once test output is recorded in `lexers/testdata/lua.expected`, we should
visually inspect and verify that the expected data is correct.
## Run tests ## Run tests
As a final confirmation, we can run the tests to make sure we have not broken
anything:
```console ```console
$ go test ./lexers $ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
go test ./lexers
``` ```
## Bonus!: Use local `pygments` with `pygments2chroma_xml.py` ## Conclusion
These lines in `pygments2chroma_xml.py`: If we followed all these steps correctly, our lexer should be ready to
push to a Git repository and submit as a pull request!
## Bonus!: Use local Pygments with `pygments2chroma_xml.py`
These lines in `pygments2chroma_xml.py`,
```python ```python
import pystache import pystache
@ -152,19 +223,20 @@ from pygments import lexer as pygments_lexer
from pygments.token import _TokenType from pygments.token import _TokenType
``` ```
Import pygments from pip? How do we get it to load a local version of import the copy of Pygments that `pip` installed from the [Python Package Index](https://pypi.org/). But if we are working on a
`pygments`? Pygments lexer locally, we might want to convert it to a Chroma lexer for
testing. We can import a local version of Pygments when running
In Pygments root directory: `pygments2chroma_xml.py` by running the following from the Pygments root
directory:
```console ```console
$ docker run --rm -it -w /opt -v $PWD:/opt \ $ docker run --rm -it -w /opt -v $PWD:/opt \
-v ../gitea-syntax-highlight/chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \ -v path/to/chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
python bash -c "pip install pystache && pip list \ python bash -c "pip install pystache && pip list \
&& python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer" && python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer"
``` ```
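The command above relies on Python putting the directory of the script being run at the front of `sys.path`. With the script mounted at `/opt` next to the Pygments checkout, `import pygments` should resolve to the local package. One way we might confirm which copy gets picked up is to ask the import system directly. This is a minimal sketch using only the standard library; `module_origin` is a helper name made up for this example.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would load from, or None if absent.

    `module_origin` is a hypothetical helper, not part of Pygments or Chroma.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Prints None when no Pygments is importable at all; otherwise prints the
# path of the copy Python would import.
print(module_origin("pygments"))
```

Saved as a script in the Pygments root directory and run in the same container, we would expect this to print a path under `/opt/pygments` rather than one under `site-packages`.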
Should see. We should see
```console ```console
Package Version Package Version
@ -173,8 +245,8 @@ pip 25.0.1
pystache 0.6.8 pystache 0.6.8
``` ```
That shows no remote pygments package is installed. After that you will see the which indicates no remote Pygments package is installed. Following that, we
lexer markup output. should also see the lexer markup output.
```console ```console
<lexer> <lexer>