Publish post 'Add a Pygments Lexer to Chroma' #2
@@ -1,62 +1,74 @@
|
|||||||
{
|
{
|
||||||
blurb: "Add a new lexer to chroma"
|
blurb: "Add a new lexer to Chroma"
|
||||||
}
|
}
|
||||||
$index
|
$index
|
||||||
|
|
||||||
## Intro
|
## Introduction
|
||||||
|
|
||||||
Gitea uses Chroma for syntax highlighting. Chroma doesn't have a MoonScript
|
[Gitea](https://github.com/go-gitea/gitea) uses [Chroma](https://github.com/alecthomas/chroma) for syntax highlighting. Chroma is based on the Python
|
||||||
lexer. It does has a Python script that can convert Pygments lexers, though,
|
syntax highlighter, [Pygments](https://github.com/pygments/pygments), and includes a [script](https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/_tools/pygments2chroma_xml.py) to help convert Pygments
|
||||||
and Pygments has a MoonScript lexer.
|
lexers for use with Chroma. This post describes that process.
|
||||||
|
|
||||||
## Run MoonScript lexer generation script
|
## Convert a Pygments lexer to a Chroma lexer with `pygments2chroma_xml.py`
|
||||||
|
|
||||||
To create the lexer, in the Chroma root directory run:
|
In the Chroma root directory, we run:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt python bash -c \
|
$ docker run --rm -it -w /opt -v $PWD:/opt python bash -c \
|
||||||
"pip install pystache pygments \
|
"pip install pystache pygments && pip list \
|
||||||
&& python _tools/pygments2chroma_xml.py \
|
&& python _tools/pygments2chroma_xml.py \
|
||||||
pygments.lexers.scripting.MoonScriptLexer > lexers/embedded/moonscript.xml \
|
pygments.lexers.scripting.LuaLexer > lexers/embedded/lua.xml"
|
||||||
&& pip list"
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Use the Chroma MoonScript lexer to highlight some code
|
As output, we should see this in our terminal:
|
||||||
|
|
||||||
Create a file like this:
|
```
|
||||||
|
Package Version
|
||||||
|
-------- -------
|
||||||
|
pip 25.0.1
|
||||||
|
Pygments 2.19.2
|
||||||
|
pystache 0.6.8
|
||||||
|
```
|
||||||
|
|
||||||
|
This just helps us know what version of Pygments we generated our lexer from.
|
||||||
|
The file `lexers/embedded/lua.xml` should now contain all the tokenization
|
||||||
|
rules for the [Lua](https://www.lua.org) language.
|
||||||
|
|
||||||
::: filename-for-code-block
|
::: filename-for-code-block
|
||||||
`main.go`
|
`lexers/embedded/lua.xml`
|
||||||
:::
|
:::
|
||||||
|
|
||||||
```go
|
```xml
|
||||||
package main
|
<lexer>
|
||||||
|
<config>
|
||||||
import (
|
<name>Lua</name>
|
||||||
"fmt"
|
...
|
||||||
"os"
|
|
||||||
|
|
||||||
"github.com/alecthomas/chroma/v2/quick"
|
|
||||||
)
|
|
||||||
|
|
||||||
func main() {
|
|
||||||
code := `package main
|
|
||||||
|
|
||||||
func main() { }
|
|
||||||
`
|
|
||||||
|
|
||||||
fmt.Println(quick.Highlight(os.Stdout, code, "go", "html", "monokai"))
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
|
|
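Before moving on, we may want a quick sanity check that the generated file is a well-formed Chroma lexer definition. The following is only a minimal sketch using Go's standard library, not part of Chroma's tooling. It relies solely on the `<lexer>`, `<config>`, and `<name>` elements shown in the excerpt above and ignores everything else in the file. The `lexerFile` type and `lexerName` helper are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// lexerFile (hypothetical) mirrors only the elements visible in the excerpt:
// <lexer><config><name>. All other elements are ignored by the decoder.
type lexerFile struct {
	XMLName xml.Name `xml:"lexer"`
	Name    string   `xml:"config>name"`
}

// lexerName (hypothetical helper) returns the <name> declared in an XML
// lexer definition, or an error if the XML is malformed or has no name.
func lexerName(data []byte) (string, error) {
	var lf lexerFile
	if err := xml.Unmarshal(data, &lf); err != nil {
		return "", err
	}
	if lf.Name == "" {
		return "", fmt.Errorf("no <config><name> element found")
	}
	return lf.Name, nil
}

func main() {
	// Path taken from the post; assumes we run from the Chroma root directory.
	data, err := os.ReadFile("lexers/embedded/lua.xml")
	if err != nil {
		log.Fatal(err)
	}
	name, err := lexerName(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(name)
}
```

If the conversion script failed partway through or wrote an empty file, this sketch should exit with an error instead of printing a name.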
||||||
I did one of these:
|
## Highlight some code with our new lexer
|
||||||
|
|
||||||
|
Chroma provides a [simple example test file][1] we can modify to see what syntax
|
||||||
|
highlighting with our new lexer looks like. First, though, we need to create a
|
||||||
|
new Go module by running `go mod init`:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
||||||
go mod init main
|
go mod init main
|
||||||
|
go: creating new go.mod: module main
|
||||||
|
go: to add module requirements and sums:
|
||||||
|
go mod tidy
|
||||||
```
|
```
|
||||||
|
|
||||||
Which gave me the `go.mod` file.
|
We will need required modules, so let's go ahead and run `go mod tidy` as the
|
||||||
|
output suggests.
|
||||||
|
|
||||||
|
```console
|
||||||
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
||||||
|
go mod tidy
|
||||||
|
```
|
||||||
|
|
||||||
|
We should now have 2 additional files, `go.mod` and `go.sum`. `go.sum` has some
|
||||||
|
package hashes while `go.mod` should look like this:
|
||||||
|
|
||||||
::: filename-for-code-block
|
::: filename-for-code-block
|
||||||
`go.mod`
|
`go.mod`
|
||||||
@@ -67,23 +79,55 @@ module main
|
|||||||
|
|
||||||
go 1.25
|
go 1.25
|
||||||
|
|
||||||
require (
|
require github.com/alecthomas/chroma/v2 v2.18.0
|
||||||
github.com/alecthomas/chroma/v2 v2.18.0 // indirect
|
|
||||||
github.com/dlclark/regexp2 v1.11.5 // indirect
|
require github.com/dlclark/regexp2 v1.11.5 // indirect
|
||||||
)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Then I did one of these:
|
Now we can create a `main.go` file and copy over the code from Chroma's example
|
||||||
|
test file, but we update the `code` variable and the lexer we pass into the
|
||||||
|
`Highlight` function for Lua:
|
||||||
|
|
||||||
|
|
||||||
|
::: filename-for-code-block
|
||||||
|
`main.go`
|
||||||
|
:::
|
||||||
|
|
||||||
|
```go
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"log"
|
||||||
|
"os"
|
||||||
|
|
||||||
|
"github.com/alecthomas/chroma/v2/quick"
|
||||||
|
)
|
||||||
|
|
||||||
|
func main() {
|
||||||
|
code := `print("hello")`
|
||||||
|
|
||||||
|
err := quick.Highlight(os.Stdout, code, "lua", "html", "monokai")
|
||||||
|
if err != nil {
|
||||||
|
log.Fatal(err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Now we can try running our `main.go` like this:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
|
||||||
go run main.go
|
go: downloading github.com/alecthomas/chroma/v2 v2.18.0
|
||||||
|
go: downloading github.com/dlclark/regexp2 v1.11.5
|
||||||
|
<html>
|
||||||
|
<style type="text/css">
|
||||||
|
...
|
||||||
```
|
```
|
||||||
|
|
||||||
And that should output markup (and styles) for highlighting that block of Go
|
And that should output markup (and styles) for highlighting that block of Lua
|
||||||
code to the console. But if we notice, it's importing the Chroma package from
|
code to the console. But if we notice, it's importing the Chroma package from
|
||||||
the GitHub repo. We want to use our local version of chroma, so we use `go mod
|
the GitHub repo. If we want to use a local version of Chroma, we have to use a
|
||||||
edit` to [replace the chroma import with our local version](https://go.dev/ref/mod#go-mod-file-replace):
|
[`replace` directive][2] to import Chroma from our local directory:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
||||||
@@ -102,49 +146,76 @@ Which adds this line to our `go.mod` file:
|
|||||||
replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
|
replace github.com/alecthomas/chroma/v2 v2.18.0 => ./chroma
|
||||||
```
|
```
|
||||||
|
|
||||||
Now we can put some MoonScript in `main.go`.
|
Now, when we run `main.go`, we should no longer see Chroma being imported,
|
||||||
|
because it's using our local copy:
|
||||||
```go
|
|
||||||
code := `print "Hello, #{@name}!"`
|
|
||||||
|
|
||||||
fmt.Println(quick.Highlight(os.Stdout, code, "moonscript", "html", "monokai"))
|
|
||||||
```
|
|
||||||
|
|
||||||
And we have it:
|
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm go run main.go
|
||||||
go run main.go
|
|
||||||
go: downloading github.com/dlclark/regexp2 v1.11.5
|
go: downloading github.com/dlclark/regexp2 v1.11.5
|
||||||
|
<html>
|
||||||
|
<style type="text/css">
|
||||||
|
...
|
||||||
```
|
```
|
||||||
|
|
||||||
That should output syntax highlighting using our local version of chroma.
|
We should also see a list of styles followed by the HTML markup for
|
||||||
|
highlighting our Lua code (formatted for legibility):
|
||||||
|
|
||||||
## Create testdata
|
```html
|
||||||
|
<pre class="chroma">
|
||||||
|
<code>
|
||||||
|
<span class="line">
|
||||||
|
<span class="cl">
|
||||||
|
<span class="n">print</span>
|
||||||
|
<span class="p">(</span>
|
||||||
|
<span class="s2">"hello"</span>
|
||||||
|
<span class="p">)</span>
|
||||||
|
</span>
|
||||||
|
</span>
|
||||||
|
</code>
|
||||||
|
</pre>
|
||||||
|
```
|
||||||
|
|
||||||
Create a file in `lexers/testdata` called `moonscript.actual`. Add the tokens
|
[1]: https://github.com/alecthomas/chroma/blob/484750a96fc430f49d6b69cc2a2a8b7a67691446/quick/example_test.go
|
||||||
from the language in this file.
|
[2]: https://go.dev/ref/mod#go-mod-file-replace
|
||||||
|
|
||||||
|
## Add test data
|
||||||
|
|
||||||
|
If we want to add our lexer to Chroma, we will need to create some test data
|
||||||
|
for it. We can create a file in `lexers/testdata` called `lua.actual` and
|
||||||
|
add the language tokens to it.
|
||||||
|
|
||||||
## Record test output
|
## Record test output
|
||||||
|
|
||||||
Create another file called `lexers/testdata/moonscript.expected`. This is the
|
Once we have test data, we need to record the expected output. We create
|
||||||
file we will record to.
|
another file called `lexers/testdata/lua.expected`. This is the file we
|
||||||
|
will record to by running the following command from the Chroma root directory:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ RECORD=true go test ./lexers
|
$ docker run --rm -it -w /opt -v $PWD:/opt -e RECORD=true golang:tip-bookworm \
|
||||||
|
go test ./lexers
|
||||||
```
|
```
|
||||||
|
|
||||||
Visually inspect and verify that the expected data is correct.
|
Once test output is recorded in `lexers/testdata/lua.expected`, we should
|
||||||
|
visually inspect and verify that the expected data is correct.
|
||||||
|
|
||||||
## Run tests
|
## Run tests
|
||||||
|
|
||||||
|
As a final confirmation, we can run the tests to make sure we have not broken
|
||||||
|
anything:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ go test ./lexers
|
$ docker run --rm -it -w /opt -v $PWD:/opt golang:tip-bookworm \
|
||||||
|
go test ./lexers
|
||||||
```
|
```
|
||||||
|
|
||||||
## Bonus!: Use local `pygments` with `pygments2chroma_xml.py`
|
## Conclusion
|
||||||
|
|
||||||
These lines in `pygments2chroma_xml.py`:
|
If we followed all these steps correctly, our lexer should be ready to be
|
||||||
|
pushed to a `git` repo and for us to open a pull request!
|
||||||
|
|
||||||
|
## Bonus!: Use local Pygments with `pygments2chroma_xml.py`
|
||||||
|
|
||||||
|
These lines in `pygments2chroma_xml.py`,
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import pystache
|
import pystache
|
||||||
@@ -152,19 +223,20 @@ from pygments import lexer as pygments_lexer
|
|||||||
from pygments.token import _TokenType
|
from pygments.token import _TokenType
|
||||||
```
|
```
|
||||||
|
|
||||||
Import pygments from pip? How do we get it to load a local version of
|
import Pygments from the [Python Package Index](https://pypi.org/). But, if we are working on a
|
||||||
`pygments`?
|
Pygments lexer locally, we might want to convert it to a Chroma lexer for
|
||||||
|
testing. We can import a local version of Pygments when running
|
||||||
In Pygments root directory:
|
`pygments2chroma_xml.py` by running the following from the Pygments root
|
||||||
|
directory:
|
||||||
|
|
||||||
```console
|
```console
|
||||||
$ docker run --rm -it -w /opt -v $PWD:/opt \
|
$ docker run --rm -it -w /opt -v $PWD:/opt \
|
||||||
-v ../gitea-syntax-highlight/chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
|
-v path/to/chroma/_tools/pygments2chroma_xml.py:/opt/pygments2chroma_xml.py \
|
||||||
python bash -c "pip install pystache && pip list \
|
python bash -c "pip install pystache && pip list \
|
||||||
&& python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer"
|
&& python pygments2chroma_xml.py pygments.lexers.scripting.LuaLexer"
|
||||||
```
|
```
|
||||||
|
|
||||||
Should see.
|
We should see
|
||||||
|
|
||||||
```console
|
```console
|
||||||
Package Version
|
Package Version
|
||||||
@@ -173,8 +245,8 @@ pip 25.0.1
|
|||||||
pystache 0.6.8
|
pystache 0.6.8
|
||||||
```
|
```
|
||||||
|
|
||||||
That shows no remote pygments package is installed. After that you will see the
|
which indicates no remote Pygments package is installed. Following that, we
|
||||||
lexer markup output.
|
should also see the lexer markup output.
|
||||||
|
|
||||||
```console
|
```console
|
||||||
<lexer>
|
<lexer>
|
||||||
|