# lingo

`lingo` is a simple tool for literate programming with Go and Markdown. `lingo` is heavily inspired by tango, a similar tool designed for literate programming with Rust and Markdown.

When run, `lingo` will extract Go source code from fenced code blocks in each Markdown file in the current directory. Markdown files must use the `.md` extension, and code will only be extracted from fenced code blocks with the language `go`. Each Markdown file `some-file.md` that contains Go code will be converted into a file `some-file.go`.
To author a program with `lingo`, simply write your program as fenced code blocks in Markdown files, then add a `.go` file in the same directory with a `//go:generate lingo` directive preceding its package name.
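For example, a minimal stub might look like the following (the file and package names are our choices, and the fence below is deliberately left untagged so `lingo` itself won't extract it):

```
//go:generate lingo
package main
```

Running `go generate` in that directory then regenerates the `.go` files from the Markdown sources.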
This file is the source for `lingo` itself; let's break it down!
As usual, we start our program with a package clause followed by our import declarations. Because we're going to be working with Markdown, our only imports outside the standard library are from a fork of the Goldmark Markdown parser. We'll be using that package to parse Markdown into an AST that we'll then use as a basis for source code extraction.
```go
package main

import (
    "bytes"
    "fmt"
    "log"
    "os"
    "path/filepath"
    "sort"

    "github.com/pgavlin/goldmark"
    "github.com/pgavlin/goldmark/ast"
    "github.com/pgavlin/goldmark/extension"
    goldmark_parser "github.com/pgavlin/goldmark/parser"
    "github.com/pgavlin/goldmark/text"
    "github.com/pgavlin/goldmark/util"
)
```
Because we're essentially generating source code, we'd like the extracted source code to retain its original source positions. This allows downstream tools to reference positions in the Markdown rather than positions in the extracted code. Go gives us the ability to propagate this information through the use of line directives.
The only position information we need is the line number itself, as we'll be emitting directives of the form `//line filename:line`. Unfortunately, Goldmark does not track line information in its AST! It does, however, track the byte offset of each block of text, including the contents of code blocks. We can determine the line number of a code block ourselves by first building a byte-offset-to-line-number index from the Markdown source. This index is a simple list of integers, where each entry `E_i` is the byte offset of the end of line `i`. With this structure, we can determine the number of the line that contains a particular offset `o` by searching for the smallest index `i` where `E_i > o`; the 1-indexed line number containing `o` is then `i + 1`.
```go
type lineIndex []int

func (index lineIndex) lineNumber(offset int) int {
    i := sort.Search(len(index), func(i int) bool {
        return index[i] > offset
    })
    return i + 1
}

func indexLines(f []byte) lineIndex {
    var index lineIndex
    for offset, b := range f {
        if b == '\n' {
            index = append(index, offset)
        }
    }
    return index
}
```
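To see the index in action, here is a small standalone sketch (the sample text is ours, and the helpers are copied from the blocks above; the fence is left untagged so `lingo` won't extract it into the program):

```
package main

import (
    "fmt"
    "sort"
)

// lineIndex, lineNumber, and indexLines are copied from the text above so
// that this sketch compiles on its own.
type lineIndex []int

func (index lineIndex) lineNumber(offset int) int {
    return sort.Search(len(index), func(i int) bool { return index[i] > offset }) + 1
}

func indexLines(f []byte) lineIndex {
    var index lineIndex
    for offset, b := range f {
        if b == '\n' {
            index = append(index, offset)
        }
    }
    return index
}

func main() {
    src := []byte("first\nsecond\nthird\n")
    index := indexLines(src)          // newlines at offsets 5, 12, and 18
    fmt.Println(index.lineNumber(0))  // 'f' in "first" is on line 1
    fmt.Println(index.lineNumber(6))  // 's' in "second" is on line 2
    fmt.Println(index.lineNumber(13)) // 't' in "third" is on line 3
}
```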
With our line index implemented, converting each file is straightforward. First, we read in the source code and build our line index:
```go
func convertFile(name string) error {
    contents, err := os.ReadFile(name)
    if err != nil {
        return err
    }

    index := indexLines(contents)
```
Next, we parse the Markdown:
```go
    parser := goldmark.DefaultParser()
    parser.AddOptions(goldmark_parser.WithParagraphTransformers(
        util.Prioritized(extension.NewTableParagraphTransformer(), 200),
    ))

    document := parser.Parse(text.NewReader(contents))
```
Then, we walk the parsed AST, looking for fenced code blocks with the language `go`:
```go
    var source bytes.Buffer
    ast.Walk(document, func(n ast.Node, enter bool) (ast.WalkStatus, error) {
        code, ok := n.(*ast.FencedCodeBlock)
        if !ok || !enter || string(code.Language(contents)) != "go" {
            return ast.WalkContinue, nil
        }

        lines := code.Lines()
        if lines.Len() == 0 {
            return ast.WalkContinue, nil
        }
```
When we find a suitable code block, we determine its line number, then emit a line directive followed by the contents of the code block into our output:
```go
        lineNumber := index.lineNumber(lines.At(0).Start)
        fmt.Fprintf(&source, "//line %v:%v\n", name, lineNumber)
        for i := 0; i < lines.Len(); i++ {
            line := lines.At(i)
            source.Write(line.Value(contents))
        }
        return ast.WalkContinue, nil
    })
```
Finally, we emit the collected source into an output file and return. If the walk did not extract any source code, we do not emit an output file.
```go
    if source.Len() == 0 {
        return nil
    }
    return os.WriteFile(name[:len(name)-3]+".go", source.Bytes(), 0600)
}
```
The only thing left to do now is to implement `lingo`'s entry point. The entry point is responsible for finding the Markdown files the tool will convert and driving their conversion using `convertFile`.

`lingo` operates on Markdown files in the current directory, so we begin by fetching the path of the current directory and listing its contents:
```go
func main() {
    wd, err := os.Getwd()
    if err != nil {
        log.Fatalf("could not read current directory: %v", err)
    }

    entries, err := os.ReadDir(wd)
    if err != nil {
        log.Fatalf("could not read current directory: %v", err)
    }
```
Then, we iterate the directory's contents and attempt to convert each `.md` file to a `.go` file.
```go
    for _, entry := range entries {
        name := entry.Name()
        ext := filepath.Ext(name)
        if ext != ".md" {
            continue
        }
        if err = convertFile(name); err != nil {
            log.Fatalf("could not convert file '%v': %v", name, err)
        }
    }
}
```
And we're done!
Hacking on `lingo` is a little bit different from working with a more traditional Go code base. In order to make changes to `lingo` itself, you'll need to edit this file, then run `go generate`. You should commit the changes to this file and the changes to `README.go`:

```
$ touch README.md
$ go generate
$ git add README.{md,go}
```
To build or install `lingo`, just run `go install` from the root of the repository.

Before testing `lingo`, first build it using the instructions above. Once you've built `lingo`, you can run the tests by invoking `go test`. The test data for `lingo` lives in the directories under `testdata`. Each directory contains a single test, with the inputs in the directory itself and the expected outputs in the `expected` subdirectory.

Tests are driven by the code in `main_test.go`. Each test runs `lingo` in a particular test directory, then compares the contents of the files in the directory with the contents of the `expected` subdirectory.