Add changelog parsing code (#11)
* format

* copy src over

* move generation code to its own file

* rename to SimpleLog

* docs

* renames

* copy tests over

* rm toplevel url

* rm unused example

* tweak comments

* add to Changelog's changelog

* wip

* add comment

* wip

* more tests

* mv

* up

* test show methods

* document parsing functionality

* reorganize code to prepare for adding parsing code

* mark mutating

* relax stdlib compat

* 1.6-compatible multiple replacements

* format

* tweak VersionInfo struct to have two fields

* rename SimpleLog -> SimpleChangelog

* Rename SimpleChangeLog.jl to SimpleChangelog.jl

* add `tryparse` and `tryparsefile`

* format

* document try*

* typo

* rm OrderedDict & need for unique section names
ericphanson authored Jan 17, 2025
1 parent 3c1511a commit 791e67a
Showing 24 changed files with 5,039 additions and 2 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

<!-- ## [Unreleased] -->
<!-- ## [Unreleased]
### Added
- Public types `SimpleChangelog` and `VersionInfo` which capture a simple in-memory representation of a changelog
- `Base.parse(SimpleChangelog, input)` and public function `parsefile` for parsing `SimpleChangelog`s from files and other representations. -->

## [v1.1.0] - 2023-11-13
### Added
10 changes: 10 additions & 0 deletions Project.toml
@@ -2,7 +2,17 @@ name = "Changelog"
uuid = "5217a498-cd5d-4ec6-b8c2-9b85a09b6e3e"
version = "1.1.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
CommonMark = "a80b9123-70ca-4bc0-993e-6e3bcb318db6"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
MarkdownAST = "d0879d2d-cac2-40c8-9cee-1863dc0c7391"

[compat]
AbstractTrees = "0.4.5"
CommonMark = "0.8.15"
Dates = "1"
MarkdownAST = "0.1.2"
julia = "1.6"

[extras]
41 changes: 41 additions & 0 deletions README.md
@@ -52,3 +52,44 @@ The typical workflow is as follows:
- Description of new feature with reference to pull request
([#123](https://github.com/JuliaDocs/Changelog.jl/issues/123)).
```

### Parsing changelogs

Changelog also provides functionality for parsing changelogs into a simple structure which can be programmatically queried,
e.g. to check what the changes are for a particular version. This functionality is primarily intended for parsing [KeepAChangeLog](https://keepachangelog.com/en/1.1.0/)-style changelogs, which have a title as an H1 (`#`) markdown header, followed by a list of versions with H2-level headers (`##`) formatted like `[1.1.0] - 2019-02-15`, with or without a link on the version number. Each version header is followed by a bulleted list of changes, potentially in subsections, each with an H3 header. For such changelogs, parsing should be stable. We may also attempt to parse a wider variety of headers; how much of those we can parse may change in non-breaking releases (typically improving the parsing, but potentially regressing in some cases).

The API for this functionality consists of:

- `SimpleChangelog`: structure that contains a simple representation of a changelog.
- `VersionInfo`: structure that contains a simple representation of a version in a changelog.
- `Base.parse(SimpleChangelog, str)` (and `Base.tryparse(SimpleChangelog, str)`): parses a markdown-formatted string into a `SimpleChangelog`.
- `Changelog.parsefile` (and `Changelog.tryparsefile`): parses a markdown-formatted file into a `SimpleChangelog`. The `try*` variants return `nothing` if parsing fails.
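
As a minimal sketch of the string-based entry points, the snippet below parses a small changelog written in the layout described above. The changelog text is made up for illustration, and the snippet assumes a version of Changelog that includes this parsing functionality. It uses `tryparse` so that a parse failure yields `nothing` rather than an exception:

```julia
import Changelog
using Changelog: SimpleChangelog

# Hypothetical changelog text (not from a real project), following the
# H1 title / H2 version / H3 section layout.
text = """
# Example changelog

All notable changes to this project are documented here.

## [1.1.0] - 2019-02-15
### Added
- A new feature.

## [1.0.0] - 2019-01-01
- First release.
"""

changelog = tryparse(SimpleChangelog, text)
if changelog === nothing
    @warn "Could not parse the changelog text"
else
    # Each entry of `versions` is a `VersionInfo`.
    for v in changelog.versions
        println(v.version, " (", v.date, "): ",
                length(v.toplevel_changes), " top-level change(s), ",
                length(v.sectioned_changes), " section(s)")
    end
end
```

`parse(SimpleChangelog, text)` can be used in the same way when an exception on failure is preferred.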

For example, using `Changelog.parsefile` on the [CHANGELOG.md](./CHANGELOG.md) as of version 1.1 gives:

```julia
julia> changelog = Changelog.parsefile("CHANGELOG.md")
SimpleChangelog with
- title: Changelog.jl changelog
- intro: All notable changes to this project will be documented in this file.
- 2 versions:
  - 1.1.0
    - url: https://github.com/JuliaDocs/Changelog.jl/releases/tag/v1.1.0
    - date: 2023-11-13
    - changes
      - Added
        - Links of the form `[<commit hash>]`, where `<commit hash>` is a commit hashof length 7 or 40, are now linkified. (#4)
  - 1.0.0
    - url: https://github.com/JuliaDocs/Changelog.jl/releases/tag/v1.0.0
    - date: 2023-11-13
    - changes
      - First release. See README.md for currently supported functionality.
```
The changes for 1.1.0 can be obtained by `changelog.versions[1].sectioned_changes`:
```julia
julia> changelog.versions[1].sectioned_changes
1-element Vector{Pair{String, Vector{String}}}:
"Added" => ["Links of the form `[<commit hash>]`, where `<commit hash>` is a commit hashof length 7 or 40, are now linkified. (#4)"]
```
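
Looking up a version by name rather than by position can be sketched as follows. `changes_for` is a hypothetical helper, not part of Changelog.jl. It relies only on the `versions`, `version`, `toplevel_changes`, and `sectioned_changes` properties documented for `SimpleChangelog` and `VersionInfo`, and the stand-in value below mimics those properties so the snippet runs without a changelog file:

```julia
# Hypothetical helper (not part of Changelog.jl): return the changes recorded
# for the version named `version`, or `nothing` if there is no such entry.
function changes_for(changelog, version::AbstractString)
    idx = findfirst(v -> v.version == version, changelog.versions)
    idx === nothing && return nothing
    v = changelog.versions[idx]
    return (; toplevel = v.toplevel_changes, sectioned = v.sectioned_changes)
end

# Stand-in with the same property names as a parsed changelog (made-up data).
standin = (;
    versions = [
        (;
            version = "1.1.0",
            toplevel_changes = String[],
            sectioned_changes = ["Added" => ["Commit hashes are now linkified."]],
        ),
    ],
)

found = changes_for(standin, "1.1.0")
missing_entry = changes_for(standin, "9.9.9")
```

With a real changelog, the first argument would be the value returned by `Changelog.parsefile("CHANGELOG.md")`.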
20 changes: 19 additions & 1 deletion src/Changelog.jl
@@ -7,9 +7,27 @@ documentation.
"""
module Changelog

VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public generate"))
using MarkdownAST
using Dates
using AbstractTrees
import CommonMark as CM

VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public parsefile, VersionInfo, SimpleChangelog, generate, tryparsefile"))

# generate Documenter changelogs and links
include("generate.jl")

# CommonMark <> MarkdownAST code
include("commonmark_markdownast_interop.jl")
using .CommonMarkMarkdownASTInterop: md_convert

# Convert MarkdownAST tree to our own tree
include("heading_tree.jl")

# SimpleChangelog and VersionInfo types, as well as API entrypoints
include("SimpleChangelog.jl")

# Tree traversal and parsing code
include("parse_changelog.jl")

end # module
153 changes: 153 additions & 0 deletions src/SimpleChangelog.jl
@@ -0,0 +1,153 @@
"""
VersionInfo
A struct representing the information in a changelog about a particular version, with properties:
- `version::Union{Nothing, String}`: a string representation of a version number or name (e.g. "Unreleased" or "1.2.3").
- `url::Union{Nothing, String}`: a URL associated to the version, if available
- `date::Union{Nothing, Date}`: a date associated to the version, if available
- `toplevel_changes::Vector{String}`: a list of changes which are not within a section
- `sectioned_changes::Vector{Pair{String, Vector{String}}}`: an ordered mapping of section name to a list of changes in that section.
"""
struct VersionInfo
version::Union{Nothing, String}
url::Union{Nothing, String}
date::Union{Nothing, Date}
toplevel_changes::Vector{String}
sectioned_changes::Vector{Pair{String, Vector{String}}}
end
function Base.show(io::IO, ::MIME"text/plain", v::VersionInfo)
return full_show(io, v)
end

function full_show(io, v::VersionInfo; indent = 0, showtype = true)
pad = " "^indent
if showtype
print(io, pad, VersionInfo, " with")
print(io, "\n", pad, "- version: ", v.version)
else
print(io, pad, "- ", v.version)
pad *= " "
end
if v.url !== nothing
print(io, "\n", pad, "- url: ", v.url)
end
print(io, "\n", pad, "- date: ", v.date)
return if isempty(v.sectioned_changes) && isempty(v.toplevel_changes)
print(io, "\n", pad, "- and no documented changes")
else
print(io, "\n", pad, "- changes")
if !isempty(v.toplevel_changes)
for b in v.toplevel_changes
print(io, "\n", pad, " - $b")
end
end

if !isempty(v.sectioned_changes)
for (section_name, bullets) in v.sectioned_changes
print(io, "\n", pad, " - $section_name")
for b in bullets
print(io, "\n", pad, " - $b")
end
end
end
end
end

"""
SimpleChangelog
A simple in-memory changelog format, with properties:
- `title::Union{Nothing, String}`
- `intro::Union{Nothing, String}`
- `versions::Vector{VersionInfo}`
A `SimpleChangelog` can be parsed out of a markdown-formatted string with `Base.parse`.
SimpleChangelogs are not intended to be roundtrippable in-memory representations of markdown
changelogs; rather, they discard most formatting and other details to provide a simple
view to make it easy to query if the changelog has an entry for some particular version,
or what the changes are for that version.
See also: [`VersionInfo`](@ref), [`parsefile`](@ref).
"""
struct SimpleChangelog
title::Union{Nothing, String}
intro::Union{Nothing, String}
versions::Vector{VersionInfo}
end

function Base.show(io::IO, ::MIME"text/plain", c::SimpleChangelog)
print(io, SimpleChangelog, " with")
print(io, "\n- title: ", c.title)
print(io, "\n- intro: ", c.intro)
n_versions = length(c.versions)
plural = n_versions == 1 ? "" : "s"
print(io, "\n- $(n_versions) version$plural:")
n_to_show = 5
for v in first(c.versions, n_to_show)
print(io, "\n")
full_show(io, v; showtype = false, indent = 2)
end
if n_versions > n_to_show
print(io, "\n")
end
return
end

"""
parse(::Type{SimpleChangelog}, text::AbstractString)
Parse a [`SimpleChangelog`](@ref) from a markdown-formatted string.
!!! note
This functionality is primarily intended for parsing [KeepAChangeLog](https://keepachangelog.com/en/1.1.0/)-style changelogs, which have a title as an H1 (`#`) markdown header, followed by a list of versions with H2-level headers (`##`) formatted like `[1.1.0] - 2019-02-15`, with or without a link on the version number. Each version header is followed by a bulleted list of changes, potentially in subsections, each with an H3 header. For such changelogs, parsing should be stable. We may also attempt to parse a wider variety of headers; how much of those we can parse may change in non-breaking releases (typically improving the parsing, but potentially regressing in some cases).
"""
function Base.parse(::Type{SimpleChangelog}, text::AbstractString)
# parse into CommonMark AST
parser = CM.Parser()
CM.enable!(parser, CM.FootnoteRule())
ast = parser(text)
# convert to MarkdownAST AST
ast = md_convert(MarkdownAST.Node, ast)
return _parse_simple_changelog!(ast)
end

"""
tryparse(::Type{SimpleChangelog}, text::AbstractString)
Try to parse a [`SimpleChangelog`](@ref) from a markdown-formatted string,
returning `nothing` if unable to.
"""
function Base.tryparse(::Type{SimpleChangelog}, text::AbstractString)
return try
parse(SimpleChangelog, text)
catch e
# This may be handy occasionally if we want to understand why we couldn't parse
# and don't want to manually run `parse(SimpleChangelog, text)`.
@debug "Error when parsing `SimpleChangelog` from changelog, returning `nothing`" exception = sprint(Base.display_error, e, catch_backtrace())
nothing
end
end

"""
parsefile(path) -> SimpleChangelog
Parse a [`SimpleChangelog`](@ref) from a file path `path`.
"""
function parsefile(path)
return parse(SimpleChangelog, read(path, String))
end

"""
tryparsefile(path) -> Union{Nothing, SimpleChangelog}
Try to parse a [`SimpleChangelog`](@ref) from a file path `path`, returning
`nothing` if unable to.
"""
function tryparsefile(path)
return tryparse(SimpleChangelog, read(path, String))
end
115 changes: 115 additions & 0 deletions src/commonmark_markdownast_interop.jl
@@ -0,0 +1,115 @@
# Copy of:
# https://github.com/MichaelHatherly/CommonMark.jl/pull/56
# by Morten Piibeleht
# That PR is not merged, so we will vendor a copy here.
# We name the function `md_convert` rather than extending `Base.convert`,
# to avoid type piracy.
# We stick it into a module to avoid unexpected side-effects, so hopefully we can cleanly
# delete this file once the PR is merged.
module CommonMarkMarkdownASTInterop
using CommonMark: Node, AbstractContainer, NULL_NODE,
Document, Paragraph, BlockQuote, ThematicBreak, HtmlBlock, DisplayMath,
Heading, CodeBlock, Admonition, List, Item, FootnoteDefinition,
LineBreak, Backslash, SoftBreak, Emph, Strong, HtmlInline, Math, FootnoteLink, Text,
Code, Image, Link, JuliaValue, Table, TableBody, TableCell, TableHeader, TableRow

import MarkdownAST

function md_convert(::Type{MarkdownAST.Node}, node::Node)
mdast = _mdast_node(node)
let child = node.first_child
while child != NULL_NODE
mdast_child = md_convert(MarkdownAST.Node, child)
push!(mdast.children, mdast_child)
child = child.nxt
end
end
return mdast
end

_mdast_node(node::Node) = _mdast_node(node, node.t)

# Fallback convert function
struct UnsupportedContainerError <: Exception
container_type::Type{<:AbstractContainer}
end
function Base.showerror(io::IO, e::UnsupportedContainerError)
return print(
io,
"UnsupportedContainerError: container of type '$(e.container_type)' is not supported in MarkdownAST",
)
end

function _mdast_node(::Node, ::T) where {T <: AbstractContainer}
throw(UnsupportedContainerError(T))
end

# For all singleton containers that map trivially (i.e. they have no attributes),
# we can have a single implementation.
const SINGLETON_CONTAINER_MAP = Dict(
Document => MarkdownAST.Document,
Paragraph => MarkdownAST.Paragraph,
BlockQuote => MarkdownAST.BlockQuote,
ThematicBreak => MarkdownAST.ThematicBreak,
LineBreak => MarkdownAST.LineBreak,
Backslash => MarkdownAST.Backslash,
SoftBreak => MarkdownAST.SoftBreak,
Emph => MarkdownAST.Emph,
Strong => MarkdownAST.Strong,
# CommonMark.Item contains a field, but it's discarded in MarkdownAST
Item => MarkdownAST.Item,
# Internal nodes for tables
TableBody => MarkdownAST.TableBody,
TableHeader => MarkdownAST.TableHeader,
TableRow => MarkdownAST.TableRow,
)
const SINGLETON_CONTAINERS = Union{keys(SINGLETON_CONTAINER_MAP)...}
function _mdast_node(node::Node, container::SINGLETON_CONTAINERS)
e = SINGLETON_CONTAINER_MAP[typeof(container)]()
return MarkdownAST.Node(e)
end

# Some containers use the .literal field of the Node object to store the content,
# which generally maps to MarkdownAST.T(node.literal).
const LITERAL_CONTAINER_MAP = Dict(
Text => MarkdownAST.Text,
HtmlBlock => MarkdownAST.HTMLBlock,
HtmlInline => MarkdownAST.HTMLInline,
DisplayMath => MarkdownAST.DisplayMath,
Math => MarkdownAST.InlineMath,
Code => MarkdownAST.Code,
)
const LITERAL_CONTAINERS = Union{keys(LITERAL_CONTAINER_MAP)...}
function _mdast_node(node::Node, container::LITERAL_CONTAINERS)
e = LITERAL_CONTAINER_MAP[typeof(container)](node.literal)
return MarkdownAST.Node(e)
end

# Containers that need special handling
_mdast_node(n::Node, c::Heading) = MarkdownAST.Node(MarkdownAST.Heading(c.level))
_mdast_node(n::Node, c::Link) = MarkdownAST.Node(MarkdownAST.Link(c.destination, c.title))
_mdast_node(n::Node, c::Image) = MarkdownAST.Node(MarkdownAST.Image(c.destination, c.title))
_mdast_node(n::Node, c::List) =
MarkdownAST.Node(MarkdownAST.List(c.list_data.type, c.list_data.tight))
_mdast_node(n::Node, c::CodeBlock) =
MarkdownAST.Node(MarkdownAST.CodeBlock(c.info, n.literal))
_mdast_node(n::Node, c::Admonition) =
MarkdownAST.Node(MarkdownAST.Admonition(c.category, c.title))
_mdast_node(n::Node, c::FootnoteDefinition) =
MarkdownAST.Node(MarkdownAST.FootnoteDefinition(c.id))
_mdast_node(n::Node, c::FootnoteLink) = MarkdownAST.Node(MarkdownAST.FootnoteLink(c.id))
_mdast_node(n::Node, c::Table) = MarkdownAST.Node(MarkdownAST.Table(c.spec))
_mdast_node(n::Node, c::TableCell) =
MarkdownAST.Node(MarkdownAST.TableCell(c.align, c.header, c.column))
_mdast_node(n::Node, c::JuliaValue) = MarkdownAST.Node(MarkdownAST.JuliaValue(c.ex, c.ref))

# Unsupported containers (no MarkdownAST equivalent currently):
#
# Attributes, Citation, CitationBracket, FrontMatter, ReferenceList, References,
# LaTeXBlock, LaTeXInline
#
# Should never appear in a CommonMark tree:
#
# TablePipe (internal use), TableComponent (abstract), JuliaExpression (internal use)

end # module
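
The dispatch pattern used in `commonmark_markdownast_interop.jl` (a `Dict` from source types to target types, a `Union` built from its keys, and separate methods for types that carry data) can be illustrated in isolation. The sketch below is not the vendored code: all `Src*` and `Dst*` types and the `convert_node` function are hypothetical stand-ins, so it runs without CommonMark or MarkdownAST:

```julia
# Stand-in source and target node types (hypothetical).
struct SrcParagraph end
struct SrcEmph end
struct SrcHeading
    level::Int
end

struct DstParagraph end
struct DstEmph end
struct DstHeading
    level::Int
end

# Field-less types convert by calling the target's zero-argument constructor.
const TRIVIAL_MAP = Dict(
    SrcParagraph => DstParagraph,
    SrcEmph => DstEmph,
)
# A Union of the Dict's keys lets one method cover every trivial type.
const TRIVIAL = Union{keys(TRIVIAL_MAP)...}

convert_node(c::TRIVIAL) = TRIVIAL_MAP[typeof(c)]()
# Types that carry data get their own method.
convert_node(c::SrcHeading) = DstHeading(c.level)
# Anything else is reported as unsupported.
convert_node(c) = throw(ArgumentError("unsupported node type $(typeof(c))"))

paragraph = convert_node(SrcParagraph())
heading = convert_node(SrcHeading(2))
```

Adding a new trivially mapped type then only requires a new `Dict` entry, while the fallback method keeps unsupported inputs from passing through silently.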