
Commit 383797d

Merge pull request #20 from VolumeGraphics/new_csv_tokenizer
New csv tokenizer - 0.1.5-RC2
2 parents 313c60f + d6e167d commit 383797d


13 files changed: +971 / -361 lines

Cargo.toml (+6 -3)
@@ -1,14 +1,16 @@
 [package]
 name = "havocompare"
-description = "A flexible folder comparison tool / crate with html reporting."
+description = "A flexible rule-based file and folder comparison tool and crate including nice html reporting. Compares CSVs, text files, pdf-texts and images."
 repository = "https://github.com/VolumeGraphics/havocompare"
 homepage = "https://github.com/VolumeGraphics/havocompare"
 documentation = "https://docs.rs/havocompare"
-version = "0.1.5-RC1"
+version = "0.2.0"
 edition = "2021"
 license = "MIT"
 authors = ["Volume Graphics GmbH"]
 exclude = ["tests/integ", "tests/html", "target", "tests/csv", ".github", "test_report"]
+keywords = ["diff" ,"compare", "csv", "image", "difference", "csv-diff", "image-diff", "pdf", "pdf-diff", "plain-text", "plain-text-diff"]
+categories = ["filesystem"]
 
 [dependencies]
 clap = {version= "4.0", features=["derive"]}
@@ -19,7 +21,7 @@ schemars_derive = "0.8"
 thiserror = "1.0"
 regex = "1.6"
 image = "0.24.4"
-image-compare = "0.2.3"
+image-compare = "0.2.4"
 tracing = "0.1"
 tracing-subscriber = "0.3"
 serde_json = "1.0"
@@ -35,6 +37,7 @@ data-encoding = "2.3.2"
 permutation = "0.4.1"
 pdf-extract = "0.6.4"
 vg_errortools = "0.1.0"
+rayon = "1.6"
 
 [target.'cfg(windows)'.dependencies]
 ansi_term = "0.12"

README.md (+12 -5)
@@ -91,16 +91,17 @@ rules:
 exclude_field_regex: "Excluded"
 # optional: preprocessing of the csv files
 preprocessing:
-# extracts the headers to the header-fields, makes reportings more legible and allows for further processing "ByName"
+# extracts the headers to the header-fields, makes reports more legible and allows for further processing "ByName".
+# While it may fail, there's no penalty for it, as long as you don't rely on it.
 - ExtractHeaders
 # Sort the table by column 0, beware that the column must only contain numbers / quantities
 - SortByColumnNumber: 0
-# Delete a column by name, needs `ExtractHeaders` first - delete sets all values to 'DELETED's
+# Delete a column by name, needs `ExtractHeaders` first - delete sets all values to 'DELETED'
 - DeleteColumnByName: "Column to delete"
 - DeleteColumnByNumber: 1
 # Sorts are stable, so a second sort will keep the first sort as sub-order.
 - SortByColumnName: "Sort by column name blabla"
-# Deletes the first row by setting all values to 'DELETED's - meaning that numbering stays constant
+# Deletes the first row by setting all values to 'DELETED' - meaning that numbering stays constant
 - DeleteRowByNumber: 0
 # Deletes rows having any element matching the given regex (may delete different lines in nom / act!
 - DeleteRowByRegex: "Vertex_Count"
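The preprocessing comments in this hunk describe three behaviours: header extraction may fail without consequences, deletions overwrite values with 'DELETED' instead of removing them (so row and column numbering stays constant), and sorts are stable. A minimal Rust sketch of those semantics follows. The `Table` type, its methods, and the header-validity rule (no empty cell in the first row) are hypothetical illustrations, not havocompare's actual API.

```rust
// Hypothetical, simplified table model used only to illustrate the documented
// preprocessing semantics; this is NOT havocompare's actual API.
const DELETED: &str = "DELETED";

#[derive(Debug, Clone, PartialEq)]
struct Table {
    headers: Option<Vec<String>>,
    rows: Vec<Vec<String>>,
}

impl Table {
    /// Fallible but uncritical: promotes the first row to headers if it looks
    /// like one (assumed rule: no empty cell); otherwise leaves the table
    /// untouched and returns `false` instead of aborting.
    fn extract_headers(&mut self) -> bool {
        let looks_like_header = self
            .rows
            .first()
            .map_or(false, |first| first.iter().all(|cell| !cell.trim().is_empty()));
        if looks_like_header {
            self.headers = Some(self.rows.remove(0));
        }
        looks_like_header
    }

    /// Needs headers; overwrites every value of the named column with DELETED,
    /// so column numbering stays constant.
    fn delete_column_by_name(&mut self, name: &str) -> Result<(), String> {
        let headers = self.headers.as_ref().ok_or("ExtractHeaders must run first")?;
        let idx = headers
            .iter()
            .position(|h| h == name)
            .ok_or_else(|| format!("no column named {name:?}"))?;
        for row in &mut self.rows {
            if let Some(cell) = row.get_mut(idx) {
                *cell = DELETED.to_string();
            }
        }
        Ok(())
    }

    /// Overwrites every value of row `n` with DELETED, so row numbering stays constant.
    fn delete_row_by_number(&mut self, n: usize) {
        if let Some(row) = self.rows.get_mut(n) {
            for cell in row.iter_mut() {
                *cell = DELETED.to_string();
            }
        }
    }

    /// Stable sort by a column that must contain only numbers.
    fn sort_by_column_number(&mut self, idx: usize) -> Result<(), String> {
        // Parse every key first, so a non-numeric value leaves the table unchanged.
        let keys = self
            .rows
            .iter()
            .enumerate()
            .map(|(r, row)| {
                row.get(idx)
                    .and_then(|cell| cell.trim().parse::<f64>().ok())
                    .ok_or_else(|| format!("row {r}: column {idx} is not a number"))
            })
            .collect::<Result<Vec<f64>, String>>()?;
        let rows = std::mem::take(&mut self.rows);
        let mut keyed: Vec<(f64, Vec<String>)> = keys.into_iter().zip(rows).collect();
        // `sort_by` is stable: rows with equal keys keep their previous order,
        // so an earlier sort survives as the sub-order of a later one.
        keyed.sort_by(|a, b| a.0.total_cmp(&b.0));
        self.rows = keyed.into_iter().map(|(_, row)| row).collect();
        Ok(())
    }
}
```

Because nothing is physically removed, a row deleted by number still occupies its index, which keeps nominal and actual tables aligned row by row in the report.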
@@ -177,12 +178,18 @@ Currently we only support SHA-256 but more checks can be added easily.
 
 ## Changelog
 
-### 0.1.5
+### 0.2.0
 - Deletion of columns will no longer really delete them but replace every value with "DELETED"
 - Expose config struct to library API
 - Fixed a bug regarding wrong handling of multiple empty lines
 - Reworked CSV reporting to have an interleaved and more compact view
-- Display the relative path of compared files instead of file name in the report index.html
+- Display the relative path of compared files instead of file name in the report index.html
+- Made header-extraction fallible but uncritical - can now always be enabled
+- Wrote a completely new csv parser:
+- Respects escaping with '\'
+- Allows string-literals containing unescaped field separators (field1, "field2, but as literal", field3)
+- Allows multi-line string literals with quotes
+- CSVs with non-rectangular format will now fail
 
 ### 0.1.4
 - Add multiple includes and excludes - warning, this will break yamls from 0.1.3 and earlier
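The 0.2.0 changelog lists the rules of the new CSV parser. The following is a minimal Rust sketch of a tokenizer obeying those rules, not havocompare's implementation. The `tokenize` function and `CsvError` type are hypothetical, and the sketch assumes that quote characters are dropped from field values, that a `\` escapes exactly the next character, and that blank lines between rows are skipped.

```rust
// Hypothetical sketch of the documented parsing rules; NOT havocompare's code.
#[derive(Debug, PartialEq)]
enum CsvError {
    UnterminatedQuote,
    NotRectangular { row: usize, expected: usize, found: usize },
}

fn tokenize(input: &str, sep: char) -> Result<Vec<Vec<String>>, CsvError> {
    let mut rows: Vec<Vec<String>> = Vec::new();
    let mut row: Vec<String> = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut row_started = false; // distinguishes blank lines from real rows
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        match c {
            // Escape: the next character is taken literally (a trailing '\' is kept).
            '\\' => {
                field.push(chars.next().unwrap_or('\\'));
                row_started = true;
            }
            // Quotes toggle literal mode and are not part of the value (assumption).
            '"' => {
                in_quotes = !in_quotes;
                row_started = true;
            }
            // An unquoted separator ends the current field.
            _ if c == sep && !in_quotes => {
                row.push(std::mem::take(&mut field));
                row_started = true;
            }
            // Ignore '\r' of "\r\n" line endings outside literals.
            '\r' if !in_quotes => {}
            // An unquoted newline ends the row; blank lines are skipped.
            '\n' if !in_quotes => {
                if row_started {
                    row.push(std::mem::take(&mut field));
                    rows.push(std::mem::take(&mut row));
                }
                row_started = false;
            }
            // Everything else, including separators and newlines inside quotes.
            other => {
                field.push(other);
                row_started = true;
            }
        }
    }
    if in_quotes {
        return Err(CsvError::UnterminatedQuote);
    }
    if row_started {
        row.push(field);
        rows.push(row);
    }
    // Non-rectangular input fails instead of being padded.
    if let Some(expected) = rows.first().map(Vec::len) {
        for (i, r) in rows.iter().enumerate() {
            if r.len() != expected {
                return Err(CsvError::NotRectangular { row: i, expected, found: r.len() });
            }
        }
    }
    Ok(rows)
}
```

Note that this sketch uses backslash escaping only; RFC 4180-style doubled quotes (`""`) would simply toggle literal mode twice and vanish.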
