forked from simonw/csv-diff

db-diff

Python CLI tool and library for comparing CSV database dumps and finding differences.

Installation

pip install git+https://github.com/datsom1/db-diff.git

To force a reinstall of the latest unreleased code from the repository (you probably don't need to do this):

pip install --upgrade --force-reinstall git+https://github.com/datsom1/db-diff.git

Usage

Consider two CSV files:

one.csv

Id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

Id,name,age
1,Cleo,5
3,Bailey,1

db-diff can show a human-readable summary of differences between the files:

$ db-diff one.csv two.csv --key=Id
1 rows changed, 1 rows added, 1 rows removed

1 rows changed

  Rows 1
    age: "4" => "5"

1 rows added

  Id: 3
  name: Bailey
  age: 1

1 rows removed

  Id: 2
  name: Pancakes
  age: 2

The --key=Id option means that the Id column should be treated as the unique key, to identify which records have changed.
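
The role of the key column can be illustrated with a minimal sketch. This is not db-diff's implementation; `index_by_key` and `keyed_diff` are hypothetical helpers written for this example. Rows are indexed by their key value, and the two indexes are then compared to classify rows as added, removed, or changed.

```python
import csv
import io

ONE = "Id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "Id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def index_by_key(text, key):
    """Read CSV text and return {key value: row dict}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(previous, current):
    """Classify rows as added, removed, or changed by comparing keys."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = {}
    # Keys present on both sides are candidates for "changed".
    for k in previous.keys() & current.keys():
        delta = {
            col: (previous[k][col], current[k].get(col))
            for col in previous[k]
            if previous[k][col] != current[k].get(col)
        }
        if delta:
            changed[k] = delta
    return added, removed, changed


added, removed, changed = keyed_diff(index_by_key(ONE, "Id"), index_by_key(TWO, "Id"))
print(changed)
print(added)
print(removed)
```

Without a unique key, the comparison would have no way to tell that row 1 was edited rather than removed and re-added.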

The tool will automatically detect whether your files are comma- or tab-separated. You can override this automatic detection and force a specific format using --format=tsv or --format=csv.
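
One way to tell comma-separated from tab-separated input is to sniff the delimiter from a sample of the text using the standard library. The sketch below is an assumption about how such detection could work, not a description of db-diff's own logic (its help text mentions detection based on file extension); `guess_format` is a hypothetical helper.

```python
import csv


def guess_format(sample):
    """Return 'csv' or 'tsv' by sniffing the delimiter in a text sample."""
    # Restrict the sniffer to the two delimiters of interest.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return "tsv" if dialect.delimiter == "\t" else "csv"


comma_sample = "Id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tab_sample = "Id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"
print(guess_format(comma_sample))
print(guess_format(tab_sample))
```

Sniffing can fail on unusual files, which is why an explicit --format option is useful as an escape hatch.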

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.
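
The expected JSON shape (an array of objects that all share the same keys) can be checked before running a comparison. The following is a minimal sketch; `load_json_rows` is a hypothetical helper and is not part of db-diff.

```python
import json


def load_json_rows(text):
    """Parse JSON text and check it is an array of objects sharing the same keys."""
    rows = json.loads(text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")
    if rows:
        expected = set(rows[0])
        for position, row in enumerate(rows):
            if set(row) != expected:
                raise ValueError(f"object at index {position} has different keys")
    return rows


rows = load_json_rows(
    '[{"Id": 1, "name": "Cleo", "age": 4}, {"Id": 2, "name": "Pancakes", "age": 2}]'
)
print(len(rows))  # 2
```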

Use --showunchanged to include the unchanged values as well for any row that has at least one change:

$ db-diff one.csv two.csv --key=Id --showunchanged
1 rows changed

  Id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"

JSON output

You can use the --output=json option to get a machine-readable difference:

$ db-diff one.csv two.csv --key=Id --output=json
{
    "added": [
        {
            "Id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "Id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
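
The JSON output is straightforward to consume from a script. The sketch below parses a hand-written sample in the same shape as the output above (row objects are trimmed to two columns for brevity) and prints a short summary; only the top-level keys and the "key"/"changes" structure shown above are relied on.

```python
import json

# Hand-written sample in the same shape as the --output=json example.
raw = """
{
    "added": [{"name": "Bailey", "age": "1"}],
    "removed": [{"name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""

diff = json.loads(raw)

# Count the entries in each row-level category.
summary = {name: len(diff[name]) for name in ("added", "removed", "changed")}
print(summary)  # {'added': 1, 'removed': 1, 'changed': 1}

# Each change is stored as a two-element list: [old value, new value].
lines = []
for entry in diff["changed"]:
    for column, (old, new) in entry["changes"].items():
        lines.append(f'{entry["key"]}: {column} {old!r} -> {new!r}')
print("\n".join(lines))  # 1: age '4' -> '5'
```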

JSON file output

You can use the --output=jsonfile and --outputfile= options together to save the output to a .json file:

$ db-diff one.csv two.csv --key=Id --output=jsonfile --outputfile=diffs.json
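
If the comparison is part of a larger script, the same command can be assembled and run from Python. This is a sketch that assumes db-diff is installed and on PATH; `build_command` is a hypothetical helper, and it uses only the flags shown in this README.

```python
import os
import shutil
import subprocess


def build_command(previous, current, key, outputfile):
    """Assemble the db-diff argument list for writing a JSON diff to a file."""
    return [
        "db-diff", previous, current,
        f"--key={key}",
        "--output=jsonfile",
        f"--outputfile={outputfile}",
    ]


command = build_command("one.csv", "two.csv", "Id", "diffs.json")
print(" ".join(command))

# Only run the tool if it is installed and both input files exist.
if shutil.which("db-diff") and all(os.path.exists(p) for p in command[1:3]):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.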

Measure time

You can use the --time option to measure how long the comparison takes:

$ db-diff one.csv two.csv --key=Id --time
.
.
.
Elapsed time: 0.016 seconds

As a Python library

You can also import the Python library into your own code like so:

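from csv_diff import load_csv, compare

# Open both files with context managers so the handles are closed afterwards.
with open("one.csv", newline="") as previous, open("two.csv", newline="") as current:
    diff = compare(
        load_csv(previous, key="Id"),
        load_csv(current, key="Id"),
    )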

diff will now contain the same data structure as the output in the --output=json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
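
The effect of ignoring added or removed columns can be sketched as follows. This is an illustration only, not the library's code; `column_changes` and `row_changes` are hypothetical helpers. Column differences are reported separately, and row-level comparison looks only at the columns both files share.

```python
def column_changes(previous_columns, current_columns):
    """Return (added, removed, shared) column lists, preserving header order."""
    added = [c for c in current_columns if c not in previous_columns]
    removed = [c for c in previous_columns if c not in current_columns]
    shared = [c for c in previous_columns if c in current_columns]
    return added, removed, shared


def row_changes(old_row, new_row, shared):
    """Compare two rows with the same key, looking only at shared columns."""
    return {c: [old_row[c], new_row[c]] for c in shared if old_row[c] != new_row[c]}


old_row = {"Id": "1", "name": "Cleo", "age": "4"}
new_row = {"Id": "1", "name": "Cleo", "weight": "9"}  # "age" dropped, "weight" added

added, removed, shared = column_changes(list(old_row), list(new_row))
print(added, removed, shared)  # ['weight'] ['age'] ['Id', 'name']
print(row_changes(old_row, new_row, shared))  # {}
```

In this sketch the row is not reported as changed, because the only differences are in columns that exist in just one of the files.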

Full CLI Helptext

$ db-diff --help
Usage: db-diff [OPTIONS] PREVIOUS CURRENT

  Compare the differences between two CSV or JSON files to find differences.

Options:
  --format TEXT      Explicitly specify input format. Available (csv|tsv|json) [default: auto-detect based on file extension]
  --encoding TEXT    Input File Encoding. Available: (utf-8|utf-16|utf-16le|utf-16be|latin1|cp1252|ascii|...) [default: utf-8]
  --key TEXT         Column to use as a unique ID for each row (ex: --key=Id) [default: first column if not specified] 
  --output TEXT      Output format. Available: (readable|json|jsonfile)  [default: readable]
  --outputfile FILE  File to write JSON output to (only used with --output=jsonfile)
  --showunchanged    If a record is changed, show ALL fields, not just the changed fields.
  --time             Measure and display elapsed time for the diff operation
  --version          Show the version and exit.
  -h, --help         Show this message and exit.

  Example: db-diff old.csv new.csv --key=Id --output=jsonfile --outputfile=diff.json
