feat: Character-level diffs #273

afnanenayet · 2022-02-12T06:05:11Z

Up until this commit, diffsitter has computed diffs by directly
comparing the text of two nodes. This means that if the text of two
nodes differed at all, the contents of that entire node would be
considered a difference.

As a concrete example, consider the two snippets:

fn example_a() {}

fn example_b() {}

The diff would see that the text of the two function identifiers
differs, so the corresponding diff would be:

-example_a
+example_b

This is because we just check whether "example_a" == "example_b". What
we really want is:

-a
+b

We achieve this by breaking each node up into Entry objects that each
correspond to a single Unicode grapheme, so diffs are computed on a
per-grapheme basis rather than a per-node basis. The actual diff
mechanism is generic, so we only have to modify how the Entry objects
are created; the diff and hunk construction mechanisms remain unchanged.
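
The splitting step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not diffsitter's actual code: the `Leaf`, `Entry`, and `Edit` types and the `split_leaves` and `diff` functions are hypothetical names, the node kinds are made up, the split is per `char` rather than per grapheme cluster to keep the sketch dependency-free, and a plain LCS diff stands in for whatever generic diff algorithm the project uses.

```rust
/// Hypothetical stand-in for a leaf node of the syntax tree.
#[derive(Debug, Clone)]
struct Leaf<'a> {
    kind: &'a str,
    text: &'a str,
}

/// Hypothetical per-character entry (not diffsitter's real `Entry` type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry<'a> {
    kind: &'a str,
    text: &'a str,
}

/// Split every leaf into one entry per `char`.
/// Simplification: the PR splits on Unicode graphemes, not `char`s.
fn split_leaves<'a>(leaves: &[Leaf<'a>]) -> Vec<Entry<'a>> {
    let mut out = Vec::new();
    for leaf in leaves {
        let text: &'a str = leaf.text;
        for (start, ch) in text.char_indices() {
            let end = start + ch.len_utf8();
            out.push(Entry { kind: leaf.kind, text: &text[start..end] });
        }
    }
    out
}

/// One element of an edit script.
#[derive(Debug, PartialEq, Eq)]
enum Edit<'a, T> {
    Delete(&'a T),
    Insert(&'a T),
}

/// Generic LCS-based diff (an assumed stand-in algorithm). It only needs
/// `PartialEq`, so it works on whole nodes and on split entries alike.
fn diff<'a, T: PartialEq>(old: &'a [T], new: &'a [T]) -> Vec<Edit<'a, T>> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] = length of the LCS of old[i..] and new[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table from the start, emitting deletions and insertions.
    let (mut i, mut j) = (0, 0);
    let mut edits = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            edits.push(Edit::Delete(&old[i]));
            i += 1;
        } else {
            edits.push(Edit::Insert(&new[j]));
            j += 1;
        }
    }
    edits.extend(old[i..].iter().map(Edit::Delete));
    edits.extend(new[j..].iter().map(Edit::Insert));
    edits
}
```

Feeding the leaves of the two example functions through `split_leaves` and then `diff` yields a two-element edit script: a deletion of the identifier character "a" and an insertion of "b". Only the entry construction changed; `diff` itself is the same generic routine it would be for whole nodes.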

This PR also updates the equality check to account for the type of a node, so a
comparison between:

fn some_function() {}

fn r#fn() {}

would treat the two occurrences of fn as different, because one is a
keyword and the other is part of an identifier.
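
A kind-aware equality check along these lines might look like the sketch below. The `Entry` type, its fields, the `text_only_eq` helper, and the kind strings "fn" and "identifier" are all assumptions made for illustration; they are not taken from diffsitter's source.

```rust
/// Hypothetical entry carrying both the node kind and the text.
#[derive(Debug, Clone, Copy)]
struct Entry<'a> {
    /// Kind of the syntax node this entry came from (assumed naming).
    kind: &'a str,
    /// The text covered by this entry.
    text: &'a str,
}

impl PartialEq for Entry<'_> {
    /// Two entries are equal only if both the node kind and the text match.
    fn eq(&self, other: &Self) -> bool {
        self.kind == other.kind && self.text == other.text
    }
}

/// The older, text-only comparison, kept here purely for contrast.
fn text_only_eq(a: &Entry<'_>, b: &Entry<'_>) -> bool {
    a.text == b.text
}
```

With this definition, an "f" that came from the fn keyword and an "f" that came from the identifier r#fn compare as unequal, even though a text-only check would consider them identical.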

Up until this commit, `diffsitter` has computed diffs by directly comparing the text of two nodes. This means that if the text of two nodes differed *at all*, the contents of that *entire* node would be considered a difference.

As a concrete example, consider the two snippets:

```rust
fn example_a() {}
fn example_b() {}
```

The diff would see that the text of the two function identifiers differs, so the corresponding diff would be:

```
-example_a
+example_b
```

This is because we just check whether `"example_a" == "example_b"`. What we really want is:

```
-a
+b
```

We achieve this by breaking each node up into `Entry` objects that each correspond to a single Unicode grapheme, so diffs are computed on a per-grapheme basis rather than a per-node basis. The actual diff mechanism is generic, so we only have to modify how the `Entry` objects are created; the diff and hunk construction mechanisms remain unchanged.

This PR also adds test cases for medium-length source files.

This reverts commit 879d21c.


afnanenayet force-pushed the afnan/more-granular-diffs branch 4 times, most recently from a8f0c37 to af81545 Compare February 18, 2022 05:03

afnanenayet force-pushed the afnan/more-granular-diffs branch 7 times, most recently from a3257ff to 895ba40 Compare February 23, 2022 05:23

afnanenayet force-pushed the afnan/more-granular-diffs branch from 895ba40 to c89f395 Compare February 26, 2022 21:55

afnanenayet merged commit 879d21c into main Feb 26, 2022

afnanenayet deleted the afnan/more-granular-diffs branch February 26, 2022 22:30

afnanenayet added a commit that referenced this pull request Feb 26, 2022

Revert "feat: Character-level diffs (#273)"

255df36

This reverts commit 879d21c.

afnanenayet added a commit that referenced this pull request Feb 26, 2022

Revert "feat: Character-level diffs (#273)" (#296)

e03cce1

This reverts commit 879d21c.
