feat: Character-level diffs #273

Merged
merged 1 commit into from
Feb 26, 2022

afnanenayet
Owner

Up until this commit, diffsitter has computed diffs by directly
comparing the text of two nodes for equality. This means that if the
text of two nodes differed at all, the contents of the entire node
would be considered a difference.

As a concrete example, consider the two snippets:

fn example_a() {}

fn example_b() {}

The diff sees that the text of the two function identifiers differs,
so the resulting diff would be:

-example_a
+example_b

This is because we just check whether "example_a" == "example_b". What
we really want is:

-a
+b
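To make the contrast between the two granularities concrete, here is a minimal sketch that runs the same generic diff over whole-node text and over individual characters. It uses a plain longest-common-subsequence diff, which is an assumption chosen for brevity and not necessarily the algorithm diffsitter uses; `Edit` and `lcs_diff` are hypothetical names.

```rust
/// Hypothetical edit type: an element removed from the old sequence
/// or added in the new one.
#[derive(Debug, PartialEq, Eq)]
enum Edit<T> {
    Delete(T),
    Insert(T),
}

/// Plain LCS-based diff over any sequence of comparable elements.
/// Illustration only: O(n * m) time and memory.
fn lcs_diff<T: PartialEq + Clone>(old: &[T], new: &[T]) -> Vec<Edit<T>> {
    let (n, m) = (old.len(), new.len());
    // table[i][j] holds the LCS length of old[i..] and new[j..].
    let mut table = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            table[i][j] = if old[i] == new[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting an edit for every
    // element that is not part of the common subsequence.
    let (mut i, mut j) = (0, 0);
    let mut edits = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            edits.push(Edit::Delete(old[i].clone()));
            i += 1;
        } else {
            edits.push(Edit::Insert(new[j].clone()));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure deletion or insertion.
    edits.extend(old[i..].iter().cloned().map(Edit::Delete));
    edits.extend(new[j..].iter().cloned().map(Edit::Insert));
    edits
}
```

Passing the node texts `["fn", "example_a"]` and `["fn", "example_b"]` reports the whole identifier as changed, while passing the characters of `"example_a"` and `"example_b"` reports only the trailing `a` and `b`.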

We achieve this by breaking up each node into Entry objects that each
correspond to a single Unicode grapheme, so diffs are computed on a
per-grapheme basis rather than a per-node basis. Because the actual
diff mechanism is generic, we only have to modify how the Entry objects
are created; the diff and hunk construction mechanisms remain
unchanged.
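The splitting step might look roughly like the sketch below. The `Leaf` and `Entry` definitions here are hypothetical stand-ins, not diffsitter's actual types, and the sketch splits on `char` boundaries using only the standard library. That is a simplification: real grapheme clusters can span several `char`s, so a grapheme-aware segmenter would be needed to match the behavior described above.

```rust
/// Hypothetical stand-in for a leaf node of the syntax tree.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Leaf<'a> {
    /// Source text covered by the node.
    text: &'a str,
    /// Byte offset of the node in the source file.
    start_byte: usize,
}

/// Hypothetical per-character entry produced from a leaf.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry<'a> {
    /// The single character this entry covers, borrowed from the leaf.
    text: &'a str,
    /// Byte offset of this character in the source file.
    start_byte: usize,
}

/// Split one leaf into one entry per `char`.
/// Assumption: `char` boundaries approximate grapheme boundaries.
fn split_leaf<'a>(leaf: &Leaf<'a>) -> Vec<Entry<'a>> {
    leaf.text
        .char_indices()
        .map(|(offset, ch)| Entry {
            text: &leaf.text[offset..offset + ch.len_utf8()],
            start_byte: leaf.start_byte + offset,
        })
        .collect()
}
```

Each entry keeps its own byte offset, so downstream code that groups entries into hunks can still map every character back to its position in the file.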

This PR also updates the equality check to account for the type of a node, so a
comparison between:

fn some_function() {}

fn r#fn() {}

would treat the two occurrences of fn as different, because one is a
keyword and the other is part of an identifier.
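A type-aware equality check could be sketched as follows. `Kind` and `TypedEntry` are hypothetical names used for illustration; the real implementation presumably compares whatever node-kind value the parser exposes instead of a hand-written enum.

```rust
/// Hypothetical node kinds; a real parser exposes many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Keyword,
    Identifier,
}

/// Hypothetical entry that remembers the kind of the node it came from.
#[derive(Debug, Clone, Copy)]
struct TypedEntry<'a> {
    text: &'a str,
    kind: Kind,
}

impl PartialEq for TypedEntry<'_> {
    /// Two entries match only if both the kind and the text agree,
    /// so identical text from different node kinds still counts as a change.
    fn eq(&self, other: &Self) -> bool {
        self.kind == other.kind && self.text == other.text
    }
}

impl Eq for TypedEntry<'_> {}
```

With this definition, the `f` taken from the `fn` keyword and the `f` taken from the identifier `r#fn` compare as unequal even though their text is identical.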

@afnanenayet afnanenayet force-pushed the afnan/more-granular-diffs branch 4 times, most recently from a8f0c37 to af81545 Compare February 18, 2022 05:03
@afnanenayet afnanenayet force-pushed the afnan/more-granular-diffs branch 7 times, most recently from a3257ff to 895ba40 Compare February 23, 2022 05:23
Up until this commit, `diffsitter` has computed diffs by directly
comparing the equality of the text of two nodes. This means that if two
nodes had text content that was unequal *at all*, the contents of that
*entire* node would be considered a difference.

As a concrete example, consider the two snippets:

```rust
fn example_a() {}

fn example_b() {}
```

The diff would see the text corresponding to the identifiers for each
function are different, so the corresponding diff would be:

```
-example_a
+example_b
```

Because we just check if `"example_a" == "example_b"`. What we really
want is:

```
-a
+b
```

We are able to achieve this by breaking up each node into `Entry`
objects that correspond to a single unicode grapheme, so when the diffs
are computed, they are on a per-grapheme basis rather than at a per-node
basis. Of course, the actual diff mechanism is generic, so we only have
to modify how the `Entry` object is created, and the diff and hunk
construction mechanisms remain unchanged.

This PR also adds test cases for medium-length source files.
@afnanenayet afnanenayet force-pushed the afnan/more-granular-diffs branch from 895ba40 to c89f395 Compare February 26, 2022 21:55
@afnanenayet afnanenayet merged commit 879d21c into main Feb 26, 2022
@afnanenayet afnanenayet deleted the afnan/more-granular-diffs branch February 26, 2022 22:30
afnanenayet added a commit that referenced this pull request Feb 26, 2022
afnanenayet added a commit that referenced this pull request Feb 26, 2022
afnanenayet added a commit that referenced this pull request Mar 5, 2022
afnanenayet added a commit that referenced this pull request Mar 5, 2022
1 participant