Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat(findExports): use acorn tokenizer to filter false positive exports #56

Merged
merged 14 commits into from
Aug 3, 2022

Conversation

hubvue
Copy link
Contributor

@hubvue hubvue commented Jun 29, 2022

This PR is designed to address this issue #34

example:

// export { foo } from 'foo1';
// exports default 'foo';
// export { useB, _useC as useC };
// export function useA () { return 'a' }
// export { default } from "./other"
// export async function foo () {}
// export { foo as default }
//export * from "./other"
//export * as foo from "./other"

/**
 * export const a = 123
 * export { foo } from 'foo1';
 * exports default 'foo'
 * export function useA () { return 'a' }
 * export { useB, _useC as useC };
 *export { default } from "./other"
 *export async function foo () {}
 * export { foo as default }
 * export * from "./other"
 export * as foo from "./other"
 */

export { bar } from 'foo2';
export { foobar } from 'foo2';

The expected length of matches should be "2", but instead we get "17".

'\b' does not solve such problems, its semantics is the junction of words and non-words. Apparently the following code is also semantically correct.

// (\b is here)export { foo } from 'foo1';

Solutions:filter out comments before matching to avoid them affecting the matching process.

@pi0
Copy link
Member

pi0 commented Jun 29, 2022

We can also use strip-literal by @antfu. However, downside is that it involves full AST parsing and sloweer.

I think this is last tested version based on regex.

@antfu Can you please explain finally what were limitations regex couldn't support? Did you migrate unimport for a specific reason or avoiding possible edge cases in the future?

@pi0 pi0 closed this Jun 29, 2022
@pi0 pi0 reopened this Jun 29, 2022
@antfu
Copy link
Member

antfu commented Jun 30, 2022

Here are the cases that regexp can't possibly handle correctly (either incompletely removed, or overkill with quotes unmatched.

Context:

In addition:

  • strip-literal only does tokenizing instead of full parsing. It's probably the most efficient way to do this correctly.
  • While tokenizing requires the input to be valid JavaScript, I also introduced stripLiteralRegex as a fallback (implementation inherited from vite and unimport), you can either call it directly, or use stripLiteral() to do it with auto fallback.

@hubvue
Copy link
Contributor Author

hubvue commented Jun 30, 2022

Using strip-literal to filter noise is indeed a good option, but a problem arose during my testing.
example:

export { foobar } from 'foo2';

After strip-literal processing

export { foobar } from '     ';

Expected namedExports:

[{
    type: 'named',
    exports: ' bar ',
    specifier: 'foo2',
    code: "export { bar } from 'foo2';",
    start: 121,
    end: 148,
    names: [ 'bar' ]
}]

Received namedExports:

[{
    type: 'named',
    exports: ' bar ',
    specifier: undefined,
    code: 'export { bar }',
    start: 794,
    end: 808,
    names: [ 'bar' ]
}]

specifier got undefined

@hubvue
Copy link
Contributor Author

hubvue commented Jun 30, 2022

I have found that the presence of an export statement in the string also affects the result.

example:

const test1 = "export { ba1 } from 'foo2'"
const test2 = "testexport { bar2 } from 'foo2'"
const test3 = "test export { bar3 } from 'foo2'"
const test4 = "export { bar4 } from 'foo2' test"
const test5 = \`
  test1
  export { bar4 } from 'foo2' test
  test2
`

expected 0 got 4.

regexp matching is too cumbersome and perhaps AST is an option (but only if performance is accepted).

@pi0
Copy link
Member

pi0 commented Jun 30, 2022

Thanks for information @antfu. I guess performance would be reasonable and we can introduce two versions of find* utils for a regex-only extractor when performance/bundle is matter.

Can we maybe support a filter function from strip-literal? We want to avoid stripping strings that do not include an import for mlly case.

@antfu
Copy link
Member

antfu commented Jul 1, 2022

Can we maybe support a filter function from strip-literal?

While we could do it, I am not sure if there is a good way to detect whether a string should be kept or striped. Since strip-literal does not change the position of content, I would suggest maybe detecting the exports using the striped code, while reading the content using the original.

@hubvue
Copy link
Contributor Author

hubvue commented Jul 1, 2022

I would suggest maybe detecting the exports using the striped code, while reading the content using the original.

@antfu This is a great idea, I've been trying to figure out how to parse and filter out the export statements in the strings all morning and still haven't had a good result.

@hubvue hubvue requested a review from pi0 July 1, 2022 09:38
Copy link
Member

@pi0 pi0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes!

@hubvue hubvue requested a review from pi0 July 1, 2022 14:48
@hubvue
Copy link
Contributor Author

hubvue commented Jul 11, 2022

@pi0 Is there any progress on this PR?

@codecov
Copy link

codecov bot commented Jul 22, 2022

Codecov Report

Merging #56 (0a61617) into main (83502b4) will increase coverage by 0.36%.
The diff coverage is 76.00%.

@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   52.52%   52.89%   +0.36%     
==========================================
  Files          13       13              
  Lines        2115     2159      +44     
  Branches      171      179       +8     
==========================================
+ Hits         1111     1142      +31     
- Misses        842      847       +5     
- Partials      162      170       +8     
Impacted Files Coverage Δ
src/analyze.ts 80.68% <76.00%> (-3.41%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@pi0 pi0 changed the title fix: filter out commented code to eliminate the effect of regular mat… feat(findExports): use acorn tokenizer Aug 3, 2022
@pi0 pi0 changed the title feat(findExports): use acorn tokenizer feat(findExports): use acorn tokenizer to filter false positive exports Aug 3, 2022
@pi0
Copy link
Member

pi0 commented Aug 3, 2022

Thanks for working on this PR @hubvue <3

@pi0 pi0 merged commit 7039f54 into unjs:main Aug 3, 2022
@hubvue
Copy link
Contributor Author

hubvue commented Aug 3, 2022

@pi0 I feel very happy to be able to contribute to an open source project. 😄

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants