Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve Batch iterator variable recognizing #348

Closed
wants to merge 1 commit into from

Conversation

lifenjoiner
Copy link
Contributor

The batch lexer has been better than ever since the recent commits (#332).
This one focuses on improving the iterator like variable recognizing.

Here is the test cases:

@echo off

::out of tokens range
for /f "tokens=1, 3 usebackq delims= " %%a in ('1 2 3 4') do (
    echo %%a %%b %%c %%~dpnd
)

:: . as text
for /L %%a in (1, 1, 3) do (
    for /L %%j in (5, 1, 6) do (
        echo 10.1.%%a.%%j
        echo 10.1.%%~a.%%j
    )
)

for %%i in (1 2 3) do for %%a in (7 8 9) do echo %%~a.%%i

:: change file name and extension
for %%a in (wget curl) do (
    echo %%a -h ^>%1/%%~na.txt
    echo %%a -h ^>%1/%%~na_help.txt
)

::*, remaining
for /f "tokens=2* usebackq" %%a in ('1 2 3 4') do (
    echo %%a
    echo %%b
    echo %%c
    echo %%~dpnd
)

::m-n, m <= n
for /f "tokens=1,3-4 usebackq" %%a in ('1 2 3 4') do (
    echo %%a
    echo %%b
    echo %%c
    echo %%~dpnd
)

::m-n, m > n
for /f "tokens=1,3-1 usebackq" %%a in ('1 2 3 4') do (
    echo %%a
    echo %%b
    echo %%c
    echo %%~dpnd
)

::not in order
for /f "tokens=1,4,02-5 usebackq" %%a in ('1 2 3 4') do (
    echo %%a
    echo %%b
    echo %%c
    echo %%~dpnd
)

::search pathes
for %%I in ("cmd.exe") do (
    echo %%~$PATH:I
)

for %%I in ("notepad.exe") do (
    echo %%~$windir:I
)

::search pathes, non-exist
for %%I in ("notepad.exe") do (
    echo.%%~$windir1:I
)

for %%I in ("notepad.exe") do (
    echo.%%~$ProgramData:I
)

::%
echo.%*

::CALL
echo %0
echo %~0
echo %~nx0
echo.%~dp$PATH:0

::special

::not variable
echo %abc

::%% escape if not valid variable
echo %%
echo %%0
echo %%~0
for %%I in ("notepad.exe") do (
    echo %%~$ProgramData
)

::^ escape
for %%^" in (1 2 3) do echo %%^"

for %%^" in ("cmd.exe") do (
    echo %%~$PATH:^"
)

exit /b

::%% escape
:removequotes
FOR /F "delims=" %%A IN ('echo %%%1%%') DO set %1=%%~A
GOTO :eof

::error
echo %~*
echo %~dpX

Before and After:
RecognizeVar

@zufuliu
Copy link
Owner

zufuliu commented Jun 27, 2021

I don't know whether it's worth the complex to fix for loop variables, I will need some time to understand the code.
One complex thing is do is is followed by a command, parentheses just makes command group, e.g. following code is valid:

@echo off

::out of tokens range
for /f "tokens=1 usebackq delims= " ^
%%a ^
in (
    '1 2 
    3 4'
) ^
do ^
if "%~1" == "" ^
echo %%a %%b %%c %%~dpnd

:: . as text
for /L %%a in (1, 1, 3) do for /L %%j in (5, 1, 6) do (
    echo 10.1.%%a.%%j
    echo 10.1.%%~a.%%j
)

@lifenjoiner
Copy link
Contributor Author

Besides the breaking line case, one more example:

for %%^" in ("cmd.exe") do (
    echo %%^"
    echo %%^~^d^p^n^x^"
)

Although it is not a usual case, but it is valid.
^ is an special case that I'm also thinking about (another PR?).

@lifenjoiner
Copy link
Contributor Author

BTW:

reg delete "HKCR\.fdf" /f
reg delete "HKCR\AppID\{GUID}" /f

\ is escape char for some special commands, normally, it isn't.

@zufuliu
Copy link
Owner

zufuliu commented Jun 27, 2021

echo %%^~^d^p^n^x^" can be ignored, I think no people write code like that.

\ can be changed to only handle double quotes and backslash:
if (command == Command::Escape && (sc.chNext == '\\' || sc.chNext == '"')).

GetCurrentBlockStart() can be replaced with following changes.

if (startPos != 0) {
	// backtrack to the line starts outermost command.
	BacktrackToStart(styler, BatchLineStateLineContinuation | (0xff << 8), startPos, length, initStyle);
}

StyleContext sc(startPos, length, initStyle, styler);

after backtracking, other line states no longer make sense.

@lifenjoiner
Copy link
Contributor Author

Updated:
Breaking line FOR clause case is fixed.
BacktrackToStart is great!

Left:
if (command == Command::Escape && (sc.chNext == '\\' || sc.chNext == '"')) doesn't work because of:
https://github.com/zufuliu/notepad2/blob/e0fc12c0930fd17074fd50c3cd66a4da384bb77b/scintilla/lexers/LexBatch.cxx#L304

@lifenjoiner lifenjoiner marked this pull request as draft June 27, 2021 13:44
@zufuliu
Copy link
Owner

zufuliu commented Jun 28, 2021

Please rebase the code, I update variable highlighting in be819e7.

I think GetValidIterator() is wrong: extra allocated variables is counted by commas (max 30) after tokens=.
following is Python script to generate all ASCII variables:

with open('test.bat', 'w', newline='\r\n') as fd:
	tokens = ','.join(map(str, range(1, 31)))
	items = ' '.join(map(str, range(1, 35)))
	fd.write(f"""@echo off
SetLocal EnableExtensions EnableDelayedExpansion

for /F "tokens={tokens},*" %%! ^
in ("{items}") ^
do (
""")

	escape_map = {
		'%': '%',

		')': '^)',
		'&': '^&',
		'|': '^|',
		'<': '^<',
		'>': '^>',
	}

	lines = []
	for code in range(33, 127):
		ch = chr(code)
		esc = escape_map.get(ch, '')
		if esc:
			if esc[0] == '^':
				lines.append(f'echo {esc}: %%{esc}')
			else:
				lines.append(f':: invalid {esc}')
		elif ch == '!':
			lines.append(f'echo ^^!: %%!')
		elif ch == '^':
			lines.append(f'echo ^^: %%^^')
		else:
			lines.append(f'echo {ch}: %%{ch}')

	fd.write('\t')
	fd.write('\n\t'.join(lines))
	fd.write("\n)\n")

and the generated result:

@echo off
SetLocal EnableExtensions EnableDelayedExpansion

for /F "tokens=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,*" %%! ^
in ("1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34") ^
do (
	echo ^^!: %%!
	echo ": %%"
	echo #: %%#
	echo $: %%$
	:: invalid %
	echo ^&: %%^&
	echo ': %%'
	echo (: %%(
	echo ^): %%^)
	echo *: %%*
	echo +: %%+
	echo ,: %%,
	echo -: %%-
	echo .: %%.
	echo /: %%/
	echo 0: %%0
	echo 1: %%1
	echo 2: %%2
	echo 3: %%3
	echo 4: %%4
	echo 5: %%5
	echo 6: %%6
	echo 7: %%7
	echo 8: %%8
	echo 9: %%9
	echo :: %%:
	echo ;: %%;
	echo ^<: %%^<
	echo =: %%=
	echo ^>: %%^>
	echo ?: %%?
	echo @: %%@
	echo A: %%A
	echo B: %%B
	echo C: %%C
	echo D: %%D
	echo E: %%E
	echo F: %%F
	echo G: %%G
	echo H: %%H
	echo I: %%I
	echo J: %%J
	echo K: %%K
	echo L: %%L
	echo M: %%M
	echo N: %%N
	echo O: %%O
	echo P: %%P
	echo Q: %%Q
	echo R: %%R
	echo S: %%S
	echo T: %%T
	echo U: %%U
	echo V: %%V
	echo W: %%W
	echo X: %%X
	echo Y: %%Y
	echo Z: %%Z
	echo [: %%[
	echo \: %%\
	echo ]: %%]
	echo ^^: %%^^
	echo _: %%_
	echo `: %%`
	echo a: %%a
	echo b: %%b
	echo c: %%c
	echo d: %%d
	echo e: %%e
	echo f: %%f
	echo g: %%g
	echo h: %%h
	echo i: %%i
	echo j: %%j
	echo k: %%k
	echo l: %%l
	echo m: %%m
	echo n: %%n
	echo o: %%o
	echo p: %%p
	echo q: %%q
	echo r: %%r
	echo s: %%s
	echo t: %%t
	echo u: %%u
	echo v: %%v
	echo w: %%w
	echo x: %%x
	echo y: %%y
	echo z: %%z
	echo {: %%{
	echo ^|: %%^|
	echo }: %%}
	echo ~: %%~
)

@lifenjoiner
Copy link
Contributor Author

Updated, and tested with the attached cases and many .bat/.cmd files.
Test-BatIteratorVar.zip

2 special cases was found:

for /F "tokens=1-31,*" %%^" ^
in ("1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34") ^
do (
	echo ": %%"
	echo ": %%^"
	echo ^": %%^"
)

I don't know why it is not variable in echo ": %%^".

for /f "tokens=%msys2_shiftCounter%,* delims=,;=	 " %%i in ("!msys2_full_cmd!") do set SHELL_ARGS=%%j

Dynamic tokens. Should all possible ones be treated as variables or not?

@lifenjoiner lifenjoiner marked this pull request as ready for review June 30, 2021 12:04
@zufuliu
Copy link
Owner

zufuliu commented Jun 30, 2021

I don't know why it is not variable in echo ": %%^"

I seems we to partial revert my previous commit be819e7, especial changes to IsVariableChar() and IsVariableEscapeChar().
I will remove the while loop inside DetectBatchVariable() (change it to varQuoteChar = ':';), sine $ is followed by an environment variable, which can contains any characters.

Dynamic tokens. Should all possible ones be treated as variables or not?

Sine we don't expand variables, stick with commas is a simple choice.

@zufuliu
Copy link
Owner

zufuliu commented Jun 30, 2021

I just tested,echo ^": %%^" works, but echo ": %%^" not, so IsVariableEscapeChar() needs update to handle enclosing quotes.

@lifenjoiner
Copy link
Contributor Author

In the example, only ": %%^" really isn't treated with a variable by cmd. Run the code, you will see.

@zufuliu
Copy link
Owner

zufuliu commented Jun 30, 2021

Also, both for /F "tokens=1-31,*" %%= ^ and for /F "tokens=1-31,*" %%^= ^ not work, but echo %%= and %%^= works, likely other punctuation ,;= originally excluded from IsVariableChar() has same problem (can not explicitly declared after for).

@zufuliu
Copy link
Owner

zufuliu commented Jun 30, 2021

Updated code in 398ffe1, ,;= are disabled after for. you can change GetIteratorRange() to use text between for and in keywords.

@lifenjoiner
Copy link
Contributor Author

Working on different branches for the same feature always generates conflicts. It would take time to resolve the conflicts, reconstruct codes, and redo tests.

@zufuliu
Copy link
Owner

zufuliu commented Jul 1, 2021

Working on different branches for the same feature always generates conflicts. It would take time to resolve the conflicts, reconstruct codes, and redo tests.

I'm sorry for that, I'm going to work on issue #33.

@lifenjoiner
Copy link
Contributor Author

Appreciate! I'm refactoring my previous DetectBatchVariable procedure, for new special cases.

@lifenjoiner
Copy link
Contributor Author

More cases are explored and covered.
Test-NewSpecialCases.zip

@zufuliu
Copy link
Owner

zufuliu commented Jul 5, 2021

I think the code can be simplified:

  1. "tokens=" must be placed after for on the same line. we can record the position after for keyword, then in DetectBatchVariable(), when command == Command::For and sc.state == SCE_BAT_DEFAULT and sc.ch == '%' && sc.chNext='%', the characters after %% is the declared variable for current for loop (%%variable can be placed on next line). Options text can retrieved from recorded position after for and current position with std::string GetRange(Sci_PositionU startPos_, Sci_PositionU endPos_).
  2. std::set can be avoided, the max valid token value is 31, so a 32 bit number (uint32_t) is enough. GetIteratorRange() can be simplified (need tests to find out correct rules) to (1) compute max token value (2) count group count (commas + trailing *) (3) take the max of the two as implicit declared variable count.
@echo off
for /f "tokens=1,3" %%A in ("1 2 3 4 5 6 7 8 9") do echo %%A %%B %%C %%D %%E %%F
for /f "tokens=1,3*" %%A in ("1 2 3 4 5 6 7 8 9") do echo %%A %%B %%C %%D %%E %%F
for /f "tokens=1-2,3*" %%A in ("1 2 3 4 5 6 7 8 9") do echo %%A %%B %%C %%D %%E %%F

output

1 3 %C %D %E %F
1 3 4 5 6 7 8 9 %D %E %F
1 2 3 4 5 6 7 8 9 %E %F
  1. code refactoring: move DetectBatchVariable(), the case SCE_BAT_VARIABLE: block (turn into a function) and related local/global variables into a structure.

@lifenjoiner
Copy link
Contributor Author

  1. std::set can be avoided, the max valid token value is 31, so a 32 bit number (uint32_t) is enough.

I thought about it, and chose std::set because of:
(1) It is like re-implementing the std::bitset, but partly.
(2) Header file bitset is bigger than set. I'm not sure it is simpler than set.
(3) I think using standard containers is encouraged, if there isn't regression. scintilla uses set. They can be shared, aren't they?

  1. "tokens=" must be placed after for

It is, after keyword for is detected.

on the same line

for ^
/f "Tokens=2" %%a in ("1 2 3 4 5") do (
    echo 10.1.1.%%a
)

This special case works. Thanks to we are not going to implement a real batch file interpreter, not all quirky cases need to be covered. Is there any failures by current solution?
When % appears, it is going to be detected if it stands for a valid variable. If we do GetIteratorRange at this time, as you say a "global" variable for FOR position should be used, and more needs to be considered ...

  1. code refactoring: move DetectBatchVariable(), the case SCE_BAT_VARIABLE: block (turn into a function) and related local/global variables into a structure.

I don't understand.

Seems a total regenerating is needed. I'm trying to understand your suggestions, but not sure if I can catch up with you ...

@zufuliu
Copy link
Owner

zufuliu commented Jul 6, 2021

DetectBatchVariable has too many parameters, and you introduced a global variable (which should be avoid).
the global ValidIterators, parameter outerStyle and varQuoteChar can be putted into a structure, so finally result looks like xxx.DetectVariable(sc, command, parenCount) or xxx.Detect(sc, command, parenCount)

@lifenjoiner
Copy link
Contributor Author

I'm not experienced in C++. I'll have a try.

@lifenjoiner
Copy link
Contributor Author

Rebased and refactored.

@zufuliu zufuliu added this to the v4.21.09 milestone Jul 9, 2021
@lifenjoiner
Copy link
Contributor Author

New conflict is resolved.

@zufuliu zufuliu removed this from the v4.21.09 milestone Aug 22, 2021
@zufuliu
Copy link
Owner

zufuliu commented Aug 22, 2021

Thanks for the update. I feel it's not urgently to improve/fix highlighting for %% (as escape char or variable), so will handle this later.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants