za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 95: Invalid input '999.9999' at position 95:63 #727

Open
BharathBejoy opened this issue Nov 29, 2024 · 7 comments
Assignees: yruslan
Labels: accepted (Accepted for implementation), bug (Something isn't working)

Comments

@BharathBejoy

Background

I'm using the za.co.absa.cobrix:spark-cobol_2.12:2.7.9 Maven library in Azure Databricks to load COBOL (EBCDIC) files. It works very well for all files except one.

Error Message

Py4JJavaError: An error occurred while calling o462.load.
: za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 95: Invalid input '999.9999' at position 95:63

CPY File (Copybook)

image

Question

I'd appreciate some assistance in resolving this issue.

@BharathBejoy BharathBejoy added the question Further information is requested label Nov 29, 2024
@yruslan
Collaborator

yruslan commented Nov 30, 2024

Hi, thanks for the issue report! I'll try to reproduce it.

However, 'VALUE' clauses are ignored by Cobrix. It never validates that the corresponding column contains one of the listed values, so you can safely comment out or delete line 95 to make it work.

For the parser check, could I ask you to send me lines 92-96 of the copybook? I see '1942' there and can't tell what context it belongs to.

@BharathBejoy
Author

Hi @yruslan, please have a look at the attached image for the copybook lines you asked for.

image

@yruslan
Collaborator

yruslan commented Dec 2, 2024

I confirm this is a parser issue. The parser doesn't properly handle a dot as part of a number. As a workaround, you can replace +999.9999 with +9999999 or '+999.9999'. Let's keep this issue open; I will fix it if it is not too hard.
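The quoting workaround above can be applied without editing the copybook file by hand. The following is a minimal sketch, not part of Cobrix: `quote_literal` is a hypothetical helper, and the sample copybook line is made up for illustration. It wraps each unquoted occurrence of a given literal in single quotes and leaves already-quoted or longer literals alone.

```python
import re


def quote_literal(copybook_text: str, literal: str) -> str:
    """Wrap every unquoted occurrence of `literal` in single quotes."""
    pattern = re.compile(
        r"(?<![\w.'+-])"       # not preceded by a word char, dot, quote, or sign
        + re.escape(literal)   # the literal itself, matched verbatim
        + r"(?![\d'])"         # not followed by another digit or a closing quote
    )
    return pattern.sub(lambda _match: f"'{literal}'", copybook_text)


# Hypothetical copybook line (not taken from the reporter's copybook).
sample = "           88  RATE-MAX        VALUE +999.9999.\n"
print(quote_literal(sample, "+999.9999"))
```

The patched text would then need to be handed to Cobrix in place of the original copybook, for example by saving it to a new file and pointing the reader at that file. Since Cobrix ignores VALUE clauses, this should not change the loaded data.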

@yruslan yruslan added bug Something isn't working accepted Accepted for implementation and removed question Further information is requested labels Dec 2, 2024
@yruslan yruslan self-assigned this Dec 2, 2024
@yruslan
Collaborator

yruslan commented Dec 2, 2024

The grammar has support for dots in numeric literals:

NUMERICLITERAL: [0-9]* DOT? [0-9]+ (E SIGN_CHAR? [0-9]+)?;

I can reproduce the issue, but it is not obvious how to fix it, so I will keep it open for now.
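To see that the rule on its own accepts the literal, it can be transliterated into a Python regular expression. This is only a sketch: the definitions of `DOT`, `E`, and `SIGN_CHAR` are not shown in the thread, so the code assumes they stand for `.`, `E`/`e`, and `+`/`-` respectively. It says nothing about how the full lexer prioritises this rule against others, which the thread does not identify.

```python
import re

# Assumed fragment definitions (not shown in the thread):
#   DOT = '.', E = 'E' or 'e', SIGN_CHAR = '+' or '-'
NUMERIC_LITERAL = re.compile(r"[0-9]*\.?[0-9]+([Ee][+-]?[0-9]+)?")


def matches_numeric_literal(text: str) -> bool:
    """Return True if the whole string matches the transliterated rule."""
    return NUMERIC_LITERAL.fullmatch(text) is not None


for candidate in ["999.9999", "9999999", ".5", "999.", "1E+5"]:
    print(candidate, matches_numeric_literal(candidate))
```

Under these assumptions the rule matches `999.9999` in isolation, which is consistent with the fix not being obvious from the rule alone.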

@BharathBejoy
Author

Hi @yruslan, so the dot in numeric literals is causing this issue. Will replacing +999.9999 with +9999999 or '+999.9999' in the copybook affect or change the output data in any way? Please advise. Thank you!

@yruslan
Collaborator

yruslan commented Dec 2, 2024

Not at all. Changing that literal won't affect the data in any way. Cobrix ignores the VALUE clause.

It is an interesting issue. If the literal contains any digit other than 9, it passes. So you can safely replace +999.9999 with +999.99990, or with any other number.
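The observations in this thread can be combined into a rough check for which literals are likely to trip the parser: unquoted, containing a dot, and made up only of the digit 9. The sketch below encodes just that; `likely_affected` is a hypothetical helper based solely on the comments here and has not been checked against the Cobrix parser itself.

```python
def likely_affected(literal: str) -> bool:
    """Heuristic from this thread: unquoted, has a dot, and every digit is 9."""
    if literal.startswith(("'", '"')):
        return False  # quoted literals were reported to work
    digits = [ch for ch in literal if ch.isdigit()]
    return "." in literal and bool(digits) and all(ch == "9" for ch in digits)


for candidate in ["+999.9999", "+9999999", "'+999.9999'", "+999.99990"]:
    print(candidate, likely_affected(candidate))
```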

@yruslan
Collaborator

yruslan commented Dec 2, 2024

Hi @tr11, I hope you are doing well! This issue might be interesting to you too.
