[Bugfix] Improve Exception handling #23
Could you paste the results before and after?
I think this PR has no problems, but I want to consider which is better.
If the latter is better, I think one more option should be added, like
Before
stop_on_invalid_record: true
Plugin throws
$ embulk run /path/to/config.yml
org.embulk.exec.PartialExecutionException: java.lang.NumberFormatException: For input string: "{"k1":"v1"}"
...
Caused by: java.lang.NumberFormatException: For input string: "{"k1":"v1"}"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.embulk.filter.expand_json.FilteredPageOutput.setExpandedJsonColumns(FilteredPageOutput.java:311)
at org.embulk.filter.expand_json.FilteredPageOutput.add(FilteredPageOutput.java:214)
at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:394)
at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:319)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Error: java.lang.NumberFormatException: For input string: "{"k1":"v1"}"
stop_on_invalid_record: false
Actual: Same result as
After
stop_on_invalid_record: true
Plugin throws
$ embulk run /path/to/config.yml
org.embulk.exec.PartialExecutionException: org.embulk.spi.DataException: Found an invalid record
...
Caused by: org.embulk.spi.DataException: Found an invalid record
at org.embulk.filter.expand_json.FilteredPageOutput.add(FilteredPageOutput.java:222)
at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:394)
at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:319)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.embulk.filter.expand_json.FilteredPageOutput$JsonValueInvalidException: Failed to parse '{"k1":"v1"}' as double
at org.embulk.filter.expand_json.FilteredPageOutput.setExpandedJsonColumns(FilteredPageOutput.java:317)
at org.embulk.filter.expand_json.FilteredPageOutput.add(FilteredPageOutput.java:216)
... 6 more
Caused by: java.lang.NumberFormatException: For input string: "{"k1":"v1"}"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.embulk.filter.expand_json.FilteredPageOutput.setExpandedJsonColumns(FilteredPageOutput.java:314)
... 7 more
Error: org.embulk.spi.DataException: Found an invalid record
stop_on_invalid_record: false
Plugin shows warnings but successfully executed.
$ embulk run /path/to/config.yml
...
2016-07-27 13:28:10.482 +0900 [WARN] (embulk-output-executor-0): Skipped an invalid record (Failed to parse '{"k1":"v1"}' as double)
2016-07-27 13:28:10.487 +0900 [WARN] (embulk-output-executor-0): Skipped an invalid record (Failed to parse '{"k1":"v1"}' as double)
2016-07-27 13:28:10.490 +0900 [WARN] (embulk-output-executor-0): Skipped an invalid record (Failed to parse '{"k2":1.23}' as double)
2016-07-27 13:28:10.496 +0900 [WARN] (embulk-output-executor-0): Skipped an invalid record (Failed to parse '{"k2":3.14}' as double)
2016-07-27 13:28:10.502 +0900 [WARN] (embulk-output-executor-0): Skipped an invalid record (Failed to parse '{"k3":1}' as double)
2016-07-27 13:28:10.504 +0900 [INFO] (0001:transaction): {done: 1 / 1, running: 0}
2016-07-27 13:28:10.514 +0900 [INFO] (main): Committed.
2016-07-27 13:28:10.515 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/path/to/test.tsv"},"out":{}}
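The "After" behavior shown in these logs can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `DataException` here is a local stand-in for `org.embulk.spi.DataException` so the example compiles without Embulk, `JsonValueInvalidException` is assumed to extend it (the name and messages are taken from the stack trace and warnings above), and `InvalidRecordSketch`, `parseAsDouble`, and `process` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for org.embulk.spi.DataException (assumption: lets the sketch compile without Embulk).
class DataException extends RuntimeException {
    DataException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Assumed to be a DataException subclass, named after the exception in the stack trace.
class JsonValueInvalidException extends DataException {
    JsonValueInvalidException(String message, Throwable cause) {
        super(message, cause);
    }
}

class InvalidRecordSketch {
    // Hypothetical helper: wrap the JDK's NumberFormatException instead of letting it escape.
    static double parseAsDouble(String raw) {
        try {
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            throw new JsonValueInvalidException(
                    String.format("Failed to parse '%s' as double", raw), e);
        }
    }

    // Hypothetical driver: either stop on the first invalid value or skip it with a warning.
    static List<Double> process(List<String> rawValues,
                                boolean stopOnInvalidRecord,
                                List<String> warnings) {
        List<Double> parsed = new ArrayList<>();
        for (String raw : rawValues) {
            try {
                parsed.add(parseAsDouble(raw));
            } catch (JsonValueInvalidException e) {
                if (stopOnInvalidRecord) {
                    throw new DataException("Found an invalid record", e);
                }
                warnings.add(String.format("Skipped an invalid record (%s)", e.getMessage()));
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        List<String> input = Arrays.asList("1.5", "{\"k1\":\"v1\"}", "2");
        // With stopOnInvalidRecord = false, the invalid value is skipped and reported.
        System.out.println(process(input, false, warnings));
        warnings.forEach(System.out::println);
    }
}
```

With `stopOnInvalidRecord` set to `true`, the same input would instead raise a `DataException` whose cause chain ends in the original `NumberFormatException`, matching the nested "Caused by" entries above.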
In my use case, I think 1 is better.
OK, I got it. Actually, I do not need that option in my case. I just thought that option would be useful in some cases (for example, when importing from a schema-less datastore where some values are occasionally invalid). If someone wants it, we can consider this again.
Thanks 👍
The current implementation throws NumberFormatException if a long value arrives in a column defined as double. When Embulk is used from another program (e.g. EmbulkEmbed), that program cannot tell whether the exception is retryable, so it retries indefinitely.
I changed the code to throw a subclass of DataException when an invalid value arrives.
DataException and ConfigException are not retryable exceptions in Embulk, so the program will be able to stop retrying.
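As a rough illustration of why this matters to a calling program, here is a sketch of a caller-side retry loop that gives up as soon as it sees a non-retryable exception. Everything here is an assumption made for illustration: `DataException` and `ConfigException` are local stand-ins for the Embulk classes, and `RetrySketch`, `isRetryable`, and `runWithRetry` are hypothetical names, not part of Embulk or EmbulkEmbed.

```java
import java.util.concurrent.Callable;

// Stand-ins for org.embulk.spi.DataException and org.embulk.config.ConfigException
// (assumption: lets the sketch compile without Embulk on the classpath).
class DataException extends RuntimeException {
    DataException(String message) {
        super(message);
    }
}

class ConfigException extends RuntimeException {
    ConfigException(String message) {
        super(message);
    }
}

class RetrySketch {
    // Hypothetical policy: anything caused by a DataException or ConfigException is not retried.
    static boolean isRetryable(Throwable t) {
        // Walk the cause chain, since the traces show the exception wrapped by an outer one.
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof DataException || c instanceof ConfigException) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical retry loop: bounded attempts, immediate stop on non-retryable failures.
    static <T> T runWithRetry(Callable<T> job, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return job.call();
            } catch (Exception e) {
                if (!isRetryable(e)) {
                    throw e; // bad data or bad config will not fix itself
                }
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        try {
            runWithRetry(() -> {
                throw new RuntimeException("wrapper", new DataException("Found an invalid record"));
            }, 5);
        } catch (Exception e) {
            System.out.println("Stopped without retrying: " + e.getCause().getMessage());
        }
    }
}
```

Under this policy a bare NumberFormatException would still look retryable to the caller, which is the situation the change avoids by wrapping it in a DataException subclass.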
Appendix
I tested with the following config.yml and TSV file.
config.yml
test.tsv