-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Add LastModified to FtpFile #147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR describes a use case related to Contents of the file being watched.
The change instead expose the previous state.Modified which has nothing to do with contents of the File, as part of the Retrieved file.
Can you please better explain why the GetResource() that is only responsible of Downloading the File should be extended to add a single property of the previous known state for this file?
If I guess correctly, what you're aiming for is to make the previous state available to Application code when Handling the file.
Also, I don't think what you are aiming for works. You have no guarantee that UpdateTime and the file Modified have any relationship. FTPs can modify files for reasons independent from the contents and in different moments.
Lastly, the Modified is not guaranteed to be sync with file Download: between when File is listed from FTP and is Downloaded, it may get written/update.
The only guarantee is that in this case, at next check, the library detects a new change as Modified changed and downloads the file again. But the checksum detects that is identical to last, and so is skipped after download. This to say that Modified and contents are unrelated.
I strongly advise to base on the Modified any logic relate to data contents.
The aim is to expose details of the previous state which can then be used by the application code to filter what is to be written. I was thinking that the UpdateTime does have a relationship with file modified date but if that's not true then this approach is doomed. The only additional thought is that maybe RetrievedAt from the last state could be used. That is the datetime when the file was last processed. |
You're still doomed, as file could be generated 10min before and then written. What you need to Track is the max UpdateTime and store as part of the State Extensions to reuse during Parsing or Handling the next run. But the Writer doesn't receive the Prev state, only the current as part of the Metadata. Adding this requires revisiting the Interfaces passing read-only to Consumers. But this change is not 'FTP' related, is Core: pass State tracking to Processors in addition to the Resource. That is a major revision. |
OK. So, there are a couple of things needed to enable this:
I'm not sure what the estimated effort would be to implement this and also what would be needed to justify the effort. |
I added some logs to the worker to collected some information about the modified timestamp and max(updatetime) from the Entso-e SFTP...
|
closed. The approach is not correct as is a narrow interface leak from a library perspective. |
For FtpWorkers the worker the FtpFile currently has,
The idea is to add LastModified to the FtpFile class where LastModified = laststate.Modified.
EntsoE SFTP files have a column which is the UpdateTime for each row in the file. Adding LastModified enables filtering of the data to write. The data to write would be rows with UpdateTime greater than LastModified.