Parse input file meta-data #1289

Closed
dfaller opened this issue Aug 23, 2018 · 2 comments

Comments


dfaller commented Aug 23, 2018

Parse meta-data sidecar files for input files when they are provided in the output directory, using a directory/file name structure that matches the input file(s). Only allow meta-data to be set ONCE on a Scale file (log a warning).

This replaces the old "parse results" capability in Scale. The current "parse results" capability still needs to remain intact.

Also fix the special url meta-data field that Scale sets on a product's or source file's meta-data. Right now the url appears to always be set to null.
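The set-once rule requested above could be sketched roughly as follows. This is only an illustration, not Scale's implementation: `ScaleFileRecord` and `set_meta_data_once` are hypothetical names, and the assumption that the warning is logged when a second attempt is rejected is a reading of the request rather than something the issue spells out.

```python
import logging

logger = logging.getLogger(__name__)


class ScaleFileRecord:
    """Hypothetical stand-in for a Scale file record; not Scale's actual model."""

    def __init__(self, file_name, meta_data=None):
        self.file_name = file_name
        self.meta_data = meta_data or {}


def set_meta_data_once(scale_file, new_meta_data):
    """Attach meta-data only if none is set yet; otherwise warn and skip.

    Returns True if the meta-data was applied, False if it was skipped.
    """
    if scale_file.meta_data:
        # Assumption: a repeated attempt is ignored and reported as a warning.
        logger.warning('Meta-data already set on %s; ignoring update.',
                       scale_file.file_name)
        return False
    # Copy the dict so later changes by the caller do not leak into the record.
    scale_file.meta_data = dict(new_meta_data)
    return True
```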

dfaller added this to the Backlog milestone Aug 23, 2018
JohnPTobe self-assigned this Dec 13, 2018
gisjedi assigned gisjedi and unassigned JohnPTobe Oct 16, 2019

gisjedi commented Oct 16, 2019

We discussed this today with @cshamis as a need to complete the replacement of the pre-existing results_manifest.json parse_results capability in Scale 7. The current plan is to simply use the new Seed JSON output capture construct (seed.outputs.json) as the vehicle to consume parse_results. This will allow us to tie off this issue and move on.

emimaesmith modified the milestones: Backlog, Sprint 7 Oct 16, 2019

gisjedi commented Oct 18, 2019

It wasn't really any more work to use the sidecar concept, and it didn't introduce another one-off. Adding data on source files can be done for an INPUT_FILE on the interface by writing to $OUTPUT_DIR/INPUT_FILE.metadata.json.
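From the job's side, writing that sidecar might look something like the sketch below. `write_input_metadata` is a hypothetical helper, and the payload in the usage note is a placeholder only; the actual JSON format Scale expects is not described in this thread, so check the project documentation before relying on any particular keys.

```python
import json
import os


def write_input_metadata(input_name, metadata, output_dir=None):
    """Write <output_dir>/<input_name>.metadata.json and return its path.

    input_name is the input's name on the job interface (e.g. 'INPUT_FILE'),
    not the name of the file that was actually passed in.
    """
    # Fall back to the OUTPUT_DIR environment variable when no directory is given.
    output_dir = output_dir or os.environ['OUTPUT_DIR']
    sidecar_path = os.path.join(output_dir, input_name + '.metadata.json')
    with open(sidecar_path, 'w') as sidecar:
        json.dump(metadata, sidecar)
    return sidecar_path
```

A job would then call something like `write_input_metadata('INPUT_FILE', {'example_key': 'example_value'})` before exiting, with the dictionary replaced by whatever the documented format requires.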

gisjedi added a commit that referenced this issue Oct 23, 2019
Add the ability to capture supplemental metadata to attach to SOURCE files. The only reason for this is to support augmentation of file metadata that domain-specific jobs can provide; the Scale Ingest doesn't have the smarts for it, nor should it.

We are expanding on the use of the metadata sidecar file for input files. There are some small differences. Since we can't guarantee writable storage within the input location, we must write them to the output directory. Details on the format and location of the metadata json can be found in the documentation here: https://github.com/ngageoint/scale/wiki/Scale-Job-Inputs-and-Outputs#source-input-metadata

We name the sidecar after the input name defined on the interface, rather than after the actual input file name, because that lets algorithm developers know the sidecar location at build time.

We do not support metadata updates for multiple-file inputs. We also limit updates to SOURCE file_types.
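The two restrictions above amount to a small eligibility check, sketched here under assumptions: `accepts_metadata_update` is a hypothetical name, "multiple file inputs" is read as interface inputs that accept more than one file, and the file type is taken to be a plain string such as `'SOURCE'`.

```python
def accepts_metadata_update(file_type, input_is_multiple):
    """Return True only for a single-file input whose file is of type SOURCE."""
    # Assumption: any file_type other than 'SOURCE' is rejected outright.
    return file_type == 'SOURCE' and not input_is_multiple
```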

Includes the addition of postman tests and a new Seed job to demonstrate source metadata capture.
gisjedi closed this as completed Oct 23, 2019
4 participants