Parse input file meta-data #1289

dfaller · 2018-08-23T14:42:35Z

Parse meta-data sidecar files for input files when they are provided in the output directory under a directory/file name structure that matches the input file(s). Allow meta-data to be set only ONCE on a Scale file; log a warning on any later attempt.
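A minimal sketch of the set-once rule described above, assuming a hypothetical `FileRecord` class and `set_meta_data_once` method. Neither name comes from the Scale code base; they only illustrate the "first write wins, later writes log a warning" behavior.

```python
import logging

logger = logging.getLogger(__name__)


class FileRecord:
    """Hypothetical stand-in for a Scale file record; not Scale's actual model."""

    def __init__(self, file_name):
        self.file_name = file_name
        self.meta_data = None  # None means meta-data has never been set

    def set_meta_data_once(self, meta_data):
        """Store meta-data only if none is set yet; return True when stored."""
        if self.meta_data is not None:
            # A second attempt is ignored and only logged.
            logger.warning("Meta-data already set on %s; ignoring new values",
                           self.file_name)
            return False
        self.meta_data = dict(meta_data)
        return True
```

In this sketch a second call leaves the stored values untouched and returns `False`, so the caller can tell whether its sidecar was applied.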

This replaces the old "parse results" capability in Scale. The current "parse results" capability still needs to remain intact.

Also fix the special url meta-data field that Scale sets on a product's or source file's meta-data. Right now the url appears to always be set to null.

gisjedi · 2019-10-16T15:33:11Z

We discussed this today with @cshamis as a need to complete the replacement of the pre-existing results_manifest.json parse_results capability in Scale 7. The current plan is to simply use the new Seed JSON output capture construct (seed.outputs.json) as the vehicle to consume parse_results. This will allow us to tie off this issue and move on.

gisjedi · 2019-10-18T03:42:08Z

It wasn't really any more work to use the sidecar concept, and doing so didn't introduce another one-off. Metadata can be added to the source file for an INPUT_FILE on the interface by writing to $OUTPUT_DIR/INPUT_FILE.metadata.json.

Add the ability to capture supplemental metadata to attach to SOURCE files. The only reason for this is to let domain-specific jobs augment the metadata associated with files; the Scale Ingest doesn't have the smarts for it, nor should it.

We are expanding the use of the metadata sidecar file to cover input files, with some small differences. Since we can't guarantee writable storage within the input location, the sidecar files must be written to the output directory. Details on the format and location of the metadata JSON can be found in the documentation here: https://github.com/ngageoint/scale/wiki/Scale-Job-Inputs-and-Outputs#source-input-metadata

The sidecar is named after the input name defined on the interface, rather than after the input file itself, so that algorithm developers can know its location at build time.

We do not support metadata updates for multiple-file inputs. We also limit updates to SOURCE file_types. This includes the addition of Postman tests and a new Seed job to demonstrate source metadata capture.
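As a rough illustration of the naming convention above, a job could write its sidecar like this. The helper name `write_input_metadata` is hypothetical, and the payload passed in the usage line is only a placeholder; the required JSON format is the one described in the linked wiki page, not something this sketch defines.

```python
import json
import os


def write_input_metadata(input_name, metadata, output_dir=None):
    """Write <output_dir>/<input_name>.metadata.json and return its path.

    input_name is the input's name on the job interface (e.g. INPUT_FILE),
    not the name of the file that was passed in.
    """
    # Fall back to the OUTPUT_DIR environment variable when no directory is given.
    output_dir = output_dir or os.environ["OUTPUT_DIR"]
    path = os.path.join(output_dir, input_name + ".metadata.json")
    with open(path, "w") as handle:
        json.dump(metadata, handle)
    return path
```

For example, `write_input_metadata("INPUT_FILE", {"example_key": "example_value"})` would create `$OUTPUT_DIR/INPUT_FILE.metadata.json`, assuming `OUTPUT_DIR` is set in the job's environment.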

dfaller added storage jobs and recipes labels Aug 23, 2018

dfaller added this to the Backlog milestone Aug 23, 2018

JohnPTobe self-assigned this Dec 13, 2018

gisjedi assigned gisjedi and unassigned JohnPTobe Oct 16, 2019

gisjedi mentioned this issue Oct 16, 2019

metadata metadata #1750

Open

emimaesmith modified the milestones: Backlog, Sprint 7 Oct 16, 2019

gisjedi closed this as completed Oct 23, 2019
