Error getting DrexelMetadata in episode 10 #10

thompsonmj · 2023-03-16T18:31:35Z

(/fs/ess/PAS2136/Workshops/Snakemake/conda_env) [thompsonmj@o0647 SnakemakeWorkflow]$ snakemake -c1 --use-singularity DrexelMetadata/bj373514.json
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
generate_metadata        1              1              1
total                    1              1              1
Select jobs to execute...
[Thu Mar 16 14:26:09 2023]
rule generate_metadata:
    input: Images/bj373514.jpg
    output: DrexelMetadata/bj373514.json, Mask/bj373514_mask.png
    log: logs/generate_metadata_bj373514.log
    jobid: 0
    reason: Missing output files: DrexelMetadata/bj373514.json
    wildcards: image=bj373514
    resources: tmpdir=/tmp/slurmtmp.23961738
Activating singularity image /users/PAS2136/thompsonmj/SnakemakeWorkflow/.snakemake/singularity/48c2d571fde349f4656aa5ab95dccc30.simg
WARNING: Environment variable LD_PRELOAD already has value [], will not forward new value [/usr/local/xalt/xalt/lib64/libxalt_init.so] from parent process environment
Waiting at most 5 seconds for missing files.
MissingOutputException in rule generate_metadata in file https://raw.githubusercontent.com/hdr-bgnn/BGNN_Core_Workflow/1.0.0/workflow/Snakefile, line 19:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
Mask/bj373514_mask.png
Removing output files of failed job generate_metadata since they might be corrupted:
DrexelMetadata/bj373514.json
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

johnbradley · 2023-03-17T13:20:43Z

@thompsonmj Could you check the log mentioned above, logs/generate_metadata_bj373514.log, to see if it contains anything helpful?
The error says that gen_metadata.py created DrexelMetadata/bj373514.json but not Mask/bj373514_mask.png.
What does your Snakefile look like right now?
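
As a quick way to see which of the declared outputs exist on disk, a minimal Python sketch along these lines could be run from the workflow directory. The helper name `missing_outputs` is hypothetical; the two paths are taken from the log above. Note that Snakemake removes the remaining outputs of a failed job, so after the run shown above both files may be reported missing.

```python
from pathlib import Path


def missing_outputs(expected, root="."):
    """Return the declared output paths that do not exist under root."""
    base = Path(root)
    return [p for p in expected if not (base / p).is_file()]


if __name__ == "__main__":
    # Outputs declared by generate_metadata for image=bj373514 (from the log).
    outputs = ["DrexelMetadata/bj373514.json", "Mask/bj373514_mask.png"]
    for path in missing_outputs(outputs):
        print("missing:", path)
```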

johnbradley · 2023-03-18T12:03:53Z

@thompsonmj Were you able to figure out what caused your problem? If not, you could also check Images/bj373514.jpg to see if it's a valid image.
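
One rough way to test whether the file is a JPEG at all is to look at its first bytes. The sketch below is only a sanity check under that assumption (it does not decode the image), and the function name `looks_like_jpeg` is hypothetical. A text file such as a download log would fail this check.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG data starts with the SOI marker FF D8, then FF


def looks_like_jpeg(path):
    """Return True if the file begins with the JPEG magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(3) == JPEG_MAGIC


# Example usage (path taken from the log above):
# print(looks_like_jpeg("Images/bj373514.jpg"))
```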

johnbradley · 2023-03-18T13:27:15Z

This problem could be caused by a typo in the download_image rule:

rule download_image:
    params: url=get_image_url
    output: "Images/bj373514.jpg"
    shell: "wget -O {output} {params.url}"

If you used a lowercase -o instead, wget would write the log of the download to the output file rather than the image.
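
To spot this kind of typo mechanically, a small sketch like the following could scan a rule's shell string. The function name `wget_log_flag_used` is hypothetical, and the check is deliberately simple: it only recognizes a standalone `-o` token, not combined short options.

```python
import shlex


def wget_log_flag_used(command):
    """Return True if a wget command uses lowercase -o (log file) as a token."""
    tokens = shlex.split(command)
    # Only a standalone "-o" is detected; "-O" (output document) is fine.
    return bool(tokens) and tokens[0] == "wget" and "-o" in tokens[1:]


# Example usage with the shell string from the rule above:
# wget_log_flag_used("wget -O {output} {params.url}")
```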

thompsonmj · 2023-03-18T14:01:13Z

I'll check on this today. I had accidentally overwritten my Snakefile by copying your solution, but I got it recovered, so I'll check whether it was that typo or something else. The solution does work fine, though.

thompsonmj · 2023-03-19T01:15:21Z

Here was the Snakefile I had built up after going through the episodes:

import pandas as pd

def get_image_url(wildcards):
        filename = config["filter_multimedia"]
        df = pd.read_csv(filename)
        row = df[df["arkID"] == wildcards.ark_id]
        url = row["accessURI"].item()
        return url

def get_image_filenames(wildcards):
	filename = config["filter_multimedia"]
	df = pd.read_csv(filename)
	ark_ids = df["arkID"].tolist()
	return expand("Images/{ark_id}.jpg", ark_id=ark_ids)

configfile: "config.yaml"

rule all:
	input: get_image_filenames

rule reduce:
	input: "multimedia.csv"
	params: rows="11"
	output: "reduce/multimedia.csv"
	resources:
		mem_mb=200
	shell: "head -n {params.rows} {input} > {output}"

rule download_image:
	input: config["filter_multimedia"]
	params: url=get_image_url
	output: "Images/{ark_id}.jpg"
	container: "docker://quay.io/biocontainers/gnu-wget:1.18--h60da905_7"
	shell: "wget -O {output} {params.url}"

checkpoint filter:
	input:
		script = "Scripts/FilterImages.R",
		fishes = config["reduce_multimedia"]
	output: config["filter_multimedia"]
	shell: "Rscript {input.script}"

module bgnn_core:
	snakefile:
		github("hdr-bgnn/BGNN_Core_Workflow", path="workflow/Snakefile", tag="1.0.0")

use rule generate_metadata from bgnn_core
use rule transform_metadata from bgnn_core
use rule crop_image from bgnn_core
use rule segment_image from bgnn_core

def get_summary_inputs(wildcards):
	filename = checkpoints.filter.get().output[0]
	df = pd.read_csv(filename)
	ark_ids = df["arkID"].tolist()
	return expand('Segmented/{arkID}_segmented.png', arkID=ark_ids)

rule summary:
	input:
		scripts="Scripts/SummaryReport.R",
		markdown="Scripts/Summary.Rmd",
		morphology=get_summary_inputs
	output: config["summary_report"]
	container: "docker://ghcr.io/rocker-org/tidyverse:4.2.2"
	shell: "Rscript {input.script}"

For comparison, here is the solution Snakefile:

import pandas as pd

configfile: "config.yaml"

rule all:
    input: config["summary_report"]

rule reduce:
    input: "multimedia.csv"
    params: rows="11"
    output: "reduce/multimedia.csv"
    shell: "head -n {params.rows} {input} > {output}"

checkpoint filter:
    input:
        script="Scripts/FilterImages.R",
        fishes=config["reduce_multimedia"]
    output: config["filter_multimedia"]
    shell: "Rscript {input.script}" 

def get_image_url(wildcards):
    filename = checkpoints.filter.get().output[0]
    df = pd.read_csv(filename)
    row = df[df["arkID"] == wildcards.ark_id]
    url = row["accessURI"].item()
    return url 

rule download_image:
    input: config["filter_multimedia"]
    params: url=get_image_url
    output: "Images/{ark_id}.jpg"
    container: "docker://quay.io/biocontainers/gnu-wget:1.18--h60da905_7"
    shell: "wget -O {output} {params.url}"


module bgnn_core:
    snakefile:
        github("hdr-bgnn/BGNN_Core_Workflow", path="workflow/Snakefile", tag="1.0.0")

use rule generate_metadata from bgnn_core
use rule transform_metadata from bgnn_core
use rule crop_image from bgnn_core
use rule segment_image from bgnn_core

def get_segmentation_files(wildcards):
    filename = checkpoints.filter.get().output[0]
    df = pd.read_csv(filename)
    ark_ids = df["arkID"].tolist()
    return expand("Segmented/{ark_id}_segmented.png", ark_id=ark_ids)

rule summary:
    input:
       script="Scripts/SummaryReport.R", 
       segmentation=get_segmentation_files
    output: config["summary_report"]
    container: "docker://ghcr.io/rocker-org/tidyverse:4.2.2"
    shell: "Rscript {input.script}"
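
Because the two listings differ in indentation style (tabs versus spaces), a whitespace-insensitive diff can make the substantive differences easier to see, for example the `rule all` input and how `get_image_url` obtains the CSV filename. The following is a minimal sketch using Python's standard `difflib`; the function name `snakefile_diff` and the file names in the usage comment are hypothetical.

```python
import difflib


def snakefile_diff(mine, solution):
    """Return unified-diff lines between two Snakefile texts.

    Leading/trailing whitespace and blank lines are ignored, so a
    tabs-versus-spaces difference alone produces no output.
    """
    def normalize(text):
        return [line.strip() for line in text.splitlines() if line.strip()]

    return list(
        difflib.unified_diff(
            normalize(mine),
            normalize(solution),
            fromfile="mine",
            tofile="solution",
            lineterm="",
        )
    )


# Example usage (hypothetical file names):
# with open("Snakefile.mine") as a, open("Snakefile.solution") as b:
#     print("\n".join(snakefile_diff(a.read(), b.read())))
```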
