Update Google Docs Meta Data #1612

Merged
merged 1 commit into from
Feb 28, 2025

github-actions[bot]
Contributor

@github-actions github-actions bot commented Feb 27, 2025

Updating Google Docs Meta Data

Change summary:

  • in db_sources.csv:
    • new entry for nssp
  • in db_signals.csv:
    • 2 new google-symptoms signals for conjunctivitis
    • 8 new nhsn signals for rsv and for counts of reporting hospitals
    • ~19 various edits, including:
      • "Causes"-->"Cause" in nchs-mortality names and signal sets
      • some Safegraph signal set changes

@melange396
Collaborator

melange396 commented Feb 27, 2025

newest comparison code:

import csv
import requests


# pull down the existing and proposed/pending versions of the signal description csv file

dev_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/dev/src/server/endpoints/covidcast_utils/db_signals.csv"
dev = []
with requests.get(dev_file, stream=True) as req:
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        dev.append(row)

new_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/bot/update-docs/src/server/endpoints/covidcast_utils/db_signals.csv"
new = []
with requests.get(new_file, stream=True) as req:
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        new.append(row)


# column name lists
dev_cols = set(dev[0])
new_cols = set(new[0])
both_cols = list(dev_cols.intersection(new_cols))

# get the right column number for each version of the file, based on the column name
dev_col_lookup = {c: i for i,c in enumerate(dev[0])}
new_col_lookup = {c: i for i,c in enumerate(new[0])}

# get the right row number for each version of the file, based on `(source,signal)`

dev_row_lookup = {}
for i, row in enumerate(dev):
    src = row[dev_col_lookup["Source Subdivision"]]
    sig = row[dev_col_lookup["Signal"]]
    if (src, sig) in dev_row_lookup:
        print("!!! src:sig duplicate in dev file! --", src, ":", sig)
    dev_row_lookup[(src, sig)] = i
dev_signals = set(dev_row_lookup.keys())

new_row_lookup = {}
for i, row in enumerate(new):
    src = row[new_col_lookup["Source Subdivision"]]
    sig = row[new_col_lookup["Signal"]]
    if (src, sig) in new_row_lookup:
        print("!!! src:sig duplicate in new file! --", src, ":", sig)
    new_row_lookup[(src, sig)] = i
new_signals = set(new_row_lookup.keys())

# print summary info
if dev[0] != new[0]:
    print("column ordering changed!")
print("added columns:", sorted(list(new_cols-dev_cols)))
print("removed columns:", sorted(list(dev_cols-new_cols)))
print("# rows in dev file:", len(dev))
print("# rows in new file:", len(new))
print("row count difference:", len(new)-len(dev))
print("added signals:", sorted(list(new_signals-dev_signals)))
print("removed signals:", sorted(list(dev_signals-new_signals)))
print("\n")

# TODO: detect row reorderings

# add column names to this set as needed to ignore differences found in them (to simplify output for easier analysis)
columns_to_ignore = {"XXXXXX ignore me XXXXXX"}
both_cols = [col for col in both_cols if col not in columns_to_ignore]

# show individual changes
changes_count = 0
for i in range(len(dev)):
    src = dev[i][dev_col_lookup["Source Subdivision"]]
    sig = dev[i][dev_col_lookup["Signal"]]
    if (src, sig) not in new_row_lookup:
        # this is a removed signal so no summary is displayed
        continue
    dev_ln_num = i
    new_ln_num = new_row_lookup[(src, sig)]
    # prepare properly ordered list of values from both
    dev_line = [dev[dev_ln_num][dev_col_lookup[col]] for col in both_cols]
    new_line = [new[new_ln_num][new_col_lookup[col]] for col in both_cols]
    if dev_line != new_line:
        changes_count += 1
        print("\nMISMATCH!!  [", src, ":", sig, "]  dev row:", dev_ln_num+1, "/ new row:", new_ln_num+1)
        print("\n".join(["".join([
                "  ", col, ":\n    ", dev[dev_ln_num][dev_col_lookup[col]], "\n    -->\n    ", new[new_ln_num][new_col_lookup[col]]])
                for col in both_cols if dev[dev_ln_num][dev_col_lookup[col]]!=new[new_ln_num][new_col_lookup[col]]
            ]))

print("\n")
print("lines with changes:", changes_count)

# TODO: use f-string formatting in print() statements
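
One possible approach to the "detect row reorderings" TODO above, offered as a sketch rather than the maintainers' method: restrict both files to the `(source, signal)` keys they share, then use `difflib.SequenceMatcher` from the standard library to find the keys that stayed in the same relative order. Anything left over has moved. The function name `find_moved_keys` and the sample keys are hypothetical, keys are assumed to be unique, and `SequenceMatcher` does not promise a minimal set of moved keys.

```python
from difflib import SequenceMatcher


def find_moved_keys(old_keys, new_keys):
    """Return keys present in both sequences whose relative order changed."""
    old_set, new_set = set(old_keys), set(new_keys)
    # Keep only shared keys so added or removed rows are not reported as moves.
    old_common = [k for k in old_keys if k in new_set]
    new_common = [k for k in new_keys if k in old_set]
    if old_common == new_common:
        return []
    matcher = SequenceMatcher(None, old_common, new_common, autojunk=False)
    in_place = set()
    for block in matcher.get_matching_blocks():
        # Each block is a run of keys that appears in the same order in both lists.
        in_place.update(old_common[block.a:block.a + block.size])
    return [k for k in old_common if k not in in_place]


# Hypothetical keys, not taken from the real CSV files.
old = [("src", "a"), ("src", "b"), ("src", "c")]
new = [("src", "a"), ("new", "x"), ("src", "c"), ("src", "b")]
print(find_moved_keys(old, new))
```

In the script above, the call could be `find_moved_keys(list(dev_row_lookup), list(new_row_lookup))`, since both dicts are filled in row order and Python dicts preserve insertion order.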

@melange396
Collaborator

output from code above:

added columns: []
removed columns: []
# rows in dev file: 477
# rows in new file: 487
row count difference: 10
added signals: [('google-symptoms', 's07_raw_search'), ('google-symptoms', 's07_smoothed_search'), ('nhsn', 'confirmed_admissions_rsv_ew'), ('nhsn', 'confirmed_admissions_rsv_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew_prelim')]
removed signals: []



MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_num ]  dev row: 411 / new row: 413
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths
  Name:
    All Causes Deaths (Weekly new)
    -->
    All Cause Deaths (Weekly new)

MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_prop ]  dev row: 412 / new row: 414
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths

MISMATCH!!  [ safegraph-daily : completely_home_prop ]  dev row: 442 / new row: 444
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : completely_home_prop_7dav ]  dev row: 443 / new row: 445
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop ]  dev row: 444 / new row: 446
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop_7dav ]  dev row: 445 / new row: 447
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time ]  dev row: 446 / new row: 448
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time_7dav ]  dev row: 447 / new row: 449
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop ]  dev row: 448 / new row: 450
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop_7dav ]  dev row: 449 / new row: 451
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_num ]  dev row: 450 / new row: 452
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_prop ]  dev row: 451 / new row: 453
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_num ]  dev row: 452 / new row: 454
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_prop ]  dev row: 453 / new row: 455
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew ]  dev row: 474 / new row: 476
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew_prelim ]  dev row: 475 / new row: 478
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew ]  dev row: 476 / new row: 480
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). 
    -->
    flu hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew_prelim ]  dev row: 477 / new row: 482
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday).
    -->
    flu hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized


lines with changes: 18
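
For a quicker overview of output like the above, the per-row mismatches could also be tallied by source and column. The following is a minimal sketch under stated assumptions: rows are dicts keyed by column name (as `csv.DictReader` would produce), the key columns are the same `Source Subdivision` and `Signal` used by the script, and the function name `tally_changes` and the sample rows are hypothetical.

```python
from collections import Counter

# Key columns, matching the ones the comparison script uses.
KEY_COLS = ("Source Subdivision", "Signal")


def tally_changes(old_rows, new_rows):
    """Count changed cells per (source, column) for signals present in both inputs."""
    new_by_key = {tuple(r[c] for c in KEY_COLS): r for r in new_rows}
    counts = Counter()
    for old in old_rows:
        key = tuple(old[c] for c in KEY_COLS)
        new = new_by_key.get(key)
        if new is None:
            continue  # removed signal: nothing to compare
        # Only compare columns that exist in both versions.
        for col in old.keys() & new.keys():
            if old[col] != new[col]:
                counts[(key[0], col)] += 1
    return counts


# Hypothetical two-row sample, not taken from the real CSV files.
old_rows = [
    {"Source Subdivision": "src-a", "Signal": "sig1", "Signal Set": "Old Set", "Name": "One"},
    {"Source Subdivision": "src-a", "Signal": "sig2", "Signal Set": "Old Set", "Name": "Two"},
]
new_rows = [
    {"Source Subdivision": "src-a", "Signal": "sig1", "Signal Set": "New Set", "Name": "One"},
    {"Source Subdivision": "src-a", "Signal": "sig2", "Signal Set": "New Set", "Name": "Two (renamed)"},
]
for (source, column), n in sorted(tally_changes(old_rows, new_rows).items()):
    print(f"{source} / {column}: {n} changed")
```

A tally like this makes bulk renames (such as one `Signal Set` value changing across a whole source) stand out from one-off edits.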

@carlynvandyke carlynvandyke left a comment

Looks good, sorry it took so long to review!

@melange396 melange396 merged commit 2e6cd7e into dev Feb 28, 2025
7 checks passed
@melange396 melange396 deleted the bot/update-docs branch February 28, 2025 21:08