On startup, have services print their name and version info #987

Open
melange396 opened this issue Sep 29, 2022 · 6 comments
Labels
code health (readability, maintainability, best practices, etc.), devops (building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc.), help wanted

Comments

@melange396
Collaborator

On startup/restart, have services print (log) their name and some sort of versioning info, such as the Docker image name and build date, and the git repo, branch, tag, and gitref (commit id).

Also print/log the current time (obviously), as well as the command line and the arguments provided (though this could be problematic where passwords are passed as command-line arguments).
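A minimal sketch of such a startup banner, assuming the service knows its own name and version string. `SENSITIVE_FLAGS`, `redact_argv`, and `log_startup_banner` are hypothetical names; the redaction only handles long flags in `--flag value` and `--flag=value` form (short flags like `-p` are not covered).

```python
import logging
import sys
from datetime import datetime, timezone

# Hypothetical list of flags whose values should never be logged.
SENSITIVE_FLAGS = ("--password", "--pass", "--token", "--secret")


def redact_argv(argv):
    """Return a copy of argv with the values of sensitive flags masked."""
    redacted = []
    mask_next = False
    for arg in argv:
        if mask_next:
            redacted.append("***")
            mask_next = False
            continue
        flag, sep, _value = arg.partition("=")
        if flag.lower() in SENSITIVE_FLAGS:
            if sep:  # "--password=hunter2" form: mask the inline value
                redacted.append(f"{flag}=***")
            else:  # "--password hunter2" form: mask the following argument
                redacted.append(arg)
                mask_next = True
        else:
            redacted.append(arg)
    return redacted


def log_startup_banner(service_name, version, argv=None, logger=None):
    """Log the service name, version, current UTC time, and redacted command line."""
    logger = logger or logging.getLogger(service_name)
    argv = sys.argv if argv is None else argv
    message = (
        f"starting {service_name} version={version} "
        f"at={datetime.now(timezone.utc).isoformat()} "
        f"argv={redact_argv(argv)}"
    )
    logger.info(message)
    return message
```

A script would call `log_startup_banner("csv_to_database", version)` as the first thing inside its `if __name__ == "__main__":` block, after logging is configured.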

Docker metadata isn't included in or exposed to running containers by default, so we might need to capture it during our image build process (e.g. by writing it to a file in the image filesystem).

The services where this should be applied include acquisition CSV processing, acquisition metadata generation, the webservice/API, and pipelines. We can find many of the places to insert this by looking for references to "__main__" in the code (by running grep -rl __main__ . ; I'll put the list of files found by this command in the first comment), but this is not necessarily exhaustive, depending on how jobs are brought up (gunicorn, for example, doesn't seem to need __main__ to start the webapp).

This issue is very closely related to cmu-delphi/covidcast-indicators#1702

@melange396
Collaborator Author

Files to add this to include (but may not be limited to):

./src/acquisition/afhsb/afhsb_csv.py
./src/acquisition/afhsb/afhsb_update.py
./src/acquisition/cdcp/cdc_dropbox_receiver.py
./src/acquisition/cdcp/cdc_extract.py
./src/acquisition/cdcp/cdc_upload.py
./src/acquisition/covidcast/delete_batch.py
./src/acquisition/covidcast/migrate_epidata_to_v4.py
./src/acquisition/covidcast_nowcast/load_sensors.py
./src/acquisition/covidcast/signal_dash_data_generator.py
./src/acquisition/covid_hosp/common/utils.py
./src/acquisition/ecdc/ecdc_db_update.py
./src/acquisition/flusurv/flusurv_update.py
./src/acquisition/fluview/fluview_notify.py
./src/acquisition/fluview/fluview_update.py
./src/acquisition/fluview/impute_missing_values.py
./src/acquisition/ght/ght_update.py
./src/acquisition/ght/google_health_trends.py
./src/acquisition/kcdc/kcdc_update.py
./src/acquisition/nidss/taiwan_nidss.py
./src/acquisition/nidss/taiwan_update.py
./src/acquisition/norostat/norostat_add_history.py
./src/acquisition/norostat/norostat_update.py
./src/acquisition/paho/paho_db_update.py
./src/acquisition/paho/paho_download.py
./src/acquisition/quidel/quidel_update.py
./src/acquisition/twtr/healthtweets.py
./src/acquisition/twtr/twitter_update.py
./src/acquisition/wiki/wiki_download.py
./src/acquisition/wiki/wiki_extract.py
./src/acquisition/wiki/wiki.py
./src/acquisition/wiki/wiki_update.py
./src/server/covidcast_issues_migration/proc_db_backups_pd.py

and especially:

./src/server/main.py
./src/acquisition/covidcast/covidcast_meta_cache_updater.py
./src/acquisition/covidcast/csv_to_database.py

@melange396
Collaborator Author

Here are some ways to get git metadata:

# repo name (and then some)
git config --get remote.origin.url

# branch and hash ref
git log -n 1 --format=format:'%D @%h [%H]'

And possibly still useful, but probably less so:

# branch and remote and some meta maybe
git status -b | head -2

# a hodgepodge: nearest tag, commits since it, and abbreviated hash (--dirty appends "-dirty" if tracked files have uncommitted changes)
git describe --long --tags --always --dirty

# "nearest" tag (if there is one?  sometimes?)
git describe --abbrev=0 --tags --always
git describe --contains `git rev-parse HEAD` --always

melange396 added the code health and devops labels on Oct 6, 2022
@melange396
Collaborator Author

It may also be worth including a list of all the installed packages/libraries and their version numbers.

@melange396
Collaborator Author

We may also want to print the values of all the environment variables, though, as with the command-line arguments above, this could reveal passwords.
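A sketch of dumping the environment with likely secrets masked. Both patterns are assumed heuristics, not a guarantee: names containing PASS, SECRET, TOKEN, KEY, or CREDENTIAL are masked entirely, and the password part of URLs like `scheme://user:password@host` is masked in the remaining values. Secrets stored under innocuous names in other formats would still leak.

```python
import os
import re

# Assumed heuristics; tune for the variables our services actually use.
SECRET_NAME_PATTERN = re.compile(r"PASS|SECRET|TOKEN|KEY|CREDENTIAL", re.IGNORECASE)
URL_PASSWORD_PATTERN = re.compile(r"(://[^:/@\s]+:)[^@\s]+@")


def redacted_environment(environ=None):
    """Return a sorted copy of the environment with likely secrets replaced by ***."""
    environ = os.environ if environ is None else environ
    result = {}
    for name, value in sorted(environ.items()):
        if SECRET_NAME_PATTERN.search(name):
            result[name] = "***"  # the name itself looks sensitive
        else:
            # mask only the password part of an embedded URL credential
            result[name] = URL_PASSWORD_PATTERN.sub(r"\1***@", value)
    return result
```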

@melange396
Collaborator Author

> it may also be worth including a list of all of the installed packages/libraries and their version numbers

This can be done with the simple pip freeze command, or with something more intricate like the procedure described at https://stackoverflow.com/a/69081814. It would be helpful if we could generate an alert when any of the versions change, as a way of cross-referencing with other events like errors or changes in performance.

@melange396
Collaborator Author

Updated git log format suggestion:

git log -n 1 --format=format:'%h @ %ci %d'

More ideas for logging dependency environments programmatically (each has its drawbacks):

# uses a nonpublic pip submodule (pip._internal), which can change between pip releases  :(
# shell command rough equivalents:
#     pip3 freeze --all
#     pip3 list --include-editable --pre  --format=freeze
from pip._internal.operations.freeze import freeze
for requirement in freeze():
    print(requirement)


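# deprecated: pkg_resources comes from setuptools, recent setuptools versions emit a DeprecationWarning on import, and Python 3.12+ venvs no longer preinstall setuptools  :(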
import pkg_resources
dists = [str(d).replace(" ","==") for d in pkg_resources.working_set]
for i in dists:
    print(i)


# importlib.metadata exists since Python 3.8, but Distribution.name was only added in 3.10 (on 3.8/3.9 use dist.metadata["Name"])  :(
from importlib import metadata
for dist in metadata.distributions():
    print(f"{dist.name}=={dist.version}")


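# don't do this: walk_packages imports each package to find its subpackages,
# which loads all of them into memory
import pkgutil
import sys

# onerror is called with the name of the package that failed to import
for p in pkgutil.walk_packages(onerror=lambda name: print(f"WHOOPS: {name}")):
    print(p.name)

# only lists modules already imported by this process (entries can be None)
for name, module in list(sys.modules.items()):
    version = getattr(module, "__version__", None)
    if version is not None:
        print(f"{name}=={version}")
    else:
        print(name)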
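Another option that avoids the nonpublic `pip._internal` import is to call pip's public command-line interface in a subprocess, as a sketch. Using `sys.executable -m pip` asks the pip that belongs to the running interpreter; the trade-off is spawning a process at startup, and `pip_freeze` is a hypothetical helper name.

```python
import subprocess
import sys


def pip_freeze():
    """Return `pip freeze --all` output lines, or [] if pip is unavailable or fails."""
    try:
        result = subprocess.run(
            # sys.executable targets the pip installed for *this* interpreter
            [sys.executable, "-m", "pip", "freeze", "--all"],
            capture_output=True,
            text=True,
            timeout=60,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]
```

A startup routine could then log each entry of `pip_freeze()`, one requirement per line, matching the shell `pip3 freeze --all` output.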

Projects
None yet
Development

No branches or pull requests

1 participant