Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Download data files #6

Open
skuschel opened this issue Aug 4, 2022 · 6 comments · May be fixed by #9
Open

Download data files #6

skuschel opened this issue Aug 4, 2022 · 6 comments · May be fixed by #9
Labels
enhancement New feature or request

Comments

@skuschel
Copy link
Owner

skuschel commented Aug 4, 2022

Hey @anatoli-ulmer
I am not conviced that #5 was a top-shot for two reasons: first, manually downloading will fail unless you install in developer mode, and even then, its a pain. second, the downloaded data isnt even used at all. I think this is the point where we should start code reviews with one another before merging.

Thanks for spotting the bug fixed c59c37d. Next time, maybe you can add to the commit message, that functools.cache was introduced in python 3.9. So its easier to understand why this is a bug.

How do we proceed with the downloads? I would just revert 47b020f and 998b99c for now as they do not add to functionality.

@anatoli-ulmer
Copy link
Collaborator

Hey @skuschel,
1.) the manual download is for machines without internet access, e.g., psana@SLAC
2.) I don't understand what you mean when you say that it will fail. Can you please clarify?
3.) I did not remove the functions that are downloading the data live from the web. I think both methods can and should coexist.
4.) I think on the long term a function which stores the downloaded data locally makes a lot of sense and would therefore keep 47b020f and 998b99c, especially as I am currently using them and would get into conflict with my branch.

@skuschel
Copy link
Owner Author

skuschel commented Aug 4, 2022

Hi Anatoli,

I like the idea that we download the database if it's regularly used or the computer doesn't have internet. However having another function in another file for that purpose doesn't make sense. There should be just one function that either find the files or downloads it's on spot, so that the user doesn't have think about it.

With failing I mean that you cannot normally install it this way. The way you are suggesting will I my work if it's installed in development mode. When using the standard './setup.py install --user' - where to extract the files to then?

I think we should just find a better solution for it:

  1. I have no experience in using application directories, so we should just figure out how that is done properly.
  2. the download and unpacking is very much error prone. That can be a python command within the program which does just that including an integrity check.
  3. combine it with the functions which are already there so that the configuration of the package (data download or not) does not change the programs using it.

@skuschel
Copy link
Owner Author

skuschel commented Aug 4, 2022

Found something. Unfortunately it's not in the standard library. Probably a bit more work to check if that is worth the dependence.

https://stackoverflow.com/questions/13184414/how-can-i-get-the-path-to-the-appdata-directory-in-python

@skuschel
Copy link
Owner Author

skuschel commented Aug 4, 2022

Appdirs looks pretty good I would say

@anatoli-ulmer
Copy link
Collaborator

I see the problem with the data storage directory now and will try appdirs, thanks.

The plan is to have only one function to access the data in the end for a quick solution, but to have multiple possibilities to actually get the data and also choose the data source, e.g., online/offline henke or nist. The data read function will not be in another file but in the same one, which btw. has two data source functions (henke/nist). So I really don't see the problem here.

@skuschel
Copy link
Owner Author

skuschel commented Aug 5, 2022

Correct. We have two functions for two different datasources. They give us the data in the same format, but they give us different data. Having two henke functions giving identical data from a different source is just bad practice and here is why: Say you have a running program which you now move to different computer and you want the program to use the other souce. This change should just be a configuration change of our installed package. If there are two different functions you would need to change the program everywhere it uses the henke data. Surely no one wants to do that.

@anatoli-ulmer anatoli-ulmer linked a pull request Aug 8, 2022 that will close this issue
@skuschel skuschel added the enhancement New feature or request label Aug 9, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants