WiP: add csv parsing and start config support #4

Open · wants to merge 12 commits into master
Conversation

@jochym (Contributor) commented on Jan 11, 2023:

This is the start of some small tweaks:

  • Add CSV parsing and data download support functions as an alternative data source (the new Tauron API)
  • Start support for an on-disk config file, so that passwords do not have to be passed on the command line (see the sketch below)
  • Two example apps (quick and dirty for now): list data and plot a range

This is just to keep track and share; I will keep working on it.
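For illustration, a minimal sketch of what the on-disk config idea could look like; the file location, section name, and key names below are assumptions for the example, not necessarily what this PR implements:

# Hypothetical config loader sketch (file path and key names are assumptions).
# Example ~/.config/elicznik.ini:
#
#     [elicznik]
#     username = user@example.com
#     password = secret
import configparser
from pathlib import Path

def load_credentials(path=Path.home() / ".config" / "elicznik.ini"):
    config = configparser.ConfigParser()
    config.read(path)
    section = config["elicznik"]
    return section["username"], section["password"]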

@jochym (Contributor, Author) commented on Jan 16, 2023:

@mlesniew Take a look. This is in a semi-finished state. It works on my system; in fact, it has replaced my previous (very convoluted) solution. There are also workarounds for some quirks/bugs in the Tauron API. I have not checked, but the JSON API may need similar fixes (different end-of-day time for single and multiple days; I use the 0-23 convention). The examples may be usable. I suspect you would prefer to merge only some of this, if at all.

@mlesniew (Owner) commented:
Thanks for the submission.

Getting the readings via the CSV API seems better than using the JSON API: it allows fetching both production and consumption readings with one request, and the new JSON API doesn't seem to have that option. I would like to include the code that uses the CSV API and make it the default, leaving the JSON variant available for when it's explicitly requested.

While the current code works perfectly, I would like to tidy it up a bit before merging to master. I will try to add comments about possible minor improvements soon, especially in src/elicznik/elicznik.py.

There is however one higher level improvement that I'd like to suggest. With the current implementation we have two methods in ELicznik which can be used to retrieve the readings. Functionally they're the same. This can be confusing to a user of the module. We could, of course, change the method names so that they'd clearly reflect the API used internally. However, I think a better approach would be to introduce a new class, which would present the same interface, but internally it would download the CSV data. I can pull in your changes under src/ and make these adjustments on top of them.
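To illustrate the idea, a rough sketch of the structure (class and method names are assumptions, not the final design):

# Sketch only: one common interface, two backends selected by class.
class ELicznikBase:
    def __init__(self, username, password):
        self.username = username
        self.password = password

    def get_readings(self, start_date, end_date=None):
        raise NotImplementedError

class ELicznikJSON(ELicznikBase):
    def get_readings(self, start_date, end_date=None):
        ...  # existing JSON API implementation

class ELicznikCSV(ELicznikBase):
    def get_readings(self, start_date, end_date=None):
        ...  # CSV download and parsing added in this PR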

Regarding the examples -- I haven't really had time to review them carefully yet, but we can add them to the repo, especially for their educational value (so that others can learn how to use the module from them). I would really appreciate it if you could submit the examples in a separate pull request if you'd like me to merge them. Also, please put them in separate directories under examples/ (since it seems to be a set of examples), try to document them, and provide a requirements.txt file if they have extra dependencies.

You can have a look at an example script that I've added in the meantime: https://github.com/mlesniew/elicznik/tree/master/examples/google-sheets. All feedback is welcome.

Before the final merge, please make sure to squash the commits into logically complete bits and rebase onto master, so that we can do a fast-forward merge.

"form[oze]": 1,
"form[fileType]": "CSV", # or "XLS"
},
).content.decode().split('\n')
@mlesniew (Owner) commented:

.content.decode() should be replaced by .text. It will decode the bytes to a string automatically and it should use the right encoding based on HTTP response headers.
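For example (a sketch, assuming a requests Response object named response):

# Manual decoding of the raw bytes:
lines = response.content.decode().split('\n')
# Letting requests pick the encoding from the HTTP response headers:
lines = response.text.splitlines()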


def get_data(self, start_date, end_date=None):
    end_date = end_date or start_date
    data = csv.reader(self.get_raw_data(start_date, end_date)[1:], delimiter=';')
@mlesniew (Owner) commented:

Use csv.DictReader instead of csv.reader. It will parse the header row automatically and produce a dict for each row.
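Roughly like this (a sketch; raw_lines stands in for the lines returned by get_raw_data, and the actual column names come from the Tauron CSV header, so any names hinted at below are assumptions):

import csv

# DictReader consumes the header row itself, so the [1:] slice is no longer needed.
reader = csv.DictReader(raw_lines, delimiter=';')
for row in reader:
    ...  # each row is a dict keyed by the header names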

continue
date, hour = t.split()
h, m = hour.split(':')
timestamp = datetime.datetime.strptime(date, "%d.%m.%Y")
@mlesniew (Owner) commented:

It should be possible to parse the whole timestamp in one go using strptime. However, I'm not sure if it will work correctly with timestamps like 16.01.2023 24:00.
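One way to handle that edge case (a sketch, not tested against the real data):

import datetime

def parse_timestamp(value):
    # strptime handles e.g. "16.01.2023 23:00" directly, but "24:00" is not a
    # valid %H value, so parse the date and add the time as a timedelta instead;
    # 24:00 then rolls over naturally to midnight of the next day.
    date_part, time_part = value.split()
    day = datetime.datetime.strptime(date_part, "%d.%m.%Y")
    hours, minutes = (int(x) for x in time_part.split(":"))
    return day + datetime.timedelta(hours=hours, minutes=minutes)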

timestamp += datetime.timedelta(hours=int(h), minutes=int(m))
# Skip records outside a single day block
if start_date == end_date and timestamp.day != start_date.day:
    print(f'Skip {timestamp} not within {start_date}.')
@mlesniew (Owner) commented:

Avoid print statements here. For anything critical, raise an exception. logging could be used for everything else, but the module is probably too simple to introduce that.
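For instance, the out-of-range record could simply be dropped quietly, with exceptions reserved for genuinely malformed input (sketch only):

# A record outside the requested single-day block is not critical -- just skip it quietly.
if start_date == end_date and timestamp.day != start_date.day:
    continue
# Genuinely malformed input, on the other hand, is better reported as an exception, e.g.:
#     raise ValueError(f"Could not parse record timestamp: {t!r}")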

@mlesniew (Owner) commented:
@jochym, I have created a new pull request #6 with just the CSV changes extracted and with the changes above applied. If you can, please have a look before I merge this. Unfortunately, I could not add changes directly to your pull request.

@mlesniew mentioned this pull request on May 22, 2023.
@jochym (Contributor, Author) commented on May 23, 2023:

Thanks. I will. Next week please ;).
