Skip to content

rxavier/econuy

Repository files navigation

Overview

This project simplifies gathering and processing of Uruguayan economic statistics. Data is retrieved from (mostly) government sources, processed into a familiar tabular format, tagged with useful metadata and can be transformed in several ways (converting to dollars, calculating rolling averages, resampling to other frequencies, etc.).

If this screenshot gives you anxiety, this package should be of interest.

A webapp with a limited but interactive version of econuy is available at econ.uy. Check out the repo as well.

The most basic econuy workflow goes like this:

from econuy import load_dataset, load_datasets_parallel

data1 = load_dataset("cpi")

Installation

  • PyPI:
pip install econuy
  • Git:
git clone https://github.com/rxavier/econuy.git
cd econuy
python setup.py install

Usage

Full API documentation available at RTD

Cache directory

econuy saves and reads data to a directory which by default is at the system home / .cache / econuy. This can be modified for all data loading by setting ECONUY_DATA_DIR or directly in load_dataset(data_dir=...).

Dataset load branching

  1. Check that the dataset exists in the REGISTRY.
  2. Cache check:
  • If skip_cache=True, download dataset
  • If skip_cache=False (default):
    • Check whether the dataset exists in the cache.
      • If it exists:
        • Recency check:
          • If it was created in the last day, return existing dataset.
          • If it was created prior to the last day and skip_update=False, download dataset.
          • If it was created prior to the last day and skip_update=True, return existing dataset.
      • If it does not exist, download dataset
  1. If the dataset was downloaded, try to update the cache:
  • Validation:
    • If force_overwrite=True, overwrite dataset.
    • If force_overwrite=False (default):
      • If the new dataset is similar to the cached dataset, overwrite dataset.
      • If the new dataset is not similar to the cached dataset, do not overwrite dataset.

Loading and transforming data

from econuy import load_dataset, load_datasets_parallel


# load a single dataset
data1 = load_dataset("cpi")

# load a single dataset and chain transformations
data2 = (
    load_dataset("fiscal_balance_nonfinancial_public_sector")
    .select(names="Ingresos: SPNF")
    .resample("QE-DEC", "sum")
    .decompose(method="x13", component="t-c")
    .filter(start_date="2014-01-01")
    )

This returns a Dataset object, which contains a Metadata object.

You can also load multiple datasets fast:

# load multiple datasets using threads or processes
data3 = load_datasets_parallel(["nxr_monthly", "ppi"])

Finding datasets

from econuy.utils.operations import REGISTRY


REGISTRY.list_available()
REGISTRY.list_by_area("activity")

Dataset metadata

Datasets include the following metadata per indicator:

  1. Indicator name
  2. Area
  3. Frequency
  4. Currency
  5. Inflation adjustment
  6. Unit
  7. Seasonal adjustment
  8. Type (stock or flow)
  9. Cumulative periods

Transformation methods

Dataset objects have multiple methods to transform their underlying data and update their metadata.

  • resample() - resample data to a different frequency, taking into account whether data is of stock or flow type.
  • chg_diff() - calculate percent changes or differences for same period last year, last period or at annual rate.
  • decompose() - seasonally decompose series into trend or seasonally adjusted components.
  • convert() - convert to US dollars, constant prices or percent of GDP.
  • rebase() - set a period or window as 100, scale rest accordingly
  • rolling() - calculate rolling windows, either average or sum.

External binaries and libraries

unrar libraries

The patool package is used in order to access data provided in .rar format. This package requires that you have the unrar binaries in your system, which in most cases you should already have. You can can get them from here if you don't.


Caveats

This project is heavily based on getting data from online sources that could change without notice, causing methods that download data to fail. While I try to stay on my toes and fix these quickly, it helps if you create an issue when you find one of these (or even submit a fix!).