This project simplifies gathering and processing of Uruguayan economic statistics. Data is retrieved from (mostly) government sources, processed into a familiar tabular format, tagged with useful metadata and can be transformed in several ways (converting to dollars, calculating rolling averages, resampling to other frequencies, etc.).
A webapp with a limited but interactive version of econuy is available at econ.uy. Check out the repo as well.
The most basic econuy workflow goes like this:

```python
from econuy import load_dataset

data1 = load_dataset("cpi")
```
- PyPI:

```shell
pip install econuy
```

- Git:

```shell
git clone https://github.com/rxavier/econuy.git
cd econuy
python setup.py install
```
Full API documentation is available at RTD.
econuy saves and reads data from a directory which by default is `~/.cache/econuy`. This can be modified for all data loading by setting the `ECONUY_DATA_DIR` environment variable, or per call via `load_dataset(data_dir=...)`.
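For example, to point the cache at a project-local directory via the environment variable (the path shown is just an example):

```shell
# Use a project-local cache instead of the default ~/.cache/econuy
export ECONUY_DATA_DIR="$HOME/projects/uy-data"
```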
- Check that the dataset exists in the `REGISTRY`.
- Cache check:
  - If `skip_cache=True`, download the dataset.
  - If `skip_cache=False` (default):
    - Check whether the dataset exists in the cache.
    - If it exists, run a recency check:
      - If it was created in the last day, return the existing dataset.
      - If it was created prior to the last day and `skip_update=False`, download the dataset.
      - If it was created prior to the last day and `skip_update=True`, return the existing dataset.
    - If it does not exist, download the dataset.
- If the dataset was downloaded, try to update the cache:
  - Validation:
    - If `force_overwrite=True`, overwrite the cached dataset.
    - If `force_overwrite=False` (default):
      - If the new dataset is similar to the cached dataset, overwrite it.
      - If it is not similar, do not overwrite it.
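The decision flow above can be sketched as plain logic. This is illustrative only: the function names `decide_action` and `decide_overwrite` are hypothetical and are not part of econuy's internals; only the `skip_cache`, `skip_update`, and `force_overwrite` parameters come from the library's API.

```python
from datetime import datetime, timedelta


def decide_action(in_cache, created_at, skip_cache=False, skip_update=False):
    """Sketch of the cache decision flow (hypothetical, not econuy's code)."""
    if skip_cache or not in_cache:
        return "download"
    # Recency check: datasets created within the last day are reused as-is.
    fresh = datetime.now() - created_at < timedelta(days=1)
    if fresh or skip_update:
        return "return_cached"
    return "download"


def decide_overwrite(similar_to_cached, force_overwrite=False):
    """Validation step, applied only after a fresh download."""
    return force_overwrite or similar_to_cached
```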
```python
from econuy import load_dataset, load_datasets_parallel

# load a single dataset
data1 = load_dataset("cpi")

# load a single dataset and chain transformations
data2 = (
    load_dataset("fiscal_balance_nonfinancial_public_sector")
    .select(names="Ingresos: SPNF")
    .resample("QE-DEC", "sum")
    .decompose(method="x13", component="t-c")
    .filter(start_date="2014-01-01")
)
```
This returns a `Dataset` object, which contains a `Metadata` object.
You can also load multiple datasets quickly:

```python
# load multiple datasets using threads or processes
data3 = load_datasets_parallel(["nxr_monthly", "ppi"])
```
Available datasets are defined in the `REGISTRY`, which can be queried:

```python
from econuy.utils.operations import REGISTRY

REGISTRY.list_available()
REGISTRY.list_by_area("activity")
```
Datasets include the following metadata per indicator:
- Indicator name
- Area
- Frequency
- Currency
- Inflation adjustment
- Unit
- Seasonal adjustment
- Type (stock or flow)
- Cumulative periods
`Dataset` objects have multiple methods to transform their underlying data and update their metadata:

- `resample()` - resample data to a different frequency, taking into account whether the data is of stock or flow type.
- `chg_diff()` - calculate percent changes or differences vs. the same period last year, the last period, or at an annual rate.
- `decompose()` - seasonally decompose series into trend or seasonally adjusted components.
- `convert()` - convert to US dollars, constant prices or percent of GDP.
- `rebase()` - set a period or window as 100 and scale the rest accordingly.
- `rolling()` - calculate rolling windows, either average or sum.
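The stock/flow distinction in resampling can be illustrated with a hand-rolled sketch (`resample_quarterly` is a hypothetical helper, not econuy's implementation): flows, such as exports, are summed over the new period, while stocks, such as debt, keep the end-of-period value.

```python
def resample_quarterly(monthly_values, kind):
    """Aggregate monthly values into quarters.

    Flows are summed per quarter; stocks keep the last observation
    of each quarter. Sketch only, not econuy code.
    """
    quarters = [monthly_values[i:i + 3] for i in range(0, len(monthly_values), 3)]
    if kind == "flow":
        return [sum(q) for q in quarters]
    if kind == "stock":
        return [q[-1] for q in quarters]
    raise ValueError("kind must be 'flow' or 'stock'")


monthly = [10, 12, 11, 9, 10, 12, 13, 11, 10, 12, 14, 13]
print(resample_quarterly(monthly, "flow"))   # [33, 31, 34, 39]
print(resample_quarterly(monthly, "stock"))  # [11, 12, 10, 13]
```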
The patool package is used to access data provided in `.rar` format. It requires the `unrar` binaries on your system, which in most cases you should already have. You can get them from here if you don't.
This project is heavily based on getting data from online sources that could change without notice, causing methods that download data to fail. While I try to stay on my toes and fix these quickly, it helps if you create an issue when you find one of these (or even submit a fix!).