This repository contains the language-agnostic documentation of the R and Python libraries implemented in the framework of the BIGG-EKATE AI toolbox. This AI toolbox is formed by the following modules:
- Data preparation
- Data transformation
- Modelling
- Reinforcement learning
The first two modules have been developed within the framework of the BIGG H2020 project and of the EKATE project. The EKATE project is funded by the POCTEFA Interreg Programme. The Building Information aGGregation, harmonisation and analytics (BIGG) project is an EU-funded project that aims at demonstrating the application of big data technologies and data analytics techniques over the complete life-cycle of more than 4000 buildings in 6 large-scale pilot test-beds.
The functionalities implemented in biggr and biggpy require the usage of harmonised datasets of energy consumption, weather information, building characteristics and thermal conditions of buildings. This data model is presented in WP4 of the BIGG project and can be found in this repository.
- R: `character` class. Python: `string` type.
- R: `float` class. Python: `float` type.
- R: `integer` class. Python: `int` type.
- R: `logical` class. Python: `bool` type.
- R: `Date` class. Python: `datetime.date` class (the `datetime` library is needed).
- R: `POSIXct` class in UTC timezone. Python: `datetime.datetime` class (the `datetime` library is needed), in UTC timezone.
- R: `c(,,...,)` function, where all elements belong to the `character`, `float`, `integer`, `logical`, `Date` or `POSIXct` classes. Python: `list` (`[,,...,]`), where all elements belong to the `string`, `float`, `integer` or `boolean` types, to the `datetime.date` or `datetime.datetime` classes, or to any other Python class.
- R: `list(:,:,...,:)`, where all keys belong to the `character` class, and values can be of the `float`, `integer`, `logical`, `Date`, `POSIXct`, `list` or `data.frame` classes. Python: `dict` (`{:,:,...,:}`), where all keys belong to `string`, `int` or any other immutable type, and values can be of the `string`, `float`, `integer`, `boolean` or `list` types, or any other Python class.
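As an illustration of the dictionary convention above (the key names and values are hypothetical, not part of the toolbox), a harmonised argument in Python might look like this:

```python
from datetime import datetime, timezone

# Hypothetical example of the dict convention: string keys, and values of
# the allowed types, including a UTC datetime and a list of floats.
building = {
    "buildingId": "B-0001",                 # string
    "grossFloorArea": 1250.5,               # float
    "nFloors": 4,                           # int
    "isResidential": True,                  # bool
    "lastReading": datetime(2021, 1, 1, tzinfo=timezone.utc),
    "hourlyConsumption": [1.2, 0.8, 1.5],   # list of floats
}
print(sorted(building.keys()))
```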
Generators are iterators, but you can only iterate over them once, because they do not store all the values in memory: they generate the values on the fly. Generators are best for computing large sets of results (particularly calculations involving loops themselves) where you do not want to allocate the memory for all results at the same time. You use them by iterating over them, either with a `for` loop or by passing them to any function or construct that iterates. Most of the time generators are implemented as functions; however, they do not return a value, they yield it.
- Python: `generator`. A generator expression in Python can be created with a one-liner like:

```python
g = (x**2 for x in range(10))
print(next(g))
```

or by defining a Python function that yields a result, like:

```python
def __gen(exp):
    for x in exp:
        yield x**2

g = __gen(iter(range(10)))
print(next(g))
```
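Because a generator does not store its values, it is exhausted after a single pass; iterating a second time yields nothing:

```python
g = (x**2 for x in range(4))
first = list(g)   # consumes the whole generator: [0, 1, 4, 9]
second = list(g)  # the generator is already exhausted: []
print(first, second)
```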
- R: `generator`. A generator can be created in multiple ways in R; one of them uses the `coro` library:

```r
library(coro)
gen <- generator(
  function(X) {
    for (x in X) {
      yield(x^2)
    }
  }
)

# Get the elements one by one, by iteratively calling g()
g <- gen(0:9)
g()

# Get all the elements at once
g <- gen(0:9)
collect(g)
```
- R: `data.frame` with two columns. The first one, named "time", defines the series' initial timestamp using the `POSIXct` class and UTC timezone. The second column represents the values and can be of whatever class is needed by the variable (`character`, `float`, `integer`, `factor`, ...). Python: `pandas.DataFrame` class with a `DatetimeIndex` and one column representing the values of the series. The series type can be whatever is needed by the variable (`string`, `float`, `integer`, ...).
Note! In the case of non-cumulative consumption or some weather-feature time series, the timestamp of each element represents the initial time at which the value applies. We assume that the value does not change during that time step, until the next time series element.
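As a sketch of the Python format described above (the column name and values are made up), such a series can be built as:

```python
import pandas as pd

# Hourly consumption series: tz-aware DatetimeIndex in UTC,
# and a single column holding the values.
index = pd.date_range("2021-01-01 00:00", periods=4, freq="h", tz="UTC")
series_df = pd.DataFrame({"value": [1.2, 0.8, 1.5, 1.1]}, index=index)
print(series_df)
```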
- R: `data.frame`. Python: `pandas.DataFrame`.
- R: whatever object that can be serialised. Python: whatever object that can be serialised.
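In Python, "an object that can be serialised" commonly means picklable; a minimal round-trip check (the `payload` object is only an example, not a toolbox structure):

```python
import pickle

# Any picklable object qualifies; a plain dict is used here as an example.
payload = {"model": "hypothetical", "coefficients": [0.5, 1.2]}
blob = pickle.dumps(payload)      # serialise to bytes
restored = pickle.loads(blob)     # deserialise back to an equal object
print(restored == payload)
```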
It contains the functions used for timestamp alignment, data validation, time gap detection, outlier detection and missing data management.
Access to the data preparation module documentation
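To make two of the operations named above concrete, here is a small sketch in pandas (this is not the biggr/biggpy API; the series and thresholds are invented) of time gap detection and a simple z-score outlier flag:

```python
import pandas as pd

# Toy hourly series with a 3-hour gap and one anomalous value.
idx = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 04:00"], utc=True
)
s = pd.Series([1.0, 1.1, 9.0], index=idx)

# Time gap detection: any step larger than the expected 1-hour frequency.
gaps = s.index.to_series().diff() > pd.Timedelta(hours=1)

# Simple z-score outlier flag (the threshold of 1.0 is arbitrary here).
z = (s - s.mean()) / s.std()
outliers = z.abs() > 1.0
print(gaps.sum(), outliers.sum())
```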
It contains the functions used for transforming cleaned datasets into valuable features for use in statistical and machine learning algorithms. These functions include low-pass filtering techniques, Fourier decomposition, calendar feature extraction, multiple-seasonality profiling and weather feature transformations, among others.
Access to the data transformation module documentation
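As an illustration of the calendar-feature step named above (again a sketch, not the toolbox API), hour-of-day and day-of-week features can be derived from a UTC timestamp index:

```python
import pandas as pd

# Derive calendar features from a tz-aware index; 2021-01-01 is a Friday,
# so pandas' dayofweek (Monday=0) is 4 for every row here.
idx = pd.date_range("2021-01-01", periods=3, freq="h", tz="UTC")
features = pd.DataFrame(
    {"hour": idx.hour, "dayOfWeek": idx.dayofweek}, index=idx
)
print(features)
```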
This module contains the functions used to train, optimise, predict and assess the statistical or machine learning models applied to energy consumption, thermal comfort and user behaviour in buildings.
Access to the modelling module documentation
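The train/predict/assess cycle named above can be sketched with an ordinary least-squares fit (this is not the toolbox's modelling API; the consumption-vs-temperature data is made up and exactly linear):

```python
import numpy as np

# Made-up training data: consumption decreases linearly with temperature.
temperature = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
consumption = np.array([10.0, 8.0, 6.0, 4.0, 2.0])

# Train: least-squares fit of consumption = intercept + slope * temperature.
X = np.column_stack([np.ones_like(temperature), temperature])
coef, *_ = np.linalg.lstsq(X, consumption, rcond=None)

# Predict and assess with RMSE on the training data.
predicted = X @ coef
rmse = float(np.sqrt(np.mean((predicted - consumption) ** 2)))
print(coef, rmse)
```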
This is a very brief description of the functions implemented in this module.