Language-agnostic documentation of the AI toolbox of the BIGG project

biggproject/biggdocs

Description

This repository contains the language-agnostic documentation of the R and Python libraries implemented on the framework of the BIGG-EKATE AI toolbox. This AI toolbox is formed by the following modules:

  • Data preparation
  • Data transformation
  • Modelling
  • Reinforcement learning

The first two modules have been developed within the framework of the BIGG H2020 project and of the EKATE project. The EKATE project is funded by the POCTEFA Interreg Programme. The Building Information aGGregation, harmonisation and analytics (BIGG) project is an EU-funded project that aims to demonstrate the application of big data technologies and data analytics techniques across the complete life cycle of more than 4000 buildings in 6 large-scale pilot test-beds.

Related repositories

Required data model for the input

The functionalities implemented in biggr and biggpy require harmonised datasets of the energy consumption, weather information, building characteristics and thermal conditions of buildings. This data model is presented in WP4 of the BIGG project and can be found in this repository.

Data types definition

string

  • R: character class
  • Python: str type

float

  • R: numeric class (double precision)
  • Python: float type

integer

  • R: integer class
  • Python: int type

boolean

  • R: logical class
  • Python: bool type

date

  • R: Date class
  • Python: datetime.date class (the datetime module is needed)

datetime

  • R: POSIXct class in UTC timezone.
  • Python: datetime.datetime class (the datetime module is needed) in UTC timezone.
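
As a minimal sketch of the date and datetime conventions above, the following Python snippet builds a date, a UTC datetime, and converts a timestamp recorded with a fixed offset into UTC. It assumes that timezone-aware objects are wanted; the values and the +01:00 offset are invented for illustration and are not taken from the toolbox.

```python
from datetime import date, datetime, timedelta, timezone

# A date carries no time-of-day or timezone information.
day = date(2021, 3, 1)

# A datetime expressed in UTC (assumed here to be timezone-aware).
stamp_utc = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)

# A reading recorded with a fixed +01:00 offset (illustrative), converted to UTC.
local = datetime(2021, 3, 1, 13, 0, tzinfo=timezone(timedelta(hours=1)))
local_as_utc = local.astimezone(timezone.utc)

print(stamp_utc.isoformat())
print(local_as_utc == stamp_utc)  # both describe the same instant
```

Converting to UTC before storing a timestamp keeps series from different sources comparable without tracking each source's local offset.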

list

  • R: vector created with the c(,,...,) function, where all elements belong to the character, numeric, integer, logical, Date or POSIXct classes.
  • Python: list[,,...,], where the elements belong to the string, float, integer or boolean types, to the datetime.date or datetime.datetime classes, or to any other Python class.

dictionary

  • R: list(:,:,...,:), where all keys belong to the character class, and values can be of the numeric, integer, logical, Date, POSIXct, list or data.frame classes.
  • Python: dict{:,:,...,:}, where all keys belong to the string or int types or to any other immutable (hashable) type, and values can be string, float, integer, boolean, list or any other Python class.
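
The restriction on Python dictionary keys can be illustrated with a short sketch. The building metadata below is hypothetical and only serves to show which key types are accepted.

```python
# Hypothetical building metadata; keys and values are invented for illustration.
building = {
    "name": "Office A",           # string key, string value
    2021: [120.5, 98.3],          # int key, list of floats
    ("lat", "lon"): (41.6, 0.6),  # tuple key: immutable, hence hashable
}

# A mutable object such as a list cannot be used as a key.
try:
    building[["lat", "lon"]] = (41.6, 0.6)
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

print(list_key_accepted)
```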

generator

Generators are iterators that can be iterated over only once, because they do not store all their values in memory; they generate them on the fly. They are best suited to computing large sets of results (particularly computations that themselves involve loops) when you do not want to allocate memory for all results at the same time. You use them by iterating over them, either with a 'for' loop or by passing them to any function or construct that iterates. Most of the time generators are implemented as functions; however, they do not return a value, they yield it.

  • Python: generator. A generator can be created with a one-line generator expression such as:
g = (x**2 for x in range(10))
print(next(g))  # request the first value

or by defining a Python function that yields its results, such as:

def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print(next(g))  # request the first value
  • R: generator. In R a generator can be created in several ways; one of them uses the coro library:
library(coro)
gen <- generator(
  function(X){
    for(x in X){
      yield(x**2)
    }
  }
)
# Get the elements one by one by calling g() repeatedly
g = gen(0:9)
g()
# Get all the elements at once
g = gen(0:9)
collect(g)
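
The single-pass behaviour described at the start of this section can be checked with a short Python sketch. The function name `squares` is chosen here for illustration only.

```python
def squares(values):
    """Yield the square of each input value, one at a time."""
    for x in values:
        yield x ** 2

g = squares(range(10))

first = next(g)        # values are produced only on request
rest = list(g)         # consumes the remaining values
second_pass = list(g)  # the generator is now exhausted, so this is empty

print(first, rest, second_pass)
```

To iterate again, a new generator has to be created by calling `squares` once more.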

timeSeries

  • R: data.frame with two columns. The first, named "time", holds the initial timestamp of each element of the series, using the POSIXct class and UTC timezone. The second column holds the values and can be of whatever class the variable needs (character, numeric, integer, factor).

  • Python: pandas.DataFrame class with a DatetimeIndex and one column holding the values of the series. The values can be of whatever type the variable needs (string, float, integer).

  • Note: for non-cumulative consumption and some weather feature time series, the timestamp of each element marks the start of the interval to which the value applies. The value is assumed to remain constant over that time step, until the next element of the time series.
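
A minimal sketch of the Python representation, assuming pandas is installed. The readings, the column name "value" and the index name "time" are illustrative choices, not names required by the toolbox. The resampling step shows what the start-of-interval convention implies: each value is carried forward until the next timestamp.

```python
import pandas as pd

# Hypothetical hourly consumption readings (values invented for illustration).
index = pd.DatetimeIndex(
    ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"],
    tz="UTC",
    name="time",
)
series = pd.DataFrame({"value": [1.5, 2.0, 1.0]}, index=index)

# Each timestamp marks the start of the interval in which the value applies,
# so upsampling to 30 minutes repeats the value until the next reading.
half_hourly = series.resample("30min").ffill()

print(half_hourly)
```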

matrix

  • R: data.frame
  • Python: pandas.DataFrame

object

  • R: any object that can be serialised
  • Python: any object that can be serialised

Modules of the AI toolbox

Data preparation

It contains the functions used for timestamp alignment, data validation, time gap detection, outlier detection and missing data management.

Access to the data preparation module documentation
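
To give a flavour of one of the tasks listed above, here is a minimal sketch of time gap detection using only the Python standard library. It is not the biggpy API: the function name `find_gaps`, its parameters and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def find_gaps(timestamps, expected_step):
    """Return (start, end) pairs where consecutive timestamps are further
    apart than expected_step. Assumes timestamps are sorted ascending."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > expected_step:
            gaps.append((previous, current))
    return gaps

# Hypothetical hourly readings with the 02:00 and 03:00 samples missing.
start = datetime(2021, 1, 1, tzinfo=timezone.utc)
stamps = [start + timedelta(hours=h) for h in (0, 1, 4, 5)]

gaps = find_gaps(stamps, timedelta(hours=1))
print(gaps)
```

Each returned pair brackets a gap: the last timestamp before it and the first timestamp after it.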

Data transformation

It contains the functions used to transform cleaned datasets into valuable features for use in statistical and machine learning algorithms. These functions include low-pass filtering techniques, Fourier decomposition, calendar feature extraction, multiple seasonalities profiling and weather feature transformations, among others.

Access to the data transformation module documentation
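
Two of the transformations named above can be sketched with the Python standard library. This is not the biggpy API: the function names, the feature names and the choice of first-order exponential smoothing as the low-pass filter are assumptions made for illustration, and the toolbox may use different formulations.

```python
from datetime import datetime, timezone

def calendar_features(stamp):
    """Derive simple calendar features from a timestamp."""
    return {
        "hour": stamp.hour,
        "weekday": stamp.weekday(),  # Monday is 0, Sunday is 6
        "month": stamp.month,
        "is_weekend": stamp.weekday() >= 5,
    }

def low_pass(values, alpha):
    """First-order exponential smoothing:
    y[t] = alpha * x[t] + (1 - alpha) * y[t-1], with y[-1] taken as x[0]."""
    smoothed = []
    for x in values:
        previous = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * previous)
    return smoothed

features = calendar_features(datetime(2021, 1, 2, 15, 0, tzinfo=timezone.utc))
filtered = low_pass([10.0, 20.0, 10.0], alpha=0.5)

print(features)
print(filtered)
```

A smaller `alpha` smooths more strongly, at the cost of a slower response to real changes in the signal.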

Modelling

This module contains the functions used to train, optimise, predict and assess the statistical or machine learning models applied to energy consumption, thermal comfort and user behaviour in buildings.

Access to the modelling module documentation

Reinforcement Learning

This is a very brief description of the functions implemented in this module.

Access to the reinforcement learning module documentation
