@author smartin@groupcls.com: Project manager & Product owner
@author jaisus666@hotmail.fr: Software architect, Developer
How to read this file? Use a Markdown reader: plugins exist for Chrome (once installed, open the URL chrome://extensions/, check "Markdown Preview" and allow access to file URLs), for Firefox (anchor tags do not work) and for Notepad++.
Be careful: the Markdown format sometimes mis-renders the underscore "_" character, which can lead to incorrect variable names or paths.
These scripts are dedicated to automating schema replication. In a context where you have a master schema in a PostgreSQL database and want to ensure that all other schemas are exact replicas of this master, you need each object to be structured identically and every single change checked for consistency. These scripts watch the database to automatically:
- Create a new child schema using the current structure of the master schema
- Harmonize names for each object
- Deploy each change made on the master schema
- Cancel each change made on a child schema if there is no corresponding change in the master schema
A technical schema, named 'common', is used to store common functions. Once the scripts are installed, changes will be automatically detected and the related action automatically launched.
A script, init_all.sh, is dedicated to deploying a brand new environment. This script needs to be in a folder that contains the following subfolders:
- DWH_WASTE_TU : contains unit tests
- waste_dwh_script : contains multi-schema management scripts
You will be prompted to detail the server where the database will be created and related authentication details.
The script will:
- Create / modify the given database with all necessary components
- Launch automated tests to ensure everything will run correctly
At this stage, it is strongly recommended never to launch this script on an existing database unless you intend to delete it (which will also remove all data, without saving them or asking for confirmation) and create it again.
Two main schemas are created:
- common, a technical schema where all common features are stored
- master, the master schema is the single one where changes are supposed to be performed.
Dedicated roles are also automatically created:
- A Developer role with the following access:
- Create, update, delete tables in master schema
- Read data in master schema
- Create and delete indexes in master schema
- Create primary and foreign keys in master schema
- Create, update, delete sequences in master schema
- It is not possible for a developer to make changes on a child schema, nor to update data in a child schema
- A Dashboard role able to read data in the master schema and all children schemas
- An Administrator role, which is NOT postgres, allowing the whole system to be modified for maintenance purposes. This role is the only one with the ability to create and delete child schemas.
- reader_all: read-only access to the master schema and all children
- writer_all: create, read, update, delete access to the master schema and all children
Each time a new child schema is created, it comes with predefined roles (password = login):
- reader: read-only access to the related schema
- writer: access to create, read, update and delete data in the related schema
The scripts create procedures and triggers to manage the creation (and deletion) of child schemas and ensure each child always has the same structure as the master schema.
The scripts are developed and tested for PostgreSQL 11.x databases. In the common schema, a table named Customer is dedicated to storing each child name. These scripts were developed to manage customer data, and this solution requires one schema per client.
When a primary key defined in the master schema is based on a single integer column, a sequence is created in the master schema and set as the default value for this primary key. Because, in PostgreSQL, a sequence is non-transactional, the uniqueness of the key is guaranteed even across the child tables that will be created using this sequence for their primary keys. Therefore, IT IS STRONGLY RECOMMENDED to create primary keys on single integer columns.
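The cross-schema uniqueness guarantee can be illustrated outside the database: every child table draws its key from the single master-schema sequence, so identifiers never collide between schemas. A minimal Python sketch, where the shared counter is only a stand-in for the PostgreSQL sequence:

```python
import itertools

# Stand-in for the single master-schema sequence: all child tables
# draw their primary-key values from the same counter, so keys are
# unique across children even though rows live in different schemas.
master_sequence = itertools.count(start=1)

def insert_row(table, payload):
    """Insert a row, letting the shared sequence provide the key."""
    key = next(master_sequence)
    table[key] = payload
    return key

client_a = {}  # a table in hypothetical child schema "client_a"
client_b = {}  # the same table in hypothetical child schema "client_b"

ids = [insert_row(client_a, "a1"),
       insert_row(client_b, "b1"),
       insert_row(client_a, "a2")]
# ids are globally unique: [1, 2, 3]
```

Note that a real PostgreSQL sequence additionally survives transaction rollbacks, which is exactly why it is safe to share across schemas.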
Some EVENT TRIGGERS will be created on the following actions: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX. These triggers will call necessary stored procedures to ensure each action made on the master schema will be reproduced on each child schema.
Scripts are split into several categories:
- Procedures: commands launched manually for a single action
- Triggers: commands launched when specific events occur
- Scripts: commands launched by procedures or triggers to perform unitary or global actions
- Tools : commands created for supervision, administration and maintenance purposes
Each of them is described hereafter.
The next procedures will be launched individually to perform some specific actions.
To call this procedure, use the following SQL command:
call common.create_new_client('[nom_client]', '[password]');
This is supposed to be the only way to create a new schema.
It performs the following actions:
- If table common.dwh_dm_client does not exist, it is created
- Create a new row in common.client (after a built-in uniqueness check on the schema's name)
- Create a new child schema using the given name
- Create all tables and objects according to the current master structure
- Primary keys are named according to this rule: 'pk_[nom_table]_[nom_client]'
- Foreign keys are named according to this rule: 'fk_[nom_table_support_contrainte]_[nom_table_référencée]_[nom_client]_[numéro_itéré]'
- The reader role is named 'reader_[nom_client]'; its default schema is the newly created one, and it comes with the USAGE privilege on the schema and the SELECT privilege on all tables within the schema
- The writer role is named 'writer_[nom_client]'; its default schema is the newly created one, and it comes with the following privileges:
- USAGE privilege on the schema
- SELECT privilege on all tables within the schema
- USAGE privilege on primary key sequences if any
- INSERT and UPDATE privileges on all columns within the schema, except primary keys defined as single integer columns (technically speaking, primary keys based on fewer than 2 columns)
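The naming conventions above can be summarised in a small helper. This is an illustrative sketch of the rules, not code extracted from the procedure; the underscore separators and the plain-integer iteration suffix for foreign keys are assumptions based on the ALTER TABLE command used by the foreign-key propagation script:

```python
def pk_name(table, client):
    """Primary key name: 'pk_[nom_table]_[nom_client]'."""
    return f"pk_{table}_{client}"

def fk_name(table, ref_table, client, iteration):
    """Foreign key name: supporting table, referenced table, client, iteration."""
    return f"fk_{table}_{ref_table}_{client}_{iteration}"

def role_names(client):
    """Per-schema roles created alongside the child schema."""
    return f"reader_{client}", f"writer_{client}"
```

For a hypothetical table `invoice` referencing `customer` in a child schema `acme`, this yields `pk_invoice_acme`, `fk_invoice_customer_acme_1`, and the roles `reader_acme` / `writer_acme`.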
To call this procedure, use the following SQL command:
call common.drop_client('[nom_client]');
This is supposed to be the only way to delete a schema.
It performs the following actions:
- Check whether the schema actually exists
- Delete cascade the schema
- Delete related roles
- Delete related line in table common.client
Several EVENT TRIGGERS are created to perform actions when changes occur in the master schema. They mainly launch stored procedures. These triggers are listed hereafter.
This is performed using the standard CREATE TABLE SQL statement. It launches ddl_trigger_create_table_fct event trigger which performs the following actions:
- Create a table with the same characteristics in each child schema (child tables are actually created with INHERITS from the master schema tables)
- Provide roles with privileges on the table according to roles definition
Caution: NEVER USE A CUSTOMER NAME OR A FUNCTIONAL KEYWORD IN TABLE NAMES
Modifying a table, always in the master schema, is done with a standard SQL statement (ALTER TABLE); it will launch a trigger to cascade the modification. This trigger also launches several checks to ensure child schemas are still in line with the master and, if necessary, performs corrective actions.
Here is the related checklist:
- Check names: all objects should have names corresponding to the defined structures. If this is not the case, it means that changes have been made locally, on one or several child schemas, which should not occur: this is corrected accordingly.
- Check primary keys: ensure all primary keys in the master are defined in each child. If some differences are found, they are corrected.
- Check foreign keys : same as primary keys
- Check sequences : idem
- Check roles : idem
Most of these controls are performed thanks to scripts which are described later on in this document.
Use the standard SQL statement: DROP TABLE [table_name] CASCADE. This action is irreversible.
It launches ddl_trigger_drop_table_fct which will delete the table in each child schema.
There is one trigger for each action, and each one propagates the action to the child schemas as soon as it occurs in the master schema:
- Creation and update are propagated thanks to build_if_has_to_idxs.sql script
- Deletion is propagated thanks to delete_indexes_on_cascade.sql script
These groups of scripts control the names of constraints and indexes and rename them if necessary.
3 scripts are dedicated to constraints renaming:
- constraint_naming_control_limiter.sql: counts the number of constraints to be renamed
- constraint_naming_control_first.sql: rename constraints
- constraint_naming_control.sql: rename constraints
2 scripts are dedicated to index renaming:
- index_naming_control_limiter.sql: counts the number of indexes to be renamed
- index_naming_control_first.sql: rename indexes
Each group works the same way and, for technical reasons, is implemented as a recursive program:
- A first loop lists the constraints / indexes to be renamed
- For each constraint / index to rename, the related query is built and launched
These scripts are automatically launched when master schema objects are created or modified, to clean client schemas on a regular basis. They are launched before any other action is performed.
This script is launched by the ddl_trigger_create_table_fct event trigger to create, in each child schema, the table that has just been created in the master schema.
This procedure creates primary keys in each table in child schemas if a primary key exists in the master schema that has not been deployed yet (this actually should never occur, but the procedure is defined to make the system more robust). Primary keys are named according to the following structure: 'pk_' || m.table_name || '_' || c.schema_name.
This procedure creates foreign keys in each table in child schemas if a foreign key exists in the master schema that has not been deployed yet.
The procedure builds the following SQL command to create the necessary foreign keys:

```sql
ALTER TABLE ||c.schema_name||.||m.table_name||
    ADD CONSTRAINT fk_||m.table_name||_||m.ref_table_name||_||c.schema_name||
    FOREIGN KEY (||m.ref_columns||) REFERENCES ||c.schema_name||.||m.ref_table_name||(||m.ref_columns||);
```
The first step is to delete all sequences named 'seq_pk_%' that are no longer default values for any column within the master schema (this action is irreversible but is not supposed to have any impact). Then, for each table within the master schema, missing sequences are created on columns that meet the following conditions:
- Has a primary key and is the only column the key is defined on
- The data type is an integer (small int, int or big int)
- It has no sequence as default value yet
Then, for each table within the master schema, sequences on primary keys are renamed when the current name is not 'seq_pk_[nom_table]'. Finally, writer roles are granted the USAGE privilege on the sequences named 'seq_pk_%'.
This procedure is based on the standard PostgreSQL function pg_get_indexdef().
It creates all indexes that exist in the master schema but not in the child schemas.
When a column is added or deleted in a table within the master schema, the change is replicated within all child schemas. But the privileges are not automatically granted, and this procedure performs that action: UPDATE and INSERT privileges are granted to writers for each schema.
All columns are concerned, except those that are primary keys with a dedicated sequence.
This script is launched to initialize the database and will create the following tables in common schema:
- dwh_dm_client: list of children schemas
- purge_tool_conf: data purge configuration table (see 'Data purge' for details)
- vacuum_script_results: stores results of vacuums and analyzes (see 'Vacuum and analyse' for details)
It also creates necessary sequences.
It has happened that some foreign keys were not correctly initialized, and this script is built to reset them if necessary. All foreign keys will be dropped, then the propagation script build_if_has_to_fks.sql is launched.
Launch the script with the following command: CALL common.data_purge();
This script will purge data according to a predefined configuration.
This configuration is defined in a table : common.purge_tool_conf and has the following columns:
- table_name: table(s) that should be purged
- column_name: column(s) that should be purged
- retentionInterval: as the name suggests, the retention duration
The script will first launch scripts to check data in the configuration table:
- check_table_presence.sql: checks if table exists
- check_column_presence.sql: checks if column exists in above table
- check_data_purge_retentionInterval.sql: checks the format of retentionInterval is a string corresponding to the structure : "[number]' '[day/month/year]"
Then, it will purge data in the tables and columns listed in the configuration table: for each child schema, it will remove data corresponding to a date earlier than today minus retentionInterval.
It will raise an error if configuration table is empty.
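The retentionInterval format check can be sketched as a simple pattern match. This mirrors what check_data_purge_retentionInterval.sql verifies, though the exact accepted spellings (singular/plural units) are an assumption:

```python
import re

# Hypothetical re-implementation of the format check performed by
# check_data_purge_retentionInterval.sql: "[number]' '[day/month/year]",
# optionally pluralized.
RETENTION_RE = re.compile(r"^\d+ (day|month|year)s?$")

def is_valid_retention(value: str) -> bool:
    """Return True when the value matches the expected retention format."""
    return bool(RETENTION_RE.match(value))
```

For example, `"3 month"` and `"10 days"` would pass, while `"monthly"` would be rejected before the purge runs.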
Launch the script with the following command: SELECT * FROM common.sys_vacuum_diag('[schema_name]')
For a given schema, this script will provide a summary of the changes that need to be made; the information is displayed with the following structure:
- schema name
- table name
- whether an ANALYZE command has been launched or not
- whether a VACUUM command has been launched or not
- a comment that can be one of the followings (defined in French for the time being):
- VACUUM, ANALYZE, CREATE INDEX CONCURRENTLY, CREATE STATISTICS, ALTER TABLE. L'une de ces étapes est en cours.
- CREATE INDEX en cours
- CREATE COLLATION, CREATE TRIGGER, ALTER TABLE. L'une de ces étapes est en cours
- REFRESH MATERIALIZED VIEW CONCURRENTLY en cours
- DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and REFRESH MATERIALIZED VIEW (without CONCURRENTLY). L'une de ces étapes est en cours.
- Un VACUUM est déjà en cours sur cette table.
- Une requête occupe déjà cette table depuis : [query_start]
The script will launch VACUUM ANALYZE on a table meeting the following criteria:
- vacuum_running = false
- dead_percentage > 25
- nb_sec_from_last_vacuum IS NULL OR nb_sec_from_last_vacuum > 432,000 (5 days)
- table_size > 100 MB
The script will launch ANALYZE on a table meeting the following criteria:
- vacuum_running = false
- nb_sec_from_last_analyze > 2,592,000 (30 days)
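The decision rules above can be transcribed as two predicates. This is an illustrative sketch of the criteria with hypothetical parameter names, not the script's actual code:

```python
def needs_vacuum_analyze(vacuum_running, dead_percentage,
                         nb_sec_from_last_vacuum, table_size_mb):
    """VACUUM ANALYZE criteria: no vacuum running, more than 25% dead
    tuples, last vacuum unknown or older than 432,000 s (5 days),
    and a table larger than 100 MB."""
    return (not vacuum_running
            and dead_percentage > 25
            and (nb_sec_from_last_vacuum is None
                 or nb_sec_from_last_vacuum > 432_000)
            and table_size_mb > 100)

def needs_analyze(vacuum_running, nb_sec_from_last_analyze):
    """ANALYZE criteria: no vacuum running, last analyze older than
    2,592,000 s (30 days)."""
    return not vacuum_running and nb_sec_from_last_analyze > 2_592_000
```

A table with 30% dead tuples, no recorded vacuum and a 150 MB size would qualify for VACUUM ANALYZE, while a running vacuum disqualifies it immediately.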
Prerequisite: execute the command CREATE EXTENSION pgstattuple;
Launch the script with the following command: SELECT * FROM common.sys_index_diag('[schema_name]')
For a given schema, the script will provide the following information:
- schema name
- table name
- index name
- a status that can be 'HAS TO BE DELETED?' or 'REINDEXED'
The script will renew btree indexes in the given schema where fragmentation is higher than 30%.
The output is a list of indexes that could be deleted, i.e. indexes meeting the following criteria:
- has never been used
- is not part of an expression
- is not unique
- is not used as a constraint
After index renewing, an ANALYZE command is launched to update statistics.
This script will be used to deploy feature changes within the application and will be launched by the bash script dwh_up_version_caller.bash.
A dump of the master structure is stored in the common.versionning table, which has the following structure:
- a unique incremental ID
- ip address of user applying updates
- date of update
- SQL script used to generate schema (as string)
This logging system is intended to track updates on the master schema. All changes are managed by the scripts and procedures described above, and each one includes a call to this script to log related changes, or exceptions if any.
The information is stored with the following details:
- ip sending the log
- date of execution
- launched query
- name of the procedure or script that launched the query
LIMITATION: it is not yet possible to store a transcription of the query that fires an event trigger (it would be necessary to write a C program). Thus, only queries launched within procedures and functions launched by event triggers are logged.
In each function, for each query, it stores the executed query. If the procedure fails, the error description is logged instead, including status 0.
To see logs, use the following query: SELECT * FROM common.deblog ORDER BY 2 DESC;
This control is used in several scripts, so it has been automated.
To use these tools more efficiently, we strongly recommend that you follow these guidelines:
- Each table created in the master schema should have a primary key based on a single integer column. This will, among other things, allow indexes to be used with the RANGE function (even if CLUSTERs based on dates are frequently more efficient for the oldest data)
- If you need a foreign key based on (an)other column(s) than the primary key, you should define a unique index (not a unique constraint, which would not be deployed) on this (these) column(s).
- You could consider deleting the password column from the common table. It could make sense, but be aware that maintenance would then become much more complex. On top of that, only an administrator is supposed to access this table.
Automated unit tests are provided in the unit_test schema, and it is important to maintain them during new release development. To launch these tests, use the SQL procedure global_test_script.sql.
The different tools have some limitations, due to technical choices made in the original context; these are development subjects for future releases.
- A unique constraint (declared as a constraint or as an index) cannot be inherited. Thus, if such a constraint is defined on a table in the master schema, it will not be automatically propagated. Of course, there are scripts to perform this, but the created index will be different for each table and it will not be possible, for instance, to ensure cross-schema uniqueness.
- Inheritance of CHECK constraints is not taken into account yet
- Partitioning is not active yet, and native partitioning is not possible with PostgreSQL 11 because it is not possible to create a partition for inherited tables
- An index MUST NOT have 'master' in its name (except of course to describe its schema)
You might want users with cross-schema roles, at least as readers, but this requires dedicated queries to access the related data: each query will need to know the different schemas for each table being queried. One way to do this would be to create some kind of grouping: create a 'group_schema' that inherits from master, then child schemas that inherit from this group. But this would require some additional development effort.
The purge mechanism could also be improved to allow a different configuration for each child schema, instead of one single configuration for all schemas.