NEWS

# Releases

## Sorting Hat 0.7 - (2018-10-02)

**NOTICE: Database schemas generated by SortingHat < 0.7.0 are still
compatible, but older versions can have problems inserting 4-byte UTF-8
characters.

Python 2.7 is no longer supported.

Please check the "Compatibility between versions" section of the README.md file.
**

** New features and improvements: **

 * Python 2.7 no longer supported

   As Python 2.x will not be maintained after 2020, SortingHat is only
   compatible with Python >= 3.4.

 * Low level API

   This API executes basic operations on the database, such as adding or
   removing identities or finding entities. All these operations work
   within a session. Nothing is stored in the database until the session
   is closed. Thus, these functions can be considered "bricks" that can
   be combined to build high-level functions.

 * Storage of 4-byte UTF-8 characters

   The default UTF-8 charset (`utf8`) in MySQL/MariaDB does not support
   4-byte characters, even though they are part of the standard.
   This means characters like emojis or certain Chinese characters cannot
   be inserted. Identity names or usernames commonly contain these types
   of characters.

   The charset that fully supports UTF-8 is `utf8mb4`, used here with the
   collation `utf8mb4_unicode_520_ci`. This collation implements the
   Unicode Collation Algorithm (v5.2).

   Using `utf8mb4` also implies that the maximum size of character columns
   (VARCHAR and so on) that are part of an index is 191 characters. Indexes
   cannot be larger than that when using the InnoDB engine.

   Starting with the 0.7 series, SortingHat uses this charset.

 * Handle disconnection using pessimistic mode

   SQLAlchemy offers a pessimistic mode to handle database disconnections.
   Setting the `pool_pre_ping` parameter on the database engine makes it
   check whether a database connection is still alive each time the
   connection is taken from the connection pool for reuse. This adds a
   small performance overhead, but it is worth it to avoid failures on
   stale connections.

 * Use an optimistic approach when inserting data

   With this optimistic approach, SortingHat no longer queries the database
   to check whether an entity exists before inserting it.

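The idea behind the pessimistic mode listed above can be illustrated without
SQLAlchemy. The following is only a sketch, using the standard `sqlite3` module
in place of MySQL; `connect` and `checkout` are hypothetical helper names, not
part of SortingHat or SQLAlchemy. In SQLAlchemy itself, the behaviour is enabled
by passing `pool_pre_ping=True` to `create_engine()`.

```python
import sqlite3


def connect():
    """Open a new connection (an in-memory SQLite database stands in for MySQL)."""
    return sqlite3.connect(":memory:")


def checkout(conn):
    """Return a usable connection, replacing `conn` if the test query fails."""
    try:
        # The "ping": a cheap statement issued every time a connection is reused.
        conn.execute("SELECT 1")
        return conn
    except sqlite3.Error:
        # The connection is no longer usable; open a fresh one instead.
        return connect()


conn = connect()
assert checkout(conn) is conn       # a healthy connection is reused as-is

conn.close()                        # simulate a dropped connection
fresh = checkout(conn)
assert fresh is not conn            # a stale connection is replaced
```

The extra `SELECT 1` on every checkout is the small performance cost mentioned
in the release note.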

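The optimistic insertion listed above can be sketched as follows. This is an
illustration only, using `sqlite3` from the standard library; the
`organizations` table and the `add_organization` helper are hypothetical and do
not reflect SortingHat's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (name TEXT PRIMARY KEY)")


def add_organization(conn, name):
    """Insert `name` without checking first; report whether it was added."""
    try:
        # No SELECT beforehand: the unique constraint detects duplicates.
        with conn:
            conn.execute("INSERT INTO organizations (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # The row already existed; the transaction was rolled back.
        return False


first = add_organization(conn, "Example")    # row is inserted
second = add_organization(conn, "Example")   # duplicate is rejected
```
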
## Sorting Hat 0.6 - (2018-03-05)

**NOTICE: Database schemas generated by SortingHat < 0.6.0 are no longer
compatible. Please check the "Compatibility between versions" section of the
README.md file.**

** New features and improvements: **

 * Gender.

   The gender of a unique identity can be set in its profile using the
   `profile` command; the data is stored in the table of the same name. This
   table adds two new fields: `gender`, a free text field to set the gender
   value, and `gender_acc`, to set the accuracy of the gender (in a range
   of 1 to 100) when it is set using automatic options.

   The new command `autogender` has also been added. It assigns a gender
   to each unique identity using the name of the profile and the information
   provided by `http://genderize.io`. Possible values are *male* or *female*.

 * Option for reusing a database.

   An existing database can be reused when the `init` command is called.
   Until now, this command raised an exception if the database already
   existed.

 * Version option.

   Calling `sortinghat` with the option `-v | --version` prints the version
   of `sortinghat` running on the system.

 * Tests improvements.

   Some minor changes were made in the testing area. The main ones were to
   support the MariaDB engine and to use a remote testing database.


## Sorting Hat 0.5 - (2017-12-21)

**NOTICE: Database schemas generated by SortingHat < 0.5.0 are no longer
compatible. Please check the "Compatibility between versions" section of the
README.md file.**

** New features and improvements: **

 * Last modification.

   Unique identities and identities now record the last time they were
   modified by any of these operations: adding, deleting, moving, merging,
   updating the profile, and adding or removing enrollments.

   The new `search_last_modified_identities` API function allows searching
   for the UUIDs of those identities modified on or after a given date.

 * No strict matching option.

   This option disables the rigorous validation of values while matching
   identities, for instance, requiring well-formed email addresses or names
   made of a first name and a last name. This option is available on the
   `load` and `unify` commands.

 * Reset option while loading.

   Before loading any data, if the `reset` option is set, all the
   relationships between identities and their enrollments will be removed
   from the database.

 * GrimoireLab support.

   GrimoireLab identities and organizations YAML files can be converted
   to Sorting Hat JSON format using the script `grimoirelab2sh`.

** Bugs fixed: **

 * Fix tables created with invalid collation. In some situations, Sorting
   Hat tables were created with an invalid collation. The cause is the DDL
   table statement generated by SQLAlchemy, which may randomly place the
   collation information (`MYSQL_COLLATE`) before the charset information
   (`MYSQL_CHARSET`), causing the former to be ignored.
   Changing `MYSQL_CHARSET` to `MYSQL_DEFAULT_CHARSET` fixed the problem.

 * Remove trailing whitespace in exported JSON files. This error only
   occurs in Python 2.7, due to a bug in the standard library's
   `json.dump()` when the `indent` parameter is used. (#103)

 * Update profile information when loading identities. Until now, profile
   information was set only the first time a unique identity was loaded.
   With this change, it is always updated, except when the given
   profile is empty.


## Sorting Hat 0.4 - (2017-07-17)

** New features and improvements: **

 * Mailmap and StackAlytics support.

   Mailmap and StackAlytics files can be converted to Sorting Hat JSON
   format using the new scripts `mailmap2sh` and `stackalytics2sh`.

 * Unify by sources.

   Given a list of sources, this option makes the `unify` command
   merge only those unique identities which belong to any of the given
   sources.

** Bugs fixed: **

 * Encoding error generating UUIDs in Python 3. Some special characters
   cannot be encoded in Python 3. This caused the function `uuid()` to fail
   when converting those characters. The `surrogateescape` error handler
   was added to fix that problem.

 * Force `utf8_unicode_ci` collation on MySQL tables to fix integrity errors.
   Under some collations (e.g. `utf8_general_ci`), MySQL treats characters
   such as `β` and `b`, or `ı` and `i`, as equal. This can raise
   integrity errors when Sorting Hat tries to add similar identities with
   these pairs of characters.

   For instance, if the identity:

       ('scm', 'βart', 'bart@example.com', 'bart')

   is stored in the database, the insertion of:

       ('scm', 'bart', 'bart@example.com', 'bart')

   will raise an error, even though these identities have different UUIDs.
   Forcing MySQL to use `utf8_unicode_ci` fixes this error, allowing
   both identities to be inserted.

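The Python 3 encoding error listed under the bugs fixed above can be reproduced
with a lone surrogate, which is how Python 3 represents a byte it could not
decode. This is a minimal sketch that is independent of SortingHat's `uuid()`
function; the sample value is made up for illustration.

```python
# Decoding with errors='surrogateescape' maps the undecodable byte 0xE9
# to the lone surrogate '\udce9' instead of raising an error.
name = b"Jos\xe9".decode("utf-8", errors="surrogateescape")

try:
    name.encode("utf-8")              # strict encoding rejects the surrogate
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With the handler, the original byte is restored instead of raising.
encoded = name.encode("utf-8", errors="surrogateescape")
```

Here the strict call fails, while the call using the handler returns the
original bytes unchanged.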

## Sorting Hat 0.3 - (2017-03-21)

**NOTICE: UUIDs generated by SortingHat < 0.3.0 are no longer compatible.
Please check the "Compatibility between versions" section of the README.md file.**

** New features and improvements: **

 * New algorithm to generate UUIDs.

   UUIDs were generated using case- and accent-sensitive values with the seed
   `(source:email:name:username)`. This means that identities whose values
   differ only in case (e.g. `jsmith@example.com` and `JSMITH@example.com`)
   or only in accents (e.g. `John Smith` and `Jöhn Smith`) had different
   UUIDs.

   The new algorithm converts upper case characters to lower case and accented
   characters to their canonical form before the UUID is generated.

   This change is motivated by how MySQL handles case and accented characters.
   MySQL considers those characters the same, raising `IntegrityError`
   exceptions when similar tuple values are inserted into the database.
   Generating the same UUID for these cases prevents the error.

   Note that previous UUIDs are no longer compatible with this version of
   SortingHat. You should regenerate the UUIDs following the steps described
   in the *Compatibility between versions* section of the `README.md` file.

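The normalization described above can be sketched as follows. This is an
illustration, not SortingHat's actual code: the helper names `unaccent` and
`identity_uuid`, and the use of SHA-1 as the hash function, are assumptions
made for the example.

```python
import hashlib
import unicodedata


def unaccent(text):
    """Drop combining marks so that 'Jöhn' becomes 'John'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def identity_uuid(source, email, name, username):
    """Hash the case- and accent-insensitive seed `source:email:name:username`."""
    seed = ":".join((source, email, name, username))
    seed = unaccent(seed).lower()
    # SHA-1 is an assumption of this sketch, not a documented detail.
    return hashlib.sha1(seed.encode("utf-8")).hexdigest()


a = identity_uuid("scm", "jsmith@example.com", "John Smith", "jsmith")
b = identity_uuid("scm", "JSMITH@example.com", "Jöhn Smith", "jsmith")
assert a == b   # both spellings now map to the same UUID
```
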
** Bugs fixed: **

 * Any non-empty value in the email field was used during affiliation. This
   caused errors for invalid email addresses such as 'email@', which raised
   an `IndexError` exception. This bug has been fixed by using only valid
   email addresses during affiliation.

 * Invalid database names were allowed in `init` command.


## Sorting Hat 0.2 - (2017-02-01)

** New features and improvements: **

 * Autocomplete profile information with the `autoprofile` command.

   This command autocompletes the profile information of a set of unique
   identities. To update the profile, the command uses a list of sources ordered
   by priority. Only those unique identities which have one or more identities
   from any of these sources will be updated. The name of the profile will be
   filled using the best name possible, normally the longest one.

 * GitHub identities matching method.

   This new method tries to find equal identities using those identities from
   GitHub sources. The identities must come from a source starting with a `github`
   label and the usernames must be equal.

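The GitHub matching rule above can be sketched as a simple predicate. The
`Identity` tuple and the `match_github` function are hypothetical names used
only for illustration; they are not SortingHat's matcher API.

```python
from collections import namedtuple

Identity = namedtuple("Identity", ["source", "username"])


def match_github(a, b):
    """True when both identities come from GitHub sources and share a username."""
    from_github = a.source.startswith("github") and b.source.startswith("github")
    # Empty usernames are treated as non-matching (an assumption of this sketch).
    same_user = bool(a.username) and a.username == b.username
    return from_github and same_user
```
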
** Bugs fixed: **

 * The parser for Gitdm files only accepted email addresses as valid aliases.
   This has been modified to accept any type of alias. Thus, the input file
   passed to the `gitdm2sh` script can be a list of valid aliases instead of
   email aliases only.