Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

sap_ha_pacemaker_cluster: Allow to append manual cluster config to the generated config #922

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

rob0d
Copy link
Contributor

@rob0d rob0d commented Jan 3, 2025

@marcelmamula @ja9fuchs
This is based on the discussion we had several months ago with @ja9fuchs.
This role generally respects pre-defined ha_cluster* variables and tries to merge the configuration it generates with the content of the pre-defined variables. It works really well, however, in some cases the order of resources is important.
The new variables allow to append a specific cluster configuration AFTER the configuration generated by this role.
I've added information to the README file which explains the behaviour. It's a non-breaking change as it just enhances the functionality without changing anything else.

@marcelmamula
Copy link
Contributor

@rob0d
I think you need to provide example of how current code affects output, because appending post fact results in advanced mode, because you 100% trust that whoever uses it knows everything: dict format of ha_cluster variable.


- _Type:_ `list`

This role generally respects pre-defined ha_cluster* variables and tries to merge the configuration it generates with the content of the pre-defined variables. <br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob0d These sentences are bit on the nose, because it contains too many negatives as it tells me: "Hey, this role barely works"

It has to be reworked if PR is to be accepted.

@ja9fuchs
Copy link
Contributor

ja9fuchs commented Jan 7, 2025

This specific role has certain limits and will not be able to cover all custom cases.
The use case in question is of such specific nature, that it should rather be covered by making further advanced use of the ha_cluster role native parameters. If the order of resources is critical, then the source definition can be pre-defined including the resources which the role would add - in the desired order. The SAP HA role checks for duplicate resource IDs (names) and in the cases of presumably unique resources it also skips if the resource agent is already present in the resource primitives parameter. The user defined primitive content should always take precedence.

As @marcelmamula already mentioned, just appending the variables in this role is not good enough due to the vars having to follow the correct syntax. It is not safe to do this without quite some effort of syntax validation and such logic, to make sure that what will be fed into ha_cluster is sane input.

Please share more details of your particular use case, so we could assess if it would justify putting effort into this enhancement.

Note: The variables section in the README is generated from meta/argument_specs.yml and vars content manually added to the README file will be lost later if not added there accordingly. I will follow up on highlighting this. You can't know if we don't tell. :)

@rob0d
Copy link
Contributor Author

rob0d commented Jan 31, 2025

Hi @marcelmamula @ja9fuchs,

Sorry, I probably should have explained this a bit more/better.
The key thing here is that ha_cluster doesn't allow modification of cluster settings. It will always recreate the cluster from scratch. So for any non-standard behaviours or advanced functionality we need to add those to ha_cluster* or sap_ha_cluster* variables if we want to automate cluster deployment using Ansible.

The two typical non-default functions is to add are: Ethmonitor to monitor if the network interface has gone down (see) or checking the network connectivity using the ping agent (see). This is obviously just our usage case, other people may need other functionality.

The Ethmonitor is a must if the cluster uses multiple NICs (e.g. a separate heartbeat interface as generally recommended).
Everything is fine when this config is put in sap_ha_cluster* variables and we call sap_ha_pacemaker_cluster role which then merges the predefined config with our SAP config. The problem is that for e.g. ethmonitor we need define location constraints which will move ASCS if the network has failed. (Note: This situation does not result in cluster detecting the network failure if the cluster has multiple interfaces! The only solution according to Redhat is to use Ethmonitor resource agent.)
Unfortunately the order of resources and primitives definitions matter, so if we defined Ethmonitor first (with the location constraints) everything is fine until ha_cluster tries to create the cluster. This fails as the location constraint can't be defined as the resource we are referring to doesn't exist. This is not something ha_cluster or sap_ha_pacemaker_cluster can safely and correctly detect without some serious coding.
Therefore I am suggesting that we allow the *_append variables for the user to define additional config on top of what sap_ha_pacemaker_cluster generates.
I think that whether we allow user to predefine some config in sap_ha_cluster* / ha_cluster* variables or in the _append variables, it is advanced usage. In both cases the users can put stuff in the variables that will break the execution or create a non-functional cluster (regardless of checks that are performed). I would suggest that if user is using the _append variables, it's their responsibility to get it ultimately right. ha_cluster will perform checks and it's not necessarily responsibility of sap_ha_pacemaker_cluster to make sure the settings in these variables are correct.

An example of how I'm using it is below. To test the current behaviour you can just remove the _append and it should run (and then fail during cluster creation) with the current code.

        sap_ha_cluster_resource_primitives_append:
          - id: ethmonitor-ens192
            agent: ocf:heartbeat:ethmonitor
            instance_attrs:
              - attrs:
                  - name: interface
                    value: ens192
                  - name: link_status_only
                    value: 'true'
 
        sap_ha_cluster_resource_clones_append:
          - resource_id: ethmonitor-ens192
 
        sap_ha_cluster_constraints_location_append:
          - resource:
              id: "grp_{{ sap_system_sid }}_ASCS{{ sap_ha_pacemaker_cluster_nwas_abap_ascs_instance_nr }}"
            rule: "ethmonitor-ens192 ne 1"
            id: "location-grp_{{ sap_system_sid }}_ASCS{{ sap_ha_pacemaker_cluster_nwas_abap_ascs_instance_nr }}"
            options:
              - name: score
                value: -INFINITY
          - resource:
              id: "grp_{{ wdp_system_sid }}_WDP{{ sap_ha_pacemaker_cluster_wdp_instance_nr }}"
            rule: "ethmonitor-ens192 ne 1"
            id: "location-grp_{{ wdp_system_sid }}_W{{ sap_ha_pacemaker_cluster_wdp_instance_nr }}"
            options:
              - name: score
                value: -INFINITY

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants