diff --git a/README.md b/README.md
index d632bac..94f2e83 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ This current chapter presents the scope of the content of this page:
 This chapter provides guidance for architecture, topology and design decisions that produce well-performing and scalable Workflow solutions.
 ## [Workflow container tuning and configuration](container.md)
 This chapter explains how to vertically and horizontally scale workflow containers and how to modify the configuration of a workflow container.
-## [Workflow container tuning for very large workloads](https://github.com/icp4a/workflow-performance/blob/23.0.1/container.md#tune-for-high-workloads-above-large-pattern-size---draft)
+## [Workflow container tuning for very large workloads](https://github.ibm.com/dba/workflow-performance-documentation/blob/main/container.md#tune-for-high-workloads-above-large-pattern-size)
 This chapter explains how to scale and tune for workloads larger than "Large"
 ## [Zen and Common Services tuning and configuration](zen-cs.md)
 This chapter discusses scaling and tuning options of Zen and Common Services components used with Workflow.
diff --git a/architecture.md b/architecture.md
index 74f3daa..3a71551 100644
--- a/architecture.md
+++ b/architecture.md
@@ -82,11 +82,11 @@ To improve overall system performance, it can be important to purge data from th
 ### Workflow authoring
 Workflow Authoring holds snapshots of Process Applications (PAs) and Toolkits as they are developed. In addition to every named snapshot you create, every time a save operation is done, an unnamed snapshot is created. As a result, a significant number of unnamed snapshots can accumulate over time, which can impact the performance of many operations. To remove named and unnamed snapshots from the system use the command:
 
 ### Workflow runtime
-Workflow runtime holds named snapshots of process applications that have been deployed to it. 
To remove named snapshots from the Process Server use the command:
-If completed process and task instances are not removed, they accumulate over time, which typically impact overall system performance, especially for task list queries like saved searches. To remove completed instances from use the command:
+Workflow runtime holds named snapshots of process applications that have been deployed to it. To remove named snapshots from the Process Server, use the command:
+If completed process and task instances are not removed, they accumulate over time, which typically impacts overall system performance, especially for task list queries like saved searches. To remove completed instances, use the command:
 ## Set an appropriate Java heap size to deliver optimal throughput and response time.
 Memory usage data that is obtained through the JVM’s verbose garbage collection option (verbosegc) helps determine the optimal settings. Further information is available at
@@ -99,7 +99,7 @@ In non-federated Business Automation Workflow environments, you can optimize sav
 Correct tuning and deployment choices for databases can greatly increase overall system throughput. For more details, see [Database configuration, tuning, and best practices](database.md)
 ## Tune the bpd-queue-capacity and max-thread-pool-size parameters to achieve optimal throughput and scaling
-To optimize throughput and scaling, start with a bpd-queue-capacity of 10 per physical processor core (for example, 40 for a 4-processor core configuration), with a maximum value of 80:
+To optimize throughput and scaling, start with a bpd-queue-capacity of 10 per physical processor core (for example, 40 for a 4-processor core configuration), with a maximum value of 80:
 ## Disable tracing and logging
 Tracing and logging are important when debugging, but the resources to do so severely affects performance.
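The bpd-queue-capacity and max-thread-pool-size settings above are typically applied through Workflow custom XML. A minimal sketch for a 4-core configuration, assuming the 100Custom.xml event-manager element names (verify the exact nesting against your release's documentation):

```
<properties>
  <server merge="mergeChildren">
    <event-manager merge="mergeChildren">
      <!-- start with 10 per physical processor core; example assumes 4 cores -->
      <bpd-queue-capacity merge="replace">40</bpd-queue-capacity>
      <!-- do not exceed the suggested maximum of 80 -->
      <max-thread-pool-size merge="replace">80</max-thread-pool-size>
    </event-manager>
  </server>
</properties>
```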
diff --git a/container.md b/container.md
index 8debab6..d48574d 100644
--- a/container.md
+++ b/container.md
@@ -1,10 +1,10 @@
-This chapter describes how the conatiners in the workflow namespaces can be configured and tuned.
+This chapter describes how the containers in the workflow namespaces can be configured and tuned.
 ### Set the deployment profile size
 #### CP4BA pods
-You can select a deployment profile (sc_deployment_profile_size) and enable it during or after the installation. IBM Cloud Pak® for Business Automation provides small, medium, and large deployment profiles. For more information about deployment profiles refer to https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/23.0.1?topic=pcmppd-system-requirements. This affects the size and/or the amount of replicas of Workflow pods:
+You can select a deployment profile (sc_deployment_profile_size) and enable it during or after the installation. IBM Cloud Pak® for Business Automation provides small, medium, and large deployment profiles. For more information about deployment profiles, refer to https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/24.0.1?topic=pcmppd-system-requirements.
This affects the CPU and memory resources and/or the number of replicas of Workflow pods:
 Edit the ICP4ACluster object:
 `oc edit ICP4ACluster -o yaml`
@@ -23,24 +23,27 @@ Locate the `spec.size` property and set it to `small`, `medium` or `large`
 For Zen edit the CommonServices object:
 `oc edit ZenService -o yaml`
-Locate the `spec.scale_config` property and set it to `small`, `medium` or `large`
+Locate the `spec.scaleConfig` property and set it to `small`, `medium` or `large`
 
-### Adapt resources and replica sizes for containers
-You can scale the number of replicas for Workflow pods manually by editing the ICP4ACluster object:
+### Adapt the number of replicas for containers
+Besides adjusting the deployment profile size, you can override the number of replicas for Workflow pods manually by editing the ICP4ACluster object:
 `oc edit ICP4ACluster -o yaml`
 Locate the `baw_configuration.replicas` property and set it to the number of replicas you want to scale to.
-In addition you can increase or decrease the cpu and memory ressource requests and limits.
-In general, we do not recommend to set the CPU limit too low, since kubernetes cpu throttling can start early.
+#### WfPS pods
+Since there is no concept of a deployment profile size (S/M/L) for WfPS, pod replicas need to be scaled manually. This can be achieved by setting a positive integer as the value of `spec.node.replicas` in the WfPS custom resource (type: WfPSRuntime).
+
+### Adapt resources for containers
+You can increase or decrease the CPU and memory resource requests and limits.
+In general, we do not recommend setting the CPU limit too low, since Kubernetes CPU throttling can start early.
 Edit the ICP4ACluster object:
 `oc edit ICP4ACluster -o yaml`
 Locate the `baw_configuration.resources` property and set its limit and request properties according your needs.
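As an illustration of the replicas and resources overrides described above, the relevant part of the ICP4ACluster custom resource might look like this (the instance name and all sizes are example values only; adjust to your workload):

```
spec:
  baw_configuration:
    - name: bawins1          # example instance name
      replicas: 3            # overrides the replica count from the deployment profile
      resources:
        requests:
          cpu: "2"
          memory: 2Gi
        limits:
          cpu: "4"           # avoid a too-low CPU limit; Kubernetes CPU throttling can start early
          memory: 4Gi
```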
-
 ### Modify the JVM configuration
 The jvm heap size should not modified using JVM properties, instead use the memory request and limit settings to increase the jvm heap size, since the jvm is using the container aware setting.
@@ -68,21 +71,47 @@ For WfPS, edit the WfPSRuntime resource and add the connection pool size value h
 `spec.database.client.maxConnectionPoolSize`
 ### Modify Workflow Caches
-Several cache settings that might benefit from larger values. An overview about caches can be found at https://www.ibm.com/docs/en/bpm/8.5.7?topic=servers-cache-cache-related-settings. Refer also to the Cache monitoring section at https://www.ibm.com/docs/en/bpm/8.5.7?topic=servers-using-process-instrumentation-data-cache-tuning.
+Several cache settings might benefit from larger values. An overview of the caches can be found at https://www.ibm.com/docs/en/baw/24.x?topic=data-cache-cache-related-settings. Refer also to the Cache monitoring section at https://www.ibm.com/docs/en/baw/24.x?topic=data-using-process-instrumentation-cache-tuning.
 Most of the caches can be modified by editing the custom_xml file:
-https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/23.0.1?topic=customizing-business-automation-workflow-properties
+https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/24.0.1?topic=customizing-business-automation-workflow-properties
+### Reduce or disable User-Group-Syncs
-### Tune for High Workloads above Large pattern size - **draft**
+Set the user-group-membership-sync-cache-expiration to -1 to disable user group synchronization. This will reduce lock wait times on LSW_LOCK.
-For very large workloads exceeding "Large" tuning might be required. This tuning depends on indiviual workloads. For thousands of concurrent users we applied the following tuning steps for Workflow Process Service. 
This is a sample:
-#### WFPS resource
+```
+<properties>
+  <server merge="mergeChildren">
+    <user-group-membership-sync-cache-expiration merge="replace">
+      -1
+    </user-group-membership-sync-cache-expiration>
+  </server>
+</properties>
+```
+Edit the cluster object `oc edit ICP4ACluster`
+
+Modify the bastudio_configuration.bastudio_custom_xml object:
+```
+bastudio_configuration:
+  bastudio_custom_xml: |
+    <properties>
+      <server merge="mergeChildren">
+        <user-group-membership-sync-cache-expiration merge="replace">
+          -1
+        </user-group-membership-sync-cache-expiration>
+      </server>
+    </properties>
+```
+
+### Tune for High Workloads above Large pattern size
-Increase WFPS resources
+For very large workloads exceeding "Large", tuning is required. This tuning depends on individual workloads. For 7000 concurrent users (20 s think time) and a throughput of 30+ human processes per second / 120 human tasks per second, we applied the following tuning steps for Workflow Process Service. This is a sample:
- * spec.database.managed.managementState=Unmanaged (this allows to modify the Postgres cluster resources)
+#### WFPS resource
+
+ * spec.database.managed.managementState=Unmanaged
  * spec.node.replicas=8
  * spec.node.resources.limits.cpu=6
  * spec.node.resources.limits.memory=4Gi
@@ -90,22 +119,17 @@
 #### Cluster resource (Postgres DB)
-Increase database ressources
-
  * spec.postgresql.parameters.max_connections=1000
  * spec.postgresql.parameters.max_prepared_transactions=1000
  * spec.resources.limits.cpu=32
  * spec.resources.limits.memory=32Gi
 #### Postgres DB filesystem
-Make sure the postgres filesystem resides on fast disks.
+Make sure the postgres filesystem resides on fast disks.
 #### Zen usermanagement pods
-
-Increase Zen resources required for user session validation.
-```
 oc scale deployment usermgmt --replicas=12
-```
+
 #### Disabling Notifications (optional)
@@ -187,3 +211,6 @@ Edit configmap wfpsruntime-sample-liberty-dynamic-config:
 ```
+
+
+
diff --git a/database.md b/database.md
index a532391..f0b1f8a 100644
--- a/database.md
+++ b/database.md
@@ -19,6 +19,13 @@ Databases are designed for high availability, transactional processing, and reco
 As a result, database log files might be heavily used. 
More important, the log-writes hold commit operations pending, meaning that the application is synchronously waiting for the write to complete.
 Therefore, the performance of write access to the database log files is critical to overall system performance. For this reason, we suggest that database log files be placed on a fast disk subsystem with write-back cache.
 #### Place database log files on a separate device from the table space containers:
 A basic strategy for all database storage configurations is to place the database logs on dedicated physical disks, ideally on a dedicated disk adapter. This placement reduces disk access contention between I/O to the table space containers and I/O to the database logs and preserves the mostly sequential access pattern of the log stream. Such separation also improves recoverability when log archival is employed.
+#### Enlarge the transaction log
+Each database writes log files called the "transaction log" to record the changes to the database. These log files are used when the database needs to be recovered.
+Under high load, the transaction log might be too small.
+For DB2, increase the size of each log file (LOGFILSIZ) and the number of log files (LOGPRIMARY / LOGSECOND):
+* db2 update db cfg for [database] using LOGFILSIZ [new value]
+* db2 update db cfg for [database] using LOGPRIMARY [new value]
+* db2 update db cfg for [database] using LOGSECOND [new value] IMMEDIATE
 ### Tuning queries
 #### Monitor top SQL statements
 Use the database vendor’s tools to discover expensive SQL statements, for example the SYSIBMADMIN.TOP_DYN_SQL view of DB2, or the automated workload repository (AWR) report of an Oracle database. Even in a perfectly tuned database, you can find a most expensive SQL query, even if this query needs no tuning at all. 
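As a usage sketch of the DB2 commands in database.md, with a hypothetical database name BPMDB and illustrative sizes (tune the values to your workload and disk capacity; LOGFILSIZ and LOGPRIMARY changes take effect on database reactivation):

```
db2 update db cfg for BPMDB using LOGFILSIZ 16384
db2 update db cfg for BPMDB using LOGPRIMARY 20
db2 update db cfg for BPMDB using LOGSECOND 40 IMMEDIATE
```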
@@ -52,9 +59,17 @@ In non-federated Business Automation Workflow environments, you can optimize sav
 ## Tuning PostgreSQL in the context of WfPS
 ### Setting PostgreSQL database to unmanaged mode
-Before customizing the PostgreSQL configuration with parameters that are not exposed through the WfPS custom resource (WfPSRuntime), it is necessary to set the database to unmanaged. This can be done by changing the value of `spec.database.managed.managementState` to `Unmanaged`.
+Before customizing the PostgreSQL configuration with parameters that are not exposed through the WfPS custom resource (type: WfPSRuntime), it is necessary to set the database to unmanaged. This can be done by changing the value of `spec.database.managed.managementState` to `Unmanaged`.
 ### Setting the maximum allowed connections
 When encountering exception `FATAL: remaining connection slots are reserved for non-replication superuser connections` this may be an indicator that the server-side connections are depleted and the `max_connections` parameter (default typically 100) needs to be increased. For this, the PostgreSQL custom resource (type: Cluster, name: \-postgre) needs to be modified after setting the database mode to unmanaged. Add or modify the paramater at location `items[*].spec.postgresql.parameters.max_connections` in the Cluster resource.
+### Adjusting CPU and memory resource settings
+You can adjust the resource settings for the PostgreSQL container(s) by modifying the following values in the unmanaged PostgreSQL custom resource (type: Cluster, name: \-postgre):
+`spec.resources.limits.cpu`
+`spec.resources.limits.memory`
+`spec.resources.requests.cpu`
+`spec.resources.requests.memory`
+### Adjusting max_prepared_transactions
+When encountering exceptions like `org.postgresql.xa.PGXAException: Error preparing transaction.` this may be an indicator that the maximum number of prepared transactions is reached. 
To increase the value, modify the PostgreSQL custom resource (type: Cluster, name: \-postgre) and increase the value at location `items[*].spec.postgresql.parameters.max_prepared_transactions` (default: 100).
 # Oracle specific database tuning and troubleshooting
 ## Improve IBM BPM performance with an Oracle database
diff --git a/monitoring.md b/monitoring.md
index 99afbde..2189fb4 100644
--- a/monitoring.md
+++ b/monitoring.md
@@ -1,6 +1,6 @@
 For monitoring the performance of a containerized Workflow system there a several tools and approaches available.
 Several sources of information are highly valuable, even necessary, when diagnosing and resolving performance problems. This information is often referred to as must-gather information. It includes the following items:
-* Hardware ressource utilization
+* Hardware resource utilization
 * Client (the users workstation) processor utilization and memory use
 * CP4BA container processor utilization, memory use and network utilization
 * Database server processor, disk subsystem, memory use and network utilization