Commit d3e2877

Merge pull request #109 from flowx-ai/DOC-272-reporting-setup-guide-new-updates
Reporting setup guide updates
2 parents f6bb508 + 8b4392e

1 file changed: docs/platform-deep-dive/plugins/plugins-setup-guide/reporting-setup.md (+99 −54 lines)
@@ -1,57 +1,29 @@
 # Reporting Setup Guide
 
-The reporting plugin, available as a Docker image, relies on specific dependencies:
+## Introduction
+
+The Reporting Setup Guide assists in configuring the reporting plugin, which relies on specific dependencies and configurations.
 
 ## Dependencies
 
-- **PostgreSQL** instance dedicated to reporting data.
+The reporting plugin, available as a Docker image, requires the following dependencies:
+
+- **PostgreSQL**: Dedicated instance for reporting data storage.
 - **Reporting-plugin Helm Chart**:
-  - Utilizes a Spark Application to extract data from the FLOWX.AI Engine database and populate the FLOWX.AI Reporting plugin database.
+  - Utilizes a Spark Application to extract data from the FLOWX.AI Engine database and populate the Reporting plugin database.
+  - Utilizes the Spark Operator (more info [**here**](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md)).
 - **Superset**:
   - Requires a dedicated PostgreSQL database for its operation.
-  - Needs a [Redis](https://redis.io/) instance for efficient caching.
-  - Utilizes an ingress to expose its user interface.
-
-### Postgres Database Configuration
-
-#### Basic Postgres Setup:
+  - Utilizes Redis for efficient caching.
+  - Exposes its user interface via an ingress.
 
-```yaml
-postgresql:
-  enabled: true
-  postgresqlUsername: {{userName}}
-  postgresqlPassword: ""
-  postgresqlDatabase: "reporting"
-  existingSecret: {{secretName}}
-  persistence:
-    enabled: true
-    storageClass: standard-rwo
-    size: 5Gi
-  resources:
-    limits:
-      cpu: 1000m
-      memory: 1024Mi
-    requests:
-      memory: 256Mi
-      cpu: 100m
-  metrics:
-    enabled: true
-    serviceMonitor:
-      enabled: false
-    prometheusRule:
-      enabled: false
-  primary:
-    nodeSelector:
-      preemptible: "false"
-
-```
-### Reporting Plugin Helm Chart Configuration
+## Reporting Plugin Helm Chart Configuration
 
-For your configuration you will need a SparkApplication, a Kubernetes custom resource provided by the Spark Operator that manages the execution and lifecycle of Apache Spark applications on Kubernetes clusters. It is a higher-level abstraction that encapsulates the specifications and configurations needed to run Spark jobs on Kubernetes.
+Configuring the reporting plugin involves several steps:
 
-#### To install Spark Operator
+### Installation of Spark Operator
 
-1. Install kube operator using Helm:
+1. Install the Spark Operator using Helm:
 
 ```bash
 helm install local-spark-release spark-operator/spark-operator \
@@ -65,7 +37,6 @@ helm install local-spark-release spark-operator/spark-operator \
 ```bash
 kubectl apply -f spark-rbac.yaml
 ```
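
The contents of `spark-rbac.yaml` are not shown in this guide. A minimal sketch in the spirit of the Spark Operator quick-start is below; the `spark` service account name matches the chart example later in this diff, while the `dev` namespace, role names, and granted permissions are assumptions to adapt:

```yaml
# spark-rbac.yaml (illustrative sketch, not the guide's actual file)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: dev
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```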
-
 3. Build the reporting image:
 
 ```bash
 docker build ...
 ```
 
 4. Update the `reporting-image` URL in the `spark-app.yml` file.
 
-5. Configure the correct database ENV variables in the `spark-app.yml` file.
+5. Configure the correct database ENV variables in the `spark-app.yml` file (see the examples below, with and without webhook).
 
 6. Deploy the application:
 
 ```bash
 kubectl apply -f operator/spark-app.yaml
 ```
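
Once applied, you can verify that the operator registered the scheduled application and is spawning runs. A quick sanity check, assuming the `dev` namespace used in the chart example below:

```bash
# The cron-like parent resource and the runs it spawns
kubectl get scheduledsparkapplications -n dev
kubectl get sparkapplications -n dev

# Driver logs of the most recent run (driver pods carry the spark-role=driver label)
kubectl logs -n dev -l spark-role=driver --tail=50
```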
 
-#### Without webhook
+## Spark Operator Deployment Options
 
-:::caution
-When opting for Spark Operator deployment without a webhook, leveraging envVars is recommended. This involves managing secrets, which can either be securely mounted or provided in cleartext within the configuration. This approach ensures flexibility in handling sensitive information while maintaining security measures throughout the deployment process.
-:::
+### Without webhook
+
+For deployments without a webhook, manage secrets and environment variables for security:
 
 ```yaml
 sparkApplication: # Defines the Spark application configuration.
   enabled: "true" # Indicates that the Spark application is enabled for deployment.
-  scheduler: "@every 5m" # A cron job that runs every 5 minutes.
+  schedule: "@every 5m" # A cron job that runs every 5 minutes.
   driver: # This section configures the driver component of the Spark application.
     envVars: # Environment variables for driver setup.
       ENGINE_DATABASE_USER: flowx
@@ -119,16 +90,15 @@ sparkApplication: #Defines the Spark application configuration.
 Note: Passwords are currently set as plain strings, which is not a secure practice in a production environment.
 :::
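
The safer alternative is to source passwords from a Kubernetes Secret. The secret-based examples in this guide reference a Secret named `postgresql-generic` with a `postgresql-password` key; a minimal sketch of creating it, with the namespace and password value as placeholders:

```bash
kubectl create secret generic postgresql-generic \
  --namespace dev \
  --from-literal=postgresql-password='<your-database-password>'
```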
 
-#### With webhook
+### With webhook
+
+When using the webhook, employ environment variables with secrets for a balanced security approach:
 
-:::caution
-When deploying the Spark Operator with a webhook, it's recommended to employ environment variables (env) along with environment variables sourced from Secrets. These Secrets can be securely mounted or provided within the configuration file, ensuring a balance between convenience and security in handling sensitive information during the deployment process.
-:::
 
 ```yaml
 sparkApplication:
   enabled: "true"
-  scheduler: "@every 5m"
+  schedule: "@every 5m"
   driver:
     env: # Environment variables for driver setup with secrets.
       ENGINE_DATABASE_USER: flowx
@@ -164,6 +134,81 @@ sparkApplication:
 In Kubernetes-based Spark deployments managed by the Spark Operator, you can define the sparkApplication configuration to customize the behavior, resources, and environment for both the driver and executor components of Spark jobs. The driver section allows fine-tuning of parameters specifically pertinent to the driver part of the Spark application.
 :::
 
+Below are the configurable values within the chart values.yml file (with webhook):
+
+```yml
+apiVersion: "sparkoperator.k8s.io/v1beta2"
+kind: ScheduledSparkApplication
+metadata:
+  name: reporting-plugin-spark-app
+  namespace: dev
+  labels:
+    app.kubernetes.io/component: reporting
+    app.kubernetes.io/instance: reporting-plugin
+    app.kubernetes.io/managed-by: Helm
+    app.kubernetes.io/name: reporting-plugin
+    app.kubernetes.io/release: 0.0.1-FLOWXRELEASE
+    app.kubernetes.io/version: 0.0.1-FLOWXVERSION
+    helm.sh/chart: reporting-plugin-0.1.1-PR-9-4-20231122153650-e
+spec:
+  schedule: '@every 5m'
+  concurrencyPolicy: Forbid
+  template:
+    type: Python
+    pythonVersion: "3"
+    mode: cluster
+    image: eu.gcr.io/prj-cicd-d-flowxai-jx-6401/reporting-plugin:0.1.1-PR-9-4-20231122153650-eb6c
+    imagePullPolicy: IfNotPresent
+    mainApplicationFile: local:///opt/spark/work-dir/main.py
+    sparkVersion: "3.1.1"
+    restartPolicy:
+      type: Never
+      onFailureRetries: 0
+      onFailureRetryInterval: 10
+      onSubmissionFailureRetries: 5
+      onSubmissionFailureRetryInterval: 20
+    driver:
+      cores: 1
+      coreLimit: 1200m
+      memory: 512m
+      labels:
+        version: 3.1.1
+      serviceAccount: spark
+      env:
+        ENGINE_DATABASE_USER: flowx
+        ENGINE_DATABASE_URL: postgresql:5432
+        ENGINE_DATABASE_NAME: process_engine
+        ENGINE_DATABASE_TYPE: postgres # Type of the engine database; can also be changed to oracle
+        REPORTING_DATABASE_USER: flowx
+        REPORTING_DATABASE_URL: postgresql:5432
+        REPORTING_DATABASE_NAME: reporting
+        ENGINE_DATABASE_PASSWORD: "password"
+        REPORTING_DATABASE_PASSWORD: "password"
+      extraEnvVarsMultipleSecretsCustomKeys:
+        - name: postgresql-generic
+          secrets: # Secrets retrieved from a generic source.
+            ENGINE_DATABASE_PASSWORD: postgresql-password
+            REPORTING_DATABASE_PASSWORD: postgresql-password
+    executor:
+      cores: 1
+      instances: 3
+      memory: 512m
+      labels:
+        version: 3.1.1
+      env: # Environment variables for executor setup with secrets.
+        ENGINE_DATABASE_USER: flowx
+        ENGINE_DATABASE_URL: postgresql:5432
+        ENGINE_DATABASE_NAME: process_engine
+        ENGINE_DATABASE_TYPE: postgres # Type of the engine database; can also be changed to oracle
+        REPORTING_DATABASE_USER: flowx
+        REPORTING_DATABASE_URL: postgresql:5432
+        REPORTING_DATABASE_NAME: reporting
+      extraEnvVarsMultipleSecretsCustomKeys:
+        - name: postgresql-generic
+          secrets: # Secrets retrieved from a generic source.
+            ENGINE_DATABASE_PASSWORD: postgresql-password
+            REPORTING_DATABASE_PASSWORD: postgresql-password
+```
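
To apply such values, the chart would typically be installed or upgraded with the customized file. A rough sketch, where the release name, chart reference, and namespace are placeholders:

```bash
helm upgrade --install reporting-plugin <chart-repo>/reporting-plugin \
  --namespace dev \
  --values values.yml
```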
 
 ### Superset Configuration
 