`docs/platform-deep-dive/plugins/plugins-setup-guide/reporting-setup.md` (+99 −54)

# Reporting Setup Guide

## Introduction

The Reporting Setup Guide walks you through configuring the reporting plugin and the dependencies it relies on.

## Dependencies

The reporting plugin, available as a Docker image, requires the following dependencies:

- **PostgreSQL**: Dedicated instance for reporting data storage.
- **Reporting-plugin Helm Chart**:
  - Utilizes a Spark Application to extract data from the FLOWX.AI Engine database and populate the Reporting plugin database.
  - Utilizes the Spark Operator (more info [**here**](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md)).
- **Superset**:
  - Requires a dedicated PostgreSQL database for its operation.
  - Utilizes [Redis](https://redis.io/) for efficient caching.
  - Exposes its user interface via an ingress.

### Postgres Database Configuration

#### Basic Postgres Setup

```yaml
postgresql:
  enabled: true
  postgresqlUsername: {{userName}}
  postgresqlPassword: ""
  postgresqlDatabase: "reporting"
  existingSecret: {{secretName}}
  persistence:
    enabled: true
    storageClass: standard-rwo
    size: 5Gi
  resources:
    limits:
      cpu: 1000m
      memory: 1024Mi
    requests:
      memory: 256Mi
      cpu: 100m
  metrics:
    enabled: true
    serviceMonitor:
      enabled: false
    prometheusRule:
      enabled: false
  primary:
    nodeSelector:
      preemptible: "false"
```
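
If you prefer not to set `postgresqlPassword` inline, the `existingSecret` value can point to a pre-created Kubernetes Secret. Below is a minimal sketch only: the Secret name is a placeholder for whatever you pass as `{{secretName}}`, and the `postgresql-password` key follows the common Bitnami PostgreSQL chart convention, so confirm the expected key name for your chart version.

```yaml
# Illustrative example only - the name and key below are assumptions, not part of the official setup.
apiVersion: v1
kind: Secret
metadata:
  name: reporting-postgresql-secret   # placeholder; must match the {{secretName}} value above
type: Opaque
stringData:
  postgresql-password: "change-me"    # assumed key name (Bitnami convention); verify against your chart
```
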
## Reporting Plugin Helm Chart Configuration

For your configuration you will need a SparkApplication, a Kubernetes custom resource provided by the Spark Operator that manages the execution and lifecycle of Apache Spark applications on Kubernetes clusters. It is a higher-level abstraction that encapsulates the specifications and configuration needed to run Spark jobs on Kubernetes.

Configuring the reporting plugin involves several steps:

...

4. Update the `reporting-image` URL in the `spark-app.yml` file.

5. Configure the correct database ENV variables in the `spark-app.yml` file (see the examples with and without webhook below).

6. Deploy the application:
```bash
kubectl apply -f operator/spark-app.yaml
```
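
To verify that the Spark Operator picked up the job, you can inspect the SparkApplication resource it manages. This is a generic verification sketch; the application name and namespace below are placeholders for whatever is defined in your `spark-app.yaml`.

```bash
# List SparkApplications and check their status (names are placeholders).
kubectl get sparkapplications -n <namespace>
kubectl describe sparkapplication <reporting-app-name> -n <namespace>

# Once the driver pod is running, its logs show the extraction job's progress.
kubectl logs <reporting-app-name>-driver -n <namespace>
```
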

## Spark Operator Deployment Options

### Without webhook

For deployments without a webhook, use `envVars` and manage the secrets yourself; they can be securely mounted or provided in cleartext within the configuration, so handle sensitive values with care:

```yaml
sparkApplication: # Defines the Spark application configuration.
  enabled: "true" # Indicates that the Spark application is enabled for deployment.
  schedule: "@every 5m" # Cron-style schedule: the job runs every 5 minutes.
  driver: # This section configures the driver component of the Spark application.
    envVars: # Environment variables for driver setup.
      ENGINE_DATABASE_USER: flowx
      # ... (remaining environment variables elided)
```

:::caution
Note: Passwords are currently set as plain strings, which is not a secure practice in a production environment.
:::

### With webhook

When using the webhook, employ environment variables (`env`) together with variables sourced from Kubernetes Secrets; the Secrets can be securely mounted or referenced from the configuration file, balancing convenience and security when handling sensitive information:

```yaml
sparkApplication:
  enabled: "true"
  schedule: "@every 5m"
  driver:
    env: # Environment variables for driver setup with secrets.
      ENGINE_DATABASE_USER: flowx
      # ... (remaining configuration elided)
```

:::info
In Kubernetes-based Spark deployments managed by the Spark Operator, you can define the sparkApplication configuration to customize the behavior, resources, and environment for both the driver and executor components of Spark jobs. The driver section allows fine-tuning of parameters specifically pertinent to the driver part of the Spark application.
:::
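
To make the preceding note concrete, here is a minimal, illustrative sketch of a raw SparkApplication manifest (the custom resource the Spark Operator consumes) that tunes driver/executor resources and sources a sensitive variable from a Secret. It is not the plugin's shipped manifest: the image, class, Secret name, sizing values, and the `ENGINE_DATABASE_PASSWORD` variable are assumptions for illustration only.

```yaml
# Illustrative sketch only - all names and values below are placeholders, not the official spark-app.yaml.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: reporting-extraction                      # placeholder name
spec:
  type: Scala                                     # placeholder; match the reporting job's language
  mode: cluster
  image: "<reporting-image>"                      # the reporting-image URL from step 4
  sparkVersion: "3.1.1"                           # placeholder version
  mainClass: com.example.ReportingJob             # hypothetical class name
  mainApplicationFile: "local:///opt/app/reporting-job.jar"  # placeholder artifact path
  driver:
    cores: 1                                      # assumed sizing; adjust to your workload
    memory: "1g"
    env:
      - name: ENGINE_DATABASE_USER
        value: flowx
      - name: ENGINE_DATABASE_PASSWORD            # hypothetical variable, shown only to illustrate secretKeyRef
        valueFrom:
          secretKeyRef:
            name: reporting-db-credentials        # placeholder Secret name
            key: password
  executor:
    instances: 1                                  # assumed sizing
    cores: 1
    memory: "1g"
```

How these fields map onto the chart's `sparkApplication` values depends on the chart itself, so treat this only as a reference for the underlying resource the operator manages.
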
Below are the configurable values within the chart values.yml file (with webhook):