Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Expose a single timeout setting in CRDs #1045

Merged
merged 1 commit into from
Oct 7, 2024

Conversation

pavolloffay
Copy link
Collaborator

@pavolloffay pavolloffay commented Oct 3, 2024

This PR adds a single timeout setting defaulting to 30s (aligning with upstream grafana).
The timeout is both for read and write because gateway/route/oauth-proxy unity it as well.

tempo:
  server:
    # for reading the body - ingest
    http_server_read_timeout: 
    # for writing the request - query
    http_server_write_timeout: 

@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.20%. Comparing base (e2ae4c0) to head (e3f388f).

Files with missing lines Patch % Lines
internal/manifests/monolithic/build.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1045      +/-   ##
==========================================
+ Coverage   69.07%   69.20%   +0.12%     
==========================================
  Files         110      110              
  Lines        7017     7049      +32     
==========================================
+ Hits         4847     4878      +31     
- Misses       1880     1881       +1     
  Partials      290      290              
Flag Coverage Δ
unittests 69.20% <97.67%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pavolloffay pavolloffay marked this pull request as ready for review October 3, 2024 16:17
@IshwarKanse
Copy link
Contributor

IshwarKanse commented Oct 4, 2024

@pavolloffay The oauth proxy image needs to be updated.

Install Minio:

oc create -f << EOF -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/name: minio
  name: minio
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: minio
    spec:
      containers:
        - command:
            - /bin/sh
            - -c
            - |
              mkdir -p /storage/tempo && \
              minio server /storage
          env:
            - name: MINIO_ACCESS_KEY
              value: tempo
            - name: MINIO_SECRET_KEY
              value: supersecret
          image: quay.io/minio/minio:latest
          name: minio
          ports:
            - containerPort: 9000
          volumeMounts:
            - mountPath: /storage
              name: storage
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: minio
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app.kubernetes.io/name: minio
  type: ClusterIP
EOF

Create TempoStack with timeout set.

oc apply -f << EOF -
apiVersion: v1
kind: Secret
metadata:
   name: minio-test
stringData:
  endpoint: http://minio:9000
  bucket: tempo
  access_key_id: tempo
  access_key_secret: supersecret
type: Opaque

---
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  timeout: 3m
  storage:
    secret:
      name: minio-test
      type: s3
  storageSize: 200M
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: route
          host: example.com
          annotations:
            example_annotation: example_value
EOF

Check the status of pods:

% oc get pods
NAME                                             READY   STATUS             RESTARTS      AGE
minio-8665d4876d-n4kn2                           1/1     Running            0             5m13s
tempo-simplest-compactor-56fd54cdd8-dlq9s        1/1     Running            0             3m8s
tempo-simplest-distributor-6cd7bdf7f6-f8ng7      1/1     Running            0             3m8s
tempo-simplest-ingester-0                        1/1     Running            0             3m8s
tempo-simplest-querier-85dd5f5d69-rrbx6          1/1     Running            0             3m8s
tempo-simplest-query-frontend-7f54d56556-plktp   3/4     CrashLoopBackOff   5 (10s ago)   3m8s

The oauth proxy image used is quay.io/openshift/origin-oauth-proxy:4.12 which doesn't have the --upstream-timeout flag.

% oc logs tempo-simplest-query-frontend-7f54d56556-plktp -c oauth-proxy
flag provided but not defined: -upstream-timeout
Usage of oauth2_proxy:
  -approval-prompt string
    	OAuth approval_prompt (default "force")
  -authenticated-emails-file string
    	authenticate against emails via file (one per line)
  -authentication-kubeconfig string

Timeout is also not set on the route:

% oc get route tempo-simplest-query-frontend -o yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    example_annotation: example_value
  creationTimestamp: "2024-10-04T06:17:18Z"
  labels:
    app.kubernetes.io/component: query-frontend
    app.kubernetes.io/instance: simplest
    app.kubernetes.io/managed-by: tempo-operator
    app.kubernetes.io/name: tempo
  name: tempo-simplest-query-frontend
  namespace: test
  ownerReferences:
  - apiVersion: tempo.grafana.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: TempoStack
    name: simplest
    uid: 59bc3fcf-42f0-4d36-b1d8-aaaaec9dbaed
  resourceVersion: "105031"
  uid: b5e9ab0d-1c73-4d6a-b136-cd0b0b554617
spec:
  host: example.com
  port:
    targetPort: oauth-proxy
  tls:
    termination: reencrypt
  to:
    kind: Service
    name: tempo-simplest-query-frontend
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2024-10-04T06:17:18Z"
      status: "True"
      type: Admitted
    host: example.com
    routerCanonicalHostname: router-default.apps.ikanse-29.qe.gcp.devcluster.openshift.com
    routerName: default
    wildcardPolicy: None


@pavolloffay
Copy link
Collaborator Author

updated

@IshwarKanse
Copy link
Contributor

@pavolloffay With the latest change the haproxy annotation is added to the route but the custom annotations are not added now.

    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: route
          host: example.com
          annotations:
            example_annotation: example_value
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/timeout: 70s
  creationTimestamp: "2024-10-04T09:48:35Z"
  labels:
    app.kubernetes.io/component: query-frontend
    app.kubernetes.io/instance: simplest
    app.kubernetes.io/managed-by: tempo-operator
    app.kubernetes.io/name: tempo
  name: tempo-simplest-query-frontend
  namespace: test
  ownerReferences:
  - apiVersion: tempo.grafana.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: TempoStack
    name: simplest
    uid: cf9fd1ed-3fb9-49ed-8fe0-12e6015cd22e
  resourceVersion: "129125"
  uid: c60dc031-3dcb-4063-991b-8b28919d28e2
spec:
  host: example.com
  port:
    targetPort: oauth-proxy
  tls:
    termination: reencrypt
  to:
    kind: Service
    name: tempo-simplest-query-frontend
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2024-10-04T09:48:35Z"
      status: "True"
      type: Admitted
    host: example.com
    routerCanonicalHostname: router-default.apps.ikanse-30.qe.gcp.devcluster.openshift.com
    routerName: default
    wildcardPolicy: None

component: tempostack, tempomonolithic

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add unified timeout configuration. It changes the default to 30s.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud, should we have a single timeout or different timeouts for read and write path?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please read the PR description. It has some explanation on this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I jumped directly to the code diff 😬
Ok, makes sense.

@@ -248,6 +248,7 @@ func deployment(params manifestutils.Params, rbacCfgHash string, tenantsCfgHash
fmt.Sprintf("--web.internal.listen=0.0.0.0:%d", manifestutils.GatewayPortInternalHTTPServer), // serves health checks
fmt.Sprintf("--traces.write.otlpgrpc.endpoint=%s:%d", naming.ServiceFqdn(tempo.Namespace, tempo.Name, manifestutils.DistributorComponentName), manifestutils.PortOtlpGrpcServer), // Tempo Distributor gRPC upstream
fmt.Sprintf("--traces.write.otlphttp.endpoint=%s://%s:%d", httpScheme(params.CtrlConfig.Gates.HTTPEncryption), naming.ServiceFqdn(tempo.Namespace, tempo.Name, manifestutils.DistributorComponentName), manifestutils.PortOtlpHttp), // Tempo Distributor HTTP upstream
fmt.Sprintf("--traces.write-timeout=%s", params.Tempo.Spec.Timeout.Duration.String()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a setting for the read path? The gateway also proxies the Tempo API (i.e. TraceQL API)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@pavolloffay pavolloffay mentioned this pull request Oct 4, 2024
6 tasks
@IshwarKanse
Copy link
Contributor

@pavolloffay For monolithic mode. The http_server_* config is not set.

oc extract cm/tempo-mono-route-config
tempo-query.yaml
tempo.yaml
ikanse@ikanse-mac tmp % cat *
address: 127.0.0.1:7777
backend: 127.0.0.1:3200
tenant_header_key: x-scope-orgid
services_query_duration: 72h0m0s
server:
  http_listen_port: 3200
internal_server:
  enable: true
  http_listen_address: 0.0.0.0
storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
distributor:
  receivers:
    otlp:
      protocols:
        grpc: {}
        http: {}
usage_report:
  reporting_enabled: false

@pavolloffay
Copy link
Collaborator Author

thx @IshwarKanse, fixed in the last commit

@IshwarKanse
Copy link
Contributor

@pavolloffay I verified the PR manually using TempoStack and TempoMonolithic instances with/without multitenancy enabled. We are good to merge it. We are currently blocked on the e2e tests due to https://issues.redhat.com/browse/TRACING-4703 after its fixed, I'll update the tests to add more assertions to check the timeout.

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay pavolloffay merged commit c7eb01b into grafana:main Oct 7, 2024
11 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants