feat: store localnet logs between runs #4830

Merged
merged 3 commits into from
Jan 27, 2025

Collaborator

@mur-me mur-me commented Jan 14, 2025

What was done:

  • change the rm and clean commands to remove only the subfolders of the tmp_log folder. Justification: the log aggregator needs the tmp_log folder itself, so this prevents it from being removed while the log stack is running
  • fix the promtail config to point to the right place
  • add a named volume to the docker-compose file to store the Loki index between localnet runs
  • enhance debug-start-log: it now checks that localnet is running and
    exits if not
  • add debug-delete-volume-log to make; it removes the Loki named volume and
    warns that sudo rm is needed for the Loki files
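
The first item can be illustrated with a minimal shell sketch. It assumes a tmp_log directory holding per-run log-* subfolders, as in the paths shown in this PR. The function name clean_log_subfolders is hypothetical and is not taken from the repository's Makefile or scripts.

```shell
# Hypothetical helper (not the PR's actual code): delete the per-run
# subfolders inside a log directory while leaving the directory itself in
# place, so anything that has it mounted keeps a valid path.
clean_log_subfolders() {
  log_dir="${1:?usage: clean_log_subfolders <dir>}"
  # Nothing to do if the directory does not exist yet.
  [ -d "$log_dir" ] || return 0
  # -mindepth 1 excludes the directory itself, -maxdepth 1 limits the match
  # to direct children, and -type d leaves plain files untouched.
  find "$log_dir" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
}
```

Calling `clean_log_subfolders ./tmp_log` would then empty out old runs without removing tmp_log itself.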

Update as of 24 Jan:

  • fixed the Loki config path
  • tweaked the Loki limits config

Issue

Test

Main use case:

  1. Start localnet via make debug-ext or similar
  2. Start the log stack with make debug-start-log
  3. Open Grafana and query this run, e.g. {filename=~"/var/log/log-20250116-195803/.*"}
  4. Stop the localnet run
  5. Start it again with make clean debug-ext
  6. Check the fresh run with e.g. {filename=~"/var/log/log-20250116-205803/.*"}
  7. Recheck that the old run is still there
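
The queries in steps 3 and 6 differ only in the run folder name, so the selector can be built from it. A minimal sketch, assuming (as in the example queries) that run folders are visible to promtail under /var/log. The function name logql_selector_for_run is hypothetical.

```shell
# Hypothetical helper: print the LogQL stream selector that matches every
# log file of one localnet run, given the run folder name.
# Assumes the folder appears under /var/log, as in the example queries.
logql_selector_for_run() {
  run_dir="${1:?usage: logql_selector_for_run <run-folder-name>}"
  printf '{filename=~"/var/log/%s/.*"}\n' "$run_dir"
}

logql_selector_for_run log-20250116-195803
# prints {filename=~"/var/log/log-20250116-195803/.*"}
```

Running it once per folder under tmp_log gives one query per stored run, which makes the check in step 7 easy to repeat.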

Tested commands

  • make debug-start-log
$ make debug-start-log 
bash ./test/logs_aggregator/start_log_aggregator.sh
working in /home/user/.gvm/pkgsets/go1.22.5/global/src/github.com/harmony-one/harmony/test/logs_aggregator
/home/user/.gvm/pkgsets/go1.22.5/global/src/github.com/harmony-one/harmony/test/logs_aggregator
starting docker compose for log aggregation
[+] Running 4/4
 ✔ Volume "logs_aggregator_loki_data"  Created                                                                                                  0.0s 
 ✔ Container loki                      Started                                                                                                  1.4s 
 ✔ Container grafana                   Started                                                                                                  1.4s 
 ✔ Container promtail                  Started                                                                                                  1.3s 
Whole list of log folders
/home/user/.gvm/pkgsets/go1.22.5/global/src/github.com/harmony-one/harmony/tmp_log/log-20250116-195803
Opening Grafana
  • make debug-delete-volume-log
$ make debug-delete-volume-log 
docker volume rm logs_aggregator_loki_data
logs_aggregator_loki_data
[WARN] - it needs sudo to remove folder created with loki docker image user
sudo rm -rf test/logs_aggregator/loki
[sudo] password for user: 

Unit Test Coverage

Before:

<!-- copy/paste 'go test -cover' result in the directory you made change -->

After:

<!-- copy/paste 'go test -cover' result in the directory you made change -->

Test/Run Logs

Operational Checklist

  1. Does this PR introduce backward-incompatible changes to the on-disk data structure and/or the over-the-wire protocol? (If no, skip to question 8.)

    YES|NO

  2. Describe the migration plan. For each flag epoch, describe what changes take place at the flag epoch, the anticipated interactions between upgraded/non-upgraded nodes, and any special operational considerations for the migration.

  3. Describe how the plan was tested.

  4. How much minimum baking period after the last flag epoch should we allow on Pangaea before promotion onto mainnet?

  5. What are the planned flag epoch numbers and their ETAs on Pangaea?

  6. What are the planned flag epoch numbers and their ETAs on mainnet?

    Note that this must be enough to cover baking period on Pangaea.

  7. What should node operators know about this planned change?

  8. Does this PR introduce backward-incompatible changes NOT related to on-disk data structure and/or over-the-wire protocol? (If no, continue to question 11.)

    YES|NO

  9. Does the existing node.sh continue to work with this change?

  10. What should node operators know about this change?

  11. Does this PR introduce significant changes to the operational requirements of the node software, such as >20% increase in CPU, memory, and/or disk usage?

TODO

@mur-me mur-me force-pushed the feature/store_localnet_logs_between_runs branch from 1f5ee4d to 7ceeb50 Compare January 16, 2025 16:53
@mur-me mur-me requested a review from sophoah January 16, 2025 17:14
Contributor

sophoah commented Jan 22, 2025

2025-01-22 16:10:22 level=warn ts=2025-01-22T09:10:22.112218886Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=429 tenant= error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '3375' lines totaling '1048446' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

Collaborator Author

mur-me commented Jan 22, 2025

> 2025-01-22 16:10:22 level=warn ts=2025-01-22T09:10:22.112218886Z caller=client.go:419 component=client host=loki:3100 msg="error sending batch, will retry" status=429 tenant= error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '3375' lines totaling '1048446' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

Hey @sophoah, how did you run into this?

Contributor

sophoah commented Jan 23, 2025

just start the localnet, it doesn't happen all the time i tried

@mur-me mur-me force-pushed the feature/store_localnet_logs_between_runs branch from 7ceeb50 to 28773b9 Compare January 24, 2025 15:31
@mur-me mur-me force-pushed the feature/store_localnet_logs_between_runs branch from 28773b9 to 2cda0ae Compare January 24, 2025 15:34
Collaborator Author

mur-me commented Jan 24, 2025

> just start the localnet, it doesn't happen all the time i tried

Fixed in commit 2cda0ae.

Proof: 150 MB / 700K lines of logs ingested in a matter of 1-2 minutes, with no more 429 errors in the logs:
image
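
As a rough sanity check on these numbers, the average ingestion rate can be compared with the 4194304 bytes/sec limit from the earlier 429 error. A minimal sketch using integer shell arithmetic, assuming 150 MB means 150,000,000 bytes. The function name avg_rate_bytes_per_sec is hypothetical.

```shell
# Hypothetical helper: average throughput in bytes/sec (integer division).
# Arguments: total bytes ingested, elapsed seconds.
avg_rate_bytes_per_sec() {
  echo $(( $1 / $2 ))
}

avg_rate_bytes_per_sec 150000000 60    # 1-minute case, prints 2500000
avg_rate_bytes_per_sec 150000000 120   # 2-minute case, prints 1250000
```

Both averages are below 4194304 bytes/sec, so the earlier 429 responses were presumably triggered by short bursts rather than by sustained volume. That reading is an inference from the numbers, not something stated in this PR.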

@sophoah sophoah merged commit f7bec84 into dev Jan 27, 2025
4 checks passed
@mur-me mur-me deleted the feature/store_localnet_logs_between_runs branch January 27, 2025 14:25
3 participants