In read mode, request sincedb disk flush on each read loop iteration. #196
Conversation
The actual write occurs at the `sincedb_write_interval`. Note: the chunk size relative to the events-per-second rate affects how often a request is made, e.g. a large chunk size (1 MB), lines of say 128 bytes, and 256 events/s means each iteration takes 32 seconds, while at 32768 events/s an iteration takes 250 ms.
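The timing arithmetic above can be sketched as a small Ruby helper. This is hypothetical illustration code, not part of the plugin: one read-loop iteration consumes one chunk, so its duration is the number of lines per chunk divided by the pipeline's event rate.

```ruby
# Hypothetical helper illustrating the timing arithmetic; not plugin code.
# One read-loop iteration consumes one chunk, so its duration is
# (chunk bytes / line bytes) / events-per-second.
def loop_interval_seconds(chunk_bytes, line_bytes, events_per_second)
  lines_per_chunk = chunk_bytes / line_bytes
  lines_per_chunk.fdiv(events_per_second)
end

loop_interval_seconds(1024 * 1024, 128, 256)    # 1 MB chunks at 256 eps => 32.0 s
loop_interval_seconds(1024 * 1024, 128, 32_768) # same chunks at 32768 eps => 0.25 s
```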
@@ -28,6 +28,9 @@ def handle_specifically(watched_file)
       watched_file.file_close
       # unset_watched_file will set sincedb_value.position to be watched_file.bytes_read
       sincedb_collection.unset_watched_file(watched_file)
+      # we know we're done, we should not wait until the discovery of a
+      # new file (or LS shutdown) to write out the sincedb
+      sincedb_collection.immediate_write
since we're now guaranteed to write the sincedb at exit, should we still keep this change of forcing sincedb writing at EOF?
Yes, I think so.
If not, consider the case where one has, say, 5 small files and a default sincedb write interval of 15 seconds:
- Each file has 10 read loops; 10 `request_disk_flush` calls were made and none were granted, as all files are processed in less than 15 seconds.
- The 11th loop is an immediate EOF.
- No further write requests are made; only LS exit will force the write. This is a vulnerable situation, say in a docker container (with persistent disks) that is killed (because we don't have the "read all files and exit" feature yet).
- The user is forced to understand enough to set a very low `sincedb_write_interval`.
- The discover/stat loop continues to discover and stat, though.
Before, I had a nagging feeling about changing `request_disk_flush` to just set a flag to true instead of checking the interval. The flag is set back to false when the file is written. Then a call to `disk_flush_if_needed` is added in the `iterate_on_state` method, which writes only if the flag is set and the interval is exceeded.
I could not see the logic before, but now, after these changes and the SDH, I feel this is a better change.
WDYT?
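The flag-based alternative described above could be sketched roughly as follows. All names here (`SincedbCollection`, `disk_flush_if_needed`, the `@write_requested` flag, the `writes` counter) are assumptions for illustration, not the plugin's actual API:

```ruby
# A minimal sketch of the flag-based flush scheme; names are hypothetical,
# not the plugin's real implementation.
class SincedbCollection
  attr_reader :writes  # counter for observing writes in this sketch

  def initialize(write_interval)
    @write_interval = write_interval
    @write_requested = false
    @last_write = Time.now
    @writes = 0
  end

  # Called from the read loop: only records that a flush is wanted,
  # without checking the interval itself.
  def request_disk_flush
    @write_requested = true
  end

  # Called from the iterate_on_state loop: writes only when a request is
  # pending AND the configured interval has elapsed since the last write.
  def disk_flush_if_needed
    return unless @write_requested
    return unless Time.now - @last_write >= @write_interval
    write_to_disk
  end

  private

  def write_to_disk
    # ... serialize sincedb entries to disk here ...
    @writes += 1
    @write_requested = false  # flag cleared once the write happens
    @last_write = Time.now
  end
end
```

This keeps EOF handling cheap (just a flag set) while still guaranteeing a flush within one interval of the last request, which addresses the "no further write requests after EOF" scenario above.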
I understand the scenario, but IMO it is an edge case that could be worked around by terminating logstash gracefully before destroying the docker container.
Thinking the other way around, what about the downside of this? Calling a sincedb write on every EOF can be slow in scenarios with many small files, which create a large sincedb that needs to be rewritten on each EOF.
Do you think the tradeoff is worth it?
Well, my alternate suggestion (the nagging feeling) would not call `immediate_write` at EOF.
How about I implement it and you can critique again?
There seems to be a failing spec.
I don't know why that test failed. I re-ran it and it is OK now.
LGTM! 🚢 it
With the default chunk size of 32768 bytes and an average line size of 128 bytes, a pipeline running at 256 events/s will iterate every 1 second; but at 32768 events/s the loop takes 7.8125 ms, i.e. 128 noop calls to `request_disk_flush` every second.
Fixes #195
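These figures can be checked with a small hypothetical helper (illustration only, not plugin code): with one `request_disk_flush` call per chunk read, the call rate is the event rate divided by the lines per chunk.

```ruby
# Hypothetical helper checking the figures above; not plugin code.
# Each iteration reads one chunk and makes one request_disk_flush call,
# so calls/second = events_per_second / lines_per_chunk.
def flush_requests_per_second(chunk_bytes, line_bytes, events_per_second)
  lines_per_chunk = chunk_bytes / line_bytes  # 32768 / 128 = 256
  events_per_second.fdiv(lines_per_chunk)
end

flush_requests_per_second(32_768, 128, 256)    # one iteration (and call) per second
flush_requests_per_second(32_768, 128, 32_768) # 128 calls/s, i.e. 7.8125 ms per loop
```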