Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Potential partial read of .m4s DASH segments. #386

Closed
ofnothing opened this issue Apr 27, 2018 · 4 comments
Closed

Potential partial read of .m4s DASH segments. #386

ofnothing opened this issue Apr 27, 2018 · 4 comments
Assignees
Labels
status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Milestone

Comments

@ofnothing
Copy link

ofnothing commented Apr 27, 2018

System info

Centos7
packager version v2.0.3-ef93a1d-release

Issue and steps to reproduce the problem

If the packager is used to produce a live DASH stream, there is a chance that a process reading the dash files could read a partial dash .m4s segment.

I used strace to examine how packager writes output to files. For my test case, packager wrote out the segment files in two calls to write(). This causes a small race condition where it is possible that a process reading the .m4s file could receive a partial file. See the strace log below for details.

As soon as the segment files are finished, I would like to move them to location that can be accessed from either a local web server or a service like Amazon S3. However, currently It is difficult to tell when the video segment files are actually complete.

Looking at the strace log, it appears that that steps have been taken to prevent a partial reads of the .mpd file by writing out the .mpd file to a temp file, then renaming it. If the same logic was applied to the .m4s files, I think that would solve the issue.

The init_segment seems to have the same issue as well.

Packager Command:
packager in=udp://127.0.0.1:3001?interface=127.0.0.1&reuse=1,stream=video,init_segment=/tmp/realtime_staging_video/16/video_init.mp4,segment_template=/tmp/realtime_staging_video/16/video_segment_$Time$.m4s --mpd_output /tmp/realtime_staging_video/16/manifest.mpd --time_shift_buffer_depth 600 --segment_duration 2

test video command:
gst-launch-0.10 videotestsrc ! queue ! x264enc ! mux. mpegtsmux name=mux ! udpsink port=3001 host=127.0.0.1

strace command:
sudo strace -f -p <pid> -e trace=!recvfrom,futex,madvise

strace output:
[pid 17491] open("/tmp/realtime_staging_video/16/video_segment_6392520000.m4s", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[pid 17491] stat("/tmp/realtime_staging_video/16/video_segment_6392520000.m4s", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 17496] <... restart_syscall resumed> ) = 0
[pid 17496] fstat(4, <unfinished ...>
[pid 17491] gettid( <unfinished ...>
[pid 17496] <... fstat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 17491] <... gettid resumed> ) = 17491
[pid 17496] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bfe18e000
[pid 17496] write(4, "\0\0\0$stypisom\0\0\0\0iso8mp41dashavc1"..., 372736) = 372736
[pid 17491] stat("/", {st_mode=S_IFDIR|0555, st_size=261, ...}) = 0
[pid 17491] stat("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
[pid 17491] stat("/tmp", {st_mode=S_IFDIR|0755, st_size=36, ...}) = 0
[pid 17491] stat("/tmp/realtime_staging_video", {st_mode=S_IFDIR|0755, st_size=46, ...}) = 0
[pid 17491] stat("/tmp/realtime_staging_video/16", {st_mode=S_IFDIR|0755, st_size=32768, ...}) = 0
[pid 17491] open("/tmp/realtime_staging_video/16/packager-tempfile-4451-4453-2ec96b9d5e851b", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
[pid 17491] stat("/tmp/realtime_staging_video/16/packager-tempfile-4451-4453-2ec96b9d5e851b", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 17493] fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 17493] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bfe18d000
[pid 17491] write(5, "<?xml version="1.0" encoding="UT"..., 1163) = 1163
[pid 17491] close(5) = 0

[pid 17491] munmap(0x7f7bfe18d000, 4096) = 0
[pid 17491] rename("/tmp/realtime_staging_video/16/packager-tempfile-4451-4453-2ec96b9d5e851b", "/tmp/realtime_staging_video/16/manifest.mpd") = 0
[pid 17491] write(4, "Gs;7\217\217\337C\222\230NM\211v\371+I\26\306\v\362\351\224r\241<k\270^\270p}"..., 2691) = 2691
[pid 17491] close(4) = 0

@kqyang
Copy link
Contributor

kqyang commented Apr 27, 2018

@ofnothing Thanks for bringing this up.

Yes, it may take multiple writes for the segment files to finish writing, but the partial files are not visible to the player, because the manifest is only updated after the segment is completed.

As soon as the segment files are finished, I would like to move them to location that can be accessed from either a local web server or a service like Amazon S3. However, currently It is difficult to tell when the video segment files are actually complete.

I suspect you are seeing problems because an updated manifest is moved/uploaded before the updated segment file is moved/uploaded.

There is a trick to make sure the manifest always references completed files by following the below procedure:
(1) Backup manifests
(2) Sync/move/upload all files
(3) Move/upload the backup manifests

Let us know if it works.

We will looking into supporting atomic writing for media segments but we probably won't implement it as it does not really help. The trick described above will still be needed to avoid manifests referencing a not-exist segment due to the race conditions in moving/uploading files.

@kqyang kqyang added status: waiting on response Waiting on a response from the reporter(s) of the issue and removed needs triage labels Apr 27, 2018
@ofnothing
Copy link
Author

Yes, it may take multiple writes for the segment files to finish writing, but the partial files are not visible to the player, because the manifest is only updated after the segment is completed.

That is what I initially thought, but I was concerned after looking at strace output that that was not actually the case. The strace output seems to indicate that segment file is written to after the manifest file is created. More experimentation with larger segments appears to show that it is written to exactly once after only a very small delay after the manifest file is created.

Thinking about the timing, I am guessing that this may only be a theoretical problem when following the procedure you outlined above.

I have reproduced the relevant lines from the strace output with line numbers below to better illustrate what I am seeing.

  • On line 1, the segment is opened with with fid=4.
  • On line 2, the segment is written to for the first time. For large segments, there would be multiple writes here.
  • On line 3 through 6, the manifest file is written.
  • On line 7, fid=4 is written to. This is the segment file again.
  • On line 8, fid=4 is closed. Again, this is the manifest file.
  1. [pid 17491] open("/tmp/realtime_staging_video/16/video_segment_6392520000.m4s", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
  2. [pid 17496] write(4, "\0\0\0$stypisom\0\0\0\0iso8mp41dashavc1"..., 372736) = 372736
  3. [pid 17491] open("/tmp/realtime_staging_video/16/packager-tempfile-4451-4453-2ec96b9d5e851b", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
  4. [pid 17491] write(5, "<?xml version="1.0" encoding="UT"..., 1163) = 1163
  5. [pid 17491] close(5) = 0
  6. [pid 17491] rename("/tmp/realtime_staging_video/16/packager-tempfile-4451-4453-2ec96b9d5e851b", "/tmp/realtime_staging_video/16/manifest.mpd") = 0
  7. [pid 17491] write(4, "Gs;7\217\217\337C\222\230NM\211v\371+I\26\306\v\362\351\224r\241<k\270^\270p}"..., 2691) = 2691
  8. [pid 17491] close(4) = 0

@shaka-bot shaka-bot added needs triage and removed status: waiting on response Waiting on a response from the reporter(s) of the issue labels Apr 27, 2018
@kqyang
Copy link
Contributor

kqyang commented Apr 27, 2018

I see. Thanks for the detail trace! I think it is caused by our file caching: https://github.com/google/shaka-packager/blob/master/packager/file/io_cache.h#L19. We should flush the cache to make sure it is written before the manifest is updated.

@kqyang kqyang added type: bug Something isn't working correctly and removed needs triage labels Apr 27, 2018
@kqyang kqyang added this to the v2.1 milestone Apr 27, 2018
@kqyang kqyang self-assigned this Apr 30, 2018
@kqyang
Copy link
Contributor

kqyang commented May 3, 2018

@ofnothing This should be fixed. Thanks again for reporting the problem.

@shaka-bot shaka-bot added the status: archived Archived and locked; will not be updated label Jul 2, 2018
@shaka-project shaka-project locked and limited conversation to collaborators Jul 2, 2018
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
status: archived Archived and locked; will not be updated type: bug Something isn't working correctly
Projects
None yet
Development

No branches or pull requests

3 participants