Discuss: Event Assets #2

**Open** · wants to merge 1 commit into base `discussions/event-assets/base`
`docs/event/assets.md` (+101 −0)
---
sidebar_position: 4
---

# Assets

Assets are files associated with an event.
Each asset has some metadata that is stored in the database, while the file itself is stored in the file system and referenced from the database.

Every asset has this metadata attached to it:
- `id: ID`: unique identifier among all assets of all events. Assigned by Opencast and unchangeable.
- `flavor: NonBlankAsciiString` <sup>(1?)</sup>
> **Member:** This should be its own datatype. `flavor` has a specific syntax. I guess `[0-9a-Z+-]/[0-9a-Z+-]` is fine, but maybe some additional characters are used.
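The flavor shape suggested above (two segments joined by `/`) can be sketched as a validator. This is only an illustration: the exact character set is still an open question in this discussion, and the regex below simply assumes alphanumerics plus `+` and `-`.

```python
import re

# Hypothetical validator for the flavor syntax discussed above:
# two non-empty segments of alphanumerics, '+' and '-', joined by '/'.
# The exact grammar is an open point of this discussion.
FLAVOR_RE = re.compile(r"^[0-9A-Za-z+-]+/[0-9A-Za-z+-]+$")

def is_valid_flavor(s: str) -> bool:
    return FLAVOR_RE.fullmatch(s) is not None

print(is_valid_flavor("oc/thumbnail"))       # True
print(is_valid_flavor("presenter/preview"))  # True
print(is_valid_flavor("no-slash"))           # False
```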

- `tags: string[]` <sup>(1?)</sup>
> **Member:** UTF-8 is probably not required.

> **Author:** But are there good reasons for non-UTF-8 strings? Because for anything that's not binary data, I would really like to enforce UTF-8, see https://utf8everywhere.org/

> **Member:** I meant that tags can be even further restricted to ASCII or so.

> **Author:** Ah, I see.

- `properties: Map<Label, string>`: a `Label` to string map for custom properties. Values can be arbitrary strings.<sup>(7?)</sup>
> **Member:** For simplicity, IMO `string` is enough. Applications can always convert.

- `mimeType: NonBlankAsciiString`: a *lowercase* `NonBlankAsciiString` representing the MIME type of the asset.
> **Member:** MIME should be its own type.

- `size: uint64`: size of the file in bytes. This is always the actual file size and cannot be changed manually.
- `checksum`: Checksum/hash of the file. Consists of `type` (e.g. `md5`, `sha256`) and the hex encoded value. In the API (and maybe in the database?) this should be serialized as `<type>:<value>`, e.g. `sha256:e3b0c44298f...`.
> **Member:** Do we want to allow multiple checksums? So, an array?

- `source: bool`: whether this was directly uploaded by a user. `false` if it was processed or generated by Opencast.
> **Member:** Why do we need this?

- `updated: Timestamp`: timestamp of when anything about this asset was last changed. <sup>(2?)</sup>
- `internal: bool`: internal assets are not exposed like other assets in APIs. They are used to store source or intermediate artifacts, like the originally uploaded video. These cannot be queried or read by users with only `read` access. <sup>(3?)</sup>
> **Member:** How should this actually be implemented on disk? My webserver doesn't know which files are internal and which are not, and I don't want it to ask Opencast. I'm not convinced that we don't need a separate location for storing delivery files; in that case we wouldn't need this property.
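The `<type>:<value>` checksum serialization proposed above can be sketched as a simple two-part split. This is an illustration, not a fixed implementation; whether multiple checksums per asset are allowed is still an open question in this discussion.

```python
# Sketch of the proposed `<type>:<value>` checksum serialization.
# The spec names md5 and sha256 as example types.

def serialize_checksum(ctype: str, hex_value: str) -> str:
    return f"{ctype}:{hex_value}"

def parse_checksum(s: str) -> tuple[str, str]:
    ctype, _, value = s.partition(":")
    if not ctype or not value:
        raise ValueError(f"malformed checksum: {s!r}")
    return ctype, value

print(parse_checksum("sha256:e3b0c44298fc1c149afbf4c8996fb924"))
```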


Additionally, in the API representation, assets have the following fields:
- `uri: string`: a URI to the asset, i.e. where it can be downloaded.
> **Member:** Differentiate between delivery and (internal) asset store. This can be scaled / architected differently.



## Tracks

A track is an asset with a time component: it "lives on a timeline", i.e. it represents something that spans the length of the video.
A cut operation on the video needs to modify all tracks.
> **Member:** I don't think you want that as a requirement. If you store delivery files and recut, you don't want to cut the old delivery tracks.


Each track has one or more streams.
For example, an `mp4` track might have a video and an audio stream, while a `vtt` track only has a single text stream.

- `tracks: Track[]`
- `isLive: bool` <sup>(5?)</sup>
- `isMaster: bool`: <sup>TODO: describe what exactly this means</sup>
> **Member:** This field is for HLS master playlist tracks. I would rename this to `isManifest`. MPEG-DASH calls this a Media Presentation Description (MPD), but the general term nowadays is usually "manifest". This is the main file the player uses to get all the information for playing the media.

> **Member:** `isMainManifest` is probably better. For HLS you also have variant manifests.

- `duration: Milliseconds`

Non-`internal` tracks have to have the following properties:
- The `duration` of the track has to match the `duration` of the event, i.e. all non-internal durations are the same.
> **Member:** What about videos that have been cut? Wouldn't there be one source track in the full length of the recording and one track for the cut video? The duration of the event would be longer than the track available to end users.
>
> There could also be the situation where there is a scheduled recording with e.g. `startTime` 14:00 and `endTime` 16:00. The event finishes early and they manually stop the recording on the recording device at, say, 15:47. The recorded track would be shorter than the duration of the event.

- There is only one video stream<sup>?</sup>
> **Member:** We could go the CMAF route and only allow elementary streams for delivery. On the other hand, I'm always leaning towards letting adopters do what they want with the media.



### Streams

A stream has the following properties:
- `language: LangCode?`
> **Member:** Average bitrate?

- `type: "video" | "audio" | "text"`: type of the stream. Depending on the type, there are additional properties:
> **Member:** Allow `"data"` streams?

- `"video"`
- `resolution: [uint32, uint32]`
- `framerate: "dynamic" | float32`<sup>?</sup>
> **Member:** Average framerate, lowest common framerate (ffprobe reports this; not sure what to actually call it).

- `codec: "H264" | "H265" | "VP8" | "VP9" | "AV1" | ...`<sup>?</sup>
> **Member:** Bit depth, SAR, field order, color (model, chroma sub-sampling, range, space, primaries, transfer)?

- `"audio"`
- `codec: "AAC" | "MP3" | "Opus" | ...`<sup>?</sup>
> **Member:** Bit depth, channels, channel layout, sampling rate?

- `"text"`:
> **Member:** Why no codec? This could be a general stream field.

- `kind: string`: the kind of data this text track represents.
> **Member:** Currently, this is called `type`, which is probably as generic as `kind`. So if we rename, find something better? `disposition`, as FFmpeg calls it?

- `"subtitle"`: subtitles (does *not* contain speaker names, sounds, and the like)
- `"caption"`: closed captions (*does* contain speaker names, sounds, or the like)
- `"chapters"`: chapter markers with title per chapter
- `"in-video-text"`<sup>(4?)</sup>: representing text in the video (e.g. slide text)
> **Member** (on lines +56 to +60): FFmpeg has these dispositions (not only for subtitle tracks).

- `origin: "manual" | "generated" | null`: describes how this track was created, `manual` meaning it was human crafted, while `generated` means it was automatically generated.
> **Member:** Currently, this is called `generator-type`. Why rename?

- `generator: string?`: non-empty name of the generating software (should only be set if `origin: "generated"`). Example: `"whisper"`.

## Attachments

Attachments are assets without "time-component", unlike tracks.
They have the following additional metadata:
- `language: LangCode?`

There are some built-in attachments that have special meanings and requirements, each identified by the flavor:

- `oc/thumbnail`: Thumbnail for the event, e.g. a preview image.
- `mimeType` must be `image/*`
- `properties` must include `w` and `h`, both holding numbers describing the width and height of the image.
> **Member:** I don't like the shorthand form. Why not `width` and `height`?

- `?/timeline-preview`: a sprite image, holding a raster of smaller images, all extracted from the video at regular intervals.
- `mimeType` must be `image/*`
- `properties` must include:
- `imageCountX`, `imageCountY`: how many smaller images are in each row/column
- `imageSizeX`, `imageSizeY`: size of each smaller image in pixels
- TODO: in the future, this might be better encoded as actual video file
- `?/segment-preview`<sup>(6?)</sup>: image that is a preview for a segment of the video.
- `mimeType` must be `image/*`
- `properties` must include `startTime: Milliseconds`, denoting when the segment starts.
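Given the timeline-preview properties above, a client can locate the sub-image for a given tile index. This is a sketch: the spec does not state the tile layout, so a row-major order (left to right, top to bottom) is assumed here.

```python
# Sketch: locate the n-th tile inside a timeline-preview sprite image.
# Assumes row-major layout, which the spec does not (yet) pin down.

def tile_rect(n: int, image_count_x: int, image_count_y: int,
              image_size_x: int, image_size_y: int) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of tile n within the sprite."""
    if not 0 <= n < image_count_x * image_count_y:
        raise IndexError(f"tile {n} out of range")
    row, col = divmod(n, image_count_x)
    return (col * image_size_x, row * image_size_y, image_size_x, image_size_y)

# A 5x4 grid of 160x90 tiles: tile 7 sits in row 1, column 2.
print(tile_rect(7, 5, 4, 160, 90))  # (320, 90, 160, 90)
```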

TODO: generally make clear which properties are user-changeable and which are automatically set by OC, derived from files.

---

:::danger[Open questions]

- (1?) Is it OK to require ASCII-only for tags and flavors?
- (2?) Do we really need the `updated` field?
- (3?) What permissions should be required to read internal assets? Is `write` access to the event enough? And/or should a special role be required?
- (4?) Better name for this? `on-screen-text`? `text`? `video-text`?
- (5?) Is `isLive` per track really the correct model? Should this be attached to the event instead? Like, how would a Tobira or LMS decide whether to display an event as live or not?
> **Member:** In general, video systems backed by Opencast should use flavors/tags to select elements to play.

> **Author:** @KatrinIhler wrote:
>
> > definitely don't mark each track as live, but event as a whole

- (6?) For timeline and segment previews, it is a bit unclear how to deal with dual stream videos. Right now, Opencast only generates these previews for one video (presentation) by default, I think? Is it useful to have previews for both? Then apps/the player need to support that.
- If we want to potentially show both, then the current `presenter/*` can stay.
- If we just want to have one preview per video, then it should be `oc/*`, as otherwise external apps have to arbitrarily choose one.
> **Member** (on lines +96 to +98): Again, I tend to let people do what they want. IMO one preview image is enough, but maybe not for everyone.

- (7?) Do we want to allow values other than `string` in properties? Also compare `extraMetadata`

:::