As digital life expands, everyone is creating content: people shoot almost everything and then spend a great deal of time editing it to make it watchable.
This raw footage needs a lot of cleaning and tuning before the final output is easy to follow and highlights the regions of interest, ready to be posted on media sites such as YouTube, Instagram, Twitter, etc.
We provide a solution that automates this task by using various methods to analyze the audio and video aspects of the raw video and generate the better, summarized output a user expects.
Automated summarization of digital video sequences is accomplished using a vector rank filter. The output of the rank filter is determined by the minimum rank required of the input sequence and by selecting the maximum-ranking subsets that are continuous and satisfy that minimum rank.
Each frame in a video segment can be ranked according to the significance of its features. Each such feature is used to generate its own ranking vector.
The filter is then applied to the final summation of all the ranked feature vectors to extract subsequences from it.
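The summation and filtering described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names `sum_ranks` and `select_runs`, the `min_length` parameter, and the sample rank values are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def sum_ranks(rank_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of per-feature rank vectors (one value per frame)."""
    if not rank_vectors:
        return []
    length = len(rank_vectors[0])
    if any(len(vector) != length for vector in rank_vectors):
        raise ValueError("all rank vectors must have the same length")
    return [sum(values) for values in zip(*rank_vectors)]


def select_runs(ranks: Sequence[float], min_rank: float,
                min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of every
    continuous run whose rank stays at or above min_rank."""
    runs = []
    start = None
    for i, rank in enumerate(ranks):
        if rank >= min_rank:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the sequence.
    if start is not None and len(ranks) - start >= min_length:
        runs.append((start, len(ranks)))
    return runs


# Hypothetical rank vectors for a six-frame clip.
motion = [0, 3, 3, 0, 0, 3]
blur = [2, 2, 2, 0, 2, 2]
audio = [0, 0, 4, 4, 0, 0]

total = sum_ranks([motion, blur, audio])
print(total)                  # [2, 5, 9, 4, 2, 5]
print(select_runs(total, 5))  # [(1, 3), (5, 6)]
```

Raising `min_length` drops runs that are too short to be worth keeping; in the example, `select_runs(total, 5, min_length=2)` keeps only the first run.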
- Every video has some moments of activity, and nobody wants to watch an idle image as a video, so we propose a motion feature ranking. The amount of motion determines the rank of the frame in the sequence.
- The rank is set to 0 if the motion is below a certain threshold.
- The sharpness of each video frame is determined to rank the subsequence.
- If the sharpness is below a certain threshold, the rank is set to 0.
- The video sequence is ranked based on audio activity, i.e. talking, sound, music, etc.
- A certain threshold determines whether the sequence is ranked or not.
- Audio is denoised using the wavelet transform.
- The video sequence is ranked based on the text detected in the video.
- If text is detected, a rank is added; otherwise 0 is added.
- The EAST model, loaded through OpenCV, is used to detect text in the video.
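The two visual thresholds in the list above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the project's code: it measures motion as the mean absolute difference between consecutive grayscale frames and sharpness as the variance of a 4-neighbour Laplacian, and the function names, the fixed `rank` value, and the thresholds are hypothetical.

```python
import numpy as np


def motion_rank(prev_gray: np.ndarray, gray: np.ndarray,
                threshold: float, rank: float = 1.0) -> float:
    """Return `rank` if the mean absolute frame difference reaches
    `threshold`, otherwise 0."""
    diff = np.abs(gray.astype(np.float64) - prev_gray.astype(np.float64))
    return rank if diff.mean() >= threshold else 0.0


def sharpness(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over the interior pixels.
    Low values suggest a blurred frame."""
    g = gray.astype(np.float64)
    laplacian = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
                 - 4.0 * g[1:-1, 1:-1])
    return float(laplacian.var())


def blur_rank(gray: np.ndarray, threshold: float, rank: float = 1.0) -> float:
    """Return `rank` if the frame is sharp enough, otherwise 0."""
    return rank if sharpness(gray) >= threshold else 0.0


# Synthetic 8x8 frames standing in for decoded video frames.
still = np.zeros((8, 8), dtype=np.uint8)
moved = still.copy()
moved[2:6, 2:6] = 255                       # a bright patch appears
flat = np.full((8, 8), 128, dtype=np.uint8)  # no detail at all
checker = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)

print(motion_rank(still, still, threshold=1.0))  # 0.0
print(motion_rank(still, moved, threshold=1.0))  # 1.0
print(blur_rank(flat, threshold=10.0))           # 0.0
print(blur_rank(checker, threshold=10.0))        # 1.0
```

Suitable threshold values depend on resolution and content, so they would need tuning on real footage.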
- Start
- Accept a video from the user
- Read the video [all processes below run in parallel]
- Process the video stream [Visual]:
  - Motion ranking: frames with no motion are ranked 0
  - Blur detection: blurred frames are ranked 0
- Process the video stream [Textual]:
  - Text detection: frames with detected text get a high rank
- Process the audio stream [Auditory]:
  - Audio de-noising with DWT & FWT
  - Audio activity ranking
- Calculate the sum of all ranks
- Select the slices that satisfy the minimum rank
- Trim the video using the time stamps of the selected ranks
- End
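The flow above, from parallel analysis to trim time stamps, can be sketched end to end with the standard library. Everything named here is an assumption for illustration: the analysers are stand-in functions that return fixed ranks, `ThreadPoolExecutor` is just one way to run them concurrently (the project may use processes instead), and `run_parallel` and `trim_points` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Callable, List, Sequence, Tuple

Analyser = Callable[[int], List[float]]


def run_parallel(analysers: Sequence[Analyser], n_frames: int) -> List[float]:
    """Run every analyser concurrently and sum their per-frame ranks."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(analyser, n_frames) for analyser in analysers]
        vectors = [future.result() for future in futures]
    return [sum(values) for values in zip(*vectors)]


def trim_points(ranks: Sequence[float], min_rank: float,
                fps: float) -> List[Tuple[float, float]]:
    """Convert continuous runs with rank >= min_rank into
    (start, end) time stamps in seconds."""
    stamps = []
    index = 0
    for keep, group in groupby(ranks, key=lambda rank: rank >= min_rank):
        length = len(list(group))
        if keep:
            stamps.append((index / fps, (index + length) / fps))
        index += length
    return stamps


# Stand-in analysers: real ones would inspect frames or audio samples.
def fake_motion(n_frames: int) -> List[float]:
    return [1.0 if 2 <= i < 6 else 0.0 for i in range(n_frames)]


def fake_audio(n_frames: int) -> List[float]:
    return [1.0 if 4 <= i < 8 else 0.0 for i in range(n_frames)]


total = run_parallel([fake_motion, fake_audio], n_frames=10)
print(trim_points(total, min_rank=1, fps=2))  # [(1.0, 4.0)]
print(trim_points(total, min_rank=2, fps=2))  # [(2.0, 3.0)]
```

The resulting time stamps are what a trimming step (for example, one driven by ffmpeg) would consume to cut the final video.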
- Automatic video editing for any video
- Extraction of the important parts of security footage
- Tutoring videos: with text detection and motion ranking it can extract a good amount of the relevant content
- General video editing
- Audio de-noising of vlogging videos
For all docs visit torpido
For dev logs visit logs
- Install ffmpeg
$ sudo apt install ffmpeg
- Install all the dependencies
$ pip install -r requirements.txt
- Compile the cython files
$ python setup.py build_ext --inplace
- Download EAST model and add it to the path
$ wget -O frozen_east_text_detection.tar.gz "https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1"
$ tar -xvf frozen_east_text_detection.tar.gz
// set environment variable
$ sudo gedit /etc/environment
// add new var
EAST_MODEL="path_to_frozen_east_text_detection.pb"
// test the var (log out and back in first so /etc/environment is re-read)
$ echo $EAST_MODEL
- Run run.py on a video file
$ python run.py /example/sample.mp4
// or with the UI
$ python3 start_up.py