support timestamp for numbers. #986
Hello, how do I contribute my changes? This alignment works well for numbers. However, I found that in some cases two subtitles are displayed at the same time. I found a simple fix that works well: I added 0.03 s to the start of each sentence and subtracted 0.02 s from its end.
for sdx, (sstart, send) in enumerate(segment["sentence_spans"]):
    curr_chars = char_segments_arr.loc[(char_segments_arr.index >= sstart) & (char_segments_arr.index <= send)]
    char_segments_arr.loc[(char_segments_arr.index >= sstart) & (char_segments_arr.index <= send), "sentence-idx"] = sdx
    sentence_text = text[sstart:send]
    sentence_start = curr_chars["start"].min() + 0.03  # fix
    end_chars = curr_chars[curr_chars["char"] != ' ']
    sentence_end = end_chars["end"].max() - 0.02  # fix
    sentence_words = []
    for word_idx in curr_chars["word-idx"].unique():
        word_chars = curr_chars.loc[curr_chars["word-idx"] == word_idx]
        word_text = "".join(word_chars["char"].tolist()).strip()
        if len(word_text) == 0:
            continue
        # don't use space character for alignment
        word_chars = word_chars[word_chars["char"] != " "]
        word_start = word_chars["start"].min()
        word_end = word_chars["end"].max()
        word_score = round(word_chars["score"].mean(), 3)
        # -1 indicates unalignable
        word_segment = {"word": word_text}
        if not np.isnan(word_start):
            word_segment["start"] = word_start + 0.03 if not sentence_words else word_start  # fix
        if not np.isnan(word_end):
            word_segment["end"] = word_end - 0.02 if word_idx == len(curr_chars["word-idx"].unique()) - 1 else word_end  # fix
        if not np.isnan(word_score):
            word_segment["score"] = word_score
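The idea behind the offsets can be tried in isolation. The following is only a minimal sketch, not code from this PR: `shrink_spans` and `touching_pairs` are hypothetical helper names, and the sketch assumes that the simultaneous subtitles come from one sentence ending at exactly the time the next one starts.

```python
def shrink_spans(spans, start_pad=0.03, end_pad=0.02):
    """Nudge each (start, end) pair inward so neighbouring spans stop touching.

    The default pads mirror the 0.03 s / 0.02 s values quoted above.
    Spans too short to shrink are returned unchanged (an assumption of
    this sketch, not something the snippet above does).
    """
    adjusted = []
    for start, end in spans:
        new_start = start + start_pad
        new_end = end - end_pad
        if new_end <= new_start:
            # Shrinking would invert or empty the span, so keep the original.
            new_start, new_end = start, end
        adjusted.append((round(new_start, 3), round(new_end, 3)))
    return adjusted


def touching_pairs(spans):
    """Return indices i where span i ends at or after the start of span i + 1."""
    return [i for i in range(len(spans) - 1) if spans[i][1] >= spans[i + 1][0]]


if __name__ == "__main__":
    sentences = [(0.0, 1.0), (1.0, 2.5), (2.5, 2.52)]
    print(touching_pairs(sentences))                # boundaries that coincide
    print(touching_pairs(shrink_spans(sentences)))  # after applying the pads
```

Running a check like `touching_pairs` before and after the adjustment is one way to confirm that the pads actually remove the coinciding boundaries for a given file.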
Great work, let me test this today. Thanks @bfs18
@bfs18 could you change the docstrings and comments to English please?
Hi @Barabazs I've already made the changes.
Updated get_trellis and backtrack to support aligning numbers.
--BhThOY2Ug_2.mp3
The output is
Words with numerical elements are now accompanied by timestamps.