
How to: Synchronize two audio tracks for muxing with audacity


Baal


The purpose of this tutorial is to answer the question "how do I find the delay?" when you want to mux English audio into a subtitled release but don't know where to begin.
 
What you need:
 - Audacity (http://audacity.sourceforge.net) with FFmpeg import/export library installed.
 - MKVToolNix
 - Headphones or speakers.
 
To begin, open Audacity and load the Japanese audio first, then the English audio.
 
Image 01:
p6Dx2fY.jpg
 
In this tutorial I'm syncing the audio for the first episode of "A Certain Magical Index". I marked the Japanese audio track (a 2-channel AAC file) in green. The English audio is marked in yellow; it's 6-channel audio, but I collapsed the other 5 channels because it looks cleaner and I don't need them. The red rectangle at the bottom is where you write/read the selection length.
 
To find the delay we need to look for sounds that are not speech, because the Japanese and English speech won't match and will look different in the waveform display*. If you are not sure about the sound you selected, listen to it by selecting a portion of the audio and pressing the spacebar.
 
Image 02:
LrUncEU.jpg
 
When you have found your background sound, make a selection between its beginning on the Japanese track and its beginning on the English track. Don't worry about precision.
In this case it's 1035 ms, so I must delete 1035 ms from the beginning of the English track. There are multiple ways to do it, but I select the English audio, hit "Home" to go to 00:00:00, type 1035 ms as the length of the selection and delete it with "Del".
 
Image 03:
7rOtU3b.jpg
 
In this image you can see that I found a different sound (it can be the same one), zoomed in more and selected the difference one more time. It shows that I deleted 8 ms too much, so I press CTRL+Z to undo the deletion and remove only 1027 ms.
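The correction step above is plain arithmetic. Here is a minimal sketch of it in Python; the function names are made up for illustration, and the 48 kHz sample rate is an assumption (the tutorial does not state the sample rate of these tracks).

```python
def refine_cut(first_cut_ms: int, overshoot_ms: int) -> int:
    """Return the corrected amount to delete from the start of the track.

    overshoot_ms is positive when the first cut removed too much,
    negative when it removed too little.
    """
    return first_cut_ms - overshoot_ms


def ms_to_samples(ms: int, sample_rate_hz: int = 48000) -> int:
    """Convert a length in milliseconds to a sample count (48 kHz assumed)."""
    return round(ms * sample_rate_hz / 1000)


corrected_ms = refine_cut(1035, 8)   # first cut was 8 ms too long
print(corrected_ms)                  # 1027
print(ms_to_samples(corrected_ms))   # 49296 samples, if the track is 48 kHz
```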
 
Image 04:
eyYHd2n.jpg
 
And this is the result: the difference between the two sounds is less than 1 ms.
The delay for this English audio track is "-1027ms". FIN.
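Once you have the delay, it can be handed to mkvmerge on the command line through its `--sync TID:d` option. The following is only a sketch that builds and prints such a command without running it. The file names are hypothetical, and track ID 0 is an assumption that usually holds for a standalone audio file; check your own file before relying on it.

```python
import shlex


def build_mkvmerge_command(output, video, audio, delay_ms, track_id=0):
    """Build an mkvmerge argument list that applies delay_ms to the audio file.

    --sync must come before the file whose track it applies to.
    """
    return [
        "mkvmerge", "-o", output,
        video,
        "--sync", f"{track_id}:{delay_ms}",
        audio,
    ]


# Hypothetical file names; -1027 is the delay found in the tutorial.
cmd = build_mkvmerge_command(
    "episode01_dual.mkv", "episode01_sub.mkv", "episode01_eng.ac3", -1027
)
print(shlex.join(cmd))
```

Printing the command first lets you check it before actually running it.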
 
Things that can go wrong:
There is a chance that the audio won't stay in sync if you only delete a portion at the beginning. It might need syncing in several places, for example after the OP, before/after the commercial break, and before the ED. In that case there is no other choice than syncing it in Audacity (or another app) and exporting.
To make sure that your audio is synced, simply check the ending in Audacity.
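One way to think about that check: measure the offset at several points and see whether they agree. A minimal sketch, assuming you have written down the offsets yourself; the 5 ms tolerance and the sample values are made up for illustration.

```python
def needs_multi_point_sync(offsets_ms, tolerance_ms=5):
    """Return True if the measured offsets differ by more than tolerance_ms.

    offsets_ms: offsets measured at different points of the episode
    (e.g. start, after the OP, before the ED), in milliseconds.
    """
    return max(offsets_ms) - min(offsets_ms) > tolerance_ms


# Hypothetical measurements: the same offset everywhere, one delay is enough.
print(needs_multi_point_sync([1027, 1027, 1028]))   # False

# Hypothetical measurements: the offset jumps after the commercial break.
print(needs_multi_point_sync([1027, 1027, 1527]))   # True
```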
 
*The most common visual representation of audio is its waveform display, which is a graph of amplitude (loudness) over time.

Edited by Baal

Doing things visually can lead to errors though. No amount of practice can compensate for one entire sense. It's always better to listen to both tracks playing together, removing/adding silence until you can tell no difference between which is playing.

You might want to include how to add the delay to mkvmerge, too. No doubt people will want their hands held for that part too.



Yea, that's how I was doing it when I did my own Blue Exorcist movie mux. But great tutorial nonetheless. Does anyone know the exact time frame of the Viz and Funi logo openings?


Doing things visually can lead to errors though. No amount of practice can compensate for one entire sense. It's always better to listen to both tracks playing together, removing/adding silence until you can tell no difference between which is playing.

 

Since I don't want to argue about it I just removed that suggestion from the tutorial.

 

You might want to include how to add the delay to mkvmerge, too. No doubt people will want their hands held for that part too.

 

I think I will just create a tutorial about remuxing single-audio releases with English audio. It will include creating signs/songs tracks in Aegisub and chapters.


  • 3 weeks later...

Alright, so I'm trying to sync these 2 tracks:

The original difference between them is 1059 ms. I'm tying up the OP, as it has a definite start/end point, and that way I only need one timed subtitle track (kara), as you should.

f75e8a5710.png


 

Set the difference in mkvmerge (Nouages), applying it to the English track...

d2458b4d39.png


 

 

And here is where it gets funny: open the new "timed" file, and the English track now overshoots the Japanese by 29 ms? Why? Do mkvmerge and Audacity interpret times differently?

d0f1ca80c7.png


 

 


Things I have tried:


• The closest I got was by subtracting the last difference (29 ms) from the original difference (1059 ms), giving me a new time of 1030 ms. When checked in Audacity, that was out of sync by 3 ms this time around. Closer at least.


 


• Using the Time Shift Tool to drag the English track and manually time it against the Japanese one. Both are in perfect sync when played back. Delete the Japanese track, export the English one (sticking with options that match that track's properties), open the file, import the new "timed" English track, and it's out of sync by 25 ms -_-



Believe it or not, audio files also have frames. A frame's length depends on the number of samples and the sampling frequency... bla bla (technical stuff). Anyway, at 48 kHz one frame is 32 ms, and you can only cut your audio files every 32 ms. Now, in mkvmerge negative delays don't exist, yep. To apply a negative delay, mmg does this:


 


1059/32 = 33.09375, rounded up to 34 frames

34*32 = 1088

1088-1059 = 29


 


1. Remove 1088 ms (34 "frames") from the beginning

2. Delay that audio by 29 ms to get the desired 1059 ms
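The calculation and the two steps above can be sketched in Python. This only mirrors the arithmetic as described in this post and is not taken from mkvmerge's source; the function name is made up, and the 32 ms default frame length is the 48 kHz value used above.

```python
import math


def split_negative_delay(delay_ms, frame_ms=32):
    """Split a requested negative delay into whole frames cut plus a small positive delay.

    delay_ms: how much earlier the track should start, as a positive number.
    Returns (frames_removed, removed_ms, residual_delay_ms).
    """
    frames_removed = math.ceil(delay_ms / frame_ms)   # round up to whole frames
    removed_ms = frames_removed * frame_ms            # audio actually cut
    residual_delay_ms = removed_ms - delay_ms         # positive delay added back
    return frames_removed, removed_ms, residual_delay_ms


print(split_negative_delay(1059))   # (34, 1088, 29)
print(split_negative_delay(1030))   # (33, 1056, 26)
```

With 1030 ms the cut comes out as 1056 ms, which is 3 ms short of 1059 ms. That seems consistent with the 3 ms difference reported earlier in the thread after trying 1030.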


 


Now, my question is: why are you demuxing it after syncing?



Believe it or not, audio files also have frames. A frame's length depends on the number of samples and the sampling frequency... bla bla (technical stuff). Anyway, at 48 kHz one frame is 32 ms, and you can only cut your audio files every 32 ms.

That is interesting. Would you mind providing the source of this information?

I am asking because in the software I use, 1 frame is ~13.3 ms or 1/75 s. Some sources suggest 1 frame is 1 sample from each channel, so it is getting confusing.


 


 

I don't remember where I read about it long ago but here:

http://www.labdv.com/learning/dv_basics/cd_audio_dd_explained-en.html?full-window

You can read that:

 

 

Dolby Laboratories' AC-3 format is the standard generally used on DVD's and can contain up to 6 discrete channels of sound. This format uses audio compression (like MP3) to reduce the data stream size. An AC-3 data stream is composed of a series of fixed-size, independent frames, each representing 1536 PCM samples (each decoded frame produces 1536 samples of uncompressed PCM audio). Therefore, depending on the sampling rate of the encoded audio stream, each frame represents a different running length. A 48kHz AC-3 frame, for example, contains 1536/48000=0.032 seconds of sound.

 

Because I noticed that MK has a problem with the Index II audio that he got from DVD, I knew it was AC3, so I'm sorry for my generalization in the previous post.

 

For those who are interested, and because AAC is also common in releases: one frame in most AAC files at 48 kHz is 1024/48000 = 0.021333 s (21.3 ms).

http://wiki.multimedia.cx/?title=Understanding_AAC

https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
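Both frame lengths above come from the same formula, samples per frame divided by sample rate. A minimal sketch using the two sample counts quoted in this post (1536 for AC-3, 1024 for most AAC); the function name is made up for illustration.

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Return the running length of one audio frame in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


print(frame_duration_ms(1536, 48000))             # AC-3 at 48 kHz: 32.0
print(round(frame_duration_ms(1024, 48000), 1))   # AAC at 48 kHz: 21.3
```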

 

EDIT: 'Fixed' equation.


Believe it or not, audio files also have frames. A frame's length depends on the number of samples and the sampling frequency... bla bla (technical stuff). Now, in mkvmerge negative delays don't exist, yep. To apply a negative delay, mmg does this:

 

Ah, I see. That would explain why it always "snaps" to the same place when you delay the audio. You also said that negative delays don't exist, but then gave the procedure that mkvmerge uses when you input a negative value, so surely it would work by putting the negative value in from the beginning? (I may have misinterpreted your words, though.)

I gave your instructions a go, but the English audio had shot forward by that 29 ms again... even after delaying it in mmg.

 

 

For those who are interested, and because AAC is also common in releases: one frame in most AAC files at 48 kHz is 1024/48000 = 0.021333 s (21.3 ms).

http://wiki.multimedia.cx/?title=Understanding_AAC

https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html

 

EDIT: 'Fixed' equation.

 

 

Alright, clarification required please: are AC-3 and AAC files the same? Am I to use 21.3 ms instead of 32 ms for the calculations? Or was this second equation just for clarification purposes?


But... The picture clearly shows all tracks as being PCM. Is that what Audacity displays all foreign tracks as?

 

Probably. AC3 and AAC show as PCM, FLAC shows as 'float'.

 

You also said that negative delays don't exist, but then gave the procedure that mkvmerge uses when you input a negative value, so surely it would work by putting the negative value in from the beginning? (I may have misinterpreted your words, though.)

 

What I meant is that whether you input a positive or a negative delay, mkvmerge will always use a positive delay in the resulting *.mkv.

When you remux with a positive value and then demux, you get your original file back. But if you use a negative value, your original file is gone for good: mkvmerge removes a whole number of codec frames and delays the track by the difference to fulfill your request. So you always end up with a positive delay, and it's impossible to reconstruct the original file when demuxing.

 

 

are AC-3 and AAC files the same?

 

No.

 

Am I to use 21.3 ms instead of 32 ms for the calculations?

 

No, you don't need any calculations except finding the delay in Audacity.

 

Or was this second equation just for clarification purposes

 

Yes.

 

MK, I still don't understand why you are demuxing the resulting audio after delaying it in mkvmerge. Were you expecting to get the same file that you put in? Or can you hear that it's out of sync by 29 ms?


 

What I meant is that whether you input a positive or a negative delay, mkvmerge will always use a positive delay in the resulting *.mkv.

When you remux with a positive value and then demux, you get your original file back. But if you use a negative value, your original file is gone for good: mkvmerge removes a whole number of codec frames and delays the track by the difference to fulfill your request. So you always end up with a positive delay, and it's impossible to reconstruct the original file when demuxing.

 

 

Ah OK, I see now. Yeah, that makes sense. Thanks.

 

 

 

No, you don't need any calculations except finding the delay in Audacity.

 

OK, so the delay was 1059 ms originally, and I now also understand what you meant by your first calculations. But I'm still not sure why, if mkvmerge gives me my desired 1059 ms delay, it's still out of sync when I recheck it in Audacity. As I understand what you said, it should be as easy as finding the delay in Audacity, inputting it into mmg and remuxing?

 

 

 

MK, I still don't understand why you are demuxing the resulting audio after delaying it in mkvmerge. Were you expecting to get the same file that you put in? Or can you hear that it's out of sync by 29 ms?

 

I am not demuxing anything. I'm just delaying the audio as per this screenshot: http://puu.sh/cWINs/d2458b4d39.png. The Eng track is muxed into the Jap BD (no timing done, just as it came from the DVD). Then I open this new Dual Audio version in Audacity, make a note of the time difference (1059 ms), open the Dual Audio version in mkvmerge, "delay" the Eng track by this 1059 ms and remux to a new file. When I open this V2 in Audacity to check the timing, the Eng track ends up being 29 ms too early.

 

Unfortunately you can hear the delay (it sounds like you are listening in a big cave when you feed the Eng audio to the right channel and the Jap audio to the left).


Open this V2 in Audacity to check the timing and the Eng track ends up being 29ms too early.

 

 

You need to understand that Audacity is an audio editing application, and it opens audio files as they are stored inside the mkv file.

A video playback application that supports the mkv format will correctly read the 29 ms delay.

 

In your mkv V2, after delaying the Eng track, it is stored as an ac3 stream with 1088 ms cut from the beginning, but the mkv container also stores information telling players that this track needs to start playing after 29 ms. That means you get your 1059 ms delay, but only in playback. Opening the English dub track in an audio editing app will show that it's missing 1088 ms.
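To put the two views side by side, here is a small sketch of the difference between what an audio editor sees and what a player does with the same track. The class and method names are made up for illustration; the numbers are the ones from this thread.

```python
from dataclasses import dataclass


@dataclass
class MuxedTrack:
    removed_ms: int           # audio physically cut from the start of the stream
    container_delay_ms: int   # start delay stored in the container for players

    def shift_seen_by_editor(self) -> int:
        """An audio editor ignores the container delay and sees only the cut."""
        return -self.removed_ms

    def shift_seen_by_player(self) -> int:
        """A player applies the container delay on top of the cut."""
        return -self.removed_ms + self.container_delay_ms


track = MuxedTrack(removed_ms=1088, container_delay_ms=29)
print(track.shift_seen_by_editor())   # -1088
print(track.shift_seen_by_player())   # -1059
```

The 29 ms gap between the two results is the difference showing up in Audacity.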


Oh, I see now. So if I wanted Audacity to read the 2 tracks as being "in sync", with the OPs starting at the same time, I would have to add 29 ms of silence to the beginning of the Eng track, remux and make sure there is no delay set in mmg?


 


What was throwing me off was the first episode that I did: after syncing the tracks using the same method I mentioned, the 2 tied up perfectly in Audacity. The timing must have been just right.


 


OK, so the end result is: do everything that I have been doing, but don't worry about the tracks not lining up in Audacity?


 


Edit: what happens with something like Aegisub then? Does it read the audio track like a video player would, i.e. with the 29 ms delay?


Edited by Moodkiller