Nandub Interleave Bug: Technical details
Due to a significant bug, described in the last paragraph below, muxing an MP3 stream with video is badly flawed in Nandub if you choose interleave by "time". This is regardless of the actual time value entered. It occurs with any bitrate, any style (VBR or CBR), and with any version of Nandub (v0.23m, v0.21.2, v1.0 RC2 were tested). Interleaving by specifying frame count, on the other hand, works fine.
It should be noted that existence of these two different methods of specifying interleave is just a convenience – a 200ms interleave on a 25FPS AVI is, by design anyway, identical to a five frame interleave. Given the existence of this bug, one should never use the time-based specification at all.
The bug has the effect that the interleaved audio is physically nowhere near its respective video, when keeping the two adjacent is the whole point of interleaving. The error is extreme enough that typically the audio and video become separated from each other within the first second or two of playback, and then continue to get further separated for the duration. In other words, once you’re 00:00:01 into the movie, the interleave is not only useless to the media player, but it’s likely a hindrance.
More specifically, a 30FPS/128Kbps file exhibits an interleave "rate skew" of 0.36, i.e. the audio chunks are laid out only 36% as fast as they should be based on the elapsed time of the adjacent video chunks. As one might guess, the video "runs out" first, too. Then the file becomes audio only. Using the same 30FPS/128Kbps example with, say, 1000Kbps video, the video (and hence any interleaving) ends 92% of the way thru the file. The final 8% is one big non-interleaved "audio-only" segment containing the remaining 64% of the audio.
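The figures above can be roughly reproduced with a few lines of arithmetic. This is only a sketch: it assumes 44.1 kHz audio (the sample rate used in the later example), ignores AVI container overhead, and treats position in the file as proportional to bitrate. The function name is made up for illustration.

```python
SAMPLES_PER_FRAME = 1152  # the value Nandub stores in nBlockAlign for MP3


def skew_figures(audio_bytes_per_sec=16000, sample_rate=44100,
                 video_kbps=1000, audio_kbps=128):
    """Return (rate skew, file fraction where video ends, audio fraction left over)."""
    # Correct divisor: average bytes per MP3 frame (about 418 here).
    bytes_per_frame = SAMPLES_PER_FRAME * audio_bytes_per_sec / sample_rate
    # Dividing by 1152 instead lays audio out this much too slowly.
    skew = bytes_per_frame / SAMPLES_PER_FRAME
    # Audio actually written alongside the video, in the same units as the bitrates.
    interleaved_audio = skew * audio_kbps
    video_end = (video_kbps + interleaved_audio) / (video_kbps + audio_kbps)
    return skew, video_end, 1 - skew


skew, video_end, audio_left = skew_figures()
print(f"skew={skew:.2f} video_end={video_end:.1%} audio_left={audio_left:.0%}")
```

Under these assumptions the skew comes out at about 0.36, the video ends a little under 93% of the way through the file, and about 64% of the audio is left for the tail.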
Examining the Nandub source code, I found that the bug is in the MainAddAudioFrame() function of the Dubber class (file dub.cpp). To calculate the next "audio point" the author needed to divide by "bytes per frame", but instead used "nBlockAlign" which is "samples per frame" - almost always 1152. He should have taken that samples/frame value and multiplied it by bytes/sec (e.g. 16000 for 128Kbps) and divided that by samples/sec (e.g. 44100) – that gives the value he was looking for, 418 bytes per frame in the example used above. All of those values, specific to the file currently loaded, are sitting right there in the same structure, making the program change trivial. In fact, I changed that one line in Nandub just for testing, recompiled the code, and the corrected version then created perfectly interleaved files.
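The calculation described above, written out as a small Python sketch. The parameter names mirror the WAVEFORMATEX fields mentioned in the post; the function itself is hypothetical and is not the actual Nandub code.

```python
def mp3_bytes_per_frame(n_block_align, n_avg_bytes_per_sec, n_samples_per_sec):
    """Average bytes per MP3 frame, given samples/frame stored in nBlockAlign."""
    # samples/frame * bytes/sec / samples/sec = bytes/frame
    return n_block_align * n_avg_bytes_per_sec / n_samples_per_sec


# 128 Kbps (16000 bytes/sec) at 44100 Hz, 1152 samples per frame
print(round(mp3_bytes_per_frame(1152, 16000, 44100)))
```

The exact quotient is slightly under 418, which is why it is rounded here.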
MP3-VBR has varying bytes per frame...and in case of 128 kBit with 44.1 kHz, even CBR does not have constant frame size: frames can be 417 or 418 bytes...
Hmm...according to this showthread.php?s=&threadid=36094, for ac3-audio any value other than a multiple of 32 ms using timebased interleaving splits ac3-frames.
So using framebased as a workaround is not an option.
I suggest you submit your fix to the authors of VirtualDubMod, who have used the audio code from Nandub in their mod of VirtualDub!
The interleave must be audio-frame-based, nothing else. For MP3 at 48 kHz, any MP3 frame is 24ms, while for 44.1 kHz, frames are between 26065µs and 26112µs, if I'm correct.
I thus strongly recommend never to use 44.1 kHz MP3-VBR in AVI files...I don't even know why this can work at all
chibi, the bug is in an MP3 related calculation, so there is no reason to believe it extends to AC3 - if there are any AC3 problems, they would be for a different reason and probably have a different fix.
alexnoe, it's 26.122 ms per frame at 44,100 and 24ms at 48,000 - why is one better than the other? Neither are likely to be a multiple of the video frame rate.
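For reference, the playback time of an MP3 frame follows directly from the sample rate. A minimal sketch, assuming 1152 samples per frame as stated earlier in the thread (the function name is illustrative only):

```python
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate):
    """Playback time of one MP3 frame in milliseconds."""
    return SAMPLES_PER_FRAME * 1000 / sample_rate


for rate in (44100, 48000):
    print(rate, round(frame_duration_ms(rate), 3))
```

This gives roughly 26.122 ms at 44,100 Hz and 24 ms at 48,000 Hz, independent of bitrate.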
For MP3 with 44.1 kHz, it should not be constant. Frames can be 417 or 418 bytes for 128 kBit. Do both of them take 26122 µs indeed?
I recalced and get 26125µs for 418 bytes and 26062.5µs for 417 bytes. Assuming that 96% of frames are 418 bytes and 4% are 417 bytes, you get 26122.5 µs as *average*
But since the 96% vs 4% is just the one distribution which assures that the result is exactly 128 kBit (which is *not* required!), only working with the average value would cause crap results if someone feeds a file which only consists of 417 byte frames...
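The arithmetic in this post can be checked with a short sketch. It converts a frame's byte size to a duration at a fixed 16000 bytes/sec (128 kBit), which is the byte-counting view taken here; the 96%/4% split is the poster's assumption, and the function name is made up for illustration.

```python
from fractions import Fraction

BYTES_PER_SEC = 16000  # 128 kBit


def byte_duration_us(frame_bytes):
    """Time in microseconds that frame_bytes would last at a constant byte rate."""
    return Fraction(frame_bytes * 1_000_000, BYTES_PER_SEC)


d418 = byte_duration_us(418)
d417 = byte_duration_us(417)
# Weighted average for the assumed 96% / 4% mix of frame sizes.
average = Fraction(96, 100) * d418 + Fraction(4, 100) * d417
print(float(d418), float(d417), float(average))
```

Note that this measures bytes divided by byte rate, not the playback time of the frame.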
Originally posted by stegre
chibi, the bug is in an MP3 related calculation, so there is no reason to believe it extends to AC3 - if there are any AC3 problems, they would be for a different reason and probably have a different fix.
I see, so it's only mp3-related...thanx for clearing that up...ac3-probs with nandub have been discussed a lot in the above mentioned thread already...
Originally posted by alexnoe
For MP3 with 44.1 kHz, it should not be constant. Frames can be 417 or 418 bytes for 128 kBit. Do both of them take 26122 µs indeed?
I recalced and get 26125µs for 418 bytes and 26062.5µs for 417 bytes. Assuming that 96% of frames are 418 bytes and 4% are 417 bytes, you get 26122.5 µs as *average*
But since the 96% vs 4% is just the one distribution which assures that the result is exactly 128 kBit (which is *not* required!), only working with the average value would cause crap results if someone feeds a file which only consists of 417 byte frames...
Does all this mean that mp3-files from nandub are also crap (as was described in original post) and should better be remuxed (->AviMuxGUI) as well? I've already asked this in the above mentioned thread; you said nandub handles (vbr)mp3 fine, as far as you know...maybe this is an issue that has not been brought up yet?
If you guys figure it out, don't forget to submit your findings to the VDubMod guys.
Does all this mean that mp3-files from nandub are also crap
@Chibi Jasmin: I have absolutely no idea what leads you to this conclusion
The frame time is fixed, regardless of the bytes required to represent it. Nandub would be fine with a stream of "all 417's" because it doesn't count bytes, it syncs to frames and then adds together the frame times.
If the issue is a "smooth" interleave, there is a "jitter" with most of these interleaving schemes -- not because of 417 vs 418 business, but rather because a "1 frame interleave" looks something like this (using the original 44.1KHz/128kbps/30FPS example): V, A, V, AA, V, A, V, A, V, AA, V, A, etc. Which is to say the audio frames are sometimes single & sometimes doubled up (in this example) in order to get the right average ratio of audio to video framerate. That's assuming frames are not split across interleaves, which is the case for both CBR & VBR with Nandub and VBR with AVIMux (AVIMux does an excellent job, btw).
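To illustrate the jitter described above, here is a minimal sketch of one possible whole-frame scheduling rule: after each video frame, emit however many audio frames are needed to keep up with elapsed video time. This rule is an assumption for illustration, not the code of Nandub or AVIMux, so the exact order of singles and doubles may differ from the pattern quoted above.

```python
from fractions import Fraction
from math import floor


def audio_frames_per_chunk(video_frames, fps=30, sample_rate=44100,
                           samples_per_frame=1152):
    """Whole audio frames written after each video frame (1-frame interleave)."""
    # Audio frames per video frame, kept exact to avoid drift.
    ratio = Fraction(sample_rate, fps * samples_per_frame)
    return [floor((i + 1) * ratio) - floor(i * ratio)
            for i in range(video_frames)]


counts = audio_frames_per_chunk(30)
print(", ".join("V, " + "A" * n for n in counts))
```

Every chunk carries either one or two audio frames, and the running total never falls more than one frame behind the video.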
But I'm not really discussing any of that, the Nandub error I'm discussing is "cumulative". Forget double frames - this bug would cause more like 176,000 audio frames in a row at the end of a two-hour movie.
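The 176,000 figure can be sanity-checked with integer arithmetic. This sketch assumes the 44.1 kHz / 128 kbps example, where only 16000/44100 of the audio gets interleaved before the video runs out; the function name is illustrative.

```python
def stranded_audio_frames(movie_seconds, sample_rate=44100,
                          audio_bytes_per_sec=16000, samples_per_frame=1152):
    """Return (total audio frames, frames left in the non-interleaved tail)."""
    total = movie_seconds * sample_rate // samples_per_frame
    # The share of audio not written alongside video is 1 - bytes_per_sec / sample_rate.
    stranded = movie_seconds * (sample_rate - audio_bytes_per_sec) // samples_per_frame
    return total, stranded


total, stranded = stranded_audio_frames(2 * 60 * 60)
print(total, stranded)
```

Under these assumptions a two-hour movie has about 275,600 audio frames, of which roughly 175,600 end up in one run at the end.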
Originally posted by alexnoe
@Chibi Jasmin: I have absolutely no idea what leads you to this conclusion
Well, if it's true what stegre says, I'd consider the resulting files crap ('The bug has the effect that the interleaved audio is physically nowhere near its respective video')
Ah...yes, if you set the interleave time-based, then I agree on the files being crap.
Didn't read properly
chibi - thanks for mentioning that; I just sent an email to the VDubMod people (after checking if their code has the same problem, which it apparently does)
Originally posted by alexnoe
Ah...yes, if you set the interleave time-based, then I agree on the files being crap.
Didn't read properly
Luckily, it's set for framebased by default
Fun story: MP3 is only good when set framebased, and AC3 is only good when set timebased as a multiple of 32 ms. Maybe the whole audio thing in Nandub needs a rewrite...how about contributing your findings from programming AviMuxGUI to VDubMod as well?
Originally posted by stegre
chibi - thanks for mentioning that; I just sent an email to the VDubMod people (after checking if their code has the same problem, which it apparently does)
Okay, fine...thanx!
Originally posted by Chibi Jasmin
Luckily, it's set for framebased by default
It is lucky; in fact, I've checked a bunch of movies collected from "the field", and fortunately the percentage of files I've found which exhibit the problem is relatively small, and I bet that's the reason. But they are out there - I've found a number of them. I can check easily because I've added a new display field to GSpot, the codec ID utility I wrote. I had already added an analysis of interleave period and preload, which is how I got involved with interleave and discovered this to begin with. So now (as soon as I release v2.1, another week or so), it's going to display an "interleave media rate ratio" - which is, specifically, "seconds of audio per interleave divided by seconds of video per interleave". As you might expect, the value should always be 1.00. Files which exhibit this Nandub bug are pretty easy to recognize, they display very low values (e.g. 0.36). Here's a screenshot of the utility as it stands now (it's still undergoing internal testing and cleanup).
gspot/beta/gspot-v2.1b3-a.png
(btw, it won't analyze AC3 interleave on this version, but that's next. This will handle CBR and VBR MP3, though)
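The "interleave media rate ratio" defined above is simple to compute once the audio and video frames in an interleave period have been counted. The following is a sketch of that definition only, not GSpot's code; the function name and the sample counts are made up for illustration.

```python
def interleave_media_rate_ratio(audio_frames, video_frames, sample_rate, fps,
                                samples_per_frame=1152):
    """Seconds of audio per interleave divided by seconds of video per interleave."""
    audio_seconds = audio_frames * samples_per_frame / sample_rate
    video_seconds = video_frames / fps
    return audio_seconds / video_seconds


# One second of 30 FPS video with 44.1 kHz MP3 audio:
healthy = interleave_media_rate_ratio(38, 30, 44100, 30)  # audio keeps pace
skewed = interleave_media_rate_ratio(14, 30, 44100, 30)   # audio laid out too slowly
print(f"{healthy:.2f} {skewed:.2f}")
```

A healthy file stays close to 1.00, while a file muxed with the time-based bug lands near the low value mentioned above.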
Originally posted by stegre
(btw, it won't analyze AC3 interleave on this version, but that's next.
I've been looking for something like this for a long time now. I find GSpot to be essential in my tool bag these days, great work
Thanks, MaTTeR, I appreciate that. As far as the new version, I hope to have it out in another week or so.
Originally posted by stegre
Examining the Nandub source code, I found that the bug is in the MainAddAudioFrame() function of the Dubber class (file dub.cpp). To calculate the next "audio point" the author needed to divide by "bytes per frame", but instead used "nBlockAlign" which is "samples per frame" - almost always 1152. He should have taken that samples/frame value and multiplied it by bytes/sec (e.g. 16000 for 128Kbps) and divided that by samples/sec (e.g. 44100) – that gives the value he was looking for, 418 bytes per frame in the example used above. All of those values, specific to the file currently loaded, are sitting right there in the same structure, making the program change trivial. In fact, I changed that one line in Nandub just for testing, recompiled the code, and the corrected version then created perfectly interleaved files.
It is true there may be (well, most likely there is) a problem with per-time interleaving.
However what you describe here only concerns MP3 VBR audio.
Indeed the 'normal' meaning of nBlockAlign is the number of bytes per block (or frame as you called it). The 1152 value (which represents the number of samples in a frame for MP3) is invalid and represents the hack made in Nandub to identify the stream as MP3.
So what you demonstrate is true, this value shouldn't be used as-is, but only in this specific case (MP3 ala Nandub).
If you look at the code in MainAddAudioFrame, you will see that the 2 interleaving methods are not processed the same way. In the per-frame interleaving the program computes the "audio point" according to the kind of audio detected (i.e. MP3 ala Nandub, Ogg, and 'standard' - i.e. others - streams). But in the per-time interleaving the "audio point" is always computed the same way, assuming that nBlockAlign effectively represents the number of bytes per frame.
Thus the "audio point" should be computed according to the kind of stream too.
If you look at VirtualDub sources you will see that Nandub (and VDubMod) use the same formula when computing the "audio point" for 'normal' streams, i.e. use the nBlockAlign value.
Thanks for pointing out the problem and helping us. We received your quick fix but couldn't integrate it for the next release (see Belgabor's post in the VirtualDub section) due to what I described above (the problem concerns only MP3 ala Nandub) and the following fact:
When computing the "audio point" you compute first the number of bytes per frame (integer value) as (nBlockAlign * nAvgBytesPerSec / nSamplesPerSec). But for 'normal' streams (waved MP3 muxed in VirtualDub, ...) nBlockAlign generally equals 1, so the number of bytes per frame is below one (=> rounded down to zero) and leads to a 'divide by zero' error.
When I have some time I will try to fix the issue (i.e. make MP3 VBR interleaving work better in the per-time mode). I will also make some tests to see if this is a problem that concerns the muxing or / and remuxing processes.
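The divide-by-zero problem and the per-stream-kind selection described above can be sketched as follows. Integer division mimics the C++ integer math; the `is_nandub_mp3` flag is a hypothetical stand-in for however the muxer classifies the stream, and none of this is the actual Nandub or VDubMod code.

```python
def naive_bytes_per_frame(n_block_align, n_avg_bytes_per_sec, n_samples_per_sec):
    """The quick fix applied to every stream, with integer math as in C++."""
    return n_block_align * n_avg_bytes_per_sec // n_samples_per_sec


def bytes_per_frame(n_block_align, n_avg_bytes_per_sec, n_samples_per_sec,
                    is_nandub_mp3):
    """Pick the divisor according to the kind of stream."""
    if is_nandub_mp3:
        # nBlockAlign holds samples per frame (1152), so convert it to bytes.
        return n_block_align * n_avg_bytes_per_sec // n_samples_per_sec
    # 'Normal' streams: nBlockAlign already is the block size in bytes.
    return n_block_align


print(naive_bytes_per_frame(1, 16000, 44100))       # 0, so dividing by it fails
print(bytes_per_frame(1, 16000, 44100, False))      # uses nBlockAlign directly
print(bytes_per_frame(1152, 16000, 44100, True))    # truncated integer result
```

Note that integer division truncates the MP3 case to 417 rather than rounding to 418, so a real fix would also need to decide how to handle the remainder.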
ahh, yes, okay - I take it back about the program change being trivial then -- I only tried my "quick fix" very quickly on one or two MP3's - not on other file types - it was really just something I did for verification of the problem.
Regarding CBR vs. VBR, I was about to object and say that I saw the same effects with both (which I did), but now I see what you're getting at: Multiplexing a plain MP3 with Nandub will fail (whether it's VBR or CBR), but using the original VDub system of multiplexing a "wav-wrapped" CBR MP3 (using the "WAV audio" menu option) does work correctly (on either Nandub or VDub). In fact, I just now tried making one that way and then tested the result with my utility - indeed it displays "1.00" right on the nose.
Thanks for your response, I'll be looking forward to seeing if & how it finally gets resolved in a subsequent release of VDubMod.