I was hoping I’d be able to wrap this series up with this post, but it’s clear to me now that I won’t be able to. There’s simply too much to cover, and I’m certain that at least a few of you are still scratching your heads (like I was), wondering why we shouldn’t just rely on the pixel matrices to calculate Pixel Aspect Ratio (PAR) from analog sources. It is critically important to understand that I am approaching this topic from a forensic perspective, with the goal of standardizing the methodology used for forensic processing, interpretation, and presentation.
“If we only use the pixel matrices to correct aspect ratio for analog source video, will that affect accurate interpretation?” Probably not. In most cases it will not, provided you are not doing any photogrammetry.
Here’s more good news: formats like Source Input Format (SIF) and Common Intermediate Format (CIF) have already addressed this for us, so using the pixel matrices from these formats, as shown with the SIF example in the last post, obtains the same results. “Are you serious? What the he** do I need to know this for then!?”
Non-Visual Information (720 samples, ~ 704 visual)
As you are probably aware, the 720 samples specified by ITU Rec. 601 for NTSC and PAL include non-visual information (space on each side, among other things). This was done for a few reasons, which I’m not going to get into here, as it is well documented, understood, and accepted. In fact, it’s so well accepted and understood that many of the later analog video standards were based on 704 samples to represent the visual information, rather than 720 to represent the entire signal.
If we correct aspect ratio for sources that store all 720 samples by simply using the pixel matrices stored, we have essentially factored this non-visual data into our aspect ratio correction equation incorrectly. In other words, we’ve included non-visual data in our equation designed to define the shape of the visual data’s non-square pixels. Oops.
Remember, we’re trying to accurately define the shape of the non-square pixels that represent the visual information. Let’s look at a few examples, starting with the uncompressed ITU Rec. 601 examples from my last post:
NTSC 480i Uncompressed: 720/486 = 1.48 SAR. 1.48 (SAR) x .909 (PAR) = 1.345 (DAR)
PAL 576i Uncompressed: 720/576 = 1.25 SAR. 1.25 (SAR) x 1.092 (PAR) = 1.365 (DAR)
The resulting pixel matrices for the examples above are 654 x 486 (NTSC) and 786 x 576 (PAL). The pixels are now square.
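As a rough sketch of the arithmetic above, the square-pixel width is just the stored width multiplied by the PAR, with the line count left alone. The function name is hypothetical, and rounding to the nearest whole pixel is an assumption that happens to match the figures quoted here; a real resampling tool may round differently.

```python
def square_pixel_width(stored_width: int, par: float) -> int:
    """Width after resampling to square pixels; the height is unchanged.

    Assumption: round to the nearest whole pixel.
    """
    return round(stored_width * par)


# (label, stored width, stored height, PAR) taken from the examples above
examples = [
    ("NTSC 480i uncompressed", 720, 486, 0.909),
    ("PAL 576i uncompressed", 720, 576, 1.092),
]

for label, width, height, par in examples:
    new_width = square_pixel_width(width, par)
    # DAR of the corrected frame, non-visual samples still included
    dar = new_width / height
    print(f"{label}: {new_width} x {height}, DAR {dar:.3f}")
```

Both results land a little wider than 4:3, which is the effect of the non-visual samples still sitting inside the 720.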
Although we used the PAR derived from the analog source sampling rates, neither of the above examples equates to a 4:3 (~1.33) Display Aspect Ratio (DAR), because of the non-visual information. Of course, there are a number of ways to address this: we can A) crop out the non-visual data to obtain our 4:3 DAR or B) leave it as is and simply display the aspect ratio corrected video along with the non-visual data.
If we crop, it is critically important to maintain the center of the image. In the case of the NTSC example above, we crop down from 654 x 486 to 648 x 486. For PAL we’d crop 786 x 576 down to 768 x 576.
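To make "maintain the center" concrete, here is a minimal sketch that works out how many columns to drop from each side. The function name and the 4:3 default are illustrative assumptions, not part of any standard or tool; if the excess were an odd number of columns, this sketch would drop the extra column from the right.

```python
def centered_crop(width: int, height: int, dar_num: int = 4, dar_den: int = 3):
    """Return (left_offset, target_width) for a horizontal crop that keeps
    the image centered at the requested display aspect ratio."""
    target_width = height * dar_num // dar_den
    excess = width - target_width
    if excess < 0:
        raise ValueError("frame is already narrower than the target aspect ratio")
    # Split the excess columns evenly between the left and right edges
    left_offset = excess // 2
    return left_offset, target_width


# Square-pixel matrices from the uncompressed examples above
print(centered_crop(654, 486))  # NTSC: 3 columns off each side, 648 wide
print(centered_crop(786, 576))  # PAL: 9 columns off each side, 768 wide
```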
“Well, what do you recommend, crop or leave as is?” I’ve got to tell you, whoever this voice is, they are a pain in the a**. The answer is, it depends. There are a lot of other variables to consider, which I’ll get into in the next post, hopefully. In the meantime, one more example.
NTSC DVD-Video 480i: 720/480 = 1.5 SAR. 1.5 (SAR) x .909 (PAR) = 1.363 (DAR)
PAL DVD-Video 576i: 720/576 = 1.25 SAR. 1.25 (SAR) x 1.092 (PAR) = 1.365 (DAR)
The resulting pixel matrices for these examples are 654 x 480 (NTSC) and 786 x 576 (PAL). The pixels are now square. If we crop, we keep the image centered and crop to 640 x 480 (NTSC) and 768 x 576 (PAL).
“Hey chief, what gives? In the last post you said NTSC DVD-Video was 704 x 480 and PAL DVD-Video was 704 x 576.” You are absolutely right, I did. It was late, my eyes were strained, and I forgot to mention that the DVD-Video specification supports four different pixel matrices for both NTSC and PAL.
Yay! Isn’t this fun? :)
Alright, I think I’m done conversing with that voice in my head for now. In the next post I’ll finally go over line and sample doubling, talk about more variables, and see if we might be able to wrap this up. I’m thinking of doing a related FAQ. Thoughts?