the two streams have different tempos, so they can't be synced simply by matching their starting points. You can either insert stretch markers into the Italian item periodically to sync it section by section, or trim away the silence at the beginning of the Italian item and slide its start to match the beginning of the French item. Then, to compensate for the drift over time, time-compress the entire Italian item: hold the ALT key while grabbing the item's final vertical edge (the mouse pointer changes to a hand) and push the edge to the left until the waveforms line up. Zoom in and repeat that last step until they line up with the desired accuracy. Then go back to the beginning and fine-tune it in the same way. It took me about one minute to line them up fairly well. It would help to have the video displayed while doing this, to verify that the result is actually lip-synced.
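If you would rather estimate the amount of compression than find it purely by eye, the arithmetic can be sketched as below. This is only a rough sketch: it assumes the tempo difference is constant (so the drift grows linearly), and that you can read off the times of the same two landmarks, one early and one late, in both waveforms. The function name, its parameters, and the example numbers are all hypothetical and are not part of any DAW's API.

```python
def stretch_for_sync(ref_a, ref_b, src_a, src_b, src_length):
    """Estimate how much to compress the item being stretched.

    ref_a, ref_b: times (s) of an early and a late landmark in the
                  reference item (the French one in the post above).
    src_a, src_b: times (s) of the same two landmarks in the item to be
                  stretched (the Italian one).
    src_length:   current length (s) of the item to be stretched.

    Returns (rate, new_length). A rate above 1 means the item has to be
    compressed, i.e. its end edge moves to the left.
    """
    ref_span = ref_b - ref_a
    src_span = src_b - src_a
    if ref_span <= 0 or src_span <= 0:
        raise ValueError("the late landmark must come after the early one")
    rate = src_span / ref_span  # how much faster the item must play
    return rate, src_length / rate


# Hypothetical numbers: the landmarks are 600 s apart in the reference item
# but 606 s apart in the item being stretched, which is 620 s long.
rate, new_length = stretch_for_sync(
    ref_a=2.0, ref_b=602.0, src_a=2.0, src_b=608.0, src_length=620.0
)
print(f"rate {rate:.4f}, new item length {new_length:.2f} s")
```

The result is only a starting point. You would still zoom in and fine-tune the edge by hand as described above, and check against the video.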