Why don't you line up the starting points manually? To me this looks like an export issue rather than an interpretation issue in Reaper.
Just to be clear: your client sends you a video (with audio) from a concert and a separate mix (audio only) from that concert. You have to master the mix and send it back so they can combine it with the video into a new video that uses your mix.
What source program are they using?
Could the tiny shift come from frame rounding in the source software? Video has to follow frames; audio doesn't. If someone cuts a video in a DAW, the DAW cuts through the video as if it were audio, but video can't be cut at an arbitrary point. The program has to start the video on a frame, even if the item starts somewhere between two frames.
To check this, switch the time display in the source program from beats to frames and see whether the item really starts on a frame boundary. I hope you can follow me; my English isn't the best.
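To put a number on it, here is a rough Python sketch of the arithmetic. The 25 fps frame rate, the 12.350 s start time, and the rounding direction are only assumptions for illustration; I don't know how the source program actually rounds, and the function names are made up.

```python
from fractions import Fraction
import math

def on_frame_boundary(start_seconds, fps):
    """True if the start time falls exactly on a video frame."""
    frames = Fraction(start_seconds) * Fraction(fps)
    return frames.denominator == 1

def frame_snap_offset(start_seconds, fps, mode="floor"):
    """Snap a start time to a frame and return (snapped_start, shift) in seconds.

    mode is an assumption about the exporter: "floor" snaps to the previous
    frame, "ceil" to the next one, anything else to the nearest frame.
    """
    start = Fraction(start_seconds)
    fps = Fraction(fps)
    frames = start * fps
    if mode == "floor":
        n = math.floor(frames)
    elif mode == "ceil":
        n = math.ceil(frames)
    else:
        n = round(frames)
    snapped = Fraction(n) / fps
    return snapped, snapped - start

# Hypothetical item starting at 12.350 s in a 25 fps project.
snapped, shift = frame_snap_offset("12.350", 25)
print(float(snapped), float(shift) * 1000, "ms")
```

With these made-up values the item sits three quarters of the way into frame 308, so the video would start 30 ms before the audio cut point. At 25 fps the shift can never be larger than one frame (40 ms), which would match a "tiny" offset. Pass the times as strings or whole numbers so they stay exact.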
That's just a theory.
Greetings