The difference between regular stereo mixes and binaural mixes is basically the difference between lateralization and localization.

Lateralization happens every time you listen to a stereo track via headphones: the sources within that mix are all in your head, somewhere between your left ear and your right ear. The singer is, most of the time, directly in the middle.

When you listen to stereo via loudspeakers, all of the sources are outside your head (as the loudspeakers are). We call that localization.

A binaural mix tries to recreate the sound at the entrance of your ear canal that you would get by listening to any source outside your head. Such a mix can quite simply be created by putting microphones in your ears, recording any sound source (for example a stereo mix played back via loudspeakers in a room), and then listening to the recorded signals. But it can also be achieved by using HRTFs (head-related transfer functions, without room) or BRIRs (binaural room impulse responses, with room). They basically describe how a sound changes on its way from a source in a certain direction to your ear. With that information you can create a virtual source wherever you want.
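To make the HRTF idea a bit more concrete, here is a minimal sketch of how a virtual source can be created: the mono source is convolved with one impulse response per ear. The function names `toy_hrir_pair` and `binaural_render` are made up for this illustration, and the "HRIR" used here is only a crude stand-in (a pure delay plus a level difference between the ears), not a measured one. In practice you would load measured HRIRs or BRIRs instead.

```python
import numpy as np

def toy_hrir_pair(azimuth_deg, fs=48000, length=64):
    """Hypothetical stand-in for a measured HRIR pair.

    Assumption for this sketch: positive azimuth means the source is on the
    left. Each ear only gets a delay and a gain, which is far simpler than a
    real head-related impulse response.
    """
    az = np.deg2rad(azimuth_deg)
    max_itd = 0.0007  # roughly 0.7 ms maximum interaural time difference
    itd_samples = int(round(abs(np.sin(az)) * max_itd * fs))
    far_gain = 1.0 - 0.5 * abs(np.sin(az))  # crude interaural level difference

    near = np.zeros(length)
    near[0] = 1.0                 # ear facing the source: no delay, full level
    far = np.zeros(length)
    far[itd_samples] = far_gain   # opposite ear: later and quieter

    # Return (left, right)
    return (near, far) if azimuth_deg >= 0 else (far, near)

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono signal with one impulse response per ear."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # shape: (2, samples)

fs = 48000
click = np.zeros(256)
click[0] = 1.0                            # test signal: a single click
h_left, h_right = toy_hrir_pair(90, fs)   # virtual source hard left
out = binaural_render(click, h_left, h_right)
```

With BRIRs the processing is the same convolution, only the impulse responses are longer because they also contain the room. A toy pair like the one above carries nothing but time and level differences, so over headphones it would most likely still be heard inside the head.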

There are different approaches to doing that; one of them is Ambisonics with binaural rendering.
The most important thing to remember:
Without externalization there is no localization, only lateralization. And externalization can only happen with a room, be it natural (contained in the BRIRs) or synthetic (simulated).

