Clarification on JSFX atomic_setifequal() semantics

tack · 05-23-2020, 05:22 PM

This is probably only something Justin or Schwa can answer, but if anyone knows, I'd be grateful:

Does atomic_setifequal() have any behavioral differences by platform? Can we expect it to work equally reliably on Mac, Windows, and Linux?
Does atomic_setifequal() work on gmem buffers?
Does atomic_setifequal() work on _global.* variables?

Specifically, I am trying to understand if this code is invalid.

A Reaticulate user reported an issue that could be explained by a race condition in JSFX initialization if atomic_setifequal() does not support gmem buffers.

Ultimately I need to be able to lock a critical section across arbitrarily many instances of a JSFX.

Thanks!

tack · 05-24-2020, 08:12 AM

And to add, if atomic_setifequal() can't work across JSFX instances (gmem regions or _global.* variables), is anyone aware of some other pattern or recipe to implement some form of cross-instance locking?

tack · 05-25-2020, 05:20 PM

I'm fairly well convinced by now that atomic_setifequal() doesn't work with gmem region or global variables.

I've implemented a horrifying workaround by having a central Lua script elect one of the JSFX instances to be a lock manager. But this is really tragic. I'm quite interested to know if there's a proper way to do synchronization of critical sections across JSFX instances.

By way of example, consider the scenario where a JSFX instance wants to allocate a unique instance id by counter. (This is what Reaticulate needs to do, as each JSFX instance must autonomously and safely carve out its own region in the shared gmem buffer.)

Here's the JSFX, which tries (and fails) to use atomic_setifequal() to implement a spinlock used to generate a synchronized counter to act as an instance id:

Code:

// RaceMaker.jsfx
desc:RaceMaker
options:gmem=racemaker

slider1:-1<-1,1000,1>instance counter

@init
MAGIC = 0xdeadbeef;
instance_id = -1;
slider1 = instance_id;


@block
while (instance_id == -1 && gmem[0] == MAGIC) (
    // Spin until lock is acquired (or we reach max allowed iterations)
    (atomic_setifequal(_global.racemaker_lock, 0, 1) == 0) ? (
        // Lock was acquired.
        instance_id = gmem[1];
        gmem[1] += 1;
        // Release lock
        _global.racemaker_lock = 0;
        slider1 = instance_id;
    );
);

The idea is that the instances will startup and lie dormant (not attempt to acquire the id) until the first slot in the gmem buffer is updated with a special value to indicate readiness by a Lua script. Once the Lua script writes this magic value, all instances spring to life with the usual parallelism for FX processing.

Here is a Lua script now which writes the magic value to the gmem buffer, and then scans all the instances to look for collisions of instance ids.

Code:

function scan()
    local seen = {}
    local collisions = 0
    for tidx = 0, reaper.CountTracks(0) - 1 do
        local track = reaper.GetTrack(0, tidx)
        local fx = reaper.TrackFX_GetByName(track, "RaceMaker", false)
        if fx and fx ~= -1 and
          reaper.GetMediaTrackInfo_Value(track, "I_FXEN") == 1 and
          reaper.TrackFX_GetEnabled(track, fx) then
            local instance_id, _, _ = reaper.TrackFX_GetParam(track, fx, 0)
            if seen[tostring(instance_id)] then
                reaper.ShowConsoleMsg("COLLISION: track " .. tostring(tidx + 1) ..
                                    " with " .. seen[tostring(instance_id)] .. ": " ..
                                    tostring(instance_id) .. "\n")
                collisions = collisions + 1
            else
                seen[tostring(instance_id)] = tostring(tidx + 1)
            end
        end
    end

    if collisions == 0 then
        reaper.ShowConsoleMsg("OK: no counter collisions\n")
    end
end

reaper.gmem_attach("racemaker")
reaper.gmem_write(0, 0xdeadbeef)
-- Give instances a bit of time to allocate an id
reaper.defer(function()
    scan()
end)

I reproduce this by:

Creating 200 tracks with the RaceMaker JSFX
Executing the above Lua script

This outputs something like:

Code:

COLLISION: track 5 with 1: 163.0
COLLISION: track 16 with 13: 12.0
COLLISION: track 17 with 12: 18.0
COLLISION: track 22 with 12: 18.0
COLLISION: track 28 with 12: 18.0
COLLISION: track 42 with 36: 29.0
COLLISION: track 52 with 48: 38.0
[...]

Of course, as this is a race, the number and nature of the collisions varies. (Occasionally, as you'd expect, there are no collisions at all.)

So, with all that, unless I've done something silly with my spinlock implementation, I feel confident in saying atomic_setifequal() doesn't work with global variables.

Can this be implemented? Or is there another option to accomplish synchronization between JSFX instances?

Thanks!

IXix · 05-27-2020, 04:15 AM

Can't help regarding the atomics but I made a library ages ago to track instances of a JSFX. Here's the link if it's any use... https://stash.reaper.fm/v/25417/IX.Tracker.jsfx-inc.zip

tack · 05-27-2020, 07:29 AM

Quote:

Originally Posted by IXix

Can't help regarding the atomics but I made a library ages ago to track instances of a JSFX. Here's the link if it's any use... https://stash.reaper.fm/v/25417/IX.Tracker.jsfx-inc.zip

Thanks for sharing that.

It looks like your code has the same problem mine does. In the case of your tracker library, you're making the assumption that Tracker.Register() can't execute concurrently across multiple JSFX instances. In my case, I was assuming that wasn't safe, but was making a different assumption that atomic_setifequal() worked on gmem buffers, which appears also to be false.

Concurrent execution of Tracker.Register() breaks the promise of unique instance ids.

Now, in practice, this race seems to be really rare. Personally, I have never run into it: I'd never observed multiple Reaticulate JSFX instances getting the same id. It works much like your code does: if the instance id isn't yet assigned in @block, then it goes and allocates one. This has been fine for me and until recently I'd not received user reports of any problems.

But the user in question can, for reasons that still aren't clear to me, very reliably reproduce the race with a few reloads of a project.

So to prove the unsafeness of the approach you and I have taken for allocating instance ids, I've made concurrency of the id allocation code much more likely by requiring a flag be set somewhere in gmem before the JSFX will try to allocate its id.

And this works for your Tracker library too:

Code:

@block
(gmem[20000] == 0xdeadbeef) ? (
    refresh_state = _global.Demo.Tracker.Update(self);
    my_id = _global.Demo.Tracker.GetID(self);
    instance_count = _global.Demo.Tracker.GetCount();
);

Then, once the project is loaded, I call a Lua script to write 0xdeadbeef at that (arbitrary) gmem slot, inducing the race. With this, I can get your library to generate the same id for multiple JSFX instances.

So ultimately the raciness is still there. Like I mentioned earlier, I've got a solution baking to basically assign one of the JSFX instances to act as a lock manager for the critical section (which assignment is done by a single-instance Lua script). But I'd really like there to be a proper way to solve this concurrency problem in JSFX.

(One may wonder, if I have a global Lua script running, why it doesn't take the role of assigning unique instance ids. There are reasons we can talk about if you want. But it's still only a workaround to the basic problem of memory-safe concurrency in JSFX.)

IXix · 05-27-2020, 07:47 AM

Oh, sorry it wasn't more helpful. I didn't think instances could be updated simultaneously but perhaps that's something to do with anticipative processing, multiple cores and all that voodoo. IIRC the atomic functions were to help manage interaction between the gfx and audio threads of a single jsfx.

I'm sure Justin would probably consider adding global variants if there's a valid use case but he's probably super busy with much bigger fish right now, so don't hold your breath.

tack · 05-27-2020, 09:11 AM

Quote:

Originally Posted by IXix

IIRC the atomic functions were to help manage interaction between the gfx and audio threads of a single jsfx.

Yep, this is what I've now come to understand.

Quote:

Originally Posted by IXix

I'm sure Justin would probably consider adding global variants if there's a valid use case but he's probably super busy with much bigger fish right now, so don't hold your breath.

Indeed. I've come to learn that one can't quite predict what things catch Justin and schwa's attention. I think it somehow relates to quantum indeterminacy.

IXix · 05-27-2020, 11:07 AM

Quote:

Originally Posted by tack

one can't quite predict what things catch Justin and schwa's attention. I think it somehow relates to quantum indeterminacy.

Haha! I think their focus will definitely be elsewhere for the next couple of months but yes, you never know.

Justin · 05-30-2020, 10:25 AM

sorry I've been weighing over this for a bit figuring out what to do.

atomic_*() work on gmem, but only atomically within a particular JSFX instance (so not across multiple instances).

Internally they are implemented using a mutex, so while I'd like to make it global, I want to make sure it doesn't cause any performance issues down the road. Expect some changes for this in a build soon.

tack · 05-30-2020, 10:36 AM

Quote:

Originally Posted by Justin

sorry I've been weighing over this for a bit figuring out what to do.

Awesome! I'm really happy to know you're chewing on it.

Quote:

Originally Posted by Justin

Internally they are implemented using a mutex, so while I'd like to make it global, I want to make sure it doesn't cause any performance issues down the road. Expect some changes for this in a build soon.

To mitigate risk of unintended consequences (performance hits or edge case deadlocks), I'd be quite happy with an explicit function call to acquire/release a mutex too. (Of course, the JSFX failing to release the lock would result in a deadlock too, but at least that wouldn't be an edge case.

)

But if you're confident enough in making the atomic_() mutexes global would be safe, that'd definitely be nicest from a Just Works perspective.

nofish · 06-04-2020, 06:26 AM

tack, you may want to check latest pre-release.

tack · 06-04-2020, 06:37 AM

Quote:

Originally Posted by nofish

tack, you may want to check latest pre-release.

Awesome!

mschnell · 09-07-2022, 05:16 AM

Quote:

Originally Posted by Justin

atomic_*() work on gmem, but only atomically within a particular JSFX instance (so not across multiple instances).
...
Expect some changes for this in a build soon.

I just tried to build a queue with the ability to handle multiple senders, each of which will be a JSFX instance in different tracks, and a single consumer JSFX.

Hence the queue is in gmem.

Works fine when not using the atomic functions. But of course this is not safe. Hence I do need atomic to work across instances of JSFXes.

So I tried atomic_setifequal(), checking the result of that function and spinning if the result is not as expected (== previous value ).

Works with a single instance, but I do need the multiple instance stuff. Is this already working ?

Thanks,
-Michael

Code:

desc:midi to partner send

options:gmem=MIDI_TRANSFER

in_pin:none
out_pin:none

@init
  gmem[0] = 4; // write pointer for writers (needs atomic)
  gmem[1] = 4; // write pointer for readers
  gmem[2] = 4; // read pointer

@block
  while (midirecv(mpos, msg1, msg2, msg3) ) (         
    midisend(mpos, msg1, msg2, msg3);  // pass through
    ptr_if = 1;
    while (ptr_if)  (
      ptr = atomic_get(gmem[0]);
      ptr_old = ptr;
      ptr_4 =  ptr+4;
      ptr_4 >= 4+4*10000 ? ptr_4 = 4;
      ptr_cmp =  atomic_setifequal(gmem[0], ptr_old, ptr_4);
      ptr_cmp == ptr_old ? (  // qmwm[0] not changend by another sending instance
        gmem[ptr] = mpos; // we can do this, as the other only reads the prevoisly stored values
        ptr +=1;
        gmem[ptr] = msg1;
        ptr +=1;
        gmem[ptr] = msg2;
        ptr +=1;
        gmem[ptr] = msg3;
        ptr +=1;
        gmem[1] = ptr_4; // allow reading 
        ptr_if = 0;  // successfully queued
      );
    ); 
  );

mschnell · 09-07-2022, 11:33 AM

Quote:

Originally Posted by Justin

Internally they are implemented using a mutex,

AFAIU both X86 and ARM architectures do feature instructions that allow to implement e.g. atomic_setifequal() without the help of the OS. That would automatically be global and supposedly the fastest way to do it.

BTW.: what is atomic_get() and atomic_set() supposed to do ? AFAIU these actions don't need atomicness.

-Michael

mschnell · 09-07-2022, 01:09 PM

Two instances of these

Code:

desc:midi to partner send

options:gmem=MIDI_TRANSFER

in_pin:none
out_pin:none

@init
  gmem[0] = 4; // write pointer for writers (needs atomic)
  gmem[1] = 4; // write pointer for readers
  gmem[2] = 4; // read pointer

@block
  while (midirecv(mpos, msg1, msg2, msg3) ) (         
    midisend(mpos, msg1, msg2, msg3);  // pass through
    ptr_if = 1;
    while (ptr_if)  (
      ptr = atomic_get(gmem[0]);
      ptr_4 =  ptr+4;
      ptr_4 >= 4+4*10000 ? ptr_4 = 4;
      ptr ==  atomic_setifequal(gmem[0], ptr, ptr_4)  ? (  // qmwm[0] not changend by another sending instance
        gmem[ptr] = mpos; // we can do this, as the other only reads the prevoisly stored values
        ptr +=1;
        gmem[ptr] = msg1;
        ptr +=1;
        gmem[ptr] = msg2;
        ptr +=1;
        gmem[ptr] = msg3;
        ptr +=1;
        gmem[1] = ptr_4; // allow reading 
        ptr_if = 0;  // successfully queued
       ):(
        _1 += 1; 
      );
    ); 
  );

indeed _1 slowly counts up, if multiple instances are active with receiving many Midi messages.

Hence the atomicness across instances seems to work now.

-Michael

mschnell · 09-08-2022, 03:05 AM

BTW.: the list of library functions used for "atomic" by the C++ compiler (and hence might be considered useful) can be found here: -> https://en.cppreference.com/w/cpp/atomic
-Michael

mschnell · 09-30-2022, 10:10 AM

Why do we need atomic_set() and atomic_get(). Is it really possible that a thread is interrupted and a floating point value is saved only partly ?

-Michael

mschnell · 09-30-2022, 10:58 PM

...
Or is this related to some multicore cache issues - which seems unlikely as the code in GHitGub does not use "memory boundary" ASM instructions but a Mutex ?

mschnell · 10-01-2022, 05:58 AM

I posted a FR regarding this issues.

-Michael