I've babled about Userspace RCUs many times, and how we have the two fastest Userspace RCUs currently in existence: GraceVersion URCU and TwoPhase URCU.
Unfortunately, we submitted to a few conferences already and the reviewers don't get it.
I think this is partially because not a lot of people know what RCU is, or how to use it (it's a simple yet powerful multi-purpose synchronization  mechanism). 
Anyways, we got another reject, which is always a bit upsetting, but clearly we were submitting to the wrong conference because the majority of comments from the reviewers showed how little they knew about RCU.
The funny thing is that they told us to compare against Predicate RCU (see link below). Predicate RCU has three algorithms, with two of them requiring a time() function that returns results that are sequentially consistent across cores, something that only a few CPUs provide, and therefore are not really "generic algorithms".
Only the algorithm named "D-PRCU" (Algorithm 2 in the Predicate RCU paper) is truly generic. What't so funny about this?
Well, the Predicate RCU paper was published in 2015 and as it so happens, the "D-PRCU" algorithm was actually invented by us somewhere in 2012 and published in 2013 as part of the Left-Right paper (Algorithm 15 of our paper):
Predicate RCU paper http://www.cs.tau.ac.il/~mad/publications/ppopp2015-prcu.pdf
Original Left-Right paper https://sourceforge.net/projects/ccfreaks/files/papers/LeftRight/leftright-extended.pdf/download
Now, we didn't add the predicate part, that is a true innovation from the authors of Predicate RCU, but the predicate part is an extension of the RCU API... and there are many extensions to the RCU API (we have several which we never published). In the end, what matters is the actual algorithm, and the algorithm they've published is the same we published two years previously.
It's sad that we submitted the Left-Right paper to multiple conferences, including PPoPP, and it never got accepted (we stopped trying at some point and just posted it online on sourceforge and github) and that their paper got accepted. I know, I know, it's all about presentation, and not about the actual content, but hey, it saddens me that's all  :(
I would like to believe they actually came up with this algorithm and they didn't see our paper online and got "inspired by it". Whatever the case, good ideas tend to be "re-discovered" many times, and this approach of using a two-phase algorithm is certainly one of the good ones, so it's perfectly reasonable that it got re-discovered and probably will get re-discovered many times in the future.
That wasn't even the funniest part. The really funny thing is that our URCU TwoPhase is an improvement over the original idea such that there is no need to acquire a lock and the readers continue to be wait-free (see algorithm 2 of our "Grace Sharing URCU" paper), and one of the reviewers is asking us to compare it against the old version.
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/gracesharingurcu-2017.pdf
In detail, the old algorithm (shown in the Algorithm 15 of Left-Right paper and algorithm 2 of Predicate RCU paper) is something like this:
    // Grab the lock shared by all updaters
    writersMutex.lock();
    // Wait for the next RI to become empty
    while (!readIndicator[!updaterVersion].isEmpty()) pause();  
    // Toggle the updaterVersion
    updaterVersion = !updaterVersion;      
    // Wait for the previous RI to become empty    
    while (!readIndicator[!updaterVersion].isEmpty()) pause(); 
    // Release the lock so that another updater can go in
    writersMutex.unlock();                        
    
The new algorithm (shown in Algorithm 2 of the Grace Sharing URCU paper) is something like this:
    // Look ma', no locks!
    const int64_t currUV = updaterVersion;
    const int64_t nextUV = (currUV+1);
    // Wait for the next RI to become empty of the updaterVersion to be advanced by another updater
    while (!readIndicator[(int)(nextUV&1)].isEmpty()) {
      if (updaterVersion.load() > nextUV) return;
      if (updaterVersion.load() == nextUV) break;
    }
    // Advance updaterVersion
    if (updaterVersion.load() == currUV) {
      auto tmp = currUV;
      updaterVersion.compare_exchange_strong(tmp, nextUV);
    }
    // Wait for the previous RI to become empty of the updaterVersion to be advanced by another updater
    while (!readIndicator[(int)(currUV&1)].isEmpty()) {
      if (updaterVersion.load() > nextUV) return;
    }
The basic premise of the two-phase approach with two ReadIndicators is still there, the difference is that our new algorithm doesn't need locks because it can handle multiple updaters changing the updaterVersion variable (they use a CAS instead of a sequentially consistent store).
So, the sequence of events was like this: 
2006 - The first know publishing of a two-phase algorithm for RCU, using a lock on synchronize_rcu();
2011 - We (re)-discovered the two-phase approach with a lock (among other things), which nobody accepted for publication nor referenced the 2006 algorithm;
2013 - We give up on publishing in a conference and put the paper online on sourceforge and github;
2014 - We submit (again) the Left-Right paper to PPoPP 2014  and it gets rejected;
2015 - The Predicate RCU paper gets accepted at PPoPP 2015, among others, it uses the same algorithm two-phase algorithm in D-PRCU;
2017 - We make an even better algorithm that doesn't need a lock and allows updaters to scale efficiently but nobody accepts for publication either... on top of that, the reviewers complain that we don't cite the Predicate RCU which is the state of the art paper and which in fact is the same as our algorithm in Left-Right and the same as the 2006 entry on lwn.net.
If that's not funny then I don't know what is  :)
... ohhhh wait, I forgot about the cherry on top of the cake:
One of the reviewers said that in our GraceVersion and TwoPhase algorithms the updaters calling synchronize_rcu() don't block each other, and of course, without blocking each other they can have large performance benefits, but he asks, without blocking on synchronize_rcu(), how does the application synchronize concurrent updates to the data structure?
ahahhahahah that one is just beautiful, I'm crying from laughing so hard.
Ohhh well, you know the saying: Today's tragedies are tomorrows comedies!
Edit: It was pointed out to me (thanks Adam!) that there is an even older reference to this kind of two-phase approach. It was first described in 2006, so it predates our original idea with a lock though not the new algorithm TwoPhase without locks in synchronize_rcu(). You can check it out here https://lwn.net/Articles/202847/ 
 
> and there are many extensions to the RCU API (we have several which we never published).
ReplyDelete> It's sad that we submitted the Left-Right paper to multiple conferences, including PPoPP, and it never got accepted (we stopped trying at some point and just posted it online on sourceforge and github) and that their paper got accepted.
IMHO, you should start publishing papers on arXiv instead of github, then, hopefully, more people will pay attention to your papers.