Just wanted to say thank you! This extension was critical to a bunch of my research (and now my lab's research as well). Being able to control fine-grained elements of each plan while letting the PG planner "do the rest" has saved me personally probably 100s of hours of work.
My reading is that no deadlocks should be possible since there is only one lock (pages are "locked" optimistically, meaning that the tx is aborted if the page has changed). "Live locks" are possible, where two repeatedly-reissued transactions cause each other to abort forever.
Above, scottlamb came to the conclusion that live lock isn't possible after all, because one transaction will always have made progress. Intuitively, that makes sense, but intuition alone is always dangerous with concurrency. Which is it?
After reading the post a bit more carefully, I think livelock is impossible since transactions are only aborted if one or more read pages have been modified since they were read. That implies forward progress must have been made by another transaction in order for a transaction to be aborted.
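To make that argument concrete, here is a minimal single-threaded sketch of optimistic page validation. It is only my reading of the scheme, not the actual implementation: `PageStore`, `Transaction`, and the per-page version counters are all hypothetical names. The point it illustrates is that a commit can only fail if some other transaction's commit bumped a page version first.

```python
class PageStore:
    """Toy page store with one version counter per page (hypothetical)."""

    def __init__(self, num_pages):
        self.data = [0] * num_pages
        self.version = [0] * num_pages
        self.commits = 0  # number of transactions that committed

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # page -> version seen at first read
        self.writes = {}         # page -> buffered new value

    def read(self, page):
        if page in self.writes:
            return self.writes[page]
        # Remember the version we saw the first time we read this page.
        self.read_versions.setdefault(page, self.store.version[page])
        return self.store.data[page]

    def write(self, page, value):
        self.writes[page] = value  # buffered until commit

    def commit(self):
        # Validate: abort if any page we read has changed since we read it.
        for page, seen in self.read_versions.items():
            if self.store.version[page] != seen:
                return False
        # Install writes; bumping versions is what aborts conflicting readers.
        for page, value in self.writes.items():
            self.store.data[page] = value
            self.store.version[page] += 1
        self.store.commits += 1
        return True


store = PageStore(1)
t1, t2 = store.begin(), store.begin()
t1.write(0, t1.read(0) + 1)
t2.write(0, t2.read(0) + 1)
first = t1.commit()    # validates cleanly
second = t2.commit()   # aborts, but only because t1 already committed

retry = store.begin()  # reissue the aborted work
retry.write(0, retry.read(0) + 1)
third = retry.commit()
```

In this toy model the only thing that changes a page version is a successful commit, so every abort is paired with someone else's progress. Whether the real system has other abort causes (timeouts, for example) is something the sketch does not cover.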
However, unless they do some very careful accounting, this sounds equivalent to snapshot isolation, which has anomalies not found with serializable isolation, though the chance of hitting one is reduced by using page-level locking.
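For anyone unfamiliar with the kind of anomaly meant here, the textbook example is write skew. The sketch below is a generic toy model of snapshot isolation with a first-committer-wins check on written keys only; `SnapshotDB`, `SnapshotTx`, and the on-call scenario are illustrative assumptions, not a description of the system in the post.

```python
class SnapshotDB:
    """Toy key-value store with per-key versions (hypothetical)."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.version = {key: 0 for key in initial}

    def begin(self):
        return SnapshotTx(self)


class SnapshotTx:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)            # private view taken at start
        self.start_version = dict(db.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: only keys we WROTE are checked for conflicts.
        for key in self.writes:
            if self.db.version[key] != self.start_version[key]:
                return False
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1
        return True


def go_off_call(tx, me, other):
    # Intended invariant: at least one person stays on call.
    if tx.read(me) + tx.read(other) >= 2:
        tx.write(me, 0)


db = SnapshotDB({"alice": 1, "bob": 1})
t1, t2 = db.begin(), db.begin()
go_off_call(t1, "alice", "bob")
go_off_call(t2, "bob", "alice")
both_committed = t1.commit() and t2.commit()
on_call = db.data["alice"] + db.data["bob"]
```

Both transactions commit because their write sets are disjoint, yet nobody is left on call, which no serial order of the two could produce. Coarser conflict granularity (whole pages rather than keys) makes it more likely that such a pair collides and one aborts, which is the sense in which page-level tracking reduces the odds.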
Thanks for the first-hand information! I was also curious about this.
> This is not a course everyone enrolls in.
This is the ticket -- at both the University of Arizona and MIT, I've seen a group of folks graduate with a CS degree after taking OS, compilers, databases, and abstract algebra. Another group of folks graduated after taking HCI, software engineering, design, and psychology courses. The two groups had some baseline skills in common (all knew the basic data structures and algorithms), but otherwise appeared quite distinct.
I don't know how to phrase this formally, but I think some statement like the following is true: within-university variance is higher than between-university variance.
(When I was younger, I had strong opinions on which one of these groups was the "real" computer scientists. This was a very unfortunate way of thinking that prevented me from talking to folks who I later realized were some of the smartest around. I wish someone had corrected me sooner -- solving a problem with inputs/outputs well-defined enough to apply "rigorous" techniques doesn't make those problems inherently valuable or "harder" than others. God gave all the easy problems to the physicists.)
From this side of the Atlantic, that always seems quite strange to me. Here, the only way to combine such a disparate set of skills is to take two degrees: the software engineering degree of the first group, plus the HCI and psychology material in a second degree in the social sciences.
My experience is entirely different -- writing HPC code for supercomputers at Los Alamos National Lab (on and off for 5 years) made me a true Rust believer.
One of the things I spent the most time on with Fortran / C++ codes was debugging wrong-result bugs. About 90% of the time, the wrong result came from some edge-case where an array was wrongly freed too early, an array was accessed out of scope, or a race condition caused an array member to be updated in a non-deterministic manner. Each of these bugs required hours of debugging and was a huge time sink. Once I started working with Rust, I never encountered any of these bugs. After about a year of fighting the borrow-checker, I feel my overall efficiency has greatly improved.
Now, when I go back and write or read C++ code, patterns that the Rust compiler would yell about jump out at me (multiple unprotected mutable references, cloning unique pointers), and I find these are generally a source of the bug I'm hunting. Like sibling comments point out, a lot (but not all) of the things Rust stops you from doing are just bad practice anyway.
Of course, for GPGPU stuff I have to write CUDA or OpenCL, but those are generally small, compact kernels that are easy to reason about end to end.
I'm not suggesting that you are doing this, but for me, I initially resisted Rust for a long time. Rust seemed extremely complex, and whenever I'd try to use it I would run into a wall. The loud Rust community talking about how great Rust was and how easy it was to use once you "got it" made me feel stupid. Instead of being humble, I became arrogant, and I'd say things like "Rust is too restrictive for the high performance applications I care about" or "I write code that Rust would find unsafe but is actually super well-tuned for this architecture." For me, these were mental excuses I made because I was unable to accept that I was having such a hard time with Rust, and I considered myself a "high performance computing software engineer!"
It took me way longer than most to "get" Rust -- over a year of repeatedly forcing myself to learn and stumble through compiler errors before things started to click. A year after that, and I'm still frequently surprised by certain aspects of the language ("really? I need a & in that match statement?" and "oh god, what does this lifetime and trait bound mean..." are two of the most common). But the parts of Rust that have clicked for me (the borrow checker and associated lifetime mechanics) make Rust very enjoyable to write.
Again, I'm not suggesting that you are falling into the same trap I did, I just wanted to post this to encourage anyone else in the "banging their head against the Rust compiler" stage to power through!
Thank you for sharing your experience! I'd be very curious to hear more about your experience in doing GPGPU work in Rust - it was my understanding that there was virtually no tooling, libraries or support for that kind of thing beyond the existence of the C FFI.
On the broader point, I suspect a large part of the reason we've had such contrasting experiences is just a radically different mindset behind the C++ codebases we've dealt with. Wrongly freed or out of scope arrays scream of exactly the kind of C++ code Rust was designed to address, and as far as I can tell it is indeed great at doing that. On the opposite extreme, when you have statically determined sizes and bounds, all allocations happen at startup and nothing ever gets freed, that entire class of issues simply doesn't arise in the first place. The reason why the overwhelming majority of the bugs I debug are either silly typos or plain logic errors isn't because I'm particularly good at this, it's just a different approach to programming that's easy to pull off in simulation code (or embedded systems, or game engines), but probably rather more difficult in other kinds of applications.
Anyway, I'm glad you're enjoying Rust and I hope it'll have more of a scientific / numerics / GPGPU ecosystem in the future. More viable languages can only be a good thing for us computational scientists.
This would be considered absolutely batshit at all three of the R1 research universities I've been around, which spans a significant range of prestige.
I think it goes to show how much variance there is in PhD programs. I frequently advise undergrads to find the right lab (i.e., a lab that doesn't have such a competitive environment) instead of picking a school based on some other criterion like prestige, but this is far easier in hindsight. I got super lucky -- a small lab with a good advisor.
Maybe (in addition to a strong union) we need a "Yelp for labs" where advisors can be penalized (or recognized) for their behavior. If it were publicly available and student testimonials could somehow be both verified and anonymous (potentially impossible), I bet administrators would put at least some pressure on problematic PIs...
Definitely agree with the assessment of high variance in programs and even individual groups.
I think ‘yelp for academic groups’ is an interesting idea, but I really doubt the administration would step in unless it got really bad. But giving prospective students better information could disrupt the flow of good students to toxic labs, which might actually create some incentive for change.
I’m not sure I would actually post to such a service though, even though I have a pretty good relationship with my Ph.D. advisor. Too many bridges to potentially burn in such a small community.
For any prospective students who might read this, in the meantime you might try briefly emailing a few current and former group members asking for their perspective.
There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas TSQL functions can.
But still, some techniques might be applicable. Definitely worth digging further!