
RMarcus · on March 16, 2024

Just wanted to say thank you! This extension was critical to a bunch of my research (and now my lab's research as well). Being able to control fine-grained elements of each plan while letting the PG planner "do the rest" has saved me personally probably 100s of hours of work.

ioltas · on March 17, 2024

Thanks, glad to hear some good feedback.

RMarcus · on July 27, 2023

My reading is that no deadlocks should be possible since there is only one lock (pages are "locked" optimistically, meaning that the tx is aborted if the page has changed). "Live locks" are possible, where two repeatedly-reissued transactions cause each other to abort forever.

anyfoo · on July 27, 2023

Above, scottlamb came to the conclusion that live lock isn't possible after all, because one transaction will always have made progress. Intuitively, that makes sense, but intuition alone is always dangerous with concurrency. Which is it?

slaymaker1907 · on July 28, 2023

After reading the post a bit more carefully, I think livelock is impossible since transactions are only aborted if one or more read pages have been modified since they were read. That implies forward progress must have been made by another transaction in order for a transaction to be aborted.

However, unless they do some very careful accounting, this sounds equivalent to snapshot isolation which has anomalies not found with serializable isolation, though the chance of this happening is reduced by using page level locking.
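The abort rule described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the actual implementation being discussed: `PageStore`, `Transaction`, and the per-page version counters are hypothetical names, and everything runs single-threaded.

```python
class PageStore:
    """Toy page store with one version counter per page (hypothetical)."""

    def __init__(self, num_pages):
        self.data = [0] * num_pages
        self.version = [0] * num_pages


class Transaction:
    """Optimistic transaction: buffers writes, validates reads at commit."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # page -> version seen at first read
        self.writes = {}         # page -> pending value

    def read(self, page):
        if page in self.writes:
            return self.writes[page]
        self.read_versions.setdefault(page, self.store.version[page])
        return self.store.data[page]

    def write(self, page, value):
        self.writes[page] = value

    def commit(self):
        # Abort only if a page we read was changed by a committed transaction.
        for page, seen in self.read_versions.items():
            if self.store.version[page] != seen:
                return False
        for page, value in self.writes.items():
            self.store.data[page] = value
            self.store.version[page] += 1
        return True


store = PageStore(4)
t1, t2 = Transaction(store), Transaction(store)
t1.write(0, t1.read(0) + 1)
t2.write(0, t2.read(0) + 1)
first = t1.commit()    # succeeds: nothing it read has changed
second = t2.commit()   # aborts: page 0 was changed by t1's commit

retry = Transaction(store)
retry.write(0, retry.read(0) + 1)
retried = retry.commit()
```

In this model an abort can only happen because some other transaction already committed, which is the forward-progress argument made above.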

jgraettinger1 · on July 28, 2023

Right, unless they’re marking non-leaf pages as “read” — which would imply marking the root — it would have to be snapshot, right?

Otherwise you can’t properly abort the txn because a “select x where y” that was previously empty, no longer is.
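A small sketch of the distinction this subthread is drawing, using the classic write-skew scenario. It is a toy model with hypothetical names (`run_write_skew`, the `alice`/`bob` pages), not a description of how the system in the post works: with `validate_reads=False` only written pages are checked at commit (snapshot-style), and with `validate_reads=True` read pages are checked too.

```python
def run_write_skew(validate_reads):
    """Two transactions each go off call if the other is still on call.

    The intended invariant is that at least one person stays on call.
    """
    pages = {"alice": True, "bob": True}  # True = on call
    version = {"alice": 0, "bob": 0}

    # Both transactions take their snapshot before either one commits.
    snap_versions = [dict(version), dict(version)]
    snap_data = [dict(pages), dict(pages)]
    plans = [("bob", "alice"), ("alice", "bob")]  # (page read, page written)

    committed = []
    for i, (read_page, write_page) in enumerate(plans):
        if not snap_data[i][read_page]:
            committed.append(False)  # other person already off call
            continue
        checked = [write_page] + ([read_page] if validate_reads else [])
        if any(version[p] != snap_versions[i][p] for p in checked):
            committed.append(False)  # conflict detected, abort
            continue
        pages[write_page] = False
        version[write_page] += 1
        committed.append(True)
    return committed, pages


si_committed, si_pages = run_write_skew(validate_reads=False)
ser_committed, ser_pages = run_write_skew(validate_reads=True)
```

With write-only validation both transactions commit and nobody is left on call; with read validation the second transaction aborts because the page it read has since changed.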

RMarcus · on July 28, 2023

Ah, I missed that this requires WAL mode -- indeed, if WAL is used, one transaction should always be able to make progress.

RMarcus · on Nov 27, 2022

Thanks for the first-hand information! I was also curious about this.

> This is not a course everyone enrolls in.

This is the ticket -- at both the University of Arizona and MIT, I've seen a group of folks graduate with a CS degree after taking OS, compilers, databases, and abstract algebra. Another group of folks graduated after taking HCI, software engineering, design, and psychology courses. The two groups had some baseline skills (all knew the basic data structures and algorithms), but otherwise appeared quite distinct.

I don't know how to phrase this formally, but I think some statement like the following is true: within-university variance is higher than between-university variance.

(When I was younger, I had strong opinions on which one of these groups was the "real" computer scientists. This was a very unfortunate way of thinking that prevented me from talking to folks who I later realized were some of the smartest around. I wish someone had corrected me sooner -- solving a problem with inputs/outputs well-defined enough to apply "rigorous" techniques doesn't make those problems inherently valuable or "harder" than others. God gave all the easy problems to the physicists.)

pjmlp · on Nov 27, 2022

From this side of the Atlantic that always seems quite strange to me. Here, the only way to mix such a disparate set of skills is to have two degrees: the software engineering degree of the first group, plus the HCI and psychology stuff in a second degree for social sciences.

RMarcus · on Jan 26, 2021

We recently published a comparison of learned indexes (including RadixSpline and the PGM index posted yesterday) in VLDB: https://vldb.org/pvldb/vol14/p1-marcus.pdf

jarym · on Jan 26, 2021

Well written and easy to follow. Thank you for publishing this!

RMarcus · on Jan 25, 2021

We produced a detailed comparison of such "fitting" and "learning" techniques, available here: https://vldb.org/pvldb/vol14/p1-marcus.pdf

(Thomas Neumann, one of the authors of the blog post, is a co-author of the linked paper.)

RMarcus · on Jan 25, 2021

Check out work by Jialin Ding and Vikram Nathan, they both work on multi-dimensional learned index structures.

https://arxiv.org/pdf/2006.13282.pdf

RMarcus · on April 11, 2020

https://rmarcus.info

I mostly post interactive or semi-narrative explanations of technical topics I find interesting. 1-3 posts per year.

RMarcus · on Nov 7, 2019

My experience is entirely different -- writing HPC code for supercomputers at Los Alamos National Lab (on and off for 5 years) made me a true Rust believer.

One of the things I spent the most time on with Fortran / C++ codes was debugging wrong-result bugs. About 90% of the time, the wrong result came from some edge-case where an array was wrongly freed too early, an array was accessed out of scope, or a race condition caused an array member to be updated in a non-deterministic manner. Each of these bugs required hours of debugging and was a huge time sink. Once I started working with Rust, I never encountered any of these bugs. After about a year of fighting the borrow-checker, I feel my overall efficiency has greatly improved.

Now, when I go back and write or read C++ code, patterns that the Rust compiler would yell about jump out at me (multiple unprotected mutable references, cloning unique pointers), and I find these are generally a source of the bug I'm hunting. Like sibling comments point out, a lot (but not all) of the things Rust stops you from doing are just bad practice anyway.

Of course, for GPGPU stuff I have to write CUDA or OpenCL, but those are generally small, compact kernels that are easy to reason about end to end.

I'm not suggesting that you are doing this, but for me, I initially resisted Rust for a long time. Rust seemed extremely complex, and whenever I'd try to use it I would run into a wall. The loud Rust community talking about how great Rust was and how easy it was to use once you "got it" made me feel stupid. Instead of being humble, I became arrogant, and I'd say things like "Rust is too restrictive for the high performance applications I care about" or "I write code that Rust would find unsafe but is actually super well-tuned for this architecture." For me, these were mental excuses I made because I was unable to accept that I was having such a hard time with Rust, and I considered myself a "high performance computing software engineer!"

It took me way longer than most to "get" Rust -- over a year of repeatedly forcing myself to learn and stumble through compiler errors before things started to click. A year after that, and I'm still frequently surprised by certain aspects of the language ("really? I need a & in that match statement?" and "oh god, what does this lifetime and trait bound mean..." are two of the most common). But the parts of Rust that have clicked for me (the borrow checker and associated lifetime mechanics) make Rust very enjoyable to write.

Again, I'm not suggesting that you are falling into the same trap I did, I just wanted to post this to encourage anyone else in the "banging their head against the Rust compiler" stage to power through!

keldaris · on Nov 7, 2019

Thank you for sharing your experience! I'd be very curious to hear more about your experience in doing GPGPU work in Rust - it was my understanding that there was virtually no tooling, libraries or support for that kind of thing beyond the existence of the C FFI.

On the broader point, I suspect a large part of the reason we've had such contrasting experiences is just a radically different mindset behind the C++ codebases we've dealt with. Wrongly freed or out of scope arrays scream of exactly the kind of C++ code Rust was designed to address, and as far as I can tell it is indeed great at doing that. On the opposite extreme, when you have statically determined sizes and bounds, all allocations happen at startup and nothing ever gets freed, that entire class of issues simply doesn't arise in the first place. The reason why the overwhelming majority of the bugs I debug are either silly typos or plain logic errors isn't because I'm particularly good at this, it's just a different approach to programming that's easy to pull off in simulation code (or embedded systems, or game engines), but probably rather more difficult in other kinds of applications.

Anyway, I'm glad you're enjoying Rust and I hope it'll have more of a scientific / numerics / GPGPU ecosystem in the future. More viable languages can only be a good thing for us computational scientists.

RMarcus · on March 7, 2019

This would be considered absolutely batshit at all three of the R1 research universities I've been around, which spans a significant range of prestige.

I think it goes to show how much variance there is in PhD programs. I frequently advise undergrads to find the right lab (i.e., a lab that doesn't have such a competitive environment) instead of picking a school based on some other criteria like prestige, but this is far easier in hindsight. I got super lucky -- a small lab with a good advisor.

Maybe (in addition to a strong union) we need a "Yelp for labs" where advisors can be penalized (or recognized) for their behavior. If it were publicly available and student testimonials could be somehow verified and anonymous (potentially impossible), I bet administrators would put at least some pressure on problematic PIs...

rsfern · on March 7, 2019

Definitely agree with the assessment of high variance in programs and even individual groups.

I think ‘yelp for academic groups’ is an interesting idea, but I really doubt the administration would step in unless it got really bad. But giving prospective students better information could disrupt the flow of good students to toxic labs, which might actually create some incentive for change.

I’m not sure I would actually post to such a service though, even though I have a pretty good relationship with my Ph.D. advisor. Too many bridges to potentially burn in such a small community.

For any prospective students who might read this, in the meantime you might try briefly emailing a few current and former group members asking for their perspective.

trombonechamp · on March 7, 2019

> we need a "Yelp for labs"

It exists: https://www.gradpi.com

There was an article in Science about it last year: https://www.sciencemag.org/careers/2018/02/crowdsourcing-goe...

RMarcus · on Dec 24, 2018

I've only read part of it, but it seems great so far! I always appreciate the clarity and practicality y'all at the JGL take.

I'm amazed that the implementation was under 1500 LOC! Was that the research prototype or the shipped preview?

Congratulations on the VLDB paper! Hopefully I'll come say "hi" in LA :)

karthiksr · on Dec 24, 2018

Thank you.

The shipped preview has only a bit more than 1500LOC.

The VLDB paper was presented at Rio in Aug this year already, but I'll try to come over to LA anyways :)

maslam · on Dec 24, 2018

Karthik, I'm no Spark expert but almost all advice I read is to avoid UDFs if at all possible. Examples below:

- https://medium.com/teads-engineering/spark-performance-tunin...
- https://www.inovex.de/blog/efficient-udafs-with-pyspark/

karthiksr · on Dec 24, 2018

Thank you for those pointers.

There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas TSQL functions can. But still, some techniques might be applicable. Definitely worth digging further!

RMarcus · on Dec 24, 2018

Doh! Guess I should've checked. I didn't make it to Rio last year... Figured I was gonna miss a bunch of good stuff.