I had known that thread-local variables can be accessed pretty fast, via a dedicated segment register, but it was not clear to me how one could make this work for dynamically loaded PIC code, like most .so files.
Turns out you can't. You only get fast access via the dedicated register if you are using a variable declared in the main program. The .so files have to call a special function which does multiple memory lookups to get the actual location, probably severely reducing performance.
(And this is another case where a seemingly simple operation, getting a variable's value, gets internally translated into dozens of operations and a function call.)
That's one particular implementation for one language runtime on one common OS. It's a requirement of neither the C language nor the ELF executable file format.
I have done plenty with threads but never used thread-local storage except when forced to by some other library using it. To me it seems like a bolted-on monstrosity that provides thread safety to thread-naive code after the fact. Am I missing something? Is this a good solution to some problem I haven't encountered? Are there situations where the performance of TLS is better than some other solution?
I've used it to store the state of a global RNG. Thread local state is faster than locking a shared RNG and the observable behaviour is the same.
For high quality code you often pass around the RNG instance explicitly (improves testability), but often I just want a random number without bothering with explicit state management.
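To make the RNG case concrete, here is a minimal sketch of what per-thread generator state could look like in Rust with the standard `thread_local!` macro. The names `RNG_STATE`, `next_random`, and `draw_on_new_thread` are made up for illustration, the generator is a plain xorshift64 chosen only because it fits in a few lines, and every thread starts from the same fixed seed, which a real program would presumably replace with a per-thread seed.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread generator state. Each thread gets its own copy,
    // lazily initialised to this fixed (assumed) seed on first use.
    static RNG_STATE: Cell<u64> = Cell::new(0x2545_F491_4F6C_DD1D);
}

/// Returns the next pseudo-random number for the calling thread (xorshift64).
/// No lock is taken: no other thread can reach this thread's RNG_STATE.
fn next_random() -> u64 {
    RNG_STATE.with(|state| {
        let mut x = state.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        state.set(x);
        x
    })
}

/// Draws `n` numbers on a freshly spawned thread and returns them.
fn draw_on_new_thread(n: usize) -> Vec<u64> {
    thread::spawn(move || (0..n).map(|_| next_random()).collect::<Vec<u64>>())
        .join()
        .unwrap()
}
```

Callers just write `next_random()` anywhere, with no generator handle threaded through the call chain, which is the convenience described above. Because the seed is fixed in this sketch, two fresh threads produce identical sequences.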
I read the section in that book about storing each thread's count in a TLS variable, and I think the root of the confusion is related to how C vs. Rust (the language I use and learned how to write multithreaded code in) deal with threads.
If I wanted to store a per-thread count in Rust, it would make zero sense to use TLS for that; it would just live on the stack, inside the closure the thread runs:
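    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    fn main() {
        let n_threads = 4; // placeholder thread count
        let mut threads = Vec::new();
        let global_count = Arc::new(AtomicUsize::new(0));
        for _ in 0..n_threads {
            threads.push(std::thread::spawn({
                let global_count = global_count.clone();
                move || {
                    // The per-thread count is an ordinary local on this thread's stack.
                    let mut thread_count = 0;
                    for _ in 0..1_000 { // stand-in for the real work loop
                        thread_count += 1;
                    }
                    // Publish this thread's total to the shared counter once.
                    global_count.fetch_add(thread_count, Ordering::Release);
                }
            }));
        }
        for handle in threads { handle.join().unwrap(); }
        println!("global count = {}", global_count.load(Ordering::Acquire));
    }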
That is a form of "thread-local storage", I guess, but it does not involve any of the TLS primitives.
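For contrast, here is a rough sketch of the same count kept in an actual TLS variable via `thread_local!`. It assumes the increment happens in a function that cannot be handed a counter argument, which is about the only situation where this shape seems to pay off. `THREAD_COUNT`, `record_event`, and `count_in_threads` are hypothetical names.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

thread_local! {
    // One counter per thread (hypothetical name for this sketch).
    static THREAD_COUNT: Cell<usize> = Cell::new(0);
}

/// Stand-in for code deep in a call stack that has no counter parameter.
fn record_event() {
    THREAD_COUNT.with(|c| c.set(c.get() + 1));
}

/// Spawns `n_threads` threads, each recording `events_per_thread` events,
/// and returns the combined total.
fn count_in_threads(n_threads: usize, events_per_thread: usize) -> usize {
    let global_count = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let global_count = Arc::clone(&global_count);
            thread::spawn(move || {
                for _ in 0..events_per_thread {
                    record_event();
                }
                // Fold this thread's private count into the shared total once.
                let local = THREAD_COUNT.with(|c| c.get());
                global_count.fetch_add(local, Ordering::Release);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    global_count.load(Ordering::Acquire)
}
```

Called from a `main` function, `count_in_threads(4, 1_000)` returns 4000, the same total the stack-based version reaches with four threads of 1,000 iterations each. The difference is only in where the per-thread count lives and who can reach it.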