
I think it’s because of https://en.wikipedia.org/wiki/Address_space_layout_randomiza...

My interpretation is that it's basically a hack because C programmers can't make their libs reliable/memory-safe, and it throws some logs under the attacker's feet.

There was this software called MagicErmine or Statifier that could convert between the two.


I don't see what ASLR has to do with the inability to convert a shared library to a static one, though. In fact, a shared library must be position independent, so if anything that would make the job easier. For going from static (non-PIC) to shared, W^X is probably why modern dynamic linkers don't support patching the text segment.

Tools like Statifier that convert an executable plus its dylibs into a statically linked executable are sort of an in-between hack that does the equivalent of prelinking. But that's not converting a shared library to a static library, because the result isn't an object file that can be linked.

I guess I never stated the punchline explicitly, but boricj's tool seems to prove that there is in fact no such theoretical limitation in converting from shared -> static library. It's a bit hacky of course, but you can indeed reconstruct relocation tables with decompiler-like analysis to go back from executable -> object file for any function, and this also implies you can go from shared library -> static library.


> I guess I never stated the punchline explicitly, but boricj's tool seems to prove that there is in fact no such theoretical limitation in converting from shared -> static library. It's a bit hacky of course, but you can indeed reconstruct relocation tables with decompiler-like analysis to go back from executable -> object file for any function, and this also implies you can go from shared library -> static library.

As far as the traditional linker goes, sure. This whole shtick relies on the fact that a platform has ABI conventions, so with a naive toolchain any translation unit (and function) compiled in isolation must be interoperable as-is, because the linker is completely oblivious to whatever's going on within an object file section (relocations excluded). Even then, platform-specific considerations might get in the way.

For artifacts that went through inter-procedural and especially link-time optimization, this breaks down. The former will take advantage of the fact that anything goes as long as the visible interface of an object file section is ABI compliant, the latter effectively turns an entire program into one big translation unit. Among other things, functions may exhibit non-standard calling conventions, which wreaks havoc as soon as you start trying to meld delinked bits with freshly built ones, because the optimizer is unlikely to make the same choices in its custom calling conventions.

I have a user who is decompiling a link-time optimized artifact built over 15 years ago. To graft newly compiled code, they binary-patched the original compiler to basically add support for __usercall, coercing the toolchain into making the same optimization decisions. If you're up against an artifact built with a non-traditional toolchain with optimizations on, delinking alone most likely won't cut it and additional magic will be required.


Thank you for chiming in! LTO wouldn't really matter for "delinking" exported symbols of shared libraries though, would it? The exported functions must necessarily follow platform ABI convention, and so long as those are copied over, things would seem to work fine.

I guess the one catch with shared libraries is that if shared library A itself depends on another dylib B, then calls within A to functions of B go via PLT indirection. So creating an object file out of this would involve not just moving code and creating relocation entries, but also possibly patching the assembly itself to remove the call indirection.


If you're delinking the entire LTO-optimized shared library as one big blob, then sure. LTO effectively turns a program/library into one huge translation unit, but if you're not cutting across it then the platform ABI mandates the observable boundary.

AE (algebraic effects) are very interesting! Great article, thank you.

Reading through, I have some concerns about usability in larger projects, mainly because of "jumping around".

> Algebraic effects can also make designing cleaner APIs easier.

This is debatable. It adds a layer of indirection (which I concede is present in many real non-AE codebases).

My main concern is: When I put a breakpoint in code, how do I figure out where the object I work with was created? With explicit passing, I can go up and down the stack trace and find it. But with AE composition, it can be hard to find the instantiation source -- you have to jump around, leading to the yo-yo problem [1].

I don't have personal experience with AE, but I do with Python generators, which the article says are equivalent (or rather, that AE can be used to implement generators). Working through large, complex generator expressions was very tedious and error-prone in my experience.

> And we can use this to help clean up code that uses one or more context objects.

The functions involved still need to write `can Use Strings` in their signature. From a practical point of view, I fail to see the difference between explicitly passing strings and adding the `can Use Strings` signature -- when you want to add extra context to existing functions, you still need to go through all of them and add the appropriate plumbing.

---

As I understand it, AE at a low level is implemented as a longjmp instruction with register handling (so you can resume). Given this, it is likely inevitable that in a code base with lots of AE, composing in various ways, you can run into a severe yo-yo problem and get really lost in what the code is doing. This is probably not so severe on a single-person project, but in larger teams where you don't have the codebase in your head, it can be a huge efficiency problem.

Btw. if someone understands how AE deals with memory allocations for resuming, I'd be very interested in a good link for reading, thank you!

[1]: https://en.wikipedia.org/wiki/Yo-yo_problem


> As I understand it, AE at a low level is implemented as a longjmp instruction with register handling (so you can resume).

Not quite. setjmp/longjmp as they exist in C, at least, can jump up the call stack but not back down. I mention this at the end of the article, but each language implements algebraic effects differently, and efficiency has improved in recent years. Languages can also optimize the effect differently based on how the handler is defined:

- Handlers which are tail-resumptive can implement the effect as a normal closure call.

- Handlers which don't call resume can be implemented as an exception or just return an error value at every step until the function exits the handler.

- Handlers which perform work after resume is called (e.g. `| my_effect x -> foo (); resume (); bar ()`) can be implemented with e.g. segmented call stacks.

- Handlers where resume is called multiple times need an equivalent of a delimited continuation.

Another way to implement these generally is to transform the effects into monads. For any set of effects you can translate it into a monad transformer where each effect is its own monad, or the free monad can be used as well. The cost in this approach is often from boxing closures passed to the bind function.
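
To make the free-monad route concrete, here is a minimal Haskell sketch (all names here -- `Free`, `Ask`, `runConst` -- are invented for illustration, not taken from the article or any particular library). The continuation stored in the `Ask` constructor is exactly the kind of boxed closure that makes this approach costly:

  {-# LANGUAGE DeriveFunctor #-}
  import Control.Monad (ap, liftM)

  -- A free monad over an effect signature f.
  data Free f a = Pure a | Op (f (Free f a))

  instance Functor f => Functor (Free f)     where fmap = liftM
  instance Functor f => Applicative (Free f) where pure = Pure; (<*>) = ap
  instance Functor f => Monad (Free f) where
    Pure a >>= k = k a
    Op m   >>= k = Op (fmap (>>= k) m)   -- the boxed continuations live here

  -- One effect: ask for a named string (roughly the article's `Use Strings`).
  data Ask next = Ask String (String -> next) deriving Functor

  ask :: String -> Free Ask String
  ask name = Op (Ask name Pure)

  -- A tail-resumptive handler: it calls the continuation exactly once,
  -- so it behaves like an ordinary function call.
  runConst :: [(String, String)] -> Free Ask a -> a
  runConst _   (Pure a)       = a
  runConst env (Op (Ask n k)) = runConst env (k (maybe "" id (lookup n env)))

  main :: IO ()
  main = putStrLn (runConst [("greeting", "hello HN")] (ask "greeting"))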

Koka has its own approach where it translates effects to capability passing then bubbles them up to the handler (returns an error value until it gets to the handler).

With just a few restrictions you can even specialize effects & handlers out of the program completely. This is what Effekt does.

There really are a lot of options here. I have some links in the footnotes at the end of the article to papers on Koka and Effekt that implement the approaches above, if you're interested.


> From a practical point of view, I fail to see the difference between explicitly passing strings and adding the `can Use Strings` signature -- when you want to add extra context to existing functions, you still need to go through all of them and add the appropriate plumbing.

Couldn't the plumbing be reduced significantly through type inference? I.e. the function that invokes an effect will need the signature; the callers higher up the chain won't.

Also, even if you had to adapt all the signatures, at least you wouldn't have to adjust the call sites themselves to pass in the context every single time.


Right, compared to explicitly passing the parameter, with effects:

- You wouldn't have to edit the body of the function to thread through the parameter.

- The `can Use Strings` part can be inferred (and in Ante's case, it is my goal to have the compiler write in inferred types for you, so that top-level definitions can be inferred if desired but still annotated when committed for code review).

- Most notably, the `can Use Strings` can be included in a type alias. You could have an alias `MyEffects = can Use Strings, Throw FooError`, etc. for the effects commonly used in your program. If your state type is used pervasively throughout, this could be a good option. When you have such an alias, it also means you'd just be editing the alias rather than every function individually (see the sketch after this list).
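
Not Ante, but to show the "edit the alias, not every function" idea in a language people may know: here's a rough Haskell/mtl analogue using a constraint alias (`MyEffects`, `FooError`, and `lookupString` are made-up names, and this is a sketch of the pattern, not of Ante's semantics):

  {-# LANGUAGE ConstraintKinds, FlexibleContexts #-}
  import Control.Monad.State  (MonadState, get)
  import Control.Monad.Except (MonadError, throwError)

  newtype FooError = FooError String

  -- Bundle the "effects" used throughout the program behind one alias.
  type MyEffects m = (MonadState [String] m, MonadError FooError m)

  -- Signatures mention only the alias; adding a third effect later means
  -- editing MyEffects, not every function individually.
  lookupString :: MyEffects m => Int -> m String
  lookupString i = do
    strings <- get
    if i < length strings
      then pure (strings !! i)
      else throwError (FooError "index out of range")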

Generally though, while I think the passing around of state through effects can be useful, it isn't the most convincing use of effects. I mention it more as "here's another benefit they can have" rather than "here's an amazing reason you should definitely use them".


> Generally though, while I think the passing around of state through effects can be useful, it isn't the most convincing use of effects.

I actually find it quite convincing, even if it might not be the main use case. Nowadays in imperative languages there are countless popular frameworks that resort to using & manipulating global state behind the scenes because drilling context through the call stack by hand is painful. Unfortunately, this makes understanding code much more difficult and testing it even more so.


This is also called opponent modeling. It's used in e.g. poker, so you can maximise your earnings.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=oppo...


Did you publish these by any chance?


Sorry, the code is closed source.


I understand the point, however I think explicit types are still superior, due to the abundance of data in the training phase. It seems to me too computationally hard to incorporate a REPL-like interactive interface in the GPU training loop. Since it's processing large amounts of data, you want to keep it simple, without back-and-forth with CPUs that would kill performance. And if you can't do it at training time, it's hard to expect the LLM to do well at inference time.

Well, if you could run Clojure purely on the GPU/inside the neural net, that might be interesting!


Why would it be more expensive to include a REPL-like experience, compared to running the whole Rust compiler, in the GPU training loop?

Not that I argued you should do that (I don't think either makes much sense; the point was about inference time, not training), but if you apply that to one side of the argument (for Clojure, a REPL), don't you think you should also apply it to the other side (for Rust, a compiler) for a fair comparison?


I agree. I am under the impression that, unlike Rust, Clojure doesn't require explicit types. (I don't know Clojure.)

So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.


> So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.

Right, my point is that instead of the LLM relying on static types and text, with Clojure the LLM could actually inspect the live application. So instead of trying to "understand" that variable A contains 123, it'll do "<execute>(println A)</execute>" or whatever, and then see the results for itself.

Haven't thought deeply about it, but my intuition tells me the more (accurate and fresh) relevant data you can give the LLM for solving problems, the better. So having the actual live data available is better than trying to figure out what the data would be based on static types and manually following the flow.


If you want to build an LLM specific to Clojure, it could probably be engineered: add the types as traces for training via a synthetic dataset, and provide them from the REPL at inference time. Sounds like an awfully large amount of work for a non-mainstream language.


Hi, I made a new crate since I wasn't happy with the existing ones.

I published the first version: https://github.com/michalsustr/rust-automata

Happy to get feedback.


Hi, I just made a library which does that!

I published the first version: https://github.com/michalsustr/rust-automata

Happy to get feedback.


Looks really neat, might use that in my work!


Well, this is timely :) I'm in the middle of writing a library, based on the rust-fsm crate, that adds nice support for Mealy automata, with extensions like:

- transition handlers

- guards

- clocks

- composition of automata into a system.

The idea is to allow writing tooling that will export the automata into UPPAAL and allow for model checking. This way you don't need to make too much additional effort to ensure your model and implementation match/are up to date; you can run the checker during CI tests to ensure you don't have code that deadlocks, that some states are always reachable, etc.
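
For readers unfamiliar with the terms, here's a rough, language-agnostic sketch of what a Mealy machine with guarded transitions boils down to (written in Haskell purely for brevity; the actual Rust API will of course look different, and every name here is invented):

  -- A Mealy machine maps (state, input event) to (output, next state).
  newtype Mealy s e o = Mealy { step :: s -> e -> (o, s) }

  -- A guarded transition: the guard decides whether it fires,
  -- the handler produces the output and the next state.
  data Transition s e o = Transition
    { guard   :: s -> e -> Bool
    , handler :: s -> e -> (o, s)
    }

  -- Build a machine from a transition table; if no guard matches,
  -- stay in the current state and emit a default output.
  fromTable :: o -> [Transition s e o] -> Mealy s e o
  fromTable def table = Mealy $ \s e ->
    case [t | t <- table, guard t s e] of
      (t : _) -> handler t s e
      []      -> (def, s)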

I plan to post a link here to HN once finished.


Please do!


I published the first version: https://github.com/michalsustr/rust-automata

Happy to get feedback.


I'm not familiar with Haskell concurrency. The combination of green threads and large memory allocations due to immutable data structures sounds like it would make it hard to implement a web server handling 10k+ concurrent requests on commodity hardware?

Btw, too bad the author talks about usage with microsecond guarantees but does not provide a link; that would be interesting reading.


> sounds like it would make it hard to implement a web server handling 10k+ concurrent requests on commodity hardware?

In practice, it is not. The canonical Haskell compiler, GHC, is excellent at transforming operations on immutable data, as Haskell programs are written, into efficient mutations, at the runtime level. Also, since web development is quite popular in the Haskell community, lots of people have spent many hours optimizing this precise use-case.

In my experience, the real downside is that compilation times are a bit long -- the compiler is doing a LOT of work after all.


> The canonical Haskell compiler, GHC, is excellent at transforming operations on immutable data, as Haskell programs are written, into efficient mutations, at the runtime level.

Yes, at the level of native machine code and memory cells, there's not that much of a difference between immutability + garbage collection, and higher level source code that mutates. Thanks to GC you are going to overwrite the same memory locations over and over again, too.


Programmers for some reason really don't understand that generational garbage collection provides locality. I am really surprised how often I see C/C++/Rust types not understanding this.


I think that only applies to a moving GC. A conservative GC (like the Boehm GC for C) doesn't move any items around, and thus doesn't do anything for locality.

Of course, even a moving GC has limits; it won't turn a hashtable into something that has local accesses.


> Warp is a high-performance HTTP server library written in Haskell, a purely functional programming language. Both Yesod, a web application framework, and mighty, an HTTP server, are implemented over Warp. According to our throughput benchmark, mighty provides performance on a par with nginx.

Source: https://aosabook.org/en/posa/warp.html
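
For a sense of scale, a minimal Warp server is only a few lines (this uses the real WAI/Warp API, though the handler is obviously a toy); Warp serves each connection on its own lightweight GHC thread:

  {-# LANGUAGE OverloadedStrings #-}
  import Network.Wai (Application, responseLBS)
  import Network.Wai.Handler.Warp (run)
  import Network.HTTP.Types (status200)

  -- Every incoming request is handed to this function on a green thread.
  app :: Application
  app _request respond =
    respond (responseLBS status200 [("Content-Type", "text/plain")] "Hello, HN!")

  main :: IO ()
  main = run 8080 app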


The interaction of laziness and purity means that the memory costs are not always what you think. Purity means that it's a lot safer to share structure between old and new versions of a data structure where an imperative language would have to do defensive copying, and laziness means that you can incrementally amortise the cost of expensive rebalancing operations (Okasaki is the standard reference for this).
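
A small illustration of that sharing with containers' Data.Map (standard API, toy data): an insert returns a new version that shares nearly all of its nodes with the old one, and both versions remain usable without any defensive copy:

  import qualified Data.Map.Strict as Map

  main :: IO ()
  main = do
    let v1 = Map.fromList [(1, "a"), (2, "b"), (3, "c")]
        -- Only the O(log n) path from the root to the new leaf is rebuilt;
        -- the rest of v2 is literally the same nodes as v1.
        v2 = Map.insert 4 "d" v1
    print (Map.toList v1)   -- old version still intact
    print (Map.toList v2)   -- new version shares structure with it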


> [...] large memory allocations due to immutable data structures sounds [...]

Why would there be large memory allocations because of immutable data structures? Btw, you can also use immutable data structures in e.g. Rust fairly easily. And Haskell also supports mutation and mutable data structures.

However, Haskell can use a lot of memory, but that's more to do with pervasive 'boxing' by default, and perhaps laziness.


No reason. OC probably thinks that immutable data structures are always copied when being operated on.


Yes indeed, unless you use ropes or other specialised structures


Lists aren’t copied on prepend.

Tries (like Scala's Vector) or trie maps (the core map types of Scala, Clojure, and probably Haskell?) aren't copied on updates.

In fact, whether a data structure is an immutable or persistent data structure or merely an unmodifiable data structure (like Kotlin uses) is based on whether it requires full copies on most updates or not. In FP languages, immutable data structures aren’t “specialized” at all.


> whether a data structure is an immutable or persistent data structure or merely an unmodifiable data structure...

This hurt my brain. It seems that in some places (e.g. Java land) unmodifiable refers to something that you can't modify but could just be a wrapper around a structure that can be modified. In that case they use immutable to mean something that is nowhere modifiable.

I may be misrepresenting this idea, but I think the terminology is so poor that it deserves to be misunderstood.


Think about mutability in Java land this way:

  // Using mutability.
  // `increment` is void, and makes 2 bigger for everyone.
  increment(2); 

  // Typical Java "safety".
  // It's still void, but now it throws a RuntimeException
  // because the developers are saving you from making everyone's 2 bigger.
  increment(2);

  // Immutable
  // Returns 3
  increment(2);


Doesn't it depend on the data structure? E.g. prepending to a list is actually cheaper with immutable data structures: you keep the original list and add a new head pointing to its head. Now you have two lists available in your program, but only one stored in memory. Yay!
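
In code, that prepend really is a single new cons cell (throwaway Haskell example):

  xs :: [Int]
  xs = [1, 2, 3]

  ys :: [Int]
  ys = 0 : xs   -- allocates one cons cell; the tail of ys *is* xs, not a copy

  main :: IO ()
  main = print (xs, ys)   -- both lists stay usable; the spine is shared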


Well luckily, Haskell is full of said "specialized structures."

containers and unordered-containers handle most of your needs, and they only copy their trees' spines (O(log n)) on update.


Haskell also supports e.g. arrays with O(1) in-place updates just fine.


Not really. You might want to look into "Purely Functional Data Structures" by Chris Okasaki.


It doesn't actually have "large memory allocations" due to immutable data structures. This is a meme that isn't true. Immutable data structures, especially at small scale, do not have huge performance penalties. You don't copy the entire structure over and over...you copy the O(log n) spine.

Haskell's GC is also fast when you are mostly generating garbage, which is inherently true for web server handlers.


Deforestation helps with that

A composition of catamorphic and anamorphic functions can eliminate a lot of the in-between allocations (a hylomorphism)

Basically it looks like you’re building a ton of intermediate structure then consuming it - meaning much of the in-between stuff can be eliminated.

Interesting optimizations and a little mind blowing when you see it.
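
The classic GHC example (this relies on the standard foldr/build fusion rules, so "usually" rather than "guaranteed"): the code below reads as build a list, map over it, sum it, but after fusion the intermediate lists are typically never allocated and it compiles to a tight loop:

  -- An anamorphism ([1 .. n]) composed with a catamorphism (sum),
  -- with a map in between: a textbook candidate for deforestation.
  sumSquares :: Int -> Int
  sumSquares n = sum (map (^ 2) [1 .. n])

  main :: IO ()
  main = print (sumSquares 1000000)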


You obviously haven't run anything on the BEAM (Erlang's VM).


Correct. Erlang also uses green threads?


Yes. And immutable data structures.

When data is immutable, it can be freely shared. Changes to the data essentially use copy-on-write. And it only writes the delta change, since you don't need a deep copy due to immutability. Add that the garbage collectors of Haskell and Erlang are designed to work with a high allocation rate and have zero cost for dead data, and this is much faster than people think.

The way you implement a webserver in either Haskell or Erlang is rather trivial. Whenever there's an incoming request, you make a thread to handle it. So you don't have 1 webserver serving 10k requests. You have 10k webservers serving 1 request each. And since they are started from the same core data, they'll share that due to immutability. See also old-style Apache or PHP and fork().
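
To make the "10k webservers serving 1 request each" model concrete, here's a small GHC sketch (no real sockets, just the threading pattern): forking ten thousand green threads is cheap because each starts with a tiny, growable stack:

  import Control.Concurrent (forkIO)
  import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
  import Control.Monad (forM_, replicateM_)

  main :: IO ()
  main = do
    done <- newEmptyMVar
    -- One lightweight thread per "request".
    forM_ [1 .. 10000 :: Int] $ \i ->
      forkIO (putMVar done i)
    -- Wait until every "request" has been handled.
    replicateM_ 10000 (takeMVar done)
    putStrLn "handled 10000 requests"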


Web servers handling lots of small requests are actually pretty easy to garbage collect: you just delete all the data at the end of the request.

Either you have a specialised GC that works like this, or probably a good general generational GC can pick up on this pattern on its own.


Or you do as Erlang's BEAM VM does: each thread has its own memory area which is GC'ed individually. This means upon request termination, you just terminate the thread and the memory is reclaimed with no need for a GC run.


In the abstract, this is very similar to spawning a unix process for every request, never free-ing any memory, and letting the memory allocation die with the process.


nah bro, warp is quite performant. think there were some consultancies that wrote haskal web app for their clients.

