
I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.

I did NOT have fun trying to use Kaitai to pack the files back together. Not sure if this has improved at all, but a year or so ago you had to build the dependencies yourself, and the process was so cumbersome that it ended up being easier to just write imperative code to do it myself.
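
For reference, the imperative packing I ended up with looked roughly like this. The layout below is a made-up toy format (count, index records, then raw data), not the game's actual one:

```python
import struct

def pack_files(entries):
    """Pack (name, data) pairs into a toy pack format:
    a file count, then per-file index records (name length, name,
    data offset, data size), then the raw file data concatenated.
    Purely illustrative layout, little-endian throughout."""
    header = struct.pack("<I", len(entries))  # file count
    index = b""
    blob = b""
    # Data section starts after the count and all index records.
    offset = 4 + sum(4 + len(n.encode()) + 8 for n, _ in entries)
    for name, data in entries:
        encoded = name.encode()
        index += struct.pack("<I", len(encoded)) + encoded
        index += struct.pack("<II", offset, len(data))
        blob += data
        offset += len(data)
    return header + index + blob

packed = pack_files([("a.txt", b"hello"), ("b.bin", b"\x00\x01")])
```

Tedious, but at least every byte is where you put it.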


It hasn't improved much: you need to know the final size and fill in all attributes; there are no defaults, at least in Python.


TikTok influencers seem to love them because they can record strangers without their knowledge. So there’s that…


Well, limiting it specifically to OP's example (an M4 Mac), Asahi doesn't support it yet. :(


They mentioned “rebuilding” in EC2. Probably just moved that over to Azure and are no longer bothering with serverless :)


Yep. I've run into this using Bugsnag for reporting on unhandled exceptions in Python-based Lambda functions. The exception handler would get called, but because the library is async by default, the HTTP request wouldn't make it out before the runtime was torn down.

I sympathize with OP because debugging this was painful, but I'm sorry to say this is sort of just a "you're holding it wrong" situation.
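
For what it's worth, the general workaround is to make delivery blocking before the handler returns. A generic sketch of the pattern (not Bugsnag's actual API; the reporting endpoint and helper names are made up):

```python
import json
import urllib.request

def report_sync(payload, url="https://errors.example.invalid/report"):
    """Deliver an error report with a blocking HTTP POST so it finishes
    before the Lambda runtime freezes or tears down the sandbox."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status

def do_work(event):
    # Stand-in for real business logic.
    if event.get("boom"):
        raise RuntimeError("boom")
    return {"ok": True}

def handler(event, context, report=report_sync):
    try:
        return do_work(event)
    except Exception as exc:
        # Report before re-raising: a fire-and-forget background sender
        # may never get to flush once the invocation ends.
        report({"error": repr(exc)})
        raise
```

The key point is that the report call blocks until delivery completes; anything queued on a background thread is at the mercy of the runtime freeze.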


It's not AI, but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector", which, now that I think about it, has some clear parallels to embeddings.


Wow, that is cool. I bet that with that feature and a huge database of known "feature vectors" from open-source libraries, you could focus on the actual business logic of a binary instead of trying to reverse external library functions.


BSim is a hash machine, right? (BSim uses feature vectors and locality-sensitive hashing.)

Embeddings could be derived from the reconstituted hash.
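
The core idea is small enough to sketch. This is a generic random-hyperplane LSH over made-up feature vectors, not BSim's actual scheme: similar vectors land on the same side of most hyperplanes, so their hashes differ in few bits:

```python
import random

def simhash(vector, planes):
    """Locality-sensitive hash: one bit per random hyperplane,
    set by which side of the plane the vector falls on."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

random.seed(0)
planes = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]

f1 = [0.9, 0.1, 0.8, 0.2]      # hypothetical function feature vector
f2 = [0.85, 0.15, 0.75, 0.25]  # near-identical function
f3 = [-0.5, 0.9, -0.7, 0.1]    # unrelated function
```

Near-duplicate functions hash to nearby codes, which is exactly the property you want for matching stripped library code against a corpus.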


If you were using a user agent spoofing extension couldn't this be used to guess your "real" UA?


It looks like it's an SHA hash, so working backwards would probably be prohibitively irritating.


That's not how it works. The combination of valid inputs is a small set. You just try each one until you get the hash.
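
A toy sketch of that dictionary attack, assuming an unsalted SHA-256 over the UA string (a real scheme could add salt values, which you'd also have to know; the candidate strings here are abbreviated, hypothetical UAs):

```python
import hashlib

# Hypothetical list of popular user-agent strings (abbreviated).
CANDIDATES = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Firefox/128.0",
]

def recover_ua(target_hash, candidates=CANDIDATES):
    """Hash each candidate and compare against the observed hash."""
    for ua in candidates:
        if hashlib.sha256(ua.encode()).hexdigest() == target_hash:
            return ua
    return None

# Pretend this hash was observed from a spoofing user's browser.
observed = hashlib.sha256(CANDIDATES[1].encode()).hexdigest()
```

With a few thousand popular UA strings this runs in milliseconds; the hash only protects inputs the attacker can't enumerate.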


It's not all that small, although probably small enough to make a rainbow table or something.

You would have to maintain the code to generate character-perfect strings (or maybe just keep a very large library of the current most popular ones) and also make sure you have the up-to-date API key salt values (which they're probably going to start rotating regularly), which, as I said before, wouldn't be impossible, just prohibitively irritating to maintain for comparatively little benefit.

And besides, it won't be long before people just start spoofing the hash too, probably sooner than you could get the generator up and running.


Not to discount your overall point, but you're comparing a 25% increase with a 75% decrease here. Not quite the same thing.
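
To put numbers on the asymmetry:

```python
base = 100
# A 75% decrease and a 25% increase are nowhere near symmetric:
after_drop = base * (1 - 0.75)        # 100 -> 25.0
after_both = after_drop * (1 + 0.25)  # 25 -> 31.25, far from 100
# Undoing a 75% decrease actually takes a 300% increase (a 4x).
restored = after_drop * 4             # 25 -> 100.0
```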


Is the benefit of using a language server as opposed to just giving access to the codebase simply a reduction in the amount of tokens used? Or are there other benefits?


Beyond saving tokens, this greatly improved the quality and speed of answers: the language server (most notably used to find the declaration/definition of an identifier) gives the LLM:

1. a shorter path to relevant information, by querying for specific variables or functions rather than a longer investigation of the source code. LLMs are typically trained/instructed to keep their answers within a range of tokens, so keeping conversations shorter when possible extends the search space the LLM will be "willing" to explore before outputting a final answer.

2. a good starting point in some cases by immediately inspecting suspicious variables or function calls. In my experience this happens a lot in our Python implementation, where the first function calls are typically `info` calls to gather background on the variables and functions in frame.


Yes. It lets the LLM immediately obtain precise information rather than having to reason across the entire codebase (which ChatDBG also enables). For example (from the paper, Section 4.6):

  The second command, `definition`, prints the location and source
  code for the definition corresponding to the first occurrence of a symbol
  on a given line of code. For example, `definition polymorph.c:118 target`
  prints the location and source for the declaration of `target`
  corresponding to its use on that line. The `definition` implementation
  leverages the `clangd` language server, which supports source code
  queries via JSON-RPC and Microsoft’s Language Server Protocol.
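
The wire format of such a query is small. A sketch of the JSON-RPC request a client would send (the header framing and `textDocument/definition` method follow the LSP spec; LSP positions are 0-based, so source line 118 maps to 117, and the file URI and column here are made up):

```python
import json

def lsp_message(method, params, msg_id=1):
    """Frame a JSON-RPC request the way the Language Server Protocol
    requires: a Content-Length header, a blank line, then the body."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": method,
        "params": params,
    })
    return f"Content-Length: {len(body)}\r\n\r\n{body}"

# Ask where the symbol at polymorph.c line 118 (0-based: 117) is defined.
msg = lsp_message("textDocument/definition", {
    "textDocument": {"uri": "file:///src/polymorph.c"},
    "position": {"line": 117, "character": 12},
})
```

The server answers with a location (URI plus range), which the tool can then read source from and hand to the LLM.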


It’s early days but check out Tracecat. I’ve been playing with it a bit and love it!



n8n has an even more restrictive license.

