I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.
I did NOT have fun trying to use Kaitai to pack the files back together. Not sure if this has improved at all, but a year or so ago you had to build the dependencies yourself, and the process was so cumbersome that it ended up being easier to just write imperative code to do it myself.
Yep. I've run into this using Bugsnag for reporting on unhandled exceptions in Python-based Lambda functions. The exception handler would get called, but because the library is async by default the HTTP request wouldn't make it out before the runtime was torn down.
I sympathize with OP because debugging this was painful, but I'm sorry to say this is sort of just a "you're holding it wrong" situation.
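If it helps anyone hitting the same thing, the usual workaround is to force synchronous delivery so the HTTP request completes before the handler returns. A minimal sketch, assuming bugsnag-python's `asynchronous` configuration flag (verify the exact option name against your library version; `do_work` is a hypothetical stand-in for your handler logic):

```python
import bugsnag

# Assumption: `asynchronous` is bugsnag-python's delivery-mode flag.
# Setting it False delivers on the calling thread instead of a background one.
bugsnag.configure(
    api_key="YOUR_API_KEY",
    asynchronous=False,
)

def handler(event, context):
    try:
        return do_work(event)  # hypothetical business logic
    except Exception as exc:
        # Synchronous delivery blocks until the report is sent, so the
        # request gets out before the Lambda runtime is frozen or torn down.
        bugsnag.notify(exc)
        raise
```

The tradeoff is added latency on the error path, but in a Lambda that's usually preferable to silently dropping the report.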
It's not AI, but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector" which, now that I think about it, has some clear parallels to embeddings.
Wow, that is cool. I bet with that feature and a huge database of known "feature vectors" from open-source libraries, you could focus on the actual business logic of the binary instead of trying to reverse external library functions.
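To make the parallel to embeddings concrete, here's a toy nearest-neighbor lookup over function feature vectors. This is not BSim's actual representation or similarity metric, just an illustration of the matching idea, with a made-up corpus:

```python
import math

# Toy illustration only: BSim uses its own vectors and similarity measure.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical corpus: vectors extracted from known open-source library builds.
known_functions = {
    "zlib:inflate": [0.9, 0.1, 0.4],
    "openssl:SHA256_Update": [0.2, 0.8, 0.5],
}

def identify(unknown_vector, threshold=0.95):
    """Return the best-matching known function, or None if nothing is close."""
    name, score = max(
        ((n, cosine(unknown_vector, v)) for n, v in known_functions.items()),
        key=lambda item: item[1],
    )
    return name if score >= threshold else None

print(identify([0.88, 0.12, 0.41]))  # likely "zlib:inflate"
```

With a big enough corpus, anything that matches gets auto-labeled and you spend your time on the functions that don't.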
It's not all that small, although probably small enough to make a rainbow table or something.
You would have to maintain the code to generate character-perfect strings (or maybe just keep a very large library of the current most popular ones) and also make sure you have the up-to-date API key salt values (which they're probably going to start rotating regularly). As I said before, that wouldn't be impossible, just prohibitively irritating to maintain for comparatively little benefit.
And besides, it won't be too long before people just start spoofing the hash too, probably in less time than it would take to get the generator up and running.
Is the benefit of using a language server as opposed to just giving access to the codebase simply a reduction in the amount of tokens used? Or are there other benefits?
Beyond saving tokens, this greatly improved the quality and speed of answers: the language server (most notably used to find the declaration/definition of an identifier) gives the LLM:
1. a shorter path to relevant information: it can query for specific variables or functions rather than conduct a longer investigation of the source code. LLMs are typically trained/instructed to keep their answers within a range of tokens, so keeping conversations shorter when possible extends the search space the LLM will be "willing" to explore before outputting a final answer.
2. a good starting point in some cases, by immediately inspecting suspicious variables or function calls. In my experience this happens a lot in our Python implementation, where the first function calls are typically `info` calls to gather background on the variables and functions in the frame (a rough sketch of such a helper is below).
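For what it's worth, here's roughly what an `info`-style helper could look like. The name, signature, and output format are assumptions for illustration, not ChatDBG's actual implementation:

```python
import inspect

def info(obj, name="<object>"):
    """Illustrative helper: summarize a variable or function for the LLM.
    (Hypothetical sketch; not ChatDBG's actual `info` implementation.)"""
    lines = [f"{name}: type={type(obj).__name__}, repr={obj!r:.80}"]
    doc = inspect.getdoc(obj)
    if doc:
        lines.append(f"doc: {doc.splitlines()[0]}")
    if callable(obj):
        try:
            lines.append(f"signature: {inspect.signature(obj)}")
        except (TypeError, ValueError):
            pass
    return "\n".join(lines)
```

The point is that a single short, structured summary like this replaces several rounds of the model reading raw source to work out the same facts.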
Yes. It lets the LLM immediately obtain precise information rather than having to reason over the entire source of the codebase (which ChatDBG also enables). For example (from the paper, Section 4.6):
The second command, `definition`, prints the location and source code for the definition corresponding to the first occurrence of a symbol on a given line of code. For example, `definition polymorph.c:118 target` prints the location and source for the declaration of `target` corresponding to its use on that line. The `definition` implementation leverages the `clangd` language server, which supports source code queries via JSON-RPC and Microsoft's Language Server Protocol.
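For anyone curious what that query looks like on the wire, here's a minimal sketch of an LSP `textDocument/definition` request to `clangd` over JSON-RPC. The file path and cursor position are placeholders, and a real client would also read and parse the framed responses from stdout:

```python
import json
import subprocess

# LSP messages are JSON-RPC bodies framed with a Content-Length header.
proc = subprocess.Popen(["clangd"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def send(msg):
    body = json.dumps(msg).encode()
    proc.stdin.write(b"Content-Length: %d\r\n\r\n" % len(body) + body)
    proc.stdin.flush()

uri = "file:///path/to/polymorph.c"  # placeholder path

send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"processId": None, "rootUri": None, "capabilities": {}}})
send({"jsonrpc": "2.0", "method": "initialized", "params": {}})

# The server needs the file contents before it can answer queries about it.
with open("/path/to/polymorph.c") as f:
    send({"jsonrpc": "2.0", "method": "textDocument/didOpen",
          "params": {"textDocument": {"uri": uri, "languageId": "c",
                                      "version": 1, "text": f.read()}}})

# "Where is the symbol at line 118, column 11 defined?" (LSP positions are 0-indexed.)
send({"jsonrpc": "2.0", "id": 2, "method": "textDocument/definition",
      "params": {"textDocument": {"uri": uri},
                 "position": {"line": 117, "character": 10}}})
```

The response is a file URI plus a range, which is exactly the kind of compact, precise answer you want to hand back to the model.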