I wrote a very simple SMILES parser using pyparsing
https://github.com/dakoner/smilesparser/tree/master
I wouldn't say it's intended for production work, but it has been useful in situations where I didn't want to pull in rdkit.
This seems to say flags is a sort of unsigned integer.
Is there a way to break the flags into big-endian bits, where the first two bits are either 01 or 10 (but not 00 or 11), with 01 meaning DATA and 10 meaning POINTER, the next five bits are a counter of segments, and the next bit is 0 if the default is BLACK and 1 if the default is WHITE?
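A minimal C sketch of that unpacking, assuming the layout above is right: the 8-bit width, the shift positions, and the 0 = BLACK / 1 = WHITE convention for the last bit are all guesses taken from the question, not anything confirmed by the format being discussed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout, reading 8 flag bits big-endian (bit 7 first):
     bits 7-6 : tag, 01 = DATA, 10 = POINTER (00 and 11 invalid)
     bits 5-1 : segment count (0-31)
     bit  0   : default colour, 0 = BLACK, 1 = WHITE (assumed)   */
enum tag { TAG_DATA = 1, TAG_POINTER = 2 };

static void unpack_flags(uint8_t flags)
{
    unsigned tag      = (flags >> 6) & 0x3;  /* top two bits   */
    unsigned segments = (flags >> 1) & 0x1f; /* next five bits */
    unsigned colour   = flags & 0x1;         /* lowest bit     */

    if (tag != TAG_DATA && tag != TAG_POINTER) {
        printf("invalid tag %u\n", tag);
        return;
    }
    printf("%s, %u segments, default %s\n",
           tag == TAG_DATA ? "DATA" : "POINTER",
           segments, colour ? "WHITE" : "BLACK");
}

int main(void)
{
    unpack_flags(0x4B); /* 0b01001011 -> DATA, 5 segments, WHITE */
    return 0;
}
```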
I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.
It is an interesting thought experiment re instruction pipelining in a virtual machine or interpreter design.
What would you change in a design to allow it?
Would an asynchronous architecture be necessary?
How would you merge control flow together efficiently to take advantage of it?
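Not an answer from the thread, but one concrete illustration of "merging control flow to take advantage of it": a common interpreter technique is to replicate the dispatch at the end of every opcode handler (threaded code via computed goto), so the host CPU's indirect-branch predictor sees a separate branch per handler and can learn per-opcode successor patterns, instead of one shared, hard-to-predict switch. A toy sketch using the GCC/Clang labels-as-values extension; the opcode set is made up.

```c
#include <stdio.h>

enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

static void run(const unsigned char *code)
{
    /* One dispatch table, but each handler does its own indirect jump,
       giving the hardware predictor per-opcode branch history. */
    static void *handlers[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };
    long stack[64];
    long *sp = stack;

#define DISPATCH() goto *handlers[*code++]

    DISPATCH();

op_push1:  *sp++ = 1;                DISPATCH();
op_add:    sp[-2] += sp[-1]; sp--;   DISPATCH();
op_print:  printf("%ld\n", sp[-1]);  DISPATCH();
op_halt:   return;

#undef DISPATCH
}

int main(void)
{
    const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
    run(prog); /* prints 2 */
    return 0;
}
```

This doesn't give the interpreter its own pipeline, but it is the usual way interpreter control flow gets reorganized so the hardware's branch prediction can do the heavy lifting.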
> I know branch prediction is essential if you have instruction pipelining in actual CPU hardware.
With sufficiently slow memory, relative to the pipeline speed. A microcontroller executing out of TCM doesn’t gain anything from prediction, since instruction fetches can keep up with the pipeline.
The head of the pipeline is at least several clock cycles ahead of the tail, by definition. At the time the branch instruction reaches the part of the CPU where it decides whether to branch or not, the next several instructions have already been fetched, decoded and partially executed, and that's thrown away on a mispredicted branch.
There may not be a large delay when executing from TCM with a short pipeline, but it's still there. It can be so small that it doesn't justify the expense of a branch predictor. Many microcontrollers are optimized for power consumption, which means simplicity. I expect microcontroller-class chips to largely run in-order with short pipelines and low-ish clock speeds, although there are exceptions. Older generations of microcontrollers (PIC/AVR) weren't even pipelined at all.
Unless you evaluate branches in the second stage of the pipeline and forward them. Or add a delay slot and forward them from the third stage. In the typical case you’re of course correct, but there are many approaches out there.
But the main problem, which none of these solve, is that most VM languages are designed to be impossible (or very difficult) to optimize, due to aggressive use of dynamic typing. Nothing will save you from dynamic types.
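To make that concrete (my sketch, not something from the thread): even a single dynamically typed add has to branch on the runtime tags of both operands before it can do any arithmetic, and those data-dependent checks are exactly what a static compiler can't remove without knowing the types. The Value/Tag names are invented for the example.

```c
#include <stdio.h>

typedef enum { T_INT, T_DOUBLE } Tag;

typedef struct {
    Tag tag;
    union { long i; double d; } as;
} Value;

/* A dynamically typed "add": the arithmetic is trivial, but it is
   buried under per-operation type dispatch. */
static Value vm_add(Value a, Value b)
{
    Value r;
    if (a.tag == T_INT && b.tag == T_INT) {            /* fast path        */
        r.tag = T_INT;
        r.as.i = a.as.i + b.as.i;
    } else {                                           /* mixed/float path */
        r.tag = T_DOUBLE;
        r.as.d = (a.tag == T_INT ? (double)a.as.i : a.as.d)
               + (b.tag == T_INT ? (double)b.as.i : b.as.d);
    }
    return r;
}

int main(void)
{
    Value a = { T_INT,    { .i = 2   } };
    Value b = { T_DOUBLE, { .d = 0.5 } };
    printf("%f\n", vm_add(a, b).as.d); /* 2.500000 */
    return 0;
}
```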
With the way architectures have gone, I think you'd end up recreating VLIW. The thing holding back VLIW was that compilers were too dumb and computers too slow to really take advantage of it, so you ended up with a lot of NOPs in the output. VLIW is essentially how modern GPUs operate.
The main benefit of VLIW is that it simplifies the processor design by moving the complicated tasks/circuitry into the compiler. Theoretically, the compiler has more information about the intent of the program which allows it to better optimize things.
It would also be somewhat of a security boon. VLIW moves branch prediction (and rewinding) out of the processor and into the compiler. With exploits like Spectre, pulling that out of the hardware would make it easier to integrate compiler hints on security-sensitive code: "hey, don't speculatively execute here".
> The thing holding back VLIW was compilers were too dumb
That’s not really the problem.
The real issue is that VLIW requires branches to be strongly biased, statically, so a compiler can exploit them.
But in practice branches are highly dynamic yet trivially predicted by branch predictors, so branch predictors win.
Not to mention that even VLIW cores use branch predictors, because branch resolution latency is too long to simply wait for the outcome to be known.
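To make the static-vs-dynamic bias point concrete (my example, not the commenter's): the only bias a compiler can exploit is one it can see or be told about at compile time, e.g. via GCC's __builtin_expect, whereas a hardware predictor learns per-branch behaviour at run time even when the source carries no hint.

```c
#include <stddef.h>

/* Compile-time hint: the compiler may lay out (or, on a VLIW, schedule
   and predicate) code assuming the condition is almost always true.
   This only pays off if the bias really is static. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

int sum_nonnegative(const int *v, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (LIKELY(v[i] >= 0))   /* statically assumed-taken branch */
            sum += v[i];
        /* A branch such as `if (v[i] & 1)` on arbitrary data has no
           useful static bias; only a run-time predictor (or
           if-conversion to branchless code) helps there. */
    }
    return sum;
}

int main(void)
{
    int v[] = { 3, -1, 4, 1, -5 };
    return sum_nonnegative(v, 5) == 8 ? 0 : 1; /* exit 0 on success */
}
```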
I think Common Logic ( https://en.m.wikipedia.org/wiki/Common_Logic - ISO/IEC 24707:2007) would be a good addition to any effort to add a semantic layer to a database.
This is a good write-up that doesn't require DuckDB, as it isn't specific to a particular database.