Hacker News | dekhn's comments

It's not completely correct, though: "race" as we currently classify it correlates strongly with genetic background, and self-identified race is often used as a proxy for genetic background.

Even if he was "into eugenics", there is strong evidence that your genetic makeup contributes significantly to your longevity.

No, Mullis wrote the Nature paper on time reversal due to the LSD trip (https://www.nature.com/articles/218663b0)

From the Wikipedia page

> During a symposium held for centenarian Albert Hofmann, Hofmann said Mullis had told him that LSD had "helped him develop the polymerase chain reaction that helps amplify specific DNA sequences".


both of them were jerks.

It's also incomplete and incorrect. It was Gosling's photo, he did the work for Franklin. And she had already shared the results in a department seminar before Wilkins showed it to W&C. And she was credited for this in the W&C paper in Nature.

She was credited, see the original W&C paper: https://www.nature.com/articles/171737a0 at the end is an acknowledgement. She also has a related article in the same issue of Nature.

I wouldn't be so sure that Franklin would have figured out that DNA was an antiparallel double helix. She knew it was a helix from the fibre diffraction pattern, but I don't think just anybody would have had the insight W&C did about it being a double helix and antiparallel, which immediately suggests a possible copying mechanism for the genetic material. However, we can't know for sure.

Edit: in re-reading https://www.nature.com/articles/d41586-023-01313-5 I see that she did suspect the DNA structure contained multiple chains. So my statement about the double helix aspect was incomplete/incorrect.


The best I've read is "The Eighth Day Of Creation" (which is an amazing book beyond the part that covers the elucidation of the structure of DNA). He references multiple internal data sources that establish the process by which Gosling's photo made it to Watson and Crick. Of all the accounts I've read, it seems to be the most factual. I think it's also worth reading Watson's account ("The Double Helix") and the book that originally brought the most attention to the treatment of Franklin ("Rosalind Franklin: The Dark Lady of DNA").

I believe this article has some updated results: https://www.nytimes.com/2023/04/25/science/rosalind-franklin... and it appears there was an earlier book before Dark Lady, referenced here: https://www.nytimes.com/1975/09/21/archives/rosalind-frankli...


I wrote a very simple SMILES parser using pyparsing (https://github.com/dakoner/smilesparser/tree/master). I wouldn't say it's intended for production work, but it has been useful in situations where I didn't want to pull in rdkit.

I see you include the dot disconnect "." as part of the Bond definition.

You also define Chain as:

  Chain <<= pp.Group(pp.Optional(Bond) + pp.Or([Atom, RingClosure]))
I believe this means your grammar allows the invalid SMILES C=.N
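To make the concern concrete, here is a minimal stdlib-only sketch that treats "." as a separator rather than a bond and flags a bond symbol sitting directly in front of it. The token set and the names `tokenize` and `has_bond_before_dot` are hypothetical and heavily simplified (no bracket atoms, no branches); this is not the grammar from the linked repository.

```python
import re

# Simplified token classes, for illustration only.
BOND_SYMBOLS = set("-=#$:/\\")
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#$:/\\]|\.|\d")


def tokenize(smiles):
    """Split a (very restricted) SMILES string into tokens."""
    pos, tokens = 0, []
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {smiles[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def has_bond_before_dot(smiles):
    """True if a bond symbol is immediately followed by the dot disconnect."""
    tokens = tokenize(smiles)
    return any(a in BOND_SYMBOLS and b == "." for a, b in zip(tokens, tokens[1:]))


for example in ("C=N", "C.N", "C=.N"):
    print(example, has_bond_before_dot(example))
```

In a grammar, the equivalent fix would be to keep the dot out of the Bond alternatives and give it its own rule, so a bond must be followed by an atom or ring closure.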

I expect the same thing will happen here as happened when Google X said they were looking into space elevators. They'll conclude it won't be feasible because of already-known engineering challenges for which there are no technical solutions that are economically viable. The big barrier for space-based data centers is heat rejection: in vacuum you can only shed heat by radiation, which typically requires enormous radiators. They allude to this in the paper but just gloss over it.

Has anyone ever deployed a heat pump in orbit to increase the radiator temperature? T^4 is a pretty steep curve. If your usual rejection temperature was 77 C / 350 K, and you could bump that up to 227 C / 500K, your radiator goes down to ~1/4 of the required area. But then you need more solar panel area to power the heat pump. And 1/4 of the area is only ~50% smaller in linear dimensions. Maybe it doesn't pencil out. Maybe instead of powering the heat pump with PV panel electricity, it is mechanically driven by a heat engine driven by concentrated sunlight. 1000 K warm end, and 500K cold end. But then you need to dissipate the waste heat from the engine as well. Fun to think about anyway.
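A rough check of the T^4 scaling above, as a sketch only: it treats the radiator as an ideal gray body radiating to a 0 K background and ignores the extra heat added by the heat pump's own work input. The function name is just for illustration.

```python
def radiator_area_ratio(t_low_k, t_high_k):
    """Area needed at t_high_k relative to t_low_k for the same radiated power.

    Radiated power per unit area scales as T^4, so area scales as 1 / T^4.
    """
    return (t_low_k / t_high_k) ** 4


area_ratio = radiator_area_ratio(350.0, 500.0)
linear_ratio = area_ratio ** 0.5  # side length of a square radiator

print(f"area ratio:   {area_ratio:.3f}")
print(f"linear ratio: {linear_ratio:.3f}")
```

That gives an area ratio of about 0.24 and a linear ratio of about 0.49, in line with the ~1/4 and ~50% figures in the comment.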

Heat pumps are used a lot in space already. They do not address data center scale heat removal.

What is a good rejection temperature to use for medium earth orbit? Assuming you would orient your radiator normal to the incoming light from the sun, and normal to the earth-shine/earth-radiation. And in the shade from your PV panels. Is 50 Kelvin a reasonable temperature? Then you could radiate 1 GW of thermal energy at 300 K using a 2.5e6 m^2 surface (assuming 0.9 for emissivity). That is a square about 1.55 km on a side. Maybe reversible computing is step one in this plan.

Ah, for a radiator, you can use both sides; one side exposed to the solar system north, and the other side facing south. So the 1.55 km on a side gets reduced by a factor of sqrt(2), to 1.1 km on a side.

Interestingly enough, to get 1 GWe, using 30% efficient PV panels and assuming 1360 W/m^2 insolation, you'd also need ~2.5e6 m^2 of PV panels.
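The radiator and PV numbers in this subthread can be sanity-checked with a few lines. This is a sketch under the stated assumptions only: a flat gray-body radiator with emissivity 0.9 facing a 50 K effective sink, 30% efficient panels, and 1360 W/m^2 insolation. The function names are made up for illustration.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_area_m2(power_w, t_radiator_k, t_sink_k=50.0, emissivity=0.9, sides=1):
    """Radiator area needed to reject power_w by thermal radiation alone."""
    net_flux = emissivity * SIGMA * (t_radiator_k**4 - t_sink_k**4)  # W/m^2 per side
    return power_w / (net_flux * sides)


def pv_area_m2(power_w, efficiency=0.30, insolation_w_m2=1360.0):
    """Panel area needed to generate power_w of electricity in full sun."""
    return power_w / (efficiency * insolation_w_m2)


GIGAWATT = 1e9
for t in (300.0, 350.0):
    one_sided = radiator_area_m2(GIGAWATT, t)
    two_sided = radiator_area_m2(GIGAWATT, t, sides=2)
    print(f"{t:.0f} K: one-sided {one_sided:.3g} m^2 ({one_sided**0.5:.0f} m square), "
          f"two-sided {two_sided**0.5:.0f} m square")
print(f"PV for 1 GWe: {pv_area_m2(GIGAWATT):.3g} m^2")
```

With these inputs a one-sided radiator comes out at roughly 2.4e6 m^2 at 300 K and roughly 1.3e6 m^2 at 350 K, so the result is quite sensitive to the radiator temperature. The PV area comes out at about 2.45e6 m^2.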

It's still being argued if you really need SELFIES, or if SMILES autoencoders can be trained to only generate valid molecules, or if generating invalid molecules is useful (I'm in camp SELFIES, but I also want better ways to represent and learn on graphical chemical structures, rather than serialized strings).

can you guys explain what makes SELFIES robust? I'd only heard of SMILES until this thread, but I have been out of this space for 10 years.

Let me start with an example- some time ago I worked on a VAE that encoded and decoded SMILES strings. The idea is that you should be able to encode a SMILES into an embedding space, do all the normal things you would do in that space, and then convert the resulting embedding vector back to a valid molecule.

The VAE is trained with a very large number of valid SMILES strings, typically tokenized at the character level (so "C" is a token, and "Br" is "B" then "r"). I and others have observed that VAEs trained like this produce a large number of embedding vectors that do not decode to valid SMILES strings: they have syntax errors, or perform chemical alchemy (personally, I saw the training set had Br (bromine) and Ca (calcium), and the output molecules sometimes were Ba (barium) even though that's not in the original dataset at all).
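A tiny illustration of how the Br/Ca to Ba mix-up is even possible with character-level tokens. The two training strings are made up for the example, not taken from any real dataset.

```python
def char_tokens(smiles):
    """Character-level tokenization: every character is its own token."""
    return list(smiles)


# Hypothetical training strings: one with bromine, one with a calcium ion.
training_set = ["CCBr", "[Ca+2]"]
vocabulary = {tok for smiles in training_set for tok in char_tokens(smiles)}

print(char_tokens("CCBr"))
print(sorted(vocabulary))

# "Ba" never occurs in the training strings, but "B" and "a" are both in the
# vocabulary, so nothing at the token level stops a decoder from emitting it.
can_spell_barium = {"B", "a"} <= vocabulary
print(can_spell_barium)
```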

There are other reasons why the tokenizer produces bad results- only about 1-10% of vectors decode to valid molecules. Invalid SMILES are mostly useless- they don't correspond to actual structures.

To respond to this, the SELFIES format makes a few changes so that it is effectively impossible to produce invalid SELFIES strings when decoding a VAE. Among other things, tokenization matches the actual elements and so the model will only ever output valid elements.

I believe this is the SMILES paper that my own experiments were based on: https://arxiv.org/pdf/1610.02415 (see https://github.com/maxhodak/keras-molecules for an open source attempt at implementation)

And this is the paper introducing SELFIES: https://arxiv.org/abs/1905.13741 (open source packages for working with SELFIES, and some example training scripts: https://github.com/aspuru-guzik-group/selfies; see "Validity of Latent Space in VAE SMILES vs. SELFIES" for more detail on the robustness).

BTW, as a side note: even though we put a bunch of effort into duplicating the original SMILES VAE, it was extremely slow to train and not very useful. Now you can just ask Gemini to write a full SELFIES VAE and train it in less than a day on a conventional GPU (thanks pytorch transformers!) to get a decent basic set of embeddings useful for exploring chemical space.


Thanks, that's very interesting! Naive question, but why couldn't you force a specific tokenization scheme on SMILES? Specifically, just one token per element? I understand SELFIES does more, but your example of Ba/Br made me wonder.

I asked the authors of the original SMILES paper and they didn't have a good answer. I wrote a parser for SMILES so I could tokenize that way but never followed up, and eventually SELFIES was announced.
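For reference, a minimal sketch of what one-token-per-atom tokenization of SMILES could look like using only a regex. This is not the parser mentioned above; the pattern and the name `atom_tokens` are assumptions for illustration, and the pattern covers only bracket atoms, the organic subset, ring closures, bonds, dots, and branches.

```python
import re

ATOM_TOKEN = re.compile(
    r"\[[^\]]+\]"      # bracket atom as a single token, e.g. [Ca+2] or [nH]
    r"|Br|Cl"          # two-letter organic-subset elements
    r"|[BCNOPSFI]"     # one-letter organic-subset elements
    r"|[bcnops]"       # aromatic atoms
    r"|%\d{2}|\d"      # ring-closure labels
    r"|[-=#$:/\\.()]"  # bonds, dot disconnect, branch parentheses
)


def atom_tokens(smiles):
    """Tokenize so each atom is one token; reject anything the pattern skips."""
    tokens = ATOM_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


print(atom_tokens("CCBr"))
print(atom_tokens("[Ca+2]"))
print(atom_tokens("c1ccccc1Cl"))
```

With tokens like these, "Br" and "[Ca+2]" are indivisible vocabulary entries, so a decoder cannot splice "B" and "a" into barium. It does nothing about unbalanced parentheses or unmatched ring closures, which is part of what SELFIES addresses.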

Thanks!
