On being the maintainer and sole developer of SPITBOL (2012) (daveshields.me)
334 points by Zuider on Aug 22, 2015 | 95 comments



This is from 2012.

He's wrong of course about being the only user.

I love this software, along with k/q. I admire the work Mr. Shields has put into this project. I especially like the use of musl and provision of static binaries.

I do not use Perl, Java, Python, Javascript, Go, Rust, Clojure, etc., etc. Whatever the majority of people are recommending, that is generally not what I use. It just does not appeal to me.

I guess I am stubborn and stupid: I like assembly languages, SPITBOL, k/q, and stuff written by djb. Keep it terse. Ignore the critics.

Yet this is now on the front page of HN. Maybe because it is the weekend? I really doubt that the software I like will ever become popular. But who knows? Maybe 10 years from now I will look at this post and marvel at how things turned out.

There is no "structured programming" with spitbol. No curly braces. Gotos get the job done. Personally, I do not mind gotos. It feels closer to the reality of how a computer operates.

Would be nice if spitbol was ported to BSD in addition to Linux and OSX. As with k/q I settle for Linux emulation under BSD.


Fascinating, I'm curious, what do you do with such languages and tools? I'm not trolling or trying to start a flame war, I'm genuinely curious as to the best use of these tools. Also - are you doing it as a hobbit, or commercially (or both)? How did you discover this tech?


> are you doing it as a hobbit

Thanks for that typo, I laughed immediately after drinking some tea and now it's all over my monitor :(


Bilbo Bugends


Clearly as a 'hobbit' lol


> It feels closer to the reality of how a computer operates.

In an alternate reality, high-level languages would be wired directly into our "hardware", via microcode or FPGA's or what have you. Software systems would be designed first, then the circuitry. In this alternate reality, Intel did not monopolize decades doubling down on clock speed so that we wouldn't have time to notice the von Neumann bottleneck. Apologies to Alan Kay. [0]

We should look at the "bloat" needed to implement higher-level languages as a downside of the architecture, not of the languages. The model of computing that we've inherited is just one model, and while it may be conceptually close to an abstract Turing machine, it's very far from most things that we actually do. We should not romanticize instruction sets; they are an implementation detail.

I'm with you in the spirit of minimalism. But that's the point: if hardware vendors were not so monomaniacally focused on their way of doing things, we might not need so many adapter layers, and the pain that goes with them.

[0] https://www.youtube.com/watch?v=ubaX1Smg6pY&t=8m9s


Don't we have cases of this alternate reality in our own reality? Quoting from the Wikipedia article on the Alpha processor:

> Another study was started to see if a new RISC architecture could be defined that could directly support the VMS operating system. The new design used most of the basic PRISM concepts, but was re-tuned to allow VMS and VMS programs to run at reasonable speed with no conversion at all.

That sounds like designing the software system first, then the circuitry.

Further, I remember reading an article about how the Alpha was also tuned to make C (or was it C++?) code faster, using a large, existing code base.

It's not on-the-fly optimization, via microcode or FPGA, but it is a 'or what have you', no?

There are also a large number of Java processors, listed at https://en.wikipedia.org/wiki/Java_processor . https://en.wikipedia.org/wiki/Java_Optimized_Processor is one which works on an FPGA.

In general, and I know little about hardware design, isn't your proposed method worse than software/hardware codesign, which has been around for decades? That is, a feature of a high-level language might be very expensive to implement in hardware, while a slightly different language, with equal expressive power, might be much easier. Using your method, there's no way for that feedback to influence the high-level design.


Don't forget LISP machines! I think they might be the perfect example of what he was referring to.

https://en.wikipedia.org/wiki/Lisp_machine


Indeed!


I just wanted to thank you (belatedly) for a thoughtful reply. The truth is, I don't know anything about hardware and have just been on an Alan Kay binge. But Alan Kay is a researcher and doesn't seem to care as much about commodity hardware, which I do. So I don't mean to propose that an entire high-level language (even Lisp) be baked into the hardware. But I do think that we could use some higher-level primitives -- the kind that tend to get implemented by nearly all languages. Or even something like "worlds" [0], which as David Nolen notes [1, 2] is closely related to persistent data structures.

Basically (again, knowing nothing about this), I assume that there's a better balance to be struck between the things that hardware vendors have already mastered (viz, pipelines and caches) and the things that compilers and runtimes work strenuously to simulate on those platforms (garbage collection, abstractions of any kind, etc).

My naive take is that this whole "pivot" from clock speed to more cores is just a way of buying time. This quad-core laptop rarely uses more than one core. It's very noticeable when a program is actually parallelized (because I track the CPU usage obsessively). So there's obviously a huge gap between the concurrency primitives afforded by the hardware and those used by the software. Still, I think that they will meet in the middle, and it'll be something less "incremental" than multicore, which is just more-of-the-same.

[0] http://www.vpri.org/pdf/rn2008001_worlds.pdf

[1] https://www.recurse.com/blog/55-paper-of-the-week-worlds-con...

[2] https://twitter.com/swannodette/status/421347385915498496


Exactly. If the world had standardised on something like the Reduceron [1] instead, what we currently consider "low-level" languages would probably look rather alien.

[1] https://www.cs.york.ac.uk/fp/reduceron/


> There is no "structured programming" with spitbol. No curly braces. Gotos get the job done. Personally, I do not mind gotos. It feels closer to the reality of how a computer operates.

That's because it is closer, as I'm sure you know, since you stated your fondness for assembly languages. I even like them for specific, limited tasks (advanced loop control). That said, I think preferring them over more "modern" constructs such as if/while/for is sort of like disparaging all those new gas powered carriages, because you can get around just fine with your horse to power your carriage, thankyouverymuch. There are very good reasons to approach most uses of goto with skepticism.


I don't think that's a good metaphor. There are no actual horses inside any gasoline (or electric) engine. But there are gotos behind many (though not all) of these modern constructs...


There's actually a lot more implied by the metaphor than just goto and constructs built upon it. It's about the art and science of programming, and advancements in the field. I specifically didn't say car or automobile because I wanted to evoke the feeling that the "new" thing being shunned is actually itself far behind the current state of the art. For loops and if blocks aren't very new and shiny either. You know what is (for some relative value of "new" that includes coming back into prominence or finally gaining some traction)? Static code analysis. Typing concepts beyond what C offered. IDEs and tooling infrastructures to assist development. Languages that support formal proofs.

Goto is essential, it's the glue that holds the instruction set together. That said, we must not fetishize it, just as we must not fetishize items of the past that are largely superseded by what they helped create. To do so slows us down, and we fail to achieve what we otherwise could. We must not forget them either, they have their places, and to do so would also slow us down.


> But there are gotos behind many (though not all) of these modern constructs...

I'd argue that e.g. an x86 LOOP instruction is far more equivalent to a do/while loop than a goto. Most of the jump instructions I see in my disassembly aren't unconditional like goto is - if anything, car engines are closer to horses in what they accomplish than, say, jnz is to goto! Even jmp certainly doesn't use named labels, as any goto worth its salt will - instead you'll see absolute or relative address offsets.

>> Personally, I do not mind gotos. It feels closer to the reality of how a computer operates.

There's a time and place to get close to the hardware, but I've never felt that goto got me meaningfully closer. Of course, my first and primary exposure to GOTO was in BASIC - where it was interpreted.

You want to get close to the hardware? Play with intrinsics. Open up the disassembly and poke at it with a profiler. Find out if your algorithm manages to execute in mostly L1 cache, or if it's spilling all the way out into RAM fetches. Figure out where you need to prefetch, where you're stalling, where your cycles are being spent. Diagnose and fix some false sharing. Write your own (dis)assembler - did you know there's typically no actual nop instruction? You simply emit e.g. xchg eax, eax, which happens to do nothing of note, and label it "nop" for clarity.

IMO, you'll have more time to do these things when embracing the advantages that structured programming can provide. Of course, I may be speaking to the choir, at least on that last point.


NOP is most certainly a NOP on modern x86 CPUs. Yes, the encoding matches what would be XCHG EAX,EAX (or AX,AX or RAX,RAX) but it hasn't been that for quite some time as it could create a pipeline stall waiting for [RE]AX to be ready for the following instruction.

As for JNE not being a GOTO, it most certainly is. It just so happens to only happen under certain circumstances (along with the other conditional jumps, and yes, that's how they are described). Compare:

    IF X <= 45 GOTO THERE
with

    CMP EAX,45
    JLE THERE
Not much of a difference if you ask me. Also, the LOOP instruction is more of a FOR loop than a DO/WHILE, as the ECX register is decremented as part of the instruction.

And let me assure you, when writing assembly, you almost always use labels. A disassembly will show you the absolute/relative address because that's all it has to go by.


Gotos in spitbol can be conditional or unconditional.


I have a question: In his analogy, what represents the hardware and what represents the software? Wouldn't the change from horses to combustion engine be a change in hardware? And software might be represented by something like the reins or a gas pedal?


On both sides it's technology and advancement of the status quo. More explicitly, it's programming and personal transportation.

Oh, and my rant wasn't aimed at you, per se, but the statement about goto which I expanded in isolation to a fictional point of view. That point of view may or may not have any relation to how you feel about programming and goto, I have no idea.


Gotos get the job done; so do NAND gates and flip-flops.


Gotos were shot down more than 40 years ago and they have never really made a comeback since. They are still used for error handling in the kernel, I've seen.


And error handling in many C applications, since in the absence of exceptions they're very useful for doing cleanup.
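
For anyone who hasn't seen the idiom, here is a minimal C-style sketch (it compiles as C or C++; the helper and its name are hypothetical): every failure path jumps forward to a single cleanup label.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical example: read a whole file into a malloc'd buffer.
       Any failure jumps to the single cleanup label at the bottom. */
    int slurp(const char *path, char **out, long *len)
    {
        int rc = -1;
        char *buf = NULL;
        FILE *f = fopen(path, "rb");
        if (!f) goto done;
        if (fseek(f, 0, SEEK_END) != 0) goto done;
        *len = ftell(f);
        if (*len < 0 || fseek(f, 0, SEEK_SET) != 0) goto done;
        buf = (char *)malloc((size_t)*len + 1);
        if (!buf) goto done;
        if (fread(buf, 1, (size_t)*len, f) != (size_t)*len) goto done;
        buf[*len] = '\0';      /* convenient if the caller treats it as text */
        *out = buf;
        buf = NULL;            /* ownership handed to the caller */
        rc = 0;
    done:                      /* single exit: release whatever was acquired */
        free(buf);
        if (f) fclose(f);
        return rc;
    }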


Exceptions are just a fancy word for gotos.


Just like functions are just a fancy word for gotos?


Functions are a fancy word for gosubs!


Java employs a limited Goto in the form of labelled continue statements. C# includes an explicit Goto statement for breaking out of loops.

Goto is definitely still out there.


The first programming language I used was WATFIV, a Waterloo FORTRAN implementation. It seemed straightforward, and if that’s all there was to programming, I probably would have dropped it in 1974 to play more D&D. FORTRAN felt like the same arithmetic I already knew how to use. It was obviously way more powerful to use a program to do calculations, but it was all stuff I could have done by hand or with a calculator.

But one day, as I waited for a keypunch to make some changes to some program or other I was writing, my eye fell upon a copy of the Green Book left behind by some other programmer. I started reading it, and my little mind was completely blown.

SNOBOL was something else, it forced me to think about programs in a completely different way. It wasn’t about specifying steps to be taken one by one, it was about designing a way to pattern match.

And more than in the obvious, RegExp way: you could do things as the pattern was matching, and thus you were writing a kind of program where the control flow was determined by backtracking and the success or failure of matches.

To this day, my programming is highly influenced by one feature of SNOBOL, I guess it “imprinted” on me: Patterns are first-class values (like regular expressions in other languages), but they are also composable, there is an arithmetic of patterns. To this day I favour designs where things can be composed and decomposed at will rather than used as opaque, monolithic entities.

I’m not saying SNOBOL was better than the FP and OOP and multi-paradigm languages that dominate today, but the experience of learning a new way to think was intoxicating, and once a year or so I re-read the Green Book and think about thinking differently.

If you are interested, I highly recommend you read the whole book. If you see “pattern matching” and think “Regular Expressions,” you will miss the forest for the trees. I’m not sure that anybody needs to know SNOBOL (or its descendants), but I think that it’s a valuable exercise to learn it once.

"A language that doesn't affect the way you think about programming, is not worth knowing.”

--Alan Perlis


The Green Book you mention is available for download on the Google Code page for SPITBOL:

https://code.google.com/p/spitbol/downloads/list


I love SNOBOL4/SPITBOL. It was a completely bizarre and original language that was quite powerful.

In the late 70s, I took two compiler courses with RBK Dewar, one of the creators of SPITBOL. Those courses were wonderful. He mentioned SPITBOL occasionally, and I remember one story in particular. The implementation was done in assembler (if I'm remembering correctly), and it took 20 runs to get a working system (I guess that means a basic suite of tests running successfully). That style of working is completely alien today, and arguably less effective.

Dewar also spent some time talking about his work on the SETL language (for processing sets). Flow analysis for global optimization could be expressed extremely concisely, and was of course applied to SETL itself.


That's the language the first Ada compiler was written in. Anyone knowing the rough history of Ada compilers already has a reason to look at the language that got the job done. ;)


The SETL language is still available. There is an enhanced version written in Java called setlX:

http://randoom.org/Software/SetlX


I took some undergrad courses with Prof Dewar in the 90s and all he talked about was Ada and Assembly (using an assembler he wrote). Never heard of SPITBOL. His class was great.


Forgot to mention it in my original comment, but in the grad-level compiler course, we actually wrote a compiler in SPITBOL. I don't remember if the language choice was optional or mandatory, but I did enjoy it.


I sort of understand it, as he was a pretty major figure in the Ada world. But no mention of SPITBOL at all? :(


Since this is all about obscure languages, it may be worth pointing out here that the original INTERCAL compiler was written in SPITBOL.

On one or two occasions I asked Don Woods to clarify some feature of the language that was incompletely described in the original INTERCAL document, and he dug out the original SPITBOL code in order to answer my question.


I was project manager over a FORTRAN project in the early 1980's. I remember very clearly how our team's productivity went up, as well as satisfaction, when the developers started using the Ratfor (Rational Fortran) preprocessor. https://en.wikipedia.org/wiki/Ratfor

At the time I did not realize that Brian Kernighan created Ratfor. Ratfor changed my thinking about program structure and coding style more than any other single event in my professional life.


If you're searching for examples, search instead for SNOBOL.

From Wikipedia: > SPITBOL (Speedy Implementation of SNOBOL) is a compiled implementation of the SNOBOL4 language.

The Wikipedia SNOBOL page also shows some examples: https://en.wikipedia.org/wiki/SNOBOL

And C2 talks about it as well: http://c2.com/cgi/wiki?SnobolLanguage


> Can you name a widely-used contemporary programming language that still uses the 60’s software technology of reference counts to manage storage?

As opposed to the 50's software technology of garbage collection?

What an off-putting remark to include.


Don Syme, creator of the F# language, wrote to an OCaml mailing list about how hard it was to get generics into the CLR. How MSCorp viewed it as "academic" and "experimental". Despite that what he was doing was basically implementing ideas from the 70s into a modern system.

It's strange how anyone would look back and think the language ideas from back then are all useless. Or that there's significantly new stuff going on now in mainstream implementations.


   Don Syme, creator of the F# language, wrote to an OCaml mailing list
   about how hard it was to get generics into the CLR. How MSCorp
   viewed it as "academic" and "experimental". 
Link?


http://caml.inria.fr/pub/ml-archives/caml-list/2006/11/d921c...

Don Syme's blog has some more details on how deeply MSR got involved to ship generics. MS Corp simply wasn't capable or interested. This gets revised when talking to folks today - claims that oh yeah, Delphi++ was always gonna do generics right. But in truth, it seems they weren't going to make it without MSR's (F# team overlaps with the generics team) help.

http://blogs.msdn.com/b/dsyme/archive/2011/03/15/net-c-gener...

It's incredibly frustrating to see, even as an outsider. Having the patience, tact, and political savvy to pull this off? It's pretty impressive. But did MS learn? Nope. F# is still a second-class citizen, with only token support. F# still isn't marketed as "generally better than C#", but still aimed at "niche" users. It's sad.

MS language tech is pretty much stagnated for 8 years now. Feels like IE all over again. They trounced their major competitor (Java), so now they can kick back and add minor stuff here and there because there's no pressure.


Interesting, thanks. I didn't understand how Don Syme needed to convince Microsoft since he is part of Microsoft.

From the blog:

> Generics for .NET and C# in their current form almost didn't happen: it was a very close call, and the feature almost didn't make the cut for Whidbey (Visual Studio 2005). Features such as running CLR code on the database were given higher priority.

> ...

> Ultimately, an erasure model of generics would have been adopted, as for Java, since the CLR team would never have pursued an in-the-VM generics design without external help.

I have no particular reason to doubt his perspective - and I certainly don't have any inside knowledge - but a feature not making the cut for version N doesn't always imply that a shitty version would have been implemented in version N+1.

> The world is as it is today because we act when we have the chance to act, and because we invest in foundations that last the test of time.

Could this be a case of the victors getting to write the canonical history?

   MS language tech is pretty much stagnated for 8 years now. Feels
   like IE all over again. 
Nah. Sure, C# 6.0 is largely smaller tweaks here and there, but by all measures the team has been busy with Roslyn, which is huge.

Out of curiosity, are there language features that you think should be added to C#? For C# to be like IE, the world would need to have moved on and C# would have to be behind in this new world. It doesn't seem like that's the case. And compared to Java, C# moves at a pace that's absolutely breakneck. What has Java gotten in the past 8 years? Crappy not-closures?

   They trounced their major competitor (Java)
Microsoft may have trounced Java-on-Windows, but overall Java is more popular than C# by a significant margin.


Edit: This is probably too negative of a comment. It's just years of frustration with MS coming out, that's all.

I've not got much inside knowledge. I was an MVP 2003-2005 on C# then CLR and Security. I wasn't very knowledgeable back then, but I don't recall any push for FP style, at all. I think the fact that C# originally didn't have lambdas, then added them with an 8-character keyword says enough.

Implementing better generics later? I doubt it. There's been no CLR changes since v2, as far as the type system or IL goes. So that's 10 years, no additions, just added tweaks here and there. Hell, even now, .NET Native relies on source-level transformations, instead of being implemented at the IL-level.

They've been hyping Roslyn. Great. One famous problem with C#'s compiler is that its design made it very hard to add type inference consistently to the language[1]. They rewrote the compiler, did they fix that? Nope. Even worse: My watch says it's 2015, but VS/C# still doesn't ship a REPL. Come on. (Yeah, maybe "C# Interactive" will show up some day now that 2015 has RTM'd. But not today.)

The core of my complaint is that they have a mindset of implementing things as hard-coded scenarios, versus general purpose features. Async. Dynamic. Duck typing. Operators. Even the lambda syntax being ambiguous between expressions and code. Why? Cause they chose an end-user scenario, e.g. LINQ, then implemented just what they needed to get that scenario done. That lacks elegance. It adds conceptual overhead.

Java has more popularity because MS decided to shun non-Windows. Essentially no one prefers Java-the-language over C#, but Windows-only is a nonstarter in many cases. My IE comment is saying that, like IE, MS has removed resources and the drive to seriously improve its language tech, as there are no real competitors in their space, language-wise.

1: http://blogs.msdn.com/b/ericlippert/archive/2009/01/26/why-n...

P.S. I still think C# is a good language in relative terms and they have brilliant people doing great work on it. And the polish of the tooling - wow, yeah it's amazing. I'm just disappointed that MS doesn't seem to be interested in really upping-the-ante and being a leader here. F# is basically a "best-of-breed" language that'd put them solidly ahead, yet they neglect it.


   Edit: This is probably too negative of a comment. It's just years
   of frustration with MS coming out, that's all.
Par for the course, I think. :)


> I didn't understand how Don Syme needed to convince Microsoft since he is part of Microsoft.

You haven't had to convince other engineers in your company to adopt a new practice or management to build a new structure?


Of course I have - including when I was at Microsoft.

The difference is that I don't refer to my employer in the third person. I thought the phrasing was strange. It confused me why he needed to convince Microsoft ("MSCorp") since he is a part of Microsoft.

Reading the blog post linked in the comment I was replying to brought the needed clarity: he was talking about convincing Microsoft-not-Microsoft-Research ("MSCorp") while he was at Microsoft Research.


Yes, age here is irrelevant.

At the very least, Objective-C, Swift and PHP still use reference counting.


And C++11 just got reference counting as an official library option.

Reference counting is better than mark-and-sweep GC for several use cases:

* Real time code where you don't ever want a GC to steal cycles. I know that a lot of research has been done to decrease the amount of time stolen, but it's always non-zero.

* Immediate, deterministic clean-up of resources as soon as they are no longer referenced: If you have a limited number of file handles, for instance, you want them closed ASAP, and not when some GC decides it's time.

* No performance penalty for having weak references. I use this in asset management: A map of asset handles to weak references to currently loaded assets. If an asset is no longer used, it's deallocated immediately. Having weak references in a GC system can increase processing complexity.
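
Taking that last point: a rough C++ sketch of such an asset map (the Asset type and names are hypothetical, not anyone's production code). The cache holds weak_ptrs, so it can hand back a still-loaded asset but never keeps one alive on its own:

    #include <map>
    #include <memory>
    #include <string>

    struct Asset { std::string path; /* pixels, samples, ... */ };

    class AssetCache {
        std::map<std::string, std::weak_ptr<Asset>> cache_;
    public:
        std::shared_ptr<Asset> load(const std::string& handle) {
            if (auto hit = cache_[handle].lock())
                return hit;                  // still loaded somewhere, reuse it
            auto fresh = std::make_shared<Asset>();
            fresh->path = handle;            // stand-in for real loading work
            cache_[handle] = fresh;          // observe, don't own
            return fresh;                    // destroyed when the last user drops it
        }
        // A real cache would also prune dead map entries occasionally.
    };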


> * Real time code where you don't ever want a GC to steal cycles. I know that a lot of research has been done to decrease the amount of time stolen, but it's always non-zero.

Real time code shouldn't allocate/deallocate memory, much less from a GC'able pool. With that constraint, it's possible to have real time code that coexists with a GC, such as RTSJ's NoHeapRealtimeThread or Eventrons, with an effective cost of zero cycles taken by GC from the realtime code.


Not talking "hard real time" here, but game code where you want to prevent frame glitches.

In C++ you can also replace the allocator to pull items from a pool, so that the "allocation" is "grab the first item from the pool linked list" and "deallocation" is "link this item back into the pool." The first case costs two pointer reads and one write, the second case costs two pointer writes and one read.

This lets you use objects as if they're dynamically allocated while still keeping allocation/deallocation costs very low.
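
Something like this, as a sketch (fixed-size slots, single-threaded, alignment ignored; a real pool would template the slot size and handle alignment properly):

    #include <cstddef>
    #include <vector>

    // Intrusive free list over a preallocated array of fixed-size slots.
    class Pool {
        union Slot { Slot* next; char storage[64]; };
        std::vector<Slot> slots_;
        Slot* head_ = nullptr;
    public:
        explicit Pool(std::size_t n) : slots_(n) {
            for (std::size_t i = 0; i + 1 < n; ++i) slots_[i].next = &slots_[i + 1];
            if (n) { slots_[n - 1].next = nullptr; head_ = &slots_[0]; }
        }
        void* allocate() {                 // two pointer reads, one write
            if (!head_) return nullptr;
            Slot* s = head_;
            head_ = s->next;
            return s;
        }
        void deallocate(void* p) {         // one pointer read, two writes
            Slot* s = static_cast<Slot*>(p);
            s->next = head_;
            head_ = s;
        }
    };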


Immediate cleanup is not necessarily an advantage. For example, it makes releasing a reference inside a critical section very dangerous, because you have no idea how much, or even what, work is going to be done. (Don't ask me how I know.)


I feel your pain.

Critical sections are really designed to wrap only a few lines of code. Basically nothing nontrivial should be done within a critical section, IMO.

If you're dealing with multithreading, the only safe thing to do with references is to put them in a list of "things to release later." And then do that from the main thread.

GC does make this easier, sure. But creating a "release list" is not hard. Making a GC not stall the program at awkward times is actually a lot harder.
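
A tiny sketch of such a release list, assuming a single main thread drains it at a known point in the frame (names are made up):

    #include <memory>
    #include <mutex>
    #include <vector>

    // Worker threads never run destructors directly; they park the last
    // reference here, and the main thread destroys everything at a safe point.
    template <typename T>
    class ReleaseList {
        std::mutex mu_;
        std::vector<std::shared_ptr<T>> pending_;
    public:
        void release_later(std::shared_ptr<T> obj) {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(obj));    // cheap: just moves a pointer
        }
        void drain() {                             // call from the main thread
            std::vector<std::shared_ptr<T>> doomed;
            {
                std::lock_guard<std::mutex> lock(mu_);
                doomed.swap(pending_);
            }
        }   // destructors run as 'doomed' goes out of scope, outside the lock
    };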


Reference counting steals cycles all over the place. The counting itself steals cycles, and the effect of dropping a reference is unpredictable: whichever module happens to drop the last reference to an object which is the last gateway to a large graph of objects will suffer a storm of recursive reference count drops and deallocations.

If you have a limited number of file handles, you may want them closed ASAP, and not when some reference-counting mechanism or GC decides. Reference counting is not ASAP. Typically, you have some smart pointers which will drop the reference in relation to some lexical scope. That could be too late: the file handle object could be in scope over some long running function, and so the file is held open. The fix is to call the close method on the object, and then let refcounting reap the object later. (Woe to you if you decide to manually take over reference counting and start hand-coding acquire and release calls. Been there, debugged that.)

I implemented weak hash tables in a garbage collector; it's neither complicated nor difficult. Under GC, we use weak referencing for essential uses that requires the semantics, not as a crutch for breaking circular references.


The effect of dropping a reference is sometimes predictable. For example, Color cannot root a large object graph, so dropping a reference to Color will deallocate at most one object. At least it does not require nonsense like Android's allocation-in-onDraw warning.

I worked on the now-deprecated GC for Cocoa frameworks, and we made heavy use of weak references for out-of-line storage. This put us at risk for cycles: if A references B through a weak-to-strong global map, and B in turn references A, we have an uncollectible cycle even under GC. This represented a large class of bugs we encountered under GC.

So both GC and RC have their classes of cycles and unpredictable behavior. I've come to believe that these techniques are more related than we'd like to admit, and the real difference is in their second order effects. For example, GC enables lockless hash tables, which require hazard pointers or other awkward techniques under RC. On the other hand, RC enables cheap copy-on-write, which is how Swift's collections can be value types.


>The counting itself steals cycles

An atomic increment/decrement takes so little time as to make this irrelevant. If you're in such a tight loop that you care about a single increment when calling a function (to pass a parameter in), you should have inlined that function and preallocated the memory you're dealing with.

I'm talking about general use of smart pointers, which means that there's a function call involved with the smart pointer value copy, and throwing an increment in is trivial by comparison.

>whichever module happens to drop the last reference to an object which is the last gateway to a large graph of objects

When writing games, I don't think I ever had a "large graph of objects" get dropped at some random time. Typically when you drop a "large graph" it's because you're clearing an entire game level, for instance. Glitches aren't as important when the user is just watching a progress bar.

And you can still apply "ownership semantics" on graphs like that, so that the world graph logically "owns" the objects, and when the world graph releases the object, it does so by placing it on the "to be cleared" list instead of just nulling the reference.

Then in the rare case where something is holding a reference to the object, it won't just crash when it tries to do something with it. In this rare case a release could trigger a surprise extra deallocation chain, as you've suggested.

If that's ever determined to be an issue (via profiling!) you can ensure other objects hold weak references to each other (which is safer anyway), in which case only the main graph is ever in danger of releasing objects -- and it can queue up the releases and time-box how many it does per frame.

Honestly having objects reference each other isn't typically the best answer anyway; having object listeners and broadcast channels and similar is much better, in which case you define the semantics of a "listener" to always use a weak reference, and every time you broadcast on that channel you cull any dead listeners.
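
For what it's worth, a minimal sketch of that listener/channel shape with weak references (hypothetical names, single-threaded):

    #include <memory>
    #include <vector>

    struct Listener {
        virtual void on_event(int payload) = 0;
        virtual ~Listener() = default;
    };

    // The channel only weakly references listeners, so subscribing never
    // extends a listener's lifetime; dead entries are culled on broadcast.
    class Channel {
        std::vector<std::weak_ptr<Listener>> listeners_;
    public:
        void subscribe(const std::shared_ptr<Listener>& l) { listeners_.push_back(l); }
        void broadcast(int payload) {
            for (auto it = listeners_.begin(); it != listeners_.end(); ) {
                if (auto live = it->lock()) { live->on_event(payload); ++it; }
                else { it = listeners_.erase(it); }    // cull dead listener
            }
        }
    };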

Aside from all of that, if you're using object pools, you'd need to deallocate thousands, maybe tens of thousands, of objects in order for it to take enough time to glitch a frame. Meaning that in typical game usage you pretty much never see that. A huge streaming world might hit those thresholds, but a huge streaming world has a whole lot of interesting challenges to be overcome -- and would likely thrash a GC-based system pretty badly.


Reference counting in C++ can be done really well even in the absence of language support.

For example, with the Qt library you can pass objects around by value, yet behind the scenes everything is reference counted with automatic copy-on-write. It's the best of all worlds. You get easy, value-based coding (no pointers), speed (because deep down everything is a reference), and deterministic destruction (no GC). http://doc.qt.io/qt-5/implicit-sharing.html
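
Not how Qt actually implements it (Qt uses atomically refcounted shared data classes), but the core copy-on-write trick looks roughly like this single-threaded sketch:

    #include <memory>
    #include <string>

    // Value-semantics wrapper: copies share one buffer until a writer
    // detaches (deep-copies), so other holders never see the change.
    class SharedText {
        std::shared_ptr<std::string> data_;
        void detach() {
            if (data_.use_count() > 1)     // buffer is shared: copy before writing
                data_ = std::make_shared<std::string>(*data_);
        }
    public:
        explicit SharedText(std::string s)
            : data_(std::make_shared<std::string>(std::move(s))) {}
        const std::string& str() const { return *data_; }
        void append(const std::string& tail) { detach(); *data_ += tail; }
    };

Copying a SharedText only copies a shared_ptr; the underlying string is duplicated the first time one of the copies is actually modified.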

I'm curious if any languages have adopted a Qt-style approach natively.


I know for sure the K (of the J / K / APL) family does, and I suspect they all do.


Yes, Swift works this way. Arrays, dictionaries, and sets are value types and cannot be implicitly shared, which eliminates a large class of bugs. It's kind of amazing, isn't it!


> For example, with the Qt library you can pass objects around by value, yet behind the scenes everything is reference counted with automatic copy-on-write.

PHP does exactly this, but for arrays and strings only (objects are implicitly passed by reference). So you can pass arrays by value with no performance penalty, as they are actually passed by reference. A COW mechanism ensures you end up with a local copy only if you write to an argument; such mechanism is disabled when passing arguments byref.


This is incorrect due to the existence of reference cycles.

You can easily create a cycle of objects not reachable by any of your roots in your object graph. The ref counts won't ever reach 0 so to collect it you still need a gc pass periodically, imposing stop times. For the same reason you must never rely on ref counting to clean up file objects etc.


Sorry, but that's nonsense. Cycles don't just magically appear, you write them.

So writing refcounting code simply means being aware of this when designing the more complicated data structures in your code to use weak backreferences.

File objects are not secretly stashed in complicated graphs to prevent their destruction and you very much can rely on their behavior. GC passes to clean up cycles is something you got confused about: that's what GC does (because unrooted cycles are very much an issue there too!), not refcounting where you always have to break the cycles yourself, manually, preferably when designing the data structures.
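
In C++ smart-pointer terms, the standard shape of that advice looks something like this (a sketch, not from the thread):

    #include <memory>

    struct Node {
        std::shared_ptr<Node> child;    // owning edge: parent keeps child alive
        std::weak_ptr<Node>   parent;   // back edge is weak, so no cycle forms
    };

    int main() {
        auto root = std::make_shared<Node>();
        root->child = std::make_shared<Node>();
        root->child->parent = root;     // were this a shared_ptr, the two nodes
                                        // would keep each other alive forever
    }   // root's count drops to 0 here, which frees the child as well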


> Cycles don't just magically appear, you write them.

That's like saying memory leaks don't magically appear, you write them. In real code, ref cycles are everywhere and it is not trivial to know beforehand what code will generate cycles. And don't give the spiel about how that's only something that affects bad programmers.


All modern Objective-C codebases and a high percentage of large C++ codebases use refcounting without a cycle collector. Some of them have memory leaks, and there are some quite ugly cases like [1], but most of them have it under control. So it's not like relying on pure refcounting is impractical in the real world; it's just a bit more difficult to deal with than full GC.

[1] http://albertodebortoli.github.io/blog/2013/08/03/objective-...


As does Python.


As does Rust, though it prefers unique ownership.


To be clear, Rust offers a reference-counted type in the standard library, it's not part of the language itself.


A time-sharing multi-user system with demand-paged virtual memory is also 60's technology.


A few years back, Dr. Dobb’s posted a pretty great series of articles on some of the techniques RBK Dewar employed when building the original SPITBOL compiler: http://www.drdobbs.com/cpp/some-programs-are-poorly-designed...

(I was lucky enough to study compilers under Prof. Dewar when I was a grad student at NYU - I still have my notes on SPITBOL’s architecture, somewhere…)


Excellent article, very clever optimizations given the tradeoffs. From the article:

> We were talking about students' tendency to let the compiler substitute for thinking

This is actually why I use OCaml. Not going to comment on whether useful error messages are a detriment to pedagogy, but offloading thinking to the compiler is a lifesaver.


Definitely. It doesn't prevent, say, off-by-one errors (and in general, errors when you manipulate several items of the same type), but if you structure your program correctly, you can be very confident about your code.

Though OCaml's errors are not always the most explicit or nicely written.


I first encountered SNOBOL when I was 15, running an implementation on an old IBM 360 at university. It was the 3rd language I mastered (after Fortran and an early T/S version of Basic...both on the same 360) and the first I completely fell in love with.

It awakened me to just how different and amazing a programming language could be and bent my mind around something very different than what I'd been doing with Fortran. It was an entirely new way to think about designing solutions.

Years later when I was introduced to Prolog, everything felt very much at home...Prolog's backtracking algorithm being very much like SNOBOL's pattern matching system.

Of all the languages I've worked with over the years, SNOBOL and FORTH are in a class by themselves for how they informed my thinking about problem solving...lessons I carried with me in work done in many other languages.

It's a shame that both languages have passed into history...they each had subtle things to teach a developer just learning their craft...


I think parser combinator libraries are pretty similar. In Haskell, people often use Parsec instead of regular expressions.


> SPITBOL is unique in the power it provides to manipulate strings and text. I’ve yet to see anything else come close.

Can anyone elaborate on this?


From Wikipedia entry on SNOBOL/SPITBOL

'SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.'

By contrast, other languages instead favor the use of regular expressions.

I am not so sure about the claim to uniqueness. Ralph Griswold went on to develop the Icon language, which included this feature. The Unicon language, a superset of Icon, also has this distinction.


I'm having a hard time understanding the parenthetical. If you grok it, and could take a moment to explain it in your own words -- or give an example of its use -- I'd very much appreciate it. Thanks!


If you have a pattern that matches FIRSTNAME and another that matches LASTNAME, roughly:

    FULLNAME = FIRSTNAME LASTNAME
Whitespace is a catenation operator, and catenating two patterns gives you a new pattern that matches a string matching the first and the second pattern.

Maybe you want to handle old-school people like raganwald:

    OLDSCHOOLNAME = LASTNAME ', ' FIRSTNAME
Or both:

    ANYNAME = FULLNAME | OLDSCHOOLNAME
The vertical bar is an alternate operator.

Regular expressions work the same way, but in most languages, the regular expression language is really a DSL embedded in the syntax for a regular expression literal. Whereas in SNOBOL, all those operators are ordinary language operators and you can use them anywhere.

So you can manipulate patterns programatically.


How does that differ from a parser combinator library beyond that it's baked into the language? It took a few decades for other languages to catch up, but patterns being first-class objects that can be combined in various ways isn't that unusual. For example, in Lua with LPeg those examples would be:

    full_name = first_name * last_name
    old_school_name = last_name * P', ' * first_name
    any_name = full_name + old_school_name

    (* is sequence, + is or, P is a function that converts a string to a pattern)


While I don't remember enough Snobol to say, in Icon every expression may participate in the pattern-matching search process. For example, `a < b` doesn't produce a boolean value, it either fails (cutting off the current branch of the search) or succeeds (producing b's value as its value, so you can write `a < b < c` with the conventional meaning without any special handling of that form of comparison).

That's the kind of way that patterns are more deeply baked into these languages.


It is similar, but in Snobol, it is a core language feature.


Oh, neat.

Perl 6 appears to have assimilated that idea: http://doc.perl6.org/language/regexes#Subrules


I don't think that's quite the same, as Perl 6 isolates the grammar DSL to grammar blocks, but it would be trivial to make something in Perl 6 that is equivalent, if there isn't already, by defining operators between tokens and grammars in Perl 6 to accomplish the same thing.

IIUC grammars are also first class in Perl 6, but it isolates their DSL to grammar blocks. I'm not sure of the specifics of each to note whether one is capable of easily doing something the other can't or has a hard time with, but it looks to boil down to SPITBOL's implementation being slightly easier to access as there's no grammar block required, and Perl 6's being slightly more clear and self documenting, due to that same requirement.

Note: I've yet to use either, so someone with more experience, possibly you, might be able to correct my misunderstandings.


I think it's saying that "patterns" are themselves values that have operations you can perform on them. For example, adding to patterns, or subtracting from them, or concatenating them, etc.

I don't know SNOBOL though, so I'm having a hard time picturing what the actual implications of that are, or exactly how it works. But I'm intrigued enough now that I want to go read this "Green Book" and see what it's all about.


This makes me think of a generic assembler I wrote at one point (in C++ I think, I should put it on github). The idea was that you could define the instruction set right in the assembly source. It included a backtracking BNF parser to support it with these pseudo ops:

   .rule name value pattern  ; Define a syntax rule

   .insn pattern    ; Define an instruction
       ...            ; Macro expanded on pattern match
   .end

   "pattern" contains literal characters, whitespace and
   references to other rules with "<rule-name>" or <expr>
   for a math expression.

   "value" is a comma separated list of expressions which
   can contain "argN" to reference the Nth value from the
   pattern (as returned by embedded rules).

    For example, this is how you could construct the
    instructions "lda <expr>", "lda #<expr>", "ldb <expr>",
    and "ldb #<expr>":

   .rule dual 0x01 lda
   .rule dual 0x02 ldb
   .rule mode 0xf8,arg1 <expr>
   .rule mode 0xfa,arg1 #<expr>

   .insn <dual> <mode>
      .byte arg1|arg2  ; Emit op-code
      .word arg3       ; Emit argument
   .end
SNOBOL4 itself is not an assembler, but I think you could make one like this from it.


This is a very interesting idea, and I have been finding inspiration in it. New machine architectures by way of #include files! Or cat! I've been thinking that maybe the right compile-time programming model is something like term-rewriting (more or less like C++ templates or Prolog or Aardappel) rather than textual substitution. I wrote some more thoughts on the matter at https://lobste.rs/s/kfpsou/what_is_everyone_working_on_this_... but I still haven't gotten very far on it.


Just found this on wikipedia: SnoPy, snobol pattern matching for python.

http://snopy.sourceforge.net/user-guide.html


SPITBOL is an enhanced version of the SNOBOL language.

From Wikipedia:

SNOBOL (StriNg Oriented and symBOlic Language) is a series of computer programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4.

https://en.wikipedia.org/wiki/SNOBOL

TkS*LIDE is a Tcl/Tk based IDE for SPITBOL (along with SNOBOL4). Binaries for SNOBOL4 and SPITBOL are included with the IDE, along with a tutorial and sample programs.

http://rms.republika.pl/slide.html

SNOBOL is also distinguished by being described in Guy Steele & Richard Gabriel's 50 in 50 speech as one of the three languages worth knowing:

https://vimeo.com/25958308


> I have several reasons to push on:

> SPITBOL is unique in the power it provides to manipulate strings and text. I’ve yet to see anything else come close.

I would be interested to know more about what features SPITBOL offers for string processing. I'm going to take a look at the "Green Book" [1] Dave mentions, but if anyone else has relevant focused resources on that topic I'd love to give them a look.

[1] https://code.google.com/p/spitbol/downloads/detail?name=Gree...


I was thinking about why I liked SNOBOL4/SPITBOL so much. The bizarre syntax and control structures appealed to me for some reason. If you haven't used the language, each statement can succeed or fail. At the end of the statement, you can optionally specify a goto target for success and for failure. There are no higher level control structures that I remember (except for functions), so indenting is not something you (OK, I) normally did. This resulted in very clean looking source code, for some reason. I just remember that impression very strongly. But that doesn't really explain the appeal completely.

The string processing is fantastic -- extremely powerful. I think regexes have a lot of the same power, but I always found SNOBOL4 more readable after the fact, when I had to go back and read and fix my own code. But that's not it either.

I think the main reason I liked SNOBOL4 so much was that it was the first dynamic language I used. Values have types, variables do not. That was a big revelation. I don't think I actually exploited it very often, but it was a cool new idea. And the absence of type declarations also contributed to the sense that the code looked clean. Automatic memory management was also very nice. I had spent a lot of time dealing with memory management in C. I must have in Pascal also. And I really don't remember what Algol-W did for memory management -- a free or delete statement maybe? And of course in FORTRAN, COBOL, and BASIC, there was no dynamically allocated memory at all, so you had to guess high, keep track, etc. Not having to worry about tracking memory was a nice change.


It's unfortunate the article doesn't have any examples to show off SPITBOL's expressiveness, or have any benchmarks to show how fast it is compared to solutions in other languages. Reading the comments here, it seems like it could have some decent benefits.

Would be interested to know what built-in types are currently available. I wonder also whether this language would also be a good fit for test case writing.


On the subject of SNOBOL descendants still in use, there is also Snowball, designed and used exclusively for writing stemming algorithms:

http://snowball.tartarus.org/


I used SNOBOL4 in the 80s. It was beautiful for parsing.


https://github.com/spitbol/x64/blob/master/demos/sentenc.sbl

In Python, you would probably use regex for the pattern matching. In SPITBOL, you can accomplish the task at the language level. I doubt the pattern matching is as capable as regex but that's a useful feature to have (edit: based on braythwayt's comment, it sounds like the pattern matching is more capable than regex). It might be better suited to NLP tasks. According to the developer, "SPITBOL is unique in the power it provides to manipulate strings and text. I’ve yet to see anything else come close."

See Zuider's comment for more information.

I hope all-caps-keywords is optional.


Wikipedia says that the parsing can handle CFGs, which means it is strictly more powerful than regex.

"SNOBOL4 patterns subsume BNF grammars, which are equivalent to context-free grammars and more powerful than regular expressions." - https://en.wikipedia.org/wiki/SNOBOL


A closer alternative in Python is using pyparsing, which provides similar syntax. Since Python's operators can be re-implemented by the object where they're being applied, you can get pretty native-looking code.

https://pyparsing.wikispaces.com/file/view/simpleSQL.py


> If you know of anyone else who is maintaining an open-source implementation of a programming language that has only one user, please let me know [...]

The author of HolyC and TempleOS, Terry Davis, might be in a similar position:

http://www.templeos.org/Wb/Doc/HolyC.html

Though I imagine with the exposure it's gotten over the years, Terry might not be the sole user (probably sole regular user).



