Why is it that languages like this don't scale? This isn't the first time I've seen a powerful language get forgotten. Other examples include Smalltalk and Common Lisp (tiny community).
Is it because some languages are "too powerful"? What does that say about our industry? That we're still not that advanced a species, unable to handle the full power of such languages?
I say that because "dumbed down" languages seem to absolutely dominate our world (Python, Ruby, JS, etc.)
One simpler explanation: in Forth you are forced to keep the stack, and modifications to the stack, in your short-term memory, albeit only really three numbers in most cases. With C et al. you simply look down the page at the variables, which is far less taxing on your short-term memory.
Well-written and well-designed high-level Forth words often transcend that, though, and tend to be quite literally readable, in a way that is incredibly rare to see in C et al. Of course, the counterargument is that other programmers shouldn't be expected to see the problem the way the original problem solver did.
This is probably why you see things like locals getting used a lot as modern Forth programs grow. It doesn't have to be brutal early-days Chuck Moore genius programming, but I guess you start getting away from the original ethos.
I think even with locals you're usually still mentally dealing with a few items on the stack in each word. But yes, locals do help you avoid passing items from word to word: you see the usage of a local far more easily than you see the location of stack elements.
Forth was an excellent way to write a powerful and expressive programming language that could self-host with a bare minimum of assembly language "bare metal" programming.
The fridge-sized computer that Forth was originally developed on had double-digit kilobytes of memory (maybe 8192 words of 16 bits each, i.e. 16 kB) and clocked instructions through at a whopping 300 kHz or so. The microcontroller that drives the Caps Lock LED on your keyboard is a hundred times faster with a hundred times the memory.
These days we do not need to squeeze editor, compiler, and target binary into such a tiny machine. If you're developing for a microcontroller you just use C on your "big" computer, which is unimaginably more powerful.
In the olden days of the 1990s I used a development system for embedded stuff that was written in and targeted Forth on a Z80 with a whopping 64 kB of RAM and 5.25" floppies, but that was at least ten years old and five years out of date at the time.
You're probably reading my words on a slice of glass the size of half a sandwich that contains more computing power than existed in the whole world when Forth was first written.
It's a shame because writing something like Forth from the ground up (and I mean, assembly code to load the registers to start the ACIA to begin transmitting text to the terminal) perhaps in an emulated early 80s home computer is a great way to get a sense of what the chip behind it all is doing, and I feel that makes you a better programmer in "real" languages like Go or Python or C.
Find an existing implementation that runs on some computer you already have, or have an emulator for.
Then find a computer you're really into, and port fig-Forth to it, just for fun. Don't copy the source across, type it in with your own changes as you go.
Edit: Don't forget to have fun. That's the most important thing. You're doing this because you *can*, and just to see what will happen.
I was lucky, early in my career, to work at a place which used a lot of Perl and to read Damian Conway’s book, Object Oriented Perl. It was an amazing, mind-expanding book for me. It was filled with examples of different approaches to object-oriented programming, more than I ever dreamt existed, and it showed how to implement them all in Perl.
So much power! And right in line with Perl’s mantra, “there’s more than one way to do it.”
Unfortunately, our codebase contained more than one way of doing it. Different parts of the code used different, incompatible object systems. It was a lot of extra work to learn them all and make them work with each other.
It was a relief to later move to a language which only supported a single flavor of object-oriented programming.
What I've heard is that with Forth, basically no two environments are alike; each is highly customized, meaning every Forth programmer ends up creating his own language for his custom needs.
So collaborating is a bit hard like this. The only serious Forth programmer I know lives alone in the woods doing his thing.
So from an aesthetic point of view, I really like the language, but for getting things done, especially in a collaborative way?
But who knows, maybe someone will write the right tools for that to change?
This is not a real issue, because the same thing can be said about C. No two C projects are the same, each has its own set of libraries, macros, types, etc.
I think the main problem is that Forth systems don't have a standard way of creating interfaces like C and other languages have. So the diversity of environments becomes a big issue because it's difficult to combine libraries from different sources.
Have you tried collaborating with Forth? There's a lot of documented history of people doing so in industry when it was actually used, and more recently I've usually found Forth codebases approachable and easy to follow.
Personally I think this is the pay-off for writing the code in the first place: Forth is very difficult to write in a clear way, so if you actually manage to finish your project, you've probably made it very clear to follow, because otherwise it's hard to make it work at all.
I don't think "power" is really that helpful a metric in determining how useful a programming language is. If you think of programming as trying to specify the program you want out of all the possible programs you could write, one of the most helpful things a programming language can do is eliminate programs you don't want by making them impossible to write. From that standpoint, constraints are a feature, not a drawback.
And at the extremes, too much power makes a tool less useful. I don’t drive an F1 car to work, I don’t plant tulips with an excavator, I don’t use a sledgehammer when hanging a picture. Those tools are all too powerful for the job.
once you specify "the job", the best tool is "the solution" to that job only. anything else is excess complexity
however if "the job" is unspecified, power is inverse to the length of "the solution"
so is constraint of power bad?
--
a fascinating question
just like music can be created by both additive and subtractive synthesis; every line of code creates both a feature and a constraint on the final program
in which case power can be thought of as the ability to constrain...
it implies expressivity is the ability to constrain
it implies drawing on a page, or more broadly, every choice we make, is in equal parts a creative and destructive act
so maybe life, or human flourishing is choosing the restrictions that increase freedom of choice? it's so meta it's almost oxymoronic; concretely: we imprison people to maximize freedom; or, we punish children with the aim of setting them free from punishment
this is the same as the walk from law into grace found in Christian ethics
maybe the ultimate programming language then, provides the maximal step down that path, and this is also the most useful definition of "power"
i.e. place on people those restrictions that increase their ability to choose
I worked at a place that had a big Forth codebase that was doing something mission critical. It was really neat and cool once you finally got it, and probably hundreds or maybe thousands of people had touched it, worked on it and learned it, but the ramp was pretty brutal for your average developer and thus someone decided it would be better to build the same thing over with a shitty almost-C-but-not-quite interpreted language. It certainly made it easier for more people to understand and build, even if the solution was less elegant.
Honestly, when I write forth now, which is usually for embedded targets, I've got a customized version of zforth that I've grafted some stuff like locals into. If it's a small program, it's better to not be afraid of things like globals, and just spend at least twice as much time factoring, writing comments and thinking than writing. It's important to read other people's Forth code and try to understand, as there's a zen and style that looks very different than how you'd write something like Java. It's freeing and enlightening once it clicks, but you have to fight a ton of the way you think about "normal" code.
As far as the codebase goes, I probably shouldn't say too much (maybe it's been long enough now, but I don't know); all I'll say is that it was an important part of things at a certain disk drive manufacturer.
Powerful languages invite people to do needlessly complex things. Needlessly complex things are harder to understand. Harder to understand is worse.
Code that matters is usually read and extended many more times than it is written, over time by different people, so being straightforward beats most other things in practice
It kinda happened with markup languages. HTML, SVG, and some other domain specific markup languages are all XML, which is a subset of SGML.
The thing there is those DSLs have their own specs.
Coding is a social activity. Reading code is hard. When there are multiple ways of doing things, it's extra hard. People want to have relatively standardized ways of doing things so they can share code and reason about it easier.
If there's a lisp or racket or a forth that's defined as a DSL, it might take off if it's standardized and it's the best solution for the domain.
HTML uses a ton of SGML features that are not part of XML (sometimes erroneously thought to be non-standard ‘tag soup’, not to mention self-closing tags). You need either a specialized parser or an SGML processor + DTD.
Sadly our industry cares mostly about bricklayers and usually gravitates toward technologies that make it easier to treat employees like replaceable servants at low wages.
The large SV-style salaries aren't something you will find all over the globe; in many countries the pay is similar across all office workers, regardless of whether they are working with Git or Office.
That argument implies that you would actually see these languages in communities with large SV style salaries which isn’t the case.
It turns out that “brick layer” languages are also easier to understand, not just for the next person taking over but for yourself after a few months. That’s valuable unless you value your time at 0.
Why? The less the VCs have to spend on employees the better.
See the famous quote about Go's target audience, or 2000s Java being a blue-collar language.
Not only do languages like Lisp, Forth, and Smalltalk require people to actually get them (a bit like the monads-as-burritos meme in Haskell), they also suffered from bad decisions by the companies pushing them.
Lisp suffered when Xerox PARC, Symbolics, and TI lost against UNIX workstations, followed by the first AI Winter, which also took Japan's Fifth Generation project, with Prolog, down alongside it.
Smalltalk was getting along alright outside Xerox PARC, with big-name backers like IBM, where it had a major role on OS/2 (similar to .NET on Windows), until Java came out and IBM decided to pivot all their Smalltalk efforts into Java; Eclipse has roots in VisualAge for Smalltalk.
Your entire post makes the claim that it’s because the vast majority of programmers get paid the same as other roles and that’s why there’s the language selection pressure there is.
High salary jobs would be the exception yet they also make pragmatic choices about languages. It’s a two sided market problem - employers want popular languages to be used so they have a talent pool to hire from and don’t end up having a hard time finding talent (which then also implies something about the salary of course but it’s a secondary effect). Employees look to learn languages that are popular and are easy to find employment in.
Not sure if you’ve spent any time with them but VCs and investors more broadly generally could give two fucks about the language a business is built in. There are exceptions but generally they just want to see the business opportunity and that you’re the team to go do it.
There’s a reason it’s difficult to find employment with Haskell or Lisp or other niche languages, and it’s because they’re niche languages that “you have to get”: not easy to learn and generally not as easy to work with as “popular” languages, which see significantly more man-hours dedicated to building out tooling and libraries. There are also secondary things like runtime performance, which is quite poor for Haskell or Lisp if you’re a beginner; even people familiar with the language can struggle to write equivalent programs that don’t use significantly more memory or CPU. And finally, a language can just be inherently more difficult and alien (Haskell), which attracts a niche and guarantees it remains a niche language that attracts a particular kind of person.
I'm not entirely sure this is different from other languages but I believe a common complaint about lisp is every solution ends up writing a DSL for that solution, making it hard to understand for anyone else. So it's a super power if you're a small team and especially if you're a team of 1. But if you're a large team it doesn't scale.
I think it's a simple abstraction situation and the move for programming environments that include everything.
Geordi La Forge doesn't code much on the Enterprise. He simply asks the computer to build him a model of the anomaly so he can test out ideas. In a way, modern languages like Python (even before LLMs) let you get a lot closer to that reality. Sure, you had to know some language basics, but this was pretty minimal, and you'd use those basic building blocks to glue together libraries to make an application. Python has a good library for practically anything I do, and since this is standard, it's expected that a task doesn't take too long. I can't tell my boss I'll need 3 years to code my own solution using my own handwritten equivalents of numpy and scipy. You're expected to glue libraries together. This is why MIT moved its intro course from Scheme (SICP) to Python. It's a different world.
With Forth, every program is a work of art that encapsulates an entire solution to a problem from scratch. Its creator, Chuck Moore, takes this to such a level that he also fabs his own chips to work optimally with his Forth software. These languages had libraries, but they weren't easy to share and didn't have any kind of repository before Perl's CPAN. Perl really took off for a while, but Python won out by having a simpler language with built-in OO (Perl's approach was a really hacky built-in OO, or you downloaded a library...).
To be honest though, I spent a decade trying many languages (dozens, including Common Lisp, Prolog, APL, C, Ada, Smalltalk, Perl, C#, C++, Tcl, Lua, Rust, etc.) looking for the best, and although I never became an expert in those languages, I kept coming to the conclusion that for my particular set of needs, Python was the best I could find. I wasted a lot of time reading Common Lisp books and just found it much easier to get the same thing done in Python. Your mileage will vary if you're doing something like building a game engine. A lot of people are just doing process automation and similar, and languages like Python are just better than Common Lisp for that due to the environment and tooling benefits. Also, although Python isn't as conceptually beautiful as Lisp, I found it much easier to learn. The syntax just really clicked for me, and some people do prefer it.
It is too risky for companies to rely on a language that has a small pool of programmers. The bigger the company, the bigger the language must be. AI multiplies this availability, not productivity.
Flipside: it looks like the most productive programmers are those who work alone and not in a large pool. The core point of the article is that team development is slower and less efficient.
Which means management must make a choice: getting good code relatively fast from a small pool of high-value individuals that it must therefore cherish and treat well...
Or get poor-quality code, slowly, but from a large and redundant group of less skilled developers, who are cheaper and easier to replace.
It is a truth universally acknowledged that from the three characteristics of "good, fast, and cheap", you can pick which two you want.
In this case, maybe the choice is as simple as "good and fast" or "cheap."
If the structure of the business or the market requires management to pick "cheap" (with concomitant but unspoken "bad and slow") then the structure, I submit, is bad.
I don't think there's a unifying reason why programming languages languish in obscurity; it's certainly not because they're "too powerful." What does "powerful" even mean? I used to care more about comparing programming languages, but I mostly don't these days. Actually used/useful languages mostly just got lucky: C was how you wrote code for Unix; Python was Perl but less funny-looking; Ruby was Rails; JavaScript is your only choice in a web browser; Lisp had its heyday in the age of symbolic AI.
Forth and (R4RS) Scheme are simple to implement, so they're fun toys. Some other languages like Haskell have interesting ideas but don't excel at solving any particular problems. Both toy and general-purpose programming languages are plentiful.
As with big fortunes, no one wants to hear the truth that a lot of them exist due to simple luck. There is a significant amount of post-hoc rationalization explaining the success by some almost magical virtues. Or even explaining the success by the lack of such virtues: "worse is better" and so on.
It just tells you the top N words by frequency in its input (default N=100) with words of the same frequency ordered alphabetically and all words converted to lowercase. Knuth's version was about 7 pages of Pascal, maybe 3 pages without comments. It took akkartik 50 lines of idiomatic, simple Lua. I tried doing it in Perl; it was 6 lines, or 13 without relying on any of the questionable Perl shorthands. Idiomatic and readable Perl would be somewhere in between.
#!/usr/bin/perl -w
use strict;
my $n = @ARGV > 1 ? pop @ARGV : 100;
my %freq;
while (my $line = <>) {
    for my $w ($line =~ /(\w+)/g) {
        $freq{(lc $w)}++;
    }
}
for my $w (sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq) {
    print "$w\t$freq{$w}\n";
    last unless --$n;
}
I think Python, Ruby, or JS would be about the same.
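To make that comparison concrete, here's a rough sketch of the same program in Python (my own guess at an idiomatic equivalent, not anything from the thread):

```python
import re
from collections import Counter

def top_words(text, n=100):
    """Top-n (word, count) pairs: descending count, ties alphabetical, lowercased."""
    freq = Counter(re.findall(r"\w+", text.lower()))
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

print(top_words("the cat and the dog and the bird", 3))
# → [('the', 3), ('and', 2), ('bird', 1)]
```

About the same size as the Perl, as you'd expect; the standard-library `Counter` and `re` do the heavy lifting.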
Then I tried writing a Common Lisp version. Opening a file, iterating over lines, hashing words and getting 0 as default, and sorting are all reasonably easy in CL, but splitting a line into words is a whole project on its own. And getting a command-line argument requires implementation-specific facilities that aren't standardized by CL! At least string-downcase exists. It was a lark, so I didn't finish.
(In Forth you'd almost have to write something equivalent to Knuth's Pascal, because it doesn't come with even hash tables and case conversion.)
My experience with Smalltalk is more limited but similar. You can do anything you want in it, it's super flexible, the tooling is great, but almost everything requires you to just write quite a bit more code than you would in Perl, Python, Ruby, JS, etc. And that means you have more bugs, so it takes you longer. And it doesn't really want to talk to the rest of the world—you can forget about calling a Squeak method from the Unix command line.
Smalltalk and CL have native code compilers available, which ought to be a performance advantage over things like Perl. Often enough, though, it's not. Part of the problem is that their compilers don't produce highly performant code, but they certainly ought to beat a dumb bytecode interpreter, right? Well, maybe not if the program's hot loop is inside a regular expression match or Numpy array operation.
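A toy illustration of that last point (my own sketch, nothing from the thread): both functions below count words, but the first spins its hot loop in interpreted bytecode while the second hands the same scan to the compiled regex engine, which is why the "dumb bytecode interpreter" often wins in practice:

```python
import re

# Hot loop in interpreted bytecode: examine each character in Python.
def count_words_loop(s):
    count, in_word = 0, False
    for c in s:
        alpha = c.isalpha()
        if alpha and not in_word:
            count += 1
        in_word = alpha
    return count

# Same scan, but the hot loop runs inside the C regex engine.
def count_words_re(s):
    return len(re.findall(r"[A-Za-z]+", s))

text = "the quick brown fox " * 50_000
assert count_words_loop(text) == count_words_re(text) == 200_000
```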
And a decent native code compiler (GCC, HotSpot, LuaJIT, the Golang compilers, even ocamlopt) will beat any CL or Smalltalk compiler I have tried by a large margin. This is a shame because a lot of the extra hassle in Smalltalk and CL seems to be aimed at efficiency.
(Scheme might actually deliver the hoped-for efficiency in the form of Chez, but not Chicken. But Chicken can build executables and easily call C. Still, you'd need more code to solve this problem in Scheme than in Lua, much less Ruby.)
—·—
One of the key design principles of the WWW was the "principle of least power", which says that you should do each job with the least expressive language that you can. So the URL is a very stupid language, just some literal character strings glued together with delimiters. HTML is slightly less stupid, but you still can't program in it; you can only mark up documents. HTTP messages are similarly unexpressive. As much as possible of the Web is built out of these very limited languages, with only small parts being written in programming languages, where these limited DSLs can't do the job.
Lisp, Smalltalk, and Forth people tend to think this is a bad thing, because it makes some things—important things—unnecessarily hard to write. Alan Kay has frequently deplored the WWW being built this way. He would have made it out of mobile code, not dead text files with markup.
But the limited expressivity of these formats makes them easier to read and to edit.
I have two speech synthesis programs, eSpeak and Festival. Festival is written in Scheme, a wonderful, liberating, highly expressive language. eSpeak is in C++, which is a terrible language, so as much as possible of its functionality is in dumb data files that list pronunciations for particular letter sequences or entire words and whatnot. Festival does all of this configuration in Scheme files, and consequently I have no idea where to start. Fixing problems in eSpeak is easy, as long as they aren't in the C++ core; fixing problems in Festival is, so far, beyond my abilities.
(I'm not an expert in Scheme, but I don't think that's the problem—I mean, my Scheme is good enough that I wrote a compiler in it that implements enough of Scheme to compile itself.)
—·—
SQL is, or until recently was, non-Turing-complete, but expressive enough that 6 lines of SQL can often replace a page or three of straightforward procedural code—much like Perl in the example above, but more readable rather than less.
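A small demonstration of that density, using Python's built-in sqlite3 with a made-up orders table (the table and data are hypothetical): one GROUP BY query replaces the usual open-cursor / loop / accumulate-into-a-dict procedural version:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("alice", 10), ("bob", 5), ("alice", 7)])

# One declarative query instead of a page of loop-and-accumulate code.
rows = con.execute("""
    SELECT customer, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()
print(rows)
# → [('alice', 2, 17.0), ('bob', 1, 5.0)]
```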
Similarly, HTML (or JSX) is often many times smaller than the code to produce the same layout with, say, GTK. And when it goes wrong, you can inspect the CSS rules applying to your DOM elements in a way that relies on them being sort of dumb, passive data. It makes them much more tractable in practice than Turing-complete layout systems like LaTeX and Qt3.
—·—
Perl and Forth both have some readability problems, but I think their main difficulty is that they are too error-prone. Forth, aside from being as typeless as conventional assembly, is one of the few languages where you can accidentally pass a parameter to the wrong call.
This sort of rhymes with what I was saying in 02001 in https://paulgraham.com/redund.html, that often we intentionally include redundancy in our expressions of programs to make them less error-prone, or to make the errors easily detectable.
The article in CACM that presents Knuth's solution [1] also includes some criticism of Knuth's approach, and provides an alternative that uses a shell pipeline:
With great respect to Doug McIlroy (in the CACM article), the shell pipeline has a serious problem that Knuth's Pascal program doesn't have. (I'm assuming Knuth's program is written in standard Pascal.) You could have compiled and run Knuth's program on an IBM PC XT running MS-DOS; indeed on any computer having a standard Pascal compiler. Not so the shell pipeline, where you must be running under an operating system with pipes and 4 additional programs: tr, sort, uniq, and sed.
McIlroy also discusses how a program "built for the ages" should have "a large factor of safety". McIlroy was worried about how Knuth's program would scale up to larger bodies of text. Also, Bentley's/McIlroy's critique was published in 1986, which I think was well before there was a major look into Unix tools and their susceptibility to buffer overruns, etc. In 1986, could people have determined the limits of tr, sort, uniq, sed, and pipes--both individually and collectively--when handling large bodies of text? With a lot of effort, yes, but if there was a problem, Knuth at least only had one program to look at. With the shell pipeline, one would have to examine the 4 programs plus the shell's implementation of pipes.
(I'm not defending Pascal; Knuth, Bentley, and McIlroy are always worth reading on any topic -- thanks for posting the link!)
Bringing this back to Forth, Bernd Paysan, who needs no introduction to the people in the Forth community, wrote "A Web-Server in Forth", https://bernd-paysan.de/httpd-en.html . It only took him a few hours, but in fairness to us mortals, it's an HTTP request processor that reads a single HTTP request from stdin, processes it, and writes its output to stdout. In other words, it's not really a full web server because it depends on an operating system with an inetd daemon for all the networking. As with McIlroy's shell pipeline, there is a lot of heavy lifting done by operating system tools. (Paysan's article is highly recommended for people learning Forth, like me when I read it back in the 2000s.)
> splitting a line into words is a whole project on its own
Is it[1]? My version below accumulates alphabetical characters until it encounters a non-alphabetical one, then increments the count for the accumulated word and resets the accumulator.
It does look a lot like what I was thinking would be necessary. About 9 of the 19 lines are concerned with splitting the input into words. Also, I think you have omitted the secondary key sort (alphabetical ascending), although that's only about one more line of code, something like
#'(lambda (a b)
    (or (< (car a) (car b))
        (and (= (car a) (car b))
             (string> (cadr a) (cadr b)))))
Because the lines of code are longer, it's about 3× as much code as the verbose Perl version.
In SBCL on my phone it's consistently slower than Perl on my test file (the King James Bible), but only slightly: 2.11 seconds to Perl's 2.05–2.07. It's pretty surprising that they are so close.
Were I trying to optimise this, I would test to see if a hash table of alphabetical characters is better, or just checking (or (and (char>= c #\A) (char<= c #\Z)) (and (char>= c #\a) (char<= c #\z))). The accumulator would probably be better as an adjustable array with a fill pointer allocated once, filled with VECTOR-PUSH-EXTEND and reset each time. It might be better to use DO, initializing C and declaring its type.
Also worth giving it a shot with (optimize (speed 3) (safety 0)) just to see if it makes a difference.
Yes, definitely more verbose. Perl is good at this sort of task!
> You can do anything you want in it, it's super flexible, the tooling is great, but almost everything requires you to just write quite a bit more code than you would in Perl, Python, Ruby, JS, etc.
Given that Smalltalk precedes JS by many years: if it is true, then it was not always true.
Given that Smalltalk was early to the GUI WIMP party: if it is true, then it was not always true for GUI WIMP use.
I've concluded that Forth isn't as powerful as Lisp because it can't do lists or heaps. STOIC addresses these and other limitations. Unfortunately it's got the least search-friendly language name ever.
One thing I note is that all of the languages you name are very far from the machine. But Forth is not close to the modern machine either. Note that it only has two integer types, and the larger one can end up aligned either way unless you make sure it is not.
> One thing I note is that all of the languages you name are very far from the machine
Common Lisp is one step away from assembly: you can disassemble any function, and doing so is, in fact, a valid strategy if one wants to check the compiler's optimizations.
I googled a bit about how Common Lisp is compiled. Apparently it is possible to add type hints and ensure that parameters/variables have a certain type. If one uses that for most code, it would potentially be enough to qualify as being close to the machine.
To me it means that one attempts to use the machine well, i.e., avoids introducing overheads that have nothing to do with the problem one is trying to solve. As an example of something very far from the machine, imagine wanting to add some integers together. One can do this in untyped lambda calculus by employing Church numerals. If one looks at the memory representation, the numerals are now linked structures of size equal, or proportional, to the number; yet the machine actually has instructions to add numbers far more efficiently. For this discussion, maybe the most relevant example is that using dynamic typing for algorithms that don't need it is distant from the machine, because every value now carries a runtime type label that is not actually needed: if your program could be statically typed, the type labels would be known in advance and are therefore redundant.
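The Church-numeral overhead is easy to see in a few lines of Python (an illustrative sketch only): a numeral n is "apply f n times", so addition composes function applications instead of using the machine's add instruction:

```python
# Church numerals: n is represented as "apply f n times to x".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Decode by counting function applications -- O(n) work just to read the value.
    return n(lambda k: k + 1)(0)

two   = succ(succ(zero))
three = succ(two)
assert to_int(add(two)(three)) == 5  # vs. a single machine ADD for native ints
```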
They scale extremely effectively to large problems solved by a team size of one, maybe two.
The story goes that changing the language to fit how you're thinking about the problem is obstructive the rest of the people thinking about the same problem.
I'm pretty sure this story is nonsense. Popular though.
frankly it's a miracle any of them scaled at all, such popularity mostly comes down to an arbitrary choice made decades ago by a lucky vendor instead of some grand overarching design