1) The money situation has changed over the years, and they've had times where t...

1) The money situation has changed over the years, and they've had times where things have boomed or busted - it's been a while since I left but I think they're still in a "boom" phase. There are a lottt more projects with different companies or organizations than the ones listed on the wiki, but they tend to be pretty secretive and I won't name names.

The categories of projects that I was familiar with were basically proof of concept work for companies or government R&D contracts. There are lots of big companies that will throw a few million at a long-shot AI project just to see if it pays off, even if they don't always have a very clear idea of what they ultimately want or a concrete plan to build a product around it. Sometimes these would pay off, sometimes they wouldn't but we'd get by on the initial investment for proof of concept work. Similarly, organizations like DARPA will fund multiple speculative projects around a similar goal (e.g. education - that's where "Mathcraft" came from IIRC) to evaluate the most promising direction.

There have been a few big hits in the company's history, most of which I can't talk about. The hits have basically been in very circumscribed knowledge domains where there's a lot of data, a lot of opportunity for simple common sense inferences (e.g. if Alice worked for the ABC team of company A at the same time Bob worked for the XYZ team of company B and companies A and B were collaborating on a project involving the ABC and XYZ teams at that same time, then Alice and Bob have probably met) and you have reason to follow all those connections looking for patterns, but it's just too much data for a human to make a map of. Cyc can answer questions about probable business or knowledge relationships between individuals in large sets of people in a few seconds, which would be weeks of human research and certain institutions pay a high premium for that kind of thing.

2) Oh god. Get ready. Here's a 10k foot overview of a crazy thing. All this is apparent if you use OpenCyc so I feel pretty safe talking about it. Cyc is divided into the inference engine and the knowledge base. Both are expressed in different custom LISPy dialects. The knowledge base language is like a layer on top of the inference engine language.

The inference engine language has LISPy syntax but is crucially very un-LISPy in certain ways (way more procedural, no lambdas, reading it makes me want to die). To build the inference engine, you run a process that translates the inference code into Java and compiles that. Read that closely - it doesn't compile to JVM bytecode, it transpiles to Java source files, which are then compiled. This process was created before languages other than Java targeting the JVM were really a thing. There was a push to transition to Clojure or something for the next version of Cyc, but I don't know how far it got off the ground because of 30 years of technical debt.

The knowledge base itself is basically a set of images running on servers that periodically serialize their state in a way that can be restarted - individual ontologists can boot up their own images, make changes and transmit those to the central images. This model predates things like version control and things can get hairy when different images get too out of sync. Again, there was an effort to build a kind of git-equivalent to ease those pains, which I think was mostly finished but not widely adopted.

There are project-specific knowledge base branches that get deployed in their own images to customers, and specific knowledge base subsets used for different things.