Hacker News | mikehollinger's comments

(Needs a 2017 tag.)


Yeah, here I was thinking there's something new coming out of OpenAI that's not another LLM/diffusion model.


They (AI Corp. Execs) seem to think LLMs will be central to AGI. They are the experts I guess, but I have my doubts.


My cynical side says "exec" and "expert" are mutually exclusive.


From a different robot (Boston Dynamics' new Atlas) - the system moves at a "reasonable" speed. But watch at 1m20s in this video[1]. You can see it bump and then move VERY quickly -- with speed that would certainly damage something, or hurt someone.

[1] https://www.youtube.com/watch?v=F_7IPm7f1vI


Especially if holding a knife or something sharp.


This doesn’t capture work that’s happened in the last year or so.

For example, some former colleagues' timeseries foundation model (Granite TS) was doing pretty well when we were experimenting with it. [1]

An aha moment for me was realizing that one way to think of anomaly models is that they're effectively forecasting the next N steps, and then noticing when the actual measured values are "different enough" from what was expected. This is simple to draw on a whiteboard for one signal, but it's pretty neat that it works when the data is multivariate. (Rough sketch of the idea below.)

[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1
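
A minimal sketch of that forecast-then-compare idea for a single signal, with a rolling mean standing in for whatever forecaster you'd actually use (the window and threshold here are made up for illustration):

  import numpy as np

  def flag_anomalies(series, window=24, z_thresh=3.0):
      # Forecast each step as the mean of the previous `window` points,
      # then flag points that are "different enough" from that forecast.
      series = np.asarray(series, dtype=float)
      flags = np.zeros(len(series), dtype=bool)
      for t in range(window, len(series)):
          history = series[t - window:t]
          forecast = history.mean()      # stand-in for a real forecasting model
          spread = history.std() + 1e-9  # scale of "normal" deviation
          flags[t] = abs(series[t] - forecast) > z_thresh * spread
      return flags

  # Slow linear trend with a spike injected at t=150
  t = np.arange(300)
  signal = 0.01 * t
  signal[150] += 5.0
  print(np.where(flag_anomalies(signal))[0])  # -> [150]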


My similar recognition was when I read about isolation forests for outlier detection[0]. When a point is easy to separate from the rest of the data, something is off.

[0] https://scikit-learn.org/stable/modules/generated/sklearn.en...
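
For anyone curious, a tiny scikit-learn sketch of the same idea (the data and contamination rate are invented for illustration):

  import numpy as np
  from sklearn.ensemble import IsolationForest

  rng = np.random.default_rng(0)

  # 500 "normal" 2-D points plus a handful of obvious outliers
  normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
  outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))
  X = np.vstack([normal, outliers])

  # Isolation forests score points by how few random splits it takes to
  # isolate them; easy-to-isolate points get labeled as outliers (-1).
  labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
  print(np.where(labels == -1)[0])  # the injected outliers (indices 500-504) should be flagged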


what were you thinking then before your aha moment? :D


> what were you thinking then before your aha moment? :D

My naive view was that there was some sort of "normalization" or "pattern matching" happening. Like - you can look at a trend line that generally has some shape, and notice when something changes or there's a discontinuity. That's a very simplistic view, but I assumed the tooling was doing regressions and noticing when something fell outside a statistical norm, something like a k-means analysis. Which works, sort of, but is difficult to generalize.


> Like - you can look at a trend line that generally has some shape, and notice when something changes or there’s a discontinuity.

What you describe here is effectively forecasting what you expect to happen and then noticing when reality deviates from it.


To me it's always amazing how people look at what's evidently obvious to me and say it's profound.


Especially if they are self-assessed "distinguished engineers and master inventors".


Care to share the contexts in which someone needs a zero-shot model for time series? I have just never come across one in which you don't have some historical data to fit a model and go from there.


In this case I don't think zero-shot means no context. I think it's used in contrast to fine-tuning the model parameters on your data.

> TTM-1 currently supports 2 modes:

> Zeroshot forecasting: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).

> Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast
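
Loosely, the difference is something like this throwaway sklearn sketch (not the TTM API): "zero-shot" applies the already-trained model untouched, while "fine-tuned" takes a few extra gradient steps on a slice of the target data.

  import numpy as np
  from sklearn.linear_model import SGDRegressor

  rng = np.random.default_rng(0)

  # "Pre-training" data from one domain: y = 2x + noise
  X_pre = rng.uniform(-1, 1, size=(1000, 1))
  y_pre = 2.0 * X_pre[:, 0] + rng.normal(0, 0.1, size=1000)
  model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
  model.fit(X_pre, y_pre)

  # Target domain is similar but shifted: y = 2x + 0.5
  X_tgt = rng.uniform(-1, 1, size=(200, 1))
  y_tgt = 2.0 * X_tgt[:, 0] + 0.5

  # "Zero-shot": use the pre-trained model directly, no training on target data
  zero_shot_mse = np.mean((model.predict(X_tgt) - y_tgt) ** 2)

  # "Fine-tuned": a few extra passes over a small subset of the target data
  for _ in range(10):
      model.partial_fit(X_tgt[:50], y_tgt[:50])
  finetuned_mse = np.mean((model.predict(X_tgt) - y_tgt) ** 2)

  print(zero_shot_mse, finetuned_mse)  # fine-tuning should shrink the error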


> About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy....

and

> a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability.

I still hold that the innovations we've seen as an industry with text will transfer to data from other domains. And there's an odd misbehavior I've now seen play out twice -- back in 2017 with vision models (please don't shove a picture of a spectrogram into an object detector), and today: people trying to coerce text models to work with data series, or (again!) pictures of charts, rather than paying attention to timeseries foundation models, which can work directly on the data.[1]

Further, the tricks we're seeing with encoder / decoder pipelines should work for other domains, and we're not yet recognizing that as an industry. For example, Whisper and the emerging video models are getting there, but think about multi-spectral satellite data or fraud detection (a type of graph problem).

There's lots of value to unlock from coding models. They're just text models. So what if you were to shove an abstract syntax tree in as the data representation, or the intermediate code from LLVM or a JVM or whatever runtime, and interact with that? (Rough sketch of the AST idea below.)

[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1 - shout-out to some former colleagues!
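
As a toy illustration of the AST idea (not how any production pipeline necessarily works), Python's own ast module already gives you a structured stream you could tokenize instead of raw source text:

  import ast

  # A tiny program, represented as text...
  source = "def add(a, b):\n    return a + b\n"

  # ...and as a syntax tree
  tree = ast.parse(source)

  # Flatten the tree into a crude "token" stream of node types (a real
  # pipeline would also encode identifiers, literals, structure, etc.)
  node_stream = [type(node).__name__ for node in ast.walk(tree)]
  print(node_stream)
  # something like: ['Module', 'FunctionDef', 'arguments', 'Return',
  #                  'arg', 'arg', 'BinOp', 'Name', 'Add', 'Name', 'Load', 'Load']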


Andrej Karpathy: https://twitter.com/karpathy/status/1835024197506187617

> It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

> They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".
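
A toy version of that reduction, with an invented binning scheme rather than any real model's tokenizer: quantize a continuous signal into a small discrete vocabulary and it becomes just another token stream.

  import numpy as np

  def to_tokens(signal, n_bins=16):
      # Quantize a continuous signal into integer tokens 0..n_bins-1 so it
      # can be fed to anything that models streams of discrete tokens.
      signal = np.asarray(signal, dtype=float)
      edges = np.linspace(signal.min(), signal.max(), n_bins + 1)[1:-1]
      return np.digitize(signal, edges)

  t = np.arange(100)
  tokens = to_tokens(np.sin(t / 5.0))
  print(tokens[:20])  # a short "sentence" over a 16-symbol vocabulary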


But you need enormous amounts of training data and enormous amounts of compute to train new models, right? So it's kind of useless advice for most people, who can't just parse GitHub repositories and train their own model on AST tokens. They have to use existing open-source models or APIs, and those happen to use text.


The environmental arguments are hilarious to me as a diehard crypto guy. The ultimate answer to the "waste of electricity" arguments is that energy is a free market and people pay the price if it's useful to them. As long as the activity isn't illegal, whether it's training LLMs or mining bitcoins, it doesn't matter. I pay for the electricity I use.


Do you think we should make it illegal to mine coins if the majority of people think the environmental cost is too high?


If a law is passed then that’s the law


One argument against that line of thinking is that energy production has negative externalities. If you use a lot of electricity, its price goes up, which incentivizes more electricity production, which generates more negative externalities. It will also raise the costs for other consumers of electricity.

Now that alone is not yet an argument against crypto currencies, and one person's frivolous squandering of resources is another person's essential service. But you can't simply point to the free market to absolve yourself of any responsibility for your consumption.


Unintentionally, the energy demands of cryptocurrencies, and of data centers in general, have finally motivated utilities (and their regulators) to start building out the massive new grid capacity needed for our glorious renewable-energy future.

Acknowledging that facilitating scams (e.g. pig butchering) is cryptocurrency's primary (sole?) use case, I'm willing to look the other way if we end up with the grid we need to address the climate crisis.


To pretend romance / affinity scams and crime were created by crypto is absurd. It’s fair to argue crypto made crime more efficient, but it also made the responsible parties quicker to patch holes.

The primary use case of crypto is to protect wealth from a greedy, corrupt, money-printing state. Everything else is a sideshow


> primary use case of crypto is to protect wealth

Merely trading governments for corporations.

> Everything else is a sideshow

Agreed. Crypto is endlessly amusing.


What corporation made bitcoin?


Apologies, I assumed you knew what cryptocurrency is and how it works. My bad.

I'm really not well suited to explain this stuff. Here's an article for a general (layperson) audience to help you on your journey. https://www.cbsnews.com/news/cryptocurrency-bitcoin-virtual-...

Happy hunting!


I have been intimately involved with cryptocurrency since 2010


I greatly despise video games. Why is that not a waste of energy? If you are entertained by something, even if it serves no human purpose other than entertainment, is that not a valid use of electricity?


https://c4model.com/ is very useful for this. :-)

I've told it before, but when we were doing some clean sheet work a while ago I decided to use the C4 model and drew out the obligatory "Context" diagram with "user" "phone" "laptop" "app" sort of stuff.

I found them silly, and (honestly) I still find that if I see one "in the wild" with no further elaboration, I become suspicious.

However two hours later, because of that silly context diagram, I realized that we had both an online and a semi-disconnected mobile app that could be offline for hours, and that certain things -had- to use a queue and expect an arbitrary amount of time for a task to run, and it completely changed how we thought about the core of how we implemented something pretty important.

Sold. :-)


Most of what I do these days is silly drawings in excalidraw. As a result I seem to understand more of our systems than anyone else. I'll even export the SVGs and commit them to our repos


"A picture is worth a thousand words" is just gabble until you draw one worth a million of them :)


Relationships and sequences.

But if you want to talk about REAL complex systems talk to a microprocessor logic owner or architect trying to shoot a bug.

A while ago we found a bug that could crash a system (fixed in a new RIT of the chip) if we did X then Y in state … we didn’t know.

Listening to the various leads for the sub-units on a phone call, trying to reason about what was happening, I found myself visualizing this increasingly complicated steam-powered machine, with parts sprawling, tiny gears whirring, and bits zipping about whenever X happened.

It was humbling.


> compression mostly makes imperfections go away

The ultimate compression is to reduce the video clip to a latent space vector representation to be rendered on device. :)

Just give us a few more revs of Moore’s law for that to be reasonable.

edit: found a patent… https://patents.google.com/patent/US11388416B2/en


Eh. A better analogy: the output would decide that there needs to be conduit between floors for chilled water, hot water, and sewage, dutifully make several 4" pipes, and then from floor to floor forget which is which.


"No, 'c_water' means 'clean_water', it has nothing to do with the temperature, so that's why you got burnt; also 'gray water' has nothing to do with a positional encoding scheme, and 'garbage collection' is just a service that goes around and picks up your discarded post-it notes - you didn't take that rotting fruit out of the bowl, so how could we be expected to know you were done with it?"


I am fascinated by how complex JIRA is. We evaluated it in 2008. It seemed fine enough.

Looking at it 16 years later, and… what is this nonsense? It’s so customizable that it’s loaded with footguns.


I have a theory: back in 1996, Bugzilla worked very well. It had been designed, and honed, by a bunch of senior developers who also used the bug management system they wrote. So lots of dog food eaten. IIRC it was written in Perl.

Then, I believe, someone decided to make a "Bugzilla in Java" because they didn't like Perl (reasonable).

But whoever that was didn't have the deep knowledge of how the thing was supposed to be used. Lacking that insight, they created a "Swiss Army chainsaw", simultaneously implementing everything and nothing.

Next, some MBAs got hold of the thing, and made everything 10X worse.

Meanwhile, Bugzilla is still the same and still the best software project management tool, if you know how it's intended to be used.


In fact, the name “Jira” is a reference to Bugzilla. Atlassian says:

https://confluence.atlassian.com/plugins/servlet/mobile?cont...

> We originally used Bugzilla for bug tracking and the developers in the office started calling it by the Japanese name for Godzilla, Gojira (the original black-and-white Japanese Godzilla films are also office favourites). As we developed our own bug tracker, and then it became an issue tracker, the name stuck, but the Go got dropped - hence JIRA.


I had some thoughts on Jira: https://honeypot.net/2021/10/01/jira-is-a.html

TL;DR it's so completely customizable that it's more like a DIY project management toolkit. Pivotal and Linear have/had a more opinionated approach: "here's how you manage projects. Good luck and have fun!" Jira almost seems to push otherwise rational people to build the most baroque processes imaginable.


> Jira almost seems to push otherwise rational people to build the most baroque processes imaginable.

PMs gotta justify their jobs somehow.


I love a good PM. Trust me, you don't want to be responsible for all the reporting and status updates and all that they have to deal with daily.

It's just that I've never worked with someone I considered a good PM who loved Jira. The great ones wouldn't care if we did all the planning on papyrus because they were more concerned with getting things done than documenting them in excruciating detail.


It's the super-customizable ones that end up adopted across large enterprises. Flexible workflows, I guess. E.g. Salesforce, Jira.


Semi-related story with some inside baseball:

There are quite a few memorable words you can spell using 32 or 64 bits—like BA5EBA11. This is the story of me -not- choosing one of those.

These bit-pattern words are handy because they’re easy to recognize, especially in a random memory dump.

On my first “real” assignment, I was writing real-time embedded C code for a 16-bit processor that communicated with a host microprocessor on a server. We needed to run periodic assurance tests across a bus to ensure reliable communication with the host since we weren't constantly using the bus.*

We were given an unused register address on the host processor and told to write whatever we wanted to it. The idea was to periodically write a value, read it back, and if we encountered any write errors, incorrect reads, or failures, we’d declare a comm error and degrade the system in a controlled manner.

Instead of using zeros or something like 0xDEADBEEF, I decided to write 0x4D494B45 - "MIKE" in ASCII. It was unique, unlikely to be tampered with, it worked, and no one argued with me. The code shipped, the product shipped, and all was well. We even detected legitimate hardware errors, which I thought was pretty cool.

Fast forward two generations of systems, and long after I’d moved on from that team, the code had been ported around but that assurance test remained unchanged. Everything was fine until they brought up a new generation of systems, flipped on the firmware for that device, and 10 seconds later, my assurance test clobbered an important register. The entire system promptly checkstopped and crashed. It took the team days to figure out what was wrong, and I had to explain myself when they found "MIKE" staring back at them from the memory dump.

That was a fun project. ;-)

* Note: It would've been bad if our device went out to lunch because we were responsible for energy management of the server. If the power budget was exceeded and we couldn't downclock and downvolt the processor, something might have crashed or been damaged.


That was a great “Mike says no” moment.


What a funny way to think about it. Gordo again gets all the blame.


vs. the '80s commercial tagline about "Mikey likes it"

