I am in an enterprise environment. I had non-stop issues with OneDrive, such as the OneDrive process always pegged at 100% CPU, files not syncing, files going cloud-only and inaccessible without Internet, and issues in git repos. I go out of my way to avoid OneDrive at any cost, including the cost of using a separate backup system for my files.
> we don't send images to the model at query time. We describe each image once, at indexing time, with a cheap vision model, store the descriptions as text, and retrieve them alongside ordinary text chunks
This is what I've been doing in my Obsidian infodump for a while. If I know that an image is important, I generate a text description (Mermaid if possible, English if not) and paste it after the image in a block. This lets agents "see" the image even if they can't really see it. Though my process is manual, the improvements in outcomes for agents that rely on text search/retrieval are very real and worth it.
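For anyone who wants to automate the manual version, here is a rough sketch of the idea in Python. Everything in it is an assumption for illustration: `describe_image` is a hypothetical stand-in for the one-time vision-model (or hand-written) description, the file name `auth-flow.png` is made up, and the keyword-overlap scoring is a toy replacement for whatever text search or embedding retrieval you actually use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str                         # the searchable text
    image_path: Optional[str] = None  # set only when the text describes an image


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a toy substitute for a real retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def describe_image(path: str) -> str:
    """Hypothetical placeholder: runs once at indexing time, never at query time."""
    canned = {
        "auth-flow.png": (
            "flowchart: browser sends login to gateway, "
            "gateway checks token with auth service"
        ),
    }
    return canned[path]


def build_index(text_chunks: list[str], image_paths: list[str]) -> list[Chunk]:
    """Image descriptions sit in the same index as ordinary text chunks."""
    index = [Chunk(text=t) for t in text_chunks]
    index += [Chunk(text=describe_image(p), image_path=p) for p in image_paths]
    return index


def search(index: list[Chunk], query: str, k: int = 2) -> list[Chunk]:
    """Rank chunks by how many query words they share, dropping zero-overlap chunks."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c.text)), c) for c in index]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


index = build_index(
    ["Deploys run nightly from the main branch.", "The gateway rate-limits each API key."],
    ["auth-flow.png"],
)
hits = search(index, "how does the gateway check a login token")
for hit in hits:
    print(hit.image_path or "(text)", "|", hit.text)
```

Because each hit keeps its `image_path`, a later step can still pass the original image to the generation model if you want that; the search itself only ever touches text.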
For a RAG project for a client with a lot of PDFs and PowerPoints with images, I used ColPali a year ago. I see the provider ColiVara is still online but it seems to have fizzled out.
Retrieving based on text and then giving the generation model the image instead is much smarter than retrieving based on image. Image-based retrieval is slow and expensive.
Same with giving the model an image vs a structured representation of it.
Leaps and bounds better!
I don't think I benchmarked it.
But the experience was that it was able to find small details in PDFs, in technical diagrams, and this was really not captured well at all with OCR.
In general, I think OCR should be used more as an add-on for retrieving data, not given to the generation model itself. Similar to retrieving based off a text description and then giving the generation model the image.
Most diagrams I come across are basically boxes and arrows, which are representable as Mermaid flowcharts without losing information. The layout of the Mermaid output will usually look different, but that is not typically what matters. ChatGPT is quite good at creating Mermaid flowcharts from random box-and-arrow diagram images.
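To make the "boxes and arrows" point concrete, here is a minimal sketch, assuming the diagram has already been reduced to a list of boxes and a list of labelled arrows (by hand or by a model). The function name `to_mermaid` and the example nodes are made up for illustration; the output uses standard Mermaid flowchart syntax.

```python
def to_mermaid(nodes: dict[str, str], edges: list[tuple[str, str, str]]) -> str:
    """Render boxes (id -> label) and arrows (src, dst, label) as a Mermaid flowchart."""
    lines = ["flowchart TD"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for src, dst, label in edges:
        # An empty label means a plain arrow.
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


diagram = to_mermaid(
    nodes={"A": "Client", "B": "Load balancer", "C": "App server"},
    edges=[("A", "B", "HTTPS"), ("B", "C", "")],
)
print(diagram)
```

The boxes and connections survive as plain, searchable text; only the positions on the page are lost, which is the part that usually doesn't matter.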
Eh, maybe for the more luxurious properties? But plenty of landlords are operating on tight margins and they’re not legally allowed to raise rent by more than some measure of inflation reported by the state each year.
But you’re right, the tax would have to be much more punitive to cross over into the red.
If it does make it more challenging to justify the business of being a landlord, I’m all for it though. It's a step towards the end goal of more New Yorkers who want to own their primary residence actually owning it.
That is not the case, sorry. Pre-2015 Wikipedia was as honest and unbiased as we can get. Now the political, historical, and philosophical segments of English Wikipedia are very biased and I cannot recommend or support it.
> furiously hammering on my laptop “WHAT THE FUCK DID YOU DO???”. The recipient of these tirades is, you might have guessed, a coding agent. It’s completely pointless, I know.
I believe it's worse than pointless. IMO adding such things to the context "configures" the AI to reproduce the statistics of conversations where people swore, shouted, and were unprofessional (despite the alignment tuning and all that), where quality content is rarer to find. So this is bound to decrease the quality of the LLM output.
Agreed. These accounts of people having genuine emotional responses to LLM chats, even going as far as to spend tokens berating them, are very curious. I would be surprised to learn that SOTA models respond optimally to anything other than dispassionate problem-solving, or that scolding per se serves any productive purpose.
Of course we all swear at our computers every now and then, but for me it's always been in good fun. It's just a sarcastic joke that adds some levity and self-amusement to an otherwise arduous debugging process, not generally an actual insinuation of malfunction (or malice) on the part of the hardware/OS/toolchain. I'd assumed that "half the job is cursing at the machine until it obeys you" was a big in-joke amongst the profession, but the LLM era seems to be exposing a divide in how tongue-in-cheek that statement really is.
We may be in the last golden age of AI, where experienced professionals still exist who can code manually, and AI already exists that can code automatically, and when the former use the latter skillfully, wonders happen. This magical intersection may not exist in the future, or may become very rare.
I think as long as it continues to be tangibly better these people will still exist and the intersection will continue to be valuable enough to survive.
> as long as it continues to be tangibly better these people will still exist
Sure. But how long will that last? LLMs are getting better at programming much faster than I am.
Imagine a plot with time on the X axis and LLM skill on the Y axis. The line goes up and to the right. On the left is GPT3, or GPT3.5 with the very first glimmers of programming ability just a few short years ago. In the middle is Opus 4.7 now.
Where's the intersection point, where AI skill is higher than that of humans? Less than 10 years. I'd guess less than 5 years.
I think the problem is that coding is not wholly a 'writing code' problem. It's a translation from idea to outcome. Often I think the bad code generated by an LLM has less to do with its 'ability' and more to do with an instruction that hasn't adequately accounted for the range of code that could satisfy the criteria. I'm not sure how a newer model can improve on this per se. Sure, there will be improvement on outright mistakes, but for me at least, those have more or less been and gone with any model released in the last 6 months.
I was coding something with Claude the other day. It got the program working by all externally observable metrics, but when I went into the code it was full of DRY violations. It made a bunch of interrelated - but separate - traits for some concepts which simply didn't fit together.
I asked it to look at the code and come up with better factorings, but it failed. I ended up manually reworking several thousand lines of code myself, via my IDE. It took days.
I'd like a Claude-of-the-future to be able to come up with beautiful ways to factor the code itself. Amongst the correct solutions, pick one which is conceptually simple. Write the code in a way that makes subsequent changes easier to write. If I were doing RL with Claude, I'd consider directing it toward solutions which allow subsequent changes to be implemented with as little effort as possible.
I think a better way to think about it is - what are the invariants of our current architecture? Why can't you tell Claude to build you a $1B business, make no mistakes?
I have no doubt they will be better programmers than almost every human that has ever existed. But the role of a SWE will expand to fill the gaps that the LLM paradigm hasn't filled:
- Accountability
- Long term architectural vision, goal setting
- Ever-changing business context
- Mercurial executives, people problems, relationships etc...
Token efficiency is going to be the next big thing.
Tokenmaxxing an army of juniors will destroy your business through slop-induced tech debt and API costs. A senior who uses AI but is token efficient will be like rocket fuel.
And you act like there hasn't been a loss once we moved away from the master craftsman style of building to the professionalized architect style of building. We cannot make a gothic cathedral anymore. Also, CAD homogenized the built environment significantly. And we have been losing a lot of traditional, artisanal craft and art forms over the past century.
Did they? Genuine question, because I do wonder if people in some industries in the past were ever anxious about these specific things (especially skill attrition).
> I do wonder if people in some industries in the past were ever anxious about these specific things (especially skill attrition).
I've spoken with some people (now in their 60s & 70s) who worried about skill atrophy in their line of work.
First they worried about atrophy. Then they watched skill dry up. Now they know it's not available to buy anywhere. In the better cases the skills still exist, but entirely overseas.
These are people I could recognize as sharp engineers, even if I don't know their domains at all. I had to take them at their word about the value in what was lost. The problem is that it's easy to assume that business (or at least society) would prevent degradation of valuable knowledge over time.
My experience with ChatGPT as a search engine - it is totally paranoid about checking and re-checking its answers by referencing them in multiple places (I usually read its thinking output). I have not seen an outright hallucination for at least a year. (It is of course a different situation with Google's "AI summary" which is wrong half of the time.)
Ironically I quit using ChatGPT a while back. I decided to put it through its paces and asked it some rather detailed questions about a range of topics that I have significant domain knowledge on. Without exception the responses I got back were glibly superficial, to the point that they were almost totally devoid of meaningful information. The AI summary on Google search results is so bad it represents an assault on reason.