You might want to update the README where it says to run "./conda.sh" - it should mention that there are hard-coded paths in this script that need to be changed (the first line is CONDA_BASE="/home/rawalk/anaconda3").
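Something like this near the top of the script would make it portable - just a sketch on my part, assuming the rest of the script only needs CONDA_BASE to locate conda's standard shell hook (the $HOME/anaconda3 fallback is only an illustrative default, not from the repo):

    # Derive the conda base dir instead of hard-coding a user's home path
    CONDA_BASE="$(conda info --base 2>/dev/null || echo "$HOME/anaconda3")"
    # conda's own shell hook, shipped with every Anaconda/Miniconda install
    source "$CONDA_BASE/etc/profile.d/conda.sh"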
I wonder if there is something here that requires conda rather than a simple requirements.txt or something like that. Every time I try conda it seems to mess up my entire environment (usually I just use pyenv w/ virtualenv). But trying with conda now, keeping my fingers crossed...
EDIT: yep, as usual, conda failed me (fresh install of miniconda). "./conda.sh" finished with exit code 0 and said "Installation done!". Yet I have no new conda environment (I think I saw some warnings and errors deep in the logging output).
I see now that this has various requirements.txt files for the different sub-projects - looks like I'll create a pyenv-virtualenv and do things manually to try to get an example working...
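Roughly the plan, in case anyone else goes this route (the Python version, env name, and sub-project path below are placeholders I made up, not anything from the repo):

    # isolated env via pyenv-virtualenv instead of conda
    pyenv install 3.10.14
    pyenv virtualenv 3.10.14 meta-release
    pyenv activate meta-release
    # install one sub-project's pinned deps at a time
    pip install -r <sub-project>/requirements.txt

One env per attempt keeps any breakage isolated, which is the whole reason I avoid conda in the first place.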
Always curious what the license allows with these Meta research drops; it seems all over the place… can this be used commercially? (specifically inference) It's Creative Commons with some parts Apache?
The Creative Commons license seems to be Non-Commercial [0], meaning it's very interesting and quite inspiring, but ultimately useless outside of research and side projects.
> Why would you spend any of your finite attention here?
Because it is research. And it's open research, unlike that of OpenAI or (for the most part) modern Google DeepMind or xAI.
It's completely fair game for non-researchers to ignore that, but even non-researchers benefit from a faster pace of developing an understanding of how to do this kind of magic.
Thank you people at Meta Research for this release!
I’ve seen papers that combined pre-trained vision and language models, trained them together on image/text pairs, and then used the new model for things like text extraction. Could your model be plugged into such a design?
I've always wanted to scan whole books by just feeding pictures of their pages into an AI, preferably with minimal labeling requirements. I also see this as a way to generate more training data for language models from old cheap books. Do you think your model could help with that?
reposting my comment in hopes you'll see it in your profile:
Um, this looks really, really good.
Yo @yoknapthawa, can this be fine-tuned on an M3 chip? How much RAM is needed? What are the current low-hanging-fruit tasks you think the community could go after? What's latency like? I didn't see anything on the page, in the paper, or on GitHub about speeds.
I'm also curious about the classes you use for the segmentation task -- do you have a list of them somewhere?
Finally, your generalization results are all on photorealistic images; did you look at paintings, animation, or anything else? I'm curious how broadly the generalization goes.
The shadiness around Facebook's proprietary dataset of 300 million photos is concerning and should draw more attention. At the very least it is scientifically unacceptable - we should not high-five Big Tech researchers for intentionally unreproducible research. And if Meta is harvesting user photos for AI research and commercialization, they should tell their users about it directly (I am sure there is something buried in the TOS). Does the dataset include only public photos, or are Instagram DMs fair game? Does it include CSAM? Who cares!
Serious question: who are the people in the illustrations they used in the paper?[1] Are they Facebook/Instagram users? Did the authors ask permission to use their photos for an arXiv publication? Including their kids? Meta researchers really should be answering questions like this before they are asked - but these authors didn't even include an impact statement!
At some point in the near future, Facebook will use your accumulated posting & commenting history to sell you an A.I. form of yourself that can chat with people AND keep you on teh interwebz well past your death.
At one point, well before Facebook was 'Facebook' in modern parlance, I posted a photo of a potato on my timeline knowing that somewhere in their object graph, I == potato. I'm certain my potatodentity isn't in this dataset, but one can hope that joke eventually lands.
Yo @yoknapthawa, can this be fine-tuned on an M3 chip? How much RAM is needed? What are the current low-hanging-fruit tasks you think the community could go after? What's latency like? I didn't see anything on the page, in the paper, or on GitHub about speeds.
I'm also curious about the classes you use for the segmentation task -- do you have a list of them somewhere?
Finally, your generalization results are all on photorealistic images; did you look at paintings, animation, or anything else? I'm curious how broadly the generalization goes.
Disclaimer: Co-author here.