You might want to update the README where it says to run "./conda.sh" - it should mention that there are hard-coded paths in this script that need to be changed (the first line is CONDA_BASE="/home/rawalk/anaconda3").
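Something like this near the top of the script would make it portable - just a sketch on my part, assuming the rest of the script only needs CONDA_BASE to locate conda's standard shell hook (the $HOME/anaconda3 fallback is only an illustrative default, not from the repo):

    # Derive the conda base dir instead of hard-coding a user's home path
    CONDA_BASE="$(conda info --base 2>/dev/null || echo "$HOME/anaconda3")"
    # conda's own shell hook, shipped with every Anaconda/Miniconda install
    source "$CONDA_BASE/etc/profile.d/conda.sh"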
I wonder if there is something here that requires conda rather than a simple requirements.txt or something like that. Every time I try conda it seems to mess up my entire environment (usually I just use pyenv w/ virtualenv). But trying with conda now, keeping my fingers crossed...
EDIT: yep, as usual, conda failed me (fresh install of miniconda). "./conda.sh" finished with exit code 0 and said "Installation done!". Yet I have no new conda environment (I think I saw some warnings and errors deep in the logging output).
I see now that this has various requirements.txt files for the different sub-projects - looks like I'll create a pyenv-virtualenv and do things manually to try to get an example working...
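Roughly the plan, in case anyone else goes this route (the Python version, env name, and sub-project path below are placeholders I made up, not anything from the repo):

    # isolated env via pyenv-virtualenv instead of conda
    pyenv install 3.10.14
    pyenv virtualenv 3.10.14 meta-release
    pyenv activate meta-release
    # install one sub-project's pinned deps at a time
    pip install -r <sub-project>/requirements.txt

One env per attempt keeps any breakage isolated, which is the whole reason I avoid conda in the first place.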
Always curious what the license allows with these Meta research drops; it seems all over the place… can this be used commercially? (specifically inference) It's Creative Commons with some parts Apache?
The Creative Commons license seems to be Non-Commercial [0], meaning it's very interesting and quite inspiring, but ultimately useless outside of research and side projects.
> Why would you spend any of your finite attention here?
Because it is research. And it's open research, unlike that of OpenAI or (for the most part) modern Google DeepMind or xAI.
It's completely fair game for non-researchers to ignore that, but even non-researchers benefit from a faster pace of developing an understanding of how to do this kind of magic.
Thank you people at Meta Research for this release!
I’ve seen papers that combined pre-trained vision and language models, trained them together on image/text pairs, and then used the new model for things like text extraction. Could your model be plugged into such a design?
I've always wanted to scan whole books by just feeding pictures of their pages into an AI, preferably with minimal labeling requirements. I also see this as a way to generate more training data for language models from old cheap books. Do you think your model could help with that?
reposting my comment in hopes you'll see it in your profile:
Um, this looks really, really good.
Yo @yoknapthawa, can this be fine-tuned on an M3 chip? How much RAM is needed? What are the current low-hanging-fruit tasks you think the community could go after? What's latency like? I didn't see anything on the page, in the paper, or on GitHub about speeds.
I'm also curious about the classes you use for the segmentation task -- do you have a list of them somewhere?
Finally, your generalization results are all on photorealistic images; did you look at paintings, animation, or anything else? I'm curious how broadly the generalization goes.
The shadiness around Facebook's proprietary dataset of 300 million photos is concerning and should draw more attention. At the very least it is scientifically unacceptable - we should not high-five Big Tech researchers for intentionally unreproducible research. And if Meta is harvesting user photos for AI research and commercialization, they should tell their users about it directly (I am sure there is something buried in the TOS). Does the dataset include only public photos, or are Instagram DMs fair game? Does it include CSAM? Who cares!
Serious question: who are the people in the illustrations they used in the paper?[1] Are they Facebook/Instagram users? Did the authors ask permission to use their photos for an arXiv publication? Including their kids? Meta researchers really should be answering questions like this before they are asked - but these authors didn't even include an impact statement!
At some point in the near future, Facebook will use your accumulated posting & commenting history to sell you an A.I. form of yourself that can chat with people AND keep you on teh interwebz well past your death.
At one point, well before Facebook was 'Facebook' in modern parlance, I posted a photo of a potato on my timeline knowing that somewhere in their object graph, I == potato. I'm certain my potatodentity isn't in this dataset, but one can hope that joke eventually lands.
Yo @yoknapthawa, can this be fine-tuned on an M3 chip? How much RAM is needed? What are the current low-hanging-fruit tasks you think the community could go after? What's latency like? I didn't see anything on the page, in the paper, or on GitHub about speeds.
I'm also curious about the classes you use for the segmentation task -- do you have a list of them somewhere?
Finally, your generalization results are all on photorealistic images; did you look at paintings, animation, or anything else? I'm curious how broadly the generalization goes.
Disclaimer: Co-author here.