Unsupervised learning implies that there are no human-annotated labels whatsoever (in this context, meaning that the model had no paired translations at all).
Zero-shot learning (usually) means that the model can generalize learning from seen labels to unseen labels.
That being said, conceptually I guess there could be an "unsupervised zero-shot learning" model: say, a language model that learns word embeddings from English Wikipedia and then tries to use those embeddings to generate French sentences. My guess is that it simply doesn't work.
To expand on this (completely correct) response: unsupervised training is often part of the training process for a zero-shot prediction task.
For example, it's pretty common to use unsupervised learning to build embeddings for each target language, align the embeddings somehow (noting that you don't have labels, so you are using the multi-dimensional geometry of the embedding spaces to try to match them up), and then finally test against labelled data (the zero-shot part).
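To make that concrete, here's a minimal sketch of the alignment step in Python/NumPy, assuming you already have row-normalized monolingual embedding matrices X and Y (e.g., from fastText or word2vec). It uses the self-learning trick from the unsupervised bilingual-embedding literature (roughly the VecMap/MUSE idea): induce a pseudo-dictionary from nearest neighbours, refit an orthogonal rotation via Procrustes, repeat. The function names and toy data are mine, not from any particular library.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Closed-form rotation W minimizing ||XW - Y||_F, via SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def align_unsupervised(X, Y, n_iters=10):
    """Self-learning loop: guess a word-pair dictionary from nearest
    neighbours, refit the rotation on those pairs, and repeat.
    X and Y are row-normalized embedding matrices (one row per word)."""
    W = np.eye(X.shape[1])
    for _ in range(n_iters):
        # Induce a pseudo-dictionary: each source word's nearest target word
        # under the current rotation (cosine similarity on unit rows).
        sims = (X @ W) @ Y.T
        tgt = sims.argmax(axis=1)
        # Procrustes step: refit the rotation on the induced pairs.
        W = orthogonal_procrustes(X, Y[tgt])
    return W

# Toy usage with random vectors; real use needs pretrained embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.normal(size=(1000, 50)); Y /= np.linalg.norm(Y, axis=1, keepdims=True)
W = align_unsupervised(X, Y)
```

One caveat: starting this loop from the identity (or from random data, as above) usually won't converge to anything meaningful; the published systems bootstrap it with identical character strings, adversarial training, or similar tricks. The sketch is just the skeleton of the "match the shapes" step.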
Zero-shot, cross-modal transfer is something humans do really well. You can read a description of a platypus and then label one correctly even if you have never seen one before.
A seminal paper in this area was Richard Socher's Zero-Shot Learning Through Cross-Modal Transfer [1]. It's the paper that earmarked him as a star; look at the co-authors (Chris Manning and Andrew Ng).
Okay, so unsupervised learning would be if you had never seen any Earth animals before, were presented with 99 photos of fish and one giraffe, and noticed that the latter was the oddball. Zero-shot learning would be if you were told that a giraffe is yellow and brown with four legs and a long neck, and then said "That must be a giraffe!" the first time you ever saw a photo of one.
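In code, that zero-shot giraffe moment looks something like the toy sketch below. The hand-coded attribute vectors and scores are made up for illustration; in the Socher paper the "description" is a word embedding and the image-to-description mapping is a learned network, but the nearest-description decision rule is the same idea.

```python
import numpy as np

# Hypothetical attributes: [yellow, brown, four_legs, long_neck, fins]
CLASS_DESCRIPTIONS = {
    "fish":    np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
    "giraffe": np.array([1.0, 1.0, 1.0, 1.0, 0.0]),  # no training photos of this class
}

def predict_zero_shot(image_attrs):
    """Pick the class whose description is closest (cosine) to the image's
    predicted attributes. `image_attrs` would come from a model trained to
    predict attributes from pixels on *other* animals."""
    def cosine(c):
        d = CLASS_DESCRIPTIONS[c]
        return image_attrs @ d / (np.linalg.norm(image_attrs) * np.linalg.norm(d))
    return max(CLASS_DESCRIPTIONS, key=cosine)

# A photo the attribute model scores as yellow, brown, four-legged, long-necked:
print(predict_zero_shot(np.array([0.9, 0.8, 1.0, 0.95, 0.05])))  # -> "giraffe"
```

The point is that the label "giraffe" is never paired with an image at training time; it is matched purely through the shared description space.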