Perhaps you can learn such a function, but it may be hard to learn a suitable embedding space directly, so it makes sense to lean on the more general capabilities of an LLM (perhaps fine-tuned and distilled for efficiency).
In principle, there is no reason an LLM should do better than a more focused model, and plenty of reasons it will do worse. You're wasting a ton of parameters memorizing the capital of France and what the powerhouse of a cell is.
If data is the issue, you can probably even generate vulnerabilities to create a synthetic dataset.
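As a minimal sketch of what that could look like (the snippet corpus, mutation rules, and CWE labels here are toy placeholders, not a real pipeline): take known-safe code and apply small rewrites that introduce a specific bug class, keeping the original as the negative example and the mutant as the positive one.

```python
import re

def make_vulnerable(safe: str):
    """Apply a simple rewrite rule that introduces a known bug class.
    Returns (mutated_code, cwe_label), or None if no rule applies."""
    # Rule 1: parameterized SQL query -> f-string interpolation (CWE-89, SQL injection)
    m = re.fullmatch(r'(\w+\.execute)\("([^"]*?)\?", \((\w+),\)\)', safe)
    if m:
        call, prefix, var = m.groups()
        return f'{call}(f"{prefix}{{{var}}}")', "CWE-89"
    # Rule 2: bounded C string copy -> unbounded copy (CWE-120, buffer overflow)
    if "strncpy(" in safe:
        mutated = re.sub(r'strncpy\(([^,]+),([^,]+),[^)]+\)', r'strcpy(\1,\2)', safe)
        return mutated, "CWE-120"
    return None

def build_dataset(snippets):
    """Emit (code, label) pairs: each original as a negative (0),
    and its mutant, if a rule applied, as a positive (1)."""
    pairs = []
    for s in snippets:
        pairs.append((s, 0))
        result = make_vulnerable(s)
        if result is not None:
            pairs.append((result[0], 1))
    return pairs
```

Real mutation tooling would of course need far more rules (and a compiler or parser to apply them safely), but the structure is the same: cheap labeled pairs from code you already trust.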