Not an expert, but I have a bit of formal training on Bayesian stuff which handl...

Not an expert, but I have a bit of formal training on Bayesian stuff which handles similar problems.

Usually Gibbs is used when there's no directly straight-forward gradient (or when you are interested in reproducing the distribution itself, rather than a point estimate), but you do have some marginal/conditional likelihoods which are simple to sample from.

Since each visible node depends on each hidden node and each hidden node effects all visible nodes, the gradient ends up being very messy, so its much simpler to use Gibbs sampling to adjust based on marginal likelihoods.