The significant thing is not that we're arbitrarily applying the log function to probabilities. It is that there's a relationship between expected code length and the entropy of a source. It is surprising, to me at least, that the optimal encoding length of a thing is determined by its probability. From that relationship come a number of fascinating connections to other fields. Check out Cover and Thomas's "Elements of Information Theory" for a very approachable introduction to all these connections.
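
To make that relationship concrete, here's a minimal sketch in Python (the toy distribution is made up for illustration) that compares the entropy of a source against the expected length of a Huffman code built for it. For the dyadic probabilities used here the two quantities coincide exactly; in general the source coding theorem guarantees the expected length falls between H(X) and H(X) + 1.

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs.values())

def huffman_lengths(probs):
    """Return the code length (in bits) of each symbol under a Huffman code."""
    # Heap entries: (probability, tiebreak, {symbol: code_length_so_far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper, so its code grows by one bit.
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        merged = {s: length + 1 for s, length in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# A toy source with dyadic probabilities (assumed for illustration).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
expected = sum(probs[s] * lengths[s] for s in probs)
print(f"entropy:              {entropy(probs):.3f} bits/symbol")
print(f"expected code length: {expected:.3f} bits/symbol")
# Both print 1.750 here: the rarer the symbol, the longer its code,
# and the average length meets the entropy of the source.
```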