Take y=3x+7. 3 and 7 are parameters, which would be learned by using gradient de...

Take y=3x+7. 3 and 7 are parameters, which would be learned by using gradient descent to find the values that make the x’s in the training data produce the corresponding y’s. LLMs are essentially functions like this, only they have billions of parameters and are nonlinear.