If you've already tried, you probably already came across this explanation, so there's a good chance it won't help, but let me just try: I think of a monad as a computation, meaning the act of computing something. Picture a box with a button that will do something (and possibly return a value) when you press the button. Before the button is pressed, you can't know what value is going to be generated, because none has been generated yet.
The "return" monad operation does a very simple thing: given a value, it constructs a box that is going to generate that value once you press the button. Not very useful in itself, but the point is that the value can be used for further computation. The "bind" operation takes a monad and a function, so now we have two monads: the first one is provided by you and the second one is the result of the "bind" operation. Think of the first monad as being "embedded" in the second one. When you press the button on the outer monad, first the button on the inner monad is pressed and a value is generated; then the value is passed to the function, which returns yet another box; and then that box's button is pressed, and whatever it generates is the value generated by the outer monad. In practice you have extended the inner monad, doing some more computation (the function) after it has executed.
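To make the box-and-button picture concrete, here is a minimal sketch in Haskell. The Box type and the names press, inner and outer are made up for illustration; they are not part of any library. Note that in Haskell the function handed to bind (>>=) itself returns a box, and the outer box presses that one too.

```haskell
-- Hypothetical type: a box holding a computation that runs when "pressed".
newtype Box a = Box { press :: () -> a }

instance Functor Box where
  -- Press the box, then apply a plain function to the generated value.
  fmap f (Box run) = Box (\() -> f (run ()))

instance Applicative Box where
  -- "return"/pure: a box that generates the given value when pressed.
  pure x = Box (\() -> x)
  Box runF <*> Box runX = Box (\() -> (runF ()) (runX ()))

instance Monad Box where
  -- Pressing the outer box presses the inner one, passes the value to f,
  -- and then presses the box that f returned.
  Box run >>= f = Box (\() -> press (f (run ())) ())

-- The inner box, built with return.
inner :: Box Int
inner = return 20

-- The outer box: the inner box extended with one more step.
outer :: Box Int
outer = inner >>= \x -> return (x + 1)

main :: IO ()
main = print (press outer ())
```

Nothing is computed when outer is defined; the addition only happens when press outer () is evaluated.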
In the case of the Haskell IO monad, you can't really press the button inside your program: there is no way to extract values that were put in an IO monad. Still, an object of the IO monad encodes what you want the program to do. Basically the Haskell interpreter/compiler gives you the opportunity to press the button just once, on a single IO monad: the one given by the main symbol (which has type "IO ()"). By composing monads in the right way, you can arrange your program to do what you want.
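As a small sketch of that composition, the program below builds one IO box out of smaller ones and hands it to main. The helper names shout and echoShout are hypothetical, chosen only for this example.

```haskell
import Data.Char (toUpper)

-- A pure helper: no box and no button involved.
shout :: String -> String
shout = map toUpper

-- A composed IO box: read a line, then print it in upper case.
-- Defining it performs no input or output.
echoShout :: IO ()
echoShout = getLine >>= \line -> putStrLn (shout line)

-- The runtime presses the button on main, and only on main.
main :: IO ()
main = putStrLn "Type something:" >> echoShout
```

The line read by getLine is never extracted from IO; it is only ever handed to the function given to >>=, which returns another IO box.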
In other words, a monad like IO, where you can't press the button yourself, allows you to express precisely what a program with side effects should be: something for which you cannot know the result (you cannot press the button and see the generated value in the program) unless you really execute it and commit to the side effects (which is what happens when the runtime takes the value of main and performs the single allowed button press on it).
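One way to see that building an IO box is separate from committing to its side effects is the sketch below. The name boxes is hypothetical; the point is only that IO values can be stored, counted and reordered like any other values without anything being printed.

```haskell
-- Two IO boxes in a list; building the list presses no buttons.
boxes :: [IO ()]
boxes = [putStrLn "first effect", putStrLn "second effect"]

main :: IO ()
main = do
  -- Counting the boxes is pure and runs none of them.
  putStrLn ("boxes built: " ++ show (length boxes))
  -- The effects happen only once the boxes are sequenced into main,
  -- here in reverse order.
  sequence_ (reverse boxes)
```

Running the program prints the count first, then "second effect", then "first effect".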