I don't think character indices are enough. What if your selection begins in the middle of a table cell and ends on an image that is the only child of a cell in a completely different table (no text involved at all, except some text in the cells in between)? If you want to, e.g., delete that selection, how do you find which nodes are to be deleted and which are to be updated (e.g. to merge the two tables if there are cells after the one that contains the image)?
Images and table cells are just nodes within the tree holding the text, assuming of course that all styling etc. is represented in a plain-text syntax similar to Markdown. Looking up the nodes from character indices is quick if each node stores how many characters it contains.
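A minimal sketch of that lookup, assuming a tree where only leaves carry text and every node caches the character count of its subtree. The names `Node`, `recount` and `locate` are made up for illustration, and the image is assumed to be stored as Markdown-like markup:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # e.g. "doc", "cell", "text", "image"
    text: str = ""                  # only leaves carry text
    children: list["Node"] = field(default_factory=list)
    length: int = 0                 # cached character count of the whole subtree

    def recount(self) -> int:
        """Recompute the cached lengths bottom-up (call after building or editing)."""
        if self.children:
            self.length = sum(child.recount() for child in self.children)
        else:
            self.length = len(self.text)
        return self.length


def locate(root: Node, index: int) -> tuple[Node, int]:
    """Map a document-wide character index to (leaf, offset inside that leaf)."""
    if not 0 <= index < root.length:
        raise IndexError(index)
    node = root
    while node.children:
        for child in node.children:
            if index < child.length:
                node = child        # the index falls inside this subtree
                break
            index -= child.length   # skip the whole subtree in one step
    return node, index


# Hypothetical document: two cells (one holding only an image) and trailing text.
doc = Node("doc", children=[
    Node("cell", children=[Node("text", "abc")]),
    Node("cell", children=[Node("image", "![](a.png)")]),
    Node("text", "xyz"),
])
doc.recount()
leaf, offset = locate(doc, 4)       # index 4 falls inside the image markup
```

Each step skips entire subtrees using the cached counts, so the cost depends on the depth and fan-out of the tree rather than on the amount of text. The counts have to be refreshed along the path to the root after every edit, which this sketch leaves out.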
Other approaches would probably require the selection to be a tree of its own. I can't really say whether that's simpler overall or not.
The syntax shouldn't matter (you may not even be using a plain-text syntax, or any syntax, anyway); you could treat an image or whatever as a single "special" character. Or just assign a linearly increasing ID (increasing in the order in which the text, images, etc. flow) to each node.
Though that is basically another way to represent what I wrote above: a pair of node pointers and a subrange (well, an index actually; the other end of the subrange is implicit if the node pointers are different). This is basically what the old HTML editing control Microsoft had back in the 90s used, and it worked with the DOM tree (it is also what I used in a test editor I wrote some time ago). And yeah, it isn't simple.
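For what it's worth, a rough sketch of the node-pointer representation, not the actual control's API. `Node`, `Selection` and `leaves_between` are hypothetical names, each endpoint is assumed to be a leaf with its own offset, and the start is assumed to come before the end in document order:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                # compare nodes by identity, like pointers
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class Selection:
    start: Node                     # leaf where the selection begins
    start_offset: int               # index inside the start leaf
    end: Node                       # leaf where the selection ends
    end_offset: int                 # index inside the end leaf


def document_order(root: Node):
    """Yield every node in pre-order, i.e. the order the content flows."""
    yield root
    for child in root.children:
        yield from document_order(child)


def leaves_between(root: Node, sel: Selection) -> list[Node]:
    """Leaves strictly between the two endpoints; these are fully selected."""
    if sel.start is sel.end:
        return []                   # the range lives inside a single leaf
    inside, covered = False, []
    for node in document_order(root):
        if node is sel.start:
            inside = True
        elif node is sel.end:
            break
        elif inside and not node.children:
            covered.append(node)
    return covered


# Hypothetical document: selection from the middle of a text cell to an image.
text_a, text_b = Node("textA"), Node("textB")
image, text_d = Node("image"), Node("textD")
doc = Node("doc", children=[
    Node("table1", children=[Node("cellA", [text_a]), Node("cellB", [text_b])]),
    Node("table2", children=[Node("cellC", [image]), Node("cellD", [text_d])]),
])
sel = Selection(start=text_a, start_offset=1, end=image, end_offset=0)
covered = leaves_between(doc, sel)
```

This only finds the fully covered leaves. Trimming the two endpoint leaves, removing cells that end up empty and merging the two tables are the parts that make it not simple.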