The main difference is that a Petri net is basically a hypergraph, where directed edges can connect multiple vertices both in the source and in the target.
Graphs give you finite state machines in the obvious way: You mark the vertex you are in and walk the arrows.
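To make the "mark a vertex and walk the arrows" picture concrete, here is a minimal sketch of a finite state machine as a labelled directed graph. The states and labels (a turnstile) are made up for illustration, not from the discussion above:

```python
# A finite state machine as a directed graph: one vertex is marked as
# the current state, and walking an arrow moves that single mark.
edges = {
    ("locked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}

def step(state, label):
    # Walk the unique arrow out of `state` with the given label.
    return edges[(state, label)]

state = "locked"
state = step(state, "coin")  # mark moves to "unlocked"
state = step(state, "push")  # mark moves back to "locked"
```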
Hypergraphs give you Petri nets: You mark each vertex as many times as you want and walk the arrows to move marks around. This tells us two things:
1. Petri nets are a calculus of resources. The marking no longer tells you "what state you are in": a state is an allocation of resources to each vertex in the net.
2. Petri nets are concurrent: you don't have to move stuff around by walking one edge at a time. Two different hyperedges in two different places of the hypergraph can "act at the same time", since the notion of "what state you are in" no longer makes sense.
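The "mark each vertex as many times as you want and move marks around" idea can be sketched in a few lines. This is a toy simulation under my own naming conventions (places, transitions, and markings are illustrative, not from the paper):

```python
from collections import Counter

# A marking allocates a number of tokens to each place (vertex).
# A transition (hyperedge) consumes tokens from its input places
# and produces tokens in its output places.
transitions = {
    "t": ({"a": 1, "b": 1}, {"c": 1}),  # (inputs, outputs)
}

def enabled(marking, t):
    # A transition can fire only if every input place holds enough tokens.
    ins, _ = transitions[t]
    return all(marking[p] >= n for p, n in ins.items())

def fire(marking, t):
    ins, outs = transitions[t]
    m = Counter(marking)
    for p, n in ins.items():
        m[p] -= n
    for p, n in outs.items():
        m[p] += n
    return m

m = Counter({"a": 2, "b": 1})
m2 = fire(m, "t")  # one "a" and one "b" consumed, one "c" produced
```

Note that the state is the whole allocation `m`, not a single current vertex, which is exactly point 1 above.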
Anyway, this paper is pretty complicated and for sure there are waaaaay easier places to start. Such as this one:
https://arxiv.org/abs/1906.07629
When evaluating the dynamics of a net, do all tokens move each discrete step or are there other choices that can be made?
Are there any forms where the edges are weighted, or does each edge necessarily have the same weight?
Related to the previous question, if you have a finite number of tokens at a vertex with multiple outgoing edges, how do you choose which edges they follow? I suppose that for any given allocation there may be multiple succeeding allocations.
Finally, the structure seems very similar to neural nets. Are they actually similar, or very different?
There are countless different flavors of Petri nets. The edges can be weighted, meaning that a transition may require more than one token from a given input place to fire, and can put more than one token in an output place when it fires.
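In code, arc weights just become multiplicities on the consumed and produced tokens. A hedged sketch, with made-up place names:

```python
from collections import Counter

# Weighted arcs: this transition needs 2 tokens from "fuel" and 1 from
# "spark" to fire, and puts 3 tokens in "exhaust" when it does.
inputs = {"fuel": 2, "spark": 1}
outputs = {"exhaust": 3}

def fire_weighted(marking):
    # The transition is enabled only if each input place meets its weight.
    assert all(marking[p] >= w for p, w in inputs.items()), "not enabled"
    m = Counter(marking)
    for p, w in inputs.items():
        m[p] -= w
    for p, w in outputs.items():
        m[p] += w
    return m

m = fire_weighted(Counter({"fuel": 5, "spark": 1}))
# 2 fuel and 1 spark consumed, 3 exhaust produced
```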
About choosing which edges they follow: you don't. In standard Petri nets, firing is concurrent: if tokens can be used by more than one transition at the same time, they will non-deterministically go one way or the other. You can refine this situation by extending the formalism, e.g. to timed nets.
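The non-determinism shows up as soon as you enumerate successor markings: from a single allocation there can be several. A small illustrative example (the net itself is made up) where two transitions compete for one token:

```python
from collections import Counter

# Two transitions both want the single token in place "p". Firing either
# one disables the other; a standard Petri net does not say which fires.
transitions = {
    "t1": ({"p": 1}, {"q": 1}),
    "t2": ({"p": 1}, {"r": 1}),
}

def successors(marking):
    # All markings reachable by firing one enabled transition.
    result = {}
    for t, (ins, outs) in transitions.items():
        if all(marking[p] >= n for p, n in ins.items()):
            m = Counter(marking)
            for p, n in ins.items():
                m[p] -= n
            for p, n in outs.items():
                m[p] += n
            result[t] = m
    return result

succ = successors(Counter({"p": 1}))
# Both t1 and t2 are enabled: two distinct successor markings exist,
# one with the token in "q", one with it in "r".
```

This is the "multiple succeeding allocations" you suspected in your question.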
I am not an expert on neural nets, but I'd guess they are more similar to signal flow graphs. These are related to Petri nets, though, in a very deep and complicated way that I have no chance of explaining here right now. Check out the work of Sobocinski, Piedeleu and Zanasi on additive relations if you are interested in this!
I made it more than halfway through the article, but without this simple intuition it was really hard to grasp the ideas.