The data didn't leak from Dow Jones, and the article doesn't cover how Dow Jones stores the data internally. Some customer who had the data leaked it from their own open system.
Data from various arbitrary public sources would be difficult to put into a rational schema. Querying that schema would also be more difficult than a full-text ES query.