I work for a data engineering consultancy, and I noticed that we often need large datasets for various reasons, but they are hard to find. So we started building an open source tool that lets us quickly generate large volumes of data. The idea is to create a DSL for describing the data and its allowed values; the generator then produces the data set from that description.
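To make the idea concrete, here is a minimal Python sketch of a "describe the fields and allowed values, then generate rows" approach. It is only an illustration: the spec format, the rule names (`choice`, `int_range`), and the functions `generate_value` and `generate_rows` are all made up for this example and are not the tool's actual DSL or API.

```python
import random

# Hypothetical spec: field name -> rule describing its allowed values.
# This format is an assumption for illustration, not the real DSL.
SPEC = {
    "name": {"choice": ["Alice", "Bob", "Carol"]},
    "age": {"int_range": [18, 65]},
    "country": {"choice": ["RS", "DE", "US"]},
}


def generate_value(rule, rng):
    """Produce one value that satisfies a single field rule."""
    if "choice" in rule:
        return rng.choice(rule["choice"])
    if "int_range" in rule:
        low, high = rule["int_range"]
        return rng.randint(low, high)  # both bounds inclusive
    raise ValueError(f"unknown rule: {rule!r}")


def generate_rows(spec, count, seed=None):
    """Lazily yield `count` rows so large volumes need not fit in memory."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {field: generate_value(rule, rng) for field, rule in spec.items()}


if __name__ == "__main__":
    for row in generate_rows(SPEC, 5, seed=42):
        print(row)
```

Using a generator keeps memory flat no matter how many rows are requested, and passing a seed makes a run reproducible, which helps when the generated data feeds tests.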
I could see this becoming a popular website (assuming there isn't something like it out there already). It could include a bunch of standard datasets (countries, cities, provinces, etc.) and links to other sources, like a one-stop directory for online datasets. Again, that's if there isn't already something like it (I don't think there is?).
https://github.com/smartcat-labs/ranger/blob/master/README.m...
Any help and ideas are more than welcome.