Hacker News
Choropleth maps with ggplot and R (socviz.co)
94 points by nateb2022 on Feb 1, 2023 | 31 comments


Of note is that this is a chapter of the book 'Data Visualization: A practical introduction', by Kieran Healy: https://socviz.co/index.html


Mapping in R seems a bit quaint at this time. If I need to do mixed visualizations including geo, I am reaching for Observable. If I only need the choropleth, Felt. If I need more spatial analysis, ArcGIS Online (it’s free!). If I need full-custom spatial or mixed statistics, geopandas.


Is ArcGIS free? I could not find a link for it.


I don't know if it is still a thing, but you can get ArcGIS, access to all of it, for $100/year under the "ArcGIS for Personal Use" program.


That's a pretty good deal too, but if you just roll into arcgis.com they have a free, zero-friction signup. You log in with one of their identity providers (GitHub, Google, Apple, etc.) and get started. They give you one "Creator"-level user. I've been using it for years without a hint of being charged.


That's great! I was going to try it, but it seems using it on a Mac is not straightforward :(


You can sign up for the personal use subscription and stick to Online. You can’t use Pro on Mac, but the web map viewer has grown into a highly featured editor and does a lot more than viewing (data editing, styling, analysis, charts, filtering, etc).


Bookmarked. Good post. Choropleths in R are hit or miss, and merging data into the mapping data frame remains annoyingly complicated (as alluded to in the post).
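
For what it's worth, most of the pain in the merge step is a keyed join plus a check for keys that fail to match. A minimal sketch in plain Python, assuming made-up region rows and a hypothetical `fips` key column (none of these names come from the post):

```python
# Hypothetical map table: one row per region (geometry omitted for brevity).
regions = [
    {"fips": "01", "name": "Alabama"},
    {"fips": "02", "name": "Alaska"},
    {"fips": "04", "name": "Arizona"},
]

# Hypothetical data to draw, keyed by the same code. Keys are strings so that
# leading zeros survive; "99" deliberately matches no region.
values = {"01": 4.9, "04": 5.3, "99": 1.0}


def left_join(regions, values, key="fips", column="value"):
    """Attach values to regions by key and report keys that did not match."""
    merged = [{**row, column: values.get(row[key])} for row in regions]
    region_keys = {row[key] for row in regions}
    value_keys = set(values)
    missing = sorted(region_keys - value_keys)  # regions with no data
    unused = sorted(value_keys - region_keys)   # data with no region
    return merged, missing, unused


merged, missing, unused = left_join(regions, values)
print(missing, unused)
```

Regions without data keep a `None` value, so they can be drawn in a neutral color instead of silently disappearing from the map.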


I agree.

Data being placed on a map often has a hierarchical element (country, state/province, county/district, town/city, etc.), and many of the data standards, such as GeoJSON, allow nesting.

While R can handle hierarchical data, it does much better with rectangular data sets.
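
To make the nested-versus-rectangular point concrete, here is a rough sketch of flattening nested GeoJSON properties into one flat row per feature. It is in Python only because the standard library is enough; the sample document and the dotted column names are made up for illustration:

```python
import json

# Made-up GeoJSON for illustration: one feature with nested properties.
doc = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.9, 36.0]},
      "properties": {
        "name": "Durham",
        "admin": {"country": "US", "state": "NC", "county": "Durham"}
      }
    }
  ]
}
""")


def flatten(obj, prefix=""):
    """Turn nested dicts into one flat dict with dotted column names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# One flat row per feature; the geometry type is kept as an ordinary column.
rows = [
    {**flatten(f["properties"]), "geometry_type": f["geometry"]["type"]}
    for f in doc["features"]
]
print(rows)
```

Each row ends up with columns like `admin.country` and `admin.state`, which is the rectangular shape that data frame tools handle best.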


Nowadays, there's lots of tooling around for nested data, like roomba [1] or the list-column workflow enabled by purrr and dplyr [2].

[1] https://github.com/cstawitz/roomba [2] https://r4ds.had.co.nz/many-models.html#nested-data


I make extensive use of geopandas and plotly for mapping, and it’s very difficult for me to imagine it could get all that much easier. Dump my geo data frame in, specify which columns to use, and boom: you’ve got a fully interactive map with OpenStreetMap data underneath. It even has a variety of themes to choose from!


I've had a lot of trouble with plotly maps that have too many layers, so I moved to folium, which isn't easier, but the results are more stable. (It is easier for simple plots when using `gdf.explore()`, and you can keep adding layers with `m = gdf.explore(); gdf2.explore(m=m)`.)

With that being said, tidyverse is much easier than pandas for EDA (though there is something to be said about the less stable API when it comes to production)

Plotly also exists for R and works well too.


Couldn’t you e.g. use KeplerGL or something like Folium to plot data on a basemap?


Sure, maybe. I wouldn't bother though, because it's basically a single line in Plotly to do this. (Though I freely admit there are quite a few parameters for that one line.) See: https://plotly.com/python/mapbox-county-choropleth/#using-ge...


ggplot2 is good, but Leaflet is better at mapping these days. If you want a dynamic map, Leaflet for R is where to look. https://rstudio.github.io/leaflet/


ggplot is significantly more flexible. Leaflet is good for quick results that do not need to strictly adhere to cartography standards. Furthermore, it handles custom projections pretty poorly.


I've been meaning to use leaflet more for iterative portions of data exploration and designing a map, the stuff that a GUI and fast refresh times can speed up by an order of magnitude


A Python alternative to ggplot (except for mapping) is plotnine: https://plotnine.readthedocs.io/en/stable/index.html


My preference is Altair: https://altair-viz.github.io/


why?


I scrolled through the article looking for examples or instructions on how to plot maps for places other than the US, but sadly that doesn't seem to be covered.


Plenty of options, e.g. the maps package makes it trivial to pull down data for another region, and there is no shortage of projections that work well for other extents and latitudes, a number of which you can apply with ggplot2's coord_map(). My main qualm would be geom_raster(): the fast algorithm it implements for gridded data doesn't work with projections too far removed from Cartesian.


I recently had to figure out how to do that (with zero R experience) and to be honest it is very well catered for.

See e.g. https://r-spatial.org/r/2018/10/25/ggplot2-sf.html


What are the best options for maps driven by data from SQL? So far the best I found is carto.com


If this is for an application you're building for browsers, use Leaflet (easy, but not as flexible) or OpenLayers (more flexible, more complicated). There should be database libraries, though I've typically interacted with my database through an API I developed instead of directly from the browser. Cesium exists as well, but that is a resource hog.

If you're playing around on a local machine, R and Python have SQL interfaces that let you load the data for whatever local processing you want.
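
As a rough illustration of the local route, here is a sketch using Python's built-in `sqlite3` module. The in-memory database, the `county_stats` table, and its values are made up and stand in for whatever database you actually have:

```python
import sqlite3

# In-memory database standing in for a real one; table and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE county_stats (fips TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO county_stats VALUES (?, ?)",
    [("37063", 12.5), ("37135", 9.1), ("37183", 7.8)],
)

# Pull only the rows needed for the map, using a parameterized query.
cursor = conn.execute(
    "SELECT fips, value FROM county_stats WHERE value > ? ORDER BY fips",
    (8.0,),
)
value_by_fips = dict(cursor.fetchall())
conn.close()
print(value_by_fips)
```

The resulting dictionary, keyed by region code, can then be joined to map features in whichever plotting library you prefer.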


You can connect to SQL databases from R, or almost any visualisation or GIS software. Not sure what you mean by 'best'?


I should have been more clear and said "using SQL only without R or python". This is for internal BI use, not for product development.


There's a basic map display in SSMS, if I recall correctly. Otherwise you can generate a map with something like Tableau, which has less of a learning curve than R.


PostGIS.


10 thoughts on data visualization best practices and tools:

1) For interactive visualizations of data on 3D globes, I use a mix of C++, Python (for data cleaning), and Unreal Engine (with a plugin called Cesium). An example of this is at https://youtu.be/9i-tQ8Sr80o.

2) If I am trying to put together a 3D globe that has less quality but that can be accessed by the web, I use Mapbox GL JS, D3.js, and React. An example of this is at https://www.whiteowleducation.com/blog/2022/10/14/real-estat....

3) I have seen others use Three.js for developing 3D data visualizations on the web. An example of this in a data science context is at https://blog.fastforwardlabs.com/2019/04/29/visualizing-acti....

4) If you are trying to do 3D population density maps in R, many in the community say you should use https://www.rayshader.com/.

5) If you are really trying to push the limits of data visualization, follow https://twitter.com/Arti_AR_video . He is doing data vis in AR. Robert Scoble had a good tweet the other day (https://twitter.com/Scobleizer/status/1620498790653501440?) showing Arti with 3D bar charts sitting on a table.

6) If you are doing data vis for urban planning, odds are they are already using ArcGIS, and odds are you will be using something like that.

7) If you are trying to do data vis that relates to architecture, I would actually suggest starting with Twinmotion (which is part of the Unreal Engine ecosystem).

8) If you are trying to do data vis for simulations, it may be worth looking at https://www.nvidia.com/en-us/omniverse/ .

9) If you want to show some high-end maps fast, use Geolayers 3. There is a YouTube channel called "Boone Loves Video" (https://www.youtube.com/channel/UCXyGw2OkrAzLhq1r7hyDZkA). Boone often explains Geolayers in his videos.

10) If you are trying to get to next-gen data visualization, my best guess is that you would use some mix of Blender, Nuke, Houdini, and After Effects. I personally have only used Blender and After Effects so far.

Also, if you have any data visualization needs, I am currently on the job market. https://www.linkedin.com/in/ralphbrooks has details about me.


bookmarked as well -- thanks!



