Speeding up spatial mapping

A short post about visualizing spatial data.

Intro

Maps are a powerful tool for any data scientist. They can express relationships in your data that other techniques cannot. Whether you are visualizing points, lines, or polygons across space and time, maps are often the best medium for expressing spatial relationships. GIS programs used to pose a barrier to entry for those wanting to use mapping as a data visualization tool, but with the current suite of packages and libraries in R and Python, that is no longer the case.

Although generating a map takes only a few lines of code, I often run into the issue of time. High-resolution spatial data can be very large, and it takes a lot of computing power to process and draw.

The good news is that there are ways to speed up spatial mapping. One function that I find useful, and the topic of this blog post, is st_simplify() from the R package sf (simple features), loaded with library(sf).

st_simplify()

I’ll explain this function through an example.

First, let’s grab some high-resolution spatial data. These data come from the National Hydrography Dataset, and the spatial polygons represent HUC8s. The specifics of HUC8s are not important; just know that they represent watersheds.

library(nhdplusTools)
## USGS Support Package: https://owi.usgs.gov/R/packages.html#support
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
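# A single point (longitude/latitude, NAD83) to search around
df <- data.frame(lat = 44.954701, long = -93.091417) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4269)

# Fetch the HUC8 polygons within a large buffer around that point
hucs <- get_huc8(AOI = df$geometry, buffer = 1000000)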
## Spherical geometry (s2) switched off
## Found invalid geometry, attempting to fix.
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## Spherical geometry (s2) switched on

Now let’s plot these data and see how long it takes.

start <- Sys.time()
ggplot(data = hucs$geometry) +
  geom_sf() +
  theme_void()

end <- Sys.time()
time <- end-start
print(time)
## Time difference of 2.276003 mins
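
One caveat with this timing approach: a ggplot object is only drawn when it is printed, which happens automatically at the console or in an R Markdown chunk, but not inside a function or a script run with source(). A minimal sketch of a more explicit timing wraps print() in system.time(). It uses a small stand-in polygon rather than the HUC8 data, and the object names are hypothetical:

```r
library(sf)
library(ggplot2)

# Hypothetical stand-in data: a 10 km circle around a point in UTM zone 15N
# (EPSG:26915), used here instead of the much larger HUC8 polygons.
pt <- st_sfc(st_point(c(500000, 4980000)), crs = 26915)
shape <- st_sf(geometry = st_buffer(pt, dist = 10000))

# Build the plot object; nothing is drawn yet.
p <- ggplot(data = shape) +
  geom_sf() +
  theme_void()

# print() forces the render, so system.time() captures the drawing cost.
render_time <- system.time(print(p))
render_time[["elapsed"]]
```

With the real data, swapping shape for hucs should give a timing comparable to the Sys.time() approach above.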

That can take quite a while, which might not be much of an issue if you’re plotting only once. However, if you’re working on the aesthetics of a map, you may need to plot it dozens of times to get it right, and the wait for each render becomes quite frustrating.

My solution to this is to simplify the polygons before plotting. Simplifying polygons can be accomplished with a single line of code.

hucs.simplified <- st_simplify(hucs, dTolerance = 1000)

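As stated earlier, st_simplify() comes from the sf package. In the line above, I set the dTolerance parameter, which controls how aggressively vertices are removed: larger values produce coarser shapes. Because these data use longitude/latitude coordinates and sf_use_s2() is TRUE, the tolerance is given in meters; for projected data it is in the units of the coordinate reference system. The function also has a preserveTopology parameter, which is worth a look in the documentation.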
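
To get a feel for what dTolerance does, here is a minimal sketch on a stand-in shape rather than the HUC8 data. The circle, the EPSG:26915 projection, and the count_vertices() helper are assumptions made for illustration; the point is only that larger tolerances leave fewer vertices to draw.

```r
library(sf)

# Hypothetical stand-in polygon: a finely segmented 10 km circle in a
# projected CRS (UTM zone 15N, EPSG:26915), so dTolerance is in meters.
pt <- st_sfc(st_point(c(500000, 4980000)), crs = 26915)
circle <- st_buffer(pt, dist = 10000, nQuadSegs = 90)

# Hypothetical helper: number of coordinate rows in a geometry.
count_vertices <- function(x) nrow(st_coordinates(x))

n_original <- count_vertices(circle)

# Simplify at several tolerances and record how many vertices remain.
tolerances <- c(10, 100, 1000)
vertex_counts <- sapply(tolerances, function(tol) {
  count_vertices(st_simplify(circle, preserveTopology = TRUE, dTolerance = tol))
})
names(vertex_counts) <- paste0("tol_", tolerances)

n_original
vertex_counts
```

Comparing vertex_counts with n_original is a quick check before committing to a tolerance for a real map.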

Finally, let’s see how this speeds up plotting.

start <- Sys.time()
ggplot(data = hucs.simplified$geometry) +
  geom_sf() +
  theme_void()

end <- Sys.time()
time2 <- end-start
time2
## Time difference of 0.2787669 secs
time2-time
## Time difference of -136.2814 secs

Comparing the two times, we see a big difference: the simplified map rendered in about 0.28 seconds instead of about 2.3 minutes, saving roughly 136 seconds per plot.

Conclusion

Simplifying polygons can definitely speed up plotting. Of course, because we decreased the spatial resolution, the new plot is not as detailed. However, at this scale, I usually don’t need such a high level of detail. If you need detail in your map, this might not be the function for you, or at least not for your final product. You could still use st_simplify() to speed up the iterative process of finalizing plot aesthetics and then go back to the original spatial data for the final product.
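
That draft-then-final workflow can be sketched with a small switch. The prep_for_plot() helper, its arguments, and the stand-in circle below are hypothetical and not part of sf; with the real data you would pass hucs instead.

```r
library(sf)

# Hypothetical helper: simplify only while iterating on plot aesthetics.
prep_for_plot <- function(x, draft = TRUE, tolerance = 1000) {
  if (draft) {
    st_simplify(x, preserveTopology = TRUE, dTolerance = tolerance)
  } else {
    x  # final product: return the original geometry untouched
  }
}

# Stand-in data (an assumption): a finely segmented 10 km circle in
# UTM zone 15N (EPSG:26915), so the tolerance is in meters.
pt <- st_sfc(st_point(c(500000, 4980000)), crs = 26915)
shape <- st_sf(geometry = st_buffer(pt, dist = 10000, nQuadSegs = 90))

draft_shape <- prep_for_plot(shape, draft = TRUE)
final_shape <- prep_for_plot(shape, draft = FALSE)

# The draft version carries far fewer vertices than the final one.
nrow(st_coordinates(draft_shape))
nrow(st_coordinates(final_shape))
```

Flipping draft to FALSE once the aesthetics are settled means the plotting code itself does not need to change.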

Thanks for reading, and I hope you find this helpful!