How I manage spatial data for my wildfire detection dashboard

A few weeks ago, I shared the story behind the creation of the Fire Alert dashboard. In case you are not familiar with it, you can go and read about it now.

As I explained in my previous post, I work with satellite data from NASA, which is kindly provided through an open API. Two satellites, Aqua and Terra, are equipped with heat-detecting cameras that can discern significant temperature differences at ground level, thus isolating hotspots that are most likely wildfires. A typical report from the satellites looks like this:

Field	Value
Latitude	36.79887
Longitude	12.01467
Time	18/01/2023 12:21

This essential, raw data can't be plotted on a map as-is. There are a few issues with it:

the API endpoint does not retain any memory about the previous calls, and will inevitably return the same observations multiple times. I want to recognize these instances as known hotspots without triggering new alerts;
the satellites passing over the Earth at different times will detect patches of hot ground in the vicinity of known hotspots. Most likely, all these observations relate to a single event that is still ongoing. I want to compare new data to previous observations;
a hotspot could abruptly disappear from the satellite survey: maybe the fire stopped, but it is possible that dense smoke or clouds are concealing the real situation on the ground. Therefore, I want hotspots to have some stickiness and stay on the dashboard even if they are briefly not being reported.

Demo hotspots on the ***Fire Alert*** map

Analyzing these requirements, I realized that I needed a definition of a hotspot event that goes beyond a single observation at a point in time. To this end, I designed a coarse heuristic: a hotspot event consists of any number of observations falling within a 2km radius and within a 24-hour time window. Why the 2km figure? Under optimal conditions, the granularity of the source data is 1km, so I gave myself some slack and doubled that number to limit false positives. However, the satellite error can be vastly greater (up to 5km) when a portion of the image lies at the edges of the scan. It is possible to account for this deviation, but I chose to keep it simple for now.

High-level workflow

With these premises in mind, I can now explain how data ingestion in Fire Alert works. First of all, I call the satellite API hourly to get updated data over a specified area. Each call returns a report with a certain number of observations. As mentioned above, I want to make sure that each record I am processing is truly unique: I combine the latitude, longitude, and time of the observation to derive a unique key and filter out the occasional duplicate.
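The deduplication step can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tuple layout, the rounding to 5 decimals (the precision of the sample report above), and the in-memory set of known keys are all hypothetical and not taken from the Fire Alert code.

```python
from datetime import datetime


def observation_key(lat, lon, observed_at):
    """Derive a key from latitude, longitude, and observation time.

    Assumption: coordinates are rounded to 5 decimals, the precision
    shown in the sample report.
    """
    return f"{lat:.5f}|{lon:.5f}|{observed_at.isoformat()}"


def drop_known(observations, seen_keys):
    """Return only the observations whose key is not yet in seen_keys.

    seen_keys is updated in place, so a later API call that returns
    the same records yields nothing new.
    """
    fresh = []
    for lat, lon, observed_at in observations:
        key = observation_key(lat, lon, observed_at)
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append((lat, lon, observed_at))
    return fresh


seen = set()
report = [(36.79887, 12.01467, datetime(2023, 1, 18, 12, 21))]
first_call = drop_known(report, seen)   # contains the one new observation
second_call = drop_known(report, seen)  # empty: the record is already known
```

In a persistent setup, the same key could back a unique constraint in the database instead of an in-memory set.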

At this point, I always take note of the new hotspot location. This is important for traceability, but also to gather historical data over time and be able to perform further analysis in the future. I believe that in autumn, after the wildfire season, I will have more significant observations to build more knowledge about this phenomenon.

The last step of the workflow is to apply my heuristic. I look for all recent fires that were reported within a 2km radius of the new observation: if I find anything, it means that most likely there is still an ongoing fire in that location and I have to update the existing hotspot. Otherwise, I will ring a bell and report the new hotspot event, both on the Fire Alert dashboard and in the dedicated Telegram channel.
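A rough in-memory sketch of this matching logic is shown below. The `HotspotEvent` class, the `ingest` function, and the idea of passing the distance function as an argument are assumptions made for illustration; only the 2km radius and the 24-hour window come from the heuristic described earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RADIUS_M = 2_000              # 2km radius from the heuristic, in meters
WINDOW = timedelta(hours=24)  # 24-hour window from the heuristic


@dataclass
class HotspotEvent:
    """Hypothetical record for an ongoing hotspot event."""
    lat: float
    lon: float
    last_seen: datetime


def ingest(events, lat, lon, observed_at, distance_m):
    """Attach an observation to a nearby recent event, or create a new one.

    distance_m(lat1, lon1, lat2, lon2) must return meters. Returns
    (event, is_new); is_new is True when an alert should be raised.
    """
    for event in events:
        recent = abs(observed_at - event.last_seen) <= WINDOW
        if recent and distance_m(event.lat, event.lon, lat, lon) <= RADIUS_M:
            # Known hotspot: refresh its timestamp, no new alert.
            event.last_seen = max(event.last_seen, observed_at)
            return event, False
    # Nothing nearby and recent: this is a new hotspot event.
    event = HotspotEvent(lat, lon, observed_at)
    events.append(event)
    return event, True
```

Keeping the distance function as a parameter lets the same logic run against a simple flat-Earth formula in tests and against a spatial database in production.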

Working with spatial data

Due to the nature of my application, I needed to correctly process, persist and compare spatial data. Considering that I work with rather small distances, the distance between two pairs of coordinates can be roughly computed as if the Earth were flat (the equirectangular approximation). However, with this approach it would be rather cumbersome and inefficient to search for all points within a certain radius, as I do with my heuristic, and it would be difficult to extend the dashboard with new features such as searching for events in specific geographical areas like a national wildlife park or, let's say, the country of Greece.
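For reference, the equirectangular approximation mentioned above fits in a few lines. The function name and the mean Earth radius of 6,371km are choices made for this sketch, not details from the Fire Alert code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def equirectangular_m(lat1, lon1, lat2, lon2):
    """Approximate distance in meters between two nearby points.

    Treats the area around the points as flat: the longitude difference
    is scaled by the cosine of the mean latitude, then Pythagoras applies.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return EARTH_RADIUS_M * math.hypot(x, y)


# Sample hotspot from the report above, and a point 0.01 degrees further north.
d = equirectangular_m(36.79887, 12.01467, 36.80887, 12.01467)
```

With these inputs `d` comes out at a little over 1.1km, so the 2km radius of the heuristic corresponds to slightly less than 0.02 degrees of latitude.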

In other words, I needed a Geographic Information System, or GIS. Fortunately, Postgres supports a GIS extension, unsurprisingly called PostGIS. The extension enables new, interesting data types like GEOMETRY (an entity living on a Cartesian plane) or GEOGRAPHY (an entity best represented on the Earth's surface). For example, a hotspot event can be easily persisted as a record in the following table:
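As a minimal sketch, such a table could be created with a script like the one below, kept here as a Python string. The table name `hotspot_events`, its columns, and the index are hypothetical and chosen only for illustration; the actual Fire Alert schema may well differ.

```python
# Hypothetical schema: names and columns are illustrative assumptions.
CREATE_HOTSPOT_EVENTS = """
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE IF NOT EXISTS hotspot_events (
    id          BIGSERIAL PRIMARY KEY,
    location    GEOGRAPHY(POINT, 4326) NOT NULL,  -- WGS 84 longitude/latitude
    first_seen  TIMESTAMPTZ NOT NULL,
    last_seen   TIMESTAMPTZ NOT NULL
);

CREATE INDEX IF NOT EXISTS hotspot_events_location_idx
    ON hotspot_events USING GIST (location);
"""


def statements(script):
    """Split the script into single statements, to run them one by one."""
    return [part.strip() for part in script.split(";") if part.strip()]
```

The GiST index is there so that radius searches on `location` do not have to scan the whole table.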

Using PostGIS functions like ST_Distance and ST_GeogFromText (to parse a string representation of a GEOGRAPHY entity), it becomes trivial to select all hotspot events in the vicinity of new observations:
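One possible shape for such a query is sketched below, written with the `%(name)s` placeholders that the psycopg driver accepts. The driver choice, the table name `hotspot_events`, and the columns `location` and `last_seen` are assumptions made for this example.

```python
from datetime import datetime, timedelta

RADIUS_M = 2_000              # 2km radius from the heuristic, in meters
WINDOW = timedelta(hours=24)  # 24-hour window from the heuristic

# Hypothetical table and column names; on GEOGRAPHY values,
# ST_Distance returns meters.
NEARBY_EVENTS_SQL = """
SELECT id,
       ST_Distance(location, ST_GeogFromText(%(point)s)) AS distance_m
FROM hotspot_events
WHERE ST_Distance(location, ST_GeogFromText(%(point)s)) <= %(radius_m)s
  AND last_seen >= %(since)s
ORDER BY distance_m
"""


def point_wkt(lat, lon):
    """Build the text form of a point; WKT lists longitude before latitude."""
    return f"POINT({lon} {lat})"


def nearby_events_params(lat, lon, observed_at):
    """Query parameters for one new observation."""
    return {
        "point": point_wkt(lat, lon),
        "radius_m": RADIUS_M,
        "since": observed_at - WINDOW,
    }
```

The longitude-first order in `point_wkt` is easy to get wrong, since reports usually list latitude first. On larger tables, `ST_DWithin` is worth considering as a replacement for the `ST_Distance` filter, because it can take advantage of a spatial index.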

Enabling PostGIS requires some specific configuration. To enable it on my RDS instance, I followed these instructions.

Conclusion

In this post, I provided an under-the-hood overview of my work on the Fire Alert dashboard. I believe it gives the most curious users the necessary clarity and context and, on top of that, transparency about how it works.

If you are an expert in handling geospatial data and have any opinion or - even better - valuable advice to share, please feel free to do so in the comments or via email. Your suggestions will be highly appreciated!
