V.2 Accuracy of data


V - HOW TO VALIDATE RESULTS?

 


2- ACCURACY OF DATA

Besides the geographical position, the content of the data must also be correct, or at least as accurate as possible, because 100% correct data exist only in an ideal world. Users of the data should at least have an idea of the types of errors the data contain and their order of magnitude.

In the case of thematic maps derived from image classifications, we want to know whether the land-use classes we see on the map (e.g. forest, roads, built-up areas...) do indeed correspond to reality, and we want to have an idea of which classes might be confused with each other.

In the case of quantitative remote sensing data (surface variables such as primary production or surface temperature), we want to have an idea of the order of magnitude of the error and of its standard deviation.

To validate remote sensing data, we use reference data, sometimes called ground truth. These can be obtained from different sources (interpretation of aerial photographs, thematic maps, ground measurements, ...) and they can occur in different forms (digital maps, sensor measurements, graphs, ...). Such reference data not only provide additional information when analysing remote sensing data, they also allow us to check that the remote sensing data are correctly interpreted.

Example 1: Determining errors in thematic maps obtained by image classification

To quantify errors in thematic maps derived from satellite images, we can use a so-called confusion matrix. For a number of control points (examples indicated on the image to the right), we determine the class to which the corresponding image pixels actually belong. We obtain this so-called ground truth by visual interpretation and/or on-site verification with a GNSS receiver (e.g. GPS). By placing each control point in a matrix with the actual classes in the columns and the assigned classes in the rows, we can calculate a number of error statistics. For example, we obtain the percentage of correctly classified pixels by dividing the sum of the values on the diagonal (blue ellipse) by the total number of control points. In the fictitious example on the right, this is 328 / 499, or about 66%.
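The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names and the ten control points are invented for the example and are not the values from the figure. The matrix follows the convention of the text, with assigned classes in the rows and actual classes in the columns.

```python
from collections import Counter

# Hypothetical class names and control points, invented for illustration only.
CLASSES = ["water", "grass", "built-up"]

# Each control point is a pair: (class assigned on the map, actual class).
control_points = [
    ("water", "water"), ("water", "water"), ("water", "grass"),
    ("grass", "grass"), ("grass", "grass"), ("grass", "grass"),
    ("grass", "built-up"),
    ("built-up", "built-up"), ("built-up", "built-up"), ("built-up", "grass"),
]


def confusion_matrix(points, classes):
    """Rows are assigned (map) classes, columns are actual (reference) classes."""
    counts = Counter(points)
    return [[counts[(assigned, actual)] for actual in classes]
            for assigned in classes]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of control points."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


matrix = confusion_matrix(control_points, CLASSES)
accuracy = overall_accuracy(matrix)

for name, row in zip(CLASSES, matrix):
    print(f"{name:>9}: {row}")
print(f"overall accuracy: {accuracy:.0%}")
```

With these invented points, 7 of the 10 control points lie on the diagonal. Note that some software packages place the actual classes in the rows instead, so it is worth checking the orientation before reading off any statistic.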

Simple land cover map of part of Dublin (Phoenix Park area, bottom left) derived from a Sentinel-2 image captured on 13 August 2022 (top right). The classification algorithm used is Random Forest (a so-called machine learning algorithm). The map contains obvious errors. For instance, some buildings belonging to an industrial estate are incorrectly assigned to the "bare soil" class. Conversely, some fallow agricultural plots have been classified as "built-up area". The confusion matrix (fictitious example below right) allows us to study the confusion between the different classes and to calculate some error measures, both at the level of the whole map (e.g. total number of correctly or incorrectly classified pixels) and at the level of individual classes (e.g. "how many of the pixels labelled grass on the map are really grass?" or "how many of the actual grass pixels in our ground truth were correctly assigned?").
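The two per-class questions in the caption correspond to what are commonly called user's accuracy (read along a row of assigned classes) and producer's accuracy (read along a column of actual classes). The sketch below shows one possible way to compute them. The matrix values and class names are hypothetical and do not come from the figure; the function names are chosen for this example only.

```python
# Hypothetical confusion matrix, invented for illustration only.
# Rows = classes assigned on the map, columns = actual (ground truth) classes.
CLASSES = ["water", "grass", "built-up"]
MATRIX = [
    [2, 1, 0],  # assigned "water"
    [0, 3, 1],  # assigned "grass"
    [0, 1, 2],  # assigned "built-up"
]


def users_accuracy(matrix, i):
    """Share of the pixels assigned to class i that really belong to class i."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else float("nan")


def producers_accuracy(matrix, i):
    """Share of the actual class-i pixels that were assigned to class i."""
    col_total = sum(row[i] for row in matrix)
    return matrix[i][i] / col_total if col_total else float("nan")


grass = CLASSES.index("grass")
ua_grass = users_accuracy(MATRIX, grass)
pa_grass = producers_accuracy(MATRIX, grass)

print(f"grass on the map that is really grass: {ua_grass:.0%}")
print(f"actual grass that was mapped as grass: {pa_grass:.0%}")
```

In this invented example, 3 of the 4 control points mapped as grass are really grass, while only 3 of the 5 actual grass points were mapped as grass. A class can therefore look reliable on the map and still be under-detected, which is why both measures are usually reported.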

Example 2: Determining errors in quantitative remote sensing data


Comparison of VIIRS (left) and MODIS (right) remote sensing products representing land surface temperature (LST) with the land surface temperature actually measured by weather stations in Gobabeb, Namibia. Because of incorrect estimates of the surface emissivity values used in the algorithms, both the VIIRS and MODIS products underestimate the LST of the Namibian desert by more than 4 K on average. The figure illustrates the need for ground reference data: two different remote sensing LST products can be very similar to each other because a similar algorithm was used, yet both can differ significantly from the corresponding ground reference measurements. Source: Guillevic, P.C. et al. (2014). Validation of Land Surface Temperature products derived from the Visible Infrared Imaging Radiometer Suite (VIIRS) using ground-based and heritage satellite measurements. Remote Sensing of Environment, 154, pp. 19-37, ISSN 0034-4257.
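For quantitative data, the error is typically summarised by comparing paired satellite and ground values. The sketch below computes the mean error (bias), the standard deviation of the errors and, as a commonly reported additional measure, the root-mean-square error (RMSE). The temperature values are invented for illustration and are not taken from Guillevic et al. (2014); the variable names are assumptions made for this example.

```python
import math
import statistics

# Hypothetical paired observations in kelvin, invented for illustration only.
satellite_lst = [310.0, 315.0, 320.0, 325.0]  # LST from a remote sensing product
station_lst = [314.0, 320.0, 323.0, 329.0]    # LST measured on the ground

# Error of each pair: satellite value minus ground reference value.
errors = [sat - ref for sat, ref in zip(satellite_lst, station_lst)]

# Bias: the mean error. A negative value means the product underestimates.
bias = statistics.mean(errors)

# Sample standard deviation of the errors: how much they scatter around the bias.
error_std = statistics.stdev(errors)

# RMSE combines systematic error (bias) and scatter into one number.
rmse = math.sqrt(statistics.mean([e * e for e in errors]))

print(f"bias:      {bias:+.2f} K")
print(f"std. dev.: {error_std:.2f} K")
print(f"RMSE:      {rmse:.2f} K")
```

In this invented example the bias is -4 K while the scatter is below 1 K: the product is consistently too cold rather than noisy. Comparing two satellite products only with each other would not reveal such a systematic offset, which is the point the figure makes.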