V - HOW TO VALIDATE RESULTS?
2- ACCURACY OF DATA
Besides its geographical position, the content of the data must also be correct, or at least as accurate as possible, since 100% correct data exists only in an ideal world. Users of the data should at least know what types of errors it contains and their order of magnitude.
In the case of thematic maps derived from image classifications, we want to know whether the land-use classes we see on the map (e.g. forest, roads, built-up areas...) do indeed correspond to reality, and we want to have an idea of which classes might be confused with each other.
In the case of quantitative remote sensing data (surface variables such as primary production or surface temperature), we want to know the order of magnitude of the error and its standard deviation.
To validate remote sensing data, we use reference data, sometimes called ground truth. These can be obtained from different sources (interpretation of aerial photographs, thematic maps, ground measurements, etc.) and can come in different forms (digital maps, sensor measurements, graphs, etc.). Such reference data not only provide additional information when analysing remote sensing data, but also allow us to check that the remote sensing data have been interpreted correctly.
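For quantitative data, comparing remote sensing values with reference measurements at the same locations gives the error statistics mentioned above. The Python sketch below assumes paired values in the same unit; the function name `error_statistics` and the temperature values are hypothetical. It reports the mean error (bias), the sample standard deviation of the errors and the root-mean-square error (RMSE), which are common choices rather than a method prescribed by this text.

```python
import math


def error_statistics(estimated, reference):
    """Compare remote sensing estimates with reference measurements.

    Both inputs are sequences of paired values (same locations, same unit).
    Returns (bias, std, rmse):
      bias - mean error, i.e. the systematic over- or underestimation
      std  - sample standard deviation of the errors (random part)
      rmse - root-mean-square error (overall magnitude of the error)
    """
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    if len(estimated) < 2:
        raise ValueError("need at least two pairs to estimate a spread")
    errors = [e - r for e, r in zip(estimated, reference)]
    n = len(errors)
    bias = sum(errors) / n
    # Sample standard deviation: n - 1 in the denominator
    std = math.sqrt(sum((d - bias) ** 2 for d in errors) / (n - 1))
    rmse = math.sqrt(sum(d ** 2 for d in errors) / n)
    return bias, std, rmse


# Hypothetical surface temperatures (K): satellite retrieval vs. ground sensor
satellite = [291.2, 295.8, 288.4, 301.0]
ground = [290.5, 296.3, 287.9, 299.8]
bias, std, rmse = error_statistics(satellite, ground)
print(f"bias = {bias:.2f} K, std = {std:.2f} K, rmse = {rmse:.2f} K")
```

With these definitions, RMSE² = bias² + ((n − 1)/n)·std², so the RMSE combines the systematic part and the random part of the error into a single order of magnitude.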
Example 1: Determining errors in thematic maps obtained by image classification
To quantify errors in thematic maps derived from satellite images, we can use a so-called confusion matrix. For a number of control points (examples indicated on the image to the right), we determine the class to which the corresponding image pixels actually belong. We obtain this ground truth by visual interpretation and/or on-site verification with a GPS (GNSS) receiver. By entering each control point in a matrix, with the actual classes in the columns and the assigned classes in the rows, we can calculate a number of error statistics. For example, we obtain the percentage of correctly classified pixels (the overall accuracy) by dividing the sum of the values on the diagonal (blue ellipse) by the total number of control points. In the fictitious example on the right, this is 328 / 499, or about 66%.
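The bookkeeping can be sketched in Python using the same convention as the text (assigned classes in the rows, actual classes in the columns). The function names, class labels and check points below are hypothetical. Besides the overall accuracy, the sketch computes two widely used per-class statistics that show which classes get confused: the user's accuracy (diagonal value divided by its row total) and the producer's accuracy (diagonal value divided by its column total).

```python
def confusion_matrix(assigned, actual, classes):
    """Build a confusion matrix as a list of rows.

    Row i = class assigned by the classification (map),
    column j = actual class from the ground truth.
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for mapped, truth in zip(assigned, actual):
        matrix[index[mapped]][index[truth]] += 1
    return matrix


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of control points."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def users_accuracy(matrix):
    """Per row: share of points mapped as class i that really are class i."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(matrix)]


def producers_accuracy(matrix):
    """Per column: share of reference points of class j mapped as class j."""
    n = len(matrix)
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
            for j in range(n)]


# Hypothetical check points: classified class vs. ground truth
classes = ["forest", "road", "built-up"]
assigned = ["forest", "forest", "road", "built-up",
            "built-up", "forest", "road", "built-up"]
actual = ["forest", "road", "road", "built-up",
          "forest", "forest", "road", "built-up"]

m = confusion_matrix(assigned, actual, classes)
print(m)                    # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(overall_accuracy(m))  # 0.75
```

Here 6 of the 8 check points lie on the diagonal. The off-diagonal 1 in row "forest", column "road" records a road pixel mapped as forest, which is the kind of confusion between classes that the per-class accuracies make visible.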



