Supervised Classification in Remote Sensing

Introduction

  • This lesson demonstrates supervised classification applied to remote sensing imagery
  • We’ll classify historic aerial imagery from 1961 to distinguish forest from non-forest areas in Ces, Ticino, Switzerland
  • Key focus: The spatial data workflow using terra/sf, not basic ML concepts

What makes RS classification different?

  • Spatial structure: Pixels are not independent; spatial autocorrelation matters
  • Scale of prediction: We predict on entire rasters (millions of pixels), not just point datasets
  • Spatial sampling: Ground truth comes from polygons, requiring careful sampling strategies
  • Feature engineering: Spatial context (focal operations) is critical, not just pixel values

These differences are why we can’t just apply generic ML tutorials to RS data!

The spatial workflow (terra/sf packages) and understanding spatial autocorrelation are essential skills, regardless of whether you use classical ML or deep learning.


Data

  • Historic aerial imagery from 1961 (swisstopo)
    • Single band (grayscale), 0.5 m/pixel resolution
    • Images available for: 1961, 1971, 1977, 1983, 1989, 1995
  • Ground truth forest polygons (hand-digitized)


Workflow Overview

  1. Load ground truth polygons and generate point samples
  2. Spatial train/test split (⚠ autocorrelation)
  3. Extract raster features using extract()
  4. Train classifier (CART for interpretability)
  5. Predict on entire raster
  6. Evaluate and compare approaches

Ground Truth & Spatial Sampling


From polygons to points

Unlike standard ML datasets, our ground truth comes as spatial polygons.


We need point samples for training:

# Load polygons and study area
library(sf)
library(dplyr)
ces_polygons <- read_sf("data/ces_polygons.gpkg", "ces_polygons")
study_area <- read_sf("data/ces_polygons.gpkg", "study_area")

# Generate spatially distributed samples
sample_points <- st_sample(ces_polygons, 1000) |>
  st_sf()

# Extract the class label for each sample point
labelled_data <- st_join(sample_points, ces_polygons) |>
  select(-jahr)  # drop the year attribute ("jahr")

labelled_data$class <- factor(labelled_data$class)

Sample points from forest (green) and non-forest (orange) polygons

st_sample()

  • Ensures spatial coverage across the polygon extents
  • Labels both classes in one pass; class proportions follow polygon areas rather than being strictly balanced
  • Simple random sampling across the entire study region (more sophisticated strategies exist; a class-balanced sketch follows below)
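
If class balance matters, one option is to sample each class separately. A minimal sketch, assuming the class column of ces_polygons holds the "Forest"/"Non-forest" labels seen in the extracted samples below (500 points per class is an arbitrary choice):

# Sample a fixed number of points per class, then recombine
forest_pts    <- st_sample(ces_polygons[ces_polygons$class == "Forest", ], 500)
nonforest_pts <- st_sample(ces_polygons[ces_polygons$class == "Non-forest", ], 500)

balanced_points <- st_sf(geometry = c(forest_pts, nonforest_pts))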

Train/test split: The spatial autocorrelation problem

Important

Spatial autocorrelation issue: Random splits violate independence assumptions because nearby pixels are correlated. Proper approaches use spatial cross-validation (Brenning 2023; Schratz et al. 2021). We use random splits here for simplicity, but be aware this may overestimate performance.

Pedagogical simplification: Random train/test split

Production approach:
  • Spatial k-fold cross-validation with geographic blocking
  • Ensures training and test data are spatially separated
  • Packages: sperrorest, mlr3spatiotempcv, CAST
  • More realistic performance estimates
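
The random split used in this lesson can be very simple. A minimal sketch (a 70/30 split of labelled_data; the seed and ratio are arbitrary choices):

# Random 70/30 train/test split
set.seed(42)
n <- nrow(labelled_data)
train_idx <- sample(n, size = round(0.7 * n))

data_train <- labelled_data[train_idx, ]
data_test  <- labelled_data[-train_idx, ]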

Approach 1: Single-Band Classifier

Feature extraction with terra::extract()

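Before extracting features, the 1961 image has to be loaded as a SpatRaster. A minimal sketch (the file path is illustrative; use the TIF downloaded from Moodle):

library(terra)
ces1961 <- rast("data/ces1961.tif")  # illustrative path
names(ces1961) <- "luminosity"       # layer name used in the model formula below
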
# Extract the raster value at each training point
train_features <- terra::extract(ces1961, data_train, ID = FALSE)

# Combine the class labels with the extracted features
data_train2 <- cbind(data_train[, "class"], train_features) |>
  st_drop_geometry()

data_train2

       class luminosity
1 Non-forest  0.6352941
2     Forest  0.2078431
3     Forest  0.3960784
4     Forest  0.2980392
5     Forest  0.1294118
6 Non-forest  0.6980392

Training

We use CART (decision trees) for interpretability. If you are an experienced ML practitioner, feel free to substitute Random Forest, XGBoost, etc.

library(rpart)
cart_model <- rpart(class ~ luminosity, data = data_train2)

Pedagogical choice: CART for transparency

Production alternatives:
  • Random Forest / XGBoost: better accuracy, handle non-linearity
  • Support Vector Machines: good with limited training data
  • Neural networks: require more data but offer state-of-the-art performance

CART lets students see exactly what decision boundaries the model learned.


Figure 10.1: Decision tree for single-band classification

Raster prediction

# Predict probability per class for all pixels
ces1961_predict <- predict(ces1961, cart_model)

# Get highest probability class
ces1961_predict2 <- which.max(ces1961_predict)

Classification result (single-band approach)

Evaluation

Table 10.1: Confusion matrix for single-band classification

                     Actual
Predicted        Forest   Non-forest
Forest              137           22
Non-forest           55           87

Overall accuracy: 0.74
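
This matrix and accuracy can be computed from the held-out points. A sketch using the data_test object from the split above:

# Extract features at the test locations and predict class labels
test_features <- terra::extract(ces1961, data_test, ID = FALSE)
test_pred <- predict(cart_model, test_features, type = "class")

# Cross-tabulate predictions against ground truth
conf_mat <- table(Predicted = test_pred, Actual = data_test$class)
sum(diag(conf_mat)) / sum(conf_mat)  # overall accuracy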

The Spatial Context Problem

  • At the pixel level, it is hard to distinguish forest from non-forest
  • Spatial context is essential: humans use texture, patterns, and neighboring pixels to classify land cover

Figure 10.2

This is the core limitation of pixel-based classification.

Modern deep learning (CNNs) solves this automatically by processing image patches, but we’ll first show the manual approach to understand why spatial context matters.

Approach 2: Adding Spatial Features

  • Focal operations as feature extractors: they aggregate neighborhood values, providing spatial context

i <- 5  # 5 x 5 pixel window (2.5 m x 2.5 m at 0.5 m resolution)
focal_window <- matrix(rep(1, i^2), nrow = i)

# Calculate the standard deviation within each neighborhood (local texture)
ces1961_focal_sd <- focal(ces1961, focal_window, fun = "sd")

Pedagogical approach: Handcrafted focal features

Why we do this:
  • Explicit, understandable feature engineering
  • Shows what spatial context means (texture/variation)
  • Works with classical ML algorithms

Production approach: Convolutional Neural Networks (CNNs)
  • CNNs learn optimal spatial filters automatically during training
  • First-layer convolution weights = learned focal operations
  • Can learn multi-scale features (equivalent to multiple focal window sizes)
  • State of the art: U-Net, DeepLabV3+, Vision Transformers


Figure 10.3: Original raster (left) and the result of the focal SD filter (right), which makes texture/variation visible

Normalization for multi-feature learning

When combining features, normalize to [0,1] using min-max scaling:

\[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]

# Min-max scaling to [0, 1]; terra::minmax() returns the layer's min and max
minmax_normalization <- function(x){
  minmax_vals <- minmax(x)[, 1]
  (x - minmax_vals[1]) / (minmax_vals[2] - minmax_vals[1])
}

ces1961_focal_sd <- minmax_normalization(ces1961_focal_sd)

# Combine original and focal features
ces_multi <- c(ces1961, ces1961_focal_sd)
names(ces_multi) <- c("luminosity", "focal_sd")

Training with spatial features

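data_train2_multi is built the same way as data_train2, just extracting from the two-layer raster. A sketch mirroring the single-band extraction:

# Extract both features (luminosity and focal_sd) at the training points
train_features_multi <- terra::extract(ces_multi, data_train, ID = FALSE)

data_train2_multi <- cbind(data_train[, "class"], train_features_multi) |>
  st_drop_geometry()
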
cart_model_multi <- rpart(class ~ ., data = data_train2_multi)

Figure 10.4: Decision tree now uses both luminosity and spatial texture

Prediction and evaluation

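Prediction works exactly as in the single-band case, now on the two-layer raster (a sketch):

# Per-class probabilities per pixel, then the most likely class
ces_multi_predict <- predict(ces_multi, cart_model_multi)
ces_multi_class <- which.max(ces_multi_predict)
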
Table 10.2: Confusion matrix with spatial features

                     Actual
Predicted        Forest   Non-forest
Forest              175           37
Non-forest           17           72

Overall accuracy: 0.82 (vs. 0.74 without spatial features)


Further improvements

For advanced applications, consider:

  • More focal features: mean, variance, edge detection filters
  • Multi-scale features: focal operations at different window sizes (see the sketch after this list)
  • Deep learning: CNNs naturally capture spatial context
  • Object-based classification: segment first, then classify
  • Proper spatial CV: Use spatial blocking for more realistic evaluation
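
As an illustration of the multi-scale idea, focal features for several window sizes can be stacked into one raster. A sketch (the window sizes are arbitrary):

# Normalized focal SD at three window sizes, stacked as layers
windows <- c(3, 5, 9)
focal_stack <- rast(lapply(windows, function(i) {
  minmax_normalization(focal(ces1961, matrix(1, i, i), fun = "sd"))
}))
names(focal_stack) <- paste0("focal_sd_", windows)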

Key message for students:

Our approach teaches the concepts:
  • Why spatial context matters
  • How to work with spatial data in R
  • The ML workflow for remote sensing

In practice, you’d use deep learning frameworks (TensorFlow/PyTorch) with architectures designed for image segmentation. The concepts remain the same, but the implementation is more sophisticated.

Good starting points:
  • TorchGeo (PyTorch for geospatial data)
  • segmentation_models.pytorch (pre-built architectures)
  • Papers: U-Net for RS, DeepLabV3+, attention mechanisms

Tasks

Core exercises

  1. Download datasets (ces_polygons.gpkg and TIF files for 1961-1995) from Moodle
  2. Implement the complete workflow:
    • Load and sample spatial ground truth
    • Train classifier (single band or with additional features)
    • Evaluate with confusion matrix
  3. Apply model temporally and analyze forest cover trends
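
For task 3, one possible shape of the temporal loop (the ces<year>.tif naming pattern is an assumption; adapt it to the actual file names from Moodle):

library(terra)

years <- c(1961, 1971, 1977, 1983, 1989, 1995)

forest_share <- sapply(years, function(y) {
  r <- rast(sprintf("data/ces%d.tif", y))  # illustrative file name pattern
  names(r) <- "luminosity"

  # Rebuild the two features the model was trained on
  r_sd <- minmax_normalization(focal(r, focal_window, fun = "sd"))
  r_multi <- c(r, r_sd)
  names(r_multi) <- c("luminosity", "focal_sd")

  # Most likely class per pixel (layer 1 = "Forest", the first factor level)
  pred <- which.max(predict(r_multi, cart_model_multi))

  # Fraction of pixels classified as forest
  counts <- freq(pred)
  counts$count[counts$value == 1] / sum(counts$count)
})

forest_share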