Supervised learning

Introduction

  • Supervised learning is a type of machine learning where the model is trained on a labelled dataset
  • The model learns to map input data to the correct output label
  • Supervised learning is used extensively in remote sensing to enable quantitative analysis of satellite imagery
  • We will use historic satellite imagery to train a model to classify land cover types in a given area

Task and Data

  • The task we will perform is to quantitatively analyze forest cover change over time in the area surrounding the village Ces in Ticino, Switzerland
  • We are provided with historic areal imagery from 1961 from swisstopo [source]
  • The resolution of the imagery is 1 m/pixel and it contains only one single band (!!!)
library(terra)
library(purrr)
library(sf)
library(dplyr)
library(tmap)

ces1961 <- rast("data-large/ces/1961.tif")

ces1961 <- ces1961/255 # normalize the values to [0,1]

names(ces1961) <- "luminosity"
plot(ces1961, col = grey.colors(255))

Approach

To solve this task

  1. We first need to create some labelled data
  2. Split our labelled data into training and testing data
  3. Train a model on the training data
  4. Evaluate the model on the testing data

Data labelling

In preparation, I used QGIS to create labelled points for the following classes:

  • Forest
  • Buildings
  • Agriculture
  • Shadows
library(sf)

labelled_data <- read_sf("data-large/ces/labelled-data.gpkg")

# We will turn the class into a factor, 
# which is useful for classification
labelled_data$class <- factor(labelled_data$class)

tmap_mode("view")
tm_shape(ces1961) + 
  tm_raster(palette = "-Greys",legend.show = FALSE) +
  tm_shape(labelled_data) +
  tm_dots(col = "class") +
  tm_layout(legend.outside = TRUE)

Splitting the data

  • We need to split our data into training and testing data
  • We will randomly select 70% of the data for training and the remaining 30% for testing
Important

Note that this is not the recommended approach, instead we should use a spatially stratified split (see Brenning 2023; and Schratz et al. 2021). To keep things simple for this exercise however, we will use a random split.

labelled_data <- labelled_data |> 
  group_by(class) |>            # for each class...
  mutate(                       #
    i = sample(seq_len(n())),   # ...assign a random number from 1 to n...
    train = i <= n()*0.7        # ...and assign the first 70% to "training"
    )

data_train <- labelled_data |>  # filter the training data and store in a
  filter(train) |>              # a new data frame 
  select(-c(i,train))           # remove obsolete columns


data_test <- labelled_data |>  # same as above, but the filter is inverted
  filter(!train) |>            # for the testing data
  select(-c(i,train))

tmap_mode("plot")

tm_shape(labelled_data) + tm_dots(col = "train", palette = c("blue","red"))

Feature Extraction

  • We need to extract the values of the raster data at the labelled points
  • Since we only have one band, our result is a data.frame with one column
# ID = FALSE means that we do not want to add an ID column
train_features <- terra::extract(ces1961, data_train, ID = FALSE)

head(train_features)
  luminosity
1  0.3568627
2  0.1960784
3  0.4392157
4  0.6313725
5  0.2823529
6  0.5294118
data_train2 <- cbind(data_train, train_features) |> 
  st_drop_geometry()

Training the model

  • We will use the rpart package to train a classification tree
  • The classification tree is also known as a decision tree
  • A decision tree has a flowchart-like structure (see Figure 6.1)
  • Classification trees does not always produce the best results, but they are simple and interpretable
library(rpart)
cart_model <- rpart(class~., data = data_train2, method = "class")
  • As mentioned above, the decision tree is interpretable. We can visualize it using the rpart.plot package:
  • Each pixel is classified into one of the four classes based on its value (luminosity) according to the decision tree in Figure 6.1
library(rpart.plot)
rpart.plot(cart_model, type = 3)
Figure 6.1: The resulting decision tree. Each internal node represents a test on an attribute, each branch represents the outcome of the test and each leaf node represents a class label

Predicting the probabilities per class for each pixel

  • We will use the trained model to predict the probabilities of each class for each pixel in the raster data
  • We can use the predict function to do this
  • The result is a raster with one layer per class, giving the probability of each class for each pixel
ces1961_predict <- predict(ces1961, cart_model)

tm_shape(ces1961_predict) + tm_raster(style = "cont", midpoint = .5, palette = "-Spectral",title = "probability") +
  tm_layout()

Highest probability class

  • We can combine the four layers into a single layer by assigning the class with the highest probability to each pixel
  • For this, we can use the which.max function:
ces1961_predict2 <- which.max(ces1961_predict)

# the next three lines assign class labels to the raster (instead of numbers)
classes <- levels(data_train2$class)
classes_df <- data.frame(ID = seq_along(classes), class = classes)
levels(ces1961_predict2) <- classes_df

# visualize the result
tmap_mode("view")
tm_shape(ces1961_predict2) + tm_raster(palette = c("gold","gray","palegreen4","black")) +
  tm_layout(legend.outside = TRUE)

Model Evaluation I

  • To evaluate the model, we will use the testing data
test_features <- terra::extract(ces1961_predict2, data_test, ID = FALSE)

confusion_matrix <- cbind(data_test, test_features) |> 
  st_drop_geometry() |> 
  transmute(predicted = class.1, actual = class) |> 
  table()
Table 6.1: The confusion matrix shows the predicted values (rows) against the actual values (columns). The diagonal cells (in red) shows the correct predictions. The results are better than expected, the model seems to haev the most problems with the class “agriculture”.
Actual
Agriculture Buildings Forest Shadows
Agriculture 34 4 13 0
Buildings 3 5 0 0
Forest 4 0 16 2
Shadows 0 0 10 18

Model Evaluation I

  • A very simple measure to asses the model performance is the overall accuracy
  • This is the ratio of the number of features correctly classified (diagonals in Table 6.1) to the total number of features (sum of all column sums in Table 6.1):

\[\frac{\text{number of features correctly classified}}{\text{total number of features}}\]

  • if we insert our values, we get:

\[\frac{34+5+16+18)}{41+9+39+20}= 0.67\]

Feature Engineering

  • Currently, our raster data only contains one band. In other words, our feature space is only one-dimensional
  • It’s actually quite a challenge to determine the land cover of a pixel if we regard just a single pixel. We humans rely on context (sorrounding pixels) to determine the land cover. You can test this by zooming in very closely on the raster image and try to distinguish the land cover types (see Figure 6.2)
  • We will have a look at possibilities to include context information in Feature Engineering
xmin <- 2706200
xmax <- xmin+100
ymin <- 1143600
ymax <- ymin+100
e <- ext(xmin, xmax, ymin, ymax)

crop(ces1961, e) |> 
  plot( col = grey.colors(255))
Figure 6.2: Zooming in on the raster image, it is difficult to determine the land cover types of individual pixels

Tasks

  1. Download the datasets labelled-points.gpkg and 1961.tif from moodle
  2. Follow the steps described above to train a model to classify the land cover types in the area surrounding the village Ces in Ticino
  3. Evaluate the model using the testing data