```r
n <- 5
focal3by3 <- matrix(rep(1, n^2), ncol = n)
r_foc3 <- focal(ces1961, focal3by3, fun = sd, fillNA = TRUE)
r_foc3
# plot(r_foc3)
```
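To make the moving-window idea concrete: the focal `sd` at a single cell is simply the standard deviation of the values in that cell's neighborhood. A minimal base-R sketch (the matrix `m` and the window position are made up for illustration):

```r
# a small 5x5 "raster" of values
m <- matrix(1:25, nrow = 5)

# the 3x3 neighborhood centered on cell (3, 3)
neighborhood <- m[2:4, 2:4]

# the focal sd at that cell is the sd over the neighborhood values
sd(as.vector(neighborhood))
```

`focal()` repeats exactly this computation for every cell of the raster, sliding the window across it.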
Feature Engineering
What is Feature Engineering?
- Feature Engineering is the process of using domain knowledge to extract features from raw data.
- This is especially useful when our raw data alone is not sufficient to build a model
- In our previous example, we only had luminosity to predict the class of the raster cells
- As discussed in the chapter Feature Engineering, we humans ourselves rely on context to determine the land cover types
- This context is provided by the values of the surrounding pixels
- We can provide this context by applying focal filters to the raster data
Focal filters
- Focal Filters, as we have seen in the chapter Focal, aggregate the values over a (moving) neighborhood of pixels.
- We can determine the size and shape of this neighborhood by specifying a matrix


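For example, such neighborhood matrices can be built directly in base R; the following is a minimal sketch (the names `w_square` and `w_circle` are illustrative, not from the original):

```r
# a 3x3 square neighborhood where every cell has equal weight
n <- 3
w_square <- matrix(1, nrow = n, ncol = n)

# an approximately circular neighborhood: zero out the corner cells
w_circle <- w_square
w_circle[c(1, 3, 7, 9)] <- 0

w_square
w_circle
```

Cells with weight 0 are excluded from the aggregation, which is how non-square neighborhood shapes are expressed.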
Using focal filters as features
- To use the focal filters as features, the values of the focal filters need to be normalized to [0,1]
- A simple way to do this is to use the min-max normalization:
\[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]
- To implement this in R, we can use `global(x, min)` or (slightly faster) `minmax(x)`.
```r
minmax_normalization <- function(x) {
  minmax_vals <- minmax(x)[, 1]
  minval <- minmax_vals[1]
  maxval <- minmax_vals[2]
  (x - minval) / (maxval - minval)
}
```
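As a sanity check, the same normalization can be tried on a plain numeric vector in base R (using `min()`/`max()` directly, since terra's `minmax()` only applies to rasters; `minmax_normalize_vec` is a hypothetical helper name):

```r
# base-R analogue of the min-max normalization above
minmax_normalize_vec <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

minmax_normalize_vec(c(2, 4, 6, 10))
# the smallest value maps to 0 and the largest to 1
```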
```r
r_foc3 <- minmax_normalization(r_foc3)
ces <- c(ces1961, r_foc3)
names(ces) <- c("luminosity", "focal3by3")
```
Feature extraction
- Just as we did in our first approach (see Feature Extraction), we need to extract the features from the raster data at the labelled points
- Note that the resulting data frame now has two columns, rather than just a single column
```r
train_features_b <- terra::extract(ces, data_train, ID = FALSE)
head(train_features_b)
```

```
  luminosity  focal3by3
1  0.3568627 0.22739583
2  0.1960784 0.23494115
3  0.4392157 0.17400471
4  0.6313725 0.16638463
5  0.2823529 0.33381873
6  0.5294118 0.09907246
```
```r
data_train2_b <- cbind(data_train, train_features_b) |>
  st_drop_geometry()
```
Train the model
- Just as in our first approach (see Training the model), we need to train the model
- This time, we have more features to train the model
```r
cart_modelb <- rpart(class ~ ., data = data_train2_b, method = "class")

library(rpart.plot)
rpart.plot(cart_modelb, type = 3)
```
Predict the classes
See Predicting the probabilities per class for each pixel and Highest probability class.
```r
# Probability per class
ces1961_predictb <- predict(ces, cart_modelb)

# Class with highest probability
ces1961_predict2b <- which.max(ces1961_predictb)
```
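What `which.max` does per pixel can be illustrated on an ordinary matrix of class probabilities (hypothetical values; one row per pixel, one column per class):

```r
# each row: predicted probabilities for one pixel across four classes
probs <- rbind(
  c(0.1, 0.7, 0.1, 0.1),  # pixel 1 -> class 2
  c(0.6, 0.2, 0.1, 0.1)   # pixel 2 -> class 1
)

# index of the class with the highest probability, per row
apply(probs, 1, which.max)
```

Applied to a multi-layer probability raster, `which.max` performs this per-cell lookup, returning the layer (class) index with the highest value.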
Evaluate the model
See Model Evaluation I and Model Evaluation II.
```r
test_featuresb <- terra::extract(ces1961_predict2b, data_test, ID = FALSE)

confusion_matrixb <- cbind(data_test, test_featuresb) |>
  st_drop_geometry() |>
  transmute(predicted = class.1, actual = class) |>
  table()
```
| | Agriculture | Buildings | Forest | Shadows |
|---|---|---|---|---|
| Agriculture | 33 | 3 | 6 | 1 |
| Buildings | 2 | 5 | 0 | 0 |
| Forest | 6 | 1 | 32 | 0 |
| Shadows | 0 | 0 | 1 | 19 |
- In our first approach, we achieved an accuracy of 0.67 (see Model Evaluation I)
- With our additional features, the overall accuracy is 0.82
- We can further improve our model by adding more features in this way
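The reported overall accuracy can be reproduced from the confusion matrix above: it is the number of correct predictions (the diagonal) divided by the total number of test points.

```r
# confusion matrix from above (rows: predicted, columns: actual)
cm <- matrix(
  c(33, 3,  6, 1,
     2, 5,  0, 0,
     6, 1, 32, 0,
     0, 0,  1, 19),
  nrow = 4, byrow = TRUE
)

# overall accuracy: correct predictions / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 2)
# 0.82
```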
Tasks
- First, do the tasks described here: Tasks
- Use the `focal` function to create new features as described above
- Evaluate your new model