Finite Carbon

Building a Platform for Digital Energy

Imagine you’re working on a machine learning model to identify areas affected by deforestation. The dataset contains satellite images, and your task is to classify image patches into two categories: “Deforested” (Class A) and “Non-Deforested” (Class B). However, deforested patches are significantly rarer than non-deforested ones.

Data Description:

  • Class A (Deforested; at most 5% of the dataset): Represents the rare instances of deforested patches (e.g., cleared land, logging areas).
  • Class B (Non-Deforested): Dominates the dataset and includes natural forest cover, agricultural land, and other non-deforested regions.

Classifier Performance Metrics:

  • Accuracy: Overall accuracy is commonly used, but it is misleading under class imbalance. With deforested patches making up at most 5% of the dataset, a model that labels every patch “Non-Deforested” reaches at least 95% accuracy while detecting no deforestation at all.
  • Precision: Precision measures the proportion of correctly predicted deforested instances (Class A) out of all predicted positive instances. Minimizing false positives is crucial to avoid misclassifying non-deforested areas.
  • Recall (Sensitivity): Recall calculates the proportion of true deforested instances (Class A) correctly identified out of all actual deforested instances. High recall ensures we don’t miss deforested areas.
  • Specificity: Specificity represents the proportion of true non-deforested instances (Class B) correctly identified out of all actual non-deforested instances. Avoiding false alarms for natural forest cover is essential.
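  • F1 Score: The F1 Score is the harmonic mean of precision and recall, so it is high only when both are high, and it accounts for both false positives and false negatives. It is particularly useful for imbalanced data because it ignores true negatives, which dominate this dataset.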
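
The metrics above can all be derived from the four confusion-matrix counts. The following is a minimal sketch, assuming Class A (Deforested) is the positive class. The function name `binary_metrics` and the example counts are hypothetical, chosen only to mirror a dataset in which 5% of patches are deforested.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute metrics from confusion-matrix counts.

    Class A (Deforested) is treated as the positive class.
    Any ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        # Harmonic mean of precision and recall
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical dataset: 1,000 patches, 50 of them deforested (5%).
# A model that labels every patch "Non-Deforested":
always_negative = binary_metrics(tp=0, fp=0, fn=50, tn=950)

# A model that finds 40 of the 50 deforested patches with 20 false alarms:
detector = binary_metrics(tp=40, fp=20, fn=10, tn=930)

for name, result in [("always_negative", always_negative), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

With these illustrative counts, the always-negative model scores 95% accuracy with zero recall and zero F1, while the detector reaches a precision of about 0.67, a recall of 0.80, and an F1 of about 0.73, at only slightly higher accuracy (97%).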

Challenge:

  • How would you design a binary classifier that optimizes both precision (minimizing false positives) and recall (minimizing false negatives) for detecting deforested areas?
  • Consider techniques like oversampling deforested patches, using weighted loss functions, or leveraging synthetic data generation.
  • Which evaluation metric(s) would you prioritize when evaluating your model’s performance on deforestation detection?
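
Two of the techniques mentioned above, oversampling and weighted loss functions, can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed solution: labels are encoded as 1 for Deforested and 0 for Non-Deforested, and the helper names (`inverse_frequency_weights`, `weighted_log_loss`, `random_oversample`) are hypothetical. The weighting rule used here, n_samples / (n_classes * class_count), is one common "balanced" heuristic; other schemes are equally valid.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * count) for cls, count in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Class-weighted binary cross-entropy.

    probs holds the predicted probability that each label is 1.
    The result is normalized by the sum of the sample weights.
    """
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def random_oversample(samples, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority samples until both classes match in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_majority = len(labels) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Hypothetical training split: 100 patches, 5 deforested.
train_labels = [1] * 5 + [0] * 95
train_patches = list(range(100))  # stand-ins for image patches

class_weights = inverse_frequency_weights(train_labels)
balanced_patches, balanced_labels = random_oversample(train_patches, train_labels)
print(class_weights, Counter(balanced_labels))
```

In this sketch a missed deforested patch costs 19 times as much as a false alarm (weights of 10.0 versus roughly 0.53). Oversampling should be applied to the training split only; duplicating patches before the train/test split would leak copies of the same patch into evaluation.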

Remember, when the class of interest is rare, thoughtful model selection, feature engineering, and robust evaluation are critical to achieving reliable results.

Jordan Golinkoff
Senior Director, Research and Development at Finite Carbon
Bahareh Yekkehkhany
Applied Remote Sensing Scientist, Finite Carbon
Yasaman Shahhosseini
Graduate Student
Arman Jahangiri
Graduate Student
Patrik Coulibaly
PhD candidate