This module is primarily designed to facilitate the use of neural networks for computer vision tasks on image data. This particular tutorial focuses on object and instance detection.
Object and instance detection involve identifying and localizing objects within an image. Typically, object detection refers to predicting both the class and the bounding box of each object. Instance segmentation goes a step further by also determining the precise boundary (mask) of each object, allowing for pixel-level differentiation between individual instances.
Both instance and object detection are handled by the abstract class Detector. An instance of this class provides a detect method, which accepts an image as input and returns a list of polygons representing the detected objects. Each polygon is defined as an ordered list of (x, y) coordinates, where the order reflects the connectivity of the polygon’s edges.
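For instance, an axis-aligned rectangular detection could be represented as follows (the coordinate values are illustrative only):
polygon = [(120, 45), (310, 45), (310, 200), (120, 200)]  # four (x, y) vertices, connected in order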
Each subclass of the Detector class provides its own implementation of the detect method, tailored to the specific framework and architecture used by the underlying neural network. This allows for flexibility in supporting various detection models while maintaining a consistent interface.
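As a rough sketch of what such a subclass looks like (the import path, method signature, and helper functions below are assumptions for illustration; consult the gwel source for the actual definitions):
from gwel.detector import Detector  # hypothetical import path

class MyDetector(Detector):
    def detect(self, image):
        # Run the wrapped model on the image, then convert its raw output
        # into a list of polygons, each an ordered list of (x, y) vertices.
        raw_output = run_my_model(image)  # hypothetical inference call
        return convert_to_polygons(raw_output)  # hypothetical conversion step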
Currently, the implemented Detector subclasses are YOLOv8 and Mask2Former, based on the YOLOv8 (object detection) and Mask2Former (instance segmentation) architectures, respectively.
An architecture can be trained to detect any class of object or instance by adjusting its weights. These model weights are learned by optimizing a loss function on training data, specifically images paired with corresponding labels. The resulting weights are typically saved to a file, which can later be loaded for inference. This tutorial assumes that the model weights have already been calculated and are available for the architecture being used. If you do not have model weights, you may wish to follow the tutorial on model training for guidance on how to obtain them.
The examples below demonstrate how to use the YOLOv8 and Mask2Former classes for object detection and instance segmentation, respectively.
# YOLOv8
from gwel.networks.YOLOv8 import YOLOv8
model_weights_path = 'path/to/model/weights'
detector = YOLOv8(weights=model_weights_path)  # load the detector with pre-trained weights
dataset.detect(detector)  # run detection on every image in the dataset
# Mask2Former
from gwel.networks.Mask2Former import Mask2Former
model_weights_path = 'path/to/model/weights'
detector = Mask2Former(weights=model_weights_path)  # load the detector with pre-trained weights
dataset.detect(detector)  # run detection on every image in the dataset
The detections are stored in the object_detections attribute of an ImageDataset instance as a dictionary, where each key is an image name and the corresponding value is a list of polygons representing the detected objects.
detections = dataset.object_detections
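For example, to summarise how many objects were detected in each image:
# Each key is an image name; each value is a list of polygons.
for image_name, polygons in dataset.object_detections.items():
    print(f"{image_name}: {len(polygons)} detected objects")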
To visualize the detections, create a Viewer instance and set its mode attribute to 'instance'. You may also want to adjust the contour_thickness attribute.
from gwel.viewer import Viewer
viewer = Viewer(dataset, max_pixels=1500)
viewer.mode = "instance"
viewer.contour_thickness = 4
viewer.open()
# Navigate to the next or previous image with the 'n' and 'p' keys respectively.
# Press the 'q' key to quit.
# Pressing the 'f' key flags the current image; see earlier tutorials for a recap on flagging.
By default, the ImageDataset.detect method automatically caches the object detections by storing them in COCO JSON format at '.gwel/coco_detections.json' inside the images directory. When detect is called a second time, it reads this file instead of executing the model, unless the optional use_saved argument is set to False. Additionally, if you do not wish to cache detections or overwrite an existing coco_detections.json, set the optional write argument to False when calling detect.
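For example, assuming the same dataset and detector objects as above:
dataset.detect(detector, use_saved=False)  # ignore the cached file and run the model again
dataset.detect(detector, write=False)  # run detection without writing to (or overwriting) the cache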
Tiling for Small Object Detection
When detecting smaller objects, models generally require higher-resolution images to achieve accurate results. However, processing such images directly can exceed memory limitations.
To address this, a technique known as tiling or patching is used. This involves splitting a high-resolution image into smaller tiles and performing detection independently on each tile. This method is efficient as long as the tile size is significantly larger than the objects being detected.
This technique is implemented in all Detector subclasses and can be activated by passing a tuple to the optional patch_size parameter when creating the Detector instance, specifying the desired patch dimensions:
detector = YOLOv8(weights=model_weights_path, patch_size=(512, 512))  # YOLOv8 as an example
dataset.detect(detector)
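Under the hood, the idea is simply to slide a window across the image and run detection on each window. The sketch below illustrates the principle only; it is not gwel's internal implementation, and split_into_tiles is a hypothetical helper:
import numpy as np

def split_into_tiles(image, tile_h, tile_w):
    # Yield (y_offset, x_offset, tile) for every tile covering the image;
    # tiles at the right and bottom edges may be smaller than requested.
    h, w = image.shape[:2]
    for y in range(0, h, tile_h):
        for x in range(0, w, tile_w):
            yield y, x, image[y:y + tile_h, x:x + tile_w]

# Detections found within a tile must then be shifted back into full-image
# coordinates by adding the tile's offsets to each polygon vertex.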
Cropping Objects to Create New Image Collections
Once objects have been detected, they can be cropped out and the resulting images saved to a new directory using the crop method of an ImageDataset instance.
object_directory = "path/to/where/cropped/object/images/should/be/saved"
object_name = "object_class_name"
dataset.crop(output_directory=object_directory, object_name=object_name)  # crop each detected object and save it to object_directory
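The resulting directory of cropped object images can itself be loaded as a new ImageDataset for further rounds of detection, viewing, or flagging; see the earlier tutorials for a recap on creating a dataset from a directory of images.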