
How to Use Detr for Smart Bone Fracture Detection


Last Updated on 16/11/2025 by Eran Feit

Getting to know Detr for smarter object detection


Detr (DEtection TRansformer) is a modern approach to object detection that replaces many of the hand-crafted tricks in classic detectors with a clean, transformer-based design. Instead of relying on anchors, custom assignment rules, and complex post-processing, Detr treats detection as a direct set prediction problem: given an image, it predicts a fixed set of bounding boxes and classes in one shot. A convolutional backbone extracts features, a transformer encoder–decoder reasons globally over the scene, and the model outputs final detections without needing non-maximum suppression.

What makes Detr especially powerful is its ability to capture relationships between objects across the entire image. The transformer encoder looks at all spatial positions at once, learning how bones, joints, and surrounding structures relate to each other instead of focusing only on local patches. The decoder then uses a set of learned “object queries” to attend to relevant regions and output boxes and labels, including the “no object” class for empty queries. This global reasoning is particularly valuable when working with noisy or complex images such as X-rays or MRI scans.
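To make the idea of a fixed prediction set concrete, here is a tiny standalone sketch (separate from the tutorial code below) that loads the public facebook/detr-resnet-50 checkpoint and inspects its raw outputs. The printed shapes are indicative: one class score vector and one box per object query, with the extra class slot reserved for "no object".

### Minimal sketch of Detr's set prediction output, using the public facebook/detr-resnet-50 checkpoint.
import torch
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

### Any RGB image works here; a random dummy image is enough to inspect the output shapes.
dummy = torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8).numpy()
inputs = processor(images=dummy, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

### One logit vector and one box per object query (100 queries by default).
print(outputs.logits.shape)      # torch.Size([1, 100, 92])
print(outputs.pred_boxes.shape)  # torch.Size([1, 100, 4])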

When you apply Detr to medical imaging, such as bone fracture detection, this architecture becomes a strong tool for locating subtle abnormalities. On X-ray images, fractures may appear as thin lines, small gaps, or slight changes in texture. By combining a deep feature extractor with transformer attention, Detr can learn to highlight these patterns and localize suspicious regions as bounding boxes, turning raw images into structured detections that radiologists or downstream systems can review. This matches broader trends in the literature, where deep learning pipelines are increasingly used to support fracture detection and classification in clinical workflows.

In this tutorial, the goal is to use Detr for smart bone fracture detection on a COCO-formatted dataset of X-ray images. You’ll see how to load and preprocess the data, wrap it with an image processor, train a Detr model using PyTorch Lightning, and finally evaluate it on new images. By the end, Detr becomes more than just a research idea: it turns into a practical, end-to-end object detection system that marks potential fractures directly on the image, ready to be integrated into real-world tools or educational demos.

If you are just getting started with object detection in general, you might also enjoy my SSD MobileNet v3 object detection tutorial for beginners, which walks through a lighter, real-time detector step by step.


Walking through the Detr bone fracture detection tutorial

This tutorial walks you step by step through a complete Detr workflow, from creating the environment to saving and testing your trained model on new X-ray images. The code starts by setting up a dedicated Conda environment with PyTorch, CUDA support, and the key libraries you need: transformers for Detr, pytorch-lightning for cleaner training loops, torchvision and pycocotools for COCO handling, and supervision plus OpenCV for drawing and visualizing detections. The goal is that you can copy the commands, recreate the same setup on your own machine, and immediately start experimenting with Detr on medical images.

Once the environment is ready, the code focuses on turning a COCO-formatted bone fracture dataset into something Detr can understand. A custom CocoDetection class wraps torchvision.datasets.CocoDetection and integrates DetrImageProcessor from Hugging Face. Each time you pull an item from the dataset, the image and its annotations are automatically encoded into pixel_values and detection targets that match Detr’s expected format. Three separate dataset objects are created for train, validation, and test splits, giving you a clean structure for training and evaluation.

The next part of the tutorial is all about preparing data for the model and wiring everything into PyTorch Lightning. A custom collate_fn handles batching: it pads images to the same size, builds the pixel_mask, and groups labels correctly so the model can process them in one go. Data loaders are created for both training and validation, and the category information is converted into an id2label mapping so that predictions can later be turned into human-readable bone fracture classes. At this stage, you already have a full data pipeline feeding Detr-ready tensors to the model.

The heart of the code is the Detr LightningModule, which wraps DetrForObjectDetection from the transformers library. In this class you load a pretrained facebook/detr-resnet-50 checkpoint, define the forward pass, implement a shared common_step that computes the loss and loss dictionary, and then use it in both training_step and validation_step while logging metrics. The optimizer is set up with different learning rates for the backbone and the rest of the model, and the Lightning Trainer handles epochs, gradient accumulation, clipping, and GPU usage. After training, the underlying Detr model is saved to disk so it can be reloaded later without repeating the whole training process.

Finally, the tutorial shows how to bring everything together for inference. The saved Detr model is reloaded, moved to the GPU, and used to predict on a chosen X-ray from the test folder. The image_processor prepares the image, the model outputs raw predictions, and a post-processing step converts them into bounding boxes, labels, and scores. With the help of supervision.BoxAnnotator, the code draws boxes and labels on top of the original X-ray, saving and displaying a clear visualization of the detected bone fracture. In one continuous flow, the tutorial demonstrates how to use Detr for smart bone fracture detection—from environment setup to a final image with red boxes around suspicious fracture regions.

Link for the video tutorial : https://youtu.be/cDzoPHpqCm8

Code for the tutorial here : https://eranfeit.lemonsqueezy.com/buy/5ffe563b-e9b4-4f3a-9500-77eae2fd148a

or here : https://ko-fi.com/s/f48f15ccd7

Link for the dataset : https://universe.roboflow.com/roboflow-100/bone-fracture-7fylg

Link for Medium users : https://medium.com/@feitgemel/how-to-use-detr-for-smart-bone-fracture-detection-cbfd8709496b

You can follow my blog here : https://eranfeit.net/blog/

Want to get started with Computer Vision or take your skills to the next level?

If you’re just beginning, I recommend this step-by-step course designed to introduce you to the foundations of Computer Vision – Complete Computer Vision Bootcamp With PyTorch & TensorFlow

If you’re already experienced and looking for more advanced techniques, check out this deep-dive course – Modern Computer Vision GPT, PyTorch, Keras, OpenCV4



Setting up your Detr environment

Before touching any Python code, it’s worth isolating this project in its own Conda environment.
That keeps dependencies clean and makes it easy to reproduce the setup later or share it with others.

conda create -n detr python=3.9.11
conda activate detr

nvcc --version  # -> find your CUDA version

conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia

pip install supervision==0.3.0
pip install transformers
pip install pytorch-lightning
pip install timm
pip install cython
pip install pycocotools
pip install scipy

This environment now includes all the core pieces we’ll use for Detr bone fracture detection.
Once it’s ready, you can open your favorite editor or Jupyter notebook and start running the following parts.
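Before moving on, you can sanity-check the new environment with a short optional snippet (not part of the tutorial code):

### Optional sanity check for the new environment.
import torch
import transformers
import pytorch_lightning
import supervision

### Confirm the installed versions match what the tutorial expects.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("pytorch-lightning:", pytorch_lightning.__version__)
print("supervision:", supervision.__version__)

### Training is only practical if CUDA is visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())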


Loading and visualizing the bone fracture dataset with Detr

This part builds the custom COCO dataset class, creates the train–val–test splits, and visualizes one annotated X-ray using Supervision and OpenCV.
Everything in this part is about preparing clean, Detr-ready input before any training happens.

### Import the core PyTorch library.
import torch

### Import the Supervision library for easy detection visualization.
import supervision as sv

### Import the Transformers library, which provides Detr and its utilities.
import transformers

### Import PyTorch Lightning to support our training workflow later.
import pytorch_lightning

### Import the os module to handle file and folder paths.
import os

### Import torchvision, which includes the CocoDetection dataset helper.
import torchvision

### Import the random module to select a random image for visualization.
import random

### Import OpenCV for image loading and display.
import cv2

### Import NumPy for basic array operations if needed.
import numpy as np

### Import the DetrImageProcessor to prepare images and annotations for Detr.
from transformers import DetrImageProcessor

### Set the base folder where the COCO-formatted bone fracture dataset is stored.
dataset = "C:/Data-sets/bone fracture.v2-release.coco"

### Define the annotation file name used by the COCO dataset.
ANNOTATION_FILE_NAME = "_annotations.coco.json"

### Define the train images directory.
TRAIN_DIRECTORY = os.path.join(dataset, "train")

### Define the validation images directory.
VAL_DIRECTORY = os.path.join(dataset, "valid")

### Define the test images directory.
TEST_DIRECTORY = os.path.join(dataset, "test")

### Create a custom CocoDetection class that integrates the Detr image processor.
class CocoDetection(torchvision.datasets.CocoDetection):
    ### Initialize the dataset with an image directory and image processor.
    def __init__(self, image_directory_path: str, image_processor, train: bool = True):
        ### Build the full path to the COCO annotations file.
        annotation_file_path = os.path.join(image_directory_path, ANNOTATION_FILE_NAME)
        ### Call the parent CocoDetection constructor.
        super(CocoDetection, self).__init__(image_directory_path, annotation_file_path)
        ### Store the image processor for later use.
        self.image_processor = image_processor

    ### Retrieve a single item from the dataset.
    def __getitem__(self, idx):
        ### Use the parent class to load the raw image and annotations.
        images, annotations = super(CocoDetection, self).__getitem__(idx)
        ### Grab the COCO image ID for the current index.
        image_id = self.ids[idx]
        ### Wrap the annotations in the structure expected by the processor.
        annotations = {"image_id": image_id, "annotations": annotations}
        ### Use the image processor to create model-ready tensors.
        encoding = self.image_processor(images=images, annotations=annotations, return_tensors="pt")
        ### Extract the pixel values and remove the extra batch dimension.
        pixel_values = encoding["pixel_values"].squeeze()
        ### Extract the detection targets from the encoding.
        target = encoding["labels"][0]
        ### Return the processed image tensor and its target.
        return pixel_values, target

### Load the pretrained Detr image processor from Hugging Face.
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

### Create the training dataset using the custom CocoDetection class.
TRAIN_DATASET = CocoDetection(image_directory_path=TRAIN_DIRECTORY, image_processor=image_processor, train=True)

### Create the validation dataset in the same way.
VAL_DATASET = CocoDetection(image_directory_path=VAL_DIRECTORY, image_processor=image_processor, train=False)

### Create the test dataset for final evaluation.
TEST_DATASET = CocoDetection(image_directory_path=TEST_DIRECTORY, image_processor=image_processor, train=False)

### Print how many training images are available.
print("Number of train images :", len(TRAIN_DATASET))

### Print how many validation images are available.
print("Number of validation images :", len(VAL_DATASET))

### Print how many test images are available.
print("Number of test images :", len(TEST_DATASET))

### Get all COCO image IDs from the training set.
image_ids = TRAIN_DATASET.coco.getImgIds()

### Choose a random image ID to visualize.
image_id = random.choice(image_ids)

### Print the chosen image ID.
print("Image #{}".format(image_id))

### Load the metadata for the chosen image from COCO.
image_info = TRAIN_DATASET.coco.loadImgs(image_id)[0]

### Load all annotations for this image.
annotations = TRAIN_DATASET.coco.imgToAnns[image_id]

### Build the full path to the image file.
image_path = os.path.join(TRAIN_DATASET.root, image_info["file_name"])

### Read the image from disk using OpenCV.
image = cv2.imread(image_path)

### Convert the COCO annotations into a Supervision Detections object.
detections = sv.Detections.from_coco_annotations(coco_annotation=annotations)

### Get the category dictionary from the COCO object.
categories = TRAIN_DATASET.coco.cats

### Print all categories to understand the dataset classes.
print("categories:", categories)

### Create a mapping from category ID to class name.
id2label = {}

### Loop over all categories and fill the id2label mapping.
for k, v in categories.items():
    ### Store the class name for the current category ID.
    id2label[k] = v["name"]

### Create an empty list to hold the labels for the current detections.
labels = []

### Loop through each detection returned by Supervision.
for _, _, class_id, _ in detections:
    ### Append the human-readable label for each detected class.
    labels.append(f"{id2label[class_id]}")

### Print a separator line for clarity in the console.
print("==================================================================")

### Print the final id2label mapping.
print("id2label", id2label)

### Print all labels for the current image.
print("labels", labels)

### Create a BoxAnnotator to draw bounding boxes and labels.
box_annotator = sv.BoxAnnotator()

### Draw the bounding boxes and labels on top of the image.
frame = box_annotator.annotate(scene=image, detections=detections, labels=labels)

### Show the annotated image in a window.
cv2.imshow("Image", frame)

### Wait for a key press before closing the window.
cv2.waitKey(0)

### Close all OpenCV windows.
cv2.destroyAllWindows()

This snippet confirms that your COCO dataset is wired correctly and that Detr will receive clean pixel_values and labels.
You also get a quick visual sanity check by looking at the labeled X-ray before training.
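If you want to verify what the wrapper returns, an optional check like the one below prints the processed tensor shape and the target keys for the first sample. The exact keys and sizes come from DetrImageProcessor, so the commented values are indicative:

### Peek at one processed sample to see what Detr will receive.
pixel_values, target = TRAIN_DATASET[0]

### The image tensor is channels-first and already resized and normalized by the processor.
print(pixel_values.shape)  # e.g. torch.Size([3, 800, 800])

### The target is a dict of tensors, typically including class_labels and normalized boxes.
print(target.keys())       # e.g. dict_keys(['size', 'image_id', 'class_labels', 'boxes', 'area', 'iscrowd', 'orig_size'])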

For another example of training on a custom COCO-style dataset, check out Detectron2 custom dataset training made easy. It follows a similar idea but uses the Detectron2 framework instead of Detr.


Building the Detr data loaders for bone fracture detection

Now we need a collate function that can batch variable-sized images and a pair of data loaders that PyTorch Lightning can use during training and validation.
The idea is to let the Detr image processor handle padding and masks so the transformer can focus on learning meaningful bone fracture patterns.

### Import the DataLoader class to create iterable data batches.
from torch.utils.data import DataLoader

### Set float32 matmul precision to medium for better performance on recent GPUs.
torch.set_float32_matmul_precision("medium")

### Define a custom collate function to prepare a batch for Detr.
def collate_fn(batch):
    ### Initialize a list to store the pixel values of each image in the batch.
    pixel_values_list = []
    ### Loop over each item in the incoming batch.
    for item in batch:
        ### Append the image tensor (pixel values) to the list.
        pixel_values_list.append(item[0])
    ### Use the image processor to pad all images to the same size and build a pixel mask.
    encoding = image_processor.pad(pixel_values_list, return_tensors="pt")
    ### Initialize a list to hold the labels for each image.
    labels = []
    ### Loop again over the batch items to collect their labels.
    for item in batch:
        ### Append the second element of each item, which is the label dict.
        labels.append(item[1])
    ### Return a dictionary matching the Detr expected input format.
    return {
        "pixel_values": encoding["pixel_values"],
        "pixel_mask": encoding["pixel_mask"],
        "labels": labels,
    }

### Read the category dictionary from the training COCO object.
categories = TRAIN_DATASET.coco.cats

### Print the category dictionary for reference.
print("Categories:")
print(categories)

### Create the id2label dictionary that maps numeric IDs to class names.
id2label = {}

### Loop through all categories to fill the mapping.
for k, v in categories.items():
    ### Assign the class name to the dictionary for each ID.
    id2label[k] = v["name"]

### Print the mapping to verify the number of classes.
print("id2label :")
print(id2label)
print(len(id2label))

### Print a separator to make console output easier to read.
print("=====================================================")

### Create the training data loader that shuffles samples and uses our collate function.
TRAIN_DATALOADER = DataLoader(
    dataset=TRAIN_DATASET,
    collate_fn=collate_fn,
    batch_size=4,
    shuffle=True,
)

### Create the validation data loader without shuffling.
VAL_DATALOADER = DataLoader(
    dataset=VAL_DATASET,
    collate_fn=collate_fn,
    batch_size=4,
)

With this in place, Detr will always receive batches of images with consistent sizes, masks, and label dictionaries.
That keeps the training loop clean and avoids shape-related bugs that are common when batching variable-resolution medical images.
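As an optional check, you can pull a single batch from the loader and print the padded shapes. This sketch reuses TRAIN_DATALOADER from above; the exact padded sizes depend on your images:

### Grab one batch from the training loader to verify the collate function.
batch = next(iter(TRAIN_DATALOADER))

### All four images are padded to a common size, with a mask marking real pixels.
print(batch["pixel_values"].shape)  # e.g. torch.Size([4, 3, 800, 1066])
print(batch["pixel_mask"].shape)    # e.g. torch.Size([4, 800, 1066])

### Labels stay as a list of per-image dicts, exactly what DetrForObjectDetection expects.
print(len(batch["labels"]))         # 4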


Creating the Detr LightningModule and training loop

Here we wrap the Hugging Face DetrForObjectDetection model in a PyTorch LightningModule.
This class encapsulates the forward pass, shared training and validation logic, and optimizer configuration, making it easy to train Detr on the bone fracture dataset.

### Import PyTorch Lightning with its common alias.
import pytorch_lightning as pl

### Import the DetrForObjectDetection model class from Transformers.
from transformers import DetrForObjectDetection

### Import torch again to access the optimizer.
import torch

### Set the model checkpoint name for the pretrained Detr backbone.
CHECKPOINT = "facebook/detr-resnet-50"

### Define the main LightningModule that will train Detr on bone fractures.
class Detr(pl.LightningModule):
    ### Initialize the module with learning rates and weight decay.
    def __init__(self, lr, lr_backbone, weight_decay):
        ### Call the parent LightningModule constructor.
        super().__init__()
        ### Load the pretrained Detr model and adapt it to our number of labels.
        self.model = DetrForObjectDetection.from_pretrained(
            pretrained_model_name_or_path=CHECKPOINT,
            num_labels=len(id2label),
            ignore_mismatched_sizes=True,
        )
        ### Store the main learning rate.
        self.lr = lr
        ### Store the backbone learning rate.
        self.lr_backbone = lr_backbone
        ### Store the weight decay value.
        self.weight_decay = weight_decay

    ### Define the forward pass used for inference.
    def forward(self, pixel_values, pixel_mask):
        ### Call the underlying Detr model with pixel values and mask.
        return self.model(pixel_values=pixel_values, pixel_mask=pixel_mask)

    ### Define a shared step used by both training and validation.
    def common_step(self, batch, batch_idx):
        ### Extract pixel values from the batch dictionary.
        pixel_values = batch["pixel_values"]
        ### Extract pixel mask from the batch dictionary.
        pixel_mask = batch["pixel_mask"]
        ### Move each label tensor to the current device.
        labels = [{k: v.to(self.device) for k, v in t.items()} for t in batch["labels"]]
        ### Run the model forward pass with labels to compute loss and loss_dict.
        outputs = self.model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
        ### Extract the scalar loss value.
        loss = outputs.loss
        ### Extract the dictionary of individual loss components.
        loss_dict = outputs.loss_dict
        ### Return both loss and loss_dict to the caller.
        return loss, loss_dict

    ### Define the training step executed for each batch.
    def training_step(self, batch, batch_idx):
        ### Call the shared common_step to get loss and loss_dict.
        loss, loss_dict = self.common_step(batch, batch_idx)
        ### Log the overall training loss.
        self.log("training_loss", loss)
        ### Loop through each component in the loss dictionary.
        for k, v in loss_dict.items():
            ### Log each component individually with a prefix.
            self.log("Train_" + k, v.item())
        ### Return the main loss value for backpropagation.
        return loss

    ### Define the validation step executed for each validation batch.
    def validation_step(self, batch, batch_idx):
        ### Call the shared common_step to get loss and loss_dict.
        loss, loss_dict = self.common_step(batch, batch_idx)
        ### Log the overall validation loss.
        self.log("validation_loss", loss)
        ### Loop through each validation loss component.
        for k, v in loss_dict.items():
            ### Log each component with a validation prefix.
            self.log("Validation_" + k, v.item())
        ### Return the loss so Lightning can aggregate it each epoch.
        return loss

    ### Configure the optimizer and learning rates for Detr.
    def configure_optimizers(self):
        ### Group parameters that do not belong to the backbone.
        param_dicts = [
            {
                "params": [p for n, p in self.named_parameters() if "backbone" not in n and p.requires_grad],
            },
            {
                ### Group parameters that belong to the backbone and apply a smaller learning rate.
                "params": [p for n, p in self.named_parameters() if "backbone" in n and p.requires_grad],
                "lr": self.lr_backbone,
            },
        ]
        ### Create an AdamW optimizer with our parameter groups.
        return torch.optim.AdamW(param_dicts, lr=self.lr, weight_decay=self.weight_decay)

    ### Return the training data loader for Lightning.
    def train_dataloader(self):
        ### Use the global training data loader defined earlier.
        return TRAIN_DATALOADER

    ### Return the validation data loader for Lightning.
    def val_dataloader(self):
        ### Use the global validation data loader defined earlier.
        return VAL_DATALOADER

### Create the Detr Lightning model with suitable hyperparameters.
model = Detr(lr=1e-4, lr_backbone=1e-5, weight_decay=1e-4)

### Import the Lightning Trainer class to manage the training loop.
from pytorch_lightning import Trainer

### Set the folder where Lightning logs will be stored.
log_dir = "C:/temp/my_DETR_log"

### Define the maximum number of training epochs.
MAX_EPOCHS = 200

### Create the Trainer object configured to use a single GPU and gradient accumulation.
trainer = Trainer(
    devices=1,
    accelerator="gpu",
    max_epochs=MAX_EPOCHS,
    gradient_clip_val=0.1,
    accumulate_grad_batches=8,
    log_every_n_steps=1,
    default_root_dir=log_dir,
)

### Start the Detr training process.
trainer.fit(model)

### Define a folder path where the trained Detr model will be saved.
MODEL_PATH = "C:/temp/DETR-My-Model-1"

### Save the underlying Hugging Face Detr model in the specified directory.
model.model.save_pretrained(MODEL_PATH)

After this part finishes, you have a trained Detr bone fracture detector saved on disk, plus Lightning logs you can inspect in TensorBoard if you want deeper insights into the loss curves.
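To look at those curves, point TensorBoard at the Lightning log folder, and reloading the saved weights later takes a single call. This is a small sketch that assumes the same log_dir and MODEL_PATH used above:

### In a terminal: tensorboard --logdir C:/temp/my_DETR_log

### Reload the fine-tuned weights later without retraining.
from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained("C:/temp/DETR-My-Model-1")
model.eval()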


Testing Detr on new bone fracture X-ray images

The final part reloads your saved Detr model, runs inference on a test X-ray, and draws bounding boxes with class names and confidence scores.
This is where you see Detr turn all the previous work into a visual, practical bone fracture detection result.

### Import PyTorch for handling tensors and device placement.
import torch

### Import Supervision to annotate predictions on the image.
import supervision as sv

### Import Transformers for the DetrForObjectDetection class and processor.
import transformers

### Import PyTorch Lightning in case we want to reuse utilities.
import pytorch_lightning

### Import os to work with file paths.
import os

### Import torchvision so that TRAIN_DATASET remains compatible if reused.
import torchvision

### Import the DetrImageProcessor to preprocess input images.
from transformers import DetrImageProcessor, DetrForObjectDetection

### Import OpenCV to load and display images.
import cv2

### Recreate the image processor using the same pretrained checkpoint.
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

### Set the path where the trained Detr model was saved.
MODEL_PATH = "C:/temp/DETR-My-Model-1"

### Choose the device for inference, typically a CUDA GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"

### Load the fine-tuned Detr model from the saved directory.
model = DetrForObjectDetection.from_pretrained(MODEL_PATH)

### Move the model to the chosen device.
model.to(device)

### Print the model architecture for a quick sanity check.
print(model)

### Create a BoxAnnotator instance to visualize predictions.
box_annotator = sv.BoxAnnotator()

### Set the path of a test image from the dataset.
image_path = "C:/Data-sets/bone fracture.v2-release.coco/test/117_jpg.rf.119dccd2483b04d8d3a8c33a1393d362.jpg"

### Read the test image using OpenCV.
image = cv2.imread(image_path)

### Define the confidence threshold for keeping predictions.
CONFIDENCE_THRESHOLD = 0.35

### Disable gradient computation during inference to speed things up.
with torch.no_grad():
    ### Preprocess the image with the Detr image processor and move tensors to the device.
    inputs = image_processor(images=image, return_tensors="pt").to(device)
    ### Run the model forward pass to obtain raw predictions.
    outputs = model(**inputs)
    ### Build a tensor describing the original image size.
    target_sizes = torch.tensor([image.shape[:2]]).to(device)
    ### Post-process raw outputs into final detections with boxes, scores, and labels.
    results = image_processor.post_process_object_detection(
        outputs=outputs,
        threshold=CONFIDENCE_THRESHOLD,
        target_sizes=target_sizes,
    )[0]

### Convert the Transformers detection results into a Supervision Detections object.
detections = sv.Detections.from_transformers(transformers_results=results)

### Build a label string for each detection using id2label and the confidence score.
### Note: id2label is the mapping built in the dataset part; rebuild it from the COCO categories if you run this as a standalone script.
labels = [f"{id2label[class_id]} {confidence:.2f}" for _, confidence, class_id, _ in detections]

### Draw the detection boxes and labels on a copy of the original image.
image_with_detection = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)

### Save the annotated image to disk.
cv2.imwrite("predict.png", image_with_detection)

### Display the image with detections in a window.
cv2.imshow("image with detections", image_with_detection)

### Optionally also display the original image for comparison.
cv2.imshow("img", image)

### Wait for a key press to close the windows.
cv2.waitKey(0)

### Destroy all OpenCV windows.
cv2.destroyAllWindows()

Once this runs successfully, you’ll get an output image with bounding boxes around the predicted bone fractures, labeled with confidence scores.
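If you would rather score the whole test folder instead of a single file, a small loop does the job. The sketch below reuses model, image_processor, box_annotator, id2label, and TEST_DIRECTORY from the earlier sections, and the output folder name is just an example:

### Run the same inference over every X-ray in the test folder (sketch; reuses objects defined above).
import glob

OUTPUT_DIR = "C:/temp/detr-predictions"  # example output folder
os.makedirs(OUTPUT_DIR, exist_ok=True)

for path in glob.glob(os.path.join(TEST_DIRECTORY, "*.jpg")):
    ### Load and preprocess each test image.
    image = cv2.imread(path)
    inputs = image_processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.shape[:2]]).to(device)
    results = image_processor.post_process_object_detection(
        outputs=outputs, threshold=CONFIDENCE_THRESHOLD, target_sizes=target_sizes
    )[0]
    ### Annotate each prediction and save it under the same file name.
    detections = sv.Detections.from_transformers(transformers_results=results)
    labels = [f"{id2label[class_id]} {confidence:.2f}" for _, confidence, class_id, _ in detections]
    annotated = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
    cv2.imwrite(os.path.join(OUTPUT_DIR, os.path.basename(path)), annotated)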

Once you are comfortable with Detr for bone fracture detection, you can combine detection with segmentation using SAM and YOLOv8 in my Segment Anything tutorial: generate YOLOv8 masks fast. It is a great next step if you need pixel-accurate medical masks instead of just bounding boxes.

FAQ

Can I run this Detr bone fracture tutorial on CPU only?

You can run the code on CPU for testing, but training will be slow and not ideal for larger datasets. A GPU is recommended for practical training times.

Which Detr checkpoint is used in this tutorial?

This tutorial uses the facebook/detr-resnet-50 checkpoint from Hugging Face as the starting point for fine-tuning on bone fracture data.

Do I need COCO annotations to train Detr on fractures?

Yes, the code expects COCO-style JSON files with bounding boxes and class IDs for train, validation, and test splits.

How can I add more fracture classes to this Detr model?

Update the COCO annotations with the new classes, regenerate id2label, and set num_labels to the new class count when loading Detr.
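For example, a sketch of the reload step after regenerating the mapping:

### Reload Detr after regenerating id2label from the updated COCO annotations (sketch).
from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=len(id2label),      # new class count from the regenerated mapping
    ignore_mismatched_sizes=True,  # the classification head is re-initialized at the new size
)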

What batch size works well for Detr bone fracture training?

In this tutorial we use a batch size of 4 with gradient accumulation, which fits comfortably on most mid-range GPUs.

How long does Detr training usually take?

Training time depends on dataset size and GPU power, but expect at least a few hours for 200 epochs on a typical fracture dataset.

Can I resume Detr training from a previous checkpoint?

Yes, you can point the Lightning Trainer to a saved checkpoint or reload the Hugging Face weights and continue training from there.
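A minimal sketch, assuming a checkpoint file from your Lightning log folder (the exact .ckpt path below is an example; pick a real file from your own run):

### Resume training from a saved Lightning checkpoint (sketch).
from pytorch_lightning import Trainer

trainer = Trainer(devices=1, accelerator="gpu", max_epochs=MAX_EPOCHS, default_root_dir=log_dir)
trainer.fit(model, ckpt_path="C:/temp/my_DETR_log/lightning_logs/version_0/checkpoints/last.ckpt")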

How do I change the image size for training?

DetrImageProcessor handles resizing internally, but you can override the size parameter when you create the processor if needed.
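For example, a sketch overriding the resize settings (the specific edge values are just illustrative):

### Create the processor with a custom resize target (sketch).
from transformers import DetrImageProcessor

image_processor = DetrImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",
    size={"shortest_edge": 480, "longest_edge": 800},  # smaller images train faster but may miss thin fracture lines
)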

Is this Detr tutorial suitable for beginners in medical imaging?

Yes, the steps are explained in a beginner-friendly way while still showing all the important configuration details for serious projects.

Can I adapt this code for non-medical object detection tasks?

Absolutely, you only need to swap the bone fracture dataset for your own COCO dataset and adjust class labels accordingly.


Conclusion

Detr brings a clean, modern view of object detection to medical imaging, and bone fracture detection is a perfect example of where it shines.
Instead of juggling anchors, custom assignment rules, and non-maximum suppression, you work with a single transformer-based model that directly predicts a set of bounding boxes and labels.
That simplicity makes it easier to focus on what really matters: curating a high-quality fracture dataset, monitoring the training process, and validating performance on real X-ray images.

In this tutorial you built a complete Detr pipeline around a COCO-formatted bone fracture dataset.
You started by creating a dedicated environment, then implemented a CocoDetection wrapper that talks nicely to DetrImageProcessor.
You designed a robust collate_fn for variable-sized images, created data loaders, and wrapped DetrForObjectDetection in a PyTorch LightningModule that cleanly defines training, validation, and optimization.
From there you trained the model, saved it to disk, and ran inference on new X-rays with visual overlays that highlight suspected fracture regions.

The nice thing about this setup is how reusable it is.
You can swap in a different medical dataset, change the paths, and be up and running with a new Detr project in minutes.
You can also experiment with other backbones, tweak learning rates, or integrate evaluation metrics and monitoring tools without rewriting the core training loop.
As you build more tutorials and projects, this Detr workflow becomes another solid building block in your computer vision toolbox, right next to YOLO, SSD, Detectron2, and your segmentation pipelines.



Connect :

☕ Buy me a coffee — https://ko-fi.com/eranfeit

🖥️ Email : feitgemel@gmail.com

🌐 https://eranfeit.net

🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb

Enjoy,

Eran
