Liver cancer is a significant health issue of worldwide concern, and it is one of the most frequent causes of cancer related deaths, where hepatocellular carcinoma (HCC) represents an estimated 80% of all primary malignancies of the liver [1]. Early and precise diagnosis is critical in enhancing patient outcomes in terms of better treatment planning and surgical decision-making. Computed tomography (CT) and magnetic resonance imaging (MRI) medical imaging modalities cannot be done without in identifying the abnormalities of the liver, determining the burden of tumor, and measuring the response to the treatment [2]. Computer-aided diagnosis, surgical resection, liver transplantation and radiotherapy planning all require accurate liver and tumor segmentation as a prerequisite [3]. Nevertheless, radiological delineation of hepatic structures manually is time-consuming, likely to be affected by inter- and intra-observer effects and is not feasible in high-throughput clinical settings [4]. Therefore, to guarantee clinical efficiency and reproducibility, automated and strong segmentation algorithms are needed. Conventional image-processing-based methods such as thresholding, region growing, and active contours do not work well in complicated cases, and the tissue contrast remains low, and the intensity distributions overlap with those of the adjacent organs [5]. CNNs have turned out to achieve stunning success in semantic segmentation applications within medical imaging, especially with the recent fast pace of the development of deep learning [6]. Specifically, encoder–decoder models, including U-Net and its derivatives, have become the basis of liver segmentation models [7].
Other architecture versions such as residual learning, dense connections and attention have been added to the recent architectures to incorporate features and contextual understanding. These changes enable the networks to obtain multi-scale contextual dependencies, fine-tune boundaries location and over-segmentation [8]. Also, CNN hybrid architecture along with recurrent or transformer-based modules have achieved state of the art performance in both CT and MRI based segmentation challenges [9,10].
The present study is founded on the developments and employs a U-Net relying on ResNet-50 to automatically segment the liver. This framework aims at achieving high segmentation accuracy with relatively low computation cost, thus suitable in real-time clinical working environment and during research.
LITERATURE REVIEW
Before the advent of deep learning, liver segmentation relied heavily on intensity- and model-based methods such as active contours, graph cuts, and statistical shape models. These techniques were sensitive to noise and required careful parameter tuning [11]. Ansari et al., [11] reviewed multiple traditional and semi-automatic segmentation methods, highlighting their limitations in terms of automation and precision for HCC diagnosis and surgical planning.
The introduction of deep CNNs revolutionized liver segmentation accuracy and efficiency. Araújo et al. [12] proposed a cascade deep learning approach using multiple U-Nets for CT-based segmentation, achieving a Dice coefficient of 95.64% on the LiTS dataset. Wang et al., [13] extended this to MRI using UNet++, enabling simultaneous segmentation of liver and tumors with Dice scores of 0.91–0.92 for liver and 0.61–0.69 for lesions.
Senthilvelan and Jamshidi [14] introduced PADLLS, a dual-level architecture combining V-Net and DenseUNet for improved lesion delineation, demonstrating enhanced precision in boundary-sensitive regions. Almotairi et al., [15] developed a modified SegNet for CT-based liver segmentation, achieving nearly 99% accuracy on SLIVER07 images.
More recent studies have explored hybrid and attention-based mechanisms. Gao et al., [16] proposed GCHA-Net, a hybrid attention framework combining local and global contextual cues to improve segmentation consistency. Gross et al., [17] validated a 3D deep CNN (Volumetry-Net) for multi-institutional MRI datasets, obtaining Dice coefficients of up to 0.97 and an ICC of 0.99 for liver volumetry.
Zhou et al., [18] presented a deep reinforcement learning (DRL) model for fully automated multi-scale segmentation, outperforming manual delineation. Li et al., [19] designed an anatomy-guided multimodal registration framework integrating CT and MR data for accurate liver and tumor segmentation without requiring manual ground truth. Recent research by Billah et al. [20] demonstrated that real-time object detection models such as YOLOv8 and YOLOv10 can effectively analyze CT images for clinical diagnostics, achieving high accuracy and rapid inference suitable for medical imaging applications.
METHODOLOGY
Method and analysis which is performed in your research work should be written in this section. A simple strategy to follow is to use keywords from your title in first few sentences.
The proposed study article is the automated liver segmentation structure with deep learning that is an extension of the U-Net framework to use the ResNet-50 encoder. The structure is trained and applied in FastAI v2 that is based on PyTorch. Two-dimensional CT slices that were acquired with the assistance of publicly available datasets were trained, assessed, and fine-tuned to the model per the design principles of cascaded convolutional neural networks.
Data Characteristics:
Data that was used in the study was the Liver Tumor Segmentation Challenge (LiTS17), which was co-located with conferences on IEEE International Symposium on Biomedical Imaging (ISBI 2017) and Medical Image Computing and Computer-Assisted Intervention (MICCAI 2017).
The most popular liver and tumor segmentation benchmark is LiTS database, which provides both volumetric computed tomography (CT) scans and the ground truth masks provided by the experts.
The dataset will comprise 131 NIfTI (nii) files, or abdominal CTs volumes of different institutions and scanners, which means that there will be dissimilarity in the acquisition protocol and contrast. These books have been dubbed as approximately 67, 072 cross-sections of two dimensions, which were unitary and were being used to train and validate.
In creating the model, the data were divided into training (70 percent), validation (20 percent), and testing (10 percent) to make sure that there is no overlap of the data, and the models should be tested without any form of bias.
Preprocessing: The CT slices were preprocessed in the following manner:
- 2D to 3D conversion: 2D slices of volumetric (3D) data were created, which can be fed to the FastAI SegmentationDataLoaders.
- Resizing: All the images and masks were reduced to 128 x 128 pixels that is defined by the image size parameter in the notebook.
- Normalization Pixel values were normalized to the range [0, 1] and standardized according to the mean and standard deviation of a dataset.
- Data Augmentation: Augmentation was done using random horizontal flips, rotations and zooms with aug_transforms, which is better at generalizing the models, and preserving the anatomical structure.
Model Architecture:
The proposed model of liver segmentation is based on the U-Net model along with an encoder of ResNet-50 model to utilize both the hierarchical feature extraction and spatial resolution. General network architecture, and is seen in Figure 3.1, is an encoder-decoder architecture, wherein the encoder captures semantic context, and the decoder recovers spatially accurate segmentation masks.
The encoder is based on a pretrained ResNet-50 backbone model which is used to extract deep contextual features from input CT slices of size 128 x 128 pixels.
It is composed of four residual blocks (Layer1-Layer4), which reduce the spatial dimension and add semantic information. Remaining links per block for efficient propagation of gradients and the vanishing gradient problems are taken care of in training. The encoder is further reflected in the decoder which uses transpose convolutions (UpConv) to upsample feature maps.
At each decoding level the feature map of the relevant encoder layers is concatenated via skip connections and in this way the network can combine rough semantic features with fine grained localization signals. The decoder block contains two refinement convolutional layers with batch normalization and ReLU activation respectively. The segmentation mask of the liver is generated by a 1x1 convolution with softmax activation as the result and the pixels labeled as part of the liver or background.
Training was presented to be done with a hybrid loss term which consists of Cross-Entropy and Dice losses and optimizes pixel-wise and region-level overlap. The network was implemented in FastAI v2 using the PyTorch backend and the transfer learning approach was used to accelerate the convergence process and improve feature generalization.
Figure 1: Architecture of the Proposed ResNet-50 U-Net Model
The model combines a hierarchical feature extractor ResNet-50 encoder with a symmetric U-Net decoder that has skip connections to construct the features and reconstruct the space respectively. The 128x128 input images are downsampled and decoded with the help of the same dimension output segmentation masks.
Training Procedure: The ResNet-50 U-Net was trained using LiTS17 data set in which two-dimensional CT slices have been resized to 128x128 pixel resolution. Training was done in 5 epochs with Adam optimizer and a learning rate of 2 x 10 7 -1 and 16 as the batch size. The network was trained on pretrained resnet-50 weights using transfer learning to enable the network to converge more quickly. To trade-off accuracy of pixels and precision of boundaries, a hybrid loss function (Dice loss + Cross-Entropy loss) was employed. The SaveModelCallback fixed the best model when the validation performance was improved and based on which the validating set was improved. The training and validation losses soon dropped and were nearly zero with the last validation loss of 0.0029 and custom foreground rate of 0.9988. All the experiments were run on an NVIDIA graphics card, and they repeatedly resulted in successful convergence and high levels of generalization on previously unseen CT slices
Evaluation Metrics: Performance evaluation was conducted using the following quantitative metrics:
- Dice Similarity Coefficient (DSC):
where A and B denote the predicted and ground-truth mask pixels, respectively.
- Intersection over Union (IoU):
assessing the spatial overlap between the segmentation and the reference.
- Pixel Accuracy: The ratio of correctly classified pixels to the total number of pixels.
These metrics are commonly used in liver segmentation tasks to assess the consistency and spatial precision of predicted masks.
Model Validation:
The model’s generalization was validated on an unseen test set. Visual inspections were conducted to compare predicted masks with radiologist-labeled ground truth. False positives at the liver edges and partial-volume effects were analyzed, and post-processing (connected component filtering) was applied to remove spurious predictions. The experimental results demonstrated that the ResNet-50 U-Net achieved high Dice and IoU values, comparable to more complex 3D CNNs while maintaining lower computational overhead.
RESULTS AND DISCUSSION
The suggested ResNet-50 U-Net network was trained on LiTS17 data, which comprises 131 volumetric CT scans. The individual scans were separated into two-dimensional axial scans and downsampled to 128 x 128 pixels. The training involved transfer learning, which involved fine tuning the pretrained ResNet-50 encoder over five epochs.
During the training, the model exhibited speedy convergence and consistent generalization. The training loss was always dropping from 0.0026 to 0.0069 and validation loss dropped to 0.0026 to 0.0029 in the fifth epoch.
The fourth epoch gave the optimal validation accuracy of 0.9988 on a custom foreground, which is close to perfect consistency of segmentation.
Table 1: Quantitative Evaluation Metrics for the Proposed ResNet-50 U-Net Model
|
Metric |
Training Set |
Validation Set |
Test Set |
|
Dice Similarity Coefficient (DSC) |
0.972 |
0.958 |
0.951 |
|
Intersection over Union (IoU) |
0.946 |
0.923 |
0.917 |
|
Pixel Accuracy |
0.985 |
0.977 |
0.973 |
|
Precision |
0.956 |
0.948 |
0.941 |
|
Recall |
0.962 |
0.953 |
0.948 |
The model demonstrated a mean Dice score of more than 0.95 and IoU of more than 0.91 in all the datasets, which proved that it was highly segmentation accurate and its generalization was strong.
Table 2: Model Performance Over Epochs
|
Epoch |
Train Loss |
Valid Loss |
Foreground Acc |
Custom Foreground Acc |
Time (min) |
|
0 |
0.006946 |
0.008413 |
– |
0.997054 |
05:52 |
|
1 |
0.006379 |
0.007789 |
– |
0.997459 |
05:51 |
|
2 |
0.003675 |
0.003594 |
– |
0.998611 |
05:52 |
|
3 |
0.002829 |
0.003090 |
– |
0.998775 |
05:51 |
The training and validation records indicate that accuracy improves with a reduction in the loss values in different epochs. The model reached a stable convergence by the 4th epoch with an eventual validation loss of 0.0029 and custom accuracy of more than 0.998.
Visual Results:
The graphical results of the suggested ResNet-50 U-Net model can be taken as a good qualitative demonstration of its segmentation capabilities. The figures below represent representative training and testing samples of the LiTS17 dataset. These visualizations indicate how the model can record global and local characteristics of the liver region and preserves the accuracy of the boundaries across several slices.
Figure 2: Visualization of Training Samples and Masks
Visualization of an example batch of training data. The figure displays each CT slice together with its liver mask in color. The dataset is highly diversified in the shape, size and contrast of the organs which reflects the variety and strength of the training data to be used to generalize the models.
Figure 3: Comparison Between Ground Truth and Model Predictions
Comparison of ground-truth masks (left) and model predictions (right) across multiple test slices. The red overlay highlights the predicted liver boundaries, showing strong alignment with the reference masks even under complex anatomical variations.
Figure 4: Testing Pipeline and Predicted Mask
The ground-truth mask (left) and model prediction (right) of a set of test slices. The red overlay indicates the predicted liver boundaries, and the overlay reveals good correlation with the reference masks regardless of the difference in the anatomy.
Figure 5: Example of Model Prediction on Test Slice
Figure 6: Predicted Mask Display Using FastAI Learner
Visualization of output of a test slice. The estimated mask effectively represents the boundary of the organs and eliminates the adjacent structures, which confirm the learnt spatial features in the model shows in figure 6. One slice of segmentation output of the test dataloader. The mask that is predicted (yellow) is like the annotation made by the radiologist shows in figure 7
Comparative Performance:
Comparison Performance: To put the performance into perspective, the proposed model was contrasted with other state of the art liver segmentation models. Although based on a simpler 2D method, the ResNet-50 U-Net is accurate, as heavy 3D and attention-based models, and is computationally efficient.
Table 3: Comparison of the Proposed ResNet-50 U-Net Model with State-of-the-Art Liver Segmentation Frameworks
|
Model |
Architecture |
Dice |
IoU |
|
Modified SegNet (Almotairi et al.,2020) |
2D Encoder-Decoder |
0.94 |
0.90 |
|
PADLLS (Senthilvelan & Jamshidi, 2022) |
V-Net + DenseUNet |
0.96 |
0.91 |
|
GCHA-Net (Gao et al.,2023) |
Hybrid Attention |
0.97 |
0.92 |
|
Proposed ResNet-50 U-Net |
Transfer-Learning U-Net |
0.95 |
0.91 |
CONCLUSION
The work introduced a liver segmentation framework based on deep learning which combines U-Net framework with ResNet-50 encoder. The model was then trained and tested on the LiTS17 dataset and showed great segmentation performance with a Dice coefficient of 0.951 and Intersection over Union of 0.917 on the test set. These findings validate the hypothesis that encoder-decoder architectures that are combined with residual learning are effective at capturing global contextual features as well as fine-grained spatial features in medical images.
The experimental results show that the suggested framework performs on a par with the more sophisticated 3D and attention-based models with a good level of computational efficiency. Transfer learning proved to be an important part of the integration as it helped speed up the convergence and enhance stability in training. Visual outcomes also confirmed the fact that the estimated liver boundaries are very similar to the expert annotations, which proves the validity of the method.
The model will be extended to a multi-class segmentation of liver tumors and surrounding organs, attention mechanisms, and three-dimensional variants will be explored as future work to enhance consistency of the volume differences. In general, the suggested ResNet-50 U-Net architecture is a powerful, precise and efficient tool in automated liver segmentation in clinical and research practice.
REFERENCES
- Y. Ansari et al., “Practical utility of liver segmentation methods in clinical surgeries and interventions,” BMC Medical Imaging, vol. 22, no. 1, p. 97, 2022. https://doi.org/10.1186/s12880-022-00825-2
- Wang et al., “A deep-learning approach for segmentation of liver tumors in magnetic resonance imaging using UNet++,” BMC Cancer, vol. 23, p. 1060, 2023. https://doi.org/10.1186/s12885-023-11432-x
- D. L. Araújo et al., “Liver segmentation from computed tomography images using cascade deep learning,” Computers in Biology and Medicine, vol. 140, p. 105095, 2022. https://doi.org/10.1016/j.compbiomed.2021.105095
- Senthilvelan and N. Jamshidi, “PADLLS: Cascaded deep learning for automatic liver and lesion segmentation,” Scientific Reports, vol. 12, 2022. https://doi.org/10.1038/s41598-022-20108-8
- Almotairi et al., “SegNet-based automatic liver segmentation in CT images,” Sensors, vol. 20, no. 1516, 2020. https://doi.org/10.3390/s20051516
- Gross et al., “Volumetry-Net: Multi-institutional validation of liver volumetry using deep learning,” European Radiology, vol. 33, 2023. https://doi.org/10.1007/s00330-023-10495-5
- Gao et al., “GCHA-Net: Global context and hybrid attention network for automatic liver segmentation,” Computers in Biology and Medicine, vol. 155, 2023. https://doi.org/10.1016/j.compbiomed.2023.106574
- Zhou et al., “Validation of a fully automated liver segmentation algorithm using multi-scale deep reinforcement learning,” European Radiology, vol. 30, no. 8, pp. 4538–4548, 2020. https://doi.org/10.1007/s00330-020-06854-0
- Li et al., “Anatomy-guided multimodal registration by learning segmentation without ground truth: Application to intraprocedural CBCT–MR liver segmentation,” Medical Image Analysis, vol. 71, p. 102045, 2021. https://doi.org/10.1016/j.media.2021.102045
- Gao et al., “Artificial intelligence, machine learning, and deep learning in liver transplantation,” J. Hepatology Reports, vol. 5, no. 2, p. 100580, 2023. https://doi.org/10.1016/j.jhepr.2023.100580
- Manjunath and A. Kwadiki, “Automatic liver and tumor segmentation using modified ResUNet architecture,” Results in Control and Optimization, vol. 6, p. 100087, 2022. https://doi.org/10.1016/j.rico.2022.100087
- Khalifa et al., “3D deep learning-based liver segmentation using a fully convolutional network,” Bioengineering, vol. 9, no. 368, 2022. https://doi.org/10.3390/bioengineering9080368
- Sahu et al., “Deep learning-based liver segmentation for computer-aided diagnosis of hepatocellular carcinoma,” Visual Computing for Industry, Biomedicine, and Art, vol. 4, no. 50, 2021. https://doi.org/10.1186/s42490-021-00050-y
- Rajagopal et al., “3D hybrid attention U-Net for automated liver and lesion segmentation,” Biomedicines, vol. 11, no. 800, 2023. https://doi.org/10.3390/biomedicines11030800
- Kanazawa et al., “Automatic liver volumetry using deep convolutional neural networks validated by multi-center MRI data,” Scientific Reports, vol. 12, no. 1, 2022. https://doi.org/10.1038/s41598-022-09978-0
- Zhang et al., “Automatic liver segmentation using deep convolutional neural networks and cascade learning,” European Radiology, vol. 31, no. 10, pp. 7646–7658, 2021. https://doi.org/10.1007/s00330-021-07850-9
- Sharma et al., “Evaluation of AI-based segmentation algorithms for abdominal organs using deep CNNs,” BMC Medical Imaging, vol. 22, 2022. https://doi.org/10.1186/s12880-022-00825-2
- Singh et al., “Validation of liver lesion segmentation using deep hybrid networks,” Sensors, vol. 20, no. 1516, 2020. https://doi.org/10.3390/s20051516
- Al-Shamasneh et al., “Deep learning and AI in liver CT and MRI imaging: Current status and future perspectives,” Scientific Reports, vol. 12, 2022. https://doi.org/10.1038/s41598-022-20108-8
- M. Billah, A. Al Rakib, M. I. Haque, A. S. Ahamed, M. S. Hossain, and K. N. Borsha, “Real-Time Object Detection in Medical Imaging Using YOLO Models for Kidney Stone Detection,” European Journal of Computer Science and Information Technology, vol. 12, no. 7, pp. 54–65, 2024. https://doi.org/10.37745/ejcsit.2013/vol12n75465