1. Results For Each Metric
We ran the submission Docker using the following commands:

```bash
docker load < teamname.tar.gz

docker run --gpus "device=0" --name teamname --rm \
    -v /home/xxx/tdsc/Test/DATA:/input:ro \
    -v $(pwd)/predict:/predict \
    --shm-size 8g teamname:latest
```
All results were obtained with the evaluation code available on GitHub:
https://github.com/PerceptionComputingLab/TDSC-ABUS2023/tree/main/Final_Evaluation
All results are shown in the table below. The '-' symbol indicates that no result file was found in the corresponding folder.

Rank | Team | Status | DICE | HD | ACC | AUC | FROC | Short Paper | seg.csv | cls.csv | det.csv |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Blackbean | Succeed | - | - | - | - | - | No | - | - | - |
2 | Deadluck | Succeed | 0.5616 | 162.9371 | 0.7286 | 0.7733 | 0.7704 | Yes | seg.csv | cls.csv | det.csv |
3 | Discerning Tumor | Failed | - | - | - | - | - | Yes | - | - | - |
4 | Dolphins | Succeed | 0.4665 | 266.1207 | - | - | - | Yes | seg.csv | - | - |
5 | Eureka | Succeed | 0.4981 | 153.0743 | 0.6000 | 0.6425 | 0.6441 | Yes | seg.csv | cls.csv | det.csv |
6 | FathomX | Succeed | 0.5400 | 121.1640 | 0.5429 | 0.5675 | 0.6153 | Yes | seg.csv | cls.csv | det.csv |
7 | Infertdsc | Succeed | 0.3057 | 203.4005 | - | - | - | Yes | seg.csv | - | - |
8 | Mispl | Succeed | 0.5342 | inf | 0.7143 | 0.7642 | 0.0000 | Yes | seg.csv | cls.csv | det.csv |
9 | Nvauto | Succeed | 0.6020 | inf | - | - | - | Yes | seg.csv | - | - |
10 | POA | Succeed | 0.6147 | 90.5339 | 0.6429 | 0.6558 | 0.7303 | Yes | seg.csv | cls.csv | det.csv |
11 | Sante2024 | Succeed | 0.5377 | 96.5050 | 0.5429 | 0.5775 | 0.6383 | Yes | seg.csv | cls.csv | det.csv |
12 | Shiontao | Succeed | 0.5861 | inf | 0.7571 | 0.8892 | 0.8468 | Yes | seg.csv | cls.csv | det.csv |
13 | SMART | Failed | - | - | - | - | - | Yes | - | - | - |
14 | Smcnscp | Succeed | - | - | - | - | 0.5327 | Yes | - | - | det.csv |
15 | Strollers | Succeed | 0.4412 | 101.4036 | - | - | 0.3913 | Yes | seg.csv | - | det.csv |
16 | Sunggukyung | Succeed | - | - | 0.6857 | 0.6842 | - | No | - | cls.csv | - |
17 | UCLA CDX | Succeed | 0.0000 | inf | - | - | - | Yes | seg.csv | - | - |
18 | Vicorob | Succeed | 0.5853 | 80.1817 | - | - | 0.6459 | Yes | seg.csv | - | det.csv |
19 | Flamingo | Succeed | 0.5890 | inf | 0.7429 | 0.7708 | 0.6067 | Yes | seg.csv | cls.csv | det.csv |
20 | Zhaoqiaochu | Succeed | 0.4890 | 81.7367 | - | - | - | Yes | seg.csv | - | - |
21 | walltall | Failed | - | - | - | - | - | No | - | - | - |
2. Segmentation Rank
The segmentation task involves two metrics: the DICE coefficient and the Hausdorff distance (HD). We first eliminated teams that did not provide valid results (see Section 3 for how 'inf' HD results are handled). We then normalized the remaining teams' scores using min-max normalization, computed as (x - min(x)) / (max(x) - min(x)). Since a lower HD is better, the final score is calculated as Seg_Score = (1 + Norm_DICE - Norm_HD) / 2, as the sketch below illustrates.
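
A minimal sketch of this scoring step (not the official evaluation code), assuming the per-team DICE and HD values have already been aggregated over the test cases:

```python
import numpy as np

def min_max(x):
    """Min-max normalization: (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def seg_scores(dice, hd):
    """Seg_Score = (1 + Norm_DICE - Norm_HD) / 2; lower HD is better."""
    return (1 + min_max(dice) - min_max(hd)) / 2

# Applied to the DICE and HD columns of the table below, this reproduces
# the Norm_DICE, Norm_HD, and Seg_Score columns (up to rounding).
```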

Rank | Team | DICE | Norm_DICE | HD | Norm_HD | Seg_Score |
---|---|---|---|---|---|---|
1 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0557 | 0.9722 |
2 | Vicorob | 0.5853 | 0.9050 | 80.1817 | 0.0000 | 0.9525 |
3 | Sante2024 | 0.5377 | 0.7509 | 96.5050 | 0.0878 | 0.8316 |
4 | Zhaoqiaochu | 0.4890 | 0.5933 | 81.7367 | 0.0084 | 0.7925 |
5 | FathomX | 0.5400 | 0.7583 | 121.1640 | 0.2204 | 0.7689 |
6 | Deadluck | 0.5616 | 0.8283 | 162.9371 | 0.4451 | 0.6916 |
7 | Strollers | 0.4412 | 0.4386 | 101.4036 | 0.1141 | 0.6622 |
8 | Eureka | 0.4981 | 0.6227 | 153.0743 | 0.3920 | 0.6154 |
9 | Dolphins | 0.4665 | 0.5204 | 266.1207 | 1.0000 | 0.2602 |
10 | Infertdsc | 0.3057 | 0.0000 | 203.4005 | 0.6627 | 0.1687 |
3. Segmentation Rank with Fixed Penalization for Inf HD
Some teams received an 'inf' result for the HD metric, which poses a significant challenge for ranking. How to handle 'inf' results has been a topic of much debate. A common solution is to penalize them with a fixed value, but choosing that value is subjective and can be unfair to teams that produce robust results; conversely, simply excluding teams with 'inf' results does not accurately represent the performance of all teams. After careful consideration, we decided to maintain two separate leaderboards. In this ranking, each 'inf' result is replaced, case by case, with the worst valid HD obtained on that case multiplied by 105%; the normalization and the scoring formula remain unchanged (see the sketch below). Since different penalization approaches can influence the final ranking significantly, we designated the leaderboard above, which eliminates 'inf' scores, as the primary board. We will provide certificates based on this primary leaderboard, with no associated cash rewards.
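
A minimal sketch of this substitution, assuming the per-case HD values are available as a (teams x cases) array with failed cases stored as `np.inf` (the official evaluation code may organize the per-case results differently):

```python
import numpy as np

def penalize_inf_hd(hd_per_case):
    """Replace each 'inf' HD with 1.05 * the worst valid HD on the same case."""
    hd = np.array(hd_per_case, dtype=float)  # shape: (n_teams, n_cases)
    for case in range(hd.shape[1]):
        col = hd[:, case]                    # view into hd, edits write through
        valid = col[np.isfinite(col)]
        if valid.size > 0:
            col[~np.isfinite(col)] = 1.05 * valid.max()
    return hd

# Per-team HD is then averaged over cases and fed into the same
# min-max normalization and Seg_Score formula as before.
```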

Rank | Team | DICE | Norm_DICE | HD | Norm_HD | Seg_Score |
---|---|---|---|---|---|---|
1 | Nvauto | 0.6020 | 0.9590 | 82.8654 | 0.0144 | 0.9723 |
2 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0557 | 0.9722 |
3 | Vicorob | 0.5853 | 0.9050 | 80.1817 | 0.0000 | 0.9525 |
4 | Shiontao | 0.5861 | 0.9075 | 117.1939 | 0.1991 | 0.8542 |
5 | Sante2024 | 0.5377 | 0.7509 | 96.5050 | 0.0878 | 0.8316 |
6 | Mispl | 0.5342 | 0.7395 | 105.0751 | 0.1339 | 0.8028 |
7 | Zhaoqiaochu | 0.4890 | 0.5933 | 81.7367 | 0.0084 | 0.7925 |
8 | FathomX | 0.5400 | 0.7583 | 121.1640 | 0.2204 | 0.7689 |
9 | Flamingo | 0.5890 | 0.9169 | 159.0311 | 0.4241 | 0.7464 |
10 | Deadluck | 0.5616 | 0.8283 | 162.9371 | 0.4451 | 0.6916 |
11 | Strollers | 0.4412 | 0.4386 | 101.4036 | 0.1141 | 0.6622 |
12 | Eureka | 0.4981 | 0.6227 | 153.0743 | 0.3920 | 0.6154 |
13 | Dolphins | 0.4665 | 0.5204 | 266.1207 | 1.0000 | 0.2602 |
14 | Infertdsc | 0.3057 | 0.0000 | 203.4005 | 0.6627 | 0.1687 |
4. Classification Rank
For the classification task, we eliminated teams that did not provide valid results. We then normalized the remaining teams' ACC and AUC values using min-max normalization. The final score is calculated as Cls_Score = (Norm_ACC + Norm_AUC) / 2; a sketch of this computation is given below.
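
For illustration, a minimal sketch of this step. The format of cls.csv is defined by the challenge, so the layout assumed here (one predicted malignancy probability per test case) and the 0.5 decision threshold are assumptions, not the official procedure:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def cls_scores(probs_by_team, labels):
    """probs_by_team: {team: array of predicted probabilities};
    labels: ground-truth labels (0/1). Returns {team: Cls_Score}."""
    # ACC uses hard predictions (0.5 threshold assumed), AUC uses probabilities.
    acc = {t: accuracy_score(labels, (np.asarray(p) >= 0.5).astype(int))
           for t, p in probs_by_team.items()}
    auc = {t: roc_auc_score(labels, p) for t, p in probs_by_team.items()}

    def min_max(d):
        lo, hi = min(d.values()), max(d.values())
        return {t: (v - lo) / (hi - lo) for t, v in d.items()}

    norm_acc, norm_auc = min_max(acc), min_max(auc)
    return {t: (norm_acc[t] + norm_auc[t]) / 2 for t in probs_by_team}
```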

Rank | Team | ACC | Norm_ACC | AUC | Norm_AUC | Cls_Score |
---|---|---|---|---|---|---|
1 | Shiontao | 0.7571 | 1.0000 | 0.8892 | 1.0000 | 1.0000 |
2 | Flamingo | 0.7429 | 0.9333 | 0.7708 | 0.6321 | 0.7827 |
3 | Deadluck | 0.7286 | 0.8667 | 0.7733 | 0.6399 | 0.7533 |
4 | Mispl | 0.7143 | 0.8000 | 0.7642 | 0.6114 | 0.7057 |
5 | Sunggukyung | 0.6857 | 0.6667 | 0.6842 | 0.3627 | 0.5147 |
6 | POA | 0.6429 | 0.4667 | 0.6558 | 0.2746 | 0.3706 |
7 | Eureka | 0.6000 | 0.2667 | 0.6425 | 0.2332 | 0.2499 |
8 | Sante2024 | 0.5429 | 0.0000 | 0.5775 | 0.0311 | 0.0155 |
9 | FathomX | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.0000 |
5. Detection Rank
The evaluation of the detection performance uses the Free-Response Receiver Operating Characteristic (FROC) metric. FROC performance is reported as sensitivities at different false positive (FP) levels. Specifically, the average sensitivity at FP rates of 0.125, 0.25, 0.5, 1, 2, 4, and 8 is employed as the primary metric for assessing detection performance (see the sketch below). These FROC values are then subjected to min-max normalization to obtain Det_Score.
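
A minimal sketch of how the average sensitivity could be read off a FROC curve, assuming `fp_per_scan` and `sensitivity` are paired arrays describing the curve; the official implementation in the evaluation repository may differ, e.g., in how it interpolates between operating points:

```python
import numpy as np

def froc_score(fp_per_scan, sensitivity,
               levels=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """Average sensitivity at the given false-positive levels."""
    fp = np.asarray(fp_per_scan, dtype=float)
    sens = np.asarray(sensitivity, dtype=float)
    order = np.argsort(fp)                       # np.interp needs increasing x
    sens_at_levels = np.interp(levels, fp[order], sens[order])
    return float(sens_at_levels.mean())
```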

Rank | Team | FROC | Det_Score |
---|---|---|---|
1 | Shiontao | 0.8468 | 1.0000 |
2 | Deadluck | 0.7704 | 0.8323 |
3 | POA | 0.7303 | 0.7442 |
4 | Vicorob | 0.6459 | 0.5589 |
5 | Eureka | 0.6441 | 0.5550 |
6 | Sante2024 | 0.6383 | 0.5423 |
7 | FathomX | 0.6153 | 0.4918 |
8 | Flamingo | 0.6067 | 0.4729 |
9 | Smcnscp | 0.5327 | 0.3104 |
10 | Strollers | 0.3913 | 0.0000 |
6. Overall Rank
The overall performance considers only the teams that submitted valid results for all metrics. We apply the same normalization methods as before, but restricted to these teams. The final overall result is obtained as Overall = (1 + Norm_DICE - Norm_HD) / 2 + (Norm_ACC + Norm_AUC) / 2 + Norm_FROC, i.e., the sum of the segmentation, classification, and detection scores (see the sketch below).
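
A minimal sketch of the combination step for one team, assuming the normalized values have already been computed among the eligible teams as described above:

```python
def overall_score(norm_dice, norm_hd, norm_acc, norm_auc, norm_froc):
    """Overall = (1 + Norm_DICE - Norm_HD) / 2 + (Norm_ACC + Norm_AUC) / 2 + Norm_FROC."""
    return (1 + norm_dice - norm_hd) / 2 + (norm_acc + norm_auc) / 2 + norm_froc

# Check against Deadluck's row in the table below:
# overall_score(0.5449, 1.0000, 1.0000, 1.0000, 1.0000) == 2.27245 ~ 2.2724
```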

Rank | Team | DICE | Norm_DICE | HD | Norm_HD | ACC | Norm_ACC | AUC | Norm_AUC | FROC | Norm_FROC | Overall |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Deadluck | 0.5616 | 0.5449 | 162.9371 | 1.0000 | 0.7286 | 1.0000 | 0.7733 | 1.0000 | 0.7704 | 1.0000 | 2.2724 |
2 | POA | 0.6147 | 1.0000 | 90.5300 | 0.0000 | 0.6429 | 0.5385 | 0.6558 | 0.4291 | 0.7303 | 0.7415 | 2.2253 |
3 | Sante2024 | 0.5377 | 0.3397 | 96.5100 | 0.0825 | 0.5429 | 0.0000 | 0.5775 | 0.0486 | 0.6383 | 0.1483 | 0.8012 |
4 | Eureka | 0.4981 | 0.0000 | 153.1000 | 0.8638 | 0.6000 | 0.3077 | 0.6425 | 0.3644 | 0.6441 | 0.1857 | 0.5898 |
5 | FathomX | 0.5400 | 0.3593 | 121.2000 | 0.4230 | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.6153 | 0.0000 | 0.4681 |
7. Overall Rank with Fixed Penalization for Inf HD
As in the segmentation task, we substituted each 'inf' result with the worst valid HD score on the same case, multiplied by 105%. This allows us to rank teams that produced an 'inf' result. Please note that this board, too, offers certificate rewards only.

Rank | Team | DICE | Norm_DICE | HD | Norm_HD | ACC | Norm_ACC | AUC | Norm_AUC | FROC | Norm_FROC | Overall |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Shiontao | 0.5861 | 0.7547 | 117.1939 | 0.3682 | 0.7571 | 1.0000 | 0.8892 | 1.0000 | 0.8468 | 1.0000 | 2.6932 |
2 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0000 | 0.6429 | 0.4667 | 0.6558 | 0.2746 | 0.7303 | 0.5148 | 1.8854 |
3 | Deadluck | 0.5616 | 0.5449 | 162.9371 | 1.0000 | 0.7286 | 0.8667 | 0.7733 | 0.6399 | 0.7704 | 0.6818 | 1.7075 |
4 | Flamingo | 0.5890 | 0.7796 | 159.0311 | 0.9461 | 0.7429 | 0.9333 | 0.7708 | 0.6321 | 0.6067 | 0.0000 | 1.1995 |
5 | Sante2024 | 0.5377 | 0.3397 | 96.5050 | 0.0825 | 0.5429 | 0.0000 | 0.5775 | 0.0311 | 0.6383 | 0.1316 | 0.7758 |
6 | FathomX | 0.5400 | 0.3593 | 121.1640 | 0.4230 | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.6153 | 0.0358 | 0.5039 |
7 | Eureka | 0.4981 | 0.0000 | 153.0743 | 0.8638 | 0.6000 | 0.2667 | 0.6425 | 0.2332 | 0.6441 | 0.1558 | 0.4738 |