Final Results for the TDSC-ABUS 2023 Challenge

Winners:

Task Segmentation: POA

Task Classification: Shiontao

Task Detection: Shiontao

Overall: Deadluck

Task Segmentation with Fixed Penalization: Nvauto

Overall with Fixed Penalization: Shiontao

1. Results For Each Metric

We ran each submission's Docker image using the following commands:

```bash
docker load < teamname.tar.gz

docker run --gpus "device=0" --name teamname --rm \
  -v /home/xxx/tdsc/Test/DATA:/input:ro \
  -v $(pwd)/predict:/predict \
  --shm-size 8g teamname:latest
```

All results were then computed with the evaluation code available on GitHub:
https://github.com/PerceptionComputingLab/TDSC-ABUS2023/tree/main/Final_Evaluation

All results are shown in the table below. A '-' indicates that no result file was found in the corresponding output folder.

| No. | Team | Status | DICE | HD | ACC | AUC | FROC | Short Paper | seg.csv | cls.csv | det.csv |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Blackbean | Succeed | - | - | - | - | - | No | - | - | - |
| 2 | Deadluck | Succeed | 0.5616 | 162.9371 | 0.7286 | 0.7733 | 0.7704 | Yes | seg.csv | cls.csv | det.csv |
| 3 | Discerning Tumor | Failed | - | - | - | - | - | Yes | - | - | - |
| 4 | Dolphins | Succeed | 0.4665 | 266.1207 | - | - | - | Yes | seg.csv | - | - |
| 5 | Eureka | Succeed | 0.4981 | 153.0743 | 0.6000 | 0.6425 | 0.6441 | Yes | seg.csv | cls.csv | det.csv |
| 6 | FathomX | Succeed | 0.5400 | 121.1640 | 0.5429 | 0.5675 | 0.6153 | Yes | seg.csv | cls.csv | det.csv |
| 7 | Infertdsc | Succeed | 0.3057 | 203.4005 | - | - | - | Yes | seg.csv | - | - |
| 8 | Mispl | Succeed | 0.5342 | inf | 0.7143 | 0.7642 | 0.0000 | Yes | seg.csv | cls.csv | det.csv |
| 9 | Nvauto | Succeed | 0.6020 | inf | - | - | - | Yes | seg.csv | - | - |
| 10 | POA | Succeed | 0.6147 | 90.5339 | 0.6429 | 0.6558 | 0.7303 | Yes | seg.csv | cls.csv | det.csv |
| 11 | Sante2024 | Succeed | 0.5377 | 96.5050 | 0.5429 | 0.5775 | 0.6383 | Yes | seg.csv | cls.csv | det.csv |
| 12 | Shiontao | Succeed | 0.5861 | inf | 0.7571 | 0.8892 | 0.8468 | Yes | seg.csv | cls.csv | det.csv |
| 13 | SMART | Failed | - | - | - | - | - | Yes | - | - | - |
| 14 | Smcnscp | Succeed | - | - | - | - | 0.5327 | Yes | - | - | det.csv |
| 15 | Strollers | Succeed | 0.4412 | 101.4036 | - | - | 0.3913 | Yes | seg.csv | - | det.csv |
| 16 | Sunggukyung | Succeed | - | - | 0.6857 | 0.6842 | - | No | - | cls.csv | - |
| 17 | UCLA CDX | Succeed | 0.0000 | inf | - | - | - | Yes | seg.csv | - | - |
| 18 | Vicorob | Succeed | 0.5853 | 80.1817 | - | - | 0.6459 | Yes | seg.csv | - | det.csv |
| 19 | Flamingo | Succeed | 0.5890 | inf | 0.7429 | 0.7708 | 0.6067 | Yes | seg.csv | cls.csv | det.csv |
| 20 | Zhaoqiaochu | Succeed | 0.4890 | 81.7367 | - | - | - | Yes | seg.csv | - | - |
| 21 | walltall | Failed | - | - | - | - | - | No | - | - | - |

2. Segmentation Rank

The segmentation task uses two metrics: the DICE coefficient and the Hausdorff distance (HD).
We first eliminated teams that did not provide valid results, then normalized the remaining
teams' scores with min-max normalization, (x - min(x)) / (max(x) - min(x)). Since a lower HD
is better, the final score is Seg_Score = (1 + Norm_DICE - Norm_HD) / 2, which lies in [0, 1].
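For concreteness, here is a minimal Python sketch of this scoring, using the per-team DICE and HD values from the table below. It reproduces Seg_Score up to small last-digit differences, since the published DICE/HD values are themselves rounded:

```python
# Segmentation scoring sketch; DICE/HD values are taken from the table below.
teams = ["Deadluck", "Dolphins", "Eureka", "FathomX", "Infertdsc",
         "POA", "Sante2024", "Strollers", "Vicorob", "Zhaoqiaochu"]
dice = [0.5616, 0.4665, 0.4981, 0.5400, 0.3057,
        0.6147, 0.5377, 0.4412, 0.5853, 0.4890]
hd = [162.9371, 266.1207, 153.0743, 121.1640, 203.4005,
      90.5339, 96.5050, 101.4036, 80.1817, 81.7367]

def min_max(values):
    """Min-max normalization: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

norm_dice, norm_hd = min_max(dice), min_max(hd)
# Lower HD is better, hence Norm_HD enters with a minus sign.
seg_score = [(1 + d - h) / 2 for d, h in zip(norm_dice, norm_hd)]

# Best team first; e.g. "POA: 0.9722".
for team, score in sorted(zip(teams, seg_score), key=lambda x: -x[1]):
    print(f"{team}: {score:.4f}")
```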

| Rank | Team | DICE | Norm_DICE | HD | Norm_HD | Seg_Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0557 | 0.9722 |
| 2 | Vicorob | 0.5853 | 0.9050 | 80.1817 | 0.0000 | 0.9525 |
| 3 | Sante2024 | 0.5377 | 0.7509 | 96.5050 | 0.0878 | 0.8316 |
| 4 | Zhaoqiaochu | 0.4890 | 0.5933 | 81.7367 | 0.0084 | 0.7925 |
| 5 | FathomX | 0.5400 | 0.7583 | 121.1640 | 0.2204 | 0.7689 |
| 6 | Deadluck | 0.5616 | 0.8283 | 162.9371 | 0.4451 | 0.6916 |
| 7 | Strollers | 0.4412 | 0.4386 | 101.4036 | 0.1141 | 0.6622 |
| 8 | Eureka | 0.4981 | 0.6227 | 153.0743 | 0.3920 | 0.6154 |
| 9 | Dolphins | 0.4665 | 0.5204 | 266.1207 | 1.0000 | 0.2602 |
| 10 | Infertdsc | 0.3057 | 0.0000 | 203.4005 | 0.6627 | 0.1687 |

3. Segmentation Rank with Fixed Penalization for Inf HD

Some teams received an 'inf' result for the HD metric, which presents a real difficulty, and how to handle 'inf' values has been the subject of much debate. A common solution is to replace them with a fixed penalty, but choosing that value is subjective and can be unfair to teams that produce robust results. Conversely, simply excluding teams with 'inf' results does not reflect the performance of all teams. After careful consideration, we decided to maintain two separate leaderboards. In this one, each 'inf' result is replaced, per case, with 105% of the worst HD among all valid results for that case; the normalization process and the scoring formula are unchanged. Because different penalization choices can shift the final rankings significantly, we have designated the leaderboard above (which eliminates the 'inf' results) as the primary board. Certificates will be provided based on the primary leaderboard, with no associated cash rewards.
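A minimal sketch of this per-case substitution (the values and variable names below are illustrative, not taken from the actual evaluation code):

```python
import math

# Hypothetical HD values for one team, one entry per test case;
# inf marks cases where the HD was undefined (e.g., an empty prediction).
team_hd = [95.2, math.inf, 110.7, math.inf]

# Hypothetical worst (largest) finite HD among all teams' valid results,
# computed separately for each test case.
worst_valid_hd = [180.4, 210.9, 175.3, 195.0]

# Replace each inf with 105% of the worst valid HD for that case.
penalized = [1.05 * worst if math.isinf(h) else h
             for h, worst in zip(team_hd, worst_valid_hd)]
# -> [95.2, 221.445, 110.7, 204.75]
```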

| Rank | Team | DICE | Norm_DICE | HD | Norm_HD | Seg_Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Nvauto | 0.6020 | 0.9590 | 82.8654 | 0.0144 | 0.9723 |
| 2 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0557 | 0.9722 |
| 3 | Vicorob | 0.5853 | 0.9050 | 80.1817 | 0.0000 | 0.9525 |
| 4 | Shiontao | 0.5861 | 0.9075 | 117.1939 | 0.1991 | 0.8542 |
| 5 | Sante2024 | 0.5377 | 0.7509 | 96.5050 | 0.0878 | 0.8316 |
| 6 | Mispl | 0.5342 | 0.7395 | 105.0751 | 0.1339 | 0.8028 |
| 7 | Zhaoqiaochu | 0.4890 | 0.5933 | 81.7367 | 0.0084 | 0.7925 |
| 8 | FathomX | 0.5400 | 0.7583 | 121.1640 | 0.2204 | 0.7689 |
| 9 | Flamingo | 0.5890 | 0.9169 | 159.0311 | 0.4241 | 0.7464 |
| 10 | Deadluck | 0.5616 | 0.8283 | 162.9371 | 0.4451 | 0.6916 |
| 11 | Strollers | 0.4412 | 0.4386 | 101.4036 | 0.1141 | 0.6622 |
| 12 | Eureka | 0.4981 | 0.6227 | 153.0743 | 0.3920 | 0.6154 |
| 13 | Dolphins | 0.4665 | 0.5204 | 266.1207 | 1.0000 | 0.2602 |
| 14 | Infertdsc | 0.3057 | 0.0000 | 203.4005 | 0.6627 | 0.1687 |

4. Classification Rank

For the classification task, we eliminated teams that did not provide valid results,
then min-max normalized the remaining teams' ACC and AUC scores.
The final score is Cls_Score = (Norm_ACC + Norm_AUC) / 2.
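The same min-max recipe applies; here is a short sketch using the ACC and AUC values from the table below. Tiny last-digit differences vs. the table come from rounding of the published values:

```python
# ACC and AUC per team, in the row order of the table below.
acc = [0.7571, 0.7429, 0.7286, 0.7143, 0.6857, 0.6429, 0.6000, 0.5429, 0.5429]
auc = [0.8892, 0.7708, 0.7733, 0.7642, 0.6842, 0.6558, 0.6425, 0.5775, 0.5675]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

cls_score = [(a + u) / 2 for a, u in zip(min_max(acc), min_max(auc))]
# -> approximately [1.0000, 0.7828, 0.7533, 0.7058, 0.5147,
#                   0.3707, 0.2499, 0.0155, 0.0000]
```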

| Rank | Team | ACC | Norm_ACC | AUC | Norm_AUC | Cls_Score |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Shiontao | 0.7571 | 1.0000 | 0.8892 | 1.0000 | 1.0000 |
| 2 | Flamingo | 0.7429 | 0.9333 | 0.7708 | 0.6321 | 0.7827 |
| 3 | Deadluck | 0.7286 | 0.8667 | 0.7733 | 0.6399 | 0.7533 |
| 4 | Mispl | 0.7143 | 0.8000 | 0.7642 | 0.6114 | 0.7057 |
| 5 | Sunggukyung | 0.6857 | 0.6667 | 0.6842 | 0.3627 | 0.5147 |
| 6 | POA | 0.6429 | 0.4667 | 0.6558 | 0.2746 | 0.3706 |
| 7 | Eureka | 0.6000 | 0.2667 | 0.6425 | 0.2332 | 0.2499 |
| 8 | Sante2024 | 0.5429 | 0.0000 | 0.5775 | 0.0311 | 0.0155 |
| 9 | FathomX | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.0000 |

5. Detection Rank

Detection performance is evaluated with the Free-Response Receiver Operating Characteristic (FROC) metric,
which reports sensitivity at different false positive (FP) rates. Specifically, the average sensitivity
at FP rates of 0.125, 0.25, 0.5, 1, 2, 4, and 8 per scan serves as the primary detection metric.
These FROC values are then min-max normalized to obtain Det_Score.
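A minimal sketch of the FROC summary, assuming per-scan detections have already been matched to ground truth (the sensitivity values are hypothetical):

```python
# The seven FP-per-scan operating points used by the challenge.
fp_rates = [0.125, 0.25, 0.5, 1, 2, 4, 8]

# Hypothetical sensitivity (recall) achieved at each FP rate.
sensitivity_at_fp = [0.52, 0.60, 0.68, 0.75, 0.81, 0.86, 0.89]

# FROC score: the average sensitivity over the seven FP levels.
froc = sum(sensitivity_at_fp) / len(fp_rates)
print(f"FROC = {froc:.4f}")  # FROC = 0.7300
```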

| Rank | Team | FROC | Det_Score |
| --- | --- | --- | --- |
| 1 | Shiontao | 0.8468 | 1.0000 |
| 2 | Deadluck | 0.7704 | 0.8323 |
| 3 | POA | 0.7303 | 0.7442 |
| 4 | Vicorob | 0.6459 | 0.5589 |
| 5 | Eureka | 0.6441 | 0.5550 |
| 6 | Sante2024 | 0.6383 | 0.5423 |
| 7 | FathomX | 0.6153 | 0.4918 |
| 8 | Flamingo | 0.6067 | 0.4729 |
| 9 | Smcnscp | 0.5327 | 0.3104 |
| 10 | Strollers | 0.3913 | 0.0000 |

6. Overall Rank

The overall performance is determined by considering only the teams that have submitted valid results for all metrics.
We apply similar normalization methods as before, but exclusively for teams with valid results.
The final overall result is obtained by:
(1 + Norm_DICE - Norm_HD) / 2 + (Norm_ACC + Norm_AUC) / 2 + Norm_FROC
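As a worked check, plugging Deadluck's normalized values from the table below into this formula reproduces its Overall score:

```python
# Deadluck's normalized values, taken from the overall table.
norm_dice, norm_hd = 0.5449, 1.0000
norm_acc, norm_auc = 1.0000, 1.0000
norm_froc = 1.0000

seg = (1 + norm_dice - norm_hd) / 2  # 0.27245
cls = (norm_acc + norm_auc) / 2      # 1.0
overall = seg + cls + norm_froc
print(f"{overall:.4f}")              # 2.2724, matching the table
```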

| Rank | Team | DICE | Norm_DICE | HD | Norm_HD | ACC | Norm_ACC | AUC | Norm_AUC | FROC | Norm_FROC | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Deadluck | 0.5616 | 0.5449 | 162.9371 | 1.0000 | 0.7286 | 1.0000 | 0.7733 | 1.0000 | 0.7704 | 1.0000 | 2.2724 |
| 2 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0000 | 0.6429 | 0.5385 | 0.6558 | 0.4291 | 0.7303 | 0.7415 | 2.2253 |
| 3 | Sante2024 | 0.5377 | 0.3397 | 96.5050 | 0.0825 | 0.5429 | 0.0000 | 0.5775 | 0.0486 | 0.6383 | 0.1483 | 0.8012 |
| 4 | Eureka | 0.4981 | 0.0000 | 153.0743 | 0.8638 | 0.6000 | 0.3077 | 0.6425 | 0.3644 | 0.6441 | 0.1857 | 0.5898 |
| 5 | FathomX | 0.5400 | 0.3593 | 121.1640 | 0.4230 | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.6153 | 0.0000 | 0.4681 |

7. Overall Rank with Fixed Penalization for Inf HD

Similar to the segmentation task, each 'inf' HD result was replaced with 105% of the worst HD among all valid results for that case, which makes teams with 'inf' results rankable. Please note that this board, too, offers certificate rewards only.

| Rank | Team | DICE | Norm_DICE | HD | Norm_HD | ACC | Norm_ACC | AUC | Norm_AUC | FROC | Norm_FROC | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Shiontao | 0.5861 | 0.7547 | 117.1939 | 0.3682 | 0.7571 | 1.0000 | 0.8892 | 1.0000 | 0.8468 | 1.0000 | 2.6932 |
| 2 | POA | 0.6147 | 1.0000 | 90.5339 | 0.0000 | 0.6429 | 0.4667 | 0.6558 | 0.2746 | 0.7303 | 0.5148 | 1.8854 |
| 3 | Deadluck | 0.5616 | 0.5449 | 162.9371 | 1.0000 | 0.7286 | 0.8667 | 0.7733 | 0.6399 | 0.7704 | 0.6818 | 1.7075 |
| 4 | Flamingo | 0.5890 | 0.7796 | 159.0311 | 0.9461 | 0.7429 | 0.9333 | 0.7708 | 0.6321 | 0.6067 | 0.0000 | 1.1995 |
| 5 | Sante2024 | 0.5377 | 0.3397 | 96.5050 | 0.0825 | 0.5429 | 0.0000 | 0.5775 | 0.0311 | 0.6383 | 0.1316 | 0.7758 |
| 6 | FathomX | 0.5400 | 0.3593 | 121.1640 | 0.4230 | 0.5429 | 0.0000 | 0.5675 | 0.0000 | 0.6153 | 0.0358 | 0.5039 |
| 7 | Eureka | 0.4981 | 0.0000 | 153.0743 | 0.8638 | 0.6000 | 0.2667 | 0.6425 | 0.2332 | 0.6441 | 0.1558 | 0.4738 |