Question

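I'm a medical professional who has been seriously studying machine learning (ML). I've spent a lot of time reading here, hoping to use it in my field, medical imaging. We have an exam called "Datscan scintigraphy" that images the metabolism of people's brains to see whether they have Parkinson's disease (that's a simplification, but enough to understand).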

The problem is that it's a long exam that takes about 30 min because the "camera" turns around the patient 120 times. So sometimes our elderly patients can't tolerate it, and it's frustrating not to be able to help them with a diagnosis. This is why I'm building a CNN to classify between "Normal Datscan" and "Abnormal Datscan" using only the first 2 projections (anterior and posterior, numbers 0 and 60), with a final output of a "probability of abnormal datscan" between 0 and 1. My goal is that once I get this probability, I can change the threshold to make it more sensitive or more specific, depending on what we want.

I built a dataset of 887 datscans converted to npy arrays, each containing 120 matrices of 128x128 pixels, and I only use 2 of them (numbers 0 and 60). They're grayscale images, so 1 input channel. I tried different architectures in PyTorch; here is the VGG one with BCEWithLogitsLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class ReseauConvolutionSigmo(nn.Module):
    def __init__(self):
        super(ReseauConvolutionSigmo, self).__init__()
        self.conv1a = nn.Conv2d(2, 64, 3, stride=1)
        self.conv1b = nn.Conv2d(64, 64, 5, stride=1)
        self.pool1 = nn.MaxPool2d(2,2)

        self.conv2a = nn.Conv2d(64, 128, 3, stride=1)
        self.conv2b = nn.Conv2d(128, 128, 3, stride=1)
        self.pool2 = nn.MaxPool2d(2,2)
        
        self.conv3a = nn.Conv2d(128, 256, 3, stride=1)
        self.conv3b = nn.Conv2d(256, 256, 3, stride=1)
        self.pool3 = nn.MaxPool2d(2,2)
                
        self.fc1 = nn.Linear(36864, 84)  
        self.fc2 = nn.Linear(84, 1)       
        
    def forward(self, x):
        x=x.float()
        
        x=self.conv1a(x)
        x=F.relu(x)
        x=self.conv1b(x)
        x=F.relu(x)
        x=self.pool1(x)
        
        x=self.conv2a(x)
        x=F.relu(x)
        x=self.conv2b(x)
        x=F.relu(x)
        x=self.pool2(x)
        
        x=self.conv3a(x)
        x=F.relu(x)
        x=self.conv3b(x)
        x=F.relu(x)
        x=self.pool3(x)
        
        x = torch.flatten(x, 1)  # Flatten the feature maps
        
        try:
            x = F.relu(self.fc1(x))
        except RuntimeError as e:
            # Print a hint about the likely cause, then re-raise so the
            # forward pass does not continue with a wrongly shaped tensor.
            msg = str(e)
            if msg.endswith("Output size is too small"):
                print("Image size is too small.")
            elif "shapes cannot be multiplied" in msg:
                required_shape = msg[msg.index("x") + 1:].split(" ")[0]
                print(f"Linear layer needs to have size: {required_shape}")
            else:
                print(f"Error other: {msg}")
            raise
                
        x = self.fc2(x)

        return x

network = ReseauConvolutionSigmo()
n_epochs = 100
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(network.parameters(), lr=0.001)

train_losses = [ ]
train_counter = [ ]
test_losses = [ ]
test_accuracy = [ ]

network.to(device)
print("******* Evaluation initiale")
test()
for epoch in range(0, n_epochs):
  print("******* Epoch ", epoch)
  train()
  test()

However, when I do this, the outputs for the 6 elements of the batch quickly converge to the same value:

Evaluation initiale
test loss= 0.7000894740570424
Output tensor([[0.0826],
[0.0827],
[0.0827],
[0.0825],
[0.0827],
[0.0827]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[0.], [0.],[1.],[0.],[1.]])
Accuracy in test 57.36434108527132 %
Epoch  0
train loss= 0.6777993538058721
test loss= 0.6830593472303346
Output tensor([[-0.3489],
[-0.3479],
[-0.3391],
[-0.3410],
[-0.3442],
[-0.3469]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[1.], [0.],[0.],[0.], [0.],[0.]])
Accuracy in test 57.36434108527132 %
Epoch  1
train loss= 0.7050089922088844
test loss= 0.6875958317934081
Output tensor([[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.], [0.],[1.],[1.], [0.], [1.]])
Accuracy in test 57.751937984496124 %
Epoch  2
train loss= 0.6914097838676893
test loss= 0.6917881480483121
Output tensor([[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[1.], [1.],[1.],[0.],[1.]])
Accuracy in test 57.36434108527132 %

Even at later epochs:

Epoch  40
train loss= 0.6704580792440817
test loss= 0.6978785312452982
Output tensor([[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[0.], [0.],[0.],[1.],[1.]])
correct 147
total 258
Accuracy in test 56.97674418604651 %

As you can see, the training loss doesn't decrease much, and the accuracy stays around 57%.

I was kind of desperate and tried another criterion, CrossEntropyLoss. That worked much better, with a final accuracy of 79%. Here, for example, are the last 3 epochs:

Epoch  47
train loss= 0.0002839015607657339
test loss= 1.7087745488627646
correct 203
total 258
Accuracy in test 78.68217054263566 %
Sortie du réseau :
tensor([[-13.9290,   4.8103],
[  3.7896,  -9.5477],
[ -3.8057,  -0.1662],
[  1.8018,  -3.5083],
[ -3.6199,  -2.2624],
[  6.0148, -12.3137]])
Datscan :    tensor([1, 1, 0, 0, 1, 0])
Prédiction :  tensor([1, 0, 1, 0, 1, 0])

Epoch  48
train loss= 0.0002534455537248284
test loss= 1.7621866015008938
correct 201
total 258
Accuracy in test 77.90697674418605 %
Sortie du réseau :
tensor([[-24.7145,   7.8902],
[-21.3964,   8.6213],
[  2.1064,  -0.7032],
[ -1.3331,  -0.8390],
[ -5.9108,   4.1722],
[-14.4751,   4.5746]])
Datscan :    tensor([1, 1, 0, 1, 1, 1])
Prédiction :  tensor([1, 1, 0, 1, 1, 1])

Epoch  49
train loss= 0.00022697989463199136
test loss= 1.6694575882692397
correct 204
total 258
Accuracy in test 79.06976744186046 %
Sortie du réseau :
tensor([[ -9.4081,   1.5622],
[ -0.1025,   0.0649],
[-26.1112,   8.3820],
[-12.4035,   3.2135],
[ -6.0753,   0.1667],
[  9.9138,  -9.7220]])
Datscan :    tensor([0, 0, 1, 1, 1, 0])
Prédiction :  tensor([1, 1, 1, 1, 1, 0])


Sortie du réseau :
tensor([[ 14.8245, -11.1206],
[  4.8293,  -2.1229],
[-19.1812,   4.6617],
[ -7.9391,   3.3256],
[ 30.0683, -26.7278],
[-11.0685,   3.2678]])

Datscan :    tensor([0, 1, 1, 1, 0, 1])
Prédiction :  tensor([0, 0, 1, 1, 0, 1])

So here are my questions:

What makes the BCE loss train so badly even though this is a binary classification problem? And how come all 6 elements of the batch quickly end up with the same output? I tried changing the learning rate without a clear improvement; maybe it's the optimizer?
From my understanding, the output with BCEWithLogitsLoss is a tensor of 6 values, one per batch element, and the network predicts "normal" if the value is negative and "abnormal" if it is positive. But they're stuck in the negatives, so they're all predicted normal. Since my goal is a "probability of abnormal datscan" output, if this model had better accuracy, I could just pass this output tensor through a sigmoid to get a 0-to-1 probability, right?
The output in the CrossEntropyLoss version is a 6 x 2 tensor representing the "confidence" of being in the left class ("normal") or the right class ("abnormal"), and the higher value in each row gives the predicted class. For example:

tensor([[ 14.8245, -11.1206], = predicted normal
        [  4.8293,  -2.1229], = predicted normal
        [-19.1812,   4.6617], = predicted abnormal
        [ -7.9391,   3.3256], = predicted abnormal
        [ 30.0683, -26.7278], = predicted normal
        [-11.0685,   3.2678]]) = predicted abnormal

However, while it has better accuracy, the problem is: how can I represent these output tensors as a "probability of being abnormal"?

Thank you very much for your help. I look forward to reading your thoughts on this; it's always very interesting!

Answer 1

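I can't answer why BCEWithLogitsLoss is not working for you, but for the second case you can take the softmax and look only at the second column, which is the probability that the sample is abnormal. Given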

a = torch.tensor([[ 14.8245, -11.1206],
[4.8293,  -2.1229],
[-19.1812,   4.6617],
[ -7.9391,   3.3256],
[30.0683, -26.7278],
[-11.0685,   3.2678]])

applying softmax

b = torch.softmax(a, dim = -1)

gives

tensor([[1.0000e+00, 5.3974e-12],
        [9.9904e-01, 9.5561e-04],
        [4.4173e-11, 1.0000e+00],
        [1.2817e-05, 9.9999e-01],
        [1.0000e+00, 2.1566e-25],
        [5.9405e-07, 1.0000e+00]])

So the probabilities of an abnormal Datscan would be

[5.3974e-12, 9.5561e-04, 1.0000e+00, 9.9999e-01, 2.1566e-25, 1.0000e+00]
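To make the thresholding step concrete, here is a minimal sketch that pulls out the second softmax column and applies an adjustable cut-off. The helper name `classify` and the two threshold values are only illustrative assumptions, not something from your code. For two classes, the second softmax column is the same as a sigmoid of the difference between the two logits, which the sketch also checks.

```python
import torch

# Logits copied from the example above: column 0 = normal, column 1 = abnormal.
logits = torch.tensor([[ 14.8245, -11.1206],
                       [  4.8293,  -2.1229],
                       [-19.1812,   4.6617],
                       [ -7.9391,   3.3256],
                       [ 30.0683, -26.7278],
                       [-11.0685,   3.2678]])

# Probability of "abnormal" is the second softmax column.
p_abnormal = torch.softmax(logits, dim=-1)[:, 1]

# With two classes this equals sigmoid(logit_abnormal - logit_normal).
p_check = torch.sigmoid(logits[:, 1] - logits[:, 0])
assert torch.allclose(p_abnormal, p_check, atol=1e-6)

def classify(p, threshold=0.5):
    """Hypothetical helper: 1 (abnormal) where p >= threshold, else 0 (normal)."""
    return (p >= threshold).long()

# Default cut-off, then a much lower (more sensitive) one.
print(classify(p_abnormal, threshold=0.5))
print(classify(p_abnormal, threshold=5e-4))
```

Lowering the threshold flags more scans as abnormal (more sensitive), and raising it flags fewer (more specific). Note that the probabilities here are extremely close to 0 or 1, so the threshold has to move a long way before any prediction changes.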
