classification loss - negative sample labels #6

DishaDRao · 2021-03-14T21:22:39Z

Hi,

In the following snippet from the loss.py file:
''classify_loss = 0.5 * self.classify_loss(
pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
neg_prob, neg_labels + 1) ''

why is the target label for sigmoid loss of negative samples given as ' neg_labels + 1' ?
Shouldn't it be just 'neg_labels'? (as the value of 'neg_labels' is initilaized as 0 in itself)

naoe1999 · 2021-05-20T03:22:36Z

@DishaDRao
I have the same concern about this.

I have fully trained with the existing code and got strange output which doesn't make sense at all.
So I doubt the loss function in the same way you mentioned.

Did you get some result on it?
I will try though.

DishaDRao · 2021-05-21T09:10:36Z

@naoe1999

Well, I did not try this code. However, I went through the code base of the original implementation ( the winning team) and understood where this labelling came from.

Basically, in the original implementation, the target labels for negative anchor boxes ('neg_labels') are given a label '-1'. Hence it makes sense to write 'neg_labels + 1' during the loss computation to make it 0. ( 0 stands for no object and 1 stands for an object).

However, in the current code base, the target labels for negative anchor boxes are already given a label of '0'. So it doesn't make sense to write 'neg_labels +1' during the loss computation.

In short, I think it's a mistake here and I suggest to run this code without adding 1 to neg_labels.
(''classify_loss = 0.5 * self.classify_loss(
pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
neg_prob, neg_labels ) ''

Hope this works. If not, then it's an issue in some other part of the code!

MHansy · 2021-05-27T11:50:29Z

@naoe1999

Well, I did not try this code. However, I went through the code base of the original implementation ( the winning team) and understood where this labelling came from.

Basically, in the original implementation, the target labels for negative anchor boxes ('neg_labels') are given a label '-1'. Hence it makes sense to write 'neg_labels + 1' during the loss computation to make it 0. ( 0 stands for no object and 1 stands for an object).

However, in the current code base, the target labels for negative anchor boxes are already given a label of '0'. So it doesn't make sense to write 'neg_labels +1' during the loss computation.

In short, I think it's a mistake here and I suggest to run this code without adding 1 to neg_labels.
(''classify_loss = 0.5 * self.classify_loss(
pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
neg_prob, neg_labels ) ''

Hope this works. If not, then it's an issue in some other part of the code!

Hello kindly help me the testing codes (How to test the training model) so as to get predicted nodules.

MHansy · 2021-05-27T11:51:52Z

@naoe1999

Well, I did not try this code. However, I went through the code base of the original implementation ( the winning team) and understood where this labelling came from.

Basically, in the original implementation, the target labels for negative anchor boxes ('neg_labels') are given a label '-1'. Hence it makes sense to write 'neg_labels + 1' during the loss computation to make it 0. ( 0 stands for no object and 1 stands for an object).

However, in the current code base, the target labels for negative anchor boxes are already given a label of '0'. So it doesn't make sense to write 'neg_labels +1' during the loss computation.

In short, I think it's a mistake here and I suggest to run this code without adding 1 to neg_labels.
(''classify_loss = 0.5 * self.classify_loss(
pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
neg_prob, neg_labels ) ''

Hope this works. If not, then it's an issue in some other part of the code!

Testing codes please

naoe1999 · 2021-05-28T12:36:06Z

@DishaDRao
Thank you for your advice.

However, though I changed the loss function, I couldn't get any meaningful result.
When I trained say 50 epochs, the output value set of 3d grid cell becomes identical to each cell.

Yes, I guess some other part have an issue. Let me point out possible one.

The shape of the output tensor is (32, 32, 32, 3, 5),
which is (# of cells in x-axis, # of cells in y-axis, # of cells in z-axis, # of anchors in each cell, # of values that is x, y, z, r, c)

And the values inside this output tensor is repeated every single cell.
For example, output[0, 0, 0, :, :] == output[l, m, n, :, :] for all l, m, and n. Identical !!
This shouldn't have happened if the model had been trained properly.

I think this is due to very heavily imbalanced positive vs. negative ratio inside the target tensor.
If you have one nodule for a certain 3d-patch, if you look inside the "target" tensor, just only one value out of 3 x 32 x 32 x 32 tensor have a positive value. This makes it 1 : 98303 (3 x 32 x 32 x 32 - 1) imbalanced classification problem!!

After a good iteration of training, it becomes predicting all values to negative.
That is my theory. Well, I'm not sure it's the only issue, but I am quite sure this would be one of the major issues at least.

To solve this, maybe multiple anchor assignment to the GT nodule, and random sampling of negative target cell would be necessary.
I'm just not sure it would be smart to keep working on this code base instead of seeking and moving on to another.

Could you tell me if you have suggestion or any other code base you recommend?

naoe1999 · 2021-05-28T13:14:34Z

@MHansy

I didn't make test code for this model.
I just got a problem when I finished the training using this code base, which made me stop at that point.

Without solving this, test is meaningless.
The test result would be 0% in detection score (FROC, recall, all the scores), because it would predict all the input as negative!

Anyway, this is my test scheme I was going to do after it gives meaningful output :

You should get patches to cover all lung volume from each validation CT scan first.
Get output prediction from the trained model.
Store all the positive predictions as .csv file (same format as the LUNA16's sampleSubmission.csv)
NMS (non-maximum suppression) would be necessary for this step
Use noduleCADEvaluationLUNA16.py file to get the test score (FROC and so on).

You can download sampleSubmission.csv and noduleCADEvaluationLUNA16.py from the LUNA16's official site.

DishaDRao · 2021-05-28T14:54:36Z

@naoe1999 @MHansy

The problem of class imbalance is actually taken care of in the loss function. Even though the target lables may contain the ratio (positive to negative) that you have mentioned, the loss function takes care of this by employing 'negative hard mining' (similar to your idea of random sampling of the negatives) which restricts the number of negative anchor boxes to 2 ( depending on the batch size) per mini batch. That means the network sees an equal (or 1:2) ratio of positive and negative anchor boxes during the loss computation.

I strongly believe the problem in this code is how the rest of the targets are labelled. The anchor boxes for bounding box regresssion should be labelled based on its IOU and center-to-center parameterization with a ground truth box (as per the standard faster-rcnn). I don't see how that is employed in this code.

If the target itself doesn't have the right (position) labels, then I wouldn't expect to get any meaningful results after training. ( given the benifit of the doubt, even if the targets are labelled correctly, the testing requires de-parameterization of the predictions which can be done only if the target computation is deciphered)

In short, I wouldn't use this code for traning. This repository is nice to get an understanding on the preprocessing and augmentation part, but for actual implemetation I would recommend to check out the original code bases from (lfz/DSB2017, or 'wentaozhu/DeepLung'). They both are extremely similar, however the latter repo is simpler, it worked for me!

(ps. this repo has a google collab provided at the end. Howerver, I didn't use it nor check it out. I wanted a deeper understanding, hence skipped it entirely ;) )

naoe1999 · 2021-06-28T04:15:57Z

@DishaDRao

Thank you very much for your advice.
I've followed your recommendation, and it finally works for me too.

I'm using 'wentaozhu/DeepLung' repository for the training & evaluation with LUNA16 dataset, and starts getting meaningful FROC results.
For the segmentation of new CT scan data (they are not provided with segmentation data unlikely to LUNA16), I also found this repository helpful.

Many thanks! :-)

SirMwan · 2021-07-17T03:01:14Z

@naoe1999 @MHansy

The problem of class imbalance is actually taken care of in the loss function. Even though the target lables may contain the ratio (positive to negative) that you have mentioned, the loss function takes care of this by employing 'negative hard mining' (similar to your idea of random sampling of the negatives) which restricts the number of negative anchor boxes to 2 ( depending on the batch size) per mini batch. That means the network sees an equal (or 1:2) ratio of positive and negative anchor boxes during the loss computation.

I strongly believe the problem in this code is how the rest of the targets are labelled. The anchor boxes for bounding box regresssion should be labelled based on its IOU and center-to-center parameterization with a ground truth box (as per the standard faster-rcnn). I don't see how that is employed in this code.

If the target itself doesn't have the right (position) labels, then I wouldn't expect to get any meaningful results after training. ( given the benifit of the doubt, even if the targets are labelled correctly, the testing requires de-parameterization of the predictions which can be done only if the target computation is deciphered)

In short, I wouldn't use this code for traning. This repository is nice to get an understanding on the preprocessing and augmentation part, but for actual implemetation I would recommend to check out the original code bases from (lfz/DSB2017, or 'wentaozhu/DeepLung'). They both are extremely similar, however the latter repo is simpler, it worked for me!

(ps. this repo has a google collab provided at the end. Howerver, I didn't use it nor check it out. I wanted a deeper understanding, hence skipped it entirely ;) )

Hello @DishaDRao and @naoe1999

Kindly help please.

I tried to make follow up on your conversion and advises, and I went through wentaozhu/DeepLung repository and unfortunately at the LOSS CODES I find the same thing at the labels(+1).

BUT during training with that codes, I found that the loss does not decreasing, I am not sure if I have to remove (+1) in labels in the codes.

SirMwan · 2021-07-17T03:10:19Z

@DishaDRao did you refer at the point below? This is from data.py file from wentaozhu repository.

class LabelMapping(object):
def init(self, config, phase):
self.stride = np.array(config['stride'])
self.num_neg = int(config['num_neg'])
self.th_neg = config['th_neg']
self.anchors = np.asarray(config['anchors'])
self.phase = phase
if phase == 'train':
self.th_pos = config['th_pos_train']
elif phase == 'val':
self.th_pos = config['th_pos_val']

def __call__(self, input_size, target, bboxes, filename):
    stride = self.stride
    num_neg = self.num_neg
    th_neg = self.th_neg
    anchors = self.anchors
    th_pos = self.th_pos
    
    output_size = []
    for i in range(3):
        if input_size[i] % stride != 0:
            print(filename)
        # assert(input_size[i] % stride == 0) 
        output_size.append(int(input_size[i] / stride))  #Nimetoa int
    
    label = -1 * np.ones(output_size + [len(anchors), 5], np.float32)     #badili from np.float32
    offset = ((stride.astype('float')) - 1) / 2
    oz = np.arange(offset, offset + stride * (output_size[0] - 1) + 1, stride)
    oh = np.arange(offset, offset + stride * (output_size[1] - 1) + 1, stride)
    ow = np.arange(offset, offset + stride * (output_size[2] - 1) + 1, stride)

    for bbox in bboxes:
        for i, anchor in enumerate(anchors):
            iz, ih, iw = select_samples(bbox, anchor, th_neg, oz, oh, ow)
            label[iz, ih, iw, i, 0] = 0

    if self.phase == 'train' and self.num_neg > 0:
        neg_z, neg_h, neg_w, neg_a = np.where(label[:, :, :, :, 0] == -1)
        neg_idcs = random.sample(range(len(neg_z)), min(num_neg, len(neg_z)))
        neg_z, neg_h, neg_w, neg_a = neg_z[neg_idcs], neg_h[neg_idcs], neg_w[neg_idcs], neg_a[neg_idcs]
        label[:, :, :, :, 0] = 0
        label[neg_z, neg_h, neg_w, neg_a, 0] = -1

    if np.isnan(target[0]):
        return label
    iz, ih, iw, ia = [], [], [], []
    for i, anchor in enumerate(anchors):
        iiz, iih, iiw = select_samples(target, anchor, th_pos, oz, oh, ow)
        iz.append(iiz)
        ih.append(iih)
        iw.append(iiw)
        ia.append(i * np.ones((len(iiz),), np.int64))
    iz = np.concatenate(iz, 0)
    ih = np.concatenate(ih, 0)
    iw = np.concatenate(iw, 0)
    ia = np.concatenate(ia, 0)
    flag = True 
    if len(iz) == 0:
        pos = []
        for i in range(3):
            pos.append(max(0, int(np.round((target[i] - offset) / stride))))
        idx = np.argmin(np.abs(np.log(target[3] / anchors)))
        pos.append(idx)
        flag = False
    else:
        idx = random.sample(range(len(iz)), 1)[0]
        pos = [iz[idx], ih[idx], iw[idx], ia[idx]]
    dz = (target[0] - oz[pos[0]]) / anchors[pos[3]]
    dh = (target[1] - oh[pos[1]]) / anchors[pos[3]]
    dw = (target[2] - ow[pos[2]]) / anchors[pos[3]]
    dd = np.log(target[3] / anchors[pos[3]])
    label[pos[0], pos[1], pos[2], pos[3], :] = [1, dz, dh, dw, dd]
    return label

DishaDRao · 2021-07-17T19:37:27Z

@naoe1999 @MHansy
The problem of class imbalance is actually taken care of in the loss function. Even though the target lables may contain the ratio (positive to negative) that you have mentioned, the loss function takes care of this by employing 'negative hard mining' (similar to your idea of random sampling of the negatives) which restricts the number of negative anchor boxes to 2 ( depending on the batch size) per mini batch. That means the network sees an equal (or 1:2) ratio of positive and negative anchor boxes during the loss computation.
I strongly believe the problem in this code is how the rest of the targets are labelled. The anchor boxes for bounding box regresssion should be labelled based on its IOU and center-to-center parameterization with a ground truth box (as per the standard faster-rcnn). I don't see how that is employed in this code.
If the target itself doesn't have the right (position) labels, then I wouldn't expect to get any meaningful results after training. ( given the benifit of the doubt, even if the targets are labelled correctly, the testing requires de-parameterization of the predictions which can be done only if the target computation is deciphered)
In short, I wouldn't use this code for traning. This repository is nice to get an understanding on the preprocessing and augmentation part, but for actual implemetation I would recommend to check out the original code bases from (lfz/DSB2017, or 'wentaozhu/DeepLung'). They both are extremely similar, however the latter repo is simpler, it worked for me!
(ps. this repo has a google collab provided at the end. Howerver, I didn't use it nor check it out. I wanted a deeper understanding, hence skipped it entirely ;) )

Hello @DishaDRao and @naoe1999

Kindly help please.

I tried to make follow up on your conversion and advises, and I went through wentaozhu/DeepLung repository and unfortunately at the LOSS CODES I find the same thing at the labels(+1).

BUT during training with that codes, I found that the loss does not decreasing, I am not sure if I have to remove (+1) in labels in the codes.

Hi,

If you're following wentaozhu/DeepLung respository, you need not change anything in the loss function nor in data.py function. The negative samples are labelled in a correct manner. As mentioned in my previous comment, the +1 in the loss function is to make the nagative labels to 0. So, it's for a purpose!
Whereas in this repo (mostafa/Luna16) that +1 would be a mistake as the negative samples are not lablelled in a manner how data.py function does in the other repo!

So, through Wentazo's codes, the loss error that you're facing must be due to be something else. Probably your dataset/training method. May be you should look into their issues section.

SirMwan · 2021-07-17T22:39:39Z

So, through Wentazo's codes, the loss error that you're facing must be due to be something else. Probably your dataset/training method. May be you should look into their issues section.

@DishaDRao I am training through google collab, what I have done is to reduce batch size, also what else I have done is I am not using Dataparalle in training because I use single gpu.

Furthermore, chenges in the pytorch version must have some issues like int issues need to put in some areas. Ihave done it for almost two months now Iam getting crayz. I started the process again and again but no success.

If u dont mind, share with me your data.py, main.py and layers.py files.
my email is [email protected]
Thanks in advance.

SirMwan · 2021-07-17T22:47:01Z

So, through Wentazo's codes, the loss error that you're facing must be due to be something else. Probably your dataset/training method. May be you should look into their issues section.

@DishaDRao I am training through google collab, what I have done is to reduce batch size, also what else I have done is I am not using Dataparalle in training because I use single gpu.

Furthermore, chenges in the pytorch version must have some issues like int issues need to put in some areas. Ihave done it for almost two months now Iam getting crayz. I started the process again and again but no success.

If u dont mind, share with me your data.py, main.py and layers.py files.
my email is [email protected]
Thanks in advance.

This is the change in training I have done

def train(data_loader, net, loss, epoch, optimizer, get_lr, save_freq, save_dir):
start_time = time.time()

net.train()
lr = get_lr(epoch)
for param_group in optimizer.param_groups:
    param_group['lr'] = lr

metrics = []

for i, (data, target, coord) in enumerate(data_loader):
    if torch.cuda.is_available():
        data = Variable(data.cuda())
        target = Variable(target.cuda())
        coord = Variable(coord.cuda())
    data = data.float()
    target = target.float()
    coord = coord.float()


    optimizer.zero_grad()
    output = net(data, coord)
    loss_output = loss(output, target)
    loss_output[0].backward()
    optimizer.step()

    loss_output[0] = loss_output[0].item()    ####changes this part
    metrics.append(loss_output)

if epoch % args.save_freq == 0:            
    state_dict = net.state_dict()
    for key in state_dict.keys():
        state_dict[key] = state_dict[key].cpu()
        
    torch.save({
        'epoch': epoch,
        'save_dir': save_dir,
        'state_dict': state_dict,
        'args': args},
        os.path.join(save_dir, '%03d.ckpt' % epoch))

end_time = time.time()
metrics = np.asarray(metrics, np.float32)
print('Epoch %03d (lr %.5f)' % (epoch, lr))
print('Train:      tpr %3.2f, tnr %3.2f, total pos %d, total neg %d, time %3.2f' % (
    100.0 * np.sum(metrics[:, 6]) / np.sum(metrics[:, 7]),
    100.0 * np.sum(metrics[:, 8]) / np.sum(metrics[:, 9]),
    np.sum(metrics[:, 7]),
    np.sum(metrics[:, 9]),
    end_time - start_time))
print('loss %2.4f, classify loss %2.4f, regress loss %2.4f, %2.4f, %2.4f, %2.4f' % (
    np.mean(metrics[:, 0]),
    np.mean(metrics[:, 1]),
    np.mean(metrics[:, 2]),
    np.mean(metrics[:, 3]),
    np.mean(metrics[:, 4]),
    np.mean(metrics[:, 5])))
print()

SirMwan · 2021-07-17T23:04:40Z

In the data file also,

...
else:
imgs = np.load(self.filenames[idx])
bboxes = self.sample_bboxes[idx]
nz, nh, nw = imgs.shape[1:]
pz = int(np.ceil(float(nz) / self.stride)) * self.stride
ph = int(np.ceil(float(nh) / self.stride)) * self.stride
pw = int(np.ceil(float(nw) / self.stride)) * self.stride
imgs = np.pad(imgs, [[0,0],[0, pz - nz], [0, ph - nh], [0, pw - nw]], 'constant',constant_values = self.pad_value)

        xx,yy,zz = np.meshgrid(np.linspace(-0.5,0.5,int(imgs.shape[1]/self.stride)),   ##added int
                               np.linspace(-0.5,0.5,int(imgs.shape[2]/self.stride)),                 ##added int
                               np.linspace(-0.5,0.5,int(imgs.shape[3]/self.stride)),indexing ='ij')      ###added int
        coord = np.concatenate([xx[np.newaxis,...], yy[np.newaxis,...],zz[np.newaxis,:]],0).astype('float32')
        imgs, nzhw = self.split_comber.split(imgs)
        coord2, nzhw2 = self.split_comber.split(coord,
                                               side_len = int(self.split_comber.side_len/self.stride),
                                               max_stride = int(self.split_comber.max_stride/self.stride),
                                               margin = int(self.split_comber.margin/self.stride))
        assert np.all(nzhw==nzhw2)
        imgs = (imgs.astype(np.float32)-128)/128
        return torch.from_numpy(imgs), bboxes, torch.from_numpy(coord2), np.array(nzhw)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

classification loss - negative sample labels #6

classification loss - negative sample labels #6

DishaDRao commented Mar 14, 2021 •

edited

Loading

naoe1999 commented May 20, 2021

DishaDRao commented May 21, 2021

MHansy commented May 27, 2021

MHansy commented May 27, 2021

naoe1999 commented May 28, 2021

naoe1999 commented May 28, 2021

DishaDRao commented May 28, 2021 •

edited

Loading

naoe1999 commented Jun 28, 2021

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

DishaDRao commented Jul 17, 2021 •

edited

Loading

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

classification loss - negative sample labels #6

classification loss - negative sample labels #6

Comments

DishaDRao commented Mar 14, 2021 • edited Loading

naoe1999 commented May 20, 2021

DishaDRao commented May 21, 2021

MHansy commented May 27, 2021

MHansy commented May 27, 2021

naoe1999 commented May 28, 2021

naoe1999 commented May 28, 2021

DishaDRao commented May 28, 2021 • edited Loading

naoe1999 commented Jun 28, 2021

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

DishaDRao commented Jul 17, 2021 • edited Loading

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

SirMwan commented Jul 17, 2021

DishaDRao commented Mar 14, 2021 •

edited

Loading

DishaDRao commented May 28, 2021 •

edited

Loading

DishaDRao commented Jul 17, 2021 •

edited

Loading