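I need to implement a multi-label image classification model in PyTorch. However, my data is not balanced, so I used PyTorch's WeightedRandomSampler to create a custom dataloader. But when I iterate through the custom dataloader, I get the error: IndexError: list index out of range.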
I implemented the following code using this link: https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/273?u=sursubajramanian
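import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_weights_for_balanced_classes(images, nclasses):
    # Count how many images fall into each class; item[1] is the class index.
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1
    # Weight each class by the inverse of its frequency.
    weight_per_class = [0.] * nclasses
    N = float(sum(count))
    for i in range(nclasses):
        weight_per_class[i] = N/float(count[i])
    # Assign every image the weight of its class.
    weight = [0] * len(images)
    for idx, val in enumerate(images):
        weight[idx] = weight_per_class[val[1]]
    return weight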
weights = make_weights_for_balanced_classes(train_dataset.imgs, len(full_dataset.classes))
weights = torch.DoubleTensor(weights)
sampler = WeightedRandomSampler(weights, len(weights))
train_loader = DataLoader(train_dataset, batch_size=4, sampler=sampler, pin_memory=True)
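One thing that may be worth checking for the IndexError: a weighted sampler draws indices from range(len(weights)), so the weights list should have exactly one entry per item of the dataset that is handed to the DataLoader. The sketch below illustrates this with the standard library only. `draw_indices` is a hypothetical stand-in for the sampler, and the 10-item/6-item split is made up for illustration, since the post does not show how train_dataset was built.

```python
import random


def draw_indices(weights, num_samples, seed=0):
    """Hypothetical stand-in for a weighted sampler.

    Draws `num_samples` indices (with replacement) from range(len(weights)),
    each index chosen with probability proportional to its weight.
    """
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Assumed toy setup: 10 items in the full set, only 6 in the training split.
full_size, train_size = 10, 6

# Weights built from the full set: some drawn indices do not exist in the split.
mismatched = draw_indices([1.0] * full_size, 200)
bad = [i for i in mismatched if i >= train_size]
print("indices past the end of the split:", len(bad))

# Weights built from the split itself: every index is valid.
matched = draw_indices([1.0] * train_size, 200)
print("largest index drawn:", max(matched))
```

If the lengths differ in the real code, computing the weights from the same dataset object that is passed to DataLoader would be the first thing to try.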
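Based on the answer at https://stackoverflow.com/a/60813495/10077354, the following is my updated code. But even then, the issue shows up when I create a dataloader, loader = DataLoader(full_dataset, batch_size=4, sampler=sampler), and call len(loader).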
class_counts = [1691, 743, 2278, 1271]
num_samples = np.sum(class_counts)
labels = [tag for _,tag in full_dataset.imgs]
class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(num_samples)]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)
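As a sanity check on the weighting itself, the following standard-library sketch uses the class counts listed above and shows that inverse-frequency weights give every class the same total weight, so each class should be drawn about equally often. The names `total_weight_per_class` and `class_probabilities` are illustrative and not part of the original code.

```python
import math

class_counts = [1691, 743, 2278, 1271]
num_samples = sum(class_counts)

# Inverse-frequency weight shared by every sample of a class.
class_weights = [num_samples / count for count in class_counts]

# Total weight a class receives = (number of samples) * (weight per sample).
total_weight_per_class = [
    count * weight for count, weight in zip(class_counts, class_weights)
]

# Probability that a single weighted draw lands in each class.
grand_total = sum(total_weight_per_class)
class_probabilities = [total / grand_total for total in total_weight_per_class]

for label, prob in enumerate(class_probabilities):
    print(f"class {label}: probability {prob:.3f}")

# Every class ends up with the same share of the draws.
assert all(math.isclose(p, 1 / len(class_counts)) for p in class_probabilities)
```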
Thanks in advance!
I am including a generalized function based on the accepted answer below:
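def sampler_(dataset):
    # Number of images per class (n_classes is defined elsewhere).
    dataset_counts = imageCount(dataset)
    num_samples = sum(dataset_counts)
    # Each dataset item is an (image, class) pair; keep only the class.
    labels = [tag for _, tag in dataset]
    class_weights = [num_samples/dataset_counts[i] for i in range(n_classes)]
    # One weight per sample, taken from the weight of its class.
    weights = [class_weights[labels[i]] for i in range(num_samples)]
    sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
    return sampler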
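The imageCount function finds the number of images of each class in the dataset. Each row in the dataset contains the image and the class, so we take the second element of the tuple into consideration.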
def imageCount(dataset):
    image_count = [0]*(n_classes)
    for img in dataset:
        image_count[img[1]] += 1
    return image_count
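The same per-class count can also be sketched with collections.Counter. This is only an illustration under assumptions: `image_count` is a hypothetical variant that takes the number of classes as an argument instead of reading a global, and `toy_samples` is made-up data standing in for the (image, class) rows described above.

```python
from collections import Counter


def image_count(samples, n_classes):
    """Return a list where position k holds the number of samples of class k."""
    counts = Counter(label for _, label in samples)
    # Counter returns 0 for classes that never appear.
    return [counts[k] for k in range(n_classes)]


# Made-up (image, class) rows; the strings stand in for image data.
toy_samples = [("a.png", 0), ("b.png", 2), ("c.png", 0), ("d.png", 1), ("e.png", 0)]

counts = image_count(toy_samples, n_classes=3)
print(counts)  # [3, 1, 1]
```

Note that a class with a count of zero would cause a division by zero when the class weights are computed, so it may be worth checking the counts before building the sampler.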