Using WeightedRandomSampler in PyTorch

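I need to implement a multi-label image classification model in PyTorch. However, my data is not balanced, so I used PyTorch's WeightedRandomSampler to create a custom dataloader. But when I iterate through the custom dataloader, I get the error: IndexError: list index out of range

I implemented the following code using this link: https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/273?u=sursubajramanian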

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_weights_for_balanced_classes(images, nclasses):
    # Count how many images belong to each class; item[1] is the class index.
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1
    # Weight each class by the inverse of its frequency.
    weight_per_class = [0.] * nclasses
    N = float(sum(count))
    for i in range(nclasses):
        weight_per_class[i] = N / float(count[i])
    # Give every image the weight of its class.
    weight = [0] * len(images)
    for idx, val in enumerate(images):
        weight[idx] = weight_per_class[val[1]]
    return weight

weights = make_weights_for_balanced_classes(train_dataset.imgs, len(full_dataset.classes))
weights = torch.DoubleTensor(weights)
sampler = WeightedRandomSampler(weights, len(weights))

train_loader = DataLoader(train_dataset, batch_size=4, sampler=sampler, pin_memory=True)

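Based on the answer at https://stackoverflow.com/a/60813495/10077354, the following is my updated code. But even then, something is off when I create a dataloader and check its length: loader = DataLoader(full_dataset, batch_size=4, sampler=sampler), len(loader)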

class_counts = [1691, 743, 2278, 1271]
num_samples = int(np.sum(class_counts))  # cast to a plain int before passing it to the sampler
labels = [tag for _, tag in full_dataset.imgs]

class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(num_samples)]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)

Thanks in advance!

I have included a utility function based on the accepted answer below:

def sampler_(dataset):
    # n_classes is expected to be defined at module level.
    dataset_counts = imageCount(dataset)
    num_samples = sum(dataset_counts)
    labels = [tag for _, tag in dataset]

    # Inverse-frequency weight per class, then one weight per sample.
    class_weights = [num_samples / dataset_counts[i] for i in range(n_classes)]
    weights = [class_weights[labels[i]] for i in range(num_samples)]
    sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
    return sampler

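The imageCount function finds the number of images of each class in the dataset. Each row in the dataset contains the image and its class, so we take the second element of the tuple.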

def imageCount(dataset):
    image_count = [0]*(n_classes)
    for img in dataset:
        image_count[img[1]] += 1
    return image_count
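
The loop above indexes every dataset item, which for an image dataset usually means decoding each image just to read its label. As a minimal sketch, assuming the integer labels are already available as a plain list (for example, torchvision's ImageFolder exposes them as its targets attribute), the same counts can be computed from the labels alone. The helper name count_per_class is hypothetical.

```python
from collections import Counter

def count_per_class(labels, n_classes):
    """Return a list whose i-th entry is the number of samples labelled i."""
    counts = Counter(labels)
    # Classes that never occur get a count of 0.
    return [counts.get(c, 0) for c in range(n_classes)]

labels = [0, 2, 1, 2, 2, 0]  # made-up labels, for illustration only
print(count_per_class(labels, n_classes=4))  # → [2, 1, 3, 0]
```

A class with a count of 0 would cause a division by zero when the counts are inverted into class weights, so it is worth checking for empty classes first.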

Best answer

That code looks a bit complex... You can try the following:

import torch
from torch.utils.data import WeightedRandomSampler

# Let there be 9 samples and 1 sample in class 0 and 1 respectively
class_counts = [9.0, 1.0]
num_samples = sum(class_counts)
labels = [0] * 9 + [1]  # corresponding labels of samples: nine 0s, then one 1

class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(int(num_samples))]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
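
To see what these weights do, here is a small sketch that draws many indices from a sampler built the same way and counts how often each class comes up. The number of draws (10,000) and the seed are arbitrary choices for illustration. With inverse-frequency weights, the single class-1 sample should be drawn about as often as the nine class-0 samples combined.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

labels = [0] * 9 + [1]                                  # 9 samples of class 0, 1 of class 1
class_counts = [labels.count(c) for c in (0, 1)]        # [9, 1]
class_weights = [len(labels) / n for n in class_counts]
weights = [class_weights[label] for label in labels]    # one weight per sample

# Draw far more indices than there are samples so the class ratio is visible.
sampler = WeightedRandomSampler(torch.DoubleTensor(weights),
                                num_samples=10_000, replacement=True)

drawn = Counter(labels[i] for i in sampler)
fraction_minority = drawn[1] / sum(drawn.values())
print(drawn, fraction_minority)
```

The sampler yields dataset indices, not labels, so the same index (here index 9) appears many times when replacement=True.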
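
Answers

The answer above is missing some key information.

Let's say you have 10,000 samples belonging to 10 classes, and you want to use WeightedRandomSampler. The weights argument passed to WeightedRandomSampler holds the weight of each of those 10,000 samples, not of the classes. So you have to compute a weight for every sample.

Here is one way to do it, for one-hot encoded labels: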

import numpy as np
from torch.utils.data import BatchSampler, DataLoader, WeightedRandomSampler

# Assuming you already have created your train_dataset object which has all the labels stored.

def calc_sample_weights(labels, class_weights):
    # Element-wise product of the encoded label and the class weights, summed to a scalar.
    return sum(labels * class_weights)

# Specify class weights. You can use any of the methods in the other answers to calculate class_weights.
class_weights = np.array([...])

# Create sample weights, i.e. weights for each of the 10000 samples.
sample_weights = [calc_sample_weights(label, class_weights)
                  for label in train_dataset.labels]

# Create WeightedRandomSampler.
weighted_sampler = WeightedRandomSampler(sample_weights, len(train_dataset))

# Create Batch Sampler for retrieving batches of samples
batch_size = 32
batch_sampler = BatchSampler(weighted_sampler, batch_size, drop_last=False)

# Create train dataloader
train_loader = DataLoader(train_dataset, batch_sampler=batch_sampler)

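In the code above, we calculate each sample's weight by multiplying the class weights element-wise with that sample's encoded label and summing the result. So if the class weights are [1.0, 0.5, 0] and a sample's label is encoded as [1, 0, 1], the total weight of that sample is 1.0. You can do something similar with labels that are not one-hot encoded by using each sample's class label as an index into the class weights.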
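
The arithmetic above can be checked with a few lines of NumPy. This is only a sketch: the class weights are the ones from the example, and the integer labels in the second half are made up for illustration.

```python
import numpy as np

class_weights = np.array([1.0, 0.5, 0.0])

# Encoded label: multiply element-wise with the class weights, then sum.
encoded_label = np.array([1, 0, 1])
weight_from_encoding = float(np.sum(encoded_label * class_weights))
print(weight_from_encoding)  # → 1.0

# Integer class labels: use each label as an index into class_weights.
index_labels = np.array([0, 1, 1, 2])  # hypothetical labels for four samples
weights_from_indices = class_weights[index_labels]
print(weights_from_indices.tolist())  # → [1.0, 0.5, 0.5, 0.0]
```

Either way, the result has one weight per sample, which is the shape WeightedRandomSampler expects.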

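Notice that we have also created a BatchSampler. The weighted sampler only yields individual sample indices; the BatchSampler groups those indices into batches, and it is handed to the DataLoader through the batch_sampler argument.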
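
As a sketch of how these pieces relate, the toy example below builds the loader both ways: with an explicit BatchSampler, and by passing sampler and batch_size to the DataLoader, which then does the batching itself. The dataset is a made-up TensorDataset of 10 samples, used only for illustration. In both cases the number of batches is the sampler's num_samples divided by the batch size, rounded up when drop_last is False.

```python
import math

import torch
from torch.utils.data import (BatchSampler, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Toy dataset: 10 samples, 9 in class 0 and 1 in class 1.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.tensor([0] * 9 + [1])
dataset = TensorDataset(features, labels)

# One weight per sample: the inverse frequency of that sample's class.
class_counts = torch.bincount(labels).double()
sample_weights = (1.0 / class_counts)[labels]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset))
batch_size = 4

# Option A: explicit BatchSampler passed as batch_sampler.
loader_a = DataLoader(
    dataset, batch_sampler=BatchSampler(sampler, batch_size, drop_last=False))

# Option B: let the DataLoader batch the indices produced by the sampler.
loader_b = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

expected_batches = math.ceil(len(dataset) / batch_size)
print(len(loader_a), len(loader_b), expected_batches)  # → 3 3 3
```

Note that shuffle=True cannot be combined with a custom sampler, since the sampler already decides the order of the indices.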




