Using ResNet50 to create a feature tensor of [w, h, f]
I'm trying to implement this paper but I'm not following something in it. It wants me to use ResNet50 to extract features from an image, but it tells me the extracted features will be of dimension [w, h, f]. Everything I'm seeing with ResNet50, though, gives me back a tensor of [f] (as in, it turns my whole image into features and not my pixels into features). Am I reading this wrong, or do I just not understand what I'm supposed to be doing with ResNet50?

Relevant quotes from the paper:

"We obtain an intermediate visual feature representation Fc of size f. We use the ResNet50 [26] as our backbone convolutional architecture."

"In a first step, the three-dimensional feature Fc is reshaped into a two-dimensional feature by keeping its width, i.e. obtaining a feature shape (f × h, w)."
Answers
First install the timm and torch Python packages via pip, then create the model and load the pre-trained weights:

import timm
import torch

# Create a ResNet50 that returns intermediate feature maps instead of classification logits
model = timm.create_model('resnet50', pretrained=True, features_only=True)

# image must be a torch tensor of shape (n_images, channels, height, width), e.g. (1, 3, 224, 224)
features = model(image)

# With features_only=True the model returns a list of feature maps;
# the deepest one has shape (1, 2048, 7, 7) for a 224x224 input
print(features[-1].shape)
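If it helps, here is a minimal sketch (assuming the default out_indices timm uses for resnet50) that prints every feature map the model returns, so you can see which one has the spatial [f, h, w] shape you need:

import timm
import torch

# Pretrained weights are not needed just to inspect the output shapes
model = timm.create_model('resnet50', pretrained=False, features_only=True)
model.eval()

dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feature_maps = model(dummy)

for i, fmap in enumerate(feature_maps):
    print(i, tuple(fmap.shape))
# Expected for a 224x224 input with the default out_indices:
# 0 (1, 64, 112, 112)
# 1 (1, 256, 56, 56)
# 2 (1, 512, 28, 28)
# 3 (1, 1024, 14, 14)
# 4 (1, 2048, 7, 7)

The last entry, (1, 2048, 7, 7), corresponds to the backbone output the paper describes; pick an earlier entry if you want higher spatial resolution.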
I didn't read the paper in detail, but when they say [w, h, f] I don't think the w and h have to match the width and height of the original image. They likely just mean that if the output of your ResNet after the last conv + pooling layer is [w, h, f], you reshape it into 2D (making it [f×h, w]) and then pass it through a fully-connected layer to make it f dimensional. Something like this:

import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet50(pretrained=True)

# Remove the last fully connected layer and the adaptive pooling layer
# so the network outputs a spatial feature map instead of a pooled vector
resnet = torch.nn.Sequential(*list(resnet.children())[:-2])

# Dummy image of shape [1, 3, 224, 224]
image = torch.randn(1, 3, 224, 224)

intermediate_features = resnet(image)  # This will be [1, 2048, 7, 7]
batch_size, channels, h, w = intermediate_features.size()

# Reshape to [1, 14336, 7], where f*h = 2048*7 = 14336 and w = 7
reshaped_features = intermediate_features.view(batch_size, channels * h, w)

fc_layer = nn.Linear(w, 1)  # This layer reduces the w dimension to 1
final_output = fc_layer(reshaped_features)  # [1, 14336, 1]
final_output = final_output.squeeze(-1)     # [1, 14336]
print(final_output.shape)

(My example also has batch size as a dimension because in the real world you work with batches of examples.)
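To connect this back to the paper's notation, a small sketch (my own, under the same assumptions as the code above) showing that PyTorch's channels-first output [b, f, h, w] is the paper's [w, h, f] per image, followed by the (f × h, w) reshape it describes:

import torch
import torchvision.models as models

# Same backbone as above: drop the pooling and classification head
backbone = torch.nn.Sequential(*list(models.resnet50(pretrained=True).children())[:-2])

image = torch.randn(1, 3, 224, 224)
fc = backbone(image)             # [b, f, h, w] = [1, 2048, 7, 7]
b, f, h, w = fc.shape

# The paper's [w, h, f] is the same tensor with the axes read channels-last
fc_whf = fc.permute(0, 3, 2, 1)  # [1, 7, 7, 2048], i.e. [w, h, f] per image

# The paper's reshape: keep the width, fold f and h together -> (f * h, w)
fc_2d = fc.reshape(b, f * h, w)  # [1, 14336, 7]
print(fc_whf.shape, fc_2d.shape)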



