Pytorch Geometric graph batching not using DataLoader for Reinforcement learning

I am pretty new to Graph Neural Networks (GNNs) and I am using PyTorch Geometric (PyG). I am writing a reinforcement learning algorithm, so I would like to avoid the built-in DataLoader, since I generate data/observations on the fly. However, I run into an issue when passing a batch of PyTorch Geometric graphs through the network. I keep the PyG graphs in a numpy object array that acts as a memory; I sample from this memory and try to push the sample through the neural network (NN).

Pushing a single graph through the GCN seems to work fine; I get one output per node. The problem arises when I use a batch. Normally I would stack a batch of numpy arrays into a tensor, but PyTorch cannot do that here because it cannot handle the PyG Data type. I therefore build a batch with PyTorch Geometric's Batch.from_data_list. It passes through the neural network, but the output dimension looks off: the graphs appear to be merged into a single object and pushed through as one big graph, so I get an output of shape [batch_size * n_nodes] where I expected [batch_size, n_nodes]. I am not sure whether I am doing this correctly. Is there a better way to handle this that avoids the problem? I am not keen on simply splitting the output array every n_nodes.
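To make the shape issue concrete, here is a small self-contained sketch of what (as far as I understand) Batch.from_data_list does, and the kind of splitting/reshaping I would rather avoid; it assumes every graph has the same n_nodes:

import torch as T
from torch_geometric.data import Data, Batch

n_nodes, batch_size = 3, 2
graphs = [Data(x=T.randn(n_nodes, 1),
               edge_index=T.tensor([[0, 1], [1, 2]]))
          for _ in range(batch_size)]

big_graph = Batch.from_data_list(graphs)
print(big_graph.num_nodes)   # batch_size * n_nodes = 6 -> one merged graph
print(big_graph.batch)       # tensor([0, 0, 0, 1, 1, 1]) -> graph id of each node

# Node-level output comes back flat as [batch_size * n_nodes, 1];
# with equal-sized graphs it could be split/reshaped like this:
flat = T.randn(batch_size * n_nodes, 1)       # stand-in for the network output
per_graph = flat.view(batch_size, n_nodes)    # -> [batch_size, n_nodes]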

One option would be to push each graph through the forward pass one at a time in a loop, but that is inefficient (roughly what I mean is sketched after the example below). Maybe there is a simple setting I am missing? I have included a working example below.

Thanks in advance.

import torch as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch_geometric.nn import GCNConv
from torch_geometric.data import Batch
from torch_geometric.data import Data
import numpy as np


class DeepNetworkGCN(nn.Module):
    def __init__(self, lr=0.001, input_dims=[1], fc1_dims=128, fc2_dims=128, out_dims=[1]):
        super(DeepNetworkGCN, self).__init__()

        # GCN part of network
        self.GCNconv1 = GCNConv(*input_dims, fc1_dims)
        self.GCNconv2 = GCNConv(fc1_dims, fc2_dims)

        # conform to output dimension
        self.fc1 = nn.Linear(fc2_dims, *out_dims)

        self.optimizer = optim.Adam(self.parameters(), lr=lr)
        self.loss = nn.MSELoss()
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state):
        # Process graph data using GCN layers
        x = self.GCNconv1(state.x, state.edge_index)
        x = F.relu(x)
        x = self.GCNconv2(x, state.edge_index)

        # Final fully connected layer
        out = self.fc1(x)

        return out


def random_pyg_graph(num_nodes=3):  
    # random node features
    node_features = T.randint(0, 5, (num_nodes, 1), dtype=T.float)

    # random edge features
    edge_features = T.randn(num_nodes, num_nodes)

    # random edge indices
    edge_index = T.randint(0, num_nodes, (2, num_nodes * 2))

    # Remove self-loops
    edge_index = edge_index[:, edge_index[0] != edge_index[1]]

    # graph
    graph_data = Data(x=node_features, edge_index=edge_index, edge_attr=edge_features)

    return graph_data


# setup example
batch_size = 3
memory = np.zeros(batch_size, dtype=object)

# fill memory
for i in range(batch_size):
    memory[i] = random_pyg_graph()

# define model
CNN = DeepNetworkGCN()

# test for single PyG
output = CNN.forward(memory[0])
print(output)
# output 1 for each node e.g.
# tensor([[0.3770],
#        [0.6119],
#        [0.2014]], grad_fn=<AddmmBackward0>)

# test for numpy.ndarray
# FAILS! # FAILS! # FAILS!
# output = CNN.forward(memory[:]) # FAILS!
# FAILS! # FAILS! # FAILS!

# Create batch and do forward pass.
output = CNN.forward(Batch.from_data_list(memory[:]))
print(output)
# output dimension is weird. ( n_nodes*batch_size).
# tensor([[ 0.0173],
#         [ 0.0316],
#         [ 0.0282],
#         [ 0.0147],
#         [-0.0201],
#         [-0.0264],
#         [ 0.0147],
#         [-0.0084],
#         [ 0.0021]], grad_fn=<AddmmBackward0>)
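
# For completeness, this is roughly the per-graph loop workaround mentioned
# above (it continues the example, reusing CNN and memory), which is the
# approach I would like to avoid:
outputs = [CNN.forward(g) for g in memory]   # each entry is [n_nodes, 1]
outputs = T.stack(outputs).squeeze(-1)       # -> [batch_size, n_nodes]
print(outputs.shape)                         # torch.Size([3, 3])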


Answer

In PyTorch, if you are generating your dataset on the fly, use an IterableDataset. It only requires __iter__ instead of __len__ and __getitem__.

Taken from here:

import torch
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import IterableDataset

class DataStream1(IterableDataset):

    def __init__(self) -> None:
        super().__init__()
        self.size_input = 4
        self.size_output = 2

    def generate(self):
        while True:
            x = torch.rand(self.size_input)
            y = torch.rand(self.size_output)
            yield x, y

    def __iter__(self):
        return iter(self.generate())

dataset = DataStream1()

train_loader = DataLoader(dataset=dataset)

for i, data in enumerate(train_loader):
    print (i, data)

If you end up needing to combine several graphs into one batch at some point, have a look at PyTorch Geometric's batching utilities (for example Batch.from_data_list, or the DataLoader in torch_geometric.loader, which collates Data objects for you).
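
A rough sketch of how the same streaming idea could be combined with PyTorch Geometric's own DataLoader (which collates Data objects into a Batch); the GraphStream class and its sizes are purely illustrative:

import torch
from torch.utils.data import IterableDataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # PyG loader, collates Data into Batch

class GraphStream(IterableDataset):

    def __init__(self, num_nodes=3):
        super().__init__()
        self.num_nodes = num_nodes

    def generate(self):
        # Generate random PyG graphs forever (on-the-fly observations).
        while True:
            x = torch.randint(0, 5, (self.num_nodes, 1), dtype=torch.float)
            edge_index = torch.randint(0, self.num_nodes, (2, self.num_nodes * 2))
            yield Data(x=x, edge_index=edge_index)

    def __iter__(self):
        return iter(self.generate())

loader = DataLoader(GraphStream(), batch_size=3)

for i, batch in enumerate(loader):
    print(i, batch)   # e.g. Batch(x=[9, 1], edge_index=[2, 18], batch=[9], ptr=[4])
    if i == 2:
        break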




