'분류 전체보기' 카테고리의 글 목록 (4 Page)

카테고리 보기

Model Conversion between Tensorflow and Pytorch 2022.07.05
초 간단 논문리뷰 | Denoising Diffusion Probabilistic Models(DDPMs) 2022.06.15
초 간단 논문리뷰 | Graph-based Semi-supervised Learning 2022.05.17
초 간단 논문리뷰 | Graph Neural Networks란 2022.04.22

Model Conversion between Tensorflow and Pytorch

2022. 7. 5. 18:23

과거의 Tensorflow로 짜둔 모델을 Pytorch로 새롭게 훈련하지 않고 wegiht과 bias를 가져와서 변환을 해보고자 했습니다.
(2022.07.05 기준) 아직 테스트는 진행하지 못함. 추후에 샘플 모델 변환 테스트를 진행 후 코드를 추가로 올리도록 하겠습니다.
(2022.07.12 기준) Tensorflow -> ONNX -> Pytorch Test 완료

1. 직접 weight, bias 변경

https://blog.pingpong.us/torch-to-tf-tf-to-torch/

내부에서 사용하는 모델들을 PyTorch, TensorFlow 버전으로 재작성
1. 일단 모델 구조와 코드를 맞춰야 함
  
  idea: sklearn metric, preprocessing 사용하기 (변경해야할 코드 최소화)
  
  How to Convert from Keras/Tensorflow to PyTorch
모델의 모든 가중치를 변환해주는 코드를 추가 작성
1. TF와 Torch의 weight shape이 다름 → Transpose 하기
2. numpy로 변형 후 적용!
3. 각각 구현체의 순서가 다를 경우 분해 혹은 합쳐서 변형해야함…(이게 고될듯)

ex code (euni 조금 변형)

  # from TensorFlow Model to PyTorch Model
  # tf_module의 weight를 torch_module의 weight에 적용하는 예시
  tf_weights = tf_model.get_weights()[0]
  tf_biases = tf_model.get_weights()[1]

  ## TF와 Torch의 weight shape이 다름 → Transpose 하기
  torch_model.weight.data = torch.from_numpy(tf_weights)
  torch_model.bias.data = torch.from_numpy(tf_biases)

  # from PyTorch Model to TensorFlow Model
  weight = torch_model.state_dict()["weight"].detach().numpy()
  bias = torch_model.state_dict()["bias"].detach().numpy()

  ## TF와 Torch의 weight shape이 다름 → Transpose 하기
  tf_model.set_weights([weight, bias])

2. ONNX를 이용하는 방법

https://learnopencv.com/pytorch-to-tensorflow-model-conversion/

그림 출처: https://miro.medium.com

저작자표시 비영리 변경금지

'PYTHON으로 딥러닝하기' 카테고리의 다른 글

Custom Model TorchServing 성공기 (0)	2022.07.14
Model Conversion between Tensorflow and Pytorch \| From TF To Torch (0)	2022.07.12
Hyperparameters tuning \| Keras Tuner 튜토리얼 (0)	2022.04.22
Custom Pytorch Model serving with Flask (0)	2022.03.02
Python으로 딥러닝하기\| Generative Model & GAN (0)	2021.12.19

초 간단 논문리뷰 | Denoising Diffusion Probabilistic Models(DDPMs)

2022. 6. 15. 17:46

paper) https://arxiv.org/pdf/2006.11239.pdf (UC Berkeley)
code) https://github.com/hojonathanho/diffusion
original diffusion model paper) https://arxiv.org/pdf/1503.03585.pdf

요약
1. forward diffusion process $q(\mathbf{x}t|\mathbf{x}{t-1})$ : noise를 점점 증가 시켜 가면서 학습 데이터를 특정한(Gaussian) noise distribution으로 변환
2. reverse generative process $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ : noise distribution으로부터 학습 데이터를 복원(=denoising)하는 것을 학습단 위과정을 Markov Chain으로 표현.
왜 잘될까?
- 기존의 AutoEncoder와 같은 모델의 경우 Encoder 부분에서 어느 정도로 data가 encoding이 될지 알 수 없고 조절할 수 없었지만 Diffusion 모델의 경우 각 time step 마다 약간의 noise가 추가되고 reverse할 때 그 지정된 만큼(?), 종류(?)의 noise를 복원해내면 되기 때문이지 않을까..(마치 작은 tasks로 쪼개서 문제를 해결해나가듯)

https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

1. Introduction

diffusion probabilistic model은 샘플에 매칭되는 데이터를 생성하기위해 variational inference를 사용하여 훈련된 parameterized Markov chain이다.

** Variational Inference 참고

(기계학습, Machine Learning) Week 1 Variational Inference | Lecture 1

** Markov chain: 마르코프 성질을 가진 이산 확률과정을 뜻합니다. (from wiki)

마르코프 성질은 과거와 현재 상태가 주어졌을 때의 미래 상태의 조건부 확률 분포가 과거 상태와는 독립적으로 현재 상태에 의해서만 결정된다는 것을 뜻한다.

이 Transition은 signal이 파괴될때까지(?= $x_0$까지) 샘플링의 반대반향으로 데이터에 노이즈를 점진적으로 추가하는 Markov chain의 diffusion process를 reverse하도록 학습된다.

diffusion이 적은 양의 Gaussian noise로 구성된 경우 샘플링 프로세스의 전환도 조건부 Gaussian으로 설정하면 충분하기때문에 간단한 neural network parameterization이 가능하다.

또한 diffusion model의 특정 parameterization은 훈련 중 multiple noise level에서 denoising score matching하는 것과 샘플링 중 Langevin dynamics 문제를 푸는 것과 동등하다는 것을 발견했다.

** Langevin dynamics (https://yang-song.github.io/blog/2021/score/)

Langevin dynamics provides an MCMC procedure to sample from a distribution $p(x)$ using only its score function $\nabla_\mathbf{x}\text{log} p(\mathbf{x})$ .

$s_\theta \approx \nabla_\mathbf{x}\text{log} p(\mathbf{x})$

Using Langevin dynamics to sample from a mixture of two Gaussians.

[Using Langevin dynamics to sample from a mixture of two Gaussians.]

이 논문에서는 diffusion model의 샘플링 프로세스가 autoregressive 모델로 일반화가 가능하다는 것과 autoregressive의 decoding과 유사한 일종의 점진적 decoding이라는 것을 보여준다.

** autoregressive decoder: 과거의 자기 자신을 사용하여 현재의 자신을 예측하는 모델이다. (from wiki)

$X_t = c+\Sigma_{i=1}^p \varphi_iX_{t-1} + \varepsilon_t$

2. Background

Forward process(diffusion process: from data to noise)

diffusion model이 다른 유형의 latent 모델과 구별되는 것은 대략적인 사후 확률 $q(\mathbf{x}\_{1:T}|\mathbf{x}_0)$ 이 데이터에 variance shedule $\beta_1, \dots, \beta_T$ 에 따라 Gaussian noise를 점진적으로 추가하는 Markov Chain에 고정된다는 것이다.
- $q({\bf x}_{1:T}|{\bf x}_0) := \Pi_{t=1}^T q({\bf x}_t|{\bf x}_{t-1}), \\ q({\bf x}_t|{\bf x}_{t-1}) := \mathcal{N}({\bf x}_t; \sqrt{1-\beta_t}{\bf x}_{t-1}, \beta_t \bf{I})$
forward process에서는 cosed form timestep t에서 xt를 샘플링하는 것을 허용한다. (한번에 0에서 t번째 샘플을 얻는 방법)
- $\alpha_t := 1-\beta_t, \bar\alpha_t := \Pi_{s=1}^t \alpha_s$
- $q(\mathbf{x}_t|\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t;\sqrt{\bar\alpha_t}\mathbf{x}_0, (1-\bar\alpha_t)\mathbf{I})$

Reverse process(generative process: from noise to data)

diffusion 모델은 pθ(x0):=∫pθ(x0:T)형식의 latent variable 모델이다.
- ( $\mathbf{x}_1, \cdots, \mathbf{x}_T$ 는 $\mathbf{x}_0 \sim q(\mathbf{x}_0$ )과 같은 차원의 latents를 갖는다.)
$p_\theta(x_{0:T})$ 는 reverse process라고 하며 Markov chain으로 $p(\mathbf{x}_T)=\mathcal{N}(x_T;0, \mathbf{I})$ 에서 시작하는 trained Gaussian Transition 으로 정의된다.

Objective

NNL(negative log likelihood): 훈련은 NNL을 최적화하여 수행
- $\mathbb{E}[-\text{log}p_\theta(\mathbf{x}0)] \le \mathbb{E}q[-\text{log}\frac{p\theta(\mathbf{x}{0:T})}{q(\mathbf{x}_{1:T}|\mathbf{x}0)}] \\ = \mathbb{E}q[-\text{log}p(\mathbf{x}T) - \Sigma{t\ge1}\text{log}\frac{p\theta(\mathbf{x}{t-1}|\mathbf{x}_t)}{q(\mathbf{x}t|\mathbf{x}{t-1})}] := L$

stochastic gradient descent으로 $L$ 의 random 항을 최적화함으로 효율적인 훈련이 가능하다. (Appendix A)

(모든 KL divergences는 Gaussian 분포들의 비교)

** KL divergences (from wiki)

: a measure of how one probability distribution P is different from a second, reference probability distribution Q.(=어떤 이상적인 분포에 대해, 그 분포를 근사하는 다른 분포를 사용해 샘플링을 한다면 발생할 수 있는 정보 엔트로피 차이를 계산한다.)

$D_{KL}(P||Q) = \Sigma P(x)log(\frac{P(x)}{Q(x)})$

** connections: Log likelihood, KL Divergence

The parameters that minimize the KL divergence are the same as the parameters that minimize the cross entropy and the negative log likelihood!

3. Diffusion models and denoising autoencoders

[detail은 논문 참고]

forward 프로세스의 분산 $\beta_t$ 와 reverse 프로세스의 model architecture, Gaussian distribution parameterization(mu)을 선택해야한다.

Forward process: $\beta_t$ 를 상수 취급(are fixed)하기 때문에 $L_T$ 상수취급하여 훈련에서 무시
Reverse process (L1:T−1):
- $\mu_\theta$ 를 평균 근사 훈련을 하여 $\tilde\mu_t$ 를 예측하거나 parameterization을 수정하여 $\epsilon$ 을 예측하도록 훈련.
- $\mu_\theta(\mathbf{x}_t, t) = \tilde{\mu}_t(\mathbf{x}_t, \frac{1}{\sqrt{\hat\alpha_t}}\big(\mathbf{x}t - \sqrt{1-\bar\alpha_t}\epsilon\theta(\mathbf{x}_t))\big) = \frac{1}{\sqrt\alpha_t}\big(\mathbf{x}t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\epsilon\theta(\mathbf{x}_t, t)\big)$
- ( ${\bf x}_{0}$ 을 예측할 수 있지만 실험 초기에 샘플 품질이 더 나빠지는 것으로 나타남)
Data scaling (L0):(샘플링이 끝나면 μθ(x1,1) 를 noise를 제거하고 표시한다.)
- 0 ~ 255로 구성되어있는 이미지 데이터는 $[-1, 1]$ 에서 선형 확장된다. 이렇게 하면 reverse process가 standard normal 사전분포 $p(\mathbf{x}_T)$ 에서 시작하여 일관되게 확장된 입력에 작동한다.
Simplified training objective
- Algorithm 1, 단순화된 objective를 이용한 전체 training 프로세스
  - variational bound의 변형에 대해 학습하는 것이 sample 품질(구현 더 간단)에 더 유리하다는 것을 발견함:
  - $L_{simple}(\theta) := \mathbb{E}_{t,{\bf x}0, \epsilon} [ ||\epsilon - \epsilon_\theta(\sqrt{\bar\alpha}{\bf x}_0 + \sqrt{1-\bar\alpha_t}\epsilon,t)||^2]$
- Algorithm 2, 전체 샘플링 프로세스는 데이터 밀도의 학습된 gradient로 $\theta$ 를 사용하는 Langevin dynamics 과 유사하다. ( $\epsilon_{\theta}$ is a function approximator intended to predict $\epsilon$ from ${\bf x}_t$ )

4. Experiments

[Details are in Appendix B.]

Our neural network architecture follows the backbone of PixelCNN++ [52], which is a U-Net [48] based on a Wide ResNet [72].

We replaced weight normalization [49] with group normalization [66] to make the implementation simpler.

Our 32 × 32 models use four feature map resolutions (32 × 32 to 4 × 4), and our 256 × 256 models use six.

All models have two convolutional residual blocks per resolution level and self-attention blocks at the 16 × 16 resolution between the convolutional blocks [6].

Diffusion time $t$ is specified by adding the Transformer sinusoidal position embedding [60] into each residual block.

정리하다 발견한 디테일한 노션정리: https://sang-yun-lee.notion.site/Denoising-Diffusion-Probabilistic-Models

읽어보면 좋은 논문)

Dall-e 2 : https://arxiv.org/pdf/2102.12092.pdf (OpenAI)

Imagen : https://gweb-research-imagen.appspot.com/paper.pdf (Google Research, Brain)

저작자표시 비영리 변경금지

'초 간단 논문리뷰' 카테고리의 다른 글

초 간단 논문리뷰 \| Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding (0)	2022.07.25
초 간단 논문리뷰 \| Graph-based Semi-supervised Learning (0)	2022.05.17
초 간단 논문리뷰 \| Graph Neural Networks란 (0)	2022.04.22
초 간단 논문리뷰 \| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (0)	2022.03.30
Self-supervised Learning 이란 \| CV, NLP, Speech (0)	2022.03.29

초 간단 논문리뷰 | Graph-based Semi-supervised Learning

2022. 5. 17. 17:42

(이 포스팅에서는 논문 전체를 다루지 않고 Graph-based SSL의 기초적인 부분(Chapter4.A까지)만 다룹니다.)

Abstract

Semi-supervised learning (SSL)은 labeld data와 unlabeled data 모두를 통합하는 기능이 있기 때문에 상당한 가치가 있다. Graph-based semi-supervised learning (GSSL)은 다양한 도메인에서 장점이 있다는 것이 입증되었다. 논문의 주요한 기여는 graph regularization과 graph embedding methods를 포함한 GSSL의 새로운 일반화 분류법에 있다.

1. Introduction

Graph-based SSL (GSSL)은 graph 구조상 중요한 manifold를 압축하여 반영하여 자연스럽게 사용될 수 있기때문에 특히 유망하다. graph는 구조적으로 nodes는 모든 samples를 표현할 수 있고 weighted edges는 nodes의 쌍 사이의 유사성을 반영한다.

[GSSL의 주요 과정]

Step 1. Graph construction.

A similarity graph는 labeled, unlabeled samples를 포함한 데이터로 구성되어있다.

original samples사이의 관계를 잘 표현하는 것이 중요한 과제이다.

Step 2. Label inference.

이전 step에서 만들어진 graph 구조 정보를 통합하여 labeled samples로 부터 unlabeled samples로 label 정보를 전파하는 것을 수행한다.

(this paper) Label inference methods are then categorized into two main groups: graph regularization methods and graph embedding methods. [논문참고]

2. Background and Definition

A Comprehensive Survey on Graph Neural Networks

[논문참고]

3. Graph construction

Goal: $G = (V,E,W)$ , V: nodes set, E : edges, W: associate weights(=인접행렬)
Assumption
1. undirected graph임 → W: symmetric
2. edge weights는 non-negative → $W_{ij} \ge 0, \forall i \ne j$
3. no self-loop
Edges: similarity weights computed from featureshttps://junhyungbyun.github.io/Graph-based-Semi-Supervised-Learning/
: x3는 similarity weights(=edge)가 강한 x1과 같은 label을 가질 가능성이 높다

https://junhyungbyun.github.io/Graph-based-Semi-Supervised-Learning/

A. Unsupervised methods

기존의 보편적인 전략, graph 구성시 label 정보 무시

k-nearest-neighbor graph→ 밀도는 고려하지 않음
$W_{ij} = \begin{cases} sim({\bf x}_i, {\bf x}_j) & i\in \mathcal{N}(j) \\ 0 & otherwise \end{cases}$
b-Matching:
- (a) graph sparsification (edge selection) $P\in {0,1}^{n\times n}$ , 1: egde 존재 $\Delta \in \mathcal{R}_{+}^{n\times n}$ : symmetric distance matirx
- $\Sigma_j P_{ij} = b, P_{ii}=0, P_{ij}=P_{ji}, \forall 1 \le i, i \le n$
- $min_{P\in{0,1}^{n\times n}} \Sigma_{i,j} P_{ij} \Delta_{ij}$
- (b) edge re-weighting (W구하기)
  - Binary kernel, Gaussian kernel(=RBF Kernel), Locally linear reconstruction
  - → similarity graph construction시 제한을 줌 ⇒ label inference시 더 균형잡힌 전파 가능

B. Supervised methods

lable 정보를 prior knowledge로서 graph 생성시 사용할 수 있다. ⇒ supervised information를 추가하면 inference 과정을 촉진할 수 있다.

C. ETC(논문에는 언급되지 않음)

fully connected graph: weight decays with distance $w_{ij} = exp(\frac{-||x_i - x_j||^2}{\sigma^2})$

$\epsilon$ -radius graph (k-nn과 유사하지만 개수를 제한하지 않음, 단 주변 노드가 없을 수 있음)

4. Graph regularization

general regularization framework
$\mathcal{L}(f) = \Sigma_{(x_i,y_i)\in\mathcal{D}}\underbrace{\mathcal{L}_s(f(x_i),y_i)}{supervised\ loss} + \mu \ \Sigma_{x\in\mathcal{D}l+\mathcal{D}_u}\underbrace{\mathcal{L}_r(f(x_i))}{regularization\ loss}$

A. label propagation

주어진 label은 fix해야함. labeled nodes의 정보가 edges를 통해 unlabeled nodes로 label을 예측할 때 가이드하는 역할을 한다.

[basic label propagation 알고리즘]

Step 1. All nodes propagate labels for one step $Y\leftarrow TY$

Step 2. Row-normalize Y to maintain the class probability interpretation.

Step 3. Clamp the labeled data. Repeat from step 2 until Y converges.

1) Gaussian random fields

이차 손실함수의 경우 infinity weight를 가짐 → closed-form으로 만드는 harmonic function을 사용

** Harmonic function: unlabeled data의 경우 descrete labels({0,1})를 continuous values로 표현할 수 있다.

$E(f) = \mathcal{L}_r = \frac{1}{2}\Sigma_{i,j} W_{ij}(f(x_i)-f(x_j))^2$

$f(x_i) = f_l(x_i) \equiv y_i$ , $f$ 를 최소화

unlabeled nodes는 제약조건 $Lf = 0$ 을 만족함
unlabeled node는 neighbor nodes의 평균
$f(x_j) = \frac{1}{d_j}\Sigma_{i\sim j} W_{ij}f(x_i)$ , $j = l+1, \dots, l+u$
반복적인 방식으로 해석: $f^{(t+1)} \leftarrow P\cdot f^{(t)}$ , $P = D^{-1}W$

$L$ : Laplacian matrix

Diagonal degree matrix $D: D_{ii} = \Sigma_{j=1}^nW_{ij}$ ** degree: node에 연결된 edges 개수
Graph Laplacian matrix
- $\Delta = D-W$

→ 가중치 행렬을 4 blocks로 분할하면 추론가능

$W = \begin{bmatrix} W_{ll} & W_{lu} \ W_{ul} & W_{uu} \end{bmatrix}$

$f_u = (D_{uu}-W_{uu})^{-1}W_{ul}f_l = (I-P_{uu})^{-1}P_{ul}f_l$

→ 단점)

(a) label을 fix하고 진행하여 noise에 약함 (유연하지 않음)

(b) 새로운 test points에 대해서 직접적으로 적용할 수 없음 (Figure 2. Transductive)

2) Local and global consistency(LGC)

$\mathcal{L}(f) = \frac{1}{2}(\Sigma_{i,j} W_{ij}(\frac{1}{\sqrt{D_{ii}}}f(x_i)- \frac{1}{\sqrt{D_{jj}}}f(x_j))^2)+\mu\ \Sigma_{i=1}^{n_l}(f(x_i)-y_i)^2$

→ GRF와 다른점:

(a) labeled node가 seed label과 같지 않을 수 있음 (=유연함, noise가 있을 때 유리)

(b) $\frac{1}{\sqrt{D_{ii}}}$ (degree)로 각 node의 label에 penalty를 줄 수 있음. 상위 nodes의 영향력을 regularization할 수 있고 불규칙한 graph에도 보장함

저작자표시 비영리 변경금지

'초 간단 논문리뷰' 카테고리의 다른 글

초 간단 논문리뷰 \| Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding (0)	2022.07.25
초 간단 논문리뷰 \| Denoising Diffusion Probabilistic Models(DDPMs) (0)	2022.06.15
초 간단 논문리뷰 \| Graph Neural Networks란 (0)	2022.04.22
초 간단 논문리뷰 \| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (0)	2022.03.30
Self-supervised Learning 이란 \| CV, NLP, Speech (0)	2022.03.29

초 간단 논문리뷰 | Graph Neural Networks란

2022. 4. 22. 21:21

Components of Graph Network

A Comprehensive Survey on Graph Neural Networks

paper: https://arxiv.org/pdf/1901.00596.pdf

image classification, natural language understanding 등 기존 tasks의 데이터는 Euclidean space에서 표현했다. 하지만 application 수가 증가하며 생성된 데이터는 non-Euclidean domains이고 이는 object 사이에 복잡한 관계를 갖는 graph로 표현된다.

1. Introduction

graphs는 불규칙적일 수 있다. 다양한 순서가 없는 node를 가질수도, node에 연결된 nieghbors의 숫자도 다를 수 있다. 따라서 graph domain에는 다르게 적용해야한다.

graph 데이터를 다루기위해 기존의 딥러닝 CNNs, RNNs, autoencoder 방법으로부터 새로운 일반화를 통한 응용한 연구가 진행되어왔다. 예를 들어 graph convolution은 2D convolution의 일반화가 될 수 있다. 2D convolution과 유사하지만 graph convolution은 node’s neighborhood 정보의 weight average를 가진다.

** manifolds: 국소적으로는 유클리드 공간과 닮았지만 대역적으로는 위상 공간이다.

2. BackGround & Definition

outlines the background of graph neural networks, lists commonly used notations, and defines graph-related concepts.

A. Background

A brief history of graph neural networks (GNNs)

이전 연구들은 recurrent graph neural networks의 카테고리였다. 고정된 point에 도달할때 까지 neighbor information을 전파하는 방법으로 target node’s representation을 학습한다. 하지만 이 프로세스는 컴퓨팅 비용이 비싸며 이 문제를 극복하기 위한 노력이 증가해왔다.

2009년에는 RNN(message passing)의 아이디어를 상속하면서 구조적으로 non-recursive layer를 합성하여 graph mutual dependency를 해결하는 RecGNNs(Micheli et al.) 방법이 제안되었다.

CV 도메인에서 CNN의 성공으로 graph를 데이터를 병렬적 개발하기 위한 convolution이 재정의되었다. ConvGCNNs(Bruna et al.(2013)) 은 spectral-based 접근법과 spatial-based 접근법으로 주요하게 나눠진다.

Graph neural networks vs. network embedding

network embedding의 목적은 low-dimensional vector로 representation을 생성하고 network topology structure와 node content information을 모두 보존하여 classification, clustering, recommendation등을 쉽게 수행할수 있도록 하는 것이다. matrix factorization, random walks 방법(non-deep learning method)을 사용한다. (예시, support vector machines for classification)

GNNs은 딥러닝 모델이며 end-to-end graph와 연관된 task를 해결하는 것이 목적이다. GNNs는 다양한 task를 위한 neural network model의 그룹이며 다양한 network embedding을 포함하고 있다.

Graph neural networks vs. graph kernel methods

GNNs과 유사하게 graph kernels는 mapping function을 이용하여 graphs or nodes를 vector space로 임베딩할 수 있다. mapping function이 learnable하기보다는 deterministic하다는 점이 다르다.

B. Definition

Definition 1 (Graph):

$G=(V,E)$ , $V$ : nodes, $E$ : set of edges

$v_i \in V, e_{ij} = (v_i, v_j) \in E$

neighborhood of a node $N(v) = {u\in V|(v,u) \in E}$

adjacency matrix $A$ is $n \times n$ matrix

** 인접 행렬: edges의 관계를 표현

$\begin{cases} A_{ij} = 1 ,\ if\ e_{ij} \in E \ A_{ij} = 0,\ if\ e_{ij} \notin E \end{cases}$

node attributes matrix $X \in R^{n\times d}$

node feature attribute $x_v \in R^d$

edge attribute $X^e \in R^{m\times c}$

edge feature attribute $x_{v,u}^e \in R^c$

Definition 2 (Directed Graph):

directed(방향이 있는) graph는 node로부터 다른 node로 향하는 edges가 있는 것이다. adjacecy matrix가 대칭적인 경우에만 undirected graph이다.

Definition 3 (Spatial-Temporal Graph):

$G^{(t)} = (V,E,X^{(t)}), X^{(t)} \in R^{n\times d}$ : node attributes matrix도 고려!

spatial-temporal graph는 node attributes가 시간에 따라 dynamic하게 변화하는 attributed graph이다.

3. Categorization and Frameworks

clarifies the categorization of graph neural networks

A. Taxonomy of Graph Neural Netowrks (GNNs)

Recurrent graph neural networks (RecGNNs): GNN의 개척자라고 볼 수 있다.

RecGNN의 목적은 recurrent neural architectures로 node representation을 학습하는 것이다. graph의 node은 stable equilibrium(=변화하지 않는 안정된 상태)에 도달할 때까지 neighbor와 정보를 교환한다고 가정한다.

Convolutional graph neural networks (ConvGNNs): grid data로 부터 graph data로 convolution 연산을 일반화한다.

주된 아이디어는 feature $x_v$ 와 neighbors’ features $x_u$ (where $u \in N(v)$ )를 조합하여 node $v$ ’s features $x_u$ 를 생성하는 것이다. RecGNNs와 달리 ConvGNNs는 high-level node representation을 추출하기 위해 multiple graph convolutional layers를 쌓는다.

Figure 2a shows a ConvGNN for node classification.

Figure 2b demonstrates a ConvGNN for graph classification.

Graph autoencoders (GAEs): unsupervised learning framework이다.

node/graph를 latent vector space로 인코딩하고 encoded information으로부터 graph data를 재구성한다. network embedding과 graph generative distribution을 학습하는데 유용한다.

Figure 2c presents a GAE for network embedding.

Spatial-temporal graph neural networks (STGNNs): hidden patterns를 학습하는 것이 목적이다.

핵심 아이디어는 spatial dependency와 temporal dependency를 한번에 고려하는 것이다.

Figure 2d illustrates a STGNN for spatial-temporal graph forecasting.

B. Frameworks

mechanisms

Node-level outputs relate to node regression and node classification tasks. With a multi-perceptron or a softmax layer as the output layer, GNNs are able to perform node-level tasks in an end-to-end manner.
Edge-level outputs relate to the edge classification and link prediction tasks. two nodes’ hidden representations로 the label/connection strength of an edge 예측할 수 있다.
Graph-level outputs relate to the graph classification task. graph level에서 compact representation를 얻을 수 있다.

Training Frameworks

Many GNNs (e.g., ConvGNNs) can be trained in a (semi-) supervised or purely unsupervised way within an end-to-end learning framework, depending on the learning tasks and label information available at hand.

Semi-supervised learning for node-level classification
Supervised learning for graph-level classification
Unsupervised learning for graph embedding

저작자표시 비영리 변경금지

'초 간단 논문리뷰' 카테고리의 다른 글

초 간단 논문리뷰 \| Denoising Diffusion Probabilistic Models(DDPMs) (0)	2022.06.15
초 간단 논문리뷰 \| Graph-based Semi-supervised Learning (0)	2022.05.17
초 간단 논문리뷰 \| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (0)	2022.03.30
Self-supervised Learning 이란 \| CV, NLP, Speech (0)	2022.03.29
초 간단 논문리뷰 \| Knowledge Distillation: A Survey (0)	2021.08.27

PREV 1 2 3 4 5 6 7 ···22 NEXT

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

매일매일 딥러닝

카테고리 보기

Model Conversion between Tensorflow and Pytorch

1. 직접 weight, bias 변경

2. ONNX를 이용하는 방법

'PYTHON으로 딥러닝하기' 카테고리의 다른 글

초 간단 논문리뷰 | Denoising Diffusion Probabilistic Models(DDPMs)

1. Introduction

2. Background

3. Diffusion models and denoising autoencoders

4. Experiments

'초 간단 논문리뷰' 카테고리의 다른 글

초 간단 논문리뷰 | Graph-based Semi-supervised Learning

Abstract

1. Introduction

2. Background and Definition

3. Graph construction

A. Unsupervised methods

B. Supervised methods

4. Graph regularization

A. label propagation

'초 간단 논문리뷰' 카테고리의 다른 글

초 간단 논문리뷰 | Graph Neural Networks란

Components of Graph Network

A Comprehensive Survey on Graph Neural Networks

1. Introduction

2. BackGround & Definition

A. Background

B. Definition

3. Categorization and Frameworks

A. Taxonomy of Graph Neural Netowrks (GNNs)

B. Frameworks

'초 간단 논문리뷰' 카테고리의 다른 글

+ Recent posts

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역