
Neural Network Basics (2)

MLP

import torch
import torch.nn as nn
import torch.nn.functional as F

seed = 1337
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

class MLP(nn.Module):
    def __init__(self, inputDim, hiddenDim, outputDim):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(in_features=inputDim, out_features=hiddenDim)
        self.fc2 = nn.Linear(in_features=hiddenDim, out_features=outputDim)
        # Make sure the number of outputs of one layer equals the number of inputs of the next.

    def forward(self, x_in, apply_softmax=False):
        """
        :param x_in (torch.Tensor): an input data tensor. x_in.shape should be (batch, input_dim)
        :param apply_softmax (bool): a flag for the softmax activation;
            should be False if used with the cross-entropy losses
        :return: the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate = F.relu(self.fc1(x_in))
        output = self.fc2(intermediate)
        # Add dropout: drop activations with probability 0.5
        # output = self.fc2(F.dropout(intermediate, p=0.5))

        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output

batch_size = 2 # number of samples input at once
input_dim = 3
hidden_dim = 100
output_dim = 4

# Initialize model
mlp = MLP(input_dim, hidden_dim, output_dim)
print(mlp)
-----
MLP(
  (fc1): Linear(in_features=3, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=4, bias=True)
)
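
A quick forward pass on random inputs confirms the shapes (sizes follow the dimensions defined above):

x_input = torch.rand(batch_size, input_dim)    # [2, 3]
y_output = mlp(x_input, apply_softmax=False)   # [2, 4], raw scores (logits)
y_prob = mlp(x_input, apply_softmax=True)      # [2, 4], each row sums to 1
print(y_output.size())                         # torch.Size([2, 4])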

Hands-on project

Surname classification with an MLP: given a name, predict its nationality.

https://github.com/drawStar/Machine-Learning/tree/master/pytorch/surnameClassify

CNN

dilation

Dilated convolutions are also known as atrous convolutions.

It introduces a new parameter to the convolutional layer called the dilation rate, which defines the spacing between the values the kernel reads. In other words, compared with a standard convolution, a dilated convolution has one extra hyperparameter, the dilation rate: the number of gaps between adjacent kernel points. A normal convolution has a dilation rate of 1.


Re:

https://www.cnblogs.com/marsggbo/p/9737991.html

Advantages:

Dilated convolution enlarges the kernel's receptive field while keeping the number of parameters unchanged, and it can also keep the size of the output feature map unchanged. A 3×3 kernel with a dilation rate of 2 has the same receptive field as a 5×5 kernel, but only 9 parameters.
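
A minimal shape check of this claim (toy sizes chosen for illustration):

# A 3x3 kernel with dilation=2 covers a 5x5 region, so on a 5x5 input
# both layers below produce a single output value per channel.
x = torch.randn(1, 1, 5, 5)                           # [batch, channels, H, W]
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)  # 3*3 weights + 1 bias = 10 parameters
regular = nn.Conv2d(1, 1, kernel_size=5)              # 5*5 weights + 1 bias = 26 parameters
print(dilated(x).size(), regular(x).size())           # both torch.Size([1, 1, 1, 1])
print(sum(p.numel() for p in dilated.parameters()))   # 10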

相关概念

卷积维度

Depending on the nature of the data, convolutions can be one-, two-, or three-dimensional, implemented by the Conv1d, Conv2d, and Conv3d modules respectively.

Suppose a sequence contains 600 words and each word vector has 300 dimensions, so one sequence enters the network as (600, 300). When Conv1d is applied, the convolution effectively runs directly along the sequence: each step convolves a (3, 300) window. Because every row is a word vector, Conv1d(kernel_size=3) amounts to using the network to extract n-gram features with n = 3.

In other words, Conv1d(kernel_size=3) is effectively Conv2d(kernel_size=(3, 300)), provided the input is also reshaped to (600, 300, 1) so that Conv2d can convolve across the rows; here 600 is the sequence length (seqlength) and 300 is the embedding dimension (dim).
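
A rough sketch of this equivalence in PyTorch (toy check; note that PyTorch uses a channel-first layout, so the reshape looks slightly different from the description above):

seq_len, emb_dim = 600, 300
x1d = torch.randn(1, emb_dim, seq_len)           # Conv1d expects [batch, emb_dim, seq_len]
conv1d = nn.Conv1d(in_channels=emb_dim, out_channels=1, kernel_size=3)
print(conv1d(x1d).size())                        # torch.Size([1, 1, 598])

x2d = torch.randn(1, 1, seq_len, emb_dim)        # Conv2d expects [batch, 1, seq_len, emb_dim]
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, emb_dim))
print(conv2d(x2d).size())                        # torch.Size([1, 1, 598, 1])
# Both layers have the same number of weights per kernel: 3 * 300 = 900.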

https://zhuanlan.zhihu.com/p/84888798

Most convolution operations in NLP are one-dimensional.

# Conv1d
torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

in_channels (int) – number of channels of the input signal. For text classification this is the word-embedding dimension (embedding dim), or the size of the vocabulary if the input is one-hot.
out_channels (int) – number of channels produced by the convolution (the number of kernels). Each out_channel requires its own 1-D kernel.
kernel_size (int or tuple) – size of the convolution kernel. The kernel is specified as (k,); its second dimension is fixed by in_channels, so each kernel actually has kernel_size * in_channels weights.
stride (int or tuple, optional) – stride of the convolution; default 1.
padding (int or tuple, optional) – number of zeros padded on each side of the input.
dilation (int or tuple, optional) – spacing between kernel elements.
groups (int, optional) – number of blocked connections from input channels to output channels.
bias (bool, optional) – if bias=True, a learnable bias is added.

Note the required input ordering, [batch_size, embedding_size, sequence_length], as in the example below:

batch_size = 2
one_hot_size = 8  # embedding_size
sequence_length = 5
data = torch.randn(batch_size, one_hot_size, sequence_length)
# Note the ordering of data: [batch_size, embedding_size, sequence_length]
conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=6, kernel_size=3)
intermediate1 = conv1(data)
# Shape is [batch_size, out_channels, length after convolution]
print(data.size())
print(intermediate1.size())
-----
torch.Size([2, 8, 5])  # [batch_size, embedding_size, sequence_length]
torch.Size([2, 6, 3])  # [batch_size, out_channels, length after convolution]
Note:

RNNs want the input with the batch dimension second, whereas CNNs want the batch dimension first.

If your sentences are shorter than the largest filter used, you will have to pad them to the length of the largest filter.
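
A minimal padding sketch under that assumption (hypothetical sizes; F.pad pads the last, i.e. sequence, dimension here):

emb = torch.randn(2, 8, 3)          # [batch, embedding, seq_len=3], shorter than a width-5 filter
min_len = 5                         # width of the largest filter
pad_right = max(0, min_len - emb.size(-1))
emb = F.pad(emb, (0, pad_right))    # zero-pad the sequence dimension on the right
print(emb.size())                   # torch.Size([2, 8, 5])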

channel

channels refers to the feature dimension along each point in the input.

For example, each pixel in an image has three channels, corresponding to its RGB components. A similar notion applies when convolving text: conceptually, if the "pixels" of a text document are words, then the number of channels is the size of the vocabulary.


A common design pattern is not to shrink the number of channels by more than a factor of two from one convolutional layer to the next.
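
A sketch of that pattern (hypothetical channel counts): each layer halves the number of channels rather than collapsing them in one step:

channel_reducer = nn.Sequential(
    nn.Conv1d(in_channels=256, out_channels=128, kernel_size=3),  # 256 -> 128 (factor 2)
    nn.ReLU(),
    nn.Conv1d(in_channels=128, out_channels=64, kernel_size=3),   # 128 -> 64 (factor 2)
    nn.ReLU(),
)
print(channel_reducer(torch.randn(2, 256, 20)).size())  # torch.Size([2, 64, 16])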

kernel_size

In NLP, kernel_size corresponds to n-grams.

1×1 convolution

Network-in-Network (NiN) Connections

This is useful for mapping from feature maps with many channels to shallower feature maps.
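
A minimal sketch of a 1×1 (NiN) convolution used this way (hypothetical channel counts): it mixes channels at every position without touching neighboring positions, so the sequence length is unchanged:

x = torch.randn(2, 64, 10)                                       # [batch, channels=64, seq_len=10]
nin = nn.Conv1d(in_channels=64, out_channels=16, kernel_size=1)  # channel mixing only
print(nin(x).size())                                             # torch.Size([2, 16, 10])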


Weight sharing

For a 100×100-pixel image convolved with a 3×3 kernel, the kernel's parameters are shared across the entire image.
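
A quick illustration (single input/output channel assumed): the convolution's parameter count is independent of the image size, unlike a dense layer over the same pixels:

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
fc = nn.Linear(100 * 100, 100 * 100)               # a dense map over the same 100x100 pixels
print(sum(p.numel() for p in conv.parameters()))   # 10  (3*3 shared weights + 1 bias)
print(sum(p.numel() for p in fc.parameters()))     # 100010000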

Hands-on project

Surname classification with a CNN: given a surname, predict its nationality.

https://github.com/drawStar/Machine-Learning/tree/master/pytorch/surnameClassify

RNN

vanilla/elman RNN

Computing backpropagation in an RNN

https://zhuanlan.zhihu.com/p/28687529

Understanding timestep
"What is the difference between batch_size and timestep in RNN and LSTM?" - Zhihu
https://www.zhihu.com/question/279046805/answer/405409960

(figure: unrolled vanilla RNN recurrence)

The figure above is slightly off: a and x each have their own weight matrix, and the activation g1 in a vanilla RNN is usually tanh.
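
In standard form, with a as the hidden state, x as the input, and one weight matrix for each (bias included for completeness):

$$a_t = g_1\left(W_{aa}\,a_{t-1} + W_{ax}\,x_t + b_a\right), \qquad g_1 = \tanh$$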



class ElmanRNN(nn.Module):
    def __init__(self, input_size, hidden_size, batch_first=False):
        super(ElmanRNN, self).__init__()
        self.rnn_cell = nn.RNNCell(input_size, hidden_size)
        self.batch_first = batch_first
        self.hidden_size = hidden_size

    def _initialize_hidden(self, batch_size):
        return torch.zeros((batch_size, self.hidden_size))

    def forward(self, x_in, initial_hidden=None):
        """
        :param x_in: if batch_first: [batch_size, seq_length, feature_size]
                     else: [seq_length, batch_size, feature_size]
        :param initial_hidden: the initial hidden state; if not provided, zeros are used
        :return: the hidden vector at each time step
                 if batch_first: [batch_size, seq_length, hidden_size]
                 else: [seq_length, batch_size, hidden_size]
        """
        if self.batch_first:  # rearrange to [seq_length, batch_size, feature_size]
            batch_size, seq_size, feat_size = x_in.size()
            x_in = x_in.permute(1, 0, 2)
        else:
            seq_size, batch_size, feat_size = x_in.size()

        hiddens = []  # hidden state at every time step
        if initial_hidden is None:
            initial_hidden = self._initialize_hidden(batch_size)
            initial_hidden = initial_hidden.to(x_in.device)

        hidden_t = initial_hidden  # [batch_size, hidden_size]

        for t in range(seq_size):
            # x_in: [seq_size, batch_size, feature_size]
            hidden_t = self.rnn_cell(x_in[t], hidden_t)  # [batch_size, hidden_size]
            hiddens.append(hidden_t)

        hiddens = torch.stack(hiddens)  # [seq_size, batch_size, hidden_size]
        if self.batch_first:
            hiddens = hiddens.permute(1, 0, 2)

        return hiddens
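
A quick shape check of the class above (hypothetical sizes):

rnn = ElmanRNN(input_size=10, hidden_size=16, batch_first=True)
x = torch.randn(2, 5, 10)    # [batch_size=2, seq_length=5, feature_size=10]
print(rnn(x).size())         # torch.Size([2, 5, 16])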

Two drawbacks of the vanilla/Elman RNN:

① It does not retain information from long sequences well:

The hidden state is updated at every time step regardless of whether the new input is meaningful; there is no mechanism controlling which parts of the hidden state to keep and which to discard.

② Gradient stability: gradients can vanish or explode when backpropagated through long sequences.

GRU

Compared with the LSTM, its advantage is lower computational complexity.

https://zhuanlan.zhihu.com/p/34203833
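
One way to see the difference (illustrative sizes): the GRU has three gates to the LSTM's four, so roughly 3/4 of the recurrent parameters:

gru = nn.GRU(input_size=100, hidden_size=100)
lstm = nn.LSTM(input_size=100, hidden_size=100)
print(sum(p.numel() for p in gru.parameters()))   # 60600 = 3 * (100*100 + 100*100 + 2*100)
print(sum(p.numel() for p in lstm.parameters()))  # 80800 = 4 * (100*100 + 100*100 + 2*100)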

Gradient clipping

A complete training loop usually consists of the forward pass, cost computation, the backward pass, and the parameter update. Before updating the parameters, we clip the gradients when necessary to make sure they do not "explode".

There are many ways to clip gradients. A simple one is to clamp every element of the gradient vector to the range [-N, N]. In plain terms, given a maxValue (say 10): any gradient component greater than 10 is set to 10, any component less than -10 is set to -10, and anything in between is left unchanged.
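
A minimal sketch of this element-wise scheme in PyTorch (model, loss, and optimizer are assumed to be defined elsewhere); torch.nn.utils.clip_grad_value_ implements exactly this clamp-to-[-N, N] rule:

max_value = 10.0                 # the "N" above
loss.backward()                  # compute gradients
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=max_value)
# equivalently, by hand:
# for p in model.parameters():
#     if p.grad is not None:
#         p.grad.data.clamp_(-max_value, max_value)
optimizer.step()                 # update parameters with the clipped gradients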

cell

The difference between the RNN and RNNCell layers is that the former can process an entire sequence, whereas the latter processes only one time step at a time (see the sketch after the list below).

  • LSTM is a recurrent layer
  • LSTMCell is an object (which happens to be a layer too) used by the LSTM layer that contains the calculation logic for one step.
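
A minimal sketch contrasting the two (hypothetical sizes): nn.LSTM consumes the whole sequence in one call, while nn.LSTMCell is stepped through time manually:

seq_len, batch, input_size, hidden_size = 5, 2, 10, 16
x = torch.randn(seq_len, batch, input_size)

lstm = nn.LSTM(input_size, hidden_size)      # whole sequence in one call
out, (h_n, c_n) = lstm(x)                    # out: [5, 2, 16]

cell = nn.LSTMCell(input_size, hidden_size)  # one step per call
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h, c = cell(x[t], (h, c))                # h: [2, 16] at each step
print(out.size(), h.size())                  # torch.Size([5, 2, 16]) torch.Size([2, 16])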

Batch Normalization

BatchNorm applies a transformation to the output of a CNN by scaling the activations to have zero mean and unit variance

# Using a Conv1d layer with batch normalization
# ...
self.conv1 = nn.Conv1d(in_channels=1, out_channels=10, kernel_size=5, stride=1)
self.conv1_bn = nn.BatchNorm1d(num_features=10)
# ...

def forward(self, x):
    # ...
    x = F.relu(self.conv1(x))
    x = self.conv1_bn(x)
    # ...

BRNN

