
Attention

The Attention Mechanism

To incorporate attention, we consider not only the final hidden state of the encoder, but also the hidden states for each of the intermediate steps.

These encoder hidden states are, somewhat uninformatively, called values (or in some situations, keys).

Attention also depends on the previous hidden state ($S_{i-1}$) of the decoder, called the *query*.

Attention is represented by a vector with the same dimension as the number of values it is attending to. This is called the attention vector, the attention weights, or sometimes the alignment.

First, the query ($S_{i-1}$) is compared with the keys ($h_{i}$) to produce the attention weights.

Then, the attention weights are combined with the values to obtain the context vector (also called the glimpse), which is fed to the decoder as input.

A compatibility function is used to update the attention vector at the next time step.
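A minimal sketch of these steps using plain dot-product scoring as the compatibility function (names such as `dot_product_attention`, `encoder_states`, and `decoder_state` are illustrative assumptions, not from a specific library):

```python
import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    """query: [hidden], keys/values: [seq_len, hidden]."""
    # Compatibility function: dot product between the query and each key.
    scores = keys @ query                      # [seq_len]
    # Normalize the scores into attention weights that sum to 1.
    weights = F.softmax(scores, dim=0)         # [seq_len]
    # Context vector (the "glimpse"): weighted sum of the values.
    context = weights @ values                 # [hidden]
    return context, weights

# Toy usage: 5 encoder time steps, hidden size 8.
encoder_states = torch.randn(5, 8)             # values (and keys) h_i
decoder_state = torch.randn(8)                 # query S_{i-1}
context, weights = dot_product_attention(decoder_state, encoder_states, encoder_states)
```

Other common choices for the compatibility function include scaled dot product, bilinear, and additive (MLP-based) scoring.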


Figure: Attention in action at time step t=0 of the decoder. The predicted output is “for” and the attention block takes into account the hidden states of the encoder ϕw for all input words.

The attention mechanism shown in the figure above depends on the encoder states for all the time steps in the input. This is also known as global attention. In contrast, for local attention, you could devise an attention mechanism that depends only on a window of the input around the current time step.
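A rough sketch of how local attention could restrict the same dot-product scores to such a window (the `center` position and `window` size here are assumptions for illustration; published local-attention variants also predict the center and apply Gaussian weighting):

```python
import torch
import torch.nn.functional as F

def local_attention(query, keys, values, center, window=2):
    scores = keys @ query                               # [seq_len]
    mask = torch.full_like(scores, float('-inf'))
    lo, hi = max(0, center - window), min(len(scores), center + window + 1)
    mask[lo:hi] = 0.0                                   # keep only positions inside the window
    weights = F.softmax(scores + mask, dim=0)           # weights are exactly 0 outside the window
    context = weights @ values
    return context, weights

# Toy usage: attend only around source position 3.
keys = values = torch.randn(7, 8)
query = torch.randn(8)
context, weights = local_attention(query, keys, values, center=3)
```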

The attention weights are typically floating-point values between 0 and 1. This is called soft attention. In contrast, it is possible to learn a binary 0/1 vector for attention. This is called hard attention.
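A small sketch contrasting the two on the same scores (soft attention keeps the full softmax distribution; hard attention is approximated here with a simple argmax, whereas in practice it is usually trained with sampling-based estimators such as REINFORCE):

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([0.5, 2.0, -1.0, 0.3])

# Soft attention: floating-point weights in (0, 1) that sum to 1.
soft_weights = F.softmax(scores, dim=0)

# Hard attention: a binary 0/1 vector selecting a single position.
hard_weights = F.one_hot(scores.argmax(), num_classes=len(scores)).float()
```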

