# 塞进裤子ヾ(≧O≦)〃嗷~


## The Attention Mechanism

To incorporate attention, we consider not only the final hidden state of the encoder, but also the hidden states for each of the intermediate steps.

These encoder hidden states are, somewhat uninformatively, called values (or in some situations, keys).

Attention also depends on the previous hidden state ($s_{i-1}$) of the decoder, called the *query*.

Attention is represented by a vector with the same number of dimensions as the values it attends over. This is called the attention vector, the attention weights, or sometimes the alignment.
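A minimal sketch of how these pieces fit together, assuming simple dot-product scoring (other scoring functions exist); all names and dimensions here are illustrative:

```python
import numpy as np

# Hypothetical setup: 3 input words, hidden states of dimension 4.
hidden_dim = 4
num_steps = 3

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(num_steps, hidden_dim))  # the "values"
query = rng.normal(size=(hidden_dim,))                     # previous decoder state s_{i-1}

# Score each encoder state against the query, then softmax:
scores = encoder_states @ query                    # one score per input word
weights = np.exp(scores) / np.exp(scores).sum()    # the attention weights (alignment)

# The context vector is the weight-averaged sum of the values:
context = weights @ encoder_states
```

Note that `weights` has one entry per encoder step, i.e. the same dimensionality as the set of values it attends over, and the entries sum to 1.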

Figure: Attention in action at time step t=0 of the decoder. The predicted output is “for” and the attention block takes into account the hidden states of the encoder $\phi_w$ for all input words.

The attention weights are typically floating-point values between 0 and 1. This is called soft attention. In contrast, it is possible to learn a binary 0/1 vector for attention. This is called hard attention.
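The contrast can be sketched like this; the argmax used for the hard case is just one illustrative choice (in practice hard attention is usually sampled and trained with techniques like REINFORCE, since the 0/1 selection is not differentiable):

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0])  # hypothetical alignment scores

# Soft attention: softmax gives floating-point weights in (0, 1).
soft = np.exp(scores) / np.exp(scores).sum()

# Hard attention: a binary 0/1 vector selecting a single position.
hard = np.zeros_like(scores)
hard[scores.argmax()] = 1.0
```

Soft attention lets gradients flow to every encoder state, which is why it is the default in encoder-decoder models.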
