Attention mechanisms have been studied extensively in recent years as a way to improve a model's learning ability. In this paper, we propose a new attention mechanism consisting of a temporal attention module and a spatial attention module. The two modules are integrated into the 3D ResNet-18 network to direct attention to the critical features of the input volume. Specifically, the temporal attention module exploits the motion relationships between frames, while the spatial attention module captures the spatial relationships between features. Experimental results show that the proposed method achieves competitive performance compared with recently published deep and heavy networks.
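To make the division of labor between the two modules concrete, the following is a minimal NumPy sketch of temporal and spatial gating applied to a feature volume of shape (C, T, H, W). It is an illustration under stated assumptions, not the modules proposed in this paper. The function names, the use of frame differences as a motion cue, and the softmax and sigmoid gates are all hypothetical choices. A real implementation would likely compute the scores with learned layers inside the 3D ResNet-18 blocks.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def temporal_attention(volume):
    """Reweight the frames of a (C, T, H, W) volume.

    Assumption: motion is approximated by the mean absolute difference
    between consecutive frames. This is a stand-in for a learned module.
    """
    diffs = np.abs(np.diff(volume, axis=1))          # (C, T-1, H, W)
    motion = diffs.mean(axis=(0, 2, 3))              # (T-1,)
    scores = np.concatenate(([0.0], motion))         # (T,), first frame has no predecessor
    weights = softmax(scores)                        # one weight per frame, sums to 1
    return volume * weights[None, :, None, None], weights


def spatial_attention(volume):
    """Reweight the spatial positions of a (C, T, H, W) volume.

    Assumption: each position is scored by its mean activation over
    channels and frames, then squashed to (0, 1) with a sigmoid.
    """
    scores = volume.mean(axis=(0, 1))                # (H, W)
    weights = sigmoid(scores)
    return volume * weights[None, None, :, :], weights


def attend(volume):
    # Apply temporal gating first, then spatial gating (order is an assumption).
    out, t_w = temporal_attention(volume)
    out, s_w = spatial_attention(out)
    return out, t_w, s_w
```

In this sketch the temporal weights favor frames that differ most from their predecessor, and the spatial weights form a single (H, W) map shared across channels and frames. Both gates leave the shape of the volume unchanged, so they could in principle be placed after any residual block.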