This paper addresses human action retrieval, which aims to retrieve video segments from a video database based on an action of interest specified on-the-fly by a user. We focus on capturing the temporal information of actions and propose using Dynamic Time Warping (DTW) to measure the temporal distortion and difference between a pair of actions. The temporal motion saliency of the query video is then used to re-rank the retrieval results. We evaluate our method on the Breakfast dataset and show that it is more effective than baselines that do not consider the temporal order of actions.
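
To make the role of DTW concrete, the following is a minimal sketch of how a DTW cost between two actions might be computed and used to rank database segments. It assumes each action is a sequence of per-frame feature vectors and uses Euclidean distance as the local cost; both are illustrative assumptions, as are the names `dtw_distance` and `rank_by_dtw`. It is not the exact formulation used in this work, and the saliency-based re-ranking step is not shown.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    Assumption: the local cost is the Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def rank_by_dtw(query, database):
    """Return database indices sorted by increasing DTW cost to the query."""
    return sorted(range(len(database)),
                  key=lambda k: dtw_distance(query, database[k]))


# Toy 1-D "features": one candidate reverses the query's temporal order,
# the other is the query with a single repeated frame.
query = [(0.0,), (1.0,), (2.0,)]
database = [
    [(2.0,), (1.0,), (0.0,)],
    [(0.0,), (0.0,), (1.0,), (2.0,)],
]
ranking = rank_by_dtw(query, database)
```

In this toy case the time-stretched candidate is ranked ahead of the reversed one, although both contain the same set of frame values. An order-agnostic comparison could not separate them, which is the motivation for a temporally aware measure.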