In this paper, we investigate the estimation performance of a vision-based observer, presented in previous work by one of the authors, when a target object undergoes stochastic motion in three-dimensional space. The configuration space of full 3-D rigid-body motion is known to be the product space SE(3) = R^3 × SO(3). Consequently, the stochastic motion must be described by a stochastic differential equation (SDE) on SE(3). We therefore first formulate the SDE on SE(3) that describes the evolution of the estimation error between the actual motion and the estimate produced by the visual observer. We then analyze the estimation accuracy in the framework of noise-to-state stability (NSS). Since NSS guarantees only qualitative properties, we also use the notion of ultimately exponential boundedness in the mean sense to quantify the estimation accuracy. Finally, we demonstrate the validity of the latter result through simulation.