Several peer-to-peer (P2P) video streaming systems have been deployed successfully and provide acceptable QoS. Research on these systems continues to improve the quality experienced by their users. Because received video quality depends largely on parent selection, an efficient parent selection algorithm can increase the video bitrate received by peers and provide seamless streaming. In this paper, we propose a novel parent selection method based on reinforcement learning. In the proposed system model, a newly joined peer first explores the peers in the system and then uses this information in its subsequent parent selection actions. We implemented our model on a CoolStreaming-like P2P video streaming system in ns-3. Our results indicate that parents selected with the reinforcement learning approach improve playback continuity compared with the parent selection method used by CoolStreaming. Furthermore, the reinforcement learning approach helps peers find more stable parents under peer churn.
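
The explore-first, then exploit behaviour described above can be illustrated with a minimal sketch. This is not the model evaluated in the paper: the class name `ParentSelector`, the parameters `epsilon` and `alpha`, the running-average value update, and the use of a scalar reward (for example, the fraction of requested chunks a parent delivered on time) are all assumptions made for illustration.

```python
import random


class ParentSelector:
    """Illustrative explore-then-exploit parent selection (hypothetical, not the paper's model)."""

    def __init__(self, candidates, epsilon=0.1, alpha=0.5, rng=None):
        # Estimated value of each candidate parent, initialised to zero.
        self.values = {peer: 0.0 for peer in candidates}
        # Candidates that have received at least one reward update.
        self.tried = set()
        self.epsilon = epsilon  # assumed probability of a random exploratory pick
        self.alpha = alpha      # assumed learning rate for the value update
        self.rng = rng or random.Random()

    def choose(self):
        # Exploration phase: a newly joined peer tries every candidate once.
        untried = [p for p in self.values if p not in self.tried]
        if untried:
            return untried[0]
        # Afterwards, occasionally explore at random so estimates can track churn.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        # Otherwise exploit: pick the candidate with the highest estimated value.
        return max(self.values, key=self.values.get)

    def update(self, peer, reward):
        # Move the estimate toward the observed reward (exponential running average).
        self.tried.add(peer)
        self.values[peer] += self.alpha * (reward - self.values[peer])
```

In such a sketch, a peer would call `choose()` at each scheduling round, request chunks from the returned parent, and pass the observed reward to `update()`. Setting `epsilon` to zero makes the selector purely greedy once every candidate has been tried.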