Abstract: Traditional self-supervised learning requires convolutional neural networks (CNNs) using external pretext tasks (i.e., image- or video-based tasks) to encode high-level semantic visual ...