Global warming will cause unprecedented changes to the world. Predicting events such as food insecurity in specific regions of the world is a valuable way to address them with adequate policies. Existing food insecurity prediction models rely on handcrafted features such as population counts, food prices, or rainfall measurements. However, finding useful features is challenging, and data scarcity hinders accuracy. We leverage unsupervised pre-training of neural networks to automatically learn useful features from widely available Landsat-8 satellite images. We train neural feature extractors to predict whether pairs of images come from spatially close or distant regions, under the assumption that close regions should have similar features. We also integrate a temporal dimension into our pre-training to capture the temporal trends of satellite images with improved accuracy. We show that with unsupervised pre-training on a large set of satellite images, neural feature extractors achieve a macro F1 score of 65.4% on the Famine Early Warning Systems Network dataset, a 24% improvement over handcrafted features. We further show that our pre-training method yields better features than supervised learning and previous unsupervised pre-training techniques. We demonstrate the importance of the proposed time-aware pre-training and show that the pre-trained networks can predict food insecurity with limited availability of labeled data.
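To give a concrete picture of the pairwise spatial-proximity objective described above, the PyTorch sketch below shows one way such unsupervised pre-training could be set up. The class names, architecture, band count, and loss are illustrative assumptions rather than the paper's actual model, and the time-aware component is omitted.

```python
# Minimal sketch of spatial-proximity pre-training: a shared encoder learns
# features by classifying whether two image tiles come from close or distant
# regions. Hypothetical names and toy architecture; not the paper's model.
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Shared feature extractor plus a head predicting close (1) vs. distant (0)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(               # toy CNN feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.head = nn.Linear(2 * feat_dim, 1)      # binary close/distant logit

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        za, zb = self.encoder(a), self.encoder(b)   # same weights for both tiles
        return self.head(torch.cat([za, zb], dim=1)).squeeze(1)

def pretraining_step(model, optimizer, tiles_a, tiles_b, is_close):
    """One unsupervised step: only the spatial relation between tiles is used,
    never a food-insecurity label."""
    logits = model(tiles_a, tiles_b)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, is_close.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = PairClassifier()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Dummy batch: 8 pairs of 3-band 64x64 tiles with close/distant labels.
    a, b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
    y = torch.randint(0, 2, (8,))
    print(pretraining_step(model, opt, a, b, y))
```

After pre-training under such an objective, the encoder could be frozen or fine-tuned and its features fed to a downstream food-insecurity classifier; this is only a sketch of the general recipe, not the evaluation protocol used in the paper.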