Despite the superior performance that deep neural networks have demonstrated across thousands of applications in recent years, addressing the over-sensitivity of these models to noise and to intentionally crafted small perturbations remains an active area of research. In the computer vision domain, perturbations can be applied directly to the input images. The task is considerably harder in the natural language processing domain due to the discrete nature of natural languages. Considerable effort has been devoted to addressing this problem in high-resource languages such as English. However, there is still an apparent lack of such studies for the Arabic language, and with this work we aim to conduct the first such study. We start by training seven different models on a sentiment analysis task. Then, we propose a method to attack these models by means of worst-case synonym replacement, where the synonyms are selected automatically via the gradients of the input representations. After demonstrating the effectiveness of the proposed adversarial attack, we design frameworks that enable the development of models robust to such attacks. Three frameworks are proposed in this work, and a thorough comparison of their performance is presented. The three scenarios differ in whether the models are trained on adversarial samples only or on clean samples alongside the adversarial ones, and whether weight perturbation is included during training.
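As a rough illustration of the gradient-guided synonym selection described above, the following minimal PyTorch sketch scores each candidate synonym by a first-order estimate of the loss increase its substitution would cause, and keeps the worst (highest-loss) candidate per token. All names here (`model`, `embedding`, `synonyms`, `loss_fn`) are hypothetical placeholders, and this is only one plausible realization of a worst-case synonym replacement attack under these assumptions, not necessarily the exact procedure used in the paper.

```python
import torch

def worst_synonym_attack(model, embedding, token_ids, label, synonyms, loss_fn):
    """Replace each token with the synonym predicted to raise the loss most.

    Assumptions (all illustrative): `model` maps a batch of embedded inputs
    to logits, `embedding` is the input embedding layer, and `synonyms[t]`
    lists candidate synonym ids for token id `t`.
    """
    # One backward pass gives the gradient of the loss w.r.t. each
    # token's input embedding.
    embeds = embedding(token_ids).detach().requires_grad_(True)
    loss = loss_fn(model(embeds.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    grads = embeds.grad  # shape: (seq_len, embed_dim)

    adv_ids = token_ids.clone()
    for i, tok in enumerate(token_ids.tolist()):
        candidates = synonyms.get(tok, [])
        if not candidates:
            continue
        # First-order estimate of the loss change for each candidate:
        # delta_loss ~ grad_i . (e(candidate) - e(original))
        cand_embeds = embedding(torch.tensor(candidates))
        scores = (cand_embeds - embeds[i].detach()) @ grads[i]
        best = int(scores.argmax())
        if scores[best] > 0:  # swap only if the loss is predicted to rise
            adv_ids[i] = candidates[best]
    return adv_ids
```

Scoring candidates by a dot product with the embedding gradient requires only a single backward pass instead of one forward pass per candidate, which is the usual motivation for gradient-guided synonym selection.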