Automatically extracting knowledge with a valid causal ordering from small datasets is a challenge for current state-of-the-art machine learning methods. Extracting this type of knowledge is important but difficult in many engineering fields where data are scarce and hard to collect. This research addresses the problem by presenting a machine learning-based modeling framework that leverages the knowledge contained in the fundamental units of the variables recorded in data samples to develop parsimonious, explainable, graph-based simulation models during the early design stages. The approach is exemplified using an engineering design case study of a spherical body moving in a fluid. For the system of interest, two types of interrelated models are generated by (1) using an automated selection of variables from datasets and (2) combining the automated extraction with supplementary knowledge about the functions and dimensional homogeneity associated with the variables of the system. The effects of design, data, model, and simulation specifications on model fidelity are investigated. The study discusses the interrelationships between fidelity levels, variables, functions, and the available knowledge. The research contributes to the development of a fidelity measurement theory by presenting the premises of a standardized modeling approach for transforming data into measurable levels of fidelity for the produced models. This research shows that structured model building with a focus on model fidelity can support early design reasoning and decision making using, for example, the dimensional analysis conceptual modeling (DACM) framework.
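As a minimal illustration of how the fundamental units of variables can encode usable knowledge, the sketch below checks dimensional homogeneity for a sphere moving in a fluid. The variable set (drag force F, fluid density rho, velocity v, diameter d, dynamic viscosity mu) and the helper names `product_dimension` and `is_dimensionless` are assumptions made for this example; they are not taken from the study or from the DACM framework itself.

```python
# Base-dimension exponents (mass M, length L, time T) for each variable.
# This variable set is an assumption chosen for illustration only.
DIMENSIONS = {
    "F":   (1, 1, -2),   # drag force, kg m s^-2
    "rho": (1, -3, 0),   # fluid density, kg m^-3
    "v":   (0, 1, -1),   # velocity, m s^-1
    "d":   (0, 1, 0),    # sphere diameter, m
    "mu":  (1, -1, -1),  # dynamic viscosity, kg m^-1 s^-1
}


def product_dimension(exponents):
    """Return the (M, L, T) exponents of a product of powers of variables."""
    total = [0, 0, 0]
    for name, power in exponents.items():
        for i, base_exp in enumerate(DIMENSIONS[name]):
            total[i] += power * base_exp
    return tuple(total)


def is_dimensionless(exponents):
    """True when all base-dimension exponents of the product cancel."""
    return product_dimension(exponents) == (0, 0, 0)


# Two candidate groups: rho*v*d/mu and F/(rho*v^2*d^2).
reynolds = {"rho": 1, "v": 1, "d": 1, "mu": -1}
drag_group = {"F": 1, "rho": -1, "v": -2, "d": -2}

print(is_dimensionless(reynolds))
print(is_dimensionless(drag_group))
```

Both groups are dimensionless under these unit assignments, which is the kind of constraint that can narrow the set of admissible relations between variables when data are scarce.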