Federal and local agencies have identified a need to create building databases to help ensure that critical infrastructure and residential buildings are accounted for in disaster preparedness and to aid the decision-making processes in subsequent recovery efforts. To respond effectively, we need to understand the built environment—where people live, work, and the critical infrastructure they rely on. Yet, a major discrepancy exists in the way data about buildings are collected across the United SStates There is no harmonization in what data are recorded by city, county, or state governments, let alone at the national scale. We demonstrate how existing open-source datasets can be spatially integrated and subsequently used as training for machine learning (ML) models to predict building occupancy type, a major component needed for disaster preparedness and decision -making. Multiple ML algorithms are compared. We address strategies to handle significant class imbalance and introduce Bayesian neural networks to handle prediction uncertainty. The 100-year flood in North Carolina is provided as a practical application in disaster preparedness.