Many research articles have used difficult-to-interpret black-box Machine Learning (ML) models to classify Alzheimer’s disease (AD) without examining their biological relevance. In this article, an ML workflow based on Shapley values was developed to interpret black-box models. This workflow enables model-agnostic visualization of complex relationships between model features and predictions, as well as the explanation of individual predictions, which is important in clinical practice. To demonstrate this workflow, eXtreme Gradient Boosting (XGBoost) and Random Forest (RF) classifiers were trained for AD classification. All models were trained on either the Alzheimer’s Disease Neuroimaging Initiative (ADNI) or the Australian Imaging, Biomarker and Lifestyle flagship study of Ageing (AIBL) dataset and were validated on independent test datasets of both cohorts. The results showed improved performance for black-box models in comparison to simple Classification and Regression Trees (CARTs). For the classification of Mild Cognitive Impairment (MCI) conversion, the best model trained on the ADNI training dataset achieved a classification accuracy of 71.03 % on the ADNI test dataset and 67.65 % on the entire AIBL dataset. This RF used a logical long-term memory test, the count of Apolipoprotein E ε4 (ApoEε4) alleles, and the volume of the left hippocampus as its most important features.
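To make the idea of explaining an individual prediction with Shapley values concrete, the following is a minimal sketch and not the workflow used in this article. It computes exact Shapley values for one prediction of an arbitrary model by enumerating all feature coalitions. Everything in it is an assumption made for illustration: the function `shapley_values`, the scorer `toy_model`, the three feature names, their coefficients, and the choice to replace features outside a coalition with a fixed baseline value. The toy scorer is hypothetical and is unrelated to the trained RF or XGBoost classifiers.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction of a black-box `predict`.

    Assumption: features outside a coalition are replaced by their
    `baseline` value, which is one simple choice of value function.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        # Use the subject's value for features in the coalition, baseline otherwise.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk scorer with made-up coefficients (illustration only)."""
    memory_score, apoe4_count, hippocampus_volume = features
    risk = 0.5 - 0.03 * memory_score + 0.10 * apoe4_count
    if hippocampus_volume < 3.0:
        # Interaction: allele count matters more when the volume is small.
        risk += 0.05 * apoe4_count
    return risk


# Hypothetical subject and reference point (not taken from ADNI or AIBL).
subject = [4.0, 2.0, 2.8]
baseline = [10.0, 0.0, 3.5]

phi = shapley_values(toy_model, subject, baseline)
for name, contribution in zip(
    ["memory_score", "apoe4_count", "hippocampus_volume"], phi
):
    print(f"{name}: {contribution:+.3f}")
```

By construction, the contributions sum to the difference between the prediction for the subject and the prediction for the baseline, which is what makes them usable as an additive explanation of a single prediction. Exact enumeration is only feasible for a handful of features; for models with many features, approximation methods are needed instead.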