The aim of this paper is to combine automatically generated image keywords with radiographs, enabling an enriched multi-modal image representation for body part classification; the proposed method could also be used to incorporate metadata into images for combined learning. Multi-modality is achieved by branding the radiographs with intensity markers that denote the occurrence of textual features. As the number of digital medical scans taken daily has increased rapidly, systems capable of reliably detecting and classifying body parts in radiology images are needed; this is a fundamental step towards computer-aided interpretation, since manual annotation is time-consuming, error-prone and often impractical. Keywords are generated automatically with the Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) Show-and-Tell model; word embeddings are then derived from these keywords with Word2Vec and incorporated into the radiographs by augmentation. Deep learning systems are trained on the augmented radiographs. On the Musculoskeletal Radiographs (MURA) and ImageCLEF 2015 Medical Clustering Task data sets, the proposed approach achieves the best prediction accuracies, 95.78% and 83.90%, respectively.
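The branding idea (writing an embedding vector into the image as intensity markers so a single-input network sees both pixels and textual features) can be sketched as follows. This is a minimal illustration under assumed details, not the paper's exact scheme: the marker placement (rows appended below the image), the marker height, and the rescaling of embedding values to the 8-bit range are all assumptions; `brand_with_embedding` is a hypothetical helper name.

```python
import numpy as np

def brand_with_embedding(image, embedding, marker_rows=4):
    """Illustrative sketch: append a word embedding to a radiograph
    as intensity markers (assumed placement and scaling)."""
    h, w = image.shape
    # Rescale the embedding values into the 8-bit intensity range.
    e = embedding - embedding.min()
    e = (255 * e / (e.max() + 1e-8)).astype(np.uint8)
    # Spread each embedding dimension over a fixed-width column segment,
    # repeating cyclically to fill the full image width.
    cols = np.resize(np.repeat(e, max(1, w // len(e))), w)
    strip = np.tile(cols, (marker_rows, 1))
    # Stack the marker strip below the radiograph.
    return np.vstack([image, strip])

radiograph = np.zeros((64, 64), dtype=np.uint8)    # stand-in radiograph
embedding = np.random.rand(32).astype(np.float32)  # stand-in Word2Vec vector
branded = brand_with_embedding(radiograph, embedding)
print(branded.shape)  # (68, 64): four marker rows appended
```

The branded array has the same width as the input image, so it can be fed to a standard CNN after resizing, with the textual features carried in the appended rows.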