Assessment of terminological density in scientific publications on physical culture
DOI:
https://doi.org/10.15561/physcult.2025.0203
Keywords:
terminological density, sport and exercise sciences, scientific publications, thematic dictionaries, Medical Subject Headings (MeSH), Web of Science, bibliographic analysis, quantitative evaluation, scientific style
Abstract
Background and Study Aim. Scientific publications in the field of physical culture demonstrate considerable diversity in terminological usage and structural organization. With increasing standards for the quality of academic writing, the need for an objective and quantitative evaluation of terminological density has become more pressing. The aim of this study was to develop and apply a method for automated assessment of terminological density in scientific articles on physical culture using adapted thematic dictionaries.
Material and Methods. The study was based on articles retrieved from the Web of Science (WoS) database. A total of 16 593 bibliographic records related to physical culture and covering the past five years were extracted. Two dictionaries were employed for analysis: the official Medical Subject Headings (MeSH) in XML format and a thematic dictionary constructed from the WoS document corpus. The analysis included full-text PDF articles from 12 scientific journals, of which 6 were categorized as Q3, 1 as Q4, 3 were indexed in DOAJ, and 2 were not indexed. Terminological density was calculated in Python using the pandas library and evaluated on a scale ranging from very low to high.
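The abstract does not give the formula for terminological density, so the following is only a minimal sketch under a stated assumption: density is taken as the share of word tokens in an article that match an entry in the term dictionary. The names `tokenize`, `terminological_density`, and `TERMS`, the sample texts, and the restriction to single-word terms are all illustrative and are not taken from the study.

```python
import re

import pandas as pd

# Hypothetical stand-in for a thematic dictionary (single-word terms only).
TERMS = {"exercise", "muscle", "strength"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def terminological_density(text, terms):
    """Assumed definition: dictionary-term tokens divided by all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


# Invented example articles; real input would be text extracted from PDFs.
articles = pd.DataFrame(
    {
        "journal": ["A", "A", "B"],
        "text": [
            "Exercise improves muscle strength in adults",
            "The weather was pleasant during the whole week",
            "Muscle fatigue limits exercise",
        ],
    }
)

# Density per article, then the mean density per journal.
articles["density"] = articles["text"].apply(
    lambda t: terminological_density(t, TERMS)
)
by_journal = articles.groupby("journal")["density"].mean()
print(by_journal)
```

A relative indicator of this kind makes articles of different lengths comparable. Multi-word dictionary entries, which are common in MeSH, would need phrase matching that this sketch leaves out.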
Results. The assessment covered 12 journals in the field of physical culture. An optimal density level (0.010–0.019) was identified in 2 journals (16.7%), corresponding to a “balanced use of scientific terminology.” Three journals (25.0%) demonstrated low density (<0.01), characterized as “insufficient elaboration of the topic in scientific language.” In 7 journals (58.3%), a higher density (0.020–0.039) was observed, interpreted as either an “attempt to enhance scientific rigor” or an “excessive terminological load.”
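The bands reported above can be illustrated with a short sketch. It assumes half-open intervals (so that a value such as 0.0195 falls in the optimal band) and leaves values of 0.040 or more unlabeled, because the abstract does not state the cutoffs for the "very low" and "high" ends of the scale. The function name `classify_density` and the twelve density values are hypothetical; they are chosen only so that the band counts match the reported 3, 2, and 7 journals.

```python
import pandas as pd

# Band edges taken from the abstract; the half-open treatment is an assumption.
BIN_EDGES = [0.0, 0.010, 0.020, 0.040]
BAND_LABELS = ["low", "optimal", "higher"]


def classify_density(values):
    """Assign each density value to a band; out-of-range values become NaN."""
    return pd.cut(values, bins=BIN_EDGES, labels=BAND_LABELS, right=False)


# Invented journal-level densities, not the study's data.
densities = pd.Series(
    [0.004, 0.007, 0.009,                              # low
     0.012, 0.018,                                     # optimal
     0.021, 0.024, 0.027, 0.030, 0.033, 0.036, 0.039]  # higher
)

bands = classify_density(densities)
# Percentage of journals in each band.
share = (bands.value_counts(normalize=True) * 100).round(1)
print(share)
```

With these invented values the shares come out as 25.0%, 16.7%, and 58.3%, the same proportions the abstract reports for 3, 2, and 7 of the 12 journals.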
Conclusions. The evaluation of terminological density provides an objective measure of the scientific style of publications in the field of physical culture. The differences identified across journals highlight variability in approaches to presenting scientific material. The integration of specialized dictionaries and the application of relative indicators offer a robust basis for ongoing monitoring and optimization of scientific discourse.
Copyright (c) 2025 Sergii Iermakov, Georgiy Korobeynikov, David Curby

This work is licensed under a Creative Commons Attribution 4.0 International License.


