
Medical Texts Initial Processing for Classification Problem

Author: Manana Khachidze
Co-authors: Magda Tsintsadze, Maia Archuadze, Papuna Qarchava
Keywords: text processing
Abstract:

Classification is one of the important directions in information retrieval, and textual information plays a significant role in medical science. A text classification process has to include initial processing of the text: filtering out "stop words", stemming and lemmatization, followed by word frequency calculation. Stemming and lemmatization are usually performed with the well-known algorithms of Lovins and Porter, which are less effective for the Georgian language because of its complex structure. An algorithm that determines the "root" of a word is proposed as suitable for stemming and lemmatization. The algorithm relies on a word database. Because of the peculiarities of the medical field, the database needs to be updated further with appropriate medical terminology. In the course of the research, the database of Georgian words was extended with additional medical terms corresponding to ICD-10.
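
The preprocessing steps named above (stop-word filtering, reduction of each word to a root, word frequency calculation) can be illustrated with a minimal Python sketch. The abstract does not say how the root is found in the database, so the longest-prefix lookup used here is only an assumption, not the authors' algorithm. The names `STOP_WORDS`, `ROOT_DATABASE`, `find_root` and `preprocess` are hypothetical, and English tokens stand in for Georgian words.

```python
from collections import Counter

# Hypothetical stop-word list and root database (illustrative only;
# the real database would hold Georgian roots plus ICD-10 medical terms).
STOP_WORDS = {"the", "of", "and", "in", "a"}
ROOT_DATABASE = {"infect", "treat", "patient", "diagnos"}


def find_root(word, roots):
    """Return the longest root in `roots` that is a prefix of `word`.

    Assumption: a longest-prefix match stands in for the database lookup.
    Words with no matching root are returned unchanged.
    """
    for end in range(len(word), 0, -1):
        if word[:end] in roots:
            return word[:end]
    return word


def preprocess(text, stop_words=STOP_WORDS, roots=ROOT_DATABASE):
    """Tokenize, drop stop words, map words to roots, and count roots."""
    # Split on whitespace, strip surrounding punctuation, lowercase.
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    # Discard empty tokens and stop words.
    tokens = [t for t in tokens if t and t not in stop_words]
    # Count how often each root occurs.
    return Counter(find_root(t, roots) for t in tokens)


counts = preprocess(
    "The patient treated the infection; treatment of infected patients."
)
```

The resulting counter can serve as a feature vector for a classifier. Adding domain terms to `ROOT_DATABASE` changes which word forms are merged, which is the effect the database update with medical terminology is meant to have.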


