What I have is a preliminary algorithm for extracting the root of a given Hebrew lexical word. There is nothing about meaning yet! The mechanism seems to work more often than not.
Here is the sequence in English. What do you think? The prefixes and suffixes are explained in more detail here.
Common single prefixes 'א:ה:ב:ל:ת:מ:י:ו:כ'
Common single suffixes 'ה:ך:י:ו:ת'
Common double prefixes 'וא:וה:וב:ול:ות:ומ:וי:וכ'
Common double plural suffixes 'ים:ות'
Common double possessive suffixes 'כם:נו'
Common triple suffixes 'ינו:יכם'
Given these lists and a table of roots (you can't do this without some internal memory), the steps are:
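To make the lists above concrete, here is a minimal sketch of how they might be held in memory. The names `AFFIXES`, `ROOTS`, `starts_with` and `ends_with` are my own illustration, and the four sample roots are only a stand-in for the real root table, which is not shown in this post.

```python
def parse_affixes(spec):
    """Split one of the colon-separated affix strings into a set."""
    return set(spec.split(':'))


# Affix strings copied from the lists above; the dictionary keys are illustrative.
AFFIXES = {
    'single_prefix': parse_affixes('א:ה:ב:ל:ת:מ:י:ו:כ'),
    'single_suffix': parse_affixes('ה:ך:י:ו:ת'),
    'double_prefix': parse_affixes('וא:וה:וב:ול:ות:ומ:וי:וכ'),
    'plural_suffix': parse_affixes('ים:ות'),
    'possessive_suffix': parse_affixes('כם:נו'),
    'triple_suffix': parse_affixes('ינו:יכם'),
}

# Hypothetical stand-in for the root table (a handful of sample roots only).
ROOTS = {'שמר', 'כתב', 'מלך', 'דבר'}


def starts_with(word, kind):
    """True if the word begins with one of the affixes of the given kind."""
    n = len(next(iter(AFFIXES[kind])))  # all affixes of one kind share a length
    return word[:n] in AFFIXES[kind]


def ends_with(word, kind):
    """True if the word ends with one of the affixes of the given kind."""
    n = len(next(iter(AFFIXES[kind])))
    return word[-n:] in AFFIXES[kind]
```

Sets give constant-time membership tests, which matters little for lists this short but keeps the later checks easy to read.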
Step 1 - strip obvious plurals and possessives
- when the length of the lexical word is greater than 5 and the last three characters are a common triple suffix then strip the last three characters
- else take the whole word
- when the length of what remains is greater than 4 and the last two characters are a common double possessive suffix then strip the last two characters
- else take what remains
- when the length of what remains is greater than 5 and the last two characters are a common double suffix and the first two characters are NOT a common double prefix then strip the last two characters
- when the length of what remains is greater than 4 and the last two characters are a common double suffix and the first character is NOT a common single prefix then strip the last two characters
- when the length of what remains is greater than 5 and the last character is a common single suffix and the first two characters are NOT a common double prefix then strip the last character
- when the length of what remains is greater than 4 and the last character is a common single suffix and the first character is NOT a common single prefix then strip the last character
- else take what remains
- If you find a match in the root table for what remains, you are done
- if what remains is still plural and its length is 4 see if the singular form is in the root table and matches the first two characters
- if what remains ends with a common single suffix see if there is a three character root that matches the rest of the word
- if what remains begins with a common single prefix see if there is a root that matches the rest of the word
- if what remains begins with a common double prefix see if there is a root that matches the rest of the word
- if what remains is longer than 4 characters and begins with a common single prefix and ends with a common single suffix see if there is a root that matches the rest of the word
- if what remains is longer than 5 characters and begins with a common double prefix and ends with a common single suffix see if there is a root that matches the rest of the word
- if what remains is longer than 3 characters and begins with a common single prefix see if there is a root longer than 2 characters that is contained in the rest of the word
- if what remains is longer than 4 characters and begins with a common double prefix see if there is a root longer than 2 characters that is contained in the rest of the word
- if what remains begins with a common single prefix and ends with a common single suffix see if there is a root matching the rest of the word
- if what remains is plural see if the singular form matches the first remaining characters
- if what remains contains a mater vav, see if you can find a matching root without the mater
- if what remains contains a mater yod, see if you can find a matching root without the mater
- if what remains begins with a common single prefix see if there is a root that is contained in the rest of the word
- See if there is a root that is contained in the word
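The sequence above can be sketched in Python roughly as follows. This is a partial sketch under several assumptions of mine, not a definitive implementation: the function names are invented, `ROOTS` is a tiny hypothetical stand-in for the root table, "common double suffix" in Step 1 is read as the plural suffixes, the four suffix rules in Step 1 are treated as one chain where the first match wins, any vav or yod after the first letter is treated as a possible mater, and only a subset of the lookup checks is implemented (the plural-specific ones are left out).

```python
# Affix lists copied from the post; all names below are illustrative.
SINGLE_PREFIXES = set('א:ה:ב:ל:ת:מ:י:ו:כ'.split(':'))
SINGLE_SUFFIXES = set('ה:ך:י:ו:ת'.split(':'))
DOUBLE_PREFIXES = set('וא:וה:וב:ול:ות:ומ:וי:וכ'.split(':'))
PLURAL_SUFFIXES = set('ים:ות'.split(':'))
POSSESSIVE_SUFFIXES = set('כם:נו'.split(':'))
TRIPLE_SUFFIXES = set('ינו:יכם'.split(':'))

# Hypothetical stand-in for the root table, which the post does not show.
ROOTS = {'שמר', 'כתב', 'מלך', 'דבר'}


def strip_step1(word):
    """Step 1: strip obvious plural and possessive endings."""
    rest = word
    if len(rest) > 5 and rest[-3:] in TRIPLE_SUFFIXES:
        rest = rest[:-3]
    if len(rest) > 4 and rest[-2:] in POSSESSIVE_SUFFIXES:
        rest = rest[:-2]
    # Assumption: the next four rules form one chain; the first match wins.
    no_double_prefix = rest[:2] not in DOUBLE_PREFIXES
    no_single_prefix = rest[:1] not in SINGLE_PREFIXES
    if len(rest) > 5 and rest[-2:] in PLURAL_SUFFIXES and no_double_prefix:
        rest = rest[:-2]
    elif len(rest) > 4 and rest[-2:] in PLURAL_SUFFIXES and no_single_prefix:
        rest = rest[:-2]
    elif len(rest) > 5 and rest[-1:] in SINGLE_SUFFIXES and no_double_prefix:
        rest = rest[:-1]
    elif len(rest) > 4 and rest[-1:] in SINGLE_SUFFIXES and no_single_prefix:
        rest = rest[:-1]
    return rest


def candidates(rest):
    """Yield candidate roots, roughly in the order of the checks listed."""
    yield rest  # exact match
    has_prefix1 = rest[:1] in SINGLE_PREFIXES
    has_prefix2 = rest[:2] in DOUBLE_PREFIXES
    has_suffix1 = rest[-1:] in SINGLE_SUFFIXES
    if has_suffix1 and len(rest) == 4:
        yield rest[:-1]  # single suffix leaving three characters
    if has_prefix1:
        yield rest[1:]
    if has_prefix2:
        yield rest[2:]
    if len(rest) > 4 and has_prefix1 and has_suffix1:
        yield rest[1:-1]
    if len(rest) > 5 and has_prefix2 and has_suffix1:
        yield rest[2:-1]
    for mater in 'וי':  # try dropping a mater vav, then a mater yod
        if mater in rest[1:]:
            yield rest[0] + rest[1:].replace(mater, '')


def find_root(rest, roots=ROOTS):
    """Return the first candidate found in the root table, else None."""
    for cand in candidates(rest):
        if cand in roots:
            return cand
    # Last resort: any root contained in the word, longest first.
    for root in sorted(roots, key=lambda r: (-len(r), r)):
        if root in rest:
            return root
    return None


def extract_root(word, roots=ROOTS):
    return find_root(strip_step1(word), roots)


print(extract_root('דברים'))   # plural ending stripped in Step 1, prints דבר
print(extract_root('שומרים'))  # plural stripped, then mater vav dropped, prints שמר
```

Keeping the candidate generation separate from the table lookup makes it easy to reorder the checks or add the missing ones while testing against real words.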