My wife and I just conducted a thought experiment: if two bodies of text share words, how could one begin to measure their relationship?
We used two three-line poems:
Little Jack Horner
Sat in a corner
Eating his Christmas Pie
and
Little Miss Muffet
Sat on a tuffet
Eating her Curds and Whey
On inspection, these share 3 words in 3 lines, not 3 words in 6 lines. As a result, I think I must modify my match algorithm to divide by the greater of the two verse counts rather than by their sum. I think I will also distinguish samech from sin by using w for sin. So previously recorded scores will change, and can be read as x matches per verse.
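To make the change of divisor concrete, here is a minimal sketch in Python. It is not my actual match algorithm, only an illustration of the arithmetic. The names `shared_words`, `match_score`, `old_score` and `IGNORED_WORDS` are made up for this example, and it assumes that the article "a" is not counted as a match, since that is the only way the two rhymes share exactly 3 words.

```python
# Hypothetical sketch: shared words per verse for two short texts.
# Assumption: the article "a" is ignored, so the rhymes share 3 words.
IGNORED_WORDS = frozenset({"a"})

HORNER = "Little Jack Horner\nSat in a corner\nEating his Christmas Pie"
MUFFET = "Little Miss Muffet\nSat on a tuffet\nEating her Curds and Whey"


def words(text):
    """Return the set of distinct lowercase words, minus ignored words."""
    return {w.lower() for w in text.split()} - IGNORED_WORDS


def verse_count(text):
    """Count non-empty lines; each line is treated as one verse."""
    return sum(1 for line in text.splitlines() if line.strip())


def shared_words(text_a, text_b):
    """Words that occur in both texts."""
    return words(text_a) & words(text_b)


def match_score(text_a, text_b):
    """New score: shared words divided by the greater verse count."""
    greater = max(verse_count(text_a), verse_count(text_b))
    return len(shared_words(text_a, text_b)) / greater


def old_score(text_a, text_b):
    """Previous score: shared words divided by the sum of verse counts."""
    total = verse_count(text_a) + verse_count(text_b)
    return len(shared_words(text_a, text_b)) / total


print(sorted(shared_words(HORNER, MUFFET)))  # ['eating', 'little', 'sat']
print(match_score(HORNER, MUFFET))           # 1.0
print(old_score(HORNER, MUFFET))             # 0.5
```

With the greater verse count as divisor, the two rhymes score 1.0, that is, one match per verse, where dividing by the sum gave 0.5.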
Tuesday, December 25, 2007