SUMMARIZATION of JEWISH LAW ARTICLES in HEBREW

Yaakov HaCohen-Kerner, Eylon Malin, Itschack Chasson
Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi St., P.O.B. 16031, 91160 Jerusalem, Israel

Abstract

With the explosive growth of online information, there is a growing need for summaries. This paper describes a summarization model for Jewish law articles in Hebrew, based on selecting the most relevant sentences. Our research has a few unique aspects. The first is that we checked the relevance of the traditional methods on texts that differ from magazine articles in both style and length. The second is that our system is probably the first developed for summarizing Hebrew articles. We have developed a hybrid summarization method that achieves better results than each of the traditional summarization methods we tested.

Keywords: Hebrew, Jewish law articles, sentence extraction, text summarization

1 INTRODUCTION

Summaries can be read with limited effort and in a shorter reading time. Therefore, people prefer to read a summary before they decide whether to read the whole text. Humans have an incredible ability to condense huge amounts of information, and they are known as excellent summarizers. However, producing summaries manually is costly in time and money. Therefore, there has been an increase in research and development in the domain of automatic text summarization.

Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks) [9].

Basic and classical articles in text summarization appear in “Advances in automatic text summarization” [9]. A literature survey on information extraction and text summarization is given by Zechner [13]. Various text summarization sources, e.g. books, papers, conferences, workshops, projects, and systems, are available at several web sites.

Many summarization systems have used one of two main approaches: Natural Language Processing (NLP) [1] and sentence extraction [2]. The NLP approach is based on understanding the sentences of the document. NLP has some very sophisticated models, which require large databases and a very long processing time. The sentence extraction approach, on the other hand, is based on gathering the most relevant sentences from the original text. These sentences are presented in the order of their appearance in the original text. Our model belongs to the second approach.

In contrast to many summarization models that were designed and checked mostly for English articles found in magazines and newspapers, our model deals with articles on Jewish law written in Hebrew. These articles discuss different religious questions that have arisen in recent years due to sociological and technological developments. Examples of such questions are:
1. When is a person considered dead?
2. Are animals that are not mentioned in ancient Jewish writings kosher?

The purpose of these articles is not only to give answers to these questions. Each answer must be based both on ancient Jewish writings and on answers given by previous rabbinical authorities over the years. Moreover, arguments contradicting the author’s answer should also be addressed, and the author should give an acceptable explanation that resolves them.

This paper is arranged as follows. Section 2 gives background on text summarization based on sentence extraction. Section 3 describes the model we designed for our article corpus. Section 4 presents the experiments that have been carried out, followed by various results. Section 5 summarizes the research and gives a few proposals for future research.

2 TEXT SUMMARIZATION BASED on SENTENCE EXTRACTION

A study by Kupiec et al.
[4] has shown that 79% of the sentences in the man-made abstracts in their corpus are extremely similar to sentences from the original article. In fact, some of these sentences were extracted verbatim from the original article. Therefore, sentences extracted directly from the original text, without being revised or rephrased, can make quite an appropriate abstract.

Summarization systems that work on the basis of sentence extraction usually rate sentences according to various features. We now present a quick survey of the most frequent methods for rating sentences.

2.1 Proposed and baseline methods

1) Term frequency (TF): This method scores a sentence according to the number of key words that appear in it. First, in order to distinguish between significant key words and other terms, the system passes through the text, scoring each term according to its number of occurrences in the text. Words and terms that have a grammatical role in the language (e.g. of, the, I, am) are excluded from the key word list according to a ready-made stop list. Once the system has a database of the key words and the number of their occurrences, the score of each sentence is calculated from the frequency of the key words that occur in it:
$$TF(s) = \sum_{t \in s} f(t)$$
where $\{t\} \in s$ is the set of terms in a certain sentence $s$, and $f(t)$ refers to the number of occurrences of the term $t$ throughout the whole text [7, 2].

2) Cue words: This method scores a sentence according to the appearance of words and terms that indicate the importance of the sentence, e.g. “the meaning of this is”, “for conclusion”, etc. The more cue words occur in the sentence, the higher the score the sentence is given:
$$CW(s) = \sum_{c \in s} \frac{1}{C}$$
where $\{c\} \in s$ refers to the set of cue terms in a certain sentence $s$, and $C$ refers to the number of terms defined as cue words [2].

3) Sentence length: Very short sentences are unlikely to be included in a summary [12].
This method scores each sentence by dividing the number of its words by the number of words in the longest sentence (in order to normalize the score):
$$SL(s) = \frac{length(s)}{length(s_{max})}$$
where $s$ is the current sentence and $s_{max}$ is the longest sentence [5].

4) Negative score: Some phrases clearly indicate that the sentences in which they occur do not belong to the summary. These phrases are defined as negative phrases, and they give the sentences in which they appear a negative score. Examples of such phrases are “for example” or “it could be that”. The negative score is calculated as follows:
$$NS(s) = \sum_{n \in s} \frac{1}{N}$$
where the sum runs over the negative phrases $n$ occurring in the sentence $s$, and $N$ is the number of negative words [10].

5) Sentence position: This method scores a sentence by its position relative to its paragraph, and according to the relative position of its paragraph in the article. The sentence position score is calculated as follows:
$$SP(s) = val(pos, par)$$
where $pos$ is the position of the sentence in the paragraph, $par$ is the paragraph number in the article, and $val$ is a function that returns the score taking these two parameters into consideration. The return values of $val$ are determined by statistical results [2, 8, 6].

6) Centrality: It is assumed that a sentence with a high probability of being part of the summary summarizes several other sentences. Taking this into consideration, the sentence is scored by the number of sentences it resembles divided by the number of sentences in the article. The centrality score is calculated as follows:
$$C(s_i) = \frac{\sum_{s_j \in \{S - s_i\}} res(s_i, s_j)}{|S| - 1}$$
where $S$ is the set of sentences in the article and $res(s_i, s_j)$ is a function that checks the resemblance between the sentences $s_i$ and $s_j$ according to various parameters [11].

7) Resemblance to title: This method scores a sentence according to its resemblance to the title. Sentences that resemble the title are given a higher score.
The resemblance to title score is calculated as follows:
$$TR(s) = res(s, t)$$
where $res(s, t)$ is a function that checks the resemblance between a sentence $s$ and the title $t$ [2, 11].

8) Term frequency inverse sentence frequency (TF-ISF): Key words that occur in fewer sentences are much more likely to belong to the summary. This method extends the TF method (no. 1) and also takes the ISF property into consideration. The ISF property is calculated as follows:
$$ISF(t) = \frac{1}{|\{s \ni t\}|}$$
where $|\{s \ni t\}|$ is the number of sentences containing the term $t$. This method gives a higher rank to key words appearing in fewer sentences: the fewer sentences a key word occurs in, the higher its rank. Since this feature is a weaker indicator than the term frequency, the key word is multiplied by $\log_2(ISF)$ and not by the ISF score itself. The TF-ISF score is finally calculated as follows:
$$TF\_ISF(s) = \sum_{t \in s} f(t) \cdot \log_2 ISF(t)$$
[11]. We have also developed other methods, which are discussed in Section 3.

2.2 The Hebrew language

Most of the models that were designed in this field were developed for the English language. In this sub-section we point out six properties of the Hebrew language that make the implementation of the model much harder:

1) Tenses: Most verbs in English differ from the base form only by one or more letters added at the end of the word. This makes words much easier to compare. Truncating all characters after the fifth [3] or sixth [12] character of the word is usually sufficient. In Hebrew, however, such a simple process may not be so helpful, since the various forms change the basic form of the word in various ways. In some cases the same base form can have over 7000 (!) forms for different tenses and persons. This feature of the Hebrew language makes it nearly impossible to compare two words without performing a morphological analysis.
For example, the Hebrew word מסוכם (mesukam, meaning summarized, passive) and the Hebrew word סיכמתי (sikamty, meaning I summarized) are both derived from the same root.

2) Final letters: There are 5 letters in Hebrew that are written differently when they appear at the end of a word. This feature of the Hebrew language also makes it harder to compare two words. In the previous example, the Hebrew letter ם in the word מסוכם and the Hebrew letter מ in the word סיכמתי are both derived from the same letter in the Hebrew root סכם. In the first word, however, it is written with another character since it is positioned at the end of the word.

3) Preposition letters: Unlike English, which has unique words dedicated to expressing relations between objects (such as in, at, and, from, since), Hebrew has 8 letters that are concatenated to the beginning of a word, where each letter expresses a different relation. For example, the Hebrew word מהסיכום (mehasikum) means “from the summary”. The Hebrew letter ‘ה’ expresses the determiner ‘the’, and the letter ‘מ’ expresses the preposition ‘from’.

4) Pronoun letters: English has unique words dedicated to ownership (such as her, his), whereas Hebrew has letters concatenated to the end of the word to express such ownership. For example, the Hebrew word ‘מאמר’ (maamar) means ‘article’, whereas the Hebrew word ‘מאמרי’ (maamari) means ‘my article’.

5) Spelling variants: Words that can be written in different ways, in plene spelling and in deficient spelling, are very frequent in Hebrew. For instance, the Hebrew word ‘אוהל’ (o’hel, meaning tent) can also be written in deficient spelling as ‘אהל’ (o’hel). The Hebrew word ‘לימד’ (limed, meaning taught) can also be written in deficient spelling as ‘למד’ (limed).

6) Initials: Initials are much more frequent in Hebrew than in English. Due to their frequency, ambiguous initials are not rare. For example, the initials א"א have many interpretations, e.g.
‘אי אפשר’ (ea efshar, meaning impossible), ‘אני אומר’ (ani omer, meaning I say), ‘אמר אברהם’ (amar Avraham, meaning Abraham said).

3 OUR SUMMARIZATION MODEL

Our model summarizes Jewish law articles written in Hebrew. The generated summaries should be conclusive. We implemented most of the methods mentioned in Section 2.1. Implementing these methods on articles in Hebrew was much harder than for English. The difficulties arise mostly in methods that are based on word comparison (e.g. TF, centrality, title resemblance), since it is hard to identify two words that have different forms but the same root.

The TF method works on the basis of word comparison, and several features of the Hebrew language make this comparison rather difficult. Many terms in the jargon of this corpus are written as initials. The pronoun and preposition letters concatenated to words in Hebrew cause numerous problems as well. Comparison between terms is far more complicated under these circumstances. Such problems are even more acute when implementing the methods based on sentence similarity, where there was a need to cope with tenses and forms as well.

During the experiment phase we checked the results of each method individually as well as the results of various combinations of different methods, and we then selected the combination of methods that gave the best results.

We have also tested a new method, which is based on associative words. First, the system finds the text domain by seeking the most frequent key words and then determining which domain they belong to (we have built a word list for each domain for that purpose). Once the domain is determined, key words belonging to this domain receive a higher score. For example, under the domain ‘constitution and government’, key words such as democracy, liberality, and president are given a higher score than other key words.
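The domain-based method described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the word lists, the stop list, the `DOMAIN_BONUS` multiplier, the `top_n` cutoff, and all function names are hypothetical, and words are compared by exact string match, which sidesteps the Hebrew morphology problems discussed in Section 2.2.

```python
import re
from collections import Counter

# Hypothetical domain word lists; the actual lists are not given in the paper.
DOMAIN_WORDS = {
    "constitution and government": {"democracy", "liberality", "president"},
    "medicine": {"patient", "physician", "treatment"},
}
# Hypothetical, tiny stop list for illustration only.
STOP_WORDS = {"the", "of", "a", "is", "and", "in", "to"}
# Assumed multiplier for key words that belong to the detected domain.
DOMAIN_BONUS = 2.0


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_counts(sentences):
    """Count occurrences of every non-stop-list word over the whole text."""
    counts = Counter()
    for sentence in sentences:
        counts.update(w for w in tokenize(sentence) if w not in STOP_WORDS)
    return counts


def detect_domain(counts, top_n=10):
    """Return the domain whose word list overlaps most with the top key words."""
    top = {word for word, _ in counts.most_common(top_n)}
    return max(DOMAIN_WORDS, key=lambda d: len(DOMAIN_WORDS[d] & top))


def domain_score(sentence, counts, domain):
    """Sum key word frequencies, boosting key words of the detected domain."""
    score = 0.0
    for word in tokenize(sentence):
        if word in STOP_WORDS:
            continue
        weight = DOMAIN_BONUS if word in DOMAIN_WORDS[domain] else 1.0
        score += weight * counts[word]
    return score
```

In this sketch a sentence is scored by calling `keyword_counts` once on the whole article, passing the result to `detect_domain`, and then calling `domain_score` for each sentence. A multiplicative bonus is only one possible way to give domain words "a higher score"; an additive bonus would fit the description equally well.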
After running each method individually, we took the 5 best methods and combined them into a hybrid method (defined below). Each method was given a different weight. Each sentence was given a weighted score calculated from all these methods, where each score was multiplied by its weight. The final score was computed from the following methods multiplied by their weights: TF-ISF, position, cue words, section title and domain, using the following hybrid equation:
$$\alpha \cdot TF\_ISF(s) + \beta \cdot POS(s) + \gamma \cdot CUE(s) + \delta \cdot ST(s) + \varepsilon \cdot D(s)$$
where $TF\_ISF(s)$ is the score of the TF-ISF method, $\alpha$ is the weight of the TF-ISF method, $POS(s)$ is the score of the position method, $\beta$ is the weight of the position method, etc. Note that $0 \le \alpha, \beta, \gamma, \delta, \varepsilon \le 1$ and that $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.

4 EXPERIMENTAL RESULTS

In order to measure the success of the summarization methods mentioned above, a few comparison functions have been suggested. Mani and Bloedorn [8] suggested an automatic procedure for generating reference summaries for articles with author-provided summaries. The main idea of the procedure is to take the sentences of the article that most closely resemble (according to the cosine measure) the sentences in the author-provided summary and to present them as the abstract. One of the most significant components of such a procedure is therefore the sentence comparison function.

Our results have been evaluated by the recall and precision measures. Recall is defined as the number of correct sentences in the generated summary divided by the total number of sentences in the reference summary. Precision is defined as the number of correct sentences in the generated summary divided by the total number of sentences in the generated summary.

Our corpus contains 60 articles, each of which has its own author-provided summary. We compared the summaries of our system with the reference summaries produced by the procedure assisted by the author-provided summaries. The results were very low.
The highest precision/recall result was 0.14/0.25. But when we read both of the summaries, we found that ours was much more indicative. It seems that the cosine measure does not take into consideration some very significant factors. The main problem of this way of measuring is that it takes into consideration the similarity of any two words (excluding stop-list words), without regard to their importance to the text and its domain. Therefore, we needed to develop a new method for checking the resemblance between sentences. The method we have developed takes the following factors into consideration:

(1) Similar words that belong to the text issue are given a higher matching score. This factor is calculated as follows:
$$\frac{\sum_{w_i \in \{s_1\},\, w_i \in \{s_2\}} w_i}{\frac{|s_1| + |s_2|}{2}}$$
where $s_1$ and $s_2$ are the sentences that are compared and $w_i$ is a word that belongs to the issue.

(2) We defined special Jewish Rabbinical conclusive cue words as words that indicate a conclusion, e.g. must, forbidden, etc. Conclusive key words that belong to the text issue are given a higher matching score. This factor is calculated as follows:
$$\frac{\sum_{w_i \in \{s_1\},\, w_i \in \{s_2\}} w_i}{\frac{|s_1| + |s_2|}{2}}$$
where $s_1$ and $s_2$ are the sentences that are compared and $w_i$ is a conclusive key word.

(3) Regular cue words that indicate the importance of the sentence are also given a higher matching score. This factor is calculated as follows:
$$\frac{\sum_{w_i \in \{s_1\},\, w_i \in \{s_2\}} w_i}{\frac{|s_1| + |s_2|}{2}}$$
where $s_1$ and $s_2$ are the sentences that are compared and $w_i$ is a regular cue word.

Besides the aforementioned similarity factors, we have also taken the cosine measure into consideration. Each of these factors was multiplied by a coefficient in the following way:
$$\alpha \cdot I + \beta \cdot C + \gamma \cdot R + \delta \cdot D$$
where $I$ is the issue words factor, $C$ is the Jewish Rabbinical conclusive cue words factor, $R$ is the regular cue words factor, and $D$ is the plain cosine measure. The coefficients satisfy $\alpha + \beta + \gamma + \delta = 1$ and $0 \le \alpha, \beta, \gamma, \delta \le 1$.
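The weighted comparison function described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the word lists (apart from "must" and "forbidden", which the text mentions), the stop list, the default weights, and the function names are hypothetical, and each shared-word factor is normalized here by the mean length of the two sentences, which is one possible reading of the normalization.

```python
import math
import re
from collections import Counter

# Hypothetical word lists; only "must" and "forbidden" come from the text.
ISSUE_WORDS = {"animal", "kosher"}
CONCLUSIVE_CUES = {"must", "forbidden"}
REGULAR_CUES = {"therefore", "conclusion"}
STOP_WORDS = {"the", "of", "a", "is", "and", "in", "to", "this"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shared_factor(tokens1, tokens2, vocabulary):
    """Shared words from `vocabulary`, normalized by the mean sentence length.

    The normalization by mean length is an assumption of this sketch.
    """
    shared = set(tokens1) & set(tokens2) & vocabulary
    mean_len = (len(tokens1) + len(tokens2)) / 2
    return len(shared) / mean_len if mean_len else 0.0


def cosine(tokens1, tokens2):
    """Plain cosine measure over term counts, ignoring stop-list words."""
    v1 = Counter(t for t in tokens1 if t not in STOP_WORDS)
    v2 = Counter(t for t in tokens2 if t not in STOP_WORDS)
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(
        sum(c * c for c in v2.values())
    )
    return dot / norm if norm else 0.0


def similarity(s1, s2, weights=(0.4, 0.3, 0.1, 0.2)):
    """Weighted sum of the issue, conclusive-cue, regular-cue and cosine factors.

    The default weights are illustrative; they must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    alpha, beta, gamma, delta = weights
    t1, t2 = tokenize(s1), tokenize(s2)
    return (
        alpha * shared_factor(t1, t2, ISSUE_WORDS)
        + beta * shared_factor(t1, t2, CONCLUSIVE_CUES)
        + gamma * shared_factor(t1, t2, REGULAR_CUES)
        + delta * cosine(t1, t2)
    )
```

With such a function, two sentences that share an issue word and a conclusive cue word score higher than two sentences that merely share ordinary words, which is the behavior the plain cosine measure lacks.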
This comparison function not only yielded much higher similarity between the summaries of our system and the reference summaries produced by the procedure assisted by author-provided summaries; it also yielded more indicative reference summaries.

Table 1 presents the recall and precision results of the different summarization methods tested on our corpus. The length of the summary is 10% of the length of the article, a common summarization ratio. Our hybrid method has given the best recall/precision results, 0.42/0.21, since it is the method that best suits our conclusive summarization task. These results are regarded as reasonable compared to those of Neto et al. [11], 0.42/0.4. The recall and precision results are presented again as histograms in Figures 1 and 2, respectively.

Table 1. Recall and precision results of summarization methods on our corpus

Method          Recall   Precision
TF              0.3      0.16
Cue words       0.18     0.09
Length          0.09     0.18
Position        0.19     0.1
Centrality      0.04     0.02
Title           0.04     0.02
Section Title   0.17     0.1
Tf-isf          0.3      0.16
Domain words    0.37     0.18
Hybrid          0.42     0.21

Fig. 1. Recall results of summarization methods

Fig. 2. Precision results of summarization methods

5 SUMMARY and FUTURE RESEARCH

In this paper we have described a new method for conclusive summarization using sentence extraction. Among the methods we tested, this method gave the best results for the texts we have dealt with. We have also developed a system that is compatible with the Hebrew language. As far as we know, no existing program is able to summarize Hebrew articles.

Future directions for research are: (1) Some rabbinical authorities are taken more seriously than others by all authors. We suggest giving higher scores to sentences in which those rabbinical authorities are cited. (2) It is known that certain authors take some rabbinical authorities into consideration rather than others. Therefore, the importance of different rabbinical authorities should be computed relative to the author under discussion.

More general research proposals are: (1) Developing an automatic learning technique for tuning the coefficients of the similarity and hybrid functions, which might improve the results of the extraction; (2) Elaborating the model for summarizing other kinds of Hebrew articles; (3) Testing the model on a larger data set of various kinds of Hebrew articles.

ACKNOWLEDGEMENTS

Thanks to Ari Cirota and two anonymous referees for many valuable comments on earlier versions of this paper.

6 REFERENCES

[1] C. Aone, M. E. Okurowski, J. Gorlinsky and B. Larsen, “A Scalable Summarization System Using Robust NLP,” Proc. of the ACL Workshop on Intelligent Scalable Text Summarization, pp. 66-73, 1997.
[2] H.P. Edmundson, “New Methods in Automatic Extraction,” Journal of the ACM 16(2): pp. 264-285, 1969. Reprinted in Advances in Automatic Text Summarization, I. Mani and M.T. Maybury (eds.), Cambridge, Massachusetts: MIT Press, pp. 21-42, 1999.
[3] Y. HaCohen-Kerner, “Automatic Extraction of Keywords from Abstracts,” Proc. of the Seventh International Conference on Knowledge-Based Intelligent Information & Engineering. Lecture Notes in Artificial Intelligence 2773, Berlin: Springer-Verlag, pp. 843-849, 2003.
[4] J. Kupiec, J. Pedersen and F. Chen, “A trainable document summarizer,” Proc. of the 18th Annual International ACM SIGIR, pp. 68-73, 1995.
[5] C-Y. Lin, “Training a Selection Function for Extraction,” Proc. of the 8th International Conference on Information and Knowledge Management (CIKM 99), Kansas City, Missouri, pp. 55-62, 1999.
[6] C-Y. Lin and E.H. Hovy, “Identifying Topics by Position,” Proc.
of the Applied Natural Language Processing Conference (ANLP-97), pp. 283-290, 1997.
[7] H. P. Luhn, “The automatic creation of literature abstracts,” IBM Journal of Research and Development, 2(2): pp. 159-165, 1958. Reprinted in Advances in Automatic Text Summarization, I. Mani and M.T. Maybury (eds.), Cambridge, Massachusetts: MIT Press, pp. 15-21, 1999.
[8] I. Mani and E. Bloedorn, “Machine Learning of Generic and User-Focused Summarization,” Proceedings of AAAI-98, pp. 821-826, 1998.
[9] I. Mani and M. T. Maybury, “Advances in automatic text summarization,” Cambridge, MA: MIT Press, pp. ix-xv, 1999.
[10] S. Myaeng and D. Jang, “Development and evaluation of a statistically based document summarization system,” in Advances in Automatic Text Summarization, I. Mani and M.T. Maybury (eds.), Cambridge, Massachusetts: MIT Press, pp. 137-154, 1999.
[11] J. L. Neto, A. A. Freitas and C. A. A. Kaestner, “Automatic Text Summarization Using a Machine Learning Approach,” Proc. of the 16th Brazilian Symposium on Artificial Intelligence, SBIA-2002, Porto de Galinhas/Recife, Brazil, pp. 205-215, 2002.
[12] K. A. Zechner, “Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences,” Proc. of the 16th International Conference on Computational Linguistics, pp. 986-989, 1996.
[13] K. A. Zechner, “A Literature Survey on Information Extraction and Text Summarization,” Term Paper, Carnegie Mellon University, 1997.
