Michael Elhadad

Ben-Gurion University of the Negev | bgu · Department of Computer Science

PhD

About

105
Publications
35,010
Reads
3,420
Citations
Citations since 2017
12 Research Items
989 Citations
Additional affiliations
September 1992 - present
Ben-Gurion University of the Negev
Position
  • Professor (Associate)

Publications

Publications (105)
Preprint
Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In this paper, we observe several key disadvantages of MLM in this setting. First, as captions tend to be short, in...
Preprint
Full-text available
Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution. Contrast sets (Gardner et al., 2020) quantify this phenomenon by perturbing test samples in a minimal way such that the output label is modified. While most...
Article
Full-text available
Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medi...
Preprint
Full-text available
In a global setting, texts contain transliterated names from many cultural origins. Correct transliteration depends not only on the target and source languages but also on the source language of the name. We introduce a novel methodology for transliteration of names originating in different languages using only monolingual resources. Our method is bas...
Article
Full-text available
Information and software systems development is rapidly changing due to exponential technology development. This acceleration is also impacting other technology or engineering domains. Thus, there is a need to identify problems and their solutions, and to reason about new options so as to better arrive at the right decision of which technology or s...
Preprint
Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic, evaluation metric for this task, Answering Performance for Evaluation of Summaries. APES utilizes recent progress in the field of reading-comprehension to quantify the abilit...
Article
Full-text available
Query Focused Summarization (QFS) has been addressed mostly using extractive methods. Such methods, however, produce text which suffers from low coherence. We investigate how abstractive methods can be applied to QFS, to overcome such limitations. Recent developments in neural-attention based sequence-to-sequence models have led to state-of-the-art...
Article
Full-text available
In the context of the Electronic Health Record, automated diagnosis coding of patient notes is a useful task, but a challenging one due to the large number of codes and the length of patient notes. We investigate four models for assigning multiple ICD codes to discharge summaries taken from both MIMIC II and III. We present Hierarchical Attention-G...
Article
Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state of the art algorithms designed for QFS do not significantly improve upon generic summarization methods, which ig...
Conference Paper
Full-text available
Word embedding vectors are used as input for a variety of tasks. Choosing the right model and features for producing such vectors is not a trivial task and different embedding methods can greatly affect results. In this paper we repurpose the "Pyramid Method" annotations used for evaluating automatic summarization to create a benchmark for comparin...
Conference Paper
Full-text available
Update summarization is a form of multidocument summarization where a document set must be summarized in the context of other documents assumed to be known. Efficient update summarization must focus on identifying new information and avoiding repetition of known information. In Query-focused summarization, the task is to produce a summary as an ans...
Article
Full-text available
The clinical notes in a given patient record contain much redundancy, in large part due to clinicians' documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining and topic modeling in particular. In this paper we descr...
Conference Paper
Full-text available
This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus on 10 different languages: Arabic, Chinese,. We discuss the rationale behind the main decisi...
Article
Full-text available
Online Consumer Health websites are a major source of information for patients worldwide. We focus on another modality, online physician advice. We aim to evaluate and compare the freely available online expert physicians' advice in different countries, its scope and the type of content provided. Using automated methods for information retrieval an...
Article
Full-text available
We present a constituency parsing system for Modern Hebrew. The system is based on the PCFG-LA parsing method of Petrov et al. (2006), which is extended in various ways in order to accommodate the specificities of Hebrew as a morphologically rich language with a small treebank. We show that parsing performance can be enhanced by utilizing a language...
Article
Full-text available
Background The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhib...
Article
Full-text available
Syntactic parsers have made a leap in accuracy and speed in recent years. The high order structural information provided by dependency parsers is useful for a variety of NLP applications. We present a biomedical model for the EasyFirst parser, a fast and accurate parser for creating Stanford Dependencies. We evaluate the models trained in the biome...
Conference Paper
Full-text available
When porting parsers to a new domain, many of the errors are related to wrong attachment of out-of-vocabulary words. Since there is no available annotated data to learn the attachment preferences of the target domain words, we attack this problem using a model of selectional preferences based on domain-specific word classes. Our method uses Latent...
Article
Full-text available
We introduce precision-biased parsing: a parsing task which favors precision over recall by allowing the parser to abstain from decisions deemed uncertain. We focus on dependency-parsing and present an ensemble method which is capable of assigning parents to 84% of the text tokens while being over 96% accurate on these tokens. We use the precision-...
Article
Full-text available
This paper presents a named entity recognition (NER) system for the Hebrew language. The Hebrew language has high morphological ambiguity, which makes automatic processing difficult. The first step in our work was to define the tagging task for the Hebrew language. Tagging guidelines were phrased and agreement tests were performed among huma...
Article
Full-text available
The execution of a business process (BP) often interacts with multiple independent entities, whose behavior cannot always be predicted. This is why, although a business process may have a single ideal execution path, in practice, many executions will encounter events, errors or missing deadlines, that lead the process off this path. Exception handl...
Article
Full-text available
Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations so far remained fairly constan...
Conference Paper
Full-text available
Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations so far remained f...
Data
Supplementary Material. Details of data acquisition, UMLS division into rough semantic categories and division of heading/sub-headings into areas.
Article
Full-text available
The OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section containing a list of known phenotypes comprising a clinical syndrome. The phenotypes are in free text and different phrases are often used to describe the same phenotype, the differences originating in spelling variations or typing errors, vary...
Conference Paper
Full-text available
We experiment with extending a lattice parsing methodology for parsing Hebrew (Goldberg and Tsarfaty, 2008; Goldberg et al., 2009) to make use of a stronger syntactic model: the PCFG-LA Berkeley Parser. We show that the methodology is very effective: using a small training set of about 5500 trees, we construct a parser which parses and segments unse...
Article
Full-text available
Information Retrieval (IR) research has recently started addressing the information need of exploratory search, where the searcher may be unfamiliar with the domain or not have decided what is the goal of his query. A popular tool to support exploratory search is the use of faceted search. The implementation of faceted search requires that docume...
Article
Full-text available
Information Retrieval (IR) research has recently started addressing the information need of exploratory search, where the searcher may be unfamiliar with the domain or not have decided what is the goal of his query. A popular approach to support exploratory search is the usage of faceted search. The implementation of faceted search requires that d...
Article
Full-text available
creativeness / a pleasing field / of bloom Word associations are an important element of linguistic creativity. Traditional lexical knowledge bases such as WordNet formalize a limited set of systematic relations among words, such as synonymy, polysemy and hypernymy. Such relations maintain their systematicity when composed into lexical chains....
Conference Paper
Full-text available
We present a novel deterministic dependency parsing algorithm that attempts to create the easiest arcs in the dependency structure first in a non-directional manner. Traditional deterministic parsing algorithms are based on a shift-reduce framework: they traverse the sentence from left-to-right and, at each step, perform one of a possible set o...
Article
Full-text available
We propose the notion of a structural bias inherent in a parsing system with respect to the language it is aiming to parse. This structural bias characterizes the behaviour of a parsing system in terms of structures it tends to under- and over-produce. We propose a Boosting-based method for uncovering some of the structural bias inherent in parsin...
Article
Full-text available
We investigate the performance of an easy-first, non-directional dependency parser on the Hebrew Dependency treebank. We show that with a basic feature set the greedy parser's accuracy is on a par with that of a first-order globally optimized MST parser. The addition of a morphological-agreement feature improves the parsing accuracy, making it on a par...
Conference Paper
Full-text available
We describe a newly available Hebrew Dependency Treebank, which is extracted from the Hebrew (constituency) Treebank. We establish some baseline unlabeled dependency parsing performance on Hebrew, based on two state-of-the-art parsers, MST-parser and MaltParser. The evaluation is performed both in an artificial setting, in which the data is a...
Conference Paper
Full-text available
We present a framework for interfacing a PCFG parser with lexical information from an external resource following a different tagging scheme than the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and la...
Conference Paper
Full-text available
We use the technique of SVM anchoring to demonstrate that lexical features extracted from a training corpus are not necessary to obtain state of the art results on tasks such as Named Entity Recognition and Chunking. While standard models require as many as 100K distinct features, we derive models with as few as 1K features that perform as wel...
Conference Paper
Full-text available
We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a...
Conference Paper
Full-text available
Word associations are an important element of linguistic creativity. Traditional lexical knowledge bases such as WordNet formalize a limited set of systematic relations among words, such as synonymy, polysemy and hypernymy. Such relations maintain their systematicity when composed into lexical chains. We claim that such relations cannot explain the...
Conference Paper
Full-text available
We present a loosely-supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require the use of any lexicons or linguistic analysis tool for the source languages (Hebrew, in our case). It also does not require any manually annotated data for t...
Conference Paper
Full-text available
We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on 4 aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned to maintain consistency; the tagset s...
Conference Paper
Full-text available
Morphological disambiguation proceeds in 2 stages: (1) an analyzer provides all possible analyses for a given token and (2) a stochastic disambiguation module picks the most likely analysis in context. When the analyzer does not recognize a given token, we hit the problem of unknowns. In large scale corpora, unknowns appear at a rate of 5 to 10% (...
Conference Paper
Full-text available
We address the task of unsupervised POS tagging. We demonstrate that good results can be obtained using the robust EM-HMM learner when provided with good initial conditions, even with incomplete dictionaries. We present a family of algorithms to compute effective initial estimations p(t|w). We test the method on the task of full morphological dis...
Conference Paper
Full-text available
We present a fast, space efficient and non-heuristic method for calculating the decision function of polynomial kernel classifiers for NLP applications. We apply the method to the MaltParser system, resulting in a Java parser that parses over 50 sentences per second on modest hardware without loss of accuracy (a 30-fold speedup over existing m...
Article
Full-text available
Tagged corpora are essential for evaluating and training natural language processing tools. The cost of constructing large enough manually tagged corpora is high, even when the annotation level is shallow. This article describes a simple method to automatically create a partially tagged corpus, using Wikipedia hyperlinks. The resulting corpus...
Article
Full-text available
The convergence between business process modeling and the service-oriented architecture has created a significant opportunity for Information Technology (IT) system integrators: they can offer effective business process outsourcing for Small and Medium Enterprises (SMEs) that often cannot afford the cost of designing, provisioning, and operating the...
Conference Paper
Full-text available
Conference Paper
Full-text available
We study the issue of porting a known NLP method to a language with little existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods - Model Tampering and Anchored Learning. These allow fine grained analysis of the learned SVM models, which provides guidance to identify errors in the training corpus, disti...
Article
Full-text available
In (Goldberg and Elhadad, 2007) we presented two techniques (SVM Model Tampering and Anchored Learning) for investigating the SVM learning process and resulting models. These techniques were applied to the task of SVM based Hebrew NP Chunking. The results were better understanding of SVM based chunking, of the role lexical features play in the...
Article
Full-text available
Computational linguistics methods are typically first developed and tested in English. When applied to other languages, assumptions from English data are often applied to the target language. One of the most common such assumptions is that a "standard" part-of-speech (POS) tagset can be used across languages with only slight variations. We discuss...
Conference Paper
Full-text available
Natural language generation (NLG) refers to the process of producing text in a spoken language, starting from an internal knowledge representation structure. Augmentative and Alternative Communication (AAC) deals with the development of devices and tools to enable basic conversation for language-impaired people. We present an applied protot...
Conference Paper
Full-text available
Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew langua...
Conference Paper
Full-text available
We present a method for Noun Phrase chunking in Hebrew. We show that the traditional definition of base-NPs as non-recursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that the task of Hebrew SimpleNP chunking is ha...
Article
Full-text available
We present an authoring system for logical forms encoded as conceptual graphs (CG). The system belongs to the family of WYSIWYM (What You See Is What You Mean) text generation systems: logical forms are entered interactively and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a mo...
Article
Full-text available
We present an implemented procedure to select an appropriate connective to link two propositions, which is part of a large text generation system. Each connective is defined as a set of constraints between features of the propositions it connects. Our focus has been to identify pragmatic features that can be produced by a deep generator to provi...
Article
Full-text available
Hebrew includes a very productive noun-compounding construction called smixut. Because smixut is marked morphologically and is restricted by many syntactic constraints, it has been the focus of many descriptive studies in Hebrew grammar.
Article
Full-text available
This paper presents a specific part of HUGG, a generation grammar for Hebrew. This part deals with determiners and quantifiers. Our main goal is to determine which set of features must be present in the input to the generation grammar to control the generation of complex determiners and quantifiers.
Article
Full-text available
This paper presents the integration of a large-scale, reusable lexicon for generation with the FUF/SURGE unification-based syntactic realizer. The lexicon was combined from multiple existing resources in a semi-automatic process. The integration is a multi-step unification process. This integration allows the reuse of lexical, syntactic, and s...
Article
Full-text available
End-user computing is needed in creative artistic applications or integrated editing environments, where the activity cannot be planned in advance. Following the paper by Orlarey et al., concrete abstractions (abstractions from examples) are suggested as a new mode for function definition, appropriate for end-user editor programmability. For certai...
Article
Full-text available
Syntactic realization grammars have traditionally attempted to accept inputs with the highest possible level of abstraction, in order to facilitate the work of the components (sentence planner) preparing the input. Recently, the search for higher abstraction has been, however, challenged (Elhadad and Robin, 1996) (Lavoie and Rambow, 1997) (Bus...
Article
Full-text available
We present a method to automatically generate a concise summary by identifying and synthesizing similar elements across related text from a set of multiple documents. Our approach is unique in its usage of language generation to reformulate the wording of the summary.
Article
Full-text available
We address the problem of ordering several circumstantials when generating or revising a clause. This problem occurs in the context of a multi-document summarization system that relies on language generation to incrementally reformulate the wording of fragments of sentences extracted from the documents. We present the results of an extensive corpus...
Article
Full-text available
Hebrew includes a very productive noun-compounding construction called smixut. Because smixut is marked morphologically and is restricted by many syntactic constraints, it has been the focus of many descriptive studies in Hebrew grammar. We present the treatment of smixut in HUGG, a FUF-based syntactic realization system capable of producing comple...
Article
Full-text available
This paper describes SURGE, a syntactic realization front-end for natural language generation systems. By gradually integrating complementary aspects of various linguistic theories within the computational framework of functional unification, SURGE has evolved to be one of the most comprehensive grammars of English for language generation available...
Article
Full-text available
this paper on the development and evolution of such a component, the SURGE system. SURGE has been widely distributed in the generation community and has been embedded into several complete systems. The goals of reusability and wide coverage have led to a large system and many of the issues faced during the development of the system are issues commo...
Article
Full-text available
ion Level. Yael Dahan Netzer and Michael Elhadad, Ben Gurion University, Department of Mathematics and Computer Science, Beer Sheva, 84105, Israel, (yaeln---elhadad)@cs.bgu.ac.il. Abstract: Syntactic realization grammars have traditionally attempted to accept inputs with the highest possible level of abstraction, in order to facilitate the work of the co...
Article
Full-text available
Two methods are used for evaluation of summarization systems: an evaluation of generated summaries against an "ideal" summary and evaluation of how well summaries help a person perform in a task such as information retrieval. We carried out two large experiments to study the two evaluation methods. Our results show that different parameters of an e...
Article
Full-text available
words. Article: 3050 words. Keywords: Unification, FUGs, Types, Generation. Abstract: Functional Unification Grammars (FUGs) are popular for natural language applications because the formalism uses very few primitives and is uniform and expressive. In our work on text generation, we have found that it also has annoying limitations: it is not suited for...
Article
Full-text available
Q: Should I take AI this semester? A: If you want to take courses like Natural Language Processing or Expert Systems or Vision next semester, We address the problem of generating a coherent paragraph presenting arguments for a conclusion in a text generation system. Existing text planning techniques are not appropriate for this task for two main re...
Article
Full-text available
We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a...
Article
Full-text available
Standard Functional Unification Grammars (FUGs) provide a structurally guided top-down control regime for text generation that is not appropriate for handling non-structural and dynamic constraints. We introduce two control tools that we have implemented for FUGs to address these limitations: bk-class, a tool to limit search by using a form of depe...
Article
Full-text available
Computer music environments (CMEs) are notoriously difficult to design and implement. As computer programs, they reflect the complex nature of music ontology and must support real-time manipulation of multimedia data. In addition, these programs must be usable by native users, supporting their creative process without obstructing it through technic...
Conference Paper
Full-text available
Syntactic realization grammars have traditionally attempted to accept inputs with the highest possible level of abstraction, in order to facilitate the work of the components (sentence planner) preparing the input. Recently, the search for higher abstraction has been, however, challenged (Elhadad and Robin, 1996) (Lavoie and Rambow, 19...
Article
Full-text available
Lexical choice is a computationally complex task, requiring a generation system to consider a potentially large number of mappings between concepts and words. Constraints that aid in determining which word is best come from a wide variety of sources, including syntax, semantics, pragmatics, the lexicon, and the underlying domain. Furthermore, in so...
Article
Full-text available
This paper presents a lexical choice component for complex noun phrases. We first explain why lexical choice for NPs deserves special attention within the standard pipeline architecture for a generator. The task of the lexical chooser for NPs is more complex than for clauses because the syntax of NPs is less understood than for clauses, and therefo...
Article
Full-text available
A general language for specifying resource allocation in Scheduling problems is presented. TRAPS is a superclass of the RAPS (Resource Allocation Problem Specification) language developed by the authors [31]. TRAPS enables the specification of a scheduling problem by adding built in time operators, on top of existing terms for resources, activities...
Article
Text generation is a field of artificial intelligence aiming at producing computer models of natural language production. This paper discusses the use of the theory of ‘Argumentation in Language’ in the field of Text Generation. Most text generators follow the same sequence of steps to produce a coherent paragraph: content determination - selecting...
Article
Full-text available
Standard Functional Unification Grammars (FUGs) provide a structurally guided top-down control regime for sentence generation. When using FUGs to perform content realization as a whole, including lexical choice, this regime is no longer appropriate for two reasons: (1) the unification of non-lexicalized semantic input with an integrated lexico-gram...
Article
Full-text available
This paper presents a procedure to generate judgment determiners, e.g., many, few. Although such determiners carry very little objective information, they are extensively used in everyday language. The paper presents a precise characterization of a class of such determiners using three semantic tests. A conceptual representation for sets is then de...
Article
Full-text available
Using Argumentation to Control Lexical Choice: A Functional Unification Implementation. Michael Elhadad. This thesis investigates the impact of the pragmatic situation on surface generation. It presents new surface generation techniques that improve on both aspects of surface generation: (1) lexical choice, which consists of choosing words and their...
Article
Full-text available
This document is the user manual for FUF version 5.2, a natural language generator program that uses the technique of unification grammars. The program is composed of two main modules: a unifier and a linearizer. The unifier takes as input a semantic description of the text to be generated and a unification grammar, and produces as output a rich sy...
Conference Paper
It is shown how COMET, a system that uses natural language and graphics generation components to produce the text and pictures of its explanations dynamically, can create a variety of different explanations to explain the same concepts, and thus better meets the needs of different users. The authors focus on four ways in which COMET produces differ...
Conference Paper
Full-text available
We address the problem of generating adjectives in a text generation system. We distinguish between usages of adjectives informing the hearer of a property of an object and usages expressing an intention of the speaker, or an argumentative orientation. For such argumentative usages, we claim that a generator cannot simply map from information in th...
Conference Paper
For many applications that must explain reasoning, processes, or plans to their users, multimedia explanations are more effective than text or pictures alone: the use of two or more modalities makes it possible to communicate the same or complementary information in different ways. While hypermedia is a currently available technology for providing s...
Conference Paper
Full-text available
We address the problem of generating adjectives in a text generation system. We distinguish between usages of ad... being modified. In addition, these decisions interact with the lexical properties of adjectives, the syntax of the clause and other factors like collocations. In this paper we there... selection of an adjective? ...mation in the knowledge ba...
Article
Full-text available
Language generation systems have used a variety of grammatical formalisms for producing syntactic structure and yet, there has been little research evaluating the formalisms for the specifics of the generation task. In our work at Columbia we have primarily used a unification based formalism, a functional Unification Grammar (FUG) and have found it...
Article
Full-text available
This paper presents a procedure to generate judgment determiners, e.g., many, few. Although such determiners carry very little objective information, they are extensively used in everyday language. The paper presents a precise characterization of a class of such determiners using three semantic tests. A conceptual representation for sets is then...
Article
Full-text available
Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew langua...
