
Danushka Bollegala- PhD (Information Engineering)
- Professor at University of Liverpool
Danushka Bollegala
- PhD (Information Engineering)
- Professor at University of Liverpool
About
241
Publications
37,826
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
4,234
Citations
Introduction
I have broad research interests in Artificial Intelligence (AI) spanning across multiple fields such as Natural Language Processing (NLP), Web/Data Mining (DM), Machine Learning (ML), and Evolutionary Computation (EC). I have worked on numerous research problems in those fields and have published my work in top international such as ACL, NAACL, EMNLP, COLING, WWW, WSDM, IJCAI, AAAI, ECAI and journal such as IEEE TKDE, ACM TALIP, Elsevier IPM in those respective fields.
Current institution
Additional affiliations
April 2012 - present
April 2010 - March 2012
April 2007 - March 2010
Publications
Publications (241)
In commonsense generation, given a set of input concepts, a model must generate a response that is not only commonsense bearing, but also capturing multiple diverse viewpoints. Numerous evaluation metrics based on form- and content-level overlap have been proposed in prior work for evaluating the diversity of a commonsense generation model. However...
The meaning conveyed by a sentence often depends on the context in which it appears. Despite the progress of sentence embedding methods, it remains unclear how to best modify a sentence embedding conditioned on its context. To address this problem, we propose Condition-Aware Sentence Embeddings (CASE), an efficient and accurate method to create an...
Retrieval Augmented Generation (RAG) has gained popularity as a method for conveniently incorporating novel facts that were not seen during the pre-training stage in Large Language Model (LLM)-based Natural Language Generation (NLG) systems. However, LLMs are known to encode significant levels of unfair social biases. The modulation of these biases...
Background: Population ageing has led to an increase in multimorbidity and polypharmacy. Some medications may need to be stopped, but patient attitudes towards deprescribing are poorly understood. This study explores attitudes towards (de)prescribing in patients with multimorbidity in the UK primary care.
Methods: Patients with multimorbidity were...
Sentence embeddings represent the meaning of a given sentence using a fixed dimensional vector. Different approaches have been proposed in the Natural Language Processing (NLP) community for learning encoders that can produce accurate sentence embeddings that perform well for diverse downstream tasks that require sentence representations. Despite t...
Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the s...
Introduction
Structured medication reviews (SMRs), introduced in the United Kingdom (UK) in 2020, aim to enhance shared decision-making in medication optimisation, particularly for patients with multimorbidity and polypharmacy. Despite its potential, there is limited empirical evidence on the implementation of SMRs, and the challenges faced in the...
Words change their meaning over time as well as in different contexts. The sense-aware contextualised word embeddings (SCWEs) such as the ones produced by XL-LEXEME by fine-tuning masked langauge models (MLMs) on Word-in-Context (WiC) data attempt to encode such semantic changes of words within the contextualised word embedding (CWE) spaces. Despit...
Social biases such as gender or racial biases have been reported in language models (LMs), including Masked Language Models (MLMs). Given that MLMs are continuously trained with increasing amounts of additional data collected over time, an important yet unanswered question is how the social biases encoded with MLMs vary over time. In particular, th...
Artificial intelligence (AI), specifically, Natural Language Processing (NLP) is being hailed as a new breeding ground for immense innovation potential. Researchers believe that NLP-based technologies could help to solve societal issues such as equality and inclusion, education, health, and hunger, climate action etc., and many more. Tackling these...
Sense embedding learning methods learn multiple vectors for a given ambiguous word, corresponding to its different word senses. For this purpose, different methods have been proposed in prior work on sense embedding learning that use different sense inventories, sense-tagged corpora and learning methods. However, not all existing sense embeddings c...
Cosine similarity between two words, computed using their contextualised token embeddings obtained from masked language models (MLMs) such as BERT has shown to underestimate the actual similarity between those words (Zhou et al., 2022). This similarity underestimation problem is particularly severe for highly frequent words. Although this problem h...
Languages are dynamic entities, where the meanings associated with words constantly change with time. Detecting the semantic variation of words is an important task for various NLP applications that must make time-sensitive predictions. Existing work on semantic variation prediction have predominantly focused on comparing some form of an averaged c...
Discrete prompts have been used for fine-tuning Pre-trained Language Models for diverse NLP tasks. In particular, automatic methods that generate discrete prompts from a small set of training instances have reported superior performance. However, a closer look at the learnt prompts reveals that they contain noisy and counter-intuitive lexical const...
Numerous types of social biases have been identified in pre-trained language models (PLMs), and various intrinsic bias evaluation measures have been proposed for quantifying those social biases. Prior works have relied on human annotated examples to compare existing intrinsic bias evaluation measures. However, this approach is not easily adaptable...
We show that the $\ell_2$ norm of a static sense embedding encodes information related to the frequency of that sense in the training corpus used to learn the sense embeddings. This finding can be seen as an extension of a previously known relationship for word embeddings to sense embeddings. Our experimental results show that, in spite of its simp...
We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for Masked Language Models (MLMs), and find that there exists only a weak correlation between these two types of evaluation measures. Moreover, we find that MLMs debiased using different methods still re-learn social biases during f...
Dynamic contextualised word embeddings represent the temporal semantic variations of words. We propose a method for learning dynamic contextualised word embeddings by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpora taken respectively at two distinct timestamps $...
Systematic literature review (SLR) is a crucial method for clinicians and policymakers to make their decisions in a flood of new clinical studies. Because manual literature screening in SLR is a highly laborious task, its automation by natural language processing (NLP) has been welcomed. Although intervention is a key information for literature scr...
Meta-embedding (ME) learning is an emerging approach that attempts to learn more accurate word embeddings given existing (source) word embeddings as the sole input. Due to their ability to incorporate semantics from multiple source embeddings in a compact manner with superior performance, ME learning has gained popularity among practitioners in NLP...
Masked Language Models (MLMs) have shown superior performances in numerous downstream Natural Language Processing (NLP) tasks. Unfortunately, MLMs also demonstrate significantly worrying levels of social biases. We show that the previously proposed evaluation metrics for quantifying the social biases in MLMs are problematic due to the following rea...
Combining multiple source embeddings to create meta-embeddings is considered effective to obtain more accurate embeddings. Different methods have been proposed to develop meta-embeddings from a given set of source embeddings. However, the source embeddings can contain unfair gender bias, and the bias in the combination of multiple embeddings and de...
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in Engli...
Prior work on integrating text corpora with knowledge graphs (KGs) to improve Knowledge Graph Embedding (KGE) have obtained good performance for entities that co-occur in sentences in text corpora. Such sentences (textual mentions of entity-pairs) are represented as Lexicalised Dependency Paths (LDPs) between two entities. However, it is not possib...
Meta-embedding (ME) learning is an emerging approach that attempts to learn more accurate word embeddings given existing (source) word embeddings as the sole input. Due to their ability to incorporate semantics from multiple source embeddings in a compact manner with superior performance, ME learning has gained popularity among practitioners in NLP...
A variety of contextualised language models have been proposed in the NLP community, which are trained on diverse corpora to produce numerous Neural Language Models (NLMs). However, different NLMs have reported different levels of performances in downstream NLP applications when used as text representations. We propose a sentence-level meta-embeddi...
Probing Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. To this end, this phenomena has been effective especially when these LMs are fine-tuned towards not just data of a specific domain, but also to the style or linguistic pattern of the prompts themselves. We ob...
Sense embedding learning methods learn different embeddings for the different senses of an ambiguous word. One sense of an ambiguous word might be socially biased while its other senses remain unbiased. In comparison to the numerous prior work evaluating the social biases in pretrained word embeddings, the biases in sense embeddings have been relat...
Word embedding models have recently shown some capability to encode hierarchical information that exists in textual data. However, such models do not explicitly encode the hierarchical structure that exists among words. In this work, we propose a method to learn hierarchical word embeddings (HWEs) in a specific order to encode the hierarchical info...
The recent popularization of social web services has made them one of the primary uses of the World Wide Web. An important concept in social web services is social actions such as making connections and communicating with others and adding annotations to web resources. Predicting social actions would improve many fundamental web applications, such...
This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we alternate it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence an...
Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature. Considerable recent attention has been given to the use of various document ranking algorithms to support the maintenance of CDDs. The typical approach is to represent the update document collection using a form of word em...
More and more users have been taking various actions to diverse resources referred to by URLs such as news, web pages, images, products, movies as a result of the growth of social media. They are annotating, tweeting in Twitter, reblogging in Tumblr, and Liking in Facebook, etc. Analyses about these diverse actions will be useful for aggregating or...
This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we alternate it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence an...
Cross-lingual text representations have gained popularity lately and act as the backbone of many tasks such as unsupervised machine translation and cross-lingual information retrieval, to name a few. However, evaluation of such representations is difficult in the domains beyond standard benchmarks due to the necessity of obtaining domain-specific p...
Masked Language Models (MLMs) have shown superior performances in numerous downstream NLP tasks when used as text encoders. Unfortunately, MLMs also demonstrate significantly worrying levels of social biases. We show that the previously proposed evaluation metrics for quantifying the social biases in MLMs are problematic due to following reasons: (...
A health outcome is a measurement or an observation used to capture and assess the effect of a treatment. Automatic detection of health outcomes from text would undoubtedly speed up access to evidence necessary in healthcare decision making. Prior work on outcome detection has modelled this task as either (a) a sequence labelling task, where the go...
Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique a...
Negative sampling is a limiting factor w.r.t. the generalization of metric-learned neural networks. We show that uniform negative sampling provides little information about the class boundaries and thus propose three novel techniques for efficient negative sampling: drawing negative samples from (1) the top-$k$ most semantically similar classes, (2...
Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progresses have been achieved, existing methods are heuristically motivated and theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the...
In comparison to the numerous debiasing methods proposed for the static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method that can be applied at token- or sentence-levels to debias pre-trained contextualised embeddings. Our proposed m...
Word embeddings trained on large corpora have shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and an unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, witho...
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes. At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the...
Curated Document Databases play a critical role in helping researchers find relevant articles in available literature. One such database is the ORRCA (Online Resource for Recruitment research in Clinical trials) database. The ORRCA database brings together published work in the field of clinical trials recruitment research into a single searchable...
Explanation has been a central feature of AI systems for legal reasoning since their inception. Recently, the topic of explanation of decisions has taken on a new urgency, throughout AI in general, with the increasing deployment of AI tools and the need for lay users to be able to place trust in the decisions that the support tools are recommending...
Prior work investigating the geometry of pre-trained word embeddings have shown that word embeddings to be distributed in a narrow cone and by centering and projecting using principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to appl...
A semantic relation between two given words a and b can be represented using two complementary sources of information: (a) the semantic representations of a and b (expressed as word embeddings) and, (b) the contextual information obtained from the co-occurrence contexts of the two words (expressed in the form of lexico-syntactic patterns). Pattern-...
This paper examines the effect of using co-reference chains based conversational history against the use of entire conversation history for conversational question answering (CoQA) task. The QANet model is modified to include conversational history and NeuralCoref is used to obtain co-reference chains based conversation history. The results of the...
Knowledge Graph Embedding methods learn low-dimensional representations for entities and relations in knowledge graphs, which can be used to infer previously unknown relations between pairs of entities in the knowledge graph. This is particularly useful for expanding otherwise sparse knowledge graphs. However, the relation types that can be predict...
Prior work has investigated the problem mining arguments from online reviews by classifying opinions based on the stance expressed explicitly or implicitly. An implicit opinion has the stance left unexpressed linguistically while an explicit opinion has the stance expressed explicitly. In this paper, we propose a bipartite graph-based approach to r...
Evaluating the performance of neural network-based text generators and density estimators is challenging since no one measure perfectly evaluates language quality. Perplexity has been a mainstay metric for neural language models trained by maximizing the conditional log-likelihood. We argue perplexity alone is a naive measure since it does not expl...
The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches employ human-written ground-truth answers for answering conversational questions at test time, in a realistic scenario, the CoQA model will not have any access to ground...
Dialogue engines that incorporate different types of agents to converse with humans are popular. However, conversations are dynamic in the sense that a selected response will change the conversation on-the-fly, influencing the subsequent utterances in the conversation, which makes the response selection a challenging problem. We model the problem o...
This paper presents a contextualized graph attention network that combines edge features and multiple sub-graphs for improving relation extraction. A novel method is proposed to use multiple sub-graphs to learn rich node representations in graph-based networks. To this end multiple sub-graphs are obtained from a single dependency tree. Two types of...
Domain adaptation considers the problem of generalising a model learnt using data from a particular source domain to a different target domain. Often it is difficult to find a suitable single source to adapt from, and one must consider multiple sources. Using an unrelated source can result in sub-optimal performance, known as the \emph{negative tra...
Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed size vocabulary and can efficiently handle unseen or rare words. On the other hand, languag...
We propose a novel non-parametric method for cross-modal retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing...
Prior work has investigated the problem mining arguments from online reviews by classifying opinions based on the stance expressed explicitly or implicitly. An implicit opinion has the stance left unexpressed linguistically while an explicit opinion has the stance expressed explicitly. In this paper, we propose a bipartite graph-based approach to r...
Protecting the privacy of search engine users is an important requirement in many information retrieval scenarios. A user might not want a search engine to guess his or her information need despite requesting relevant results. We propose a method to protect the privacy of search engine users by decomposing the queries using semantically \emph{relat...
Task-specific scores are often used to optimize for and evaluate the performance of conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient methods are used since the gradient can be computed without requiring a differentiable objective....
Coherent division of legal document bundles, whether this is done in the context of court bundles, briefs or some other application, is a time consuming and challenging task. We propose an approach whereby this process can be automated. Two variations are considered. The first addresses the scenario where the topic labelling is pre-defined and adop...
Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases such as gender, racial or ethnic biases, which in turn bias the down-stream NLP applications that use those word embeddings. Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gende...
In this paper we propose a novel neural language modelling (NLM) method based on \textit{error-correcting output codes} (ECOC), abbreviated as ECOC-NLM. This latent variable based approach provides a principled way to choose a varying amount of latent output codes and avoids exact softmax normalization. Instead of minimizing measures between the pr...
Continuous authentication is significant with respect to many online applications where it is desirable to monitor a user’s identity throughout an entire session; not just at the beginning of the session. One example application domain, where this is a requirement, is in relation to Massive Open Online Courses (MOOCs) when users wish to obtain some...
We propose in this paper a combined model of Long Short Term Memory and Convolutional Neural Networks (LSTM-CNN) that exploits word embeddings and positional embeddings for cross-sentence n-ary relation extraction. The proposed model brings together the properties of both LSTMs and CNNs, to simultaneously exploit long-range sequential information a...
This paper carries out an empirical analysis of various dropout techniques for language modelling, such as Bernoulli dropout, Gaussian dropout, Curriculum Dropout, Variational Dropout and Concrete Dropout. Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules. We find these extens...














































































































































