Ahmed Abbasi

  • Doctor of Philosophy
  • Chair at University of Notre Dame

About

116
Publications
157,819
Reads
7,135
Citations
Introduction
Ahmed Abbasi is the Joe and Jane Giovanini endowed chaired professor at the University of Notre Dame. He has over 20 years of experience pertaining to AI, machine learning, and natural language processing, with applications in online fraud and security, health, and social media. Ahmed’s research has been funded through multiple grants from the National Science Foundation. He has also received the IBM Faculty Award, AWS Research Grant, and Microsoft Research Azure Award for his work on Big Data.
Current institution
University of Notre Dame
Current position
  • Chair

Publications

Preprint
Full-text available
Closed large language models (LLMs) such as GPT-4 have set state-of-the-art results across a number of NLP tasks and have become central to NLP and machine learning (ML)-driven solutions. Closed LLMs' performance and wide adoption have sparked considerable debate about their accessibility in terms of availability, cost, and transparency. In this stu...
Preprint
Full-text available
Predictive machine learning models are widely used in customer relationship management (CRM) to forecast customer behaviors and support decision-making. However, the dynamic nature of customer behaviors often results in significant distribution shifts between training data and serving data, leading to performance degradation in predictive models. D...
Article
The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text is text-generating larg...
Article
Every day, patients access and generate online health content through a variety of channels, creating an ever-expanding sea of digital data. At the same time, proponents of public health have recently called for timely, granular, and actionable data to address a range of public health issues, stressing the need for social listening platforms that c...
Article
Robust digital experimentation platforms have become increasingly pervasive at major technology and e-commerce firms worldwide. They allow product managers to use data-driven decision-making through online controlled experiments that estimate the average treatment effect (ATE) relative to a status quo control setting and make associated inferences....
Article
The use of machine learning (ML) to detect depression in online settings has emerged as an important health and wellness use case. In particular, the use of deep learning methods for depression detection from textual content posted on social media has garnered considerable attention. Conversely, there has been relatively limited evaluation of depre...
Article
Full-text available
Research on automated mental health assessment tools has been growing in recent years, often aiming to address the subjectivity and bias that existed in the current clinical practice of the psychiatric evaluation process. Despite the substantial health and economic ramifications, the potential unfairness of those automated tools was understudied an...
Preprint
Full-text available
The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment on...
Article
“The Century of Disasters” refers to the increased frequency, complexity, and magnitude of natural and man-made disasters witnessed in the 21st century: the impact of such disasters is exacerbated by infrastructure vulnerabilities, population growth/urbanization, and a challenging policy landscape. Technology-enabled disaster management (TDM) has a...
Article
An expanding body of information systems research is adopting a design perspective on artificial intelligence (AI), wherein researchers prescribe solutions to problems using AI approaches rather than describing or explaining AI-related phenomena being studied. In this editorial, we address some of the challenges faced in publishing design research...
Article
Fairness measurement is crucial for assessing algorithmic bias in various types of machine learning (ML) models, including ones used for search relevance, recommendation, personalization, talent analytics, and natural language processing. However, the fairness measurement paradigm is currently dominated by fairness metrics that examine disparities...
Article
Objective: Psychiatric evaluation suffers from subjectivity and bias, and is hard to scale due to intensive professional training requirements. In this work, we investigated whether behavioral and physiological signals, extracted from tele-video interviews, differ in individuals with psychiatric disorders. Methods: Temporal variations in facial...
Article
Topic modeling is a commonly used text analysis tool for discovering latent topics in a text corpus. However, while topics in a text corpus often exhibit a hierarchical structure (e.g., cellphone is a sub-topic of electronics), most topic modeling methods assume a flat topic structure that ignores the hierarchical dependency among topics, or utiliz...
Preprint
Full-text available
Research on automated mental health assessment tools has been growing in recent years, often aiming to address the subjectivity and bias that existed in the current clinical practice of the psychiatric evaluation process. Despite the substantial health and economic ramifications, the potential unfairness of those automated tools was understudied an...
Article
Full-text available
Data science has been described as the fourth paradigm for scientific discovery. The latest wave of data science research on machine learning and artificial intelligence (AI) is growing exponentially and garnering millions of annual citations. However, this growth has been accompanied by a diminishing emphasis on social good challenges analysis rev...
Article
Full-text available
Background: Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous transcription services using ASR, few studies have compared the word error rate (WER) between different transcription services among different diagnostic groups in a mental health setting. There has...
Preprint
Objective: The current clinical practice of psychiatric evaluation suffers from subjectivity and bias, and requires highly skilled professionals that are often unavailable or unaffordable. Objective digital biomarkers have shown the potential to address these issues. In this work, we investigated whether behavioral and physiological signals, extrac...
Preprint
Background: Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous HIPAA-compliant transcription services using ASR, few studies have compared the word error rate (WER) between different transcription services among different diagnostic groups in a mental health se...
Article
Digital experiments are routinely used to test the value of a treatment relative to a status quo control setting — for instance, a new search relevance algorithm for a website or a new results layout for a mobile app. As digital experiments have become increasingly pervasive in organizations and a wide variety of research areas, their growth has pr...
Article
Full-text available
Analysts, managers, and policymakers are interested in predictive analytics capable of offering better foresight. It is generally accepted that in forecasting scenarios involving organizational policies or consumer decision making, personal characteristics, including personality, may be an important predictor of downstream outcomes. The inclusion o...
Article
Full-text available
Phishing is a significant security concern for organizations, threatening employees and members of the public. Phishing threats against employees can lead to severe security incidents, whereas those against the public can undermine trust, satisfaction, and brand equity. At the root of the problem is the inability of Internet users to identify phish...
Article
Phishing websites have become a critical cybersecurity threat affecting individuals and organizations. Phishing-website detection tools are designed to protect users against such sites. Nevertheless, detection tools face serious user trust and suboptimal performance issues which require trust calibration to align trust with the tool’s capabilities. We e...
Article
Full-text available
Adverse event detection is critical for many real-world applications including timely identification of product defects, disasters, and major socio-political incidents. In the health context, adverse drug events account for countless hospitalizations and deaths annually. Since users often begin their information seeking and reporting with online se...
Preprint
Psychometric measures of ability, attitudes, perceptions, and beliefs are crucial for understanding user behaviors in various contexts including health, security, e-commerce, and finance. Traditionally, psychometric dimensions have been measured and collected using survey-based methods. Inferring such constructs from user-generated text could affor...
Article
Full-text available
The scholarly information-seeking process for behavioral research consists of three phases: search, access, and processing of past research. Existing IT artifacts, such as Google Scholar, have in part addressed the search and access phases, but fall short of facilitating the processing phase, creating a knowledge inaccessibility problem. We propose...
Article
Full-text available
The authors examine consumers’ information channel usage during the customer journey by employing a hedonic and utilitarian (H/U) perspective, an important categorization of consumption purpose. Taking a retailer-category viewpoint to measure the H/U characteristics of 20 product categories at 40 different retailers, this study combines large-scale...
Article
Full-text available
Psychometric measures reflecting people’s knowledge, ability, attitudes, and personality traits are critical for many real-world applications, such as e-commerce, health care, and cybersecurity. However, traditional methods cannot collect and measure rich psychometric dimensions in a timely and unobtrusive manner. Consequently, despite their import...
Article
Full-text available
With greater impetus on broad postmarket surveillance, the Voice of the Customer (VoC) has emerged as an important source of information for understanding consumer experiences and identifying potential issues. In organizations, risk management groups are increasingly interested in working with their information technology teams to develop robust Vo...
Article
This research examines the roles of health literacy, health numeracy, and trust in doctor on: 1) patient anxiety when consulting a doctor; 2) frequency of physician consultations; and 3) patient subjective well‐being (SWB). Our sample consisted of 4,040 adults representative of the U.S. in terms of age, income, and education, but equally split amon...
Article
Full-text available
Historical events and the illumination of unequal treatment of cardiovascular and other diseases among African Americans and their white counterparts have suppressed African Americans’ participation in research. Approaches that bring scientific professionals into actual partnership with affected communities show promise for overcoming this reluctan...
Article
We propose a Discussion Logic-based Text Analytics (DiLTA) framework, which combines theories developed in social science and text mining fields. The framework extracts features that uncover discussion logic and uses these features in analyzing online discussions. A series of models are proposed including conversation disentanglement, coherence ana...
Chapter
We propose a method to evaluate adverse drug event (ADE) narratives using biomedical semantic similarity measures. Automated drug surveillance systems have used social media as a prime resource to detect ADEs. However, the problem of language usage over social media has been a challenge in evaluating the performance of such systems. We address this...
Article
Full-text available
Social media and online communities provide organizations with new opportunities to support their business-related functions. Despite their various benefits, social media technologies present two important challenges for sense-making. First, online discourse is plagued by incoherent, intertwined conversations that are often difficult to comprehend....
Article
Full-text available
Twitter has emerged as a major social media platform and generated great interest from sentiment analysis researchers. Despite this attention, state-of-the-art Twitter sentiment analysis approaches perform relatively poorly with reported classification accuracies often below 70%, adversely impacting applications of the derived sentiment information...
Article
Full-text available
As more firms adopt big data analytics to better understand their customers and differentiate their offerings from competitors, it becomes increasingly difficult to generate strategic value from isolated and unfocused ad hoc initiatives. To attain sustainable competitive advantage from big data, firms must achieve agility in combining rich data acr...
Article
Full-text available
In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein...
Conference Paper
Full-text available
In this work, we study string-based approaches for the problem of RNA-Protein Interaction (RPI). We apply string algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences), and structure information (protein and RNA secondary structures). This led to differe...
Conference Paper
Full-text available
The number of active, online phishing websites continues to grow unabated in recent years. This has created an ever-increasing security risk for both individual and enterprise users in terms of identity theft, malware, financial loss, etc. Although resources exist for tracking, cataloguing, and blacklisting these types of sites (e.g., PhishTank.com...
Conference Paper
Full-text available
Phishing website-based attacks remain pervasive, with high user susceptibility continuing to be a major factor. In this study we use cluster analysis coupled with an elaborate controlled experiment involving hundreds of participants to identify and examine high susceptibility user segments in terms of their perceptions, demographics, and phishing w...
Conference Paper
Full-text available
Regulators, analysts, policy-makers, and advocacy groups are increasingly interested in utilizing the abundance of available online Health 2.0 content to support key decision-making tasks. However, existing systems are ill-suited to deal with the plethora of medical spam and variety of relevant online channels. We present a prototype system for col...
Conference Paper
Full-text available
The accumulated literature base in the behavioral sciences represents a great source of knowledge on human behaviors, and yet the same literature has grown beyond human comprehension. We address this information overload problem by proposing a novel IT artifact, TheoryOn. Based on the design science paradigm, we identify five design requirements. We...
Article
Full-text available
Big data has received considerable attention from the information systems (IS) discipline over the past few years, with several recent commentaries, editorials, and special issue introductions on the topic appearing in leading IS outlets. These papers present varying perspectives on promising big data research topics and highlight some of the chall...
Article
Full-text available
Behavior prediction has become an important area of emphasis, with applications ranging from e-commerce, marketing analytics, and financial forecasting to smart health, security informatics, and crime prevention. However, traditional behavior modeling approaches have shortcomings: heavy reliance on objective, observed data, and a failure to conside...
Article
Full-text available
The guest editors of this special issue on predictive analytics discuss the second part of their split special issue with a look at the micro level. The four articles are nice examples of predictive modeling, constituting a broad body of work with appropriate commonalities and collectively providing important takeaways for the research and practiti...
Article
Full-text available
By successfully exploiting human vulnerabilities, fake websites have emerged as a major source of online fraud. Fake websites continue to inflict exorbitant monetary losses and also have significant ramifications for online security. We explore the process by which salient performance-related elements could increase the reliance on protective tools...
Article
Full-text available
The guest editors of this special issue on predictive analytics discuss how they split submissions between macro and micro levels, with the March/April installment covering the macro scale and May/June covering micro. All three articles are nice exemplars of predictive analytics, encompassing novel insights, key nuances, rigorous analytical methods...
Article
Full-text available
Phishing websites continue to successfully exploit user vulnerabilities in household and enterprise settings. Existing anti-phishing tools lack the accuracy and generalizability needed to protect Internet users and organizations from the myriad of attacks encountered daily. Consequently, users often disregard these tools' warnings. In this study, u...
Article
The accumulated literature base in the behavioral sciences represents the IS discipline’s greatest source of knowledge, and yet the same literature has grown beyond human comprehension. An experiment is conducted showing the inability of experts to retrieve relevant constructs using full-text search. To address this inability to access the body of...
Article
Full-text available
Methods and tools to conduct authorship analysis of web contents are of growing interest to researchers and practitioners in various security-focused disciplines, including cybersecurity, counter-terrorism, and other fields in which authorship of text may at times be uncertain or obfuscated. Here we demonstrate an automated approach for authorship a...
Conference Paper
Full-text available
Twitter has become one of the quintessential social media platforms for user-generated content. Researchers and industry practitioners are increasingly interested in Twitter sentiments. Consequently, an array of commercial and freely available Twitter sentiment analysis tools have emerged, though it remains unclear how well these tools really work....
Article
Full-text available
This special section of "Trends & Controversies" focuses on social media analytics for smart health. The introduction, called "Social Media Analytics for Smart Health," is provided by Ahmed Abbasi and Donald Adjeroh. Then Mark Dredze and Michael J. Paul have written "Natural Language Processing for Health and Social Media." Next, Fatemeh "Mariam" Z...
Article
Full-text available
The ability to automatically detect fraudulent escrow websites is important in order to alleviate online auction fraud. Despite research on related topics, fake escrow website categorization has received little attention. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis...
Article
Full-text available
Fake websites have emerged as a major source of online fraud, accounting for billions of dollars of loss by Internet users. We explore the process by which salient design elements could increase the use of protective tools, thus reducing the success rate of fake websites. Using the protection motivation theory, we conceptualize a model to investiga...
Article
Full-text available
Fake online pharmacies have become increasingly pervasive, constituting over 90% of online pharmacy websites. There is a need for fake website detection techniques capable of identifying fake online pharmacy websites with a high degree of accuracy. In this study, we compared several well-known link-based detection techniques on a large-scale test b...
Article
Full-text available
Existing fake website detection systems are unable to effectively detect fake websites. In this study, we advocate the development of fake website detection systems that employ classification methods grounded in statistical learning theory (SLT). Experimental results reveal that a prototype system developed using SLT-based methods outperforms seven...
Article
Full-text available
Social tagging, as a novel approach to information organization and discovery, has been widely adopted in many Web2.0 applications. The tags provide a new type of information that can be exploited by recommender systems. Nevertheless, the sparsity of ternary <user, tag, item> interaction data limits the performance of tag-based collaborative filter...
Conference Paper
Full-text available
With the rapid growth in available genomic data, robust and efficient methods for identifying RNA secondary structure elements, such as hairpins, have become a significant challenge in computational biology, with potential applications in prediction of RNA secondary and tertiary structures, functional classification of RNA structures, micro RNA tar...
Conference Paper
Full-text available
Twitter sentiment analysis has become widely popular. However, stable Twitter sentiment classification performance remains elusive due to several issues: heavy class imbalance in a multi-class problem, representational richness issues for sentiment cues, and the use of diverse colloquial linguistic patterns. These issues are problematic since many...
Conference Paper
Social intelligence derived from Health 2.0 content has become of significant importance for various applications, including post-marketing drug surveillance, competitive intelligence, and to assess health-related opinions and sentiments. However, the volume, velocity, variety, and quality of online health information present challenges, necessitat...
Article
Full-text available
Social tagging, as a novel approach to information organization and discovery, has been widely adopted in many Web 2.0 applications. Tags contributed by users to annotate a variety of Web resources or items provide a new type of information that can be exploited by recommender systems. Nevertheless, the sparsity of the ternary interaction data amon...
Conference Paper
Full-text available
Analyzing authorship of online texts is an important analysis task in security-related areas such as cybercrime investigation and counter-terrorism, and in any field of endeavor in which authorship may be uncertain or obfuscated. This paper presents an automated approach for authorship analysis using machine learning methods, a robust stylometric f...
Article
Full-text available
Financial fraud can have serious ramifications for the long-term sustainability of an organization, as well as adverse effects on its employees and investors, and on the economy as a whole. Several of the largest bankruptcies in U.S. history involved firms that engaged in major fraud. Accordingly, there has been considerable emphasis on the develop...
Article
Full-text available
Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tu...
Article
Full-text available
Fake medical Web sites have become increasingly prevalent. Consequently, much of the health-related information and advice available online is inaccurate and/or misleading. Scores of medical institution Web sites are for organizations that do not exist and more than 90% of online pharmacy Web sites are fraudulent. In addition to monetary losses exac...
Conference Paper
Full-text available
Users on the web are unknowingly becoming more susceptible to scams from cyber deviants and malicious websites. There has been much work in the identification of malicious websites using application layer features based on content (HTML, images, links, etc.) and a plethora of classification techniques. However, there has been little work on using f...
Conference Paper
Full-text available
Phishing website-based attacks continue to present significant problems for individual and enterprise-level security, including identity theft, malware, and viruses. While the performance of anti-phishing tools has improved considerably, it is unclear how effective such tools are at protecting users. In this study, an experiment involving over 400...
Conference Paper
Full-text available
This paper examines the salient factors in the calibration of trust in automated fake website detection tools. Drawing upon the trust-in-automation theory, we propose the human-automation contrast (HAC) theory in which the parallels of human-to-human versus human-to-automation interactions and trust development processes are discussed. We then rely...
Conference Paper
Full-text available
Additive or overlapping clustering is a technique that is used to analyze overlapping cluster structure in data. In this paper, we motivate the overlapping clustering problem using an example of categorizing movies. We describe the ADCLUS and INDCLUS overlapping clustering models as discrete versions of the CANDECOMP/PARAFAC models. We describe the...
Conference Paper
Anti-phishing systems are developed to prevent users from interacting with fraudulent websites. However these tools are ineffective since users often disregard their warnings. We present a design science-based assessment of interface design elements for such systems. An extensive taxonomy of important design elements is constructed. A survey is use...
Conference Paper
Full-text available
Despite the prevalence of sentiment-related content on the Web, there has been limited work on focused crawlers capable of effectively collecting such content. In this study, we evaluated the efficacy of using sentiment-related information for enhanced focused crawling of opinion-rich web content regarding a particular topic. We also assessed the i...
Article
Full-text available
This paper presents a cyber-archaeology approach to social movement research. The approach overcomes many of the issues of scale and complexity facing social research in the Internet, enabling broad and longitudinal study of the virtual communities supporting social movements. Cultural cyber-artifacts of significance to the social movement are coll...
Article
Full-text available
Fake websites have become increasingly pervasive, generating billions of dollars in fraudulent revenue at the expense of unsuspecting Internet users. The design and appearance of these websites makes it difficult for users to manually identify them as fake. Automated detection systems have emerged as a mechanism for combating fake websites, however...
Article
Full-text available
Opinion mining, a subdiscipline within data mining and computational linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online news sources, social media comments, and other user-generated content. This Trends & Controversies department and the previous one...
Article
Full-text available
The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content...
Chapter
Full-text available
Framing a collective identity is an essential process in a social movement. The identity defines the orientation of public actions to take and establishes an informal interaction network for circulating important information and material resources. While domestic social movements emphasize the coherence of identity in alliance, global or cyber-acti...
Article
Full-text available
Although text opinion mining involves many important tasks, accurately assigning sentiment polarities (such as positive, negative, or neutral) and intensities (such as high or low) remains a critical challenge. Given the complexities and nuances associated with opinion classification, it is generally considered more difficult than traditional text mining tasks...
Article
Full-text available
As fake Website developers become more innovative, so too must the tools used to protect Internet users. A proposed system combines a support vector machine classifier and a rich feature set derived from Website text, linkage, and images to better detect fraudulent sites.
