
Sebastian Ventura, Ph.D.
- Professor (Full) at University of Córdoba
About
436 Publications
317,328 Reads
22,917 Citations
Introduction
Sebastián Ventura is a Professor of Computing and Artificial Intelligence at the University of Córdoba. His teaching is devoted to machine learning and artificial intelligence. He heads, and conducts research within, the "Knowledge Discovery and Intelligent Systems" (KDIS) research group, where his work focuses on data science, data analytics, big data, machine learning, data mining, and their applications.
Current institution
Additional affiliations
April 2013 - January 2023
September 1998 - April 2016
Editor roles
Education
September 1993 - July 1996
Publications (436)
This paper was originally presented at the NATO Science and Technology Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-209-RSY-the ICMCIS, held in Oeiras, Portugal, 13-14 May 2025. This study examines two use cases involving the application of large language models (LLMs) in knowledge management (KM) and predicti...
In the field of anomaly detection in time series, remarkable advances based on deep learning methodologies and, more specifically, reconstruction-based methods have been proposed. These methods are particularly valuable, as they can capture the fundamental structure of the data and enable the detection of subtle anomalies that traditional technique...
Predictive Maintenance (PdM) emerges as a critical task of Industry 4.0, driving operational efficiency, minimizing downtime, and reducing maintenance costs. However, real-world industrial environments present unsolved challenges, especially in predicting simultaneous and correlated faults under evolving conditions. Traditional batch-based and deep...
Text classification plays a fundamental role in Natural Language Processing (NLP) and is essential for applications such as sentiment analysis, topic labeling, and language detection. While using graphs to represent text shows promise for capturing complex relationships between words and documents, current methods often fall short in encoding seman...
The maintenance advancements achieved in Industry 4.0 generate large amounts of data, necessitating complete, accurate, and precise labels for training datasets to align with corresponding ground truth. These labels serve as annotations for early anomaly detection. Delivering high‐quality annotations derived from weak labels and striking a balance...
Higher education institutions actively integrate information and communication technologies through learning management systems (LMS), which are crucial for online education. This study used data mining techniques to predict the autonomous scores of students in the online Law and Psychology programs at the Technical University of Manabí. The proces...
The class imbalance issue is evident in medical datasets, posing a hurdle for accurate predictive modeling. Maintaining the natural characteristics of medical data while mitigating this issue is paramount in ensuring the success of clinical decision support systems. Therefore, this study proposes a genetic algorithm-based data selection method (GA-...
Recommending suitable housing faces significant challenges due to the continuous increase in demand and the need to meet habitability standards. This document presents an innovative approach to address these challenges through a housing recommendation method based on distances to key spatial points and latent characteristics of the properties. The...
Hyperparameter optimization of machine learning models is crucial for refining them correctly. For large, complex models (such as deep learning models), where training a single model has a very high computational cost, this optimization can become infeasible. Multi-fidelity optimization algorithms are a solution to alleviate...
Sentiment analysis on big data presents unique challenges due to the volume of unstructured data. Traditional single-node systems struggle with this scale, necessitating the use of distributed computing systems like Apache Spark. This study investigates the role of large-scale data preprocessing and feature extraction in sentiment analysis tasks. W...
Alterations in alternative splicing are emerging as a novel hallmark in cancer biology, offering new insights. However, integrative analyses of splicing are still scarce, particularly in rare cancers such as pancreatic neuroendocrine tumors (PanNETs). These tumors are highly heterogeneous, complicating diagnosis and treatment. This study is the fir...
Data stream learning is a very relevant paradigm because of the increasing real-world scenarios generating data at high velocities and in unbounded sequences. Stream learning aims at developing models that can process instances as they arrive, so models constantly adapt to new concepts and the temporal evolution in the stream. In multi-label data s...
This study aims to explore the possibilities of weak supervision applied to maintenance in the military domain, implementing the proposals of Industry 4.0 and 5.0. We focus on the monitoring of variables, their labeling, and the development of models using weakly supervised learning techniques, in accordance with the...
Machine learning and medical diagnostic studies often struggle with the issue of class imbalance in medical datasets, complicating accurate disease prediction and undermining diagnostic tools. Despite ongoing research efforts, specific characteristics of medical data frequently remain overlooked. This article comprehensively reviews advances in add...
Entrepreneurial activity, a subject of enduring intrigue among scholars, continues to captivate attention, especially in distinct contexts such as Morocco. This study undertakes the formidable task of comprehending and forecasting entrepreneurial activity using the comprehensive Global Entrepreneurship Monitor (GEM) dataset for Morocco. Employing a...
In the mid-20th century, the Illiac Suite for string quartet was composed; it is considered the first work in which a computer was used during the composition process. This marked a significant milestone for technology as a generator of music through algorithms, thanks to the pioneering work carried out...
Unemployment, a significant economic and social challenge, triggers repercussions that affect individual workers and companies, generating a national economic impact. Forecasting the unemployment rate becomes essential for policymakers, allowing them to make short-term estimates, assess economic health, and make informed monetary policy decisions....
In recent years, significant attention has been paid to fuzzy recommender systems for housing, highlighting their ability to effectively handle the imprecision and uncertainty inherent in the real estate market. With the objective of improving the filtering of recommendations in the real estate sector, the PRISMA 2020 methodology was applied to per...
Sequential pattern mining is a dynamic and thriving research field that aims to extract recurring sequences of events from complex datasets. Traditionally, focusing solely on the order of events often falls short of providing precise insights. Consequently, incorporating the temporal intervals between events has emerged as a vital necessity across...
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitra...
Background
Lung neuroendocrine neoplasms (LungNENs) comprise a heterogeneous group of tumors ranging from indolent lesions with good prognosis to highly aggressive cancers. Carcinoids are the rarest LungNENs, display low to intermediate malignancy and may be surgically managed, but show resistance to radiotherapy/chemotherapy in case of metastasis....
Predicting student dropout is a crucial task in online education. Traditionally, each educational entity (institution, university, faculty, department, etc.) creates and uses its own prediction model starting from its own data. However, that approach is not always feasible or advisable and may depend on the availability of data, local infrastructur...
Predictive maintenance has marked an important milestone in the way industrial systems are analyzed in order to detect operating anomalies and possible failures before they occur. This work presents an Advanced Sustainment Tool (Herramienta de Sostenimiento Avanzado, HSA) of the Spanish Army (Ejército de Tierra) that improves the planning...
This paper introduces a spiking neural network able to learn multiple tasks using their unique characteristic, namely, that their behavior can be changed based on the modulation of the firing threshold of spiking neurons. We designed and tested a threshold-modulated spiking neural network (TM-SNN) to solve multiple classification tasks using the ap...
Super-resolution is an area of Computer Vision comprising various techniques to recover a high-resolution image from a low-resolution counterpart. These techniques can also be used to enhance a low-resolution input image without a native high-resolution original. Single Image Super-Resolution (SISR) techniques aim to do this in a picture-by-picture...
Clustering is an unsupervised learning task that groups objects in a multi-dimensional space based on similarity criteria. The goal is to form groups whose objects are similar to each other and different from those in other groups. This work proposes a novel genetic algorithm to solve the partition-based clustering problem and estimate aut...
Knowledge extraction through machine learning techniques has been successfully applied in a large number of application domains. However, apart from the required technical knowledge and background in the application domain, it usually involves a number of time-consuming and repetitive steps. Automated machine learning (AutoML) emerged in 2014 as an...
The backpropagation through time (BPTT) learning rule has enabled the supervised training of deep spiking neural networks to process temporal neuromorphic data. However, their performance is still below that of non-spiking neural networks. Previous work pointed out that one of the main causes is the limited amount of neuromorphic data currently available,...
Higher education institutions currently face several challenges with computerized assessment systems in reaching the knowledge embedded in unstructured texts. Applying sentiment analysis through machine learning facilitates the exploration of unstructured texts for educational management...
Peer assessment can be useful at every educational level; one tool used for this type of assessment is the rubric, an instrument whose main purpose is to share the criteria for carrying out learning and assessment tasks with students and among teaching staff. The purpose of this research...
Detecting common and unique characteristics among different cancer subtypes is an important research focus aimed at improving personalized therapies. Unlike current approaches, which are mainly based on predictive techniques, our study aims to improve, in a descriptive way, the knowledge of the molecular mechanisms that lead to cancer, thus not r...
Mining high utility itemsets is an emerging and very active research area in data mining. The goal is to mine all itemsets with a utility value, in terms of importance to the user, no less than a predefined threshold value. Setting an appropriate threshold value is not trivial, requiring not only multiple trials but also the know-how in the applica...
Abstract: There are numerous, increasingly relevant classification problems in which a pattern can be assigned several classes simultaneously. These problems, known as multi-label classification problems, must be addressed with specific techniques that generate more accurate classification models than those obtained through...
Early melanoma diagnosis is the most important factor in the treatment of skin cancer and can effectively reduce mortality rates. Recently, Generative Adversarial Networks have been used to augment data, prevent overfitting and improve the diagnostic capacity of models. However, their application remains a challenging task due to the high levels of i...
Teacher evaluation is a subject of great interest, in which multiple efforts converge to build models by associating heterogeneous data from academic actors. One of these actors is the student community, which stands out for contributing rich data for the establishment of teacher evaluation in higher...
Recently, Convolutional Neural Networks have achieved performance levels similar to those achieved by dermatologists. However, the diagnosis of melanoma remains a challenging task, mainly due to the high inter- and intra-class variability in images of moles. This paper introduces a new framework to improve the state-of-the-art effective melanoma dia...
In the airline industry, the Revenue and Pricing teams generally spend a considerable amount of time analysing and interpreting the actions of their competitors. Most of the time the analysts have to use their analytical skills to create ad-hoc methods to interpret or find patterns in the fares. In this fi...
Students’ performance prediction is one of the essential educational data mining research fields. Predicting students’ performance aims at improving the learning process inside educational institutions. This is achieved by early prediction of at-risk students who are vulnerable to drop out to help them and improve their performance sooner. Therefor...
In this work, we consider the parameters obtained from the sample analyses carried out by the Army Central Laboratory (Laboratorio Central de Ejército, LCE), whose purpose is to determine the fitness for service of the lubricating oils and hydraulic fluids used in the platforms of the Spanish Army (Ejército de Tierra). From these, a study...
Maintenance of industrial facilities has always been a critical task for guaranteeing the proper functioning and availability of systems. Traditional maintenance strategies have been characterized by corrective and preventive approaches. However, recent advances in sensorization and machine learning have driven...
In this paper we explore the capabilities of spiking neural networks in solving multi-task classification problems using the approach of single-tasking of multiple tasks. We designed and implemented a multi-task spiking neural network (MT-SNN) that can learn two or more classification tasks while performing one task at a time. The task to perform is se...
The multi-label classification task has been widely used to solve problems where each of the instances may be related not only to one class but to many of them simultaneously. Many of these problems usually comprise a high number of labels in the output space, so learning a predictive model from such datasets may turn into a challenging task since...
Providing a good study plan is key to avoiding student failure. Academic advising based on students’ preferences, the complexity of the semester, or even background knowledge is usually considered to reduce the dropout rate. This article aims to provide a good course index to recommend courses to students based on the sequence of courses already taken...
Dysregulation of the splicing machinery is emerging as a hallmark in cancer due to its association with multiple dysfunctions in tumor cells. Inappropriate function of this machinery can generate tumor-driving splicing variants and trigger oncogenic actions. However, its role in pancreatic neuroendocrine tumors (PanNETs) is poorly defined. In this...
Predictive maintenance is a field of study whose main objective is to optimize the timing and type of maintenance to perform on various industrial systems. This aim involves maximizing the availability time of the monitored system and minimizing the number of resources used in maintenance. Predictive maintenance is currently undergoing a revolution...
In the first part of this work, we present the results of a survey carried out among Andalusian companies and self-employed workers on their knowledge and use of this technology in the business environment. This study revealed some results that should be considered by the different agents and institutions in order to promote the adoption of...
Student engagement reflects the level of involvement in an ongoing learning process, which can be estimated through interactions with a computer-based learning or assessment system. A prerequisite for stimulating student engagement is the ability to have an approximate representation model for comprehending students’ varied (dis...
Peer evaluation consists of the evaluation of students by their peers following criteria or rubrics provided by the teacher, where the way to evaluate students is specified so that they achieve the desired competencies. The quality of the measurement instrument must meet two essential criteria: validity and reliability. In this research, we explore...
Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, so they lend themselves to automation. For example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. In this vein, the term AutoML was coined to e...
Applying data mining to improve the outcomes of the educational process has become one of the most significant areas of research. The most important cornerstone of the educational process is students' performance. Therefore, early prediction of students' performance aims to assist at-risk students by providing appropriate and early support and...
In this paper, we applied a peer assessment scenario at the Technical University of Manabí (Ecuador). Students and professors evaluated a set of works using rubrics, assigned a numerical score, and provided textual feedback justifying that score, in order to detect inconsistencies between the two assessments. The proposed model uses soft...
Background
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as both a novel cancer hallmark and an attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by Plad...
Melanoma is one of the main causes of cancer-related deaths. The development of new computational methods as an important tool for assisting doctors can lead to early diagnosis and effectively reduce mortality. In this work, we propose a convolutional neural network architecture for melanoma diagnosis inspired by ensemble learning and genetic algor...
Skin cancer is one of the most common types of cancer in the world, and melanoma is its most lethal form. Automatic melanoma diagnosis from skin images has recently gained attention within the machine learning community, due to the complexity involved. In the past few years, convolutional neural network models have been commonly used to approach th...
Abstract: The clustering (grouping) task consists of finding the best grouping of patterns according to a similarity or dissimilarity criterion between them. The aim is for patterns within a cluster to be very similar to each other and dissimilar to those in other clusters. Defining the similarity criterion between patterns proves...
Multi-label classification has been used to solve a wide range of problems where each example in the dataset may be related either to one class (as in traditional classification problems) or to several class labels at the same time. Many ensemble-based approaches have been proposed in the literature, aiming to improve the performance of traditional...
Background
Pancreatic ductal adenocarcinoma (PDAC) remains an appallingly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as a novel cancer hallmark and attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by P...
This paper presents an approach based on emerging pattern mining to analyse cancer through genomic data. Unlike existing approaches, which mainly focus on predictive purposes, the proposed approach aims to improve the understanding of cancer in a descriptive way, requiring neither prior knowledge nor a hypothesis to be validated. Additionally, it e...
The propositionalization process tries to find distinctive features of the examples in a database to transform such relational data into a simpler representation. More informative features have a positive impact on the classification capabilities of the learning algorithms. In this work, we propose a new propositionalization method, which generates...
In this paper we present a Competitive Rate-Based Algorithm (CRBA) that approximates the operation of a Competitive Spiking Neural Network (CSNN). CRBA is based on modeling the competition between neurons during a sample presentation, which can be reduced to ranking the neurons by a dot product operation and the use of a discrete Expectatio...
BACKGROUND:
The dataset from genes used for the prediction of HCV outcome was evaluated in a previous study by means of conventional statistical methodology.
OBJECTIVE:
The aim of this study was to reanalyze this same dataset using a data mining approach in order to find models that improve the classification accuracy of the genes studied.
METHO...
The Multi-Target Regression problem comprises the prediction of multiple continuous variables given a common set of input features, unlike traditional regression tasks, where just one output target is available. There are two major challenges when addressing this problem, namely the exploration of the inter-target dependencies and the modeling of compl...
The current state of the art in supervised descriptive pattern mining is very good at automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds...
Periodic frequent patterns are sets of events or items that periodically appear in a sequence of events or transactions. Many algorithms have been designed to identify periodic frequent patterns in data. However, most assume that the periodic behavior of a pattern does not change much over time. To address this limitation, this paper proposes to di...

![Labeling scheme adapted from [19] and sample](profile/Maricela-Pinargote-Ortega/publication/377484925/figure/tbl2/AS:11431281218538470@1705625769855/Labeling-Scheme-adapted-of-19-and-Sample_Q320.jpg)

![Case CR=1](publication/376719641/figure/fig5/AS:11431281240993017@1714874500899/Case-CR1documentclass12ptminimal-usepackageamsmath-usepackagewasysym_Q320.jpg)

![Analysis of the convergence of (ES)2](publication/360032261/figure/fig5/AS:1180826825953317@1658542462486/Analysis-of-the-convergence-of-ES2documentclass12ptminimal-usepackageamsmath_Q320.jpg)