
Nung Kion Lee
Universiti Malaysia Sarawak · Faculty of Cognitive Sciences and Human Development
PhD Computer Science
Publications: 56
Reads: 17,560
Citations: 294
Introduction
I am an Associate Professor at the Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak. I am interested in developing machine learning algorithms and applying them to interesting and challenging real-world problems. Information about my research lab can be found at https://sites.google.com/site/leeailab/.
Additional affiliations
June 2019 - present
Universiti Malaysia Sarawak
Position
- Professor (Associate)
Description
- Associate Professor in Artificial Intelligence
May 2012 - May 2019
Universiti Malaysia Sarawak
Position
- Professor (Associate)
Education
September 2007 - November 2012
Publications (56)
The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to-date. One of the most efficacious treatments for naïve or pretreated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is life-long, the emergence of HIV strains resistant to INSTIs is an...
Automated Essay Scoring (AES) is a service or software that predictively grades essays using a pre-trained computational model. It has attracted considerable research interest in educational institutions because it expedites grading and reduces the effort of human raters while producing scores close to human decisions. Despite the strong appeal, i...
The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatments for naive or pre-treated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is life-long, the emergence of HIV-1 strains resistant to INSTIs...
The Human Immunodeficiency Virus (HIV) infection is a global pandemic that has claimed 33 million lives to date. One of the most efficacious treatments for naïve or pre-treated HIV patients is the HIV integrase strand transfer inhibitors (INSTIs). However, given that HIV treatment is lifelong, the emergence of HIV-1 strains resistant to INSTIs...
Aims: Aquaculture has grown tremendously in Malaysia over the past decades. However, guaranteeing aquaculture sustainability is a big challenge in terms of maintaining continuous output with a safe environment. Furthermore, the cultured species should be free from antibiotic-resistant bacteria and antibiotic residues. This study aimed to monitor t...
Enhancers are indispensable DNA elements responsible for elevating gene transcriptional efficiency, which tightly regulates biological processes at various developmental stages, linking them to numerous genetic diseases. Discovering the enhancer landscape of the genome will not only benefit mankind, but also aid in conservation research involvin...
Enhancers are indispensable elements in various developmental stages, orchestrating numerous biological processes via the elevation of gene expression with the aid of transcription factors. Enhancer variations have been linked to the onset of various genetic diseases, highlighting that they are as important as the coding regions of the genome. Despite the...
This paper presents the development of an automated essay scoring mechanism based on the Malaysian University English Test essay marking criteria using Design-based research (DBR). It is a learning intervention to facilitate students in their essay writing process and, at the same time, serves as a tool for teachers to mark essays. DBR is the mos...
MORTIS: A MORTuary registry Information System
Poh KJ, Norliza I, Tung Mun Yei¹, Bong C.H.¹, Lee N.K², & M. Hamdi M³
Forensic Department, Sarawak General Hospital (SGH), Jalan Hospital, 93586 Kuching Sarawak.
¹Universiti Malaysia Sarawak (UNIMAS), Faculty of Computer Science and Information Technology (FCSIT).
²Universiti Malaysia Sarawak (UNIMAS), F...
Computational DNA motif discovery is important because it allows for speedy and cost effective analysis of sequences enriched with DNA motifs, performs large scale comparative studies, and tests hypotheses on biological problems. In this work, we provide a comprehensive survey on DNA motif discovery using genetic algorithm (GA). According to the wa...
The race for the discovery of enhancers at a genome-wide scale has been on since the commencement of next generation sequencing decades after the discovery of the first enhancer, SV40. A few enhancer-predicting features such as chromatin feature, histone modifications and sequence feature had been implemented with varying success rates. However, to...
We propose an improved solution to the three-stage DNA motif prediction approach. The three-stage approach uses only a subset of input sequences for initial motif prediction, and the initial motifs obtained are employed for site detection in the remaining input subset of non-overlaps. The currently available solution is not robust because motifs ob...
The aim of this study is to develop and assess mobile-based learning on the Picture Exchange Communication System (PECS) for caregivers of children with Autism Spectrum Disorder (ASD). Being an inexpensive intervention, PECS has been recommended by parents who have practised it with their children with non-verbal and behavioral disabilities. The...
Job recruitment portals have become the main recruitment channel for most organizations because they offer many advantages to recruiters and job applicants. An outstanding recruitment system should be able to filter and recommend the best potential candidates for a job vacancy so that it can avoid the hiring of inappropriate individuals or mi...
Convolutional neural networks (CNNs) have been widely used for DNA motif discovery due to their high accuracy. To employ a CNN for the DNA motif discovery task, the input DNA sequences must be encoded as numerical values and represented as either a vector or a multi-dimensional matrix. This paper evaluates the simple and more compact ordinal encoding...
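As a rough illustration of the two input representations the abstract above contrasts, the following minimal Python sketch encodes a DNA string either as one number per base (ordinal) or as a four-column indicator matrix (one-hot). The ordinal values and the function names `ordinal_encode` and `one_hot_encode` are assumptions made for illustration; the abstract does not state the mapping the authors used.

```python
BASES = "ACGT"

# Hypothetical ordinal values (A=0.25, C=0.5, G=0.75, T=1.0); the abstract
# does not state which values the paper actually assigns to each base.
ORDINAL = {base: (i + 1) / 4 for i, base in enumerate(BASES)}


def ordinal_encode(seq):
    """Encode a DNA string as a flat list with one number per base."""
    # Raises KeyError for symbols outside A, C, G, T (for example N).
    return [ORDINAL[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Encode a DNA string as a len(seq) x 4 matrix of 0/1 indicators."""
    # Symbols outside A, C, G, T become an all-zero row.
    return [[1 if base == ref else 0 for ref in BASES] for base in seq.upper()]


if __name__ == "__main__":
    seq = "ACGT"
    print(ordinal_encode(seq))   # one value per position
    print(one_hot_encode(seq))   # four values per position
```

The ordinal form stores one value per position while the one-hot form stores four, which is the sense in which the ordinal encoding is more compact.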
The medical information system plays an important role in either the flow of business transactions or the recording of medication histories. In the Department of Nuclear Medicine of Sarawak General Hospital, the officers and doctors handle patient information manually during the registration and diagnosis process. Unfortunately, this manual documented system...
Genome annotation is an essential task for understanding and analyzing the whole genome and its function. We have sequenced the complete proboscis monkey (Nasalis larvatus) genome due to its importance for medical and evolutionary studies. We have performed an initial annotation of the genes in the genome using the MAKER gene annotation pipeline. 3084 gene...
In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, which are processed by several distinct tools for candidate motif prediction. The candidate motifs obtained from different data subsets are merged to obtain the final motifs. Nevertheless, the original ENSPART has seve...
Unravelling gene expression has become a critical procedure in bioinformatics today and requires continuous effort to form a complete picture of enhancers. Enhancers are explicit patterns of gene expression that are bound by activators to stimulate transcription. They can reside upstream or downstream, thousands of base pairs away, without any...
Epigenetic marks like chromatin remodelers and histone marks are eminent indicators of enhancer activity. A k-mer is a simple representation of DNA sequences that has been useful for epigenetic mark prediction. While many studies have used k-mers as features for epigenetic mark prediction, no comprehensive studies have been done to show sop...
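To make the k-mer representation mentioned in the abstract above concrete, here is a minimal Python sketch that counts overlapping k-mers in a DNA string and arranges the counts into a fixed-length feature vector. The function names `kmer_counts` and `kmer_vector` are hypothetical, and the choice of raw overlapping counts over the alphabet A, C, G, T is an assumption; the paper may normalise or parameterise its features differently.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k):
    """Return counts for all 4**k possible k-mers in lexicographic order."""
    counts = kmer_counts(seq, k)
    # Counter returns 0 for k-mers that never occur in the sequence.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    print(kmer_counts("ACGTAC", 2))
    print(len(kmer_vector("ACGTAC", 2)))  # 4**2 feature positions
```

The vector length grows as 4**k, so the choice of k trades off sequence specificity against feature dimensionality.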
Organizational culture defines an organization's uniqueness and identity. It is made up of values, beliefs, attitudes, norms, and patterns of behavior that are shared and adopted by individuals in the organization to cope with internal and external pressure. A computerized culture audit system is more cost-efficient and time-saving, and is less prone to...
Organizational culture expresses an organization's distinctiveness and identity. It is made up of values, beliefs, attitudes, norms and patterns of behavior that are shared and implemented by individuals in the organization. Understanding an organization's culture enables us to know why organizations do what they do and what they need to achieve. A...
Sequence logo is a well-accepted scientific method to visualize the conservation characteristics of biological sequence motifs. Previous studies found that using the sequence logo graphical representation for scientific evidence reports or arguments could cause serious biases and misinterpretation by users. This study investigates the visual attri...
This study proposes two modified frameworks of Bacterial Foraging Optimization Algorithm (BFOA) for data classification. The aim is to optimize and improve the k-nearest neighbor classifier performance by manipulating the global search capability of BFOA. To the best of our knowledge, no work has been done to fully utilize BFOA as a single classifi...
Recent advances in information and communication technology (ICT) infrastructure can be harnessed to support and improve the quality of teaching and learning of English writing skills, especially in a second language context where rule-based support is necessary. Essay writing is indeed one of the most demanding tasks for both teachers and students. From con...
Policy implementation science has moved from its traditional top-down "how-to" approach that focuses primarily on the policy implementers while neglecting those affected by the implementation, to the hybrid "what-and-how-to" approach, where the policy implementation mechanics incorporate the understanding of the "meaning" and "benefits" of...
Epigenetic signatures such as chromatin and histone modification marks are prominent indicators of enhancer motif regions. While many works have used k-mers as features of epigenetic sequences, no comprehensive study has been done to compare and contrast how different choices of k-mer feature parameters affect machine learning algorithm per...
Assessing essays and providing feedback to learners is undoubtedly a daunting, time-consuming task for language teachers, especially for formative assessment. Formative assessment requires feedback that indicates learning gaps and informs ideas for further improvement. Although providing high quality feedback is important, teachers are often in dile...
This paper presents a GA-based method to generate novel logical-based features, represented by parse trees, from DNA sequences enriched with H3K4me1 histone signatures. Current methods which mostly utilize k-mers content features are not able to represent the possible complex interaction of various DNA segments in H3K4me1 regions. We hypothesize th...
Sequence logo is a popular graphical method for displaying the conservation characteristics of a sequence motif profile. Studies have found that decision-making biases and misuse of sequence logos occur due to differences in individual perception, knowledge and experience. The aim of this study is to identify the differences between the novice...
Using a genetic algorithm, this paper presents a modelling method to generate novel logical-based features from DNA sequences enriched with H3K4me1 histone signatures. Current histone signatures are mostly represented using k-mer content features, which are incapable of representing all the possible complex interactions of various DNA segments. The main contribu...
Noisy objects are known to negatively affect the performance of clustering algorithms. This paper addresses the problem of high false positive rates in using a self-organizing map (SOM) for DNA motif prediction due to the noisy background sequences in the input dataset. We propose the use of a sequence filter in the pre-processing step to remo...
Sequence motif's characteristics are commonly visualized by using a sequence logo. This paper describes a user study aimed at evaluating the effectiveness of sequence logo as evaluation metric for motif prediction tools. We also investigate the nature of confirmation biases in using sequence logos in result reporting in publications. While sequence...
Sequence logo is an important tool to visualize DNA sequence motifs obtained from transcriptional analysis. In this paper, we aim to investigate its design features that could affect users' performance. We focus our analysis on the learnability of the sequence logo for performing motif evaluation tasks. In addition, its effectiveness as a motif eva...
Sequence Logo is a visualization method for displaying the conservation characteristics of a sequence (DNA, RNA, protein) motif profile obtained from either wet-lab or computational analysis. Usage of visualization in decision making carries some elements of subjectivity. In addition, people's decisions are often biased in favor of their proposed hyp...
Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering based algorithms employ homogeneous model for problem solving, which assumes that motifs and background signals can be equivalently characterized. This assumption has some limitations because bo...
We present a clustering algorithm called Self-organizing Map Neural Network with mixed signals discrimination (SOMIX), to discover binding sites in a set of regulatory regions. Our framework integrates a novel intra-node soft competitive procedure in each node model to achieve maximum discrimination of motif from background signals. The intra-node...
Understanding human information processing in information retrieval tasks has contributed to advancements in database modelling and user interaction design. In particular, the query writing model proposed by Ogden has improved our understanding of the core steps in writing a relational database query. However, this model is limited in explaining...
Identification of motifs in DNA sequences using classification techniques is one of computational approaches to discovering novel binding sites. In the previous work [16], we proposed a simple and effective method for motif detection using a single crisp rule governed by a mismatch-based matrix similarity score (MISCORE). In this paper, we consider...
Discovery of motifs plays a key role in understanding gene regulation in organisms. Existing tools for motif discovery demonstrate some weaknesses in dealing with reliability and scalability. Therefore, development of advanced algorithms for resolving this problem will be useful. This paper aims to develop data mining techniques for discovering mot...
To detect or discover motifs in DNA sequences, two important concepts related to existing computational approaches are the motif model and the similarity score. One motif model, represented by a position frequency matrix (PFM), has been widely employed to search for putative motifs. Detection and discovery of motifs can be done by comparing kmers with...
This paper presents an overview on the application of neural networks (NN) in bioinformatics, specifically in the classification of protein family/superfamily. Protein classification is important for both biological data analysis and knowledge discovery. NN has been one of the most widely used methods for protein classification. In this paper, deta...
Traditionally, two protein sequences are classified into the same class if their feature patterns have high homology. These feature patterns were originally extracted by sequence alignment algorithms, which measure similarity between an unseen protein sequence and identified protein sequences. Neural network approaches, while reasonably accurate at...
Neural classifiers have been widely used in many application areas. This paper describes generalized neural classifier based on the radial basis function network. The contributions of this work are: i) improvement on the standard radial basis function network architecture, ii) proposed a new cost function for classification, iii) hidden units featu...
In this paper, we would like to share our experiences in building a web database application using an object-relational paradigm. The system we built is basically an online system for casual tutors to claim their work for payment. At the design stage, we use an object-oriented design. Since the database backend is a relational database management s...
A protein super-family consists of proteins which share amino acid sequence homology and which may therefore be functionally and structurally related. Traditionally, two protein sequences are classified into the same class if they have high homology in terms of feature patterns extracted through sequence alignment algorithms. As the sizes of the pr...
A protein super-family consists of proteins which share amino acid sequence homology and which may therefore be functionally and structurally related. One of the benefits from this category grouping is that some hint of function may be deduced for individual members from information on other members of the family. Traditionally, two protein sequenc...
One of the collection join types in Object-Oriented Databases (OODB) is the collection equi-join. The main feature of collection joins is that they involve collection types. In this paper we present our experience in implementing collection equi-join algorithms using the Message Passing Interface (MPI). In particular, it lays out the fundamental techniqu...
Questions (2)
I have been teaching an ANN course for a number of years, mainly using traditional face-to-face lectures, blended learning with Moodle, and computer lab sessions using scikit-learn.
I would like to hear your ideas on how to make the course fun and prepare students with IR4.0 skills (communication, collaboration, creativity, critical thinking, etc.). I have looked at many online courses (Coursera, Udemy, Udacity, MIT OpenCourseWare, etc.), and most are based on traditional approaches.
By fun, I mean teaching and learning methods or in-class activities that engage students and are highly interactive. I would appreciate your sharing if you have experience teaching ANN or AI-related courses.
Sequence logos have been one of the most widely used methods for visualizing protein or DNA motifs over the past 20 years. They have been used in many ways, mainly to visually inspect the characteristics of predicted or discovered motifs and to evaluate motif discovery tools (i.e., in empirical studies). I find it very misleading to use sequence logos to compare computational motif prediction tools, because it is highly subjective to say that a logo, for example, "resembles", "is close in appearance to", or is "like" the known or annotated ones. My own small-scale study found a great mismatch between the "appearance" of a sequence logo and the actual performance of a tool. However, I am surprised that this visualization method is still widely accepted for evaluating computational motif discovery tools. My question is: why? What is your opinion of using this evaluation method?
![Fig. 1. Protein classification processes (Adapted from [5])](profile/Nung-Kion-Lee/publication/319552445/figure/fig1/AS:631675604201483@1527614608363/Protein-classification-processes-Adapted-from-5_Q320.jpg)