Web Mining - Science topic
Explore the latest questions and answers in Web Mining, and find Web Mining experts.
Questions related to Web Mining
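Call for Papers: 2024 International Conference on Intelligent Computing and Data Mining (ICDM 2024)
Call for papers: 2024 International Conference on Intelligent Computing and Data Mining (ICDM 2024) will be held on September 20-22, 2024 in Chaozhou, China.
Important information
Conference website (submission link): https://ais.cn/u/AFBBfq
Conference dates: September 20-22, 2024
Venue: Chaozhou, China
Indexing: EI Compendex, Scopus
Intelligent computing and data mining are active research areas in information technology today, with wide applications in fields such as finance, healthcare, education, and transportation. As data volumes grow explosively in the big data era, extracting valuable information from massive data remains a problem that calls for iterative solutions. The 2024 International Conference on Intelligent Computing and Data Mining (ICDM 2024) provides a platform for discussing these issues. Experts and scholars will discuss the latest research results in depth, provide intelligent decision support through data analysis and processing, and discuss how to apply data-driven methods to complex problems, analyzing the patterns and relationships behind the data to find the essence of a problem and its solution. Scholars are warmly invited to attend and exchange ideas.
Call for papers topics
Intelligent computing: genetic algorithms, evolutionary computation and learning, swarm intelligence and optimization, independent component analysis, natural computing, quantum computing, neural networks, fuzzy theory and algorithms, pervasive computing, machine learning, deep learning, natural language processing, intelligent control and automation, intelligent data fusion, intelligent data analysis and prediction, etc.
Data mining: web mining, data stream mining, parallel and distributed algorithms, graph and subgraph mining, large-scale data mining methods, text, video and multimedia data mining, scalable data preprocessing, high-performance data mining algorithms, data security and privacy, data mining systems for e-commerce, etc.
*Other related topics are also welcome
Paper submission
Submissions to ICDM 2024 will be reviewed by 2-3 experts from the organizing committee. Accepted papers will be published by IEEE and included in the IEEE Xplore database; after publication, the publisher will submit them to EI Compendex and Scopus for indexing.
Attendance information
ICDM 2024 offers three forms of participation: oral presentation, poster presentation, and audience. You can register via the following link and receive a certificate of participation after the conference: https://ais.cn/u/AFBBfq
1. Oral presentation: apply to give an oral talk of about 10-15 minutes
2. Poster presentation: prepare an A1 color poster for online/offline display
3. Audience: attend without submitting a paper and interact with on-site guests and scholars
4. Presentation slides (PPT) and posters should be submitted to the conference email (icicdm@163.com) one week before the conference
5. Once a paper is accepted, one author may attend free of charge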

I have several ongoing research projects on adaptive web mining techniques and online social network analysis with applications.
I am seeking collaboration, with funding support, for presenting research outputs at top conferences and workshops and in international journals.
Please contact me at temitayo.fagbola@fuoye.edu.ng
Thank you
Dear all,
Do you know of any available dataset for text summarization that includes the text summaries?
We have a dataset collected from multiple users and would like to measure levels of similarity and distance between users to build user profiles. Currently we are using some common approaches for clustering, such as k-means, hierarchical clustering, and GMM, but would like to hear from other active researchers whether there are other useful techniques that we haven't thought of.
Can anyone suggest an open-source, real-time data set for applying fuzzy clustering?
I am interested in finding out how often several websites are updated from a central point, instead of scanning every page and link for dates or contacting the owner. I have tried the Wayback Machine, but I'm not sure that crawl information is the same as update information.
I ask because the paper "Tweet Segmentation and Its Application to Named Entity Recognition" does not explain how the meaningful phrases are split.
We have witnessed the power of a regular search engine like Google. There are semantic search engines like Swoogle as well. However, we are trying to build a semantic search engine with a more user-friendly display and a relevance-based ranking algorithm. Can anybody suggest ideas?
I have a MovieLens dataset containing ratings of 1682 movies by 973 users. I want to build a movie recommendation system. How can I do this project with MATLAB or Python?
1) I need to extract the movie genre from DBpedia with SPARQL. Can anyone provide me with links or materials on this code?
2) I want a user-feature matrix with genres as data points (18) and users as observations (6040). I need the procedure to get this done. Relevant links and documents will be appreciated.
Thanks
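For the user-feature matrix in point 2, one possible approach is to average each user's ratings per genre. The sketch below is only an illustration on toy data: the function name `user_genre_matrix`, the input layout (rating triples plus a movie-to-genres mapping), and the choice of "mean rating per genre" as the feature are all assumptions, not something fixed by the dataset.

```python
from collections import defaultdict

def user_genre_matrix(ratings, movie_genres, genres):
    """Build {user_id: [mean rating per genre]}.

    ratings      : iterable of (user_id, movie_id, rating) triples (assumed layout)
    movie_genres : dict movie_id -> list of genre names (assumed layout)
    genres       : ordered list of genre names, one matrix column each
    A cell is 0.0 when the user rated no movie of that genre.
    """
    index = {g: i for i, g in enumerate(genres)}
    sums = defaultdict(lambda: [0.0] * len(genres))
    counts = defaultdict(lambda: [0] * len(genres))
    for user, movie, rating in ratings:
        for genre in movie_genres.get(movie, ()):
            col = index[genre]
            sums[user][col] += rating
            counts[user][col] += 1
    return {
        user: [s / c if c else 0.0 for s, c in zip(sums[user], counts[user])]
        for user in sums
    }

# Toy data standing in for the real rating and genre files.
toy_ratings = [(1, 10, 5), (1, 11, 3), (2, 10, 4)]
toy_movie_genres = {10: ["Action", "Comedy"], 11: ["Comedy"]}
toy_genres = ["Action", "Comedy", "Drama"]

matrix = user_genre_matrix(toy_ratings, toy_movie_genres, toy_genres)
for user, row in sorted(matrix.items()):
    print(user, row)
```

With the real data, the rows would be the 6040 users and the columns the 18 genres; other cell definitions (rating counts, or binary "has rated") fit the same loop.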
Is there an API available for collecting Facebook datasets for sentiment analysis?
I'm brand new to social network analysis. I'm trying to identify meme creators on Twitter. Is there a way to do this using data downloaded from Twitter?
I am looking for multiple documents that were drawn from the same domain. I would like to aggregate information from multiple documents for summarization.
I have engineering data where I need to classify Event vs. NonEvent based on operational parameters. The Event class makes up about 1% of the data and the NonEvent class 99%. I read an article about oversampling and undersampling, but in my case these methods don't work. Here the Event class is highly related to the NonEvent class, because this is sensor data that is captured very frequently.
How can I classify Event vs. NonEvent in an imbalanced classification problem?
I am doing research on tweet and hashtag analysis related to influenza prediction, and I need a historical Twitter dataset covering the influenza period (December to March).
Does anybody have an idea about how to get this data?
Could you kindly share links to e-learning weblog datasets?
I want a dataset of hacker types based on their behavior on websites.
Alternatively, I want to build a dataset for hackers, but I don't have any idea how to build one.
Hi all,
I want to use big data clustering algorithms in my PhD work, but I don't know which topic is appropriate to apply them to. In other words, what is a good application of big data clustering algorithms?
I would be grateful for any help.
I want to fetch online news from different news sources, from today back to one month ago. How can I download this news? Is there a news API available in Python for downloading Hindi news from sources such as AajTak, Dainik Jagran, Dainik Bhaskar, etc.?
The result is obtained only by typing the keywords into the search box.
I am building an analysis tool and would like to see how it behaves with real-world data. Other traces, such as car GPS data or check-in data (for example, when you use a bus card to pay for a ride), might also help.
I am already planning to use Twitter data collected from geolocated tweets.
Other than
KNN
Dictionary based
Motif based
At this point, our research does not wish to perform blind text mining; however, we may wish to provide some indication of the type of text content in which we are interested.
The content comes from existing technology manuals and blogs, both written by paid technology writers, as well as external content. We intend to build an FAQ corpus out of these. Thanks.
Hello,
I'm looking for a dataset with (obviously) features and genre tags for every song. I already have the subset retrievable from the official website (http://labrosa.ee.columbia.edu/millionsong/lastfm), but it seems the entire set went missing and it's hard to locate a source.
The Google group dedicated to the dataset is inactive and closed.
I wonder if someone is working on the same data or can point me to a similar dataset.
Many thanks in advance.
Hi
I know the rather dated (not updated) "A Comparison of Open Source Search Engines" by Christian Middleton and Ricardo Baeza-Yates. It does not cover the newer open source libraries.
Is there currently a library faster than Lucene for information retrieval?
Also, what term-weighting schemes does the Lucene package support?
Thanks
Osman
Hi all,
I am trying to find patterns among a website's visitors. Although I could extract all the data I want, I see the convenience of working with sample data.
The first step is setting the date range; considering that a website is a dynamic environment, it may be misleading to take a wide period.
So, for this data type, what date range would be appropriate? (I normally take 1 to 3 months.) Once the time frame is selected, what sampling methods should I use to ensure the sample is representative?
Many thanks!
My thesis is about analysis and automatic generation of FAQ lists in different domains. To conduct experiments, I need a high volume of FAQs. That is why I am looking for a publicly available dataset containing FAQs in various domains (or even one specific domain).
I'd like to mine web pages to build a dataset of pages from a particular website (e.g., a news site). It would target articles from all sections of the site, not just one (for instance, politics, tech, etc. on CNN.com). The articles would cover three years of publication, meaning I would have every article published in that period. What tools and techniques could I use?
A tool that can accept Tamil documents for classification and other processing steps for mining.
What is the difference between the Rand index computed by this function
(Rand <- function(clust1, clust2) clv.Rand(std.ext(clust1, clust2)))
and the Rand index from cluster.stats in the fpc package? When I apply both to my clustering I get different results. Which one is right?
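One possible explanation, which should be checked against the documentation of both packages, is that the two numbers are different indices: the plain Rand index versus the chance-corrected (adjusted) Rand index, which fpc's cluster.stats reports as corrected.rand. Under that assumption neither value is wrong. The sketch below uses Python rather than R and hypothetical function names, simply to show how far the two indices can differ on the same pair of clusterings.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(a, b):
    """Plain Rand index: fraction of point pairs on which both clusterings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert-Arabie correction for chance agreement)."""
    n = len(a)
    same_in_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_in_a = sum(comb(c, 2) for c in Counter(a).values())
    same_in_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = same_in_a * same_in_b / comb(n, 2)
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (same_in_both - expected) / (maximum - expected)

clust1 = [0, 0, 1, 1, 2, 2]
clust2 = [0, 0, 1, 2, 2, 2]
print("Rand:         ", rand_index(clust1, clust2))
print("Adjusted Rand:", adjusted_rand_index(clust1, clust2))
```

The plain index counts agreement on pairs that are separated in both clusterings, so it tends to be high even for unrelated partitions; the adjusted index subtracts the agreement expected by chance.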
I'm looking for suggestions of studies that analyze large volumes of data with the aim of discovering association rules or patterns that enhance and further qualify the teaching and learning process for the student.
I am looking for a classification method to build a binary classifier for web documents, i.e., a classifier that predicts whether a document belongs to a domain of interest or not. A domain here is a broad category, e.g., science. I am wondering if there is any work in the neural network community that does this efficiently with training data of ~10K labeled web pages with labels 0/1.
A simple "language model" based approach has not proved useful so far. Would an NN-based model make more sense for this task?
I am doing research on ambiguity and disambiguation in information science, in representation systems, and in sociocognitive and new infocommunication paradigms. Do you have any advice, please?
Meta objects: - Likes, dislikes, number of comments, etc.
Visual objects: - person, places etc
Can someone help me find information or papers about how Facebook uses big data, including Hadoop?
I have tried to find some, but I can't find anything. Could someone help me?
I am using a Tesla K20. I got an error saying that shared memory is limited to 16K, although the K20 supports up to 48K. How can I configure the GPU and the NVCC compiler to use 48K of shared memory instead of 16K?
I am trying to reproduce experimental results from
[1] G. Guo, G. Mu, Y. Fu, and T. S. Huang, “Human age estimation using bio-inspired features,” 2009 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work. CVPR Work. 2009, pp. 112–119, 2009.
They use the Yamaha Gender and Age (YGA) database as well as the FG-NET aging database, and it seems that all of the links to their supposed locations are down.
The link for the FG-NET aging database should be the following:
http://www.fgnet.rsunit.com/ (DOWN)
and for the YGA database I cannot even find a reference.
Do you know of other databases used for age estimation that include age and gender labels?
Thank you very much for your time
Pascal
Student prediction, educational data mining.
I am in search of a good standard dataset relevant for Music Recommendation systems which should consist of music with places and tags.
I want to download random tweets from Twitter for a specific time period (two years, 2011-2013). I have tried using the statuses/sample API, but couldn't specify the time period.
I am doing research in the e-learning field based on tweets. For this purpose, I need a large collection of tweets, e.g., 20,000.
Is there a Twitter API for searching domain-specific results, such as computer science?
Thanks in advance.
At the moment, I was able to find these papers:
1. Prototype a Knowledge Discovery Infrastructure by Implementing Relational Grid Monitoring Architecture (R-GMA) on European Data Grid (EDG) by Frank Wang, Na Helian, Yike Guo, Steve Thompson, John Gordon.
2. Knowledge grid-based problem-solving platform by Lu Zhen, Zuhua Jiang, Jun Liang.
Thank you in advance for any help.
I am looking for old web news, blogs and forums between 1995 and 2007. Any suggestions please?
Hello all, I am working on a project. I want to download Twitter data. Using the Twitter API, I am able to download only 3 tweets. Is there a way to download at least 1000 tweets?
I used GS with an image-processing function that calculates the symmetry of two images. The number of function executions increases dramatically when the size of the image is doubled. Does anyone have an explanation?
rank aggregation algorithm etc. for recommendation process.
How can I tell that a particular tweet is a rumor? I don't want to use any supervised knowledge to identify rumors.
In symbolic logic, you can translate proper sentences to logical constructs: propositional or first order. This is very difficult to do with tweets because they are not written in proper English, at least not for the most part. So, I am designing an algorithm to convert tweets to logical constructs. If you have some ideas or would like to collaborate, let me know. Thanks!
I would like to know about real case studies of privacy threats caused by association rule mining (distributed or centralized databases).
I have searched many papers on web mining with fuzzy logic, but I have not been able to find the most recent ones. Can anyone please tell me how I can find the most recent papers?
Many years ago I read a paper on a hardware implementation of an information retrieval system. It was implemented as a circuit board, where the query would be set by putting jumpers on one side of the board and the result would be indicated by LEDs or the equivalent on another side of the board. The math behind it was very insightful, and I'd love to find it again, but I've been unable to. The paper was written (probably well) before 1975, perhaps even in the 1950's. I vaguely remember that the primary author's name began with an S but that's as far as I've gotten. (I'm not thinking of Vannevar Bush's Memex.)
Can anyone help?
I would like to know what free online text mining tools I can use for user profiling.
Can mining of user profiles be applied online?
Dear respected scientists and colleagues,
I am looking for a climate change corpus for text analysis. If you have one, or you know of a journal whose abstracts I can download, I would be very much obliged.
Thanks,
-Dr.Hamed
We are working on the incremental timetabling problem. We found a timetable that satisfies the hard constraints and optimizes the soft ones. After the timetable is accepted, new constraints appear.
Is there a solution that suggests minimum change to have the new table with the incremental constraints without destroying the old table in order to minimize the disturbance of the stakeholders?
I am doing my research in web usage mining. I can't extract the datasets. The World Cup '98 log files are available in the Internet Traffic Archive, but I don't know the file format, so I cannot open them.
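As far as I recall, the World Cup '98 logs are stored as fixed-size binary records rather than text, which would explain why they cannot be opened directly. The sketch below assumes a 20-byte record in network (big-endian) byte order with four 32-bit fields (timestamp, client ID, object ID, size) followed by four one-byte fields (method, status, type, server). This layout is an assumption from memory and should be verified against the archive's own description and tools before use; the function name `read_records` is hypothetical.

```python
import struct

# ASSUMED record layout (verify against the archive's documentation):
# 4 x uint32 (timestamp, clientID, objectID, size) + 4 x uint8
# (method, status, type, server), big-endian, 20 bytes per request.
RECORD = struct.Struct(">IIIIBBBB")
FIELDS = ("timestamp", "client_id", "object_id", "size",
          "method", "status", "type", "server")

def read_records(data):
    """Yield one dict per complete 20-byte record found in `data` (bytes)."""
    for offset in range(0, len(data) - RECORD.size + 1, RECORD.size):
        yield dict(zip(FIELDS, RECORD.unpack_from(data, offset)))

# Synthetic bytes standing in for an uncompressed log file.
sample = RECORD.pack(894000000, 1, 2, 512, 0, 2, 1, 4) * 2
for record in read_records(sample):
    print(record)
```

For a real file, the bytes would come from reading the decompressed log in binary mode; the one-byte fields are codes whose meaning is defined by the archive, not by this sketch.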
In any association mining process, removing uninteresting rules is a big challenge. We are interested in effective formal and experimental methods for measuring the interestingness of multilevel rules.
I am interested in using machine learning to recognize social interaction patterns such as disagreements, and potentially to use those patterns to generate new simulated interactions. I've been working with crowdsourced descriptions of social interactions, but these are more narrative and less action driven.
Are you aware of publicly available datasets of annotated social interactions?
Types of data that might be good candidates are annotated movie scripts or forum threads. Skeletal/gesture data could also be interesting.
I want to know the proper way to use SentiWordNet together with WordNet.
Big Data and Data Science have continued to emerge among practitioners and researchers. But the foundation of these concepts involves large volumes and a variety of data created at high velocity. Hence, the focus has generally been on bigger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
I need to extract information from HTML web pages with distinct templates.
I am working on web log mining processes. I need a tool that performs pre-processing (data cleaning, user identification, session identification) of server log files.
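The three pre-processing steps named above can be sketched in a few lines. This is a minimal illustration, not a replacement for a dedicated tool, and it rests on several assumptions: the log is in Common Log Format, a "user" is approximated by the client IP address, static resources and non-2xx responses are treated as noise, and a session ends after 30 minutes of inactivity. All names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
STATIC = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def clean(lines):
    """Data cleaning: keep successful page requests, drop static files and errors."""
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        if m["url"].lower().endswith(STATIC) or not m["status"].startswith("2"):
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["ip"], when, m["url"]

def sessionize(requests, timeout=timedelta(minutes=30)):
    """User identification by IP (an assumption) and timeout-based sessions."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, current session's URL list)
    for ip, when, url in sorted(requests, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(ip)
        if previous is None or when - previous[0] > timeout:
            current = []
            sessions.append((ip, current))
        else:
            current = previous[1]
        current.append(url)
        last_seen[ip] = (when, current)
    return sessions

log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:14:40:00 +0000] "GET /about.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Oct/2023:13:56:00 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.2 - - [10/Oct/2023:13:57:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
for ip, urls in sessionize(clean(log)):
    print(ip, urls)
```

Identifying users by IP alone merges visitors behind a shared proxy; if the log also records the user agent, the pair (IP, user agent) is a common refinement.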
In the PageRank algorithm, is it necessary for a page to be connected directly to every other page when a damping factor is used?
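In the usual formulation, direct links between all pages are not required: the damping factor models a random jump to any page, so every page receives a baseline share of rank even with no inbound links. The sketch below is a minimal power-iteration version under common assumptions (damping 0.85, uniform jumps, rank of pages without outlinks spread evenly); the function name and the toy graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, ())
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Page D links to A, but nothing links to D.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

In this graph, D still ends up with rank (1 - damping) / 4, which comes purely from the random-jump term.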
Currently I am working on web log mining techniques. I want to choose clustering techniques but clustering has been used extensively in web log mining.
How can I generate a document-term matrix from different types of web pages? Is there any source that provides CSV files for document-term matrix generation?
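If no ready-made CSV is available, a document-term matrix can be built directly from the page text. The sketch below assumes the text has already been extracted from the HTML and uses a deliberately simple tokenizer (lowercase letters and digits, no stemming or stop-word removal); the function names and sample documents are hypothetical.

```python
import csv
import io
import re
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, rows) for a dict {document name: plain text}."""
    counts = {
        name: Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for name, text in docs.items()
    }
    vocabulary = sorted(set().union(*counts.values()))
    rows = [[name] + [counts[name][term] for term in vocabulary] for name in docs]
    return vocabulary, rows

def to_csv(vocabulary, rows):
    """Serialize the matrix as CSV text with one row per document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["document"] + vocabulary)
    writer.writerows(rows)
    return buffer.getvalue()

docs = {
    "page1": "Web mining finds patterns in web data",
    "page2": "Data mining finds patterns",
}
vocabulary, rows = document_term_matrix(docs)
print(to_csv(vocabulary, rows))
```

Raw counts are used here; TF-IDF or binary weights can be substituted in the row construction without changing the overall layout.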
I want to implement PageRank and various improved PageRank algorithms on graph data, but I am unable to find a simulator or a real implementation of a PageRank algorithm.
Many web pages publish the latest news and use feeds or related technologies to distribute news summaries. Once you have the summary, title, and link to the main page, the next step is to retrieve the associated text, knowing that the page also contains irrelevant content such as banners, news ratings, and advertising. What are the best tools for extracting the text of a news article, given its title, summary, and web link?
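Dedicated boilerplate-removal tools exist for this task; as a baseline, a simple heuristic can already remove much of the noise. The sketch below uses only Python's standard html.parser and rests on assumptions that will not hold for every site: article text sits in <p> elements, navigation-like containers can be skipped by tag name, and very short paragraphs are treated as page furniture. The class and function names and the 5-word threshold are hypothetical.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements outside navigation-like containers."""
    SKIP = {"script", "style", "nav", "aside", "footer", "header"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and not self.skip_depth:
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_paragraph and not self.skip_depth:
            self.buffer.append(data)

def extract_article_text(html, min_words=5):
    """Keep paragraphs with at least `min_words` words (a crude noise filter)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p for p in parser.paragraphs if len(p.split()) >= min_words)

page = """<html><body><nav><p>Home | World | Tech | Sports | Weather</p></nav>
<div class="ad"><p>Buy now!</p></div>
<article><h1>Title</h1><p>The city council approved the new transit plan on Monday.</p>
<p>Rate this story</p><p>Officials expect construction to begin early next year.</p></article>
</body></html>"""
print(extract_article_text(page))
```

Since the title and summary are already known from the feed, a further refinement would be to keep only the paragraphs that share vocabulary with them.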
I am doing a project involving the Google search engine. However, I do not know how to export the results from Google and then store them in a text file or a database.
I want to arrange documents for automatic e-learning that suggests new topics to students.
Many authors use perplexity/entropy to validate their models, but I'm not fully satisfied with this. Some authors use topic coherence (pointwise mutual information). Can anyone suggest the most accurate method for testing a topic model?
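For reference, the PMI-based coherence mentioned above can be computed in a few lines. The sketch below is one simple variant under stated assumptions: probabilities are estimated from document-level co-occurrence in a reference corpus, a small constant `eps` avoids taking the logarithm of zero, and pairs containing a word absent from the corpus are skipped. It does not settle which validation method is most accurate; the function name and toy corpus are hypothetical.

```python
from itertools import combinations
from math import log

def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words over a reference corpus."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def probability(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2 = probability(w1), probability(w2)
        if p1 == 0 or p2 == 0:
            continue  # word unseen in the reference corpus: pair is skipped
        scores.append(log((probability(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0

corpus = [
    ["web", "mining", "data"],
    ["web", "mining"],
    ["music", "genre"],
    ["music", "data"],
]
print(pmi_coherence(["web", "mining"], corpus))  # words that co-occur
print(pmi_coherence(["web", "music"], corpus))   # words that never co-occur
```

A topic whose top words tend to appear in the same documents scores higher than one mixing unrelated words, which is the intuition behind using coherence alongside perplexity.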
To analyze customer behaviour and customer segmentation in telecommunications
Using user profiling, clickstream analysis, opinion mining, and sentiment analysis
The learner's extent of courseware exploration (quantitative and qualitative) needs to be determined in this research for adaptiveness and administrative actions.
Defining the correct query for a crawler is important before launching the crawler. Being able to iteratively test and refine the query on a historical Twitter corpus will improve the process.
E-HealthCare based on Semantic Web Technologies
What would be the most suitable machine learning approach to classify a web site? Yes, we can apply text mining concepts on web content but the problem is that the content is not only in a single language, it also has some images and video content too. The other thing is that web content is more semi-structured, unlike the simple text content of any document.
New research topics under way at IIT or IIIT...
In some regions of the world, complex systems are said to be a bad thing; I have noticed that other regions recommend complex systems as a good thing.
Or is it only based on content?
The purpose is to provide relevance-ranked topic keywords that represent the content of a document or web page.
Is there anyone who works on SNA? Do you have any recommended tools for SNA?
I'll start some research related to SNA, especially for Twitter and Facebook, but it's hard for me to develop tools to collect data from Twitter and Facebook. Basically, I just want to get the text-based data from both, and I'll use text mining to analyze what I need.
Could you please advise me on what kind of tools I can use to get data from Twitter and Facebook?
For effective query formulation based on users web log surveys...
Knowledge representation (KR) is a good method for extracting information or documents relevant to users' needs. We need this technique in an e-learning application.
I'm a Master's student in the Faculty of Science. However, I can't find doctors who work in the field of data mining. At first, I thought of doing my research on recommendation systems, but since that depends on web text mining and text parsing, I thought it would be better to start with the last two. I have done previous research on both topics. The attached research reflects my field of interest. I would be glad if you could advise me on the best place to join: DMCM or elsewhere.
This is the website of DMCM