Is there any data set available containing FAQs in different domains?
My thesis is about the analysis and automatic generation of FAQ lists in different domains. To conduct experiments, I need a large volume of FAQs. That is why I am looking for a publicly available dataset containing FAQs in various domains (or even in one specific domain).


All Answers (7)
Hello Fatemeh Razzaghi,
You can build a crawler, similar to Googlebot, that finds and collects FAQ pages across the web in one pass, so you do not have to search for them by hand.
These links might be useful for the crawling part:
https://commoncrawl.org/
http://support.import.io/
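Once the crawler has fetched a page, the question/answer pairs still have to be pulled out of the HTML. Here is a minimal sketch of that step using only the Python standard library. It assumes (this is my assumption, real FAQ pages vary a lot) that each question is a heading or `<dt>` element ending in "?" and that the answer is the text up to the next heading. The names `FAQExtractor` and `extract_faq_pairs` are made up for illustration.

```python
from html.parser import HTMLParser


class FAQExtractor(HTMLParser):
    """Collect (question, answer) pairs from one HTML page.

    Assumption: a question is a heading (or <dt>) whose text ends in '?',
    and its answer is all text that follows until the next heading.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6", "dt"}

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_heading = False
        self._heading_text = []
        self._question = None
        self._answer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # A new heading closes the previous question, if any.
            self._flush()
            self._in_heading = True
            self._heading_text = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self._in_heading = False
            text = " ".join("".join(self._heading_text).split())
            # Headings that are not questions (e.g. "About us") are skipped.
            self._question = text if text.endswith("?") else None

    def handle_data(self, data):
        if self._in_heading:
            self._heading_text.append(data)
        elif self._question is not None:
            self._answer.append(data)

    def _flush(self):
        if self._question is not None:
            answer = " ".join("".join(self._answer).split())
            if answer:
                self.pairs.append((self._question, answer))
        self._question = None
        self._answer = []

    def close(self):
        super().close()
        # Store the last pair on the page.
        self._flush()


def extract_faq_pairs(html):
    """Return a list of (question, answer) tuples found in an HTML string."""
    parser = FAQExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

You would call `extract_faq_pairs(page_html)` on every page the crawler downloads and keep the pages that return at least a few pairs. Pages that use other markup (accordions, tables, plain paragraphs) would need different rules.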
Hope this information helps you out.
Hi,
This could be interesting for you: https://aws.amazon.com/datasets/41740.
It's a generic public dataset of web crawl data, 541 TB in size. You may need to develop or use tools (suggestion: Hadoop) that can pick out the content related to the specific topic you're interested in.
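To give an idea of what such a filtering tool could look like, here is a rough sketch of a map step that keeps only pages that look like FAQs. It assumes the crawl data has already been converted to one record per line in the form `url<TAB>text`, with line breaks in the text escaped as a literal `\n`. That input format, the thresholds and the function names are my own assumptions for illustration, not something the dataset provides.

```python
import re
from urllib.parse import urlparse

# Matches "faq", "faqs" or "frequently-asked-questions" as a separate
# component of the URL path (hypothetical heuristic, tune as needed).
FAQ_URL_PATTERN = re.compile(
    r"(^|[/_\-.])(faqs?|frequently[-_]asked[-_]questions)([/_\-.]|$)",
    re.IGNORECASE,
)


def looks_like_faq(url, text, min_questions=3):
    """Heuristic: FAQ-like URL path, or several lines ending in '?'."""
    if FAQ_URL_PATTERN.search(urlparse(url).path):
        return True
    questions = [ln for ln in text.splitlines() if ln.strip().endswith("?")]
    return len(questions) >= min_questions


def mapper(lines):
    """Yield the URLs of records that look like FAQ pages.

    Each input line is assumed to be 'url<TAB>text', with newlines in the
    text escaped as the two characters backslash and 'n'.
    """
    for line in lines:
        url, sep, text = line.rstrip("\n").partition("\t")
        if sep and looks_like_faq(url, text.replace("\\n", "\n")):
            yield url
```

In a Hadoop Streaming job the same function could be fed from standard input, e.g. by looping over `mapper(sys.stdin)` and printing each URL. The heuristic is deliberately crude, so expect to check a sample of the output by hand.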
Hope this is somewhat useful for you!
Ciao
Look at the websites of the FIRE (Forum for Information Retrieval Evaluation) conference. They ran an FAQ retrieval challenge for several years. I hope you can find links to the FAQs on their websites; otherwise, you can contact the organizers and ask them to provide the data.
They have FAQ datasets for several domains, such as agriculture and banking.
You may also have a look at the following:
The usenet FAQ archives http://www.allanswers.org/
Game FAQs http://www.gamefaqs.com/
Internet FAQs http://www.faqs.org/faqs/
You can also have a look at sites like:
linuxquestions.org
Ubuntu Forums
Ubuntu community documentation
These should give you a better idea of the ontology of FAQs.