Deep Web Content Mining

The rapid expansion of the web causes constant growth in the volume of information, which makes it increasingly difficult to extract potentially useful knowledge. Web content mining addresses this problem by gathering explicit information from different web sites for access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with a parsing approach, we use a new data mining algorithm to match a large number of database schemas at a time, which increases the speed of information matching. In addition, instead of simple 1:1 matching, the algorithm performs complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes at lower cost. The algorithm uses the Jaccard measure to distinguish positively and negatively correlated attributes. The system then matches the user query against the different query interfaces in a specific domain and finally chooses the query interface nearest to the user query to answer it.
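To make the role of the Jaccard measure concrete, the following is a minimal sketch, not the paper's algorithm. It assumes each query interface has already been parsed into a set of attribute names and scores every attribute pair by how often the two attributes co-occur across interfaces. The sample schemas, the `jaccard` helper, and the two thresholds are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b, schemas):
    """Jaccard measure of attributes a and b over a list of schemas.

    Each schema is a set of attribute names taken from one query interface.
    The score is (#schemas containing both) / (#schemas containing either).
    """
    both = sum(1 for s in schemas if a in s and b in s)
    either = sum(1 for s in schemas if a in s or b in s)
    return both / either if either else 0.0


# Hypothetical parsed query interfaces from a book-search domain.
schemas = [
    {"author", "title", "isbn"},
    {"author", "title"},
    {"first name", "last name", "title"},
    {"first name", "last name", "isbn"},
]

# Assumed thresholds; the abstract does not state any values.
POSITIVE_THRESHOLD = 0.8
NEGATIVE_THRESHOLD = 0.1

attributes = sorted(set().union(*schemas))
scores = {
    (a, b): jaccard(a, b, schemas) for a, b in combinations(attributes, 2)
}

# Pairs that almost always appear together (positively correlated).
positive = [pair for pair, s in scores.items() if s >= POSITIVE_THRESHOLD]
# Pairs that almost never appear together (negatively correlated).
negative = [pair for pair, s in scores.items() if s <= NEGATIVE_THRESHOLD]
```

On this sample, "first name" and "last name" always appear together, while "author" never appears with either of them. In correlation-mining approaches to schema matching, such co-occurring attributes are typically read as candidates for grouping and such mutually exclusive attributes as candidates for being alternatives to one another, which is what makes m:n matches possible.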

Keywords

Content mining, information extraction, correlation mining, complex matching
