Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach

Other literature type, Article English OPEN
Wang, Hong; Xu, Qingsong; Zhou, Lifeng; (2014)
  • Publisher: Multidisciplinary Digital Publishing Institute
  • Journal: Information (issn: 2078-2489)
  • Related identifiers: doi: 10.3390/info5040634
  • Subject: ensemble learning | Information technology | T58.5-58.64 | semi-supervised learning | search interface identification | Deep Web mining
    acm: ComputingMethodologies_PATTERNRECOGNITION

To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are...