Field Association (FA) terms are a limited set of discriminating terms that identify document fields; they are effective in document classification, similar-file retrieval, and passage retrieval. However, there is no effective method for automatically extracting relevant Arabic FA terms to build a comprehensive dictionary. Moreover, all previous studies are based on FA terms in English and Japanese, and extending FA terms to other languages such as Arabic could strengthen further research. This paper presents a new method for extracting Arabic FA terms from domain-specific corpora using part-of-speech (POS) pattern rules and corpora comparison. Experimental evaluation was carried out for 14 different fields using 251 MB of domain-specific corpora obtained from Arabic Wikipedia dumps and Alhyah news, and the method selected an average of 2,825 FA terms (single and compound) per field. In the experiments, recall and precision were 84% and 79%, respectively. The method therefore selects a large number of relevant Arabic FA terms at high precision and recall.
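As a rough illustration of how POS pattern rules and corpora comparison can work together, the sketch below extracts candidate terms whose tags match a pattern, keeps those that are relatively more frequent in a domain corpus than in a reference corpus, and then computes precision and recall against a relevant set. It is only a minimal sketch under stated assumptions: the tag names, the three patterns, the add-one smoothed frequency ratio, the `min_ratio` threshold, and the English toy data are all hypothetical and are not taken from the paper, whose actual rules and scoring are not given in this abstract.

```python
from collections import Counter

# Hypothetical POS patterns for single and compound candidates
# (illustrative only; not the paper's actual rules).
PATTERNS = [("NOUN",), ("NOUN", "NOUN"), ("NOUN", "ADJ")]


def extract_candidates(tagged_tokens, patterns=PATTERNS):
    """Count every token n-gram whose POS tags match one of the patterns."""
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tagged_tokens) - n + 1):
            window = tagged_tokens[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                counts[" ".join(word for word, _ in window)] += 1
    return counts


def select_fa_terms(domain_tagged, reference_tagged, min_ratio=2.0):
    """Keep candidates whose smoothed relative frequency in the domain
    corpus is at least min_ratio times that in the reference corpus."""
    domain = extract_candidates(domain_tagged)
    reference = extract_candidates(reference_tagged)
    domain_total = sum(domain.values())
    reference_total = sum(reference.values())
    selected = set()
    for term, freq in domain.items():
        # Add-one smoothing avoids division by zero for unseen terms.
        domain_rate = (freq + 1) / (domain_total + 1)
        reference_rate = (reference.get(term, 0) + 1) / (reference_total + 1)
        if domain_rate / reference_rate >= min_ratio:
            selected.add(term)
    return selected


def precision_recall(selected, relevant):
    """Precision and recall of the selected terms against a relevant set."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy, hand-tagged data (assumed for illustration; real input would come
# from an Arabic POS tagger applied to the corpora).
domain_tagged = [
    ("goalkeeper", "NOUN"), ("saved", "VERB"), ("penalty", "NOUN"),
    ("kick", "NOUN"), ("and", "CONJ"), ("goalkeeper", "NOUN"),
    ("won", "VERB"), ("match", "NOUN"),
]
reference_tagged = [
    ("people", "NOUN"), ("watched", "VERB"), ("match", "NOUN"),
    ("then", "ADV"), ("people", "NOUN"), ("discussed", "VERB"),
    ("match", "NOUN"),
]

selected = select_fa_terms(domain_tagged, reference_tagged, min_ratio=1.2)
relevant = {"goalkeeper", "penalty kick", "offside"}
precision, recall = precision_recall(selected, relevant)
print(sorted(selected), precision, recall)
```

In this toy run, "match" is filtered out because it is at least as common in the reference corpus, which is the intuition behind corpora comparison. The low threshold of 1.2 is only there because the example corpora are tiny; a real threshold would need to be tuned on actual data.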
Arabic Field Association Terms, document classification, information extraction, information retrieval.
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 5 |
| downloads | | 5 |

Views provided by UsageCounts
Downloads provided by UsageCounts