Identification of threats using linguistics-based knowledge extraction.

Name: Identification of threats using linguistics-based knowledge extraction.
Creator: Chew, Peter A.
Keywords: Information Retrieval Military Intelligence, Computing, Linguistics, Military Intelligence, Applied Linguistics, Hypothesis, 16. Peace & justice, 99 General And Miscellaneous//Mathematics, Semantics, Computational Linguistics

License: pd

Publication: Report, Other literature type. 01 Sep 2008. United States. Publisher: Office of Scientific and Technical Information (OSTI)

Authors: Chew, Peter A.

doi: 10.2172/940522

Abstract

Intelligence analysts, like professionals in many other fields, increasingly face a vast amount of data that must be reviewed and converted into meaningful information, and ultimately into rational, wise decisions by policy makers. The advent of the World Wide Web (WWW) has magnified this challenge. A key hypothesis that has guided us is that threats come from ideas (or ideology), and that ideas are almost always put into writing before the threats materialize. While in the past the 'writing' might have taken the form of pamphlets or books, today's medium of choice is the WWW, precisely because it is a decentralized, flexible, and low-cost method of reaching a wide audience. A complicating factor for the analyst, however, is that material published on the WWW may be in any of a large number of languages.

In 'Identification of Threats Using Linguistics-Based Knowledge Extraction', we have sought to use Latent Semantic Analysis (LSA) and similar text analysis techniques to map documents from the WWW, in whatever language they were originally written, to a common language-independent vector-based representation. This opens up a number of possibilities. First, similar documents can be found across language boundaries. Second, a set of documents in multiple languages can be visualized in a graphical representation. These alone offer potentially useful tools and capabilities to the intelligence analyst whose knowledge of foreign languages may be limited. Finally, we can test the overarching hypothesis, namely that ideology, and more specifically ideology that represents a threat, can be detected solely from the words that express it, by using the vector-based representation of documents to predict additional features (such as the ideology) within a framework based on supervised learning.

In this report, we present the results of a three-year project of the same name. We believe these results clearly demonstrate the general feasibility of an approach such as that outlined above. Nevertheless, there are obstacles that must still be overcome, relating primarily to how 'ideology' should be defined. We discuss these and point to possible solutions.
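
The mapping the abstract describes can be illustrated with a minimal sketch. It is not the report's implementation. It assumes a toy English/Spanish parallel corpus (the abstract does not say how the languages are aligned), a truncated SVD as the LSA step, and a nearest-centroid rule standing in for the unspecified supervised learner. All data, labels, and function names (`train_lsa`, `fold_in`, `train_centroids`, `predict`) are hypothetical.

```python
import numpy as np

# Hypothetical toy parallel corpus: each training "document" is an English
# text joined with its Spanish translation, so terms from both languages
# that co-occur end up sharing latent dimensions.
PARALLEL = [
    ("the army attacked the city", "el ejercito ataco la ciudad"),
    ("the army defended the border", "el ejercito defendio la frontera"),
    ("the farmer planted the wheat", "el granjero sembro el trigo"),
    ("the farmer harvested the field", "el granjero cosecho el campo"),
]

def term_vector(text, vocab):
    """Raw term counts for one text; out-of-vocabulary words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def train_lsa(pairs, k):
    """Build a term-by-document matrix from the pairs and keep k dimensions."""
    terms = sorted({w for pair in pairs for text in pair for w in text.split()})
    vocab = {t: i for i, t in enumerate(terms)}
    X = np.column_stack([term_vector(a + " " + b, vocab) for a, b in pairs])
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return vocab, U[:, :k], s[:k]

def fold_in(text, vocab, U_k, s_k):
    """Project a monolingual text into the shared space: q^T U_k S_k^{-1}."""
    return (term_vector(text, vocab) @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def train_centroids(texts, labels, vocab, U_k, s_k):
    """Average the folded-in vectors of the labeled texts, per label."""
    groups = {}
    for text, label in zip(texts, labels):
        groups.setdefault(label, []).append(fold_in(text, vocab, U_k, s_k))
    return {label: np.mean(vecs, axis=0) for label, vecs in groups.items()}

def predict(text, centroids, vocab, U_k, s_k):
    """Assign the label whose centroid is closest by cosine similarity."""
    v = fold_in(text, vocab, U_k, s_k)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))

if __name__ == "__main__":
    vocab, U_k, s_k = train_lsa(PARALLEL, k=2)
    # Cross-language retrieval: English query against Spanish documents.
    query = fold_in("the army attacked", vocab, U_k, s_k)
    for doc in ("el ejercito defendio la frontera", "el granjero sembro el trigo"):
        print(doc, round(cosine(query, fold_in(doc, vocab, U_k, s_k)), 3))
    # Supervised step: train on English labels, predict for Spanish text.
    english = [en for en, _ in PARALLEL]
    labels = ["military", "military", "agriculture", "agriculture"]
    centroids = train_centroids(english, labels, vocab, U_k, s_k)
    print(predict("el ejercito ataco la frontera", centroids, vocab, U_k, s_k))
```

In this sketch the English query and the Spanish documents share no words; any similarity between them comes only from the latent dimensions learned from the aligned pairs. The same vectors then feed the classifier, so labels supplied in one language can be applied to text in another. A realistic system would need a far larger corpus, term weighting, and a more capable learner.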

Country

United States

Related Organizations

University of North Texas
United States

Keywords

Information Retrieval Military Intelligence, Computing, Linguistics, Military Intelligence, Applied Linguistics, Hypothesis, 99 General And Miscellaneous//Mathematics, Semantics, Computational Linguistics, Detection, Sabotage, And Information Science, Computational Intelligence, Feasibility Studies, Learning, Standardized Terminology

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

SDGs:

16. Peace & justice

Related to Research communities

Knowmad Institut