Compressed Context Modeling for Text Compression

Name: Compressed Context Modeling for Text Compression
Creator: Kulekci, M. Oguzhan
Keywords: 0508 media and communications, 4. Education, 05 social sciences, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

Publication: Article, Other literature type
Date: 01 Mar 2011
Publisher: IEEE
Journal: 2011 Data Compression Conference

Authors: Kulekci, M. Oguzhan

doi: 10.1109/dcc.2011.44

Abstract

In text compression, statistical context modeling aims to construct a model that calculates the probability distribution of a character based on its context. The order-$k$ context of a symbol is defined as the string formed by its preceding $k$ symbols. This study introduces compressed context modeling, which defines the order-$k$ context of a character as the sequence of $k$ bits composed of the entropy-compressed representations of its preceding characters. When computing the compressed context of a symbol at some position in a given text, enough preceding characters are included to produce $k$ bits of information. Thus, instead of a certain number of characters, a certain amount of information is considered as the context of a character, and this property enables each character to be predicted with a nearly uniform amount of information. Experiments are conducted to compare the proposed modeling against the classical fixed-length context definitions. The files in the large Calgary corpus are modeled with the newly introduced compressed context modeling and with the classical fixed-length context modeling. On average, the statistical model with the proposed method uses $13.76$ percent less space, measured by the number of distinct contexts, while providing a $5.88$ percent gain in empirical entropy, measured by the information content in bits per character.
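
The two context definitions in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not name the entropy coder, so a Huffman code is assumed here, and the compressed context is assumed to be the trailing $k$ bits of the encoded prefix. All function names (`huffman_code`, `compressed_contexts`, `fixed_contexts`, `model_stats`) are hypothetical. The statistics returned follow the two measures the abstract mentions: the number of distinct contexts and the empirical entropy in bits per character.

```python
import heapq
import math
from collections import Counter, defaultdict


def huffman_code(text):
    """Return {symbol: bitstring} for a Huffman code (assumed entropy coder)."""
    counts = Counter(text)
    if len(counts) == 1:  # degenerate case: one symbol gets a 1-bit codeword
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def compressed_contexts(text, k, code):
    """Yield (context, symbol); context = trailing k bits of the encoded prefix.

    Assumes k >= 1. Near the start of the text fewer than k bits exist.
    """
    bits = ""
    for ch in text:
        yield bits, ch
        bits = (bits + code[ch])[-k:]  # keep only the last k bits


def fixed_contexts(text, k):
    """Yield (context, symbol); context = the preceding k characters."""
    for i, ch in enumerate(text):
        yield text[max(0, i - k):i], ch


def model_stats(pairs):
    """Return (distinct contexts, empirical conditional entropy in bits/char)."""
    table = defaultdict(Counter)
    for ctx, ch in pairs:
        table[ctx][ch] += 1
    total = sum(sum(c.values()) for c in table.values())
    bits = 0.0
    for counter in table.values():
        n = sum(counter.values())
        bits -= sum(f * math.log2(f / n) for f in counter.values())
    return len(table), bits / total


if __name__ == "__main__":
    sample = "abracadabra abracadabra abracadabra"
    code = huffman_code(sample)
    print("compressed, k=8 bits :", model_stats(compressed_contexts(sample, 8, code)))
    print("fixed, k=2 characters:", model_stats(fixed_contexts(sample, 2)))
```

On a toy string the numbers carry no weight; the sketch only shows how a bit budget, rather than a character count, decides how many preceding characters contribute to a context. Frequent characters have short codewords, so more of them fit into the same $k$ bits.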

Related Organizations

Scientific and Technological Research Council of Turkey
Turkey

Impact by BIP!

- citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.