ICADL 2007 - LNCS 4822

An Efficient Dictionary Mechanism Based on Double-Byte

Lei Yang¹, Jian-Yun Shang¹, and Yan-Ping Zhao²

¹Dept. of Computer Science, Beijing Institute of Technology
jeffy2008@gmail.com
shangjia@bit.edu.cn

²School of Management and Economics, Beijing Institute of Technology, Beijing 100081, P.R. China
zhaoyp@bit.edu.cn

Abstract. A dictionary is an efficient way to manage large sets of distinct strings in memory. It has a significant influence on Natural Language Processing, Information Retrieval, and other areas. In this paper, we propose an efficient dictionary mechanism suited to languages with double-byte encodings. Compared with five other popular dictionary mechanisms, it performs best. It greatly improves search performance and reduces the complexity of constructing and maintaining the dictionary. It is well suited to large-scale and real-time processing systems. Since Unicode is a typical double-byte code that can represent all kinds of characters in the world, this mechanism is also applicable to multi-language dictionaries.

Keywords: Dictionary, Double-Byte, Information Retrieval, Multi-language

LNCS 4822, p. 420 ff.
