Abstract: This thesis describes an analysis of user web query behavior associated with Oak Ridge National Laboratory's (ORNL) Enterprise Search System (hereafter, ORNL Intranet). The ORNL Intranet gives users a means to search many kinds of data stores for relevant business and research information using a single query. The Global Intranet Trends for 2010 Report suggests that the biggest current obstacle for corporate intranets is "findability and Siloed content". Intranets differ from internets in the way they create, control, and share content, which can often make it difficult, and sometimes impossible, for users to find information. Stenmark (2006) first noted that studies of corporate internal search behavior are lacking and appealed for more published research on the subject. This study employs mature scientific internet web query transaction log analysis (TLA) to examine how corporate intranet users at ORNL search for information. The focus of the study is to better understand general search behaviors and to identify unique trends associated with query composition and vocabulary. The results are compared to published intranet studies. A literature review suggests that only a handful of intranet-based web search studies exist and that each focuses largely on a single aspect of intranet search. This implies that the ORNL study is the first to comprehensively analyze a corporate intranet user web query corpus and to make the results public. This study analyzes over 65,000 user queries submitted to the ORNL Intranet from September 17, 2007 through December 31, 2007. A granular relational data model first introduced by Wang, Berry, and Yang (2003) for web query analysis was adopted and modified for data mining and analysis of the ORNL query corpus. The ORNL query corpus is characterized using Zipf distributions, descriptive word statistics, and mutual information. User search vocabulary is analyzed using frequency distributions and probability statistics.
The results showed that ORNL users searched for unique types of information. ORNL users were uncertain of how best to formulate queries and did not use search interface tools to narrow the search scope. Special domain language accounted for 38% of the queries. The average number of results returned per query was too high, and no hits occurred 16.34% of the time.
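The corpus characterization named above (a Zipf-style rank-frequency listing and mutual information between query terms) can be illustrated with a minimal sketch. It assumes a tiny hypothetical query list, not data from the ORNL corpus, and it uses pointwise mutual information estimated from per-query co-occurrence, which is one common formulation and may differ from the exact definition used in the thesis. The function names are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy query log; these are NOT queries from the ORNL corpus.
queries = [
    "travel form",
    "travel request form",
    "badge office",
    "travel policy",
    "badge office hours",
]


def term_frequencies(queries):
    """Count how often each term occurs across all queries."""
    return Counter(term for q in queries for term in q.lower().split())


def rank_frequency(freqs):
    """Return (rank, term, frequency) tuples, most frequent first.

    Ties are broken alphabetically so the ordering is deterministic.
    Plotting log(rank) against log(frequency) gives the usual Zipf view.
    """
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, count)
            for rank, (term, count) in enumerate(ordered, start=1)]


def pair_mutual_information(queries):
    """Pointwise mutual information (base 2) for co-occurring term pairs.

    Assumption: P(x) is the fraction of queries containing term x, and
    P(x, y) is the fraction of queries containing both x and y.
    """
    n = len(queries)
    term_sets = [set(q.lower().split()) for q in queries]
    term_df = Counter(t for s in term_sets for t in s)
    pair_df = Counter(
        pair for s in term_sets for pair in combinations(sorted(s), 2)
    )
    return {
        pair: math.log2(
            (count / n) / ((term_df[pair[0]] / n) * (term_df[pair[1]] / n))
        )
        for pair, count in pair_df.items()
    }


if __name__ == "__main__":
    for rank, term, count in rank_frequency(term_frequencies(queries)):
        print(rank, term, count)
    for pair, score in sorted(pair_mutual_information(queries).items()):
        print(pair, round(score, 3))
```

In this sketch a positive score means two terms appear together in queries more often than their individual frequencies would predict, which is the kind of signal used to spot multi-word domain phrases in a query vocabulary.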