Improving information retrieval through correspondence analysis instead of latent semantic analysis

Qi, Qianqian; Hessen, Dave; Van der Heijden, P.G.M.

doi:10.1007/s10844-023-00815-y

Files

s10844-023-00815-y.pdf (690.34 KB)

Publication date

2024

Authors

Qi, Qianqian

Hessen, David J.

van der Heijden, P.G.M.

DOI

https://doi.org/10.1007/s10844-023-00815-y

Document Type

Article

Collections

Utrecht University Repository

License

CC BY

Abstract

The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to display mainly marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, the elements of the raw document-term matrix are usually weighted, and the weighting exponent of the singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, this is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
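The contrast described in the abstract can be illustrated with a minimal NumPy sketch. It is based on assumptions made for illustration and is not the authors' implementation. LSA applies the SVD to the document-term matrix directly, whereas CA applies it to the standardized residuals, so the row and column margins are removed before the decomposition. The function names `lsa_retrieve` and `ca_retrieve`, the parameter `alpha` (standing in for the singular value weighting exponent), the query folding-in scheme, and the use of cosine similarity for ranking are all hypothetical choices; the paper's exact scoring may differ. Any element-wise weighting of the raw counts would be applied to `X` (and to the query) before calling either function.

```python
import numpy as np


def _cosine_rank(doc_coords, query_coords):
    """Rank documents by decreasing cosine similarity to the query coordinates."""
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(query_coords)
    sims = doc_coords @ query_coords / (norms + 1e-12)  # small constant avoids 0/0
    return np.argsort(-sims), sims


def lsa_retrieve(X, q, k, alpha=1.0):
    """LSA sketch: truncated SVD of the (possibly weighted) document-term matrix X.

    Documents are placed at U_k diag(s_k)^alpha. The query vector q (term counts)
    is folded in as q V_k diag(s_k)^(alpha - 1), so a query equal to a row of X
    lands exactly on that document.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    docs = U * s**alpha
    query = (q @ V) * s ** (alpha - 1)
    return _cosine_rank(docs, query)


def ca_retrieve(X, q, k, alpha=1.0):
    """CA sketch: SVD of the standardized residuals (P - r c^T) / sqrt(r c^T).

    Subtracting r c^T removes the row and column margins before the
    decomposition. Documents are placed at D_r^(-1/2) U_k diag(s_k)^alpha, and
    the query is folded in as a supplementary row profile.
    """
    P = X / X.sum()
    r = P.sum(axis=1)  # document (row) masses
    c = P.sum(axis=0)  # term (column) masses; assumed all > 0
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    docs = (U / np.sqrt(r)[:, None]) * s**alpha
    profile = q / q.sum()  # query row profile; assumes q contains at least one term
    query = (profile @ (V / np.sqrt(c)[:, None])) * s ** (alpha - 1)
    return _cosine_rank(docs, query)


if __name__ == "__main__":
    # Toy 4-document x 5-term count matrix (made-up data, for illustration only).
    X = np.array([[2, 1, 0, 0, 1],
                  [1, 2, 1, 0, 0],
                  [0, 0, 1, 2, 1],
                  [0, 1, 0, 1, 2]], dtype=float)
    q = np.array([1, 1, 0, 0, 0], dtype=float)  # query using the first two terms
    for name, retrieve in [("LSA", lsa_retrieve), ("CA", ca_retrieve)]:
        order, sims = retrieve(X, q, k=2, alpha=0.5)
        print(name, "ranking:", order, "similarities:", np.round(sims, 3))
```

With `alpha=1`, documents sit at their principal coordinates. Lowering `alpha` compresses the differences between singular values and so reduces the relative weight of the leading dimensions, which is one way to experiment with the exponent effect described in the abstract. In this sketch, `k` must not exceed the number of nonzero singular values (for CA, at most min(documents, terms) - 1), and every term must occur at least once.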

Keywords

Information retrieval, Initial dimensions, Singular value decomposition, Singular value weighting exponent, Software, Artificial Intelligence, Information Systems, Hardware and Architecture, Computer Networks and Communications

Citation

Qi, Q, Hessen, D & Van der Heijden, P G M 2024, 'Improving information retrieval through correspondence analysis instead of latent semantic analysis', Journal of Intelligent Information Systems, vol. 62, no. 1, pp. 209–230. https://doi.org/10.1007/s10844-023-00815-y

URI

http://hdl.handle.net/1874/436820