Information theory applied to feature selection of binary-coded infrared spectra for automated interpretation by retrieval of reference data

Publication date

1979-12-03

Authors

Dupuis, P.F.
Cleij, P.
Klooster, H.A. van 't
Dijkstra, Auke

Document Type

Article

Abstract

A method is described for feature selection from infrared spectra, intended for identification of organic compounds by computer-aided retrieval of reference data contained in small files. Complete discrimination of the binary-coded spectra is achieved by selecting a minimum number of spectral features; the information content is used as the selection criterion. The selection procedure is applied to five data sets (saturated and unsaturated hydrocarbons, alcohols, ethers and aldehydes/ketones) involving some 400 spectra. Each spectrum is uniquely coded by using about 10% of the 140 spectral features (binary-coded peak positions) available originally. For the intensity, a threshold of 50% appears to be applicable in some cases. For coding the frequency or wavelength parameter, wavenumbers (cm⁻¹) are preferred to wavelengths (μm). The method takes into account the a priori probabilities of spectral features and their correlations. Results of a retrieval program for a few “unknown” spectra are given.
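
The selection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simple greedy strategy in which, at each step, the feature that most increases the Shannon entropy of the resulting code patterns is added, until every spectrum has a unique code. Because the entropy is computed over the joint patterns, both the feature probabilities and their correlations affect the choice. A greedy search does not guarantee a true minimum feature set, and the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def partition_entropy(spectra, features):
    """Shannon entropy (bits) of the code patterns formed by the chosen features."""
    n = len(spectra)
    counts = Counter(tuple(s[f] for f in features) for s in spectra)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def select_features(spectra):
    """Greedy selection (an assumption, not the published algorithm).

    Adds the feature with the largest joint entropy until all spectra are
    discriminated or no remaining feature adds information.
    """
    n_features = len(spectra[0])
    target = log2(len(spectra))  # entropy reached when every code is unique
    selected = []
    current = 0.0
    while current < target - 1e-12:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda f: partition_entropy(spectra, selected + [f]))
        gained = partition_entropy(spectra, selected + [best])
        if gained <= current + 1e-12:
            break  # remaining features cannot separate the spectra further
        selected.append(best)
        current = gained
    return selected


# Hypothetical toy data: 4 binary-coded "spectra" with 4 peak-position features.
toy_spectra = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
chosen = select_features(toy_spectra)
codes = [tuple(s[f] for f in chosen) for s in toy_spectra]
```

In the toy data the third feature is never set and the fourth is the complement of the second, so neither adds information once the second is chosen; this mirrors, at a very small scale, how correlated or rare features are passed over.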