DISCRIMINATION OF HEALTHY AND COLORECTAL CANCER PATIENTS USING FTIR AND PLS-DA

Spectroscopic methods have been used as effective tools in several studies on cancer detection. Fourier transform infrared spectroscopy (FTIR) has been applied to discriminate cancer cells, tissues and blood of patients with the disease, although the technique requires chemometric algorithms to obtain such results. The aim of this study was to employ partial least squares discriminant analysis (PLS-DA) with FTIR data to discriminate plasma samples from patients with colorectal cancer (CRC) and healthy individuals of both sexes. Multivariate analysis was performed using PLS-DA of the sample triplicates (n=90) with different types of processing. The best PLS-DA condition was obtained using the 1st derivative, 1 orthogonal signal correction (OSC) component and no per-variable pre-processing. With 4 latent variables (LV), the model presented a root mean square error of cross-validation (RMSECV) of 0.0004 and a coefficient of determination (R²) of 1.0000. The accuracy, precision and sensitivity of the model were 100%. This work presented an innovative methodology in which healthy individuals and primary CRC patients were differentiated directly from plasma using non-invasive, fast, simple and low-cost technologies.
KEYWORDS: FTIR. Colorectal cancer. PLS-DA. Chemometrics.


INTRODUCTION
Cancer is one of the most frequent diseases of the 21st century. Colorectal cancer (CRC), specifically, ranks third in incidence and second in mortality among cancers (BRAY et al., 2018; SIEGEL et al., 2017). However, the techniques used for its diagnosis are not always sensitive enough, especially in the early stages, and many of them are quite invasive (STRUM, 2016). As an alternative to conventional biopsy, liquid biopsies have been suggested for the diagnosis and follow-up of the disease, since they are not invasive. Various body fluids can be used for this purpose, such as plasma, serum and urine (CREE, 2015; HEITZER; ULZ; GEIGL, 2015; WANG et al., 2017).
However, some liquid biopsy biomarkers have not yet been validated for use in clinical practice because of great variability among individuals, low sensitivity and specificity of the tests, lack of standardization of procedures and preanalytical conditions, and high cost (EL MESSAOUDI et al., 2013; HEITZER; ULZ; GEIGL, 2015; SCHWARZENBACH; HOON; PANTEL, 2011).
Fourier transform infrared (FTIR) spectroscopy has great potential as a health tool because it is relatively simple, low cost, noninvasive, nondestructive and reproducible, and uses small amounts of sample with minimal preparation (MIKKONEN et al., 2016; MOVASAGHI; REHMAN; UR REHMAN, 2008; SIMSEK OZEK et al., 2016; XIANG et al., 2010). FTIR associated with chemometrics allows the analysis of subtle biochemical changes related to a pathological state. Several neoplasms have already been detected using FTIR and multivariate calibration methods (CHABER et al., 2018; GAJJAR et al., 2013; HANDS et al., 2016; OLD et al., 2014; OLLESCH et al., 2014). For CRC, such studies have mostly used solid biopsies (SAHU; MORDECHAI, 2010) and, occasionally, liquid biopsies (KHANMOHAMMADI et al., 2007). These studies rely on chemometric classification algorithms such as Linear Discriminant Analysis (LDA) (KHANMOHAMMADI et al., 2007) and Soft Independent Modeling by Class Analogy (SIMCA) (KHANMOHAMMADI et al., 2009). On the other hand, partial least squares discriminant analysis (PLS-DA) has been shown to perform better in the discrimination of CRC biopsies using Raman spectroscopy (BERGHOLT et al., 2015; BERGHOLT et al., 2016; LIU et al., 2016), but its applicability to the detection of CRC in liquid samples using FTIR has not yet been evaluated. Liquid biopsies (blood, plasma) are easier to collect and analyze than solid biopsies, and FTIR is often easier to implement in laboratory routines. In this sense, the aim of this study was to evaluate the use of the Attenuated Total Reflection (ATR-FTIR) technique in association with PLS-DA for the discrimination of plasma samples from colorectal cancer patients and healthy individuals.
CRC is a type of malignant neoplasm that affects the colon (the longest part of the large intestine, between the cecum and rectum) and the rectum (the final portion of the large intestine before the anus).
Like other malignancies, the development of CRC is a multi-step process involving genetic mutations in intestinal mucosal cells, activation of oncogenes (tumor-promoting genes), and loss or mutation of tumor suppressor genes (BOYLE; LEON, 2002; VOGELSTEIN et al., 1988). Most cases of CRC derive from benign adenomatous polyps, or adenomas, which are abnormal growths of intestinal wall epithelial cells (BOYLE; LEON, 2002; STRUM, 2016). Few non-dietary risk factors have been established for this neoplasm (e.g., inflammatory bowel diseases); most risk factors are associated with the individual's lifestyle and diet (BOYLE; LEON, 2002; HUNCHAREK; MUSCAT; KUPELNICK, 2008; PARK et al., 2005).
CRC ranks third in incidence and second in mortality among cancers worldwide. More than 1.8 million new cases and 881,000 deaths were estimated for 2018, representing 1 in 10 cancer cases and deaths for both sexes (BRAY et al., 2018). When CRC is treated in the early stages, the chance of a favorable outcome is substantially higher than when the disease is treated in more advanced stages. Preventive actions are therefore of great importance for improving disease outcomes and reducing mortality. Routine laboratory tests, such as fecal occult blood, have shown good results in reducing mortality associated with this disease (BAILEY; AGGARWAL; IMPERIALE, 2016; BÉNARD et al., 2018; HARDCASTLE et al., 1996; MANDEL et al., 1993; VENTURA et al., 2014).
Among the tests available for the detection of the disease are colonoscopy, CT colonography, sigmoidoscopy, barium enema, fecal occult blood (guaiac-based and immunochemical), and stool DNA testing.
However, these tests present some disadvantages, especially regarding cost and sensitivity, and some, such as colonoscopy, are considered quite invasive (STRUM, 2016). Patients' acceptance of more invasive tests is an important factor to consider in the prevention of CRC. Pain and discomfort before, during, and after the procedures, as well as the need for sedatives and analgesics, may decrease the acceptability of these exams (SVENSSON et al., 2002; USSUI, 2010; USSUI et al., 2013).

LIQUID BIOPSIES AND TUMOR BIOMARKERS
Noninvasive tumor screening and diagnosis are a challenge in clinical practice. Tumors are very heterogeneous and the collected sample is not always representative enough. In addition, conventional biopsy makes it difficult to monitor tumor dynamics closely (CHAN et al., 2013; CREE, 2015; CROWLEY et al., 2013; HEITZER; ULZ; GEIGL, 2015). The concept of liquid biopsy has gained importance because of its potential to solve these problems. Liquid biopsy refers to the use of body fluids, such as blood and urine, to diagnose and monitor cancers (or other conditions) through specific biomarkers (CROWLEY et al., 2013; WANG et al., 2017).
Cell-free DNA fragments of tumor origin can be found in the plasma, serum or urine of an individual, and are known as circulating tumor DNA (ctDNA) (CRISTOFANILLI et al., 2004; WANG et al., 2017). ctDNA is released by tumor masses as a result of cell death and signaling, and it carries highly relevant information about the tumor, its dynamics and its genetic characteristics (AL-NEDAWI et al., 2008; CHENG; SU; QIAN, 2016; TOLNAY, 2018; VALADI et al., 2007; ZHANG et al., 2019). Several studies demonstrate the potential of ctDNA as a prognostic and diagnostic biomarker (CHENG; SU; QIAN, 2016; COHEN et al., 2008; CRISTOFANILLI et al., 2004; DE BONO et al., 2008), including in early stage cancers (ALIX-PANABIÈRES; PANTEL, 2016; BETTEGOWDA et al., 2014; DAWSON et al., 2013; RHIM et al., 2012). Specific tumor-related mutations can be identified directly in the plasma or serum of a patient, such as KRAS (acts in cell signaling and proliferation) (THIERRY et al., 2014; WANG et al., 2004), TP53 (acts on cell cycle control and apoptosis) (ARMAGHANY et al., 2012; LANE; BENCHIMOL, 1990; VOGELSTEIN et al., 1988; WANG et al., 2004) and EGFR.
Revista Jovens Pesquisadores ISSN 2237-048X, DOI: 10.17058/rjp.v9i2.13372
Other biomolecules have also shown potential as cancer biomarkers. Studies report changes in the free amino acid profile in the plasma of cancer patients, and differences among patients with different types of cancer (CASCINO et al., 1995; DEJONG et al., 2005; HEBER; BYERLY; CHLEBOWSKI, 1985; KUBOTA; MEGUID; HITCH, 1992; LAI et al., 2005; MIYAGI et al., 2011; NORTON et al., 1985; PROENZA et al., 2003), even in early stages of the disease. It is important to emphasize that, because of tumor heterogeneity, biomarkers may vary during tumor progression and/or treatment (CHAN et al., 2013; CREE, 2015; CROWLEY et al., 2013).
Thus, the analysis of biomarkers at different times can provide important information for disease follow-up, such as monitoring the patient's response to a certain treatment (HEITZER; ULZ; GEIGL, 2015; WANG et al., 2017). To implement biomarker analysis effectively in clinical practice, it will be necessary to develop rapid, sensitive and cost-effective methods (HEITZER; ULZ; GEIGL, 2015).

FTIR AS AN ALTERNATIVE FOR TECHNOLOGICAL INNOVATION IN HEALTHCARE
FTIR is a technique that allows the detection of biochemical changes, even subtle ones, related to a pathological state, as it analyzes all the molecules present in the sample rapidly and simultaneously (SCOTT et al., 2010; SIMSEK OZEK et al., 2016; XIANG et al., 2010).
When a cell, tissue or biological fluid is traversed by an infrared radiation beam, an interaction between this radiation and the chemical bonds of the components of the biological sample occurs (SCOTT et al., 2010).
The intensities of the FTIR spectra can provide quantitative information, whereas the frequencies reveal qualitative characteristics about the nature of the sample (MOVASAGHI; REHMAN; UR REHMAN, 2008; ORPHANOU, 2015; SCOTT et al., 2010). The interaction between the radiation and the sample produces a spectrum consisting of bands, which represent the vibrations of the chemical bonds of the compounds contained therein (MOVASAGHI; REHMAN; UR REHMAN, 2008; ORPHANOU, 2015). The FTIR spectrum is the sum of all these contributions, including changes in cells, tissues or fluids that occur in pathological processes.
Consequently, the probability of two samples having the same spectrum is quite small, which makes the spectrum a molecular "fingerprint" of the sample. In addition, changes in a sample's "fingerprint" due to a pathological state make it possible to detect and follow the disease process (MOVASAGHI; REHMAN; UR REHMAN, 2008; ORPHANOU, 2015; SCOTT et al., 2010; XIANG et al., 2010).
Recently, FTIR has emerged as one of the main tools for biomedical applications and has made significant progress in the field of clinical evaluation because it is relatively simple, low cost, noninvasive, nondestructive and reproducible, and uses small amounts of sample with minimal preparation (MIKKONEN et al., 2016; MOVASAGHI; REHMAN; UR REHMAN, 2008; SIMSEK OZEK et al., 2016; XIANG et al., 2010). Numerous [...]
[...] correction (OSC) amount by leave-one-out cross-validation. The PLS-DA condition that presented the best results was selected to proceed with the correlation model, and the number of latent variables (LV) was chosen as the first LV with a root mean square error of cross-validation (RMSECV) of at most 1% (RMSECV ≤ 0.01).
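The selection rule described above (take the first LV count whose RMSECV is at or below 0.01) can be sketched as follows. This is a minimal illustration, not the software routine used in the study: the function names `rmse` and `select_latent_variables` are hypothetical, and the RMSECV values in the example are illustrative numbers, not results from this work.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference class codes and predictions."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_latent_variables(rmsecv_by_lv, threshold=0.01):
    """Return the smallest LV count whose RMSECV is <= threshold.

    rmsecv_by_lv[0] is the RMSECV of the 1-LV model, rmsecv_by_lv[1] of the
    2-LV model, and so on. Returns None if no model meets the threshold.
    """
    for n_lv, value in enumerate(rmsecv_by_lv, start=1):
        if value <= threshold:
            return n_lv
    return None


# Hypothetical RMSECV values for models with 1 to 5 latent variables
# (illustrative only, not taken from the study).
example_rmsecv = [0.21, 0.08, 0.015, 0.006, 0.005]
chosen_lv = select_latent_variables(example_rmsecv)
print(chosen_lv)
```

In a leave-one-out setting, each RMSECV value would be computed with `rmse` over the predictions obtained when every sample is left out once; that loop is omitted here because it depends on the PLS-DA implementation being used.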
The calibration set was defined based on the minimum LV number, according to ASTM E1655-05 (ASTM INTERNATIONAL, 2012). From the results of the best PLS-DA condition, the samples were separated into two classes or groups: control and CRC. They were then arranged in ascending order of error and systematically divided in a 1:1 ratio between the calibration and validation sets, taking care to include the highest and lowest error values (regardless of class) in the calibration set and to keep a similar number of samples of each class in both sets. The results for the correlation model were obtained from the tests for each sample of the validation set and evaluated through RMSECV, root mean square error of prediction (RMSEP), coefficient of determination (R²), sensitivity, accuracy and precision, as previously described (BERGHOLT et al., 2015; BERGHOLT et al., 2016; LIU et al., 2016). Figures for PLS-DA/FTIR models were obtained using the Origin 7.0 software (OriginLab Corporation®).
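The calibration/validation split described above can be sketched as follows. This is one possible reading of the procedure, under stated assumptions: samples are alternated between the two sets after sorting by error, and, if the highest-error sample lands in the validation set, it is swapped with the last calibration sample so that the 1:1 ratio is kept. The function names, the tuple layout and the sample values are hypothetical; class balance is only reported here, not enforced.

```python
from collections import Counter


def split_by_error(samples):
    """Split samples 1:1 into calibration and validation sets.

    samples: list of (sample_id, class_label, error) tuples.
    Returns (calibration, validation), both sorted by ascending error
    except for the possible swap of the highest-error sample.
    """
    ordered = sorted(samples, key=lambda s: s[2])  # ascending order of error
    calibration = ordered[0::2]  # positions 0, 2, 4, ... (holds the lowest error)
    validation = ordered[1::2]   # positions 1, 3, 5, ...
    # Assumption: keep the highest error in calibration by swapping, not moving,
    # so that the two sets stay the same size.
    if validation and validation[-1][2] > calibration[-1][2]:
        calibration[-1], validation[-1] = validation[-1], calibration[-1]
    return calibration, validation


def class_counts(subset):
    """Count how many samples of each class a subset contains."""
    return Counter(label for _, label, _ in subset)


# Hypothetical samples: (id, class, error); values are illustrative only.
samples = [
    ("s1", "control", 0.0002), ("s2", "CRC", 0.0009), ("s3", "control", 0.0001),
    ("s4", "CRC", 0.0005), ("s5", "control", 0.0007), ("s6", "CRC", 0.0003),
]
calibration, validation = split_by_error(samples)
print([s[0] for s in calibration], class_counts(calibration))
print([s[0] for s in validation], class_counts(validation))
```

If the class counts printed for the two sets differ too much, samples with neighbouring error values could be exchanged by hand, which mirrors the manual check mentioned in the text.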
Compared to the control group, the average spectrum of the CRC class showed increased absorption intensity in the regions of 3300-3250 cm⁻¹ and 1300-1050 cm⁻¹, attributed to the vibrations of proteins and DNA, respectively. This increase may be due to the rise in circulating DNA (cfDNA and ctDNA) caused by the death of tumor cells (LI et al., 2003; SCHWARZENBACH et al., 2008; SCHWARZENBACH; HOON; PANTEL, 2011) and to the increased concentration of, or variation in, the amino acid and protein profile associated with cancers (GAUTAM et al., 2012; MIYAGI et al., 2011; REYNÉS et al., 2011). Verifying these assumptions would require specific analyses that are outside the scope of this work and remain a suggestion for future studies.
In total, 27 PLS-DA conditions were tested. The best PLS-DA condition was obtained with the 1st derivative, 1 OSC component and no per-variable pre-processing. With 4 LV, the model presented RMSECV = 0.0004, [...]
The correlation plot (Figure 2A) presents the predicted values for each class. The residuals plot (Figure 2B) reveals a certain degree of systematic error in the prediction, mainly in the CRC class of the calibration set. This pattern indicates that some variance is not modeled by the first 4 LV, which is possibly related to the inclusion of outlier samples. However, since the error was quite low, the predictive quality of the model is not impaired.
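The quantities discussed above (predicted values per class, residuals, and the classification figures of merit) can be computed along the following lines. This is a hedged sketch rather than the study's procedure: it assumes the classes are coded 0 (control) and 1 (CRC), that a sample is assigned to CRC when its predicted value is at least 0.5, and that sensitivity, precision and accuracy follow their usual confusion-matrix definitions. The function name and the example predictions are hypothetical.

```python
import math


def evaluate_predictions(reference, predicted, threshold=0.5):
    """Summarize continuous PLS-DA outputs against 0/1 reference classes."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - r for r, p in zip(reference, predicted)]
    rmse = math.sqrt(sum(e * e for e in residuals) / len(residuals))

    # Mean residual per class: a value far from zero suggests systematic error.
    mean_residual = {}
    for label in (0, 1):
        values = [e for r, e in zip(reference, residuals) if r == label]
        mean_residual[label] = sum(values) / len(values) if values else float("nan")

    # Assumed decision rule: predicted value >= threshold means class 1 (CRC).
    assigned = [1 if p >= threshold else 0 for p in predicted]
    pairs = list(zip(reference, assigned))
    tp = sum(1 for r, a in pairs if r == 1 and a == 1)
    tn = sum(1 for r, a in pairs if r == 0 and a == 0)
    fp = sum(1 for r, a in pairs if r == 0 and a == 1)
    fn = sum(1 for r, a in pairs if r == 1 and a == 0)
    return {
        "rmse": rmse,
        "mean_residual_by_class": mean_residual,
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Hypothetical validation-set predictions (illustrative values only).
reference = [0, 0, 0, 1, 1, 1]
predicted = [0.02, -0.01, 0.01, 0.97, 0.96, 0.98]
summary = evaluate_predictions(reference, predicted)
print(summary)
```

In this made-up example the CRC class has a negative mean residual, the kind of class-specific bias a residuals plot would expose, while all samples are still assigned to the correct class.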
It was necessary to use OSC in the data set because, when applied, the predictive quality (measured by means of RMSECV and RMSEP) was good in all conditions (data not shown). In addition, the number of LV needed to describe the model was fairly low and the R² reached 1.0000, demonstrating how well the model can describe the observed results, as previously reported by Wold et al. (1998). This is because OSC can remove noise that disturbs the data, as well as non-linear relationships between the spectral data and the assigned classes, which made it possible to work with the full range of the spectrum in the model (ESTEBAN-DIEZ; GONZÁLEZ-SÁIZ; PIZARRO, 2004; SJÖBLOM et al., 1998; WOLD et al., 1998). [...]
DNA-related (ARMAGHANY et al., 2012; CHENG; SU; QIAN, 2016; SCHWARZENBACH; HOON; PANTEL, 2011; THIERRY et al., 2014; VOGELSTEIN et al., 1988; WANG et al., 2017) and protein-related (ABRAMSON, 1982; GAUTAM et al., 2012; MIYAGI et al., 2011; REYNÉS et al., 2011) alterations have been described for several types of cancers, including CRC. The concentration and the characteristics of the DNA present in the circulation of patients with CRC differ from those of healthy individuals (ARMAGHANY et al., 2012; BETTEGOWDA et al., 2014; CAO et al., 2018; PEREIRA et al., 2017; SCHWARZENBACH et al., 2008; SPINDLER et al., 2015; UMETANI et al., 2006). Likewise, the protein profile is also altered in patients with CRC (BI et al., 2006; MIYAGI et al., 2011; MOLINARI et al., 2009; STULÍK et al., 2001). Tomonaga et al. (2004) [...]
The spectral regions with score greater than zero represent the infrared bands that contributed to class discrimination in the PLS-DA model.
Our work presented an innovative methodology in which healthy individuals and primary CRC patients were differentiated directly from plasma, in a non-invasive, fast, simple and low-cost manner, through a correlation model (PLS-DA) built with spectra obtained by FTIR. The methodology shown herein could be easily included in routine laboratory tests, since it requires a very small amount of blood plasma with minimal or no sample preparation and does not use reagents or commercial kits. Nevertheless, the presented methodology has some limitations. The small number of samples makes the model less representative, considering the heterogeneity of tumors and the great individual variation (CRANLEY et al., 2013; CROWLEY et al., 2013; EL MESSAOUDI et al., 2016; HEITZER; ULZ; GEIGL, 2015; WANG et al., 2017). The model is not static; that is, changes made to it, such as adding new samples, will change the model response, even if slightly. The model can only discriminate samples from CRC patients if they already have blood changes related to the disease. In addition, the response of the model to samples from patients with other cancers was not evaluated, so the results are limited to primary CRC. Despite these limitations, the methodology demonstrates the potential of FTIR associated with chemometrics as a complementary analysis to the CRC screening and diagnostic techniques already available, which could help improve cancer surveillance and early detection.

CONCLUDING REMARKS
The use of PLS-DA with FTIR data showed high accuracy in the discrimination of plasma samples from patients with primary CRC and healthy individuals, demonstrating the potential of these techniques as a diagnostic alternative in clinical practice.