Transformation to Achieve Perfect Correlation
DOI: https://doi.org/10.64060/JASR.v1i3.4
Keywords: Correlation coefficient, Generalized inverse, Multiple linear regressions, Canonical regression, Normal distribution
Abstract
Correlation and linear regression are common ways to evaluate association and empirical relationships between two or more variables. In practice, |r_XY| often departs significantly from unity, and existing transformations that increase correlation fail to achieve perfect correlation. For bivariate data, the paper proposes transforming Y to y = G·‖x‖‖y‖, which gives r_Xy = 1, where G is the generalized inverse (G-inverse) of the matrix A = x·x^T and x, y denote vectors of deviation scores. The concept is extended to perfect linearity between a dependent variable (Y) and a set of independent variables (multiple linear regression), or between a set of dependent variables and a set of independent variables (canonical regression), avoiding the problems of insignificant beta coefficients in univariate and multivariate regression models and of outliers. Empirical illustrations of the G-inverse and of its extensions to multiple linear regression and canonical regression are also given. The proposed transformation is a novel method of introducing perfect correlation between two variables. Extending the concept to multiple linear regression and canonical regression is expected to be useful for empirical research in various branches of science. Future studies may derive the distribution of the proposed perfect correlations and compare the efficacy of the suggested approach against traditional ones using quantitative evidence.
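The bivariate transformation can be illustrated with a minimal numerical sketch. It rests on assumptions that are not stated in the abstract: the G-inverse is taken to be the Moore-Penrose pseudoinverse (computed with `numpy.linalg.pinv`), and the formula is read as applying G to x before scaling by ‖x‖‖y‖, so that the result is a vector. The function names and the sample data are hypothetical and serve only as illustration.

```python
import numpy as np


def deviation_scores(v):
    """Return the vector of deviations from the mean."""
    v = np.asarray(v, dtype=float)
    return v - v.mean()


def transform_assumed(X, Y):
    """Sketch of the transformation under one ASSUMED reading of the abstract.

    ASSUMPTION: G is the Moore-Penrose pseudoinverse of A = x x^T, and the
    transformed deviation vector is (G x) * ||x|| * ||y||.
    """
    x = deviation_scores(X)
    y = deviation_scores(Y)
    A = np.outer(x, x)                 # rank-one matrix A = x x^T
    G = np.linalg.pinv(A, rcond=1e-10) # one particular generalized inverse of A
    return (G @ x) * np.linalg.norm(x) * np.linalg.norm(y)


# Hypothetical sample data, for illustration only.
X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
Y = np.array([3.0, 1.0, 6.0, 2.0, 9.0])

x = deviation_scores(X)
y = deviation_scores(Y)
A = np.outer(x, x)
G = np.linalg.pinv(A, rcond=1e-10)

y_new = transform_assumed(X, Y)
r_before = np.corrcoef(X, Y)[0, 1]
r_after = np.corrcoef(X, y_new)[0, 1]

print("A G A == A:", np.allclose(A @ G @ A, A))  # defining g-inverse property
print("r before:", r_before)
print("r after:", r_after)
```

Under this reading, G x equals x/‖x‖², so the transformed vector is x rescaled to have the same norm as y, which is why its correlation with X equals one.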
Copyright (c) 2025 SCOPUA Journal of Applied Statistical Research

This work is licensed under a Creative Commons Attribution 4.0 International License.