Computación y Sistemas

On-line version ISSN 2007-9737, print version ISSN 1405-5546

Comp. y Sist. vol. 25 no. 1, Ciudad de México, Jan./Mar. 2021, Epub 13-Sep-2021

https://doi.org/10.13053/cys-25-1-3473 

Articles

Why is the Least Square Error Method Dangerous?

Vaclav Skala1  * 

Edward Kansa2 

1University of West Bohemia, Faculty of Applied Sciences, Czech Republic, skala@kiv.zcu.cz

2Convergent Solutions, USA, edwardjkansa@gmail.com


Abstract:

This contribution briefly describes some "dangerous" features of the Least Square Error (LSE) method, which are not generally known, yet are often relied upon in applications without researchers being aware of them. The LSE is usually used in approximations of acquired data to find "the best fit", especially in financial economics and related fields. However, the LSE method is not invariant to some standard basic operations used within the solution of a linear system of equations.

Keywords: Least square error; system of linear equations; numerical mathematics; over-determined system; invariant operations

1 Introduction

The Least Square Error (LSE) method is usually used for finding "the best fit" of measured data, which leads to the solution of an over-determined system of linear equations Ax = b. The LSE method is very often used in financially oriented applications involving linear and non-linear regression. In some specific cases the total least square method should be used instead, mostly in connection with implicit representations [1, 6, 9, 10]. However, in the polynomial regression case, the result of the LSE method depends on the physical units used in the data domain.

1.1 Linear System of Equations

In the case of a linear system of equations Ax = b, where the matrix A (n×n) is non-singular, there are several standard methods for solving the system [4].

However, the solution of the linear systems Ax = b and Ax = 0 is equivalent to the outer product (extended cross product) [8], and the modified Gauss elimination method can be used without division operations [7]. Some operations are used quite frequently, especially in connection with preconditioning or within the solution of the linear system, e.g. a row multiplication, a row swap, etc. Consider the scaled system:

P A D D^{-1} x = P b, (1)

where P and D are non-singular (n×n) matrices. A simple preconditioning method for a large system of equations uses diagonal matrices P and D [15]. Multiplying the i-th row of the extended matrix [A|b] by p_i ≠ 0 leaves the solution of the linear system unchanged. Multiplying the j-th column of the matrix A by d_j ≠ 0 represents a change of the unit of x_j, see Eq. 1.
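A minimal numerical sketch of these invariances for a square non-singular system, using assumed example data (not taken from the paper):

```python
# Hypothetical 2x2 example: row scaling of [A|b] and column scaling of A
# (a change of units of x_j) do not change the recovered solution, cf. Eq. 1.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([9.0, 13.0])
x = np.linalg.solve(A, b)

# Multiply the first row of the extended matrix [A|b] by p_1 = 10.
P = np.diag([10.0, 1.0])
x_row = np.linalg.solve(P @ A, P @ b)

# Multiply the first column of A by d_1 = 1000 (e.g. seconds -> milliseconds);
# the system is solved for xi = D^{-1} x, and x = D xi recovers the original.
D = np.diag([1000.0, 1.0])
xi = np.linalg.solve(A @ D, b)

print(np.allclose(x, x_row), np.allclose(x, D @ xi))  # True True
```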

2 Over-Determined Systems

In the case of an over-determined linear system, where the matrix A is (n×m) with n > m and the vector b is (n×1), the LSE is usually used to obtain an approximate solution. However, in many cases users are not aware of the LSE properties [17]. It is well known that the result of an LSE approximation depends on the physical units used when polynomial regression is applied, e.g. in the estimation of processing time. Let us consider a regression function φ(t):

φ(t) = a_0 + a_1 t + a_2 t \log t + a_3 t^2 + \cdots (2)

If the time unit [s] is used, the results are different from the case when the unit [ms] is used. Also, the element a_0, which represents the value at t = 0, causes some problems; this has also been observed in interpolation and approximation using Radial Basis Functions (RBF) [2, 5, 13, 14].
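A minimal sketch with synthetic data (assumed, not from the paper) illustrating this unit dependence: Eq. 2 is fitted once with t in seconds and once in milliseconds. Because of the t log t term, the coefficient of the linear term absorbs a contribution proportional to a_2 log c under a unit change t → ct, so the individual coefficients cannot be interpreted independently of the chosen unit.

```python
# Synthetic (hypothetical) data: least-squares fit of Eq. 2 in two time units.
import numpy as np

rng = np.random.default_rng(0)
t_s = np.linspace(0.5, 10.0, 50)                       # time in seconds
y = 1.0 + 2.0 * t_s + 0.5 * t_s * np.log(t_s) + rng.normal(0.0, 0.1, t_s.size)

def design(t):
    # Basis of Eq. 2: 1, t, t*log(t), t^2
    return np.column_stack([np.ones_like(t), t, t * np.log(t), t ** 2])

a_s, *_ = np.linalg.lstsq(design(t_s), y, rcond=None)            # unit [s]
a_ms, *_ = np.linalg.lstsq(design(1000.0 * t_s), y, rcond=None)  # unit [ms]

print("coefficients in [s] :", a_s)
print("coefficients in [ms]:", a_ms)
# a_ms[2] ~ a_s[2] / 1000, but a_ms[1] ~ (a_s[1] - a_s[2] * np.log(1000)) / 1000,
# i.e. the linear and t*log(t) terms mix when the unit changes.
```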

In the case of linear regression, the LSE method is usually applied directly to the data set using the pseudo-inverse as follows:

A^T A x = A^T b, i.e. x = (A^T A)^{-1} A^T b. (3)
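A brief sketch of Eq. 3 on assumed random data; in practice an SVD-based solver such as np.linalg.lstsq is preferred over forming A^T A explicitly.

```python
# Hypothetical over-determined system: LSE via the normal equations of Eq. 3
# compared with an SVD-based least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3))     # n = 100 equations, m = 3 unknowns
b = rng.normal(size=100)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # x = (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based solution

print(np.allclose(x_normal, x_lstsq))  # True for this well-conditioned A
```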

Let us consider the LSE formulation as in Eq. 1, but modified for an over-determined system of linear equations. The use of the LSE then leads to:

(P A D)^T P A D D^{-1} x = (P A D)^T P b, (4)

where P (n×n) and D (m×m) are non-singular diagonal matrices. Using algebraic operations:

D^T A^T P^T P A D D^{-1} x = D^T A^T P^T P b. (5)

As the matrix D is diagonal and non-singular, it is possible to multiply Eq. 5 from the left by (D^T)^{-1}. This results in:

A^T Q A D D^{-1} x = A^T Q b, (6)

where Q = P^T P is a diagonal matrix containing the squared row multipliers p_i^2. If ξ = D^{-1} x, then Eq. 6 can be rewritten as:

A^T Q A D ξ = A^T Q b. (7)

The solution of Eq. 7 using the LSE method is then:

ξ = (A^T Q A D)^{-1} A^T Q b = D^{-1} (A^T Q A)^{-1} A^T Q b, (8)

x = D ξ. (9)

Therefore, in the case of linear regression, the LSE method of Eq. 3:

  • is invariant to the physical units used, provided the transformation x = D ξ is applied,

  • is not invariant to row multiplications, due to the dependence on the matrix P (resp. Q = P^T P), which contains the row multipliers (see the numerical sketch below).
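A minimal numerical sketch of these two properties, using assumed random data (not from the paper):

```python
# Hypothetical data: the LSE solution is recovered after column scaling
# (change of units) via x = D xi, Eq. 9, but changes under row scaling.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Column scaling D: solve for xi = D^{-1} x, then map back by x = D xi.
D = np.diag([1000.0, 1.0, 0.01])
xi, *_ = np.linalg.lstsq(A @ D, b, rcond=None)
print(np.allclose(x, D @ xi))   # True: invariant once x = D xi is applied

# Row scaling P: multiply the first row of [A|b] by 10.
P = np.diag([10.0] + [1.0] * 19)
x_p, *_ = np.linalg.lstsq(P @ A, P @ b, rcond=None)
print(np.allclose(x, x_p))      # False: the LSE solution changes
```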

3 Example

Let us consider two simple examples of LSE use, each with a modification in which the first row of the extended matrix [A|b] is multiplied by the value 10:

  • the first case: a function is given as z = a_1 x + a_2 y, i.e. a plane passing through the origin, and the values of (x, y, z) are given as (1, 3, 1), (2, 2, 2), (3, 7, 7):

\begin{bmatrix} 1 & 3 \\ 2 & 2 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 7 \end{bmatrix}, \qquad \begin{bmatrix} 10 & 30 \\ 2 & 2 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 10 \\ 2 \\ 7 \end{bmatrix}. (10)

  • The solutions are x = [11/21, 2/3]^T and x = [275/129, −46/129]^T.

  • the second case: a function is given as y = kx + q, i.e. a line in E^2 not passing through the origin, and the values of (x, y) are given as (1, 1), (2, 2), (3, 7):

\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} k \\ q \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 7 \end{bmatrix}, \qquad \begin{bmatrix} 10 & 10 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} k \\ q \end{bmatrix} = \begin{bmatrix} 10 \\ 2 \\ 7 \end{bmatrix}. (11)

  • The solutions are x = [3, −8/3]^T and x = [435/167, −808/501]^T. Both cases are reproduced numerically in the sketch below.
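A short sketch reproducing both examples with exact rational arithmetic; it solves the 2×2 normal equations A^T A x = A^T b of Eq. 3 by Cramer's rule (the helper lse_2 is introduced here for illustration only).

```python
# Exact LSE solution of an n x 2 over-determined system via Eq. 3.
from fractions import Fraction

def lse_2(A, b):
    s11 = sum(r[0] * r[0] for r in A)
    s12 = sum(r[0] * r[1] for r in A)
    s22 = sum(r[1] * r[1] for r in A)
    t1 = sum(r[0] * v for r, v in zip(A, b))
    t2 = sum(r[1] * v for r, v in zip(A, b))
    det = s11 * s22 - s12 * s12
    return Fraction(s22 * t1 - s12 * t2, det), Fraction(s11 * t2 - s12 * t1, det)

# First case, Eq. 10: original system and first row of [A|b] multiplied by 10.
print(lse_2([[1, 3], [2, 2], [3, 7]], [1, 2, 7]))      # 11/21, 2/3
print(lse_2([[10, 30], [2, 2], [3, 7]], [10, 2, 7]))   # 275/129, -46/129

# Second case, Eq. 11.
print(lse_2([[1, 1], [2, 1], [3, 1]], [1, 2, 7]))      # 3, -8/3
print(lse_2([[10, 10], [2, 1], [3, 1]], [10, 2, 7]))   # 435/167, -808/501
```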

These elementary examples illustrate the limitations of LSE use; the results hold for d-dimensional space in general. In the first case, users are usually not aware of the dependence on row scaling. The second case can be understood by noting that k represents a normal vector (in higher dimensions, in general), while q is related to a distance from the origin, so the two parameters share no common metric. Moreover, the solution of Eq. 3 might be unstable, as the matrix A^T A is generally numerically ill-conditioned [3, 11, 13, 14].
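A small sketch of this conditioning issue on an assumed Vandermonde-type design matrix: the 2-norm condition number of A^T A is the square of that of A, which is why the normal equations of Eq. 3 lose accuracy much earlier than an SVD-based solver.

```python
# Hypothetical design matrix: forming A^T A squares the condition number.
import numpy as np

t = np.linspace(1.0, 2.0, 50)
A = np.vander(t, 6)                 # polynomial design matrix, 6 columns
print(np.linalg.cond(A))            # condition number of A
print(np.linalg.cond(A.T @ A))      # roughly cond(A)**2, much larger
```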

It should be noted that in many cases the Total Least Square Error (TLSE) method should be used instead. However, it leads to a more complicated computation [9, 16].

A simple preconditioning [12, 15] should also be considered. Alternatively, the modified Gauss elimination method [7, 8] can be used, as the solution of a linear system is equivalent to the use of the outer product.

4 Conclusion

This contribution describes selected, mostly unknown, properties of the Least Square Error method for the approximation of acquired data. The LSE method is not invariant to the multiplication of a row of the extended matrix [A|b]. Also, when no metric exists between the parameters, such as a distance and a normal vector, an LSE-based approximation should not be used.

Acknowledgements

This research was partially supported by the Czech Science Foundation (GACR), project No. GA 17-05534S.

The authors would like to thank colleagues and students at the University of West Bohemia for hints and suggestions. Thanks also belong to colleagues at Shandong University and Zhejiang University (China) for their critical comments and constructive suggestions, and to the anonymous reviewers for their valuable comments and hints.

References

1. Alciatore, D. G., Miranda, H. (1995). The Best Least-Squares Line Fit. Academic Press.

2. Cervenka, M., Skala, V. (2020). Conditionality analysis of the radial basis function matrix. LNCS, Vol. 12250, pp. 30–43.

3. Kansa, E., Holoborodko, P. (2017). On the ill-conditioned nature of C∞ RBF strong collocation. Engineering Analysis with Boundary Elements, Vol. 78, pp. 26–30.

4. Lay, D. (2006). Linear Algebra and Its Applications. Pearson international edition. Pearson/Addison-Wesley.

5. Majdisova, Z., Skala, V. (2017). Big geo data surface approximation using radial basis functions: A comparative study. Computers & Geosciences, Vol. 109, pp. 51–58.

6. Markovsky, I., Van Huffel, S. (2007). Overview of total least-squares methods. Signal Processing, Vol. 87, No. 10, pp. 2283–2302.

7. Skala, V. (2013). Modified Gaussian elimination without division operations. ICNAAM'13, AIP Conference Proceedings, Vol. 1558, pp. 1936–1939.

8. Skala, V. (2016). Extended cross-product and solution of a linear system of equations. Lecture Notes in Computer Science, Vol. 9786, pp. 18–35.

9. Skala, V. (2016). A new formulation for total least square error method in d-dimensional space with mapping to a parametric line. Proceedings of the ICNAAM 2016, Vol. 1738, pp. 480106-1–480106-4.

10. Skala, V. (2016). Total least square error computation in E2: A new simple, fast and robust algorithm. Proceedings of the 33rd Computer Graphics International, CGI'16, Association for Computing Machinery, pp. 1–4.

11. Skala, V. (2017). High dimensional and large span data least square error: Numerical stability and conditionality. Int. J. Appl. Phys. Math. (IJAPM), Vol. 7, pp. 148–156.

12. Skala, V. (2017). Least square method robustness of computations: What is not usually considered and taught. FedCSIS'17, pp. 537–541.

13. Skala, V. (2017). RBF interpolation with CSRBF of large data sets. Proceedings of the ICCS'17, Vol. 108, pp. 2433–2437.

14. Skala, V. (2018). RBF approximation of big data sets with large span of data. Proceedings of the MCSI'2017, Vol. 1, pp. 212–218.

15. Skala, V. (2020). Conditionality of linear systems of equations and matrices using projective geometric algebra. Computational Science and Its Applications ICCSA'20, Springer, pp. 3–17.

16. Smolik, M., Skala, V., Majdisova, Z. (2020). A new simple, fast and robust total least square error computation in E2: Experimental comparison. Lecture Notes in Electrical Engineering, Vol. 554, pp. 325–334.

17. Smolik, M., Skala, V., Nedved, O. (2016). A comparative study of LOWESS and RBF approximations for visualization. LNCS, Vol. 9787, pp. 405–419.

Received: September 08, 2020; Accepted: November 16, 2020

* Corresponding author: Vaclav Skala, e-mail: skala@kiv.zcu.cz

This is an open-access article distributed under the terms of the Creative Commons Attribution License.