
译科技 (Serial): Visualized Linear Algebra to Get Started with Machine Learning (Part 1)

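  Artificial intelligence is widely used in image recognition, speech recognition, natural language processing, recommendation systems, autonomous driving, smart manufacturing, healthcare, and many other fields, and it has had a profound impact on social, economic, and technological development.

  Machine learning is the core of artificial intelligence and the fundamental route to making computers intelligent. It is an interdisciplinary field that studies how computers can simulate or realize human learning behavior in order to acquire new knowledge and skills, and how they can reorganize existing knowledge structures to keep improving their own performance. Machine learning has already achieved remarkable results and has thoroughly changed how people perceive artificial intelligence. To understand machine learning, you first need the mathematical foundations, including linear algebra, calculus, and probability theory. These foundations are essential for understanding deep learning and other areas of artificial intelligence.

  Starting today, 数据观 will serialize the article series "Visualized Linear Algebra to Get Started with Machine Learning" by Marcello Politi, a machine learning expert at the European Space Agency, who explains the "black magic" of machine learning in professional depth.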

Visualized Linear Algebra to Get Started with Machine Learning: Part 1

从可视化线性代数开始机器学习(一)

  Master elements of linear algebra, start with simple and visual explanations of basic concepts.

  掌握线性代数要素,从基本概念的简明阐释开始。

  Often the main difficulty one faces when one wants to begin one’s journey into the world of machine learning is having to understand math concepts. Sometimes this can be difficult if you do not have a solid background in subjects such as linear algebra, statistics, probability, optimization theory, or others.

  当你刚刚踏入机器学习的世界,面临的主要困难往往是数学概念的理解。如果没有扎实的线性代数、统计学、概率论、优化理论等学科背景,你通常难以往下走。

  In this article, then, I would like to start by giving intuitive explanations of the basic linear algebra concepts that are essential before delving into the world of Machine Learning. Obviously, this article is not meant to be exhaustive, since there is a lot to know about this subject, but it may serve as a first approach to tackling it!

  本文而言,首先,我想对线性代数基础概念进行直观解释,这些概念有助于我们深入机器学习的世界。显然,这篇文章并非详尽无遗,关于机器学习和数学概念还有很多东西需要了解,但它可以成为解决这些问题的第一种方法。

Introduction

简介

What is a vector?

什么是矢量?

Simple Vector Operations

简单向量运算

Projections

投影

Basis, Vector Space and Linear Independence

基、矢量空间和线性无关

Matrices and Solving Equations

矩阵和求解方程

  Introduction

  简介

  Why Is Linear Algebra Important for Data Science?

  为什么线性代数对数据科学很重要?

  Linear algebra allows us to solve real-life problems, especially problems that are very common in data science.

  线性代数能够解决我们现实生活中的问题,尤其是数据科学中十分常见的问题。

  Assume we go to the market to buy 3 avocados and 4 broccoli and pay $8. The next day we buy 11 avocados and 2 broccoli and pay $12. Now we want to find out how much a single avocado and a single broccoli cost. We have to solve the following equations simultaneously.

  假设我们去市场购买 3 个鳄梨和 4 个西兰花,并支付 8 美元。第二天,我们买了 11 个鳄梨和 2 个西兰花,并支付了 12 美元。现在,我们想知道单个鳄梨和单个西兰花的成本是多少,我们必须同时解决下列数学表达式。

Linear Algebra Problem (Image By Author)

线性代数问题(图片来自作者)
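
  The figure itself is not reproduced here. Assuming a denotes the price of one avocado and b the price of one broccoli (symbols chosen for illustration, not taken from the figure), the two purchases described above would translate into the following system:

```latex
\begin{aligned}
3a + 4b &= 8 \\
11a + 2b &= 12
\end{aligned}
```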

  Another typical problem is to find the best parameters of a function for it to fit the data we have collected. So suppose we already know what kind of function we need to use, but this function can change its form since it depends on some parameters. We want to find the best form and therefore the best parameters.

  另一个典型问题是,找到一个函数的最佳参数,使其适合我们收集的数据。因此,假设我们已经知道需要使用什么样的函数,但这个函数可以改变其形式,因为它依赖于一些参数。我们想找到最好的形式,从而找到最佳参数。

Fitting Data (Image By Author)

数据拟合(图片来自作者)

  For example, let’s call μ = param1 and θ = param2.

  Usually, in Machine Learning, we want to iteratively update both [μ, θ] to eventually find a good curve that fits our data.

  例如,我们令 μ = param1,θ = param2。

  通常,在机器学习中,我们希望迭代更新 [μ, θ] 这两个参数,以便最终找到一条能很好拟合数据的曲线。

  Let’s say that a curve far away from the optimal green curve has a high error, while a curve similar to the green one has a low error. We usually say that we want to find the parameters [μ, θ] that minimize the error, that is, the curve that is as close as possible to the green one.

  比方说,离最佳绿色曲线较远的曲线有较高误差,而与绿色曲线相似的曲线有较低误差。我们通常说要找到那些参数[μ, θ],使误差最小化,所以要找到尽可能接近绿色曲线的其他曲线。
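
  The idea of iteratively updating [μ, θ] to reduce the error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which function is being fitted, so the sketch assumes a straight line y = μx + θ, made-up data, the mean squared error as the error measure, and plain gradient descent as the update rule.

```python
# Toy data generated from y = 2x + 1 (made-up values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mean_squared_error(mu, theta):
    """Average squared distance between the curve mu*x + theta and the data."""
    return sum((mu * x + theta - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=2000, learning_rate=0.05):
    """Iteratively update [mu, theta] with plain gradient descent."""
    mu, theta = 0.0, 0.0  # arbitrary starting guess
    n = len(xs)
    for _ in range(steps):
        residuals = [mu * x + theta - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_mu = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_theta = 2.0 / n * sum(residuals)
        # Step against the gradient to lower the error.
        mu -= learning_rate * grad_mu
        theta -= learning_rate * grad_theta
    return mu, theta


mu, theta = fit()
print(mu, theta, mean_squared_error(mu, theta))
```

  The starting guess has a high error; after the updates the parameters should end up very close to the values used to generate the data (μ = 2, θ = 1).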

  Let’s see how linear algebra can help us with these problems!

  让我们看看线性代数如何帮助我们解决这些问题!

  What is a vector?

  什么是矢量?

  A vector in physics is a mathematical entity that has a direction, a sign, and a magnitude. It is therefore commonly represented visually with an arrow.

  物理学中的矢量是一个具有方向、符号和大小的数学实体,所以通常用箭头直观表示。

Vector (Image By Author)

矢量(图片来自作者)

  Often in computer science, the concept of a vector is generalized. In fact, you will often hear the term list used instead of vector. In this view, a vector is nothing more than a list of properties that we can use to represent anything.

  在计算机科学中,矢量的概念常常被泛化。事实上,你会多次听到用“列表”代替“矢量”这个词。在这种概念中,矢量不过是一个属性列表,我们可以用它来表示任何东西。

  Suppose we want to represent houses according to 3 of their properties:

  假设我们要根据房屋的3个属性来表示房屋:

  1. The number of rooms

  1. 房间数

  2. The number of bathrooms

  2. 卫生间数

  3. Square meters

  3.平方米

 

Lists (Image By Author)

列表(图片来自作者)

  For example, in the image above we have two vectors. The first represents a house with 4 rooms, 2 bathrooms and 85 square meters. The second, on the other hand, represents a house with 3 rooms, 1 bathroom and 60 square meters.

  例如,在上图中我们有两个矢量。第一个代表一个有4个房间、2个卫生间和85平方米的房子;第二个代表一个有3个房间、1个卫生间和60平方米的房子。

  Of course, if we are interested in other properties of the house we can create a much longer vector. In this case, we will say that the vector instead of having 3 dimensions will have n dimensions. In machine learning, we can often have hundreds or thousands of dimensions!

  当然,如果对房子的其他属性感兴趣,可以创建一个更长的矢量。在这种情况下,我们会说矢量不是3维,而是N维。在机器学习中,通常可以有成百上千个维度。
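
  In code, such a vector really is just a list. A minimal Python sketch using the two houses described above (the extra "floors" property is a hypothetical addition for illustration):

```python
# Each house is a list of properties: [rooms, bathrooms, square meters].
house_a = [4, 2, 85]
house_b = [3, 1, 60]

# The number of entries is the dimension of the vector.
print(len(house_a))  # 3

# Appending one more property (a hypothetical "floors" entry)
# turns the 3-dimensional vector into a 4-dimensional one.
house_a_extended = house_a + [2]
print(len(house_a_extended))  # 4
```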

  Simple Vector Operations

  简单矢量运算

  There are operations we can perform with vectors, the simplest of which are certainly addition between two vectors, and multiplication of a vector by a scalar (i.e., a simple number).

  我们可以对矢量执行一些操作,其中最简单的就是两个矢量之间的加法,以及一个矢量与一个标量(即一个简单的数字)的乘法。

  To add 2 vectors you can use the parallelogram rule. That is, you draw vectors parallel to those we want to add and then draw the diagonal. The diagonal will be the resulting vector of the addition. Believe me, it is much easier to understand this by looking directly at the following example.

  要添加2个矢量,可以使用平行四边形规则。也就是说,绘制与要加的矢量平行的矢量,然后画出对角线。对角线将是加法的结果矢量。直接看下面的例子会更容易理解这一点。

Vector Addition (Image By Author)

矢量加法(图片来自作者)

  Multiplication by a scalar n, on the other hand, stretches (or shrinks) the vector by a factor of n. See the following example.

  而与标量 n 相乘则将矢量的长度拉伸(或缩短)为原来的 n 倍。请看下面的例子。

Vector-Scalar Multiplication (Image By Author)

矢量与标量乘法(图片来自作者)
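
  Both operations act component by component, which a short Python sketch can make concrete. The function names add and scale and the example vectors are chosen here for illustration only.

```python
def add(r, s):
    """Component-wise sum of two vectors of the same dimension."""
    return [ri + si for ri, si in zip(r, s)]


def scale(n, r):
    """Multiply every component of r by the scalar n."""
    return [n * ri for ri in r]


r = [3, 2]
s = [-1, 2]
print(add(r, s))     # [2, 4]
print(scale(2, r))   # [6, 4]
print(scale(-1, r))  # [-3, -2]  (a negative scalar flips the direction)
```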

  Modulus and Inner Product

  模和内积

  A vector is actually always expressed in terms of other vectors. For example, let us take as reference vectors, two vectors i and j both with length 1 and orthogonal to each other.

  一个矢量实际上总是用其他矢量表示。例如,我们把两个长度为1且相互正交的矢量i和j作为参考矢量。

Unit Length Vectors (Image By Author)

单位矢量(图片来自作者)

  Now we define a new vector r, which starts from the origin, that is, from the point where i and j meet, and which extends a times the length of i along i and b times the length of j along j, so that r = a·i + b·j.

  现在我们定义一个新的矢量r,它从原点开始,即从i和j 的交点开始,比i长a倍,比j 长b倍。

A vector in Space (Image By Author)

空间矢量(图片来自作者)

  More commonly, we refer to a vector by its coordinates, r = [a,b]. In this way we can identify various vectors in a vector space.

  更常见的是,我们使用坐标 r = [a,b] 来引用矢量,这样我们就可以在矢量空间中识别各种矢量。

  Now we are ready to define a new operation, the modulus of a vector, that is, its length. It can be derived from the vector’s coordinates and is defined as follows.

  现在我们准备好定义一个新的操作,向量的模,即它的长度可以从它的坐标中导出,定义如下。

Vector Modulus (Image by Author)

矢量模数(图片来自作者)

  The inner product, on the other hand, is an operation that takes two vectors, multiplies their corresponding components, and returns the sum of those products.

  另一方面,内积是另一种运算:给定两个矢量,它将二者的对应分量相乘,并返回这些乘积的总和。

Inner (dot) Product (Image By Author)

内(点)积(图片来自作者)
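
  The two figures are not reproduced here. The standard definitions they most likely show, written for r = [a,b] and, for the inner product, for two vectors with components r_1 … r_n and s_1 … s_n (index notation assumed for illustration), are:

```latex
|r| = \sqrt{a^2 + b^2},
\qquad
r \cdot s = r_1 s_1 + r_2 s_2 + \dots + r_n s_n
```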

  The inner product has some properties that may be useful in some cases :

  内积有一些特性,在某些情况下可能是有用的:

  commutative: r*s = s*r

  distributive over addition: r*(s+t) = r*s + r*t

  associative over scalar multiplication: r*(a*s) = a*(r*s), where a is a scalar

  Notice that if you compute the inner product of a vector with itself, you will get its modulus squared!

  交换律:r*s = s*r

  对加法的分配律:r*(s+t) = r*s + r*t

  对标量乘法的结合律:r*(a*s) = a*(r*s),其中 a 是标量

  请注意,如果你计算一个矢量与它自身的内积,你将得到它的模的平方!

  Inner (dot) Product (Image by Author)

内(点)积(图片来自作者)
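
  These properties are easy to check numerically. The following Python sketch uses arbitrary example vectors (not taken from the article) and integer components, so the comparisons are exact:

```python
import math


def dot(r, s):
    """Inner product: multiply corresponding components and sum the products."""
    return sum(ri * si for ri, si in zip(r, s))


def modulus(r):
    """Length of a vector, computed as the square root of r . r."""
    return math.sqrt(dot(r, r))


r, s, t = [3, 4], [1, 2], [-2, 5]
a = 3  # a scalar

s_plus_t = [si + ti for si, ti in zip(s, t)]
a_times_s = [a * si for si in s]

print(dot(r, s) == dot(s, r))                      # commutative: True
print(dot(r, s_plus_t) == dot(r, s) + dot(r, t))   # distributive: True
print(dot(r, a_times_s) == a * dot(r, s))          # scalar moves out: True
print(modulus(r), dot(r, r))                       # 5.0 25
```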

Cosine (dot) Product

  余弦(点)积

  So far we have only seen a mathematical definition of the inner product based on the coordinates of vectors. Now let us see a geometric interpretation of it. Let us create 3 vectors r, s and their difference r-s, so as to form a triangle with 3 sides a,b,c.

  到目前为止,我们只看到了基于向量坐标的内积的数学定义。现在让我们看看它的几何解释。创建 3 个向量r, s和它们的差r-s,形成一个有a,b,c 三条边的三角形。

Triangle (Image By Autor)

三角形(图片来自作者)

  We know from our high school days that we can derive c using a simple rule of trigonometry, the law of cosines.

  我们从高中时代就知道,可以使用简单的三角法则推导出c。

Trigonometry (Image By Author)

三角学(图片来自作者)

  But then we can derive from the above that:

  但是,也可以从上面得出:

(Image By Author)

(图片来自作者)
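
  Since the figures are not reproduced here, the derivation can be written out as follows. It assumes the labelling a = |r|, b = |s|, c = |r - s|, with θ the angle between r and s; this labelling is an assumption, as the original figure is not available.

```latex
\begin{aligned}
c^2 &= a^2 + b^2 - 2ab\cos\theta
  && \text{(law of cosines)} \\
|r - s|^2 &= |r|^2 + |s|^2 - 2\,|r|\,|s|\cos\theta
  && \text{(substituting } a = |r|,\ b = |s|,\ c = |r - s| \text{)} \\
|r - s|^2 &= (r - s)\cdot(r - s) = |r|^2 + |s|^2 - 2\, r \cdot s
  && \text{(expanding with the inner product)} \\
r \cdot s &= |r|\,|s|\cos\theta
  && \text{(comparing the two expressions)}
\end{aligned}
```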

  So the angle between the two vectors has a strong effect on the result of this operation. In the special cases where the angle is 0°, 90°, and 180°, the cosine is 1, 0, and -1 respectively, and the dot product simplifies accordingly. For example, 2 vectors that are at 90 degrees to each other will always have a dot product of 0.

  所以夹角对这个运算的结果有很大的影响。在角度为 0°、90° 和 180° 的特殊情况下,余弦分别为 1、0、-1,点积也会相应简化。例如,2个相互成90°的矢量的点积总是0。
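
  The three special cases can be checked with a small Python sketch that rearranges r·s = |r||s|cos θ to obtain the cosine. The example vectors are chosen here for illustration.

```python
import math


def dot(r, s):
    """Inner product of two vectors."""
    return sum(ri * si for ri, si in zip(r, s))


def cosine(r, s):
    """Cosine of the angle between r and s, from r . s = |r| |s| cos(theta)."""
    return dot(r, s) / (math.sqrt(dot(r, r)) * math.sqrt(dot(s, s)))


r = [2, 0]
print(cosine(r, [5, 0]))   # 1.0   (0 degrees: same direction)
print(cosine(r, [0, 3]))   # 0.0   (90 degrees: orthogonal)
print(cosine(r, [-4, 0]))  # -1.0  (180 degrees: opposite direction)
```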

  Projection

  投影

  Let’s consider two vectors r and s. These two vectors start from the same point and make an angle θ between them. Let’s shine a torch from above s, perpendicular to r, and we’ll see the shadow of s on r. That shadow is the projection of s on r.

  让我们假设两个矢量 r 和 s。这两个矢量从一侧彼此靠近并在它们之间形成角度θ,在s上放一个手电筒,我们会在r上看到s的影子。那是s在r上的投影。

Projection (Image By Author)

投影(图片来自作者)

  There are 2 basic projection operations:

  有两种基本的投影操作:

  Scalar Projection: gives us the magnitude of the projection

  标量投影:投影的大小

  Vector Projection: gives us the projection vector itself

  矢量投影:投影矢量本身

Projections (Image By Author)

投影(图片来自作者)
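
  The formulas in the figure are not reproduced here. Assuming the standard definitions for the projection of s onto r, namely (r·s)/|r| for the scalar projection and ((r·s)/(r·r))·r for the vector projection, a minimal Python sketch looks like this (function names and example vectors are illustrative):

```python
import math


def dot(r, s):
    """Inner product of two vectors."""
    return sum(ri * si for ri, si in zip(r, s))


def scalar_projection(s, r):
    """Signed length of the shadow of s on r: (r . s) / |r|."""
    return dot(r, s) / math.sqrt(dot(r, r))


def vector_projection(s, r):
    """The shadow itself, a vector along r: ((r . s) / (r . r)) * r."""
    factor = dot(r, s) / dot(r, r)
    return [factor * ri for ri in r]


r = [4, 0]
s = [3, 3]
print(scalar_projection(s, r))  # 3.0
print(vector_projection(s, r))  # [3.0, 0.0]
```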

  Changing Basis

  变基

  Changing basis in linear algebra refers to the process of expressing a vector with respect to a different set of reference vectors, called a basis. A basis is a set of linearly independent vectors that can be used to express any vector in a vector space. When a vector is expressed in a different basis, its coordinates change.

  线性代数中的变基是指用一组不同的坐标来表达一个向量的过程,这组坐标称为基。一个基是一组线性独立的矢量,可以用来表达矢量空间中的任何矢量。当一个矢量用不同的基来表达时,它的坐标就会改变。

  We have seen, for example, that in two dimensions each vector can be represented as a linear combination of the two basis vectors [0,1] and [1,0]. These two vectors are the basis of our space. But can we use two other vectors as the basis, and not just these two? Certainly, but in this case the coordinates of each vector in our space will change. Let’s see how.

  例如,我们已经看到,在二维空间中,每个矢量都可以表示为两个基矢量 [0,1] 和 [1,0] 的和。这两个矢量是我们空间的基础。试想,可以使用其他两个矢量作为基础而不仅仅是这两个吗?当然可以,但在这种情况下,空间中每个矢量坐标都会发生变化。

New basis (Image by Author)

新基(图片来自作者)

  In the image above, I have two bases: the basis (e1,e2) and the basis (b1,b2). In addition, I have a vector r (in red). This vector has coordinates [3,4] when expressed in terms of (e1,e2), which is the basis we have always used by default. But what do its coordinates become when expressed in terms of (b1,b2)?

  在上面的图片中,有两个基:基(e1,e2)和基(b1,b2)。此外,还有一个矢量r(红色)。当用(e1,e2)表示时,这个矢量的坐标为[3,4],这是我们一直默认使用的基。但如果用(b1,b2)来表示,它的坐标会变成什么?

  To find these coordinates we need to go by steps. First, we need to find the projections of the vector r onto the vectors of the new base (b1,b2).

  为了找到这些坐标,我们需要分步骤进行。首先,我们需要找到矢量r对新基点(b1,b2)的矢量的投影。

Changing Basis (Image By Author)

变基(图片来自作者)

  It’s easy to see that the sum of these projections we created is just r.

  显而易见我们创建的这些投影的总和就是 r。

  r = p1 + p2.

  Furthermore, to change basis using projections, I have to check that the new basis vectors are orthogonal, meaning that they are at 90 degrees to each other. Only then do the projections onto b1 and b2 add up to r.

  此外,要用投影的方法变基,我必须检查新基的矢量是否正交,也就是说,它们之间的夹角是90度。只有这样,r 在 b1 和 b2 上的投影之和才等于 r。

  To check this just see if the cosine of the angle is 0 which means an angle of 90 degrees.

  要检查这一点,只需查看角度的余弦是否为 0,这意味着角度为 90 度。

Check orthogonal basis (Image by Author)

检查正交基(图片来自作者)

  Now we go on to calculate the vector projections of r on the vectors (b1,b2), with the formula we saw in the previous chapter.

  现在我们继续计算 r 在矢量 (b1,b2) 上的矢量投影,使用我们在前一章提到的公式。

Vector Projection (Image By Author)

矢量投影(图片来自作者)

  The value circled in red in the vector projection will give us the coordinate of the new vector r expressed in base b : (b1,b2) instead of e : (e1,e2).

  矢量投影中用红色圈出的值将为我们提供以基数b表示的新矢量r的坐标:(b1,b2) 而非e:(e1,e2)。

Vector r in new basis b (Image by Author)

新基b中的矢量r(图片来自作者)

  To check that the calculations are right we need to check that the sum of the projections is just r in base e:(e1,e2).

  为了检查计算是否正确,我们需要检查投影的总和是否只是以e为基数的r:(e1,e2)。

  [4,2] + [-1,2] = [3,4]
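
  The whole procedure can be sketched in Python. The values of b1 and b2 appear only in the figure, which is not reproduced here, so the sketch assumes b1 = [2,1] and b2 = [-2,4]. These assumed values are consistent with the projections [4,2] and [-1,2] quoted above, but they are not confirmed by the text.

```python
def dot(r, s):
    """Inner product of two vectors."""
    return sum(ri * si for ri, si in zip(r, s))


def coordinate_along(r, b):
    """Coefficient of r along basis vector b (valid for an orthogonal basis)."""
    return dot(r, b) / dot(b, b)


r = [3, 4]    # coordinates in the default basis (e1, e2)
b1 = [2, 1]   # assumed value, not stated in the text
b2 = [-2, 4]  # assumed value, not stated in the text

# The projection method only works if the new basis is orthogonal.
assert dot(b1, b2) == 0

# Coordinates of r in the basis (b1, b2).
r_in_b = [coordinate_along(r, b1), coordinate_along(r, b2)]

# Vector projections of r onto b1 and b2, expressed in the basis (e1, e2).
p1 = [r_in_b[0] * x for x in b1]
p2 = [r_in_b[1] * x for x in b2]

print(r_in_b)                           # [2.0, 0.5]
print(p1, p2)                           # [4.0, 2.0] [-1.0, 2.0]
print([x + y for x, y in zip(p1, p2)])  # [3.0, 4.0]
```

  Under these assumptions, r has coordinates [2, 0.5] in the new basis, and the two projections add up to the original [3,4].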

  Basis, Vector Space and Linear Independence

  基、矢量空间和线性无关

  We have already seen and talked about basis. But let’s define more precisely what a vector basis is in a vector space.

  前面章节中,我们已经看到并谈到了基,接下来让我们更加准确地定义什么是矢量空间中的矢量基。

  A basis is a set of n vectors that:

  基是一组 n 个矢量,它们:

  are not linear combinations of each other (linearly independent)

  不是彼此的线性组合(线性独立)

  span the space: the space is n-dimensional

  跨越空间:空间是n维的

  The first point means that if, for example, I have 3 vectors a,b,c forming a basis, there is no way to multiply these vectors by scalars, add them together, and get the zero vector!

  第一点是说,例如我有 3 个矢量a、b、c构成一个基,这意味着没有办法将这些矢量加在一起并乘以标量并得到零!

  If I denote by x, y and z any three scalars (three numbers), it means that:

  如果我用 x、y 和 z 表示任意三个标量(三个数字),则意味着:

  xa + yb + zc != 0

  (obviously excluding the trivial case where x = y = z = 0). In this case, we will say that the vectors are linearly independent.

  (显然不包括 x = y = z = 0 的简单情况)。在这种情况下,我们会说矢量是线性无关的。

  This means, for example, that there is no way to multiply a and b by scalars and add them together to get c. Geometrically, if a and b lie in a two-dimensional plane, c must point out of that plane, into a third dimension.

  例如,无法将 a 和 b 乘以标量再相加得到 c。从几何上看,如果 a 和 b 位于一个二维平面内,那么 c 必定指向该平面之外,即第三个维度。

  While the second point means that I can multiply these vectors by scalars and sum them together to get any possible vectors in a 3-dimensional space. So these 3 basis vectors are enough for me to define the whole space of dimension n=3.

  而第二点意味着我可以将这些矢量乘以标量并将它们加在一起以获得 3 维空间中的任何可能矢量。所以这 3 个基矢量足以让我定义维度 n=3 的整个空间。
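
  One common way to test whether three vectors in 3-dimensional space are linearly independent is to compute the determinant of the matrix that has them as columns: they are independent exactly when the determinant is non-zero. The determinant is not introduced in this article, so the Python sketch below should be read as a swapped-in technique, with example vectors chosen for illustration.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def linearly_independent(a, b, c):
    """Three 3-dimensional vectors are independent iff the determinant is non-zero."""
    return det3(a, b, c) != 0


a = [1, 0, 0]
b = [0, 1, 0]

# c points out of the plane spanned by a and b: the three vectors form a basis.
print(linearly_independent(a, b, [0, 0, 1]))  # True

# c = 2a + 3b lies in the plane of a and b: no basis of 3-dimensional space.
print(linearly_independent(a, b, [2, 3, 0]))  # False
```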

  Matrices and solving simultaneous equations

  矩阵和求解联立方程

  By now you should be pretty good at handling vectors and doing operations with them. But what are they used for in real life? We saw in the beginning that one of our goals was to solve multiple equations together simultaneously, for example, to figure out the prices of vegetables at the supermarket.

  到目前为止,您应该非常擅长处理矢量并对它们进行操作。但是它们在现实生活中有什么用呢?我们一开始就知道我们的目标之一是同时求解多个方程,例如,计算超市蔬菜的价格。

Simultaneous Equations (Image By Author)

联立方程组(图片来自作者)

  But now that we know about vectors, we can rewrite these equations in a simpler way. We put the vectors of coefficients [2,10] and [3,1] next to each other as columns to form a matrix (a set of vectors). Then we have the vector of unknowns [a,b] and finally the result [8,3].

  但现在我们知道了矢量,我们可以用更简单的方式重写这些方程。我们把系数向量[2,10]和[3,1]挨在一起,形成一个矩阵(矢量集)。然后我们会有未知数的矢量[a,b],最后是结果[8,3]。

Vectorized Form (Image By Author)

矢量化形式(图片来自作者)

  Now you may ask whether this new way of writing the problem is really better. How do you multiply a matrix by a vector? It is very simple: take each row of the matrix, multiply it component by component with the vector, and sum the products. To multiply two matrices, we do the same with each row of the first matrix and each column of the second matrix.

  现在你可能会问,这种写问题的新形式是否真的更好。如何在矩阵和矢量之间做乘法?这很简单。只需将矩阵的每一行与矢量相乘。如果要在两个矩阵之间做乘法,我们就必须用第一个矩阵的每一行乘以第二个矩阵的每一列。

  So by applying this rows-by-columns rule we should recover the original equations.

  因此,应用这一“行乘列”规则,我们应该能还原出最初的方程组。

Matrix Multiplication (Image By Author)

矩阵乘法(图片来自作者)
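
  The rows-by-columns rule can be written as a short Python sketch. The matrix is built from the coefficient vectors [2,10] and [3,1] given in the text; the function names and the trial values for [a,b] are chosen here for illustration.

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v: one dot product per row."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def matmul(A, B):
    """Multiply two matrices: each row of A against each column of B."""
    columns_of_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_B]
            for row in A]


# Coefficient vectors [2,10] and [3,1] placed side by side as columns.
A = [[2, 3],
     [10, 1]]

# With trial values a = 1, b = 2 the rows give 2*1 + 3*2 and 10*1 + 1*2.
print(matvec(A, [1, 2]))            # [8, 12]

# Multiplying by the identity matrix leaves A unchanged.
print(matmul(A, [[1, 0], [0, 1]]))  # [[2, 3], [10, 1]]
```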

  This form, however, has other advantages as well. It gives us a geometric interpretation of what is happening. Every matrix defines a transformation in space. So if I have a point in a space and I apply a matrix, my point will move in some way.

  然而,这种形式也有其他优点。它为我们提供了对正在发生的事情的几何解释。每个矩阵都定义了空间的转换。因此,如果在一个空间里有一个点,应用一个矩阵,这个点将以某种方式移动。

Matrix Transformation (Image By Author)

矩阵变换(图片来自作者)

  But then we can also say that a matrix is nothing more than a function that takes a point as input and generates a new one as output.

  但是,我们也可以说,矩阵只不过是一个将一个点作为输入并生成一个新点作为输出的函数。

  So our initial problem can be interpreted as follows, “What is the original vector [a,b] on which the transformation results in [8,3]?”

  所以我们最初的问题可以解释如下:“什么是原始向量 [a,b],其变换结果为 [8,3]?”
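
  For a 2x2 system this question can be answered directly. The sketch below uses Cramer's rule, which is not covered in this article and is named here only as one possible method, with the coefficients and result exactly as given in the text. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def solve_2x2(A, y):
    """Solve A @ [a, b] = y for a 2x2 matrix A using Cramer's rule."""
    (p, q), (r, s) = A
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular: no unique solution")
    a = Fraction(y[0] * s - q * y[1], det)
    b = Fraction(p * y[1] - y[0] * r, det)
    return [a, b]


A = [[2, 3],
     [10, 1]]
y = [8, 3]

a, b = solve_2x2(A, y)
print(a, b)  # 1/28 37/14

# Applying the transformation to [a, b] gives back the result vector.
print([2 * a + 3 * b, 10 * a + 1 * b] == y)  # True
```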

  In this way, you can think about solving simultaneous equations as transformations over vectors in a vector space. In addition, operations with matrices have the following properties, which can be very useful.

  这样一来,就可以把求解联立方程看成是对矢量空间中矢量的变换。此外,矩阵运算具有以下非常有用的性质。

  Given A(r) = r2, where A is a matrix and r, r2 are both vectors:

  给定 A(r) = r2,其中 A 是矩阵,r、r2 都是矢量:

  A(nr) = n r2, where n is a scalar

  A(nr) = n r2,其中 n 是标量

  A(r+s) = A(r) + A(s), where s is a vector

  A(r+s) = A(r) + A(s),其中 s 是矢量
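
  Both properties can be verified numerically. In this Python sketch the matrix is the one from the equations above, while the vectors r and s and the scalar n are arbitrary example values:

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[2, 3],
     [10, 1]]
r, s, n = [1, 2], [4, -1], 3

r2 = matvec(A, r)

# Scaling the input scales the output by the same factor: A(nr) = n * A(r).
scaled = matvec(A, [n * x for x in r])
print(scaled == [n * x for x in r2])  # True

# Transforming a sum equals the sum of the transforms: A(r+s) = A(r) + A(s).
lhs = matvec(A, [x + y for x, y in zip(r, s)])
rhs = [x + y for x, y in zip(matvec(A, r), matvec(A, s))]
print(lhs == rhs)  # True
```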

  Matrices and space transformations

  矩阵和空间变换

  To understand the effect of a matrix, we can look at how it transforms the vectors to which it is applied. In particular, we can look at what a matrix does to the vectors of the standard basis.

  要了解矩阵的作用,可以看它如何变换它所作用的矢量。特别是,我们可以看看矩阵作用于标准基的矢量时会产生什么效果。

  If we have a 2x2 matrix and we are in a two-dimensional space, the first column of the matrix tells us what the effect will be on the vector e1 = [1,0], and the second column tells us what the effect will be on the vector e2 = [0,1].

  如果我们有一个2x2的矩阵,并且处于二维空间中,矩阵的第一列会告诉我们对矢量 e1 = [1,0] 的影响,而第二列则会告诉我们对矢量 e2 = [0,1] 的影响。
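
  A quick Python check makes this concrete: multiplying a matrix by the two standard basis vectors simply reads off its columns. The matrix used is the one from the equations above.

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[2, 3],
     [10, 1]]

print(matvec(A, [1, 0]))  # [2, 10]  (first column of A)
print(matvec(A, [0, 1]))  # [3, 1]   (second column of A)
```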

  Next, we look at the effect of some well-known matrices. These transformations are often useful in Machine Learning for data augmentation on images: you can stretch or shrink those images, for example.

  然后我们看到一些已知矩阵的效果。这些变换在机器学习中对图像的数据增强经常是有用的,例如可以拉伸或缩小这些图像。

Matrix transformations (Image By Author)

矩阵变换(图片来自作者)

  We can also apply multiple consecutive transformations to a vector. So if we have two transformations represented by the matrices A1 and A2 we can apply them consecutively A2(A1(vector)).

  我们也可以对一个矢量连续进行多次变换。因此,如果我们有两个由矩阵A1和A2表示的变换,我们可以连续应用它们A2(A1(vector))。

  But this is different from applying them in the reverse order, i.e. A1(A2(vector)). That is why the product between matrices does not enjoy the commutative property.

  但这与按相反的顺序应用它们是不同的,即A1(A2(vector))。这就是矩阵乘法不满足交换律的原因。
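
  A small Python sketch shows the order dependence. The two matrices are chosen here for illustration and are not taken from the figure: A1 rotates by 90 degrees counterclockwise and A2 stretches horizontally by a factor of 2.

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Multiply two matrices: each row of A against each column of B."""
    columns_of_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_B]
            for row in A]


A1 = [[0, -1],
      [1, 0]]   # rotation by 90 degrees counterclockwise
A2 = [[2, 0],
      [0, 1]]   # horizontal stretch by a factor of 2
v = [1, 1]

print(matvec(A2, matvec(A1, v)))  # [-2, 1]  rotate first, then stretch
print(matvec(A1, matvec(A2, v)))  # [-1, 2]  stretch first, then rotate

# The same difference shows up in the matrix products themselves.
print(matmul(A2, A1))  # [[0, -2], [1, 0]]
print(matmul(A1, A2))  # [[0, -1], [2, 0]]
```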

Editor in charge: Zhang Wei (张薇)
