Optimization Theory (DS2) Lecture #4
The Heart of the Simplex Algorithm
Abstract
Today we are going to look at the simplex algorithm, as developed by George Dantzig (1947). We will cover Secs. 2.4 and 2.5 of the textbook. In my opinion, this is the hardest week of the semester, so stick with me here!
Text and examples adapted from A Gentle Introduction to Optimization.
1 Review
There were a lot of blank looks last time as we discussed the geometric nature of the constraints and the objective function, and we didn't even get to discuss standard equality form (SEF) and the core idea of the simplex operation. So let's begin here with some geometry, then go back and pick up what we were supposed to cover in Secs. 4 and 5 of last time's notes, before we start into the grungy parts of simplex.
1.1 The Geometric Nature of Constraints and Objective Functions
Or, A Simple(!) Example of Prepping a Linear Program on a 2-D Simplex (i.e., a Triangle)
By now, you've heard me say that a simplex is an n-dimensional triangle, but that the simplex algorithm applies to polytopes (n-dimensional polygons).
Let’s look at a problem formulation for a 2-D simplex, that is, a triangle. Examples in this section are in Octave, rather than in R.
Consider the following LP, not yet in SEF (defined in the next section):

(1) max c_1 x_1 + c_2 x_2

subject to

(2) x_1 >= 2
    x_2 >= 4
    2 x_1 + 3 x_2 <= 28
(3) x_1, x_2 >= 0
It should be obvious that the first constraint is x_1 >= 2, and the second constraint is x_2 >= 4. The third constraint, 2 x_1 + 3 x_2 <= 28, is a diagonal line. These constraints, one at a time, then the whole set, are illustrated in Fig. 5. Our objective function is shown in Fig. 6, with the code for all of these figures in the next subsection.
Our first important task is to put the problem into standard equality form (SEF). Since our constraints are inequalities, we have to add slack variables x_3, x_4, and x_5. (Slack variables and free variables are covered more rigorously in Sec. 1.3.)
(4) max c_1 x_1 + c_2 x_2

subject to

(5) x_1 - x_3 = 2
    x_2 - x_4 = 4
    2 x_1 + 3 x_2 + x_5 = 28
(6) x_1, ..., x_5 >= 0
Now our program is in SEF, nice and pretty and waiting to be solved!
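As a quick numeric check, here is a small sketch in Python rather than the lecture's Octave (my own illustration, assuming the constraints read off the figure code: x_1 >= 2, x_2 >= 4, 2 x_1 + 3 x_2 <= 28). Given any (x_1, x_2), the slack values that make the SEF equalities hold can be computed directly, and the point is feasible exactly when every coordinate, slacks included, is non-negative:

```python
# Check the SEF conversion of the triangle LP at a sample point.
# Constraints assumed from Fig. 5: x1 >= 2, x2 >= 4, 2*x1 + 3*x2 <= 28.

def to_sef_point(x1, x2):
    """Compute the slack values that make the SEF equalities hold."""
    x3 = x1 - 2              # from x1 - x3 = 2
    x4 = x2 - 4              # from x2 - x4 = 4
    x5 = 28 - 2*x1 - 3*x2    # from 2*x1 + 3*x2 + x5 = 28
    return (x1, x2, x3, x4, x5)

pt = to_sef_point(4, 5)
# Feasible iff every coordinate (including the slacks) is non-negative.
print(pt, all(v >= 0 for v in pt))
```

The point (4, 5) sits inside the triangle, so all five coordinates come out non-negative.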
1.2 Interlude in Octave
We can draw each of those constraints using the following Octave code:
>> c1v = [0 0; 0 10; 2 10; 2 0];
>> c2v = [0 0; 0 4; 10 4; 10 0];
>> fs = [1 2 3 4];
>> c3v = [0 28/3; 0 10; 10 10; 10 8/3];
>> fs3 = [1 2 3 4];
>> figure
>> xlim([0 10])
>> ylim([0 10])
>> axis('equal')
>> patch("Faces", fs, "Vertices", c1v, 'FaceColor', 'blue');
>> figure
>> xlim([0 10])
>> ylim([0 10])
>> axis('equal')
>> patch("Faces", fs, "Vertices", c2v, 'FaceColor', 'yellow');
>> figure
>> xlim([0 10])
>> ylim([0 10])
>> axis('equal')
>> patch("Faces", fs3, "Vertices", c3v, 'FaceColor', 'red');
>> figure
>> xlim([0 10]); ylim([0 10]); axis('equal');
>> patch("Faces", fs, "Vertices", c1v, 'FaceColor', 'blue');
>> patch("Faces", fs, "Vertices", c2v, 'FaceColor', 'yellow');
>> patch("Faces", fs3, "Vertices", c3v, 'FaceColor', 'red');
And plot the constrained objective function using:
>> v = [2 4; 2 8; 8 4]
v =
   2   4
   2   8
   8   4
>> f = [1 2 3]
f =
   1   2   3
>> col = [2.57; 1; 5]
col =
   2.5700
   1.0000
   5.0000
>> figure
>> xlim([0 10])
>> ylim([0 10])
>> axis('square')
>> axis('equal')
>> patch('Faces', f, 'Vertices', v, 'FaceVertexCData', col, 'FaceColor', 'interp');
>> colorbar
1.3 Standard Equality Form (SEF)
Adapted from Sec. 4 of last time’s text.
An LP is in standard equality form if:

1. it is a maximization problem;
2. other than the non-negativity constraints, all constraints are equalities; and
3. every variable has a non-negativity constraint.
That is, it can be written in the form

(7) max { c^T x : A x = b, x >= 0 }
In this formulation, c is the vector of coefficients of each of our variables in the objective function. It will have two entries for each free variable we adapted, and will have zeroes for each slack variable we added. A and b have one row for each constraint in our problem.
Consider the LP

(8)

subject to

(9)
(10)
A variable that doesn't have a non-negativity constraint is called a free variable. But a lot of our techniques, and especially our proofs, depend on that constraint. So, for a free variable (call it x_j), we introduce two new variables x_j+ and x_j-, set x_j = x_j+ - x_j-, and add the constraints x_j+, x_j- >= 0. After some algebra, you get
(11)

subject to

(12)
(13)
Getting closer, but we're still not quite there; we don't have equalities everywhere yet. So we introduce two slack variables, one for each remaining inequality, which turns the inequalities into equalities and gives us the program in SEF.
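The two rewrites just described, splitting a free variable and adding slacks, can be sketched generically. (This is my own illustration in Python; the function names are mine, not the textbook's.)

```python
# Generic sketch of the two SEF rewrites: splitting a free variable
# and computing the slack for an inequality. Names are illustrative only.

def split_free_variable(value):
    """Represent a free value x_j as x_j_plus - x_j_minus, both parts >= 0."""
    return (value, 0.0) if value >= 0 else (0.0, -value)

def slack_for(lhs, rhs):
    """The slack value that turns lhs <= rhs into lhs + slack == rhs."""
    return rhs - lhs

plus, minus = split_free_variable(-3.0)
print(plus - minus)          # recovers -3.0
print(slack_for(7.0, 10.0))  # 3.0
```

Note that the split is not unique as a pair of numbers (adding any constant to both parts preserves the difference), but the basic solutions we care about will always use the minimal pair, with at least one part zero.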
1.4 A Simplex Iteration
Adapted from Sec. 5 of last time’s text.
The basic idea is to increase one of our variables until it hits a “stop” against one of the constraints. In the main body of the simplex algorithm, this will involve running along an edge of the polytope (see below) until we reach a vertex.
Consider the following LP in SEF:
(17) max 2 x_1 + 3 x_2

subject to

(18) x_1 + x_2 + x_3 = 6
     2 x_1 + x_2 + x_4 = 10
     -x_1 + x_2 + x_5 = 4
(19) x_1, ..., x_5 >= 0
Given the solution x = (0, 0, 6, 10, 4) (easy to see that's a solution -- look at the right-hand side of the array), z = 0.
Now try increasing x_1, keeping x_2 = 0. A little algebra gives
(20) x_3 = 6 - x_1 - x_2
     x_4 = 10 - 2 x_1 - x_2
     x_5 = 4 + x_1 - x_2

from which we get

(21) x_3 = 6 - x_1 >= 0
     x_4 = 10 - 2 x_1 >= 0
     x_5 = 4 + x_1 >= 0

which gives us the limiting inequality x_1 <= 5. Setting x_1 = 5, x = (5, 0, 1, 0, 9) and z = 10.
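Finding that limiting inequality is just a minimum-ratio computation over the constraint rows. Here is a minimal sketch in Python (my own illustration, using the coefficients of x_1 and the right-hand sides of this example):

```python
# The limiting inequality via the minimum-ratio rule. Each row is
# (coefficient of x1, right-hand side); a slack stays non-negative as long
# as x1 <= rhs / coeff, and only rows with a positive coefficient can
# block the increase.

rows = [(1, 6), (2, 10), (-1, 4)]

limit = min(b / a for a, b in rows if a > 0)
print(limit)  # 5.0
```

The second row (coefficient 2, right-hand side 10) is the binding one, which is why x_4 is the slack that drops to zero.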
Unfortunately, we can't yet apply the same trick to x_2, so that's the topic for this week.
2 Bases
(Adapted from Sec. 2.4.1 in the textbook.)
Before we get to the algorithm itself, we need to discuss the idea of a basis for a set of variables (dimensions), and canonical form.
Recall that the inverse of a square matrix A is the matrix A^-1 such that A A^-1 = A^-1 A = I, where I is the identity matrix. A singular matrix has no inverse, or equivalently, its determinant is 0. A nonsingular matrix has an inverse and a non-zero determinant. (Question: is the identity matrix singular or nonsingular?)
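A quick numeric illustration of singular vs. nonsingular (Python/NumPy, with example matrices of my own choosing, not from the textbook):

```python
import numpy as np

# A nonsingular matrix: non-zero determinant, and an inverse exists.
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
print(np.linalg.det(A))                  # non-zero, so A is nonsingular
Ainv = np.linalg.inv(A)
print(np.allclose(A @ Ainv, np.eye(2)))  # True: A times its inverse is I

# A singular matrix: the second row is twice the first, so det is 0
# and np.linalg.inv would raise LinAlgError.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(S))

# (As for the question: the identity matrix is its own inverse,
# so it is nonsingular.)
```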
A is the matrix whose linearly independent rows comprise our constraints. We need to partition its columns into two sets: one that comprises a basis, which we will call B, and the rest, which we will call N.
Consider
(22) A = [1 1 1 0 0; 2 1 0 1 0; -1 1 0 0 1], b = (6, 10, 4)^T
The set of columns {3, 4, 5} is a basis, which we can see easily by taking

(23) A_B = [1 0 0; 0 1 0; 0 0 1]

which, if you think about it as a three-dimensional space, is the matrix composed of the x, y and z unit vectors.
Question: For the simplex algorithm, does our basis set have to be orthonormal?
The set of columns {1, 4, 5} is also a basis:

(24) A_B = [1 0 0; 2 1 0; -1 0 1]

You can see pretty easily how to turn that into the identity matrix by subtracting 2 times the first row from the second, and -1 times the first row from the third, after which our basis is orthonormal.
We will need a basic solution for our problem. x is a basic solution for the basis B iff:

1. A x = b, and
2. x_N = 0,

where x_N is the subset of the vector x of the columns not in the basis B.
Every basis has a unique basic solution, which we are going to need. For B = {3, 4, 5}, the basic solution is x = (0, 0, 6, 10, 4).
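Numerically, a basic solution comes from solving A_B x_B = b and zeroing everything else. A sketch in Python/NumPy (my own helper; A and b are the same ones used in the Octave sessions below):

```python
import numpy as np

# Constraint data matching the Octave sessions later in the lecture.
A = np.array([[ 1., 1., 1., 0., 0.],
              [ 2., 1., 0., 1., 0.],
              [-1., 1., 0., 0., 1.]])
b = np.array([6., 10., 4.])

def basic_solution(B):
    """x with x_N = 0 and A_B x_B = b; B holds 0-based column indices."""
    x = np.zeros(A.shape[1])
    x[B] = np.linalg.solve(A[:, B], b)
    return x

print(basic_solution([2, 3, 4]))  # basis {3, 4, 5}: x = (0, 0, 6, 10, 4)
print(basic_solution([0, 1, 3]))  # basis {1, 2, 4}: x = (1, 5, 0, 3, 0)
```

Uniqueness is exactly the statement that A_B is nonsingular, so the solve has one answer.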
3 Canonical Forms
(Adapted from Sec. 2.4.2 in the textbook.)
Consider an LP (P) in SEF:
(25) max { c^T x + z̄ : A x = b, x >= 0 }

where z̄ is a constant. Let B be a basis of A. (P) is in canonical form for B if:
1. A_B is an identity matrix, and
2. c_B = 0.
Consider the LP

(26) max 2 x_1 + 3 x_2

subject to

(27) x_1 + x_2 + x_3 = 6
     2 x_1 + x_2 + x_4 = 10
     -x_1 + x_2 + x_5 = 4
(28) x_1, ..., x_5 >= 0

Note that in this case, z̄ = 0.
Obviously, B = {3, 4, 5} is a basis, but that's boring; it's already in canonical form. B = {1, 2, 4} is also a basis, so let's rewrite our program in canonical form for this basis:
(29) max -(5/2) x_3 - (1/2) x_5 + 17

subject to

(30) x_1 + (1/2) x_3 - (1/2) x_5 = 1
     x_2 + (1/2) x_3 + (1/2) x_5 = 5
     -(3/2) x_3 + x_4 + (1/2) x_5 = 3
(31) x_1, ..., x_5 >= 0
(Notice that now columns 1, 2, 4 form an identity matrix.) So how did we get here? First, we are going to need the inverse of a matrix or two, so let’s take a quick break and see how to calculate them numerically using Octave or R, then we’ll come back to the problem.
3.1 Interlude: calculating the inverse of a matrix using Octave
The following ad hoc Octave code will create a matrix for A_B, take its inverse, and left-multiply it with the full constraint matrix A.
octave:1> AB = [1, 1, 0; 2, 1, 1; -1, 1, 0]
AB =
   1   1   0
   2   1   1
  -1   1   0
octave:2> A = [1, 1, 1, 0, 0; 2, 1, 0, 1, 0; -1, 1, 0, 0, 1]
A =
   1   1   1   0   0
   2   1   0   1   0
  -1   1   0   0   1
octave:3> ABinv = inv(AB)
ABinv =
   0.50000   0.00000  -0.50000
   0.50000   0.00000   0.50000
  -1.50000   1.00000   0.50000
octave:4> ABinv*A
ans =
   1.00000   0.00000   0.50000   0.00000  -0.50000
   0.00000   1.00000   0.50000   0.00000   0.50000
   0.00000   0.00000  -1.50000   1.00000   0.50000
3.2 Interlude: calculating the inverse of a matrix using R
The following ad hoc R code will create a matrix for A_B, take its inverse, and left-multiply it with the full constraint matrix A. Note the use of %*% to do the matrix multiplication; simply using * does element-by-element multiplication. The function solve(), for our purposes here, inverts the matrix.
> AB = t(matrix(
+   c(1, 1, 0, 2, 1, 1, -1, 1, 0),
+   nrow = 3,
+   ncol = 3))
> AB
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    1
[3,]   -1    1    0
> A = t(matrix(
+   c(1, 1, 1, 0, 0, 2, 1, 0, 1, 0, -1, 1, 0, 0, 1),
+   nrow = 5, ncol = 3))
> A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    0    0
[2,]    2    1    0    1    0
[3,]   -1    1    0    0    1
> ABinv <- solve(AB)
> ABinv
     [,1] [,2] [,3]
[1,]  0.5    0 -0.5
[2,]  0.5    0  0.5
[3,] -1.5    1  0.5
> ABinv %*% A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0  0.5    0 -0.5
[2,]    0    1  0.5    0  0.5
[3,]    0    0 -1.5    1  0.5
3.3 Now Back to Our Regularly Scheduled Program
So how did we get that canonical form in Eqs. 29 to 31? There’s about 3.5 pages of linear algebra in the textbook…
Fundamentally, take the program (P):

(32) max c^T x
subject to
(33) A x = b
(34) x >= 0

find your next preferred basis B, and our new, equivalent program is

(35) max y^T b + (c^T - y^T A) x
subject to
(36) A_B^-1 A x = A_B^-1 b
(37) x >= 0
where
(38) y = A_B^-T c_B
(Note that A_B^-T is the transpose of the inverse of the matrix A_B, and the inverse operation and transpose operation commute, so (A_B^-1)^T = (A_B^T)^-1.)
Or, I prefer to write it

(39) max c'^T x + z̄'
subject to
(40) A' x = b'
(41) x >= 0

where

(42) y = A_B^-T c_B
(43) c'^T = c^T - y^T A
(44) z̄' = z̄ + y^T b
(45) A' = A_B^-1 A
(46) b' = A_B^-1 b
This, then, is the basic process of canonization (it has nothing to do with the Catholic saints):

1. rewrite the constraints using the procedure above, to satisfy the first condition;
2. rewrite the objective function using y from Eq. 42.
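The whole recipe can be sketched end-to-end in a few lines of Python/NumPy (my own translation of the Octave work; the equation references are to the list above, and the basis {1, 2, 4} should reproduce the numbers in Eqs. 29 to 31):

```python
import numpy as np

# Canonicalization for a chosen basis, following Eqs. 42-46.
A = np.array([[ 1., 1., 1., 0., 0.],
              [ 2., 1., 0., 1., 0.],
              [-1., 1., 0., 0., 1.]])
b = np.array([6., 10., 4.])
c = np.array([2., 3., 0., 0., 0.])
zbar = 0.0

B = [0, 1, 3]                # basis {1, 2, 4}, as 0-based column indices
ABinv = np.linalg.inv(A[:, B])
y = ABinv.T @ c[B]           # Eq. 42
c_new = c - A.T @ y          # Eq. 43: c' = (0, 0, -5/2, 0, -1/2)
z_new = zbar + y @ b         # Eq. 44: z̄' = 17
A_new = ABinv @ A            # Eq. 45: columns 1, 2, 4 become the identity
b_new = ABinv @ b            # Eq. 46: b' = (1, 5, 3)

print(c_new)
print(z_new)
print(b_new)
```

Note that z̄' = 17 is exactly the objective value of the basic solution for {1, 2, 4}, which is what the constant term in the canonical objective has to be.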
3.4 Additional Example
Due to popular request, in class we went through a second example of creating the canonical form for a different basis. Still the same problem,
(47) max 2 x_1 + 3 x_2

subject to

(48) x_1 + x_2 + x_3 = 6
     2 x_1 + x_2 + x_4 = 10
     -x_1 + x_2 + x_5 = 4
(49) x_1, ..., x_5 >= 0
Note that in this case, z̄ = 0.
As written, this is already in canonical form for B = {3, 4, 5}. The requested basis was B = {2, 4, 5}, which gives

(50) A_B = [1 0 0; 1 1 0; 1 0 1]

and

(51) c_B = (3, 0, 0)^T
We are going to need both A_B^-1 and (A_B^-1)^T, so let's calculate them:
octave:1> A = [1, 1, 1, 0, 0; 2, 1, 0, 1, 0; -1, 1, 0, 0, 1]
A =
   1   1   1   0   0
   2   1   0   1   0
  -1   1   0   0   1
octave:2> AB = [1, 0, 0; 1, 1, 0; 1, 0, 1]
AB =
   1   0   0
   1   1   0
   1   0   1
octave:3> ABinv = inv(AB)
ABinv =
   1   0   0
  -1   1   0
  -1  -0   1
octave:4> ABinvT = ABinv'
ABinvT =
   1  -1  -1
   0   1  -0
   0   0   1
(In Octave, the single quote mark, or apostrophe, gives you the transpose of a matrix or vector. This is not to be confused with the “prime” mark that I put on variables as I update them to new values.)
In Eqs. 45 and 46, we saw that the rewritten constraint equation A' x = b' uses A' and b', so let's calculate those:
octave:5> Aprime = ABinv*A
Aprime =
   1   1   1   0   0
   1   0  -1   1   0
  -2   0  -1   0   1
octave:6> b = [6; 10; 4]
b =
    6
   10
    4
octave:7> bprime = ABinv*b
bprime =
    6
    4
   -2
That gives us our constraints. Now, rewriting the objective function is a little more complicated. Recall that c_B is the subset of the objective function coefficients c corresponding to our desired basis B (in this case, elements 2, 4 and 5 of c); then we calculate y = A_B^-T c_B,
octave:8> CB = [3; 0; 0]
CB =
   3
   0
   0
octave:9> y = ABinvT*CB
y =
   3
   0
   0
Now we can calculate the new c'^T = c^T - y^T A,
octave:12> c = [2; 3; 0; 0; 0]
c =
   2
   3
   0
   0
   0
octave:23> cprime = c' - y'*A
cprime =
  -1   0  -3   0   0
Finally, calculate z̄' = y^T b,
octave:26> y'*b
ans = 18
giving us (hopefully) the revised linear program
(52) max -x_1 - 3 x_3 + 18

subject to

(53) x_1 + x_2 + x_3 = 6
     x_1 - x_3 + x_4 = 4
     -2 x_1 - x_3 + x_5 = -2
(54) x_1, ..., x_5 >= 0

where

(55) y^T = (3, 0, 0)
(56) c'^T = (-1, 0, -3, 0, 0)
(57) z̄' = 18
(58) b' = (6, 4, -2)^T
If all has gone well, that has changed our problem statement from one that was canonical in the {3, 4, 5} basis to one that is canonical in the {2, 4, 5} basis. Confirming our conditions: columns 2, 4, and 5 form an identity matrix (A_B = I) (check!); the corresponding entries in our objective function are zero (c'_B = 0) (check!).
Whew! It worked.
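As a cross-check, here is the same computation in Python/NumPy (my own sketch; the results should match Eqs. 55 to 58 and the Octave session above):

```python
import numpy as np

# Re-run the canonicalization for basis {2, 4, 5} and compare with Octave.
A = np.array([[ 1., 1., 1., 0., 0.],
              [ 2., 1., 0., 1., 0.],
              [-1., 1., 0., 0., 1.]])
b = np.array([6., 10., 4.])
c = np.array([2., 3., 0., 0., 0.])

B = [1, 3, 4]                      # basis {2, 4, 5}, 0-based
ABinv = np.linalg.inv(A[:, B])
y = ABinv.T @ c[B]                 # (3, 0, 0)
cprime = c - A.T @ y               # (-1, 0, -3, 0, 0)
zprime = y @ b                     # 18
bprime = ABinv @ b                 # (6, 4, -2)
print(y, cprime, zprime, bprime)
```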
3.5 The Basic Solution
(This is material not covered in class.)
Every basis is associated with a unique basic solution. The elements of the basic solution related to the basis B, which we will call the vector x_B, come from the equation

(59) A_B x_B = b

The elements not related to B are part of the partition N, of course, and all of those are 0: x_N = 0. Put x_B and x_N back together in the proper order to get x.
4 The Simplex Algorithm
4.1 The Convex Polytope
A polytope is the n-dimensional generalization of a polygon (2-D) or polyhedron (3-D). It is convex if a line drawn between any two points on the surface of the polytope passes only through its interior.
A polytope can be created by cutting the space with a set of planes. In a linear program, each constraint defines a plane, and the total set defines the polytope. For our purposes here, the polytope doesn't have to be bounded in all dimensions and directions. Before any constraints, we start with the unbounded polytope that is the non-negative orthant of our space (the first quadrant, in 2-D). If all of the constraints are necessary, adding the constraints one by one will chop off some of the space and create a new face for the polytope, keeping it convex and adding some number of vertices and edges in the process.
Ultimately, the possible solution space of a linear program with a feasible solution is a polytope. One (or more) of the vertices of the polytope will represent the optimal solution. Thus, we can find that vertex by starting at any vertex and sliding along the edges until we find the optimal one. This sliding is our basic simplex operation. (Mathematicians will tell you it's not a precise name, that "simplex" really means the n-dimensional triangle, but it's the common name, so we'll go with it.)
Question: Does the basic feasible solution we begin with have to be a vertex? Why won't any point in the interior of the polytope do? In three dimensions, running in any direction will first encounter a face, then a second run along the face will find an edge, and the third run along the edge will find a vertex. Will these three steps suffice in 3 dimensions? Perhaps the basis rotation in the core algorithm isn't good enough to make the algorithm work right, unless we first find a vector normal to the face we encounter on the first run? What about n dimensions?
4.2 Anzen Dai-ichi (Safety First)
The idea of the simplex algorithm is pretty simple: repeat a simplex operation until no more simplex operations improve the result, and you're done. But there are a couple of things we need to worry about:

1. The second simplex operation might partly undo the work of the first!
2. At each vertex, we have several possible choices for which way to go next. If we choose poorly:
   (a) we might be inefficient in finishing; or
   (b) worse, we might wind up looping forever!
3. We need to be careful to recognize when we're on an edge that extends forever (that is, the solution is unbounded).
4. Finally, of course, we need some way to recognize when we are done, and have found the (or an) optimal solution.
4.3 The Basic Algorithm
See the separate file uploaded to SFS for pseudocode for the basic algorithm, slightly reformatted from p. 70 of the textbook.
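As a rough illustration only (my own sketch, not the textbook's pseudocode), the loop can be written in a few dozen lines of Python/NumPy. It recomputes the canonical form each round, picks the entering variable by a smallest-index rule (in the spirit of Bland's anti-cycling rule), and applies the ratio test; it assumes the starting basis is feasible.

```python
import numpy as np

def simplex(A, b, c, B, eps=1e-9):
    """Minimal simplex loop for an LP in SEF, starting from a feasible
    basis B (0-based column indices). Returns (optimal value, solution).
    A teaching sketch, not production code."""
    B = sorted(B)
    while True:
        ABinv = np.linalg.inv(A[:, B])
        A_can = ABinv @ A              # canonical constraints for B
        b_can = ABinv @ b
        y = ABinv.T @ c[B]
        c_can = c - A.T @ y            # canonical objective coefficients
        # Optimal when no coefficient in the canonical objective is positive.
        entering = next((j for j in range(A.shape[1]) if c_can[j] > eps), None)
        if entering is None:
            x = np.zeros(A.shape[1])
            x[B] = b_can
            return y @ b, x
        col = A_can[:, entering]
        if np.all(col <= eps):
            raise ValueError("LP is unbounded along this edge")
        # Ratio test: row i of the canonical system belongs to basis var B[i].
        ratios = [(b_can[i] / col[i], i) for i in range(len(col)) if col[i] > eps]
        _, leave_row = min(ratios)
        B[leave_row] = entering        # swap entering variable into the basis
        B.sort()

z, x = simplex(np.array([[ 1., 1., 1., 0., 0.],
                         [ 2., 1., 0., 1., 0.],
                         [-1., 1., 0., 0., 1.]]),
               np.array([6., 10., 4.]),
               np.array([2., 3., 0., 0., 0.]),
               [2, 3, 4])
print(z, x)
```

Starting from the slack basis {3, 4, 5}, this walks to the basis {1, 2, 4} from Sec. 3, whose canonical objective has no positive coefficients, and reports the optimal value 17 at x = (1, 5, 0, 3, 0). Finding a feasible starting basis in the first place is a separate problem, not handled here.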
5 Homework
See the separate file uploaded to SFS.
6 References for this Lecture
1. https://www.gnu.org/software/octave/doc/interpreter/Three_002dDimensional-Plots.html has examples of the Octave function patch.
2. https://www.gnu.org/software/octave/doc/interpreter/Graphics-Object-Properties.html
3. https://jp.mathworks.com/help/matlab/ref/patch.html is help in Japanese on using the patch function in Octave.
4. I recommend that you avoid learning more about finding the basic solution for right now; it's really messy. But if you want to, I found the video https://www.youtube.com/watch?v=eYa5aJZhfpg helpful.