Complexity: More Complex Than You Think

This is really long, so it’s below the fold.

The problems I use aren’t for the most part mathematical brain teasers, because the real-world problems I’m trying to teach students to solve aren’t for the most part brain teasers. However, when you have helped students work through problems for years, and actually listened to them, you discover that you have a simplistic concept of complexity, or that complexity is more complex than you think it is.

Complexity, or if you prefer, difficulty, exists on several different levels, and can be due to a variety of different variables. In other words, a mathematically simple problem can be highly complex. This is why I have a basic rule when introducing students to problems: Work from the familiar to the unfamiliar.

This is a rule, by the way, that some colleagues have scoffed at. Why not throw things at them like churn rates or NPV? They’re in the business school, after all. Well, the reason is this: If the object is to teach them how to run a t-test or a simulation model, it’s counter-productive to add things they don’t understand to the problem. Once they get the basic idea, we can move to less familiar contexts and incorporate less familiar variables.

If I’m teaching statistics, I present the material in terms of the familiar — grades, for example, because what are students more familiar with than grades — and then work toward the real world problems, which to students, are unfamiliar. When I’m teaching decision sciences, I start with familiar contexts, such as buying a car or selling football team T-shirts, using familiar variables, such as cost, revenue, and gross profit margins. I add unfamiliar variables, such as NPV, after students have a basic grasp of how to solve the problem, and work toward those unfamiliar real world problem contexts.

This building block approach to problem solving is unfashionable, but it works.

I also never miss a chance to make a point, or teach students a valuable lesson. Faculty parking stickers cost $300 a year. I (and everyone I know) tend to get riled when I go to campus early in the morning, only to find all of the spaces taken up, many by student vehicles with no stickers. So when we’re learning how to construct and solve simulations, I start with this problem.

The university strictly enforces parking policies on campus. The first parking violation costs $40. The second costs $60, and all successive violations cost $75. Each hour a vehicle is parked on campus, there is a 17% chance of its being ticketed. A bus pass costs $53.47. Create a simulation that models the costs incurred over a semester in parking violations, and run 1000 iterations of the simulation. Assume 30 hours of (illegal) parking per week (15 hours of classes, and an additional 15 hours for other reasons). There are 16 weeks in the semester. Is it cheaper to park illegally, or buy a bus pass?
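For readers who would rather see the model than build it in a spreadsheet, here is a minimal Python sketch of this first parking problem. It is an illustration, not the classroom solution: it assumes each parked hour is an independent 17% draw and that at most one ticket is issued per hour, and the names `fines_for`, `simulate_semester`, and `run_parking` are invented for this example.

```python
import random
import statistics

FINES = (40, 60, 75)     # first, second, and every later violation
TICKET_PROB = 0.17       # chance of a ticket in any one parked hour
HOURS_PER_WEEK = 30
WEEKS = 16
BUS_PASS = 53.47


def fines_for(tickets):
    """Total fines for a number of tickets under the escalating schedule."""
    total = 0
    for n in range(tickets):
        # The third and all later tickets reuse the last fine in the schedule.
        total += FINES[min(n, len(FINES) - 1)]
    return total


def simulate_semester(rng):
    """One iteration: draw every parked hour, count tickets, price them."""
    hours = HOURS_PER_WEEK * WEEKS
    tickets = sum(1 for _ in range(hours) if rng.random() < TICKET_PROB)
    return fines_for(tickets)


def run_parking(iterations=1000, seed=None):
    """Run the simulation many times and return the list of semester costs."""
    rng = random.Random(seed)
    return [simulate_semester(rng) for _ in range(iterations)]


if __name__ == "__main__":
    costs = run_parking(seed=1)
    print("mean cost:", round(statistics.mean(costs), 2))
    print("std dev:  ", round(statistics.stdev(costs), 2))
    print("iterations cheaper than the bus pass:",
          sum(c < BUS_PASS for c in costs))
```

With 480 parked hours at a 17% hourly chance, the ticket count lands far above two, so nearly every ticket is charged at $75 and the comparison with the $53.47 bus pass is not close.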

I make as many connections to previous material as I can. It reinforces what they learned, and it makes the connections (and justifications) obvious to the students. So we revisit the problem later in the semester, when students are more skilled.

The university strictly enforces parking policies on campus. A first parking violation costs $40. A second costs $60, and all successive violations cost $75. A bus pass costs $53.47 per semester. The probability of being ticketed increases 20% over the base probability for every additional hour a vehicle is parked in the same lot. The base probability varies according to the season, as described in the table below:

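| Month | Weeks | Probability |
| --- | --- | --- |
| AUG | 1 | 21% |
| SEP | 4 | 21% |
| OCT | 4 | 21% |
| NOV | 3 | 19% |
| DEC | 3 | 17% |
| JAN | 3 | 16% |
| FEB | 4 | 16% |
| MAR | 3 | 18% |
| APR | 4 | 20% |
| MAY | 2 | 21% |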

Create a simulation that models the costs incurred over a full school year in parking violations, and run 1000 iterations of the simulation. Use your class schedule in the model, using the data in the table above. Is it cheaper to park illegally, or buy a bus pass?

The first of the parking ticket simulations we do in class, as a class. I walk them through it. The second problem students work on individually, while I run around helping and answering questions. Run. Often literally. I’ve sprained an ankle several times teaching. (There is another “life lesson” problem listed below: The CCAmerica problem.)

Back to complexity. One thing I have noticed with, say, MBA students new to teaching is that they have a simplistic idea of complexity. One of the problems is that they are familiar with the problems and how to solve them. The other problem is that they see complexity solely in terms of mathematics.

Problem complexity can be textual, that is, a relatively simple problem can be made highly complex just by the way it is worded. Consider the following:

You have gotten a job in State College, Pennsylvania, the home of Penn State. Like most small college towns, property values in State College are high, but property values in the communities surrounding State College are notably cheaper. You have looked at two houses that you really like, one in State College, and the other thirty miles away, and you want to calculate an amortization table so you can compare the total costs of both houses. To calculate commuting costs, assume that you will work 48 weeks in the year, 5 days a week. Assume a 5% per year increase in gas per gallon per month. Note that you will not owe property taxes the first year—but you will every year after the first (property tax rates are included in the Excel file, as are mortgage and interest data, your downpayment, and the market prices of the two houses).

Open which_house.xls and use the information first to calculate the missing information for each of the two houses (each house is on its own worksheet; the first worksheet has all the information on it that applies to both). Which house would over twenty years be cheaper?

Wordy? Yes. But consider the first version that was submitted:

Compare the total costs over time of buying two houses, assuming a 48-week work year and a 5-day work week, and a 5% increase in gasoline prices per month. Property taxes are due from the second year. Answer the questions on the Excel worksheet.

The initial version is too terse. It gives the student minimal information (the missing crucial data is in the Excel worksheet, but the problem doesn’t tell the students that). It is worded so tersely that students aren’t sure what they’re supposed to do with it: “Compare the total costs” all by itself doesn’t mean much. “Due from the second year” is vaguely worded. So even though it may be short, it introduces additional complexity into an otherwise mathematically simple problem. That’s why the initially submitted problem was reworded. Of course, you could object to the conversational tone of the revised problem, but since no student has ever complained about informal wording, I don’t consider it a problem.

“Mathematically complex” itself can mean several different things. You can add mathematical complexity by introducing more variables, for example. Contrast the two problems below.

Leary Chemical manufactures three chemicals: A, B, and C. These chemicals are produced via two production processes: 1 and 2. Running process 1 for an hour costs $4 and yields 3 units of A, 1 unit of B, and 1 unit of C. Running process 2 for an hour costs $1 and yields 1 unit of A and 1 unit of B. To meet customer demands, at least 10 units of A, 5 units of B, and 3 units of C must be produced daily. Determine the daily production that minimizes Leary Chemical’s production costs.
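To make the size of the Leary problem concrete, here is a rough Python sketch that treats it as a two-variable linear program (hours of process 1 and hours of process 2) and checks every corner of the feasible region. Corner-point enumeration is a stand-in chosen for this illustration because it needs no solver; it is not necessarily how the problem is solved in class, and the names `CONSTRAINTS`, `corner_points`, and `cheapest_plan` are invented here.

```python
from itertools import combinations

# Each constraint reads a*x1 + b*x2 >= c,
# where x1 and x2 are the daily hours of process 1 and process 2.
CONSTRAINTS = [
    (3, 1, 10),  # units of A: 3 per hour of process 1, 1 per hour of process 2
    (1, 1, 5),   # units of B
    (1, 0, 3),   # units of C (only process 1 makes C)
    (1, 0, 0),   # x1 >= 0
    (0, 1, 0),   # x2 >= 0
]
COST = (4, 1)    # dollars per hour of process 1 and process 2


def feasible(x1, x2, tol=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x1 + b * x2 >= c - tol for a, b, c in CONSTRAINTS)


def corner_points():
    """Intersect every pair of constraint lines; keep the feasible points."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel lines never meet
        x1 = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if feasible(x1, x2):
            yield x1, x2


def total_cost(point):
    return COST[0] * point[0] + COST[1] * point[1]


def cheapest_plan():
    """Lowest-cost feasible corner of the region."""
    return min(corner_points(), key=total_cost)


if __name__ == "__main__":
    plan = cheapest_plan()
    print("hours of process 1 and 2:", plan, "daily cost:", total_cost(plan))
```

Under these assumptions the cheapest corner is three hours of process 1 and two hours of process 2, for $14 a day. The Monet problem has the same shape but four decision variables and more constraints, which is why the same corner-checking idea becomes impractical by hand.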

The Monet Company produces four types of picture frames, which we label 1, 2, 3, and 4. The four types of frames differ with respect to size, shape, and materials used. Each type requires a certain amount of skilled labor, metal, and glass, as shown in Table A below. This table also lists the unit selling price Monet charges for each type of frame. During the coming week, Monet can purchase up to 4000 hours of skilled labor, 6000 ounces of metal, and 10,000 ounces of glass. The unit costs are $8.00 per labor hour, $0.50 per ounce of metal, and $0.75 per ounce of glass. Also, market constraints are such that it is impossible to sell more than 1000 type 1 frames, 2000 type 2 frames, 500 type 3 frames, and 1000 type 4 frames, and Monet does not want to keep any frames in inventory at the end of the week. What should the company do to maximize its profit for this week?

The two are very similar problems. The Monet problem, however, contains more variables (costs of different materials), and is therefore more mathematically complex. But mathematical complexity also arises in rather unlikely places. Compare either of the above two problems with the one below:

A customer requires during the next 4 months, respectively, 50, 65, 100, and 70 units of a commodity, and no backlogging is allowed (that is, the customer’s requirements must be met on time). Production costs are $5, $8, $4, and $7 per unit during these months. The storage cost from one month to the next is $2 per unit (assessed on ending inventory). It is estimated that each unit on hand at the end of month 4 can be sold for $6. Determine how to minimize the net cost incurred in meeting the demands for the next 4 months.

This problem seems on the surface to be of more or less the same mathematical complexity as the two preceding problems, but students find this one more difficult. This mystified me for a while, until after I had talked to quite a few students about why they found it so complex. Note this passage in the problem:

The storage cost from one month to the next is $2 per unit (assessed on ending inventory).

This seems to be merely one more cost variable. It turns out, however, that students find repeated calculations of the same type, such as we see in either of the preceding problems (total material costs, etc.), significantly simpler than one, non-repeated calculation, such as the storage cost variable above. Students seem to interpret inventory as a time-related variable rather than a cost-related variable. As a result, they miss the fact that they have to set up an inventory table for each month and calculate the costs at the end of each month.
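The month-by-month inventory table that students tend to miss can be sketched in a few lines of Python. This is only an illustration of the bookkeeping for a given production plan, not an optimizer. It assumes the customer starts with no inventory, that the $2 storage charge applies to every month's ending inventory, and that whatever is left at the end of month 4 is sold for $6 a unit. The function names are invented for the example.

```python
DEMAND = [50, 65, 100, 70]   # units required in months 1 to 4
UNIT_COST = [5, 8, 4, 7]     # production cost per unit in each month
STORAGE = 2                  # dollars per unit of ending inventory
SALVAGE = 6                  # dollars per unit left after month 4


def inventory_table(production):
    """Build one row per month for a given production plan."""
    if len(production) != len(DEMAND):
        raise ValueError("need one production figure per month")
    rows = []
    inventory = 0  # assumption: nothing on hand at the start
    months = zip(production, DEMAND, UNIT_COST)
    for month, (made, needed, cost) in enumerate(months, start=1):
        inventory += made - needed
        if inventory < 0:
            raise ValueError(f"month {month}: demand not met, no backlogging")
        rows.append({
            "month": month,
            "produced": made,
            "demand": needed,
            "ending_inventory": inventory,
            "production_cost": made * cost,
            "storage_cost": inventory * STORAGE,
        })
    return rows


def net_cost(production):
    """Production plus storage, less the sale of leftover month-4 units."""
    rows = inventory_table(production)
    spent = sum(r["production_cost"] + r["storage_cost"] for r in rows)
    return spent - SALVAGE * rows[-1]["ending_inventory"]


if __name__ == "__main__":
    for plan in ([50, 65, 100, 70], [115, 0, 170, 0]):
        print(plan, "net cost:", net_cost(plan))
        for row in inventory_table(plan):
            print("   ", row)
```

Producing exactly each month's demand costs $1,660, while building ahead in the two cheap months (115 units in month 1, 170 in month 3) costs $1,525 even after paying for storage. The difference only shows up once the ending inventory is tracked month by month.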

Adding more variables adds more calculations. The more variables and calculations, the more mathematically complex the problem is. Sometimes, I will make a mathematically complex problem a bit easier for students to digest (the academese for this is “reducing cognitive load”) by introducing familiarity wherever possible, such as the Pigskin problem:

The Pigskin Company produces footballs. Pigskin must decide how many footballs to produce each month. The company has decided to use a 6-month planning horizon. The forecasted demands for the next 6 months are 10,000, 15,000, 30,000, 35,000, 25,000, and 10,000. Pigskin must meet these demands on time, knowing that it currently has 5000 footballs in inventory and that it can use a given month’s production to help meet the demand for that month. (For simplicity, we assume that production occurs during the month, and demand is met at the end of the month.) During each month there is enough production capacity to produce up to 30,000 footballs, and there is enough storage capacity to store up to 10,000 footballs at the end of the month, after demand has been met. The forecasted production costs per football for the next 6 months are $12.50, $12.55, $12.70, $12.80, $12.85, and $12.95, respectively. The holding cost per football held in inventory at the end of any month is figured at 5% of the production cost for that month. (This cost includes the cost of storage and also the cost of money tied up in inventory.) The selling price for footballs is not considered relevant to the production decision because Pigskin will satisfy all customer demand exactly when it occurs—at whatever the selling price is. Therefore, Pigskin wants to determine the production schedule that minimizes the total production and holding costs. Determine this production schedule.

Mathematical complexity also arises from the interpretation of the results. In statistics, for example, students usually pick descriptive statistics up quickly. When you move from descriptive statistics to inferential statistics, however, you introduce a great deal of complexity. For whatever reason, students have a great deal of trouble wrapping their brains around uncertainty.

Consider the parking ticket simulation (I’ll repeat it below so you don’t have to scroll back up).

The university strictly enforces parking policies on campus. The first parking violation costs $40. The second costs $60, and all successive violations cost $75. Each hour a vehicle is parked on campus, there is a 17% chance of its being ticketed. A bus pass costs $53.47. Create a simulation that models the costs incurred over a semester in parking violations, and run 1000 iterations of the simulation. Assume 30 hours of (illegal) parking per week (15 hours of classes, and an additional 15 hours for other reasons). There are 16 weeks in the semester. Is it cheaper to park illegally, or buy a bus pass?

Students don’t have much trouble understanding the variables, setting up the problem, or “solving” it. But this is a simulation. It rests on uncertainty, or probability. You can’t set it up, run it, and get a black and white solution. You have to run multiple iterations (or repetitions) of the simulation, and because you get different results for every iteration, you have to do a statistical analysis of the results and interpret the statistics. This is a great big cognitive roadblock for students. And even when you think they’ve got it, even after they’ve been doing simulations in class for two weeks or more, a student will invariably raise his hand in class and ask, “Why are my results different from hers?”

The only thing to do is repeat that we’re dealing with probability — uncertainty — and although the specific results will differ from student to student and iteration to iteration, the statistics of those results (the means, standard deviations, confidence intervals, and so forth) should not significantly differ — and then show them. It takes time, but it will eventually sink in.
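One way to "show them" is to run the same model twice with different random seeds and put the summaries side by side. The sketch below is a hedged illustration: the two seeds stand in for two hypothetical students, the confidence interval uses a simple normal approximation, and the model assumes independent hourly draws with at most one ticket per hour.

```python
import random
import statistics


def semester_cost(rng, hours=480, p=0.17):
    """Fines for one simulated semester of 30 hours x 16 weeks."""
    tickets = sum(rng.random() < p for _ in range(hours))
    # $40 for the first ticket, $60 for the second, $75 for each one after.
    return sum((40, 60)[:tickets]) + 75 * max(tickets - 2, 0)


def summarize(seed, iterations=1000):
    """Return the raw costs plus mean, std dev, and an approximate 95% CI."""
    rng = random.Random(seed)
    costs = [semester_cost(rng) for _ in range(iterations)]
    mean = statistics.mean(costs)
    sd = statistics.stdev(costs)
    half_width = 1.96 * sd / iterations ** 0.5
    return costs, mean, sd, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    for student, seed in (("his", 101), ("hers", 202)):
        costs, mean, sd, ci = summarize(seed)
        print(student, "first five iterations:", costs[:5])
        print(student, "mean:", round(mean), "sd:", round(sd),
              "95% CI:", tuple(round(x) for x in ci))
```

The individual iterations will not match between the two runs, but the means and standard deviations should sit close together, which is the point students need to see.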

Eventually, you can work students up to doing comparatively complex simulations like this:

CCAmerica is a credit card company that does its best to gain customers and keep their business in a highly competitive industry. The first year a customer signs up for service typically results in a loss to the company because of various administrative expenses. However, after the first year, the profit from a customer is typically positive, and this profit tends to increase through the years. The company has estimated the mean profit from a typical customer to be as shown in column B.

For example, the company expects to lose $40 in the customer’s first year but to gain $87 in the fifth year— provided that the customer stays loyal that long.

For modeling purposes, we will assume that the actual profit from a customer in the customer’s nth year of service is normally distributed with mean shown in Column B and standard deviation equal to 10% of the mean.

At the end of each year, the customer leaves the company, never to return, with probability 0.15, the churn rate. Alternatively, the customer stays with probability 0.85, the retention rate.

The company wants to estimate the NPV of the net profit from any such customer who has just signed up for service at the beginning of year 1, at a discount rate of 15%, assuming that the cash flow occurs in the middle of the year.

The company wants to see how sensitive this NPV is to the retention rate. Do this by showing various retention rates: .75, .80, .85, .90, .95.

Or even much more complex problems which I won’t list here, because they take an average of 5-6 pages in a Word document to list all the variables, and so forth.

Interestingly, complexity pops up in some extremely unlikely places. Consider this problem:

Republic Airlines will launch service in two years, but first, they have to figure out their hub system. Each hub is used to connect flights between cities within 1000 miles of one another. Republic will fly to Atlanta, Boston, Chicago, Denver, Houston, Los Angeles, New Orleans, New York, Pittsburgh, Salt Lake City, San Francisco, Seattle, and Portland. Republic Airlines must know the minimum number of hubs it will need to cover all these cities (each city must be within 1000 miles of at least one hub). Below are listed the cities, and which other cities are within 1000 miles.

| City | Cities within 1000 miles |
| --- | --- |
| Atlanta (AT) | AT CH HO NO NY PI |
| Boston (BO) | BO NY PI |
| Chicago (CH) | AT CH NY NO PI |
| Denver (DE) | DE SL |
| Houston (HO) | AT HO NO |
| Los Angeles (LA) | LA SL SF |
| New Orleans (NO) | AT CH HO NO |
| New York (NY) | AT BO CH NY PI |
| Pittsburgh (PI) | AT BO CH NY PI |
| Salt Lake City (SL) | DE LA SL SF SE |
| San Francisco (SF) | LA SL SF SE |
| Seattle (SE) | SL SF SE |

Bonus: What will be the minimum number of hubs if the mileage is 750? 1500?

This is an extremely simple problem, except that students really shriek when you give it to them. The first question is usually, “Is everything we need to know here?” or sometimes, “You forgot part of the problem, didn’t you?” When I say, “No, it’s all there,” I always get, “Where’s the data? How can we solve this without numbers?”

It’s the very simplicity of the problem that students find complex. There is only one variable here: Is the city within 1000 miles of another city or not? All students have to do is use a binary variable. One variable. Two or three calculations. That’s it. (The answer, by the way, is three hubs.) It doesn’t even really make any difference what values they use for that binary variable as long as they’re consistent. They could use 1 and 0, or 10 and 5, or whatever numerical values they like and they’ll get the same answer.
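For anyone who wants to check the answer, the yes-or-no structure can be sketched in Python as a brute-force search over candidate hub sets. This is an illustration rather than the spreadsheet approach used in class. It uses only the twelve cities in the table (Portland is named in the problem but has no row, so it is left out here), and the names `WITHIN_1000`, `covers_all`, and `smallest_hub_set` are invented for the example.

```python
from itertools import combinations

# For each city, the set of cities within 1000 miles (copied from the table).
WITHIN_1000 = {
    "AT": {"AT", "CH", "HO", "NO", "NY", "PI"},
    "BO": {"BO", "NY", "PI"},
    "CH": {"AT", "CH", "NY", "NO", "PI"},
    "DE": {"DE", "SL"},
    "HO": {"AT", "HO", "NO"},
    "LA": {"LA", "SL", "SF"},
    "NO": {"AT", "CH", "HO", "NO"},
    "NY": {"AT", "BO", "CH", "NY", "PI"},
    "PI": {"AT", "BO", "CH", "NY", "PI"},
    "SL": {"DE", "LA", "SL", "SF", "SE"},
    "SF": {"LA", "SL", "SF", "SE"},
    "SE": {"SL", "SF", "SE"},
}


def covers_all(hubs):
    """True if every city is within 1000 miles of at least one hub."""
    covered = set().union(*(WITHIN_1000[h] for h in hubs))
    return covered == set(WITHIN_1000)


def smallest_hub_set():
    """Try one hub, then two, then three, and so on; return the first cover."""
    cities = list(WITHIN_1000)
    for size in range(1, len(cities) + 1):
        for hubs in combinations(cities, size):
            if covers_all(hubs):
                return hubs
    return None


if __name__ == "__main__":
    hubs = smallest_hub_set()
    print(len(hubs), "hubs:", hubs)
```

No single city and no pair of cities covers the whole table, so the search stops at three hubs. More than one three-hub combination works; the sketch simply returns the first one it finds.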

The moral of this story is that years of teaching have taught me that complexity is far more complex than I ever realized. I still run into things that students find complex but I do not. When students have trouble with the work you give them, sure, a lot of the time it’s going to be that they don’t have the basic skills they need, or they haven’t learned what they should have last week in class, or they didn’t do the reading, or they haven’t been coming to class, but don’t always assume that’s the problem. Always ask students why they find the work difficult, because it may be something that has never occurred to you. And listen closely, since students often have trouble telling you exactly what the problem is.

Consider this relatively simple statistics problem.

Lessen Waist, Inc. produces low-fat cereals, which they sell in 12-ounce (weight) boxes. Because of settling and production scheduling, Lessen Waist cannot weigh every box of cereal, and 0.35 ounces (weight) is considered to be an acceptable variance from the advertised weight. Lessen Waist weighs a subset of boxes because the filling machines must be adjusted periodically. Use the sample weights below and the appropriate statistical tests to determine if the boxes of cereal are within the acceptable weight. If they are not, use the appropriate statistical tests to determine how much the filling machines need to be adjusted. Report all relevant statistics.

The first problem students have — because a problem is more complex than most realize — is parsing the text of the problem. Far too many students experience some kind of frustration just reading the problem, and find it even more frustrating to try to get past the first reading (sorry to be cliché, but if I had a dollar for every time a student has come to office hours and expressed exasperation at being required to figure out how to figure out the “story problem,” I’d have my own island in the Caribbean). And this problem is getting worse, despite the fact that the new-new-math-free-math emphasizes story problems over equations.

This is probably the simplest problem I’ve listed so far. There is really no setup that needs to be done. Students open an Excel file, enter the acceptable weight variance in the labeled cell, decide which test to use, run it, and paste the relevant statistics in the labeled cells. There are no calculations to perform, not even simple sums. Yet there seems to be a cognitive block in merely going from reading the problem to doing it.

I think that like the hub problem, it’s precisely the simplicity of this problem that creates the complexity. Give students a problem with lots of calculations and labeled cells in which to do them, and while some may do the wrong calculations, they will start working on it. They see a cell labeled “NPV,” know they’re supposed to do a calculation there, and try. But with this simple statistics problem, where they open the file and see no label other than “Acceptable Weight Variance,” and a comment box, they don’t understand where they’re supposed to do the calculations, and the howling begins as soon as you give it to them, before they’ve even touched the keyboard.

This one, for example, will cause far less yowling, even though it’s a great deal more complex.

General Ford (GF) Auto Corporation is developing a new model of compact car. This car is assumed to generate sales for the next 5 years. GF has gathered information about the following quantities through focus groups with the marketing and engineering departments.

  • Fixed cost of developing a car: This cost is assumed to be $1.4 billion ($1,400,000,000). The fixed cost is incurred at the beginning of the year, before any sales are recorded.
  • Unit Gross Profit: GF assumes that in year 1, the gross profit will be $5000 per car. Every other year, GF assumes the unit gross profit will decrease by 4%.
  • Sales: The demand for the car is the uncertain quantity. In its first year, GF assumes sales – number of cars sold – will be triangularly distributed with parameters 100,000, 150,000, and 170,000. Every year after that, the company assumes that sales will decrease by some percentage, where this percentage is triangularly distributed with parameters 5%, 8%, and 10%. GF also assumes that the percentage decreases in successive years are independent of one another.
  • Depreciation: The company will depreciate its development cost on a straight-line basis over the lifetime of the car.
  • Taxes: The corporate tax is 40%.
  • Discount rate: GF figures its cost of capital at 15%.

The statistics problem has only two variables, the weights and the acceptable weight variance. The car problem has quite a few more than two, not to mention almost as many calculations. Students will, in fact, complain a lot less about this one than they will about the statistics problem or the airline hub problem listed above.
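To give a sense of how many moving parts the car problem has, here is a rough Python sketch of one way to simulate it. Several choices are assumptions made for this illustration, not part of the problem statement: the triangular parameters are read as minimum, most likely, and maximum; "every other year" is read as every year after the first; cash flows are discounted as if they arrive at the end of each year; and a year with negative pre-tax profit is given a tax credit at the same 40% rate. The function names are invented for the example.

```python
import random
import statistics

FIXED_COST = 1_400_000_000
YEARS = 5
FIRST_YEAR_MARGIN = 5000      # gross profit per car in year 1
MARGIN_DECLINE = 0.04         # yearly drop in unit gross profit
TAX_RATE = 0.40
DISCOUNT_RATE = 0.15


def simulate_npv(rng):
    """One iteration: draw sales for each year and discount the cash flows."""
    depreciation = FIXED_COST / YEARS          # straight line over 5 years
    # random.triangular takes (low, high, mode), in that order.
    sales = rng.triangular(100_000, 170_000, 150_000)
    margin = FIRST_YEAR_MARGIN
    npv = -FIXED_COST                          # paid before any sales
    for year in range(1, YEARS + 1):
        if year > 1:
            # Independent percentage decrease in sales each later year.
            sales *= 1 - rng.triangular(0.05, 0.10, 0.08)
            margin *= 1 - MARGIN_DECLINE
        before_tax = sales * margin - depreciation
        after_tax = before_tax * (1 - TAX_RATE)
        cash_flow = after_tax + depreciation   # depreciation is not cash
        npv += cash_flow / (1 + DISCOUNT_RATE) ** year
    return npv


def run_gf(iterations=1000, seed=None):
    """Return the simulated NPV for each iteration."""
    rng = random.Random(seed)
    return [simulate_npv(rng) for _ in range(iterations)]


if __name__ == "__main__":
    npvs = run_gf(seed=1)
    print("mean NPV:", round(statistics.mean(npvs)))
    print("std dev: ", round(statistics.stdev(npvs)))
    print("share of iterations with NPV < 0:",
          sum(v < 0 for v in npvs) / len(npvs))
```

Even in this compact form the model carries two random inputs, two declining series, depreciation, taxes, and discounting, which is a good deal more than the two variables in the cereal problem.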

So going from text to calculation isn’t the only thing going on here. Many students get the car problem wrong, but they perceive it as simpler than the statistics or hub problem, even though it is, mathematically at least, far more complex.

Part of the reason (I don’t know what all of it is) is, I think, that students are suspicious, and see a short, straightforward text problem as a paucity of information. That is, students are always insisting that they need more information to solve a problem, even when they have all the information they need. Students are also suspicious of a problem that seems simple, even when it is. Let’s take the statistics problem. The purpose of the problem is not to stump the students. The purpose is to determine whether students can discriminate among statistical tests and choose the correct one, and whether they can perform the test. That’s it. So yes, it’s simple, but it’s a valid assessment tool.

If you don’t have a lot of teaching experience, then you most likely have a simplistic concept of complexity. But there is far more to it than just the math, and you need to understand that if you produce your own materials.

One Comment

  1. PeggyU:

    Work from the familiar to the unfamiliar.

    This is a rule, by the way, that some colleagues have scoffed at.

    Why? Why would you scoff at that? Gotta start with what you know when you approach what you don’t know. It’s only logical.
