I just finished competing in my first Kaggle competition (which was a lot of fun!), and since I did decently, I thought I’d share my strategy here. Note that I am not describing the algorithm in detail; I am instead focusing on the optimization strategy.
Problem Description
The problem was about scheduling jobs for Santa’s elves. Supposedly the competition was sponsored by an analytics company to try to publicize their software, but I suspect they were a front company for Mr. Claus himself…
Orders for toys come in from Jan 1, 2014 to sometime in December of 2014 (for some reason, in the competition data, they ended around Dec 11). Each toy has an associated duration, ostensibly the number of minutes in which an elf can make the toy (but it gets more complicated), and cannot be started before the order arrives at the North Pole. Toys must be completed all at once (i.e. in consecutive minutes).
Approved working hours for elves are from 9 AM to 7 PM, seven days a week. Each elf has a productivity rating (which starts at 1.0) indicating how quickly they make toys. For each hour an elf works during approved working hours, the elf’s productivity goes up by 2%. For each hour an elf works outside of approved working hours, the elf’s productivity goes down by 10%. Also, if an elf takes, for example, an extra hour beyond 7PM (and therefore outside of approved working hours) to finish a job, the elf must take off the next approved working hour for which this particular elf would otherwise be available.
The actual length of time a job of duration d takes an elf of productivity p is . Elf productivity is capped above at 4.0 and below at 0.25. The goal is to schedule 10,000,000 jobs (done by at most 900 elves) in as little time as possible. If W is the size of the workforce (number of elves who do at least one job) and M is the elapsed time (in minutes) between January 1, 2014 at midnight and when the last job is finished, then the value we must minimize is .
Exploratory Data Analysis
If that seems like a lot of jobs for not many elves, it’s because it is: the winning solution finished around the year 2368 (my best solution finished in 2375). Below is a histogram showing the distribution of job lengths, as well as a bar chart which calculates the total job minutes in jobs of various lengths. Note that there are a handful of jobs with durations longer than shown on the histogram, but not enough to show up on the histogram anyway.
Any job longer than 2,400 minutes cannot be done in a single day, even by an elf with productivity 4. Any job longer for which less than 85% of the work is done during approved hours will lower productivity, so any job longer than about 2,850 minutes is guaranteed to lower productivity. Furthermore, any elf of productivity 4.0 that works 3,700 minutes on a job will end the job with productivity 0.25. This means that any job of duration longer than about 15,000 minutes will take any elf that does that job down to minimum productivity. Meanwhile, it takes about 8,400 minutes () of work during approved hours for an elf to raise productivity from 0.25 to 4.
One finds that the distribution of job lengths therefore means that the elves will be doing most of their work (not necessarily most of the jobs, but most of the job-minutes) at or near minimum productivity, a fact that will inform our optimization strategies.
Optimization Strategies
Short Jobs
We can think of short jobs which might raise productivity as a sort of currency, and ask how that currency should be spent (i.e., to what level do we raise productivity to do various productivity-reducing jobs). Before we ask this, however, we must recognize that whether a job raises or lowers an elf’s productivity depends on the elf in question and the time at which the job is started. Therefore one goal must be to maximize the number of jobs which raise productivity. Our strategy for this is that, if we are currently choosing a job for an elf to start at the beginning of approved hours (9 AM), we first try to pick any job that takes almost exactly 600 minutes, given the elf’s current productivity. Our later strategies will involve doing many very long jobs at relatively low productivity, but starting with this ensures that we will do most jobs of length up to 2400 minutes when those jobs will raise the elf’s productivity.
Long Jobs
Next, we consider how much to raise productivity to do each very long job. We think now of short jobs as currency, and ask how much we want to pay (in short jobs to raise productivity) to do each very long job. If we consider two very long jobs, we assume that we spend a fixed number of short job-minutes raising productivity for the two jobs, and ask how to allot those minutes to minimize the time it takes to do the two very long jobs. At the moment, we assume that “very long job” means that the elf will be at productivity .25 after completing the job.
The length of time it takes an elf of productivity p to do a job of length j is . Thus if we are “spending” c hours of productivity-raising jobs on our long job of length j, the time it takes is , assuming the elf started at productivity .25 (this is a fair assumption, but unnecessary, as we will see). Suppose we have an hour of productivity-raising job-minutes to allot to two long jobs of durations and . Then the length of time both jobs take is
where c is the percentage of the hour of productivity-raising job-minutes allotted to the first long job. We wish to choose c so as to minimize f(c). This is a simple calculus problem.
We find that is minimized when , in other words, when the hour of productivity-raising job-minutes are allotted such that the two long jobs are done in the same length of time. This determines our strategy for doing very long jobs.
Medium Jobs
Thus far our strategy is to first make sure we do as many jobs as possible at times when they will raise productivity, and then to try to choose productivity levels for longer jobs at which they will all take about the same amount of time. However, this must be adjusted to reflect the fact that jobs of medium length do not all decrease the productivity of the working elf to 0.25. We adjust simply by checking, when ramping up productivity to do a very long job, if it might be better to do a job of medium length.
To do this, we seek to maximize the benefit from doing a job of medium length before the long job, as compared with doing the long job first. That is, we seek to maximize
where is the current productivity of the elf in question and
is the productivity after doing the medium job first.
Note that we are making several assumptions. First, we assume that we are starting either job at the beginning of the day. Second, we assume the medium length job cannot be done in a single day (otherwise, we should consider it a productivity-raising job). Third, we assume that the medium length job is not long enough to take until the approved hours of the day after it is begun. This assumption was justified experimentally.
By finding the which maximizes (and satisfies our assumptions under which this analysis makes sense), we find the optimal job to do instead of the job of length . Of course, we must still check whether it is in fact better to do the job of length rather than the job of length .
Results
Below is a plot which graphs realtime taken to do jobs against the job duration. The plot actually only graphs every 500th job, for computational reasons.
In the bottom left we see the short and medium length jobs, short jobs being done at a range of productivity levels and medium jobs typically being done at high productivity levels. As we move left along the x-axis, we next see a line with slope 4 representing many long jobs which were done at productivity 0.25, because even at productivity 0.25 they could be done in a shorter time than the length of time we aimed to do all long jobs in. Next we see the horizontal line at about 44,900 minutes of realtime, the time in which we attempted to do long jobs. (The line of slope 4 does not exactly meet the horizontal line because the precise algorithm used did long jobs from longest to shortest, and it ran out of productivity-increasing jobs before getting to jobs of duration less than .)
Finally we see that a number of very long jobs were done in noticeably less than 44,900 minutes (a parameter chosen by trial and error). This was due to extra productivity ramping done to ensure that all jobs which could be used to ramp productivity were used in that way. These very long jobs done at higher productivity form a fairly straight line due to the long job/medium job calculation described in the previous section.
Final Thoughts
Minor adjustments in my strategy did not seem to be getting me closer to the winning solution, so I’d be very curious to see what they did. I suspect that the underlying strategy may have been very similar, but that once a solution was obtained it was improved with evolutionary methods. Such methods would be difficult to apply to this type of problem since changing an elf’s job assignment changes the elf’s productivity level for all future assignments. But such an optimization method might be possible.
In addition, I completely skipped the priority queue and knapsack problem aspects of this problem. I skipped the priority queue aspect by not assigning any jobs until all orders come in. This cannot possibly be delaying the final job finishing in my solution by more than a year, and an examination of when orders for jobs of various lengths come in (the distribution is not uniform) shows that taking this feature into account would likely only improve my solution by a few months. But this is an improvement that could be made. As for the knapsack aspect, I essentially solve it greedily, a strategy I use based on the fact that there are many jobs at each duration, so for most of the assignment period the algorithm can usually find a job for precisely the length it wants. But perhaps this can be done in a better way.