

SIMC Planner


Here I tell an abridged story spanning three and a half years. Embedded within it are some interesting things I feel I should share, and a bit of algorithm analysis.

First, Second, and Third Editions

Three years ago I built the first version of a program named SIMC Planner, as an assignment in my high-school computer studies course.

By requirement, the graphical user interface was implemented entirely with Java Swing, without the aid of drag-and-drop interface builders. That meant we had to programmatically set the position of every interface element; to somehow visualize the interface in our heads and make progress through crude trial-and-error. The tediousness of it all limited how far I could go, and the finished product looked something like this:

It was named SIMC Planner because it was intended to aid in the planning of a biennial competitive mathematics event hosted by my school, named the Singapore International Mathematics Challenge, or SIMC in short.

A short personal note on the SIMC. It is my belief that the SIMC is remarkable, and in some ways more engaging than plain Olympiads. First, it involves more than just pen and paper: computational tools are not only allowed but arguably necessary, which opens the door to possibilities formerly impossible in static examinations. Second, the SIMC involves academic reports and research presentations, which introduce dynamism and creativity, and test a broader skill-set. And lastly, the SIMC is a team game, as opposed to the typical individual effort of Olympiads.

It's somewhat analogous to comparing the International Young Physicists' Tournament to traditional Olympiads, in which case I would state that both are enriching in very different ways.

But I digress. Near the end of the competition, the SIMC calls for a series of presentations by participating teams to a panel of judges. There are some guidelines on how this is to be planned, an overview of which I present in point-form below:

  • There is a list of participating teams, of judges, and of available venues.

  • Each team must present a specified number of times, to a different judge each time. Equivalently, no judge should judge the same participant more than once.

  • Each team can present at most once during a presentation session, since a team cannot be present in two locations simultaneously.

  • Presentation sessions start after a specified start time, and last a specified duration.

  • A number of break times are to be arranged at specified timings with specified durations.

  • Each judge is to be given a venue to stay in, during the entirety of the presentations.

  • Each team is to be given a venue to stay in, in order to prepare for the competition.

The aim is to come up with an optimal schedule, that is, one that satisfies all these guidelines with the least number of sessions. This is a simple instance of a scheduling problem. It is tedious to come up with such a schedule manually, so I tasked myself with building a software solution.

The algorithm I implemented was a very simplistic randomized brute-force approach, in which the computer would attempt to fill the schedule with random valid judge-participant pairs. If the finished schedule stretches beyond the theoretical minimum, then the computer retries. In essence we cross our fingers and hope the sheer entropy of all the randomization would land us an optimal schedule quickly. There'll be more elaboration on the algorithms later.

In my fourth year I built the second version of this program, this time using JavaFX. The emphasis of the coursework had shifted to inheritance and abstraction—things like interfaces, abstract classes, generic types, parallelism—and so I modified my project to match.

There were fade-in and fade-out transitions between screens; writing and reading files were done in parallel through Java's concurrency packages; and there were some minor optimizations of the brute-force algorithm originally implemented in the previous version. Sure, it did look a lot better, but all these aesthetics fragmented the code-base quite a bit, and I couldn't be bothered to clean it all up.

Some time in my sixth year I got called up by one of my teachers. The program had gotten stuck trying to generate a schedule, and for a while we tinkered and tried to convince it to work. I eventually got it to work on my computer with minor manual intervention, but it took more than six minutes to accommodate 63 participants and 47 judges.

I had known about that problem for a long time. The brute-force algorithm has ridiculous time complexity. It was by sheer luck that it cooperated during the project demonstrations, and during the previous SIMC. Some combinations of parameters are more conducive to a quick solve than others, apparently.

Some time later she asked if I was interested in improving the program; but I was busy then, and so I declined. A few months into compulsory military service doing menial labour, I figured I'd do something useful with my life. I came up with a third version of the SIMC Planner.

One can easily see that this program was designed to be bare-bones right from the start, with all traces of unnecessary aesthetics removed. The usual three-button main screen had been entirely replaced by one that combines the schedule configuration panel with a right-side table display. By placing the most-used elements on the main screen, user effort is reduced.

When the program first launches, it attempts to load the input files from last-saved settings; if this is successful, then the program displays the imported data on the right-hand tab pane. If it isn't then it falls back to defaults, and waits for the user to tweak the import files and schedule parameters manually.

Configuration of break times can be done by clicking the "Configure Breaks" button, at which point a pop-up panel would be launched. Users can choose to declare the breaks in two modes—either by specifying timing, or by specifying which presentation session the break is to be placed after. When all parameters have been set, the program would enable the "Generate Schedules" button, which would allow the user to initiate the generation algorithm. Upon completion, the schedules would be displayed on the tab pane, and the written Excel files would be automatically opened.

That's not all. I'd included two other algorithms for the user to choose from, in addition to the brute-force one. One is based on graph theory, and the other, a rolling allocation approach; the latter promises speed-ups of orders of magnitude. This should completely solve all of those your-program's-a-snail problems encountered in earlier versions. Again, more elaboration on these algorithms later.

I ought to talk a little about the graph-based approach, because I was quite excited when I first thought of it. Then I realized that it was a fairly standard method of solving scheduling problems, and worse still, that its performance didn't even come close to what I had expected, and so I ended up quite disappointed. It anchors upon the observation that the participants and judges can be represented as the two partite sets of a bipartite graph. The edges between the partite sets then represent valid possible pairings for a presentation session. This way the allocation problem becomes equivalent to finding a maximum matching in the graph, for which efficient algorithms exist.

Since then I have wondered why I failed to recognize this as a viable solution earlier, and the conclusion was that despite completing introductory graph theory in my sixth year—a subject incredibly applicable to all sorts of different fields—I hadn't really taught myself to see problems from various perspectives. I would face this same problem when attempting to analyse the time complexity of the algorithms, but that is for another post.

Bits and Bytes

I'd learnt quite a bit over the development of these three versions of roughly the same software, and I felt it might be good to share it. It might help those who are similarly starting out on Java.

The first is on the styling of JavaFX's TableView. Those who have meddled with it will know that JavaFX offers no easy way—easy as in calling pre-existing methods—of adjusting the fonts displayed in the table cells. One has the option of overriding the table cell factory, which does offer great control in styling, but is otherwise troublesome. An alternative is to create a CSS stylesheet, and then reference that stylesheet in Java. A very simple CSS stylesheet is shown below, as an example.
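A stylesheet along these lines would do; the selectors are standard JavaFX ones, while the font choices are placeholders of mine.

```css
/* tablestyle.css: a minimal example; fonts and sizes are placeholders */
.table-view .table-cell {
    -fx-font-family: "Segoe UI";
    -fx-font-size: 12px;
}
.table-view .column-header .label {
    -fx-font-size: 12px;
    -fx-font-weight: bold;
}
```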

And the code for referencing this in Java goes something like this, where tablestyle.css is the stylesheet we have created.
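A minimal fragment, assuming the table sits in a Scene referenced by a variable named scene, and that tablestyle.css lives beside the class on the classpath:

```java
// Attach the stylesheet to the scene holding the TableView;
// "tablestyle.css" is resolved relative to this class on the classpath.
scene.getStylesheets().add(
        getClass().getResource("tablestyle.css").toExternalForm());
```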

But if your CSS styling is not too complicated, you can actually utilize CSS styling in Java directly, without needing to create a separate stylesheet. This keeps things rather clean.
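For instance, a sketch of inline styling on a single column:

```java
// Inline CSS applied directly to one column, no stylesheet file needed
breakSN.setStyle("-fx-font-family: 'Segoe UI'; -fx-font-size: 12px;");
```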

Above breakSN is a TableColumn that we wish to style. This method, unfortunately, is not able to style the headings of a TableView; it is able to style only the cell contents. To style the headings, one would have to use a separate CSS stylesheet.

Next, on the inclusion of text fields and combo-boxes within table cells. The breaks configuration pop-up panel in my third version used text fields wrapped in table cells to accept user input for the start and end times of break timings, and combo-boxes wrapped in table cells to accept input for breaks sequencing. If this were to be done in older versions of JavaFX, one would have to override a considerable amount of content in order to get these to work; one might even choose to rewrite a custom TableCell altogether. In essence, it's a lot of work.

But in JavaFX 2.2, there's built-in support for placing text fields and combo-boxes in table cells. One can use TextFieldTableCell and ComboBoxTableCell to achieve this, and remarkably, it's a single line of code for each. Below we show the initialization for the start time and end time columns, as an example.
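A sketch of what that initialization might look like, assuming a model class BreakEntry with string properties startTime and endTime (the names are mine):

```java
// Editable columns backed by an assumed BreakEntry model class
TableColumn<BreakEntry, String> startCol = new TableColumn<>("Start Time");
startCol.setCellValueFactory(new PropertyValueFactory<>("startTime"));
startCol.setCellFactory(TextFieldTableCell.forTableColumn()); // one line for editing

TableColumn<BreakEntry, String> endCol = new TableColumn<>("End Time");
endCol.setCellValueFactory(new PropertyValueFactory<>("endTime"));
endCol.setCellFactory(TextFieldTableCell.forTableColumn());
```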

Lastly, on Java 8's lambda expressions. Lambda expressions are a newly introduced feature that simplifies and shortens a lot of code, especially code that traditionally involves overriding methods or creating anonymous classes. Continuing our example on TextFieldTableCell and ComboBoxTableCell, one would have to specify what the text field or combo-box does when a user has entered fresh data; this is usually done through the setOnEditCommit method. A traditional way of doing this would read something like this.
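Such a handler, using the same hypothetical startCol column of BreakEntry rows as in the earlier discussion, might read:

```java
// Pre-lambda style: an anonymous EventHandler class;
// BreakEntry and setStartTime are assumed names
startCol.setOnEditCommit(new EventHandler<TableColumn.CellEditEvent<BreakEntry, String>>() {
    @Override
    public void handle(TableColumn.CellEditEvent<BreakEntry, String> event) {
        event.getRowValue().setStartTime(event.getNewValue());
    }
});
```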

But using lambda expressions, we can shorten the code into this:
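With a lambda, the same handler collapses to a single statement (same assumed names as before):

```java
// Lambda style: no anonymous class, no method name to remember
startCol.setOnEditCommit(event ->
        event.getRowValue().setStartTime(event.getNewValue()));
```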

We avoid creating an EventHandler, and we also spare ourselves the trouble of finding out which method to override. There are many other use cases in which lambda expressions help greatly. If you use NetBeans there's an easy way to learn the ways of utilizing lambda expressions—just write everything as you see fit, and then if a yellow notification shows up you can right-click and let NetBeans do the conversion for you.

Algorithms

I mentioned earlier that there would be an elaboration on the various scheduling algorithms implemented in the three versions of the software; here it is. We take a close look at each algorithm, and discuss a little of the performance and problems of each. There will be a follow-up post analyzing the time complexity of these algorithms.

Before diving into the algorithms, it might help to discuss some preliminaries.

Preliminaries

First we establish some notations. We denote the set of participants P, the set of judges J, and the set of venues V. For all purposes these can be considered ordered sets.

Now, on how data is stored. We employ an object-oriented approach. Each judge object stores a list of participants, each entry corresponding to the paired participant for that session. We denote the participants list of some judge A as A.P, and we denote the n-th element of that participants list A.P[n]. Similarly each participant B stores a list of judges, denoted B.J, with each entry corresponding to the paired judge for that session. If the session is a break, or if there is no allocated pairing for that session, the entry is filled with a N/A tag.

As with the usual mathematics convention, |A.P| denotes the size of A.P, equivalent to the number of assigned sessions that judge A has. We let |A.P|* indicate the number of assigned sessions of judge A that are not N/A slots—that is, the number of assigned presentations A has. Similarly, |B.J| is the number of assigned sessions of participant B, and |B.J|* is the number of assigned presentations of B. We can see that |A.P|* and |B.J|* must always be less than or equal to |A.P| and |B.J| respectively.

Next, on some theoretical limits. These limits will carry greater importance when we analyze the time complexity of the algorithms, but they are also involved in the execution of the algorithms themselves. If each participant is required to present r times, and there are at least as many participants as judges, the theoretical minimum number of sessions can be written

  ⌈ r|P| / |J| ⌉

since r|P| presentations must happen in total, and each session can host at most |J| of them.

And, as a direct result of the pigeonhole principle, the theoretical maximum number of presentations for a participant can be written |J|: a participant presenting more than |J| times would necessarily face the same judge twice.

Simple Brute-Force Algorithm

The first version uses a very simplistic randomized brute-force algorithm. It is randomized, because the allocated participant-judge pairs are chosen on a random basis to ensure fairness; and it is of a brute-force approach, for it grinds through possible pairings until a satisfactory schedule is discovered.

For each presentation session, it iterates through the list of judges, and attempts to assign a participant to the considered judge. If the considered judge-participant pair is found to be invalid—because the participant has presented to the judge before, or because the participant has already been allocated a presentation for the current session, or because the participant is done with the required number of presentations—the algorithm scraps the pairing and finds a new one on a random basis. If a judge has no eligible participants for pairing, he is assigned an N/A placeholder, which is another way of saying that he has a break for that slot.

The algorithm continues grinding through presentation sessions until the number of sessions exceeds the theoretical minimum. At this point the algorithm scraps the entire schedule and tries again from scratch. It makes sense to do it this way, because a schedule would only exceed the theoretical minimum if some sub-optimal pairing arrangements early in the schedule had caused some judges to have no eligible participants for pairing later on, consequently pushing the schedule back. An easy way of solving this is to start anew. Ideally, the entropy of all the randomization involved would land us an optimal schedule at some point.

The pseudo-code for this idea can be written as such.
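To make the idea concrete, here is a minimal runnable Java sketch of the same approach. The class and method names, the array-based bookkeeping, and the bounded retry count are all choices of convenience on my part, not the original pseudo-code:

```java
import java.util.Arrays;
import java.util.Random;

public class BruteForceScheduler {

    // schedule[session][judge] = participant index, or -1 for an N/A slot
    public static int[][] generate(int numJudges, int numParticipants, int presentationsEach) {
        Random rng = new Random();
        // theoretical minimum number of sessions, assuming |P| >= |J|
        int minSessions = (int) Math.ceil((double) presentationsEach * numParticipants / numJudges);
        while (true) {
            boolean[][] seen = new boolean[numJudges][numParticipants]; // judge j has judged p
            int[] remaining = new int[numParticipants];                 // presentations still owed
            Arrays.fill(remaining, presentationsEach);
            int[][] schedule = new int[minSessions][numJudges];

            for (int s = 0; s < minSessions; s++) {
                boolean[] busy = new boolean[numParticipants]; // already presenting this session
                for (int j = 0; j < numJudges; j++) {
                    schedule[s][j] = -1; // default: judge gets a break (N/A)
                    // grind through random picks until a valid pairing turns up
                    for (int tries = 0; tries < numParticipants * 4; tries++) {
                        int p = rng.nextInt(numParticipants);
                        if (remaining[p] > 0 && !busy[p] && !seen[j][p]) {
                            schedule[s][j] = p;
                            seen[j][p] = true;
                            busy[p] = true;
                            remaining[p]--;
                            break;
                        }
                    }
                }
            }
            // if anyone still owes a presentation, the schedule overran the
            // theoretical minimum: scrap everything and retry from scratch
            boolean complete = true;
            for (int r : remaining) {
                if (r > 0) { complete = false; break; }
            }
            if (complete) return schedule;
        }
    }
}
```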

Notice that selection of participants for pairing (lines 15 to 22) is done on the entire participant pool, without any filtering. One may think of an improvement—to pre-filter the pool before selection, so that there will be zero chance of selecting an invalid participant. The revised pseudo-code with this modification is shown below, with changes highlighted in yellow.
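In Java, the pre-filtering selection step might look something like this sketch; the helper name and the array-based bookkeeping are assumptions of mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PreFilterPick {
    // Pre-filtering variant of the inner selection step: instead of sampling the
    // whole pool and rejecting, first build the list of currently valid
    // participants for this judge, then pick uniformly from that list.
    // Returns the chosen participant index, or -1 if none is eligible (an N/A slot).
    public static int pick(Random rng, int judge, boolean[][] seen,
                           boolean[] busy, int[] remaining) {
        List<Integer> valid = new ArrayList<>();
        for (int p = 0; p < remaining.length; p++) {
            // the filtering pass itself costs O(|P|) per slot, as discussed below
            if (remaining[p] > 0 && !busy[p] && !seen[judge][p]) valid.add(p);
        }
        if (valid.isEmpty()) return -1;
        return valid.get(rng.nextInt(valid.size()));
    }
}
```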

It is a great idea in theory, for it gets rid of an entire layer of iteration, which due to its random nature may or may not terminate quickly; but in practice the computational cost of pre-filtering is too great to make this worthwhile. In particular line 14 of the pseudo-code would take at least O(|P|) time, or perhaps O(log |P|) with more advanced data structures to keep track of valid pairings; and this is not accounting for the time to check the j ∉ p.J condition. This being the innermost nested loop, an addition of even O(log |P|) is pretty significant.

Benchmarking indeed indicates that the algorithm is slower with this modification. It does seem that the sheer speed of random selection on a static list cannot be paralleled by optimizations of this nature, unless one digs much, much deeper.

Perhaps, then, we might remove participants from the pool if we find them to be invalid, thus making sure that we will not select the same participant again by chance. We argue that this might not add a significant time complexity to the innermost loop, for we might hit a valid participant by chance quickly. This modification would go something like this.
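A sketch of that remove-on-invalid selection, with the same caveats on naming; a real implementation would keep the shrinking pool across calls rather than rebuild it each time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RemoveOnInvalid {
    // Remove-on-invalid variant: sample from a working copy of the pool, and
    // delete each invalid pick so the same invalid participant cannot be drawn
    // again. Returns the chosen participant, or -1 if the pool is exhausted.
    public static int pick(Random rng, int judge, boolean[][] seen,
                           boolean[] busy, int[] remaining) {
        List<Integer> pool = new ArrayList<>();
        for (int p = 0; p < remaining.length; p++) pool.add(p);
        while (!pool.isEmpty()) {
            int i = rng.nextInt(pool.size());
            int p = pool.get(i);
            if (remaining[p] > 0 && !busy[p] && !seen[judge][p]) return p;
            pool.remove(i); // O(|P|) worst case on an ArrayList
        }
        return -1;
    }
}
```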

But again we find that the modified algorithm performs worse. We can rationalize this by arguing that the chance of repeatedly selecting the same invalid pairing is only high near the very end of a schedule, where the proportion of invalid pairings is high; everywhere else this is a negligible issue. In other words, this optimization affects too small a portion of the schedule favourably to break even with its computational costs. And its cost is not small, either—removal of an element from P may take up to O(|P|) time in the worst case, if P is stored as an array or array list; storing P as a linked list would bring this down to constant time, but at the expense of everything else that needs random access.

These findings restrict us to the very simple, naive brute-force approach whose form this algorithm ultimately took. Its naive appearance hides the surprising fact that it outperforms numerous optimization attempts in practice.

Randomized Bipartite Maximum Matching Algorithm

We can take a look at this task from another perspective—from the perspective of graph theory.

Suppose we represent participants and judges as vertices of some graph. We let edges indicate the possible valid pairings between judges and participants. Pairings must consist of one judge and one participant, and therefore there will be no edges connecting participant to participant, or judge to judge. Our graph is a bipartite graph.

The graph shown above is a complete K(|J|, |P|) bipartite graph; but in general, because not all pairings between participants and judges would be valid, due to previous allocations and so forth, the graph would not be complete. In fact, because some participants might have already finished their required number of presentations towards the later portions of the schedule, not all participants would be included in the graph.

Allocating participant-judge pairings would then become equivalent to selecting edges in the graph—in other words, finding a matching. To have an optimal schedule we would like each session to have the maximum possible number of pairings; this is equivalent to finding a maximum matching in our graph.

It may help to clarify the difference between maximal matchings and maximum matchings. A matching is maximal if, upon the addition of any other edge, it ceases to be a valid matching—some vertex somewhere would be incident to more than one matched edge. A maximum matching is a matching that contains the largest possible number of edges. And so a maximum matching will always be a maximal matching; but a maximal matching may not be a maximum one. One may understand that a maximum matching has stricter conditions than a maximal one. We reiterate that we seek a maximum matching for our purposes.

The good thing about all this is that there exist efficient algorithms to find maximum matchings in a bipartite graph. In particular, one of the fastest known is the Hopcroft-Karp algorithm. In fact there also exists an efficient algorithm to find all maximum matchings in a bipartite graph; but this is not necessary for our purposes.

Utilizing the Hopcroft-Karp algorithm, we can write the pseudo-code for schedule generation as follows.
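As a concrete companion, here is a sketch of one session's allocation posed as a bipartite matching problem. For brevity it uses Kuhn's simple augmenting-path algorithm in place of Hopcroft-Karp (both find a maximum matching; Hopcroft-Karp just does so asymptotically faster), and the adjacency-list representation is an assumption of mine:

```java
import java.util.Arrays;
import java.util.List;

public class SessionMatcher {
    // adj.get(j) lists the participants judge j may still validly be paired with.
    // Returns matchJudge, where matchJudge[j] = participant index, or -1 (N/A).
    public static int[] maxMatching(List<List<Integer>> adj, int numParticipants) {
        int numJudges = adj.size();
        int[] matchJudge = new int[numJudges];       // judge -> participant
        int[] matchPart = new int[numParticipants];  // participant -> judge
        Arrays.fill(matchJudge, -1);
        Arrays.fill(matchPart, -1);
        for (int j = 0; j < numJudges; j++) {
            tryAugment(j, adj, matchJudge, matchPart, new boolean[numParticipants]);
        }
        return matchJudge;
    }

    private static boolean tryAugment(int j, List<List<Integer>> adj,
                                      int[] matchJudge, int[] matchPart,
                                      boolean[] visited) {
        for (int p : adj.get(j)) {
            if (visited[p]) continue;
            visited[p] = true;
            // pair with p if p is free, or if p's current judge can re-match elsewhere
            if (matchPart[p] == -1
                    || tryAugment(matchPart[p], adj, matchJudge, matchPart, visited)) {
                matchJudge[j] = p;
                matchPart[p] = j;
                return true;
            }
        }
        return false;
    }
}
```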

All seems well. We have a systematic, efficient way of assigning pairings for each presentation session. The problem remains, however, that some pairings earlier in the schedule may render some participants invalid for later pairings, and this causes the schedule to stretch beyond the theoretical minimum. And so we still have to start over from scratch if it is found that the schedule is not optimal.

At this point I cannot think of a way of formulating a graph-based solution without this iterative trial-and-error. If you have any ideas, please let me know. I'd be really excited to learn of a way out.

And here comes the million-dollar question—is this maximum matching algorithm faster than the brute-force one? Not so. Again the sheer speed of random selection trumps everything else for typical use cases; but time complexity analysis reveals that this algorithm may have an upper hand if we increase the problem size by an order of magnitude or two. The same flaw plagues both of these algorithms—the need to iterate again and again to discover an optimal schedule.

Rolling Allocation with Random Swaps

This is one of those moments when you realize there's a simple, almost laughably trivial solution to the problem you'd cracked your head over, and you feel like smashing your head into a wall, or perhaps hiding in a hole and never coming out.

Again this involves a shift in perspective. Instead of attempting to generate an optimal randomized schedule all in one go, we start from an optimal non-random one and randomize it. The idea is that perhaps there's some way to generate an optimal non-random schedule, much like the way templates are used, and this would make things a whole lot easier.

And indeed, a rolling allocation scheme can be used to generate that starting point we need. It is rolling because it literally iterates through the participant pool and assigns each one to judges, preserving order by default. Out-of-order allocation is performed only when circumstances force us to, and even then we follow a very rigid and simple way of shuffling the allocation order.

A simple example can be realized with 3 judges and 6 participants. We let each participant present twice. With judges J1 to J3 and participants A to F, a simplistic in-order rolling allocation would produce the following schedule:

  Session 1: J1–A, J2–B, J3–C
  Session 2: J1–D, J2–E, J3–F
  Session 3: J1–A, J2–B, J3–C
  Session 4: J1–D, J2–E, J3–F

From the third session onwards the same participants get assigned to the same judges, and the allocation pattern repeats well before the theoretical maximum number of presentations per participant, |J| as determined in the preliminaries, is reached. This is evidently a problem when the number of judges |J| divides the number of participants |P|. In general, whenever the equation a|J| = b|P| can be solved with positive integers a and b, with a smaller than the required number of sessions, the allocation pattern realigns after a sessions and this repetition problem occurs.

Further, whenever |P| < |J|, repetition will always occur.

One way to deal with this is to cyclically shift the list of participants forwards by one, whenever a repeat is detected. That is, the participant list {A, B, C, D, E} would become {B, C, D, E, A} when shifted, which would become {C, D, E, A, B} when shifted again, and so on. This scheme solves the above-mentioned problems.

Application of this scheme would give us a complete schedule that is guaranteed to be optimal. At this point, randomizing the schedule is a trivial task—it simply involves swapping pairs of participants randomly, so long as they are valid in each other's positions. The pseudo-code for all these can be written as follows.
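Here is a compact runnable sketch of the two phases in Java, under the simplifying assumption that |J| divides r|P| so that every slot is filled; the names, the bounded shift count, and the fixed swap budget are all mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RollingScheduler {
    // schedule[session][judge] = participant index; assumes numJudges divides
    // presentationsEach * numParticipants, so no N/A slots are needed.
    public static int[][] generate(int numJudges, int numParticipants,
                                   int presentationsEach, long seed) {
        int sessions = presentationsEach * numParticipants / numJudges;
        int[][] schedule = new int[sessions][numJudges];
        boolean[][] seen = new boolean[numJudges][numParticipants];
        List<Integer> order = new ArrayList<>();
        for (int p = 0; p < numParticipants; p++) order.add(p);

        // Phase 1: in-order rolling allocation, cyclically shifting the
        // participant list forward by one whenever a repeat is detected.
        int pos = 0;
        for (int s = 0; s < sessions; s++) {
            for (int j = 0; j < numJudges; j++, pos++) {
                int candidate = order.get(pos % numParticipants);
                int shifts = 0;
                while (seen[j][candidate] && shifts++ < numParticipants) {
                    order.add(order.remove(0));              // cyclic shift by one
                    candidate = order.get(pos % numParticipants);
                }
                schedule[s][j] = candidate;
                seen[j][candidate] = true;
            }
        }

        // Phase 2: randomize the optimal base schedule by swapping pairs within
        // a judge's column, whenever both participants stay valid after the swap.
        Random rng = new Random(seed);
        for (int t = 0; t < sessions * numJudges * 10; t++) {
            int j = rng.nextInt(numJudges);
            int a = rng.nextInt(sessions), b = rng.nextInt(sessions);
            int pa = schedule[a][j], pb = schedule[b][j];
            if (a != b && !inSession(schedule, a, pb) && !inSession(schedule, b, pa)) {
                schedule[a][j] = pb;
                schedule[b][j] = pa;
            }
        }
        return schedule;
    }

    private static boolean inSession(int[][] schedule, int session, int participant) {
        for (int p : schedule[session]) if (p == participant) return true;
        return false;
    }
}
```

Swapping within one judge's column keeps that judge's set of participants unchanged, so the no-repeat-judge constraint holds automatically; only the one-presentation-per-session constraint needs checking.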

Buffer Slots

One of the requirements of the software is that the user has to be able to specify the number of buffer slots in the schedule—that is, the number of judges to be assigned N/A in each presentation session. This is to facilitate the re-arrangement of presentations, in the case that some judges cannot be present.

It is simple to implement such a feature on top of the algorithms we have devised. Adding buffer slots is analogous to removing judges from the judges pool, so that these judges are not allocated any participants. We can then allocate N/A to this buffer pool independently. For the sake of fairness, these buffer slots must be allocated to random judges; we therefore pick our buffer pool on a random basis.

For the brute-force and maximum matching algorithms, this must be done during the allocation of each session, so that the buffer slots appear random across sessions. For our rolling allocation algorithm, the same buffer pool can be used for all sessions during the generation of the non-random base schedule; the random swapping later on would randomize this sufficiently.
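Picking the random buffer pool amounts to a shuffle-and-take; a small illustrative helper, with names of my own choosing:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BufferPool {
    // Pick `buffers` distinct judges, uniformly at random, to sit out a session.
    public static List<Integer> pick(int numJudges, int buffers, Random rng) {
        List<Integer> judges = new ArrayList<>();
        for (int j = 0; j < numJudges; j++) judges.add(j);
        Collections.shuffle(judges, rng);
        return new ArrayList<>(judges.subList(0, buffers)); // copy, not a view
    }
}
```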

Concluding Remarks

I'd mentioned that all three algorithms are implemented in the third version for users to choose from; and I'd also spent quite some effort in writing up this post with all three algorithms in it. It is also quite clear that the rolling allocation algorithm outperforms the first two by orders of magnitude.

And so one may ask what's the damned point, in even remembering that the first two algorithms exist.

It's for fun and laughter and for passing time in military service, my friends.
