Dynamic Programming Series #6: Maximum Profit in Job Scheduling

Today's problem is definitely challenging, although it is quite a classic problem that you might have seen in your algorithms courses. For those of you who have seen this before, this will serve as a great comprehensive review, and for those who haven't seen this before, take your time to digest everything you'll learn here.

First, let's take a look at the problem.

We have n jobs, where every job is scheduled to be done from startTime[i] to endTime[i], obtaining a profit of profit[i].

You're given the startTime , endTime and profit arrays, you need to output the maximum profit you can take such that there are no 2 jobs in the subset with overlapping time range.

If you choose a job that ends at time x you will be able to start another job that starts at time x.

Example 1:

Input: startTime = [1,2,3,3], endTime = [3,4,5,6], profit = [50,10,40,70]

Output: 120

Explanation: The subset chosen is the first and fourth job.

Time range [1-3]+[3-6] , we get profit of 120 = 50 + 70.

Example 2:

Input: startTime = [1,2,3,4,6], endTime = [3,5,10,6,9], profit = [20,20,100,70,60]

Output: 150

Explanation: The subset chosen is the first, fourth and fifth job.

Profit obtained 150 = 20 + 70 + 60.

Example 3:

Input: startTime = [1,1,1], endTime = [2,3,4], profit = [5,6,4]

Output: 6

Before we get into the solving this problem, let's consider a simpler version of this problem. Let's suppose that the profit for all jobs are equal. Essentially, what this means is that completing as much jobs as possible will be our optimal solution (it really doesn't matter what job you take as long as it is a job). Doesn't this sound familiar? Bingo! This sounds like a greedy algorithm type of problem.

Our greedy algorithm template:

Consider jobs is some order
Take each job if it doesn't conflict with the ones already taken

The key here is to determine the order we want to consider the jobs. Some orders we can consider...

Shortest Interval: ascending order of (finish - start) times
Fewest Conflicts: ascending order of Ci, where Ci is the number of remaining jobs that conflict with job i
Earliest Start Time (EST): ascending order of start times
Earliest Finish Time (EFT): ascending order of finish times

Let's consider each ordering and try to think of some counterexamples for them (i.e., why the order doesn't work).

Shortest Interval

shortest interval.PNG

Fewest Conflicts

fewest conflict.PNG

Earliest Start Time

The blue bar indicates the job that the greedy algorithm will choose and for all the three cases, the solution is not optimal.

Finalizing our greedy algorithm with EFT

Sort jobs in ascending order of finishing times
When deciding whether job i should be included, check whether it conflicts with all previously added jobs by checking if start time of job i >= finish time of job i - 1
If it doesn't conflict, then we can add it

Now, let's try to apply this greedy algorithm to our problem of maximizing profits in job scheduling when profits for each job are not the same. You'll realize that it will fail miserably. This is because now, not only is the number of jobs important, but also how much profit you make from the job. Which makes this a weighted interval scheduling problem, where each job is weighed differently such that the job with a higher profit is weighed higher than the job with a lower profit.

So what did we take away from the simple unweighted interval scheduling problem? Some nice insights but most importantly, the ordering that matters to us.

When jobs are unweighted, we can simply sort jobs in ascending order of finishing times and add the job if it doesn't conflict with previously added jobs. But in the weighted version, we should add if it doesn't conflict with previously added jobs AND increases our current profit. In other words, we want to take the max of (current profit + job i's profit, current profit). Using this idea, let's try to implement this in Python.

First, let's create a job class such that each instance of a job has a start time, an end time and a profit.

class Job:
    def __init__(self, start, finish, profit):
        self.start = start
        self.finish = finish
        self.profit = profit

Next, from our input arrays, let's create job instances and sort them based on their ending times like so:

def solution(startTime, endTime, profit) -> int:
    jobs = []
    for i in range(len(profit)):
        jobs.append(Job(startTime[i], endTime[i], profit[i]))

    # sort jobs in increasing order of their finish times
    jobs.sort(key=lambda x: x.finish)

After that, doing our usual bottom-up approach:

    # initialize max profit table
    maxProfit = [0 for i in range(len(jobs))]

    maxProfit[0] = jobs[0].profit

    for i in range(1, len(jobs)):
        # find the index of last non-conflicting job with current job
        # j = -1 if no index could be found
        j = get_last_non_conflicting_job(jobs, i)

        # include the current job with its non-conflicting job
        profit = jobs[i].profit
        if j != -1:
            profit += maxProfit[j]

        # store the maximum profit by including or excluding the current job
        maxProfit[i] = max(profit, maxProfit[i-1])
    return maxProfit[len(jobs) - 1]

To get our last non-conflicting job, we can do a simple linear search.

def get_last_non_conflicting_job(jobs, n):
    for i in reversed(range(n)):
        if jobs[i].finish <= jobs[n].start:
            return i
    return -1  # if no non-conflicting job is found

Although, our linear search can be optimized further using a binary search instead.

def get_last_non_conflicting_job(jobs, n):
    low = 0
    high = n
    while low <= high:
        mid = (low + high) // 2
        if jobs[mid].finish <= jobs[n].start:
            if jobs[mid + 1].finish <= jobs[n].start:
                low = mid + 1
            else:
                return mid
        else:
            high = mid - 1

    return -1  # if no non-conflicting job is found

And we're done! I think the hardest part of this problem was realizing that ordering matters and determining the order in which we should iterate the jobs through, and that's why I focused on that. Besides that, it really is just another dynamic programming problem.