In a first-order scheme, subcycling can make the code run faster. Let the operator L be split as L = A + B. Then ABABAB... is equivalent to AAAAA...BBBBB... to first order, i.e., the order in which A and B are applied does not matter.
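As a rough justification for a single split step (a sketch, assuming A and B are linear operators and h is the time step; the symbols h and I are introduced here for illustration), expanding the exponentials shows that the two orderings differ only at second order in h:

```latex
\begin{aligned}
e^{hA} e^{hB} &= I + h(A+B) + h^2\left(\tfrac{1}{2}A^2 + AB + \tfrac{1}{2}B^2\right) + O(h^3), \\
e^{hB} e^{hA} &= I + h(A+B) + h^2\left(\tfrac{1}{2}A^2 + BA + \tfrac{1}{2}B^2\right) + O(h^3), \\
e^{hA} e^{hB} - e^{hB} e^{hA} &= h^2\,[A,B] + O(h^3), \qquad [A,B] = AB - BA .
\end{aligned}
```

Both orderings agree with each other through the terms linear in h, which is all that a first-order scheme retains.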
If operator B imposes a smaller dt because of the Courant condition, we can subcycle operator B. A rough algorithm follows.
1. Calculate dtA and dtB according to the Courant condition. Let dtB < dtA.
2. Calculate Nsub = min[int(dtA/dtB), Nsubmax]. Set dtA = Nsub*dtB.
3. Advance the solution:
a. Apply operator A using dtA.
b. Apply operator B using dtB, Nsub times in a loop.
4. Repeat until time >= tlimit.
Be careful to update all quantities after each step for operator splitting to be consistent.
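The steps above can be sketched in Python as follows. This is only a minimal illustration, not the actual code: the names subcycle_step, run, apply_a, apply_b, courant_a and courant_b are hypothetical, and the state is assumed to be returned fresh by each operator so that every step sees updated quantities. The sketch also assumes dtB <= dtA and clamps dtB if that is not the case.

```python
def subcycle_step(state, apply_a, apply_b, dt_a, dt_b, nsub_max):
    """Advance one cycle: A once with dt_a, then B nsub times with dt_b."""
    dt_b = min(dt_b, dt_a)                      # assumption: B is the restrictive operator
    nsub = max(1, min(int(dt_a / dt_b), nsub_max))
    dt_a = nsub * dt_b                          # A covers exactly nsub steps of B
    state = apply_a(state, dt_a)                # step 3a
    for _ in range(nsub):                       # step 3b
        state = apply_b(state, dt_b)
    return state, dt_a


def run(state, apply_a, apply_b, courant_a, courant_b, t_limit, nsub_max=10):
    """Repeat cycles until the simulation time reaches t_limit (step 4)."""
    t = 0.0
    while t < t_limit:
        dt_a = courant_a(state)                 # step 1: time steps from the current state
        dt_b = courant_b(state)
        state, dt = subcycle_step(state, apply_a, apply_b, dt_a, dt_b, nsub_max)
        t += dt
    return state, t
```

Here courant_a and courant_b stand for whatever routines compute the Courant-limited time steps from the current state. Recomputing them at the top of every cycle is one way to keep the splitting consistent.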

Scaling of speedup with Nsub

Let 'dtcond' and 'dtnocond' be the processing times for the conduction step, which is subcycled, and for the rest of the steps, which are not subcycled. These times depend on the number of floating-point operations done in one cycle of each.
Define speedup by
speedup = (time taken without optimization)/(time taken with optimization)
Let Nsub denote the ratio of the Courant time steps for the nonconductive and conductive parts, Nsub = Tnocond/Tcond. Note that these simulation time steps are different from the processing times 'dtcond' and 'dtnocond'. Let the conduction time step be the more restrictive one, i.e., Tcond <= Tnocond, so that Nsub >= 1.
To advance the solution by one nonconductive time step Tnocond:
Processing time taken without subcycling = Nsub*(dtnocond + dtcond)
Processing time taken with subcycling = dtnocond + Nsub*dtcond
Speedup = Nsub*(dtnocond + dtcond)/(dtnocond + Nsub*dtcond)
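We are usually dealing with the case dtnocond >> dtcond. Shown below is speedup vs. Nsub (which depends linearly on conductivity) for the case dtnocond/dtcond = 10. The speedup grows with Nsub and saturates at 1 + dtnocond/dtcond (11 in this case) as Nsub becomes large.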
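The speedup formula can also be evaluated numerically. The short Python sketch below tabulates it for a few values of Nsub; the function name speedup and the argument names t_nocond and t_cond (standing for the processing times 'dtnocond' and 'dtcond') are chosen here for illustration only.

```python
def speedup(nsub, t_nocond, t_cond):
    """Speedup = Nsub*(dtnocond + dtcond) / (dtnocond + Nsub*dtcond)."""
    return nsub * (t_nocond + t_cond) / (t_nocond + nsub * t_cond)


if __name__ == "__main__":
    # Case dtnocond/dtcond = 10, as in the text.
    for nsub in (1, 2, 5, 10, 100, 1000):
        print(f"Nsub = {nsub:5d}   speedup = {speedup(nsub, 10.0, 1.0):.3f}")
```

With Nsub = 1 there is no gain (speedup = 1), and with Nsub = 10 the speedup is 5.5 for this ratio of processing times.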