Compared to parsing printers and drivers, generating the Overview is definitely more complex. For the Printers, Drivers, and Options XMLs (the “primary” XMLs), only occasional processing was necessary beyond parsing. That leaves Overview the job of processing the parsed data into the overview structure. By lines of code, about half of Overview is spent removing excess keys, which is a simple matter.
The complexity arises from the need to combine printers with the drivers that support them. We get this compatibility information from two places: the printers, which contain a list of drivers, and the drivers, which contain a list of printers. Only one XML needs to claim compatibility to establish a relationship. Thus the relationship need not be mutual: a driver may claim to support a printer, yet the printer could claim to be a brick. Also, if a driver references a printer for which we have no XML, we create a new in-memory entry for this otherwise unknown printer.
The simplest approach would be to compute the relationships as a discrete step. This would leave you with an algo similar to this:
1. Build hash of Printers
2. Build hash of Drivers
3. Compute relationships
4. Compile based on relationships
Your Compute Relationships algo could be quite naive; with hashes we get unique entries quite easily. You might end up with a structure like this:
'printer' => {'driver_a' => '', 'driver_b' => '', 'driver_c' => ''};
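A minimal sketch of what a discrete Compute Relationships step could look like, written in Python for illustration. The dictionary layout (`"drivers"` and `"printers"` keys) and every name here are assumptions made for the example, not the actual Overview code. A claim from either side is enough to create the relationship, and sets take care of uniqueness.

```python
from collections import defaultdict


def compute_relationships(printers, drivers):
    """Return {printer_id: set of driver_ids} from claims made by either side.

    Assumed (hypothetical) input layout:
      printers = {printer_id: {"drivers": [driver_id, ...]}}
      drivers  = {driver_id:  {"printers": [printer_id, ...]}}
    """
    relationships = defaultdict(set)

    # Claims made by the printer entries.
    for printer_id, printer in printers.items():
        relationships[printer_id]  # make sure printers with no claims still appear
        for driver_id in printer.get("drivers", []):
            relationships[printer_id].add(driver_id)

    # Claims made by the driver entries. A printer with no XML simply
    # shows up as a new key here.
    for driver_id, driver in drivers.items():
        for printer_id in driver.get("printers", []):
            relationships[printer_id].add(driver_id)

    return dict(relationships)


# Made-up sample data: printer_b claims no drivers, printer_c has no XML.
printers = {
    "printer_a": {"drivers": ["driver_x", "driver_y"]},
    "printer_b": {"drivers": []},
}
drivers = {
    "driver_x": {"printers": ["printer_a"]},
    "driver_z": {"printers": ["printer_b", "printer_c"]},
}
relationships = compute_relationships(printers, drivers)
```

Because each printer maps to a set, a mutual claim (here `printer_a` and `driver_x`) collapses into a single entry without any extra bookkeeping.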
Having Compute Relationships be a discrete step makes conceptualization easier, but nothing says it has to be discrete. Once we know of a relationship we can act on it. This does complicate the algo, as we now need to prevent mutual relationships from causing a driver to be added twice. In my implementation we do this by removing the reference to a printer in the driver if it exists.
Up to this morning my implementation had a logic defect: it assumed that a reference to a driver in the printer was always a mutual relationship. Thus, once it was done processing the relationships it computed from printers, it assumed that any references remaining in drivers were ONLY to printers without XMLs. Essentially I had not considered that a driver may reference a printer while the same printer does not reference the driver. With hindsight the deficiency is obvious. Unfortunately, the downside of not treating Relationship Computation as a discrete step is that your relationship algo gets spread over more code, and is thus harder to conceptualize.
If I were to re-implement Overview I would have Compute Relationships as a discrete step. This would have saved me about two days of debugging with only slightly increased complexity in the Combine step. In the end I am only out the debugging time, I do have a computationally more efficient function, and I have learned to keep steps discrete in the future: a fair trade.
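The non-discrete variant can be sketched roughly as follows, again in Python for illustration. This is not the actual implementation; the function name, the `"has_xml"` flag, and the data layout are all hypothetical. Printer-side claims are acted on immediately, and the matching driver-side reference is removed so the second pass cannot add the same driver again.

```python
def build_overview(printers, drivers):
    """Combine printers and drivers without a separate relationship step.

    Assumed (hypothetical) input layout:
      printers = {printer_id: {"drivers": [driver_id, ...]}}
      drivers  = {driver_id:  {"printers": [printer_id, ...]}}
    """
    # Copy the driver-side references so they can be consumed
    # without modifying the caller's data.
    pending = {d: set(info.get("printers", [])) for d, info in drivers.items()}
    overview = {}

    # Pass 1: act on each printer-side claim as soon as it is seen.
    for printer_id, printer in printers.items():
        entry = {"drivers": [], "has_xml": True}
        for driver_id in printer.get("drivers", []):
            if driver_id not in entry["drivers"]:
                entry["drivers"].append(driver_id)
            # Drop the mutual reference, if any, so pass 2 skips it.
            if driver_id in pending:
                pending[driver_id].discard(printer_id)
        overview[printer_id] = entry

    # Pass 2: what remains are driver-only claims. The printer may
    # already have an entry, or it may have no XML at all.
    for driver_id in sorted(pending):
        for printer_id in sorted(pending[driver_id]):
            entry = overview.setdefault(
                printer_id, {"drivers": [], "has_xml": False}
            )
            entry["drivers"].append(driver_id)

    return overview
```

The `setdefault` call in the second pass reuses an existing printer entry when there is one and only creates an in-memory entry for printers that have no XML.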
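The case I had overlooked can be made concrete with a small check, sketched here in Python. The helper name and data layout are hypothetical and this is not code from Overview itself. It lists every driver-side reference to a printer that does have an XML but does not list that driver in return.

```python
def driver_only_claims(printers, drivers):
    """Return sorted (driver_id, printer_id) pairs where the printer has an
    XML entry but only the driver claims the relationship.

    Assumed (hypothetical) input layout:
      printers = {printer_id: {"drivers": [driver_id, ...]}}
      drivers  = {driver_id:  {"printers": [printer_id, ...]}}
    """
    found = []
    for driver_id, driver in drivers.items():
        for printer_id in driver.get("printers", []):
            printer = printers.get(printer_id)
            # Skip printers without XML; those are the case that
            # was already handled.
            if printer is None:
                continue
            if driver_id not in printer.get("drivers", []):
                found.append((driver_id, printer_id))
    return sorted(found)


# Made-up sample data: printer_b has an XML but lists no drivers,
# printer_c has no XML at all.
printers = {
    "printer_a": {"drivers": ["driver_x"]},
    "printer_b": {"drivers": []},
}
drivers = {
    "driver_x": {"printers": ["printer_a"]},
    "driver_y": {"printers": ["printer_b", "printer_c"]},
}
one_sided = driver_only_claims(printers, drivers)
```

With this sample data, `driver_y` referencing `printer_b` is exactly the kind of relationship that must not be treated as a printer without XML.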
 
						