Category Archives: GSoC

Processing Constraints, a solution

Yesterday I ran into a conundrum processing the constraints of options. I had a solution but I wasn’t happy with the cleanness of it. I slept on it and this morning I’ve got something I’m much happier with.

Instead of summering the constraints in a structure like this:

driver => option =>  printer => [sense, default]

I used a structure like:

printer=> driver => option => [sense, default]

If a constraint does not specify a printer then it goes under the ‘*’ printer hash, like wise a driver can also be ‘*’.

This also allows me to simplify retrieving the constraints by merging it with the operation that determines which options to inspect for a particular driver printer pair. This happens in the function getOptions($relations, $printerName, $driverName), which now only merges the lists of options from the three possibilities (driver, printer, and driver-printer) overwriting keys with those of more specific possibilities.

Now I can move onto equivalency testing of parseCombo and the C implementation.

Processing Constraints, a conundrum

Edit: The solution I ended up going with.

Constraints are a element in the Option XMLs and determine what options match a driver / printer combo. They like this:

'constraints' => [
			  {
				'printer' => 'printer/Lexmark-Z12',
				'sense' => 0
			  },
			  {
				'printer' => 'printer/Lexmark-Z31',
				'sense' => 0
			  },
			  {
				'printer' => 'printer/Lexmark-Z31',
				'sense' => 1
				'driver' = > 'lxm3200-tweeked'
			  }
			],

While they look simple I’ve run into a problem, how best to fit this into a data structure for easy lookup.

The printer and driver XMLs contain no references to options. Instead an option contains these constraints which specify which drivers and printer pairs apply to the option. Unfortunately generating a combo is driven based on the driver and printer data. Because of this we have to compute an in memory cache of these constraints in order to match a combo to its options (or reprocess the entire set of option xmls for each combo (which would be absurd (rather like these nested parenthesise( ( ))). The data structure of this cache is that has me troubled. I have a working structure, but I’m sure that if I was a better computer scientist I would have known something better than this:

'hpdj' => {
	'7' => {
	   'HP-DeskJet_540' => [
						   1,
						   'ev/137'
						 ],
	   '*' => [
				1,
				'ev/140'
			  ],

		 },
	'94' => {
		'*' => [
				 1,
				 '60'
			   ]
	  },
	},
'deskjet' => {
   '6' => {
		  '*' => [
				   1,
				   'ev/130'
				 ]
		},
   'GS-HalftoningAlgorithm-Gray' => {
		'*' => [
				 1,
				 'ev/GS-HalftoningAlgorithm-Gray-Standard'
			   ]
		}
	},
'printers' => {
	'HP-DeskJet_500' => {
			   '6' => {
				  '*' => [
						   1,
						   'ev/130'
						 ]
				},
		},
	},

So let it be said, even in a project as code monkey as processing XML computer science finds a way to sneek in.

Overview generation nearing completion, more complex than expected

Compared to parsing printers and drivers generating, the Overview is definitely more complex. For Printers, Drivers, and Options XMLs (the “primary” XMLs) only occasional processing was necessary. This leaves Overview to simply process the data into the overview structure. By lines of code about half of Overview is spent removing excess keys, a simple matter.

The complexity arises from the need to combine printers with drivers that support them. We get this compatibility information from two places, the printers which contain a list of drivers, and the drivers which contain a list of printers. Only one XML needs to claim compatibility to establish a relationship. Thus this relationship need not be mutual, a driver may claim to support a printer yet the printer could claim to be a brick. As well if a driver references a printer for which we have no XML we create a new in memory entry for this otherwise unknown printer.

The simplest approach would be to compute the relationships as a discrete step. This would leave you with an algo similar to this:

Build hash of Printers
Build hash of Drivers
Compute relationships
Compile based on relationships

Your Compute relationships algo could be quite naive, with hashes we get unique entries quite easily. You might end up with a structure like this:

'printer' => {'driver'=>'', 'driver' =>'', 'driver'=>''};

Having Computer Relationships be a descrete step makes conceptualization easier but nothing says it has to be desecrate. Once we know of a relationship we can act on it. This does complicate the algo as we now need to avoid mutual relationships from causing a driver getting added twice. In my implementation we do this by removing the reference to a printer in the driver if it exists.

Upto this morning my implementation had a logic defect in that it assumed a reference to a driver in the printer was a mutual relationship. Thus once it was done processing the relationships it computed from printers it assumed that any references in drivers were ONLY to printers without XMLs. Essentially I had not considered that a driver may reference a printer while the same printer may not reference the driver. With hind-sight the deficiency is obvious, unfortunately the downside of not treating Relationship Computation as a discrete step is that your relationship algo gets spread over more code, and thus harder to conceptualize.

If I were to re-implement Overview I would have Compute Relationships as a discrete step.This would have saved me about two days of debugging with only slightly increased complexity in the Combine step. In the end I am only out the debugging time and I do have a computationally more effective function and have learned to keep steps discrete in the future, a fair trade.

Progress report and a happy surprise

Since the last blog post most of my work has been less milestone oriented.

  • Full multilingual support for human readable nodes, nodes which show up in UIs
  • Preliminary support for Option XMLs
  • Two fixes for the XML schema to bring it upto date
  • Creation of a script wrapping XMLint and foomatics XML schema XSDes
  • Clean up of the OpenPrinting front page

I felt the the XML schema was important because my Perl module, the one replacing the C programs, is very forgiving. Robustness is of course a desirable trait in production software but Foomatic still needs something to enforce consistency. I knew we had XML XSDes but we were missing a simply script to run them. Once I had my new script in place two bugs in the printer schema came up. After I fixed that a rerun uncovered four XMLs that needed fixing.

When I started writing support for overviews (a massive data structure containing every printer and its drivers) I uncovered a bug, see if you can spot it:

# This variable holds the current time #
my $now = time;
my $Parser = xmlParse->new('en','0');
my @allPrinters
foreach my $xml (<../printer/*.xml>) {
   my$perlData = $Parser->parsePrinter($xml);
   print Dumper($perlData);
}

# Calculate runtime
$runtime = time - $now;

I hope you didn’t look too hard, the bug is way too easy to be obvious. This is from scaffolding.pl, a script I use to run xmlParse.pl through its paces. Its a throw away script that exists only for testing. Part of this testing is keeping an eye on runtime, and this is where the bug was.

Notice that our runtime calculation includes Dumper(A helper function that outputs Perl data structures in human readable form), when I wrote the script my assumption was that dumping the data structure was trivial. I was very wrong about this. We can read and parse an XML in one third the time it takes Dumper to print it. Every single printer XML can be read and parsed in under four seconds. Parsing every single Driver we get a runtime of 0:00, the one second resolution of the crude runtime calculator is insufficient! Fixing this bug also reduces the runtime of the C programs, but because the data structures are the same the cost of dumping is constant between the two parsing implementations. Thus the Perl implementation’s relative runtime advantage is even greater than I originally thought.

Definitely good news to report for the next OpenPrinting teleconn.