Tag Archives: GSoC2011

Consolidating the foomatic-db option’s human readable namespace, for multi-lingual profit

Perhaps a bit verbose, but the foomatic-db option’s human readable namespace refers to all the elements in our options that are meant for the end user. The foomatic-db data set allows multiple languages to be present in human readable elements. The issue is that only one XML file in our repository uses this feature: the main proprietary Epson driver XML, with its English and Japanese comments.

Thus, for all practical purposes, foomatic-db is a monolingual data set. This leaves downstream responsible for any translation.

Downstream may have an army of translators, but they are going to have a hard time if the data set is inconsistent, ambiguous, and verbose. Cleaning that up is what I spent my remaining week of GSoC working on.

To do this I used Google Refine and a set of throw-away scripts that created a CSV pairing each XML file path with its human readable string.

With a global view of the namespace it became clear that it wasn’t as bad as I thought it would be. All Resolution options were consistently named, as were Page Size options, and others. Some things were simply misspelled or similarly spelled, though in many cases I needed to do some digging to group things. In total I brought the number of unique option choices from ~830 down to ~740, with other uncounted improvements to readability. A brief and incomplete overview of the consolidation:

  • Consolidated on the ‘Color’ spelling of Colour
  • Acronyms were expanded in some cases
  • Standardized on ‘Economy Mode’
  • Standardized on ‘Print Quality’
  • Expanded ‘600 DPI’ and the like to ‘600×600 DPI’
  • Standardized on ‘Color Mode’
  • If I noticed them I would remove redundant terms like ‘setting’*

It wasn’t a major overhaul but hopefully this will result in more complete and helpful translations for end users.

*The user is already being shown a setting dialog. Visually the presence of a toggle or checkbox communicates that something is a setting. Appending ‘setting’ to a setting is thus redundant.
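The kind of consolidation listed above can be sketched in a few lines of Perl. This is only an illustration, not the throw-away scripts or the Google Refine steps actually used: the rules, the `normalize_choice` name, and the sample rows are all hypothetical, and an ASCII `x` stands in for the `×` sign.

```perl
use strict;
use warnings;

# Hypothetical clean-up rules, loosely modelled on the list above.
sub normalize_choice {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;                      # trim leading/trailing space
    $text =~ s/\s+/ /g;                           # collapse runs of whitespace
    $text =~ s/\bColour\b/Color/g;                # settle on one spelling
    $text =~ s/^(\d+)\s*DPI$/${1}x${1} DPI/i;     # '600 DPI' -> '600x600 DPI'
    $text =~ s/\s+settings?$//i;                  # drop a redundant 'setting'
    return $text;
}

# Made-up rows in the (file path, human readable string) shape of the CSV.
my @rows = (
    [ 'opt/a.xml', '600 DPI' ],
    [ 'opt/b.xml', '600x600 DPI' ],
    [ 'opt/c.xml', 'Colour Mode' ],
    [ 'opt/d.xml', 'Color Mode Setting' ],
    [ 'opt/e.xml', 'Economy Mode' ],
);

# Group file paths by normalized string and count unique strings.
my (%before, %after);
for my $row (@rows) {
    my ($path, $text) = @$row;
    $before{$text}++;
    push @{ $after{ normalize_choice($text) } }, $path;
}

printf "%d unique strings before, %d after\n",
    scalar(keys %before), scalar(keys %after);
for my $text (sort keys %after) {
    printf "%-12s <- %s\n", $text, join(', ', @{ $after{$text} });
}
```

Grouping by the normalized string keeps the file paths attached, so each merged choice can still be traced back to the XML files that need editing.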

Final thoughts on GSoC 2011

Overall I am very happy with how my project progressed. The addition of xmlParse has reduced foomatic-db-engine by over 5500 lines of code, an 85% reduction. sqlLayer allows for near-pyppd levels of performance for non-CUPS users. Perhaps most importantly, I’ve grown as a programmer and am more familiar with our Linux printing architecture.

A design decision I made early that I am especially proud of is phonebook.pm. With it I was able to write xmlParse and sqlLayer much more abstractly than their C and PHP counterparts, which meant a substantially reduced codebase. My only regret is not making phonebook even more general; it would have been a design challenge, but I think it is possible. This might just be the second-system effect talking.

While GSoC 2011 may be over I do intend to participate throughout the school year. I have already assigned myself two feature requests[1][2] and I still have option name consistency to work on prior to the new semester. This summer’s work will ship in Foomatic 5.

sqlLayer, pushing foomatic data into the database

The second portion of my project is to write a Perl library to push foomatic’s data into a relational database. This would allow the use of SQLite instead of the XML database for foomatic-db-engine. This isn’t going to affect CUPS users (the vast majority of people), since last year’s project (Vitor’s) created pyppd, which can side-step foomatic-db-engine entirely for end users. What it does do, though, is provide considerable performance increases for users of legacy spoolers.

Like with xmlParse I am not treading new ground: OpenPrinting already has a script to import the data set. It was written two years ago for another GSoC project, as part of the OpenPrinting website redesign. A few months before this year’s GSoC Till gave me a copy of the script along with a database dump, which I was able to convert into an SQLite database. With those I’ve been able to make considerable progress. Currently I’m adding support for about one table a day.

Thinking about the project as a whole, I am rather proud of the phonebooks. By extending them to document the database schema I’ve been able to operate at a fairly high abstraction level. Whereas the C programs and the PHP import script had hundreds of lines of simple ‘if defined, assign’ code, the phonebooks let a single* loop handle all the simple renaming and processing for xmlParse. For complex types the raw data is handed to special case code to process.
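To make the idea concrete, here is a minimal sketch of a table-driven loop in that spirit. It is not the real phonebook.pm: the table layout, the group numbers other than 12, and the `translate` helper are assumptions made for illustration, and a plain hash stands in for the parsed XML nodes so the example stays self-contained.

```perl
use strict;
use warnings;

# Hypothetical phonebook: source element name => [ destination key, group ].
# Low groups are simple and shared; groups above 10 need special-case code.
my %phonebook = (
    arg_shortname => [ 'shortname', 1 ],    # plain copy
    arg_longname  => [ 'comment',   1 ],    # plain copy
    arg_order     => [ 'order',     2 ],    # numeric
    enum_vals     => [ 'vals',      12 ],   # complex, handled separately
);

# Special-case handlers, keyed by group number.
my %special = (
    12 => sub {
        my ($raw) = @_;
        return [ map { +{ idx => $_->{id}, comment => $_->{longname} } } @$raw ];
    },
);

# One loop does all the simple renaming; complex groups are dispatched.
sub translate {
    my ($source) = @_;
    my %perl_data;
    for my $name (sort keys %$source) {
        my $entry = $phonebook{$name} or next;    # unknown elements are skipped
        my ($dest, $group) = @$entry;
        my $raw = $source->{$name};
        if    ($group == 1) { $perl_data{$dest} = $raw }
        elsif ($group == 2) { $perl_data{$dest} = $raw + 0 }
        elsif (my $handler = $special{$group}) {
            $perl_data{$dest} = $handler->($raw);
        }
    }
    return \%perl_data;
}

# Stand-in for one parsed option XML.
my $option = translate({
    arg_shortname => 'Resolution',
    arg_longname  => 'Resolution',
    arg_order     => '100',
    enum_vals     => [ { id => 'ev/600', longname => '600x600 DPI' } ],
});
print join(', ', sort keys %$option), "\n";
```

Supporting another simple element then means adding a row to the table rather than another branch of code, which is where the reduction in line count comes from.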

Sample special case code for option’s complex data:
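# The specific groups (excerpt from the middle of the if/elsif chain)
} elsif ($group == 11) { # constraints
	# Complex type: hand the raw node to dedicated code.
	setConstraint($node, \$perlData{$destinationKey});

} elsif ($group == 12) { # enum_values
	# Build one hash per <enum_val> child of this node.
	foreach my $subnode ($node->findnodes("./enum_val")) {
		my %enumValue;

		# The id attribute of the enum_val becomes its index.
		foreach my $subsubnode ($subnode->findnodes('./@id[1]')) {
			$enumValue{"idx"} = $subsubnode->to_literal;
		}

		# The long name is human readable text, stored under "comment".
		foreach my $longnames ($subnode->findnodes('./ev_longname')) {
			$this->setHumanReadableText(\%enumValue,\"comment", $longnames);
		}
# (excerpt ends here)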

If I were to redo my work I would make the phonebooks document the structures of the complex data. This would allow an even further generalisation and do away with the special case code for the complex types.

That isn’t going to happen, though: the current code has been tested and is working. And while the special cases could have been done better, I do realise that a more general approach would have had a much harder time conforming to the behaviour of the C programs.

*Not necessarily a single instance of the loop. I’m a bit ashamed to admit it, but there are actually three copies of the same loop, one for each XML type. It is this way because when I created the phonebooks I made groups above 10 namespace-specific. Thus group 11 for an option XML is different from a printer’s group 11. In xmlParse this is implemented by keeping the option loop separate from the printer loop. The groups that all loops share are handled in a separate function, so really only the loop structure is copy-pasted. In sqlLayer I’ve kept the loop singular and simply added support for namespaces, support which will be made cleaner if I can think of a way.

Midway point reached, xmlParse is ready for production

Around the 20th of June I told Till I wanted to finish up combo generation by the first of July and take twenty days to integrate xmlParse (my Perl library) into DB.pm. I missed my first deadline and didn’t get to start integration until the ninth. Oddly enough, here it is the twenty-first and the last known bug in xmlParse has been squashed, so I ended up meeting my deadline anyway.

This is after testing the results of the foomatic 4 stable branch’s foomatic-compiledb script against the trunk’s xmlParse-based one. Over 6.6k flat file PPDs are exactly the same, and a few differ due to a change in how the maximum resolution of a pair is determined when a printer claims a lower resolution than the driver’s default.
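A comparison of this kind can be sketched with core Perl modules. The post does not say how the two sets of PPDs were actually compared, so the following is only one plausible approach: the `digest_tree` and `compare_trees` helpers are hypothetical, and two temporary directories with toy files stand in for the stable and trunk output.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Map each file's path (relative to $dir) to the MD5 of its contents.
sub digest_tree {
    my ($dir) = @_;
    my %digest;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        open my $fh, '<:raw', $_ or die "Cannot read $_: $!";
        $digest{ File::Spec->abs2rel($_, $dir) }
            = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
    } }, $dir);
    return \%digest;
}

# Sort relative paths into identical / different / present on one side only.
sub compare_trees {
    my ($old, $new) = map { digest_tree($_) } @_;
    my %result = (same => [], differ => [], only_one => []);
    my %seen = map { $_ => 1 } keys %$old, keys %$new;
    for my $path (sort keys %seen) {
        my $bucket
            = !(exists $old->{$path} && exists $new->{$path}) ? 'only_one'
            : $old->{$path} eq $new->{$path}                  ? 'same'
            :                                                   'differ';
        push @{ $result{$bucket} }, $path;
    }
    return \%result;
}

# Toy stand-ins for the two generated PPD trees.
my ($stable, $trunk) = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
my %files = (
    $stable => { 'a.ppd' => "body v1\n", 'b.ppd' => "body v1\n" },
    $trunk  => { 'a.ppd' => "body v1\n", 'b.ppd' => "body v2\n" },
);
for my $dir (keys %files) {
    for my $name (keys %{ $files{$dir} }) {
        open my $out, '>', File::Spec->catfile($dir, $name)
            or die "Cannot write $name: $!";
        print {$out} $files{$dir}{$name};
        close $out;
    }
}

my $report = compare_trees($stable, $trunk);
printf "%d identical, %d different, %d on one side only\n",
    map { scalar @{ $report->{$_} } } qw(same differ only_one);
```

Comparing digests rather than diffing text only answers whether two files match; the handful that differ would still need a closer look by hand.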

This isn’t success for my GSoC project yet, but it is the biggest milestone. Over 7k lines of C (not even C++) have been replaced with 1k lines of Perl and 400 lines of data (xmlParse is enterprise-y in that the XML files are mapped to the internal Perl data structure using data in the phonebook.pm lib).

Till has been a great mentor and really pushed me to get the job done right.

My first experience with profiling

During integration Till noticed that generating overviews was slow. This was the slowest operation and the only one where the new Perl library fares worse than the C, so I am not surprised that it came under inspection. Till tasked me with improving the performance. I was doubtful that improvements were possible; I was wrong. Once I profiled using NYTProf a clear bottleneck appeared: over half the run time was being spent cloning data structures. Many of these clones were unnecessary, but resulted from me having removed the conditional cloning (I thought it made the code cleaner).
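The difference conditional cloning makes can be shown with a small sketch. This is not the xmlParse code: the `get_record` accessor and its `for_write` flag are hypothetical, and Storable’s `dclone` stands in for whatever cloning routine the library really uses.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $clones = 0;    # counts deep copies so the saving is visible

# Hypothetical accessor: returns a record, deep-copying it only when the
# caller says it intends to modify the result.
sub get_record {
    my ($store, $id, %opt) = @_;
    my $record = $store->{$id};
    return $record unless $opt{for_write};
    $clones++;
    return dclone($record);
}

my %store;
$store{$_} = { id => $_, options => [ 'PageSize', 'Resolution' ] } for 1 .. 1000;

# Read-only pass, such as building an overview: no clones are needed.
my $total = 0;
$total += scalar @{ get_record(\%store, $_)->{options} } for 1 .. 1000;

# A caller that mutates asks for its own copy and leaves the store intact.
my $copy = get_record(\%store, 1, for_write => 1);
push @{ $copy->{options} }, 'Duplex';

printf "options seen: %d, clones made: %d\n", $total, $clones;
```

With unconditional cloning the read-only pass would make a thousand deep copies instead of none. A profile like the one described is typically gathered with `perl -d:NYTProf script.pl` and rendered with `nytprofhtml`.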

So that was my first experience with profiling. Altogether it went perfectly, even if only to uncover a bug of my own making.