Last year, for some reason that I can’t recall, I only managed to take 1 day of hackweek; back then I started oletool.py. oletool.py is intended to be a cheap & nasty (zip/unzip)-like command-line tool for OLE compound documents. I started this for 2 reasons:
1) I often play with LibreOffice filters; many times I want to either extract some stream or other from a document to examine it, or quickly modify a stream and re-inject it into the document to test some code or theory.
2) LibreOffice has support for Python, I knew nothing about Python, and I wanted to learn a new scripting language ( I know I could have used libgsf for this tool - maybe something for some spare cycles ).
So, it has to be said I wasn’t starting completely from scratch: Kohei had already created mso-dumper for dumping the content of Excel documents in Python. Last year I was happy with my one day’s work; using Kohei’s mso-dumper code I managed to quickly cobble together a tool that displayed the contents of the document ( à la gsf list type format ) and could also extract one or more streams.
I have to admit that between the last Hackweek and this one I had completely forgotten whatever Python I had learned. But this time I did get around to reading the available documentation about the OLE compound document format, and started trying to add support for writing OLE documents from oletool.py. I was aiming to add support for delete ( stream or storage ), add ( stream or storage ) and update ( existing stream ). Things went a little ( ermmm much ) slower than I would have liked.
After I discovered that the byte array used in mso-dumper was immutable ( the array of bytes is really a string ), I first tried converting it to the mutable bytearray type, but that started giving many, many errors in the existing code and I admit I got scared off burning time converting that unfamiliar stuff. So instead I took the approach of trying to surgically write various pieces of the already-read document, plus new pieces, to the output document. This worked well enough, especially for delete actions and for modifying the header and directory entries, but I ran into a brick wall with more complicated scenarios of adding/modifying streams, especially when various internal tables needed to be expanded etc.
So eventually I ended up converting Ole.py to use bytearray ( this actually turned out not to be so hard at all ). I also had to rewrite my code to use the existing data model and to be able to write the document top to bottom from the model alone ( yeah, I know that makes more sense, but I tried to cheat to get stuff working faster, which turned out to be completely counterproductive ).
Another problem was that I was plagued with strange errors that looked like corruption; my progress was really stifled by that. I really missed having access to a debugger: scripting languages are great, but you can easily get lost without some nice debugging aids besides the old reliable ‘print’. In the end, after nearly 2 days of debugging, I found a strange quirk with bytearray; something like
myoutputByteArray[ 4 : 24 ] = srcStringOrArray[ 0 : 20 ]
can give quite unexpected results if srcStringOrArray is unexpectedly short; in these cases it seems a slice corresponding to the missing bytes in the source array is removed from the output array. The position the slice is removed from depends somehow on the start/end positions, so it isn’t just that some bytes are removed from the end. This simple ( and, to me, still strange ) behaviour caused complete havoc and even appeared to screw up some things in memory. I suppose my lack of Python knowledge is showing here, and to those in the know this is all obvious.
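For what it’s worth, this looks like ordinary Python slice-assignment semantics rather than a bytearray bug: slicing past the end of the source silently returns fewer bytes, and assigning to a slice replaces the whole target range with whatever is on the right-hand side, so the output array shrinks and everything after the end index shifts towards the front. Here is a minimal sketch of the effect, assuming Python 3; the buffer contents and the overwrite_exact helper are made up for illustration and are not part of oletool.py or mso-dumper.

```python
def overwrite_exact(dest, start, end, src):
    """Hypothetical helper: overwrite dest[start:end] without changing len(dest)."""
    chunk = src[0:end - start]
    if len(chunk) != end - start:
        raise ValueError("source too short: need %d bytes, got %d"
                         % (end - start, len(chunk)))
    dest[start:end] = chunk


# A 32-byte buffer whose values equal their own index, so shifts are easy to spot.
shrunk = bytearray(range(32))
short_src = bytes(10)              # only 10 bytes available, but 20 are wanted

# Slicing past the end of short_src silently yields just 10 bytes, and the
# 20-byte target range is replaced by those 10 bytes.
shrunk[4:24] = short_src[0:20]

print(len(shrunk))                 # the buffer is now 10 bytes shorter
print(shrunk[14])                  # the byte that used to sit at index 24

# The guarded version raises instead of silently resizing the buffer.
guarded = bytearray(range(32))
try:
    overwrite_exact(guarded, 4, 24, short_src)
except ValueError as err:
    print("refused:", err)
```

Checking the length of the source slice before every fixed-size write ( or padding it out to the expected size ) keeps all the later offsets in the output where they are supposed to be.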
After getting various bits working, I managed in my final commits to break things ( now mostly fixed ). So, currently things are nearly working in terms of the most basic functionality; however, the code is ugly in the extreme ( generally I like to get bits of functionality working before massaging it into some sort of shape ). There is plenty of cut’n’paste, little object-orientation and lots of debug print(s); additionally there is no testing, no consideration of Windows file paths, etc. In essence, still a lot to do ( roll on the next hackweek ).
The fruits of my labour ( currently failure ) can be seen on this branch


The screenshot shows an MSO userform with multiple levels of nested container controls, then the same userform imported in ‘vanilla’ OpenOffice, and finally how it looks now in the cws. You should notice some other nice improvements that are also included in this cws, e.g. the ‘spinbutton’ is now imported ( unfortunately this cws does not make use of the spinbutton generally available for normal OpenOffice.org Dialogs ). However, this cws does enable controls in OpenOffice Dialogs to access embedded images ( note the image control from the MSO userform has an associated image; the filter has been modified to create embedded images on import, and the Dialog controls can now handle embedded images ). Also some good news regarding the toolbar enhancements mentioned in the last
you’ll notice that of course we don’t actually import a menu on the toolbar; this is a pity and exposes a gap in OpenOffice.org functionality. The best I can do at the moment is import the menu ( which in reality is just another toolbar ) as a separate toolbar. The basic toolbar, commands and associated images import well; builtin commands are not yet supported. Unfortunately it would take quite some time to generate the appropriate translation between the Excel command-ids and the corresponding ‘uno.xxxx’ commands. While I didn’t have time to do that, I did spend some time debugging through the relevant code, so I think if I can get some time I can close some of the obvious functionality gaps e.g.





