Rewrite ODS support based on loxun XMLWriter module #244
bdauvergne wants to merge 2 commits into jazzband:master
It uses constant memory and is a lot faster than odf and odf3 packages.
|
@bdauvergne, just out of curiosity, could I find the ODS writer lib (Copyright (C) 2005-2016 Entr'ouvert) on PyPI or GitHub?
|
This code is new; I produced it on my employer's (Entr'ouvert) time. It is freely inspired by this package (http://git.entrouvert.org/wcs.git/tree/wcs/qommon/ods.py), also from Entr'ouvert, which uses ElementTree and so does not have bounded memory consumption. For that you need a streaming, XmlWriter-like API.
|
Thanks for your reply. I plan to copy your code to produce a specialised ODS writer for pyexcel, as pyexcel-odsw. As you mentioned in this PR, odfpy and ezodf do not use constant memory when writing an ODS file. I hope you will be OK with my copying. For your information, messy-tables had a better-performing ODS reader, and it inspired pyexcel-odsr. So your code is the missing puzzle piece that completes the ODS story: performant writer + performant reader.
|
No problem, just keep the copyright.
        self.status = self.INSHEET
        self.xmlwriter.endTag()

    def add_cell(self, content, hint=None):
Just an observation here, not a bug or anything:
add_cell only supports unicode strings; it does not handle other cell data types such as int or float.
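For illustration, here is a minimal sketch of how typed cells could be emitted. It is not the code from this PR: it uses the standard library's `xml.sax.saxutils.XMLGenerator` as a stand-in for the loxun writer, and `write_cell` and `render_cell` are hypothetical names. The `office:value-type` and `office:value` attributes are the ones OpenDocument defines for numeric cells; the namespace prefixes are assumed to be declared on the document root.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def write_cell(gen, content):
    """Emit one table:table-cell, picking office:value-type from the Python type.

    Hypothetical helper: `gen` is any object with the XMLGenerator interface.
    """
    if isinstance(content, bool):
        # bool must be tested before int, because bool is a subclass of int.
        text = "true" if content else "false"
        attrs = {"office:value-type": "boolean", "office:boolean-value": text}
    elif isinstance(content, (int, float)):
        text = repr(content)
        attrs = {"office:value-type": "float", "office:value": text}
    else:
        text = str(content)
        attrs = {"office:value-type": "string"}
    gen.startElement("table:table-cell", attrs)
    gen.startElement("text:p", {})
    gen.characters(text)  # characters() escapes <, > and &
    gen.endElement("text:p")
    gen.endElement("table:table-cell")


def render_cell(content):
    """Render a single cell to a string, only to make the output easy to inspect."""
    out = StringIO()
    write_cell(XMLGenerator(out, encoding="utf-8"), content)
    return out.getvalue()
```

With this sketch, `render_cell(1.5)` yields a cell carrying `office:value-type="float"` and `office:value="1.5"`, while strings still go through the plain `string` branch.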
Yep, small improvements are still possible. I would make them if I heard from the maintainer that integration is likely soon.
…n from https://github.com/kennethreitz/tablib/pull/244 and adapted for pyexcel
|
@bdauvergne Why not add loxun in the |
|
I just thought it was the tablib way: it contains (contained?) so many external dependencies. I did not know they were all not packaged on PyPI.
It uses constant memory and is a lot faster than the odf and odf3 packages, as the document is not built in memory prior to serialization. OpenDocument is a simple format that should not need many thousands of lines of code and gigabytes of memory to export a simple table of tens of thousands of lines.
A temporary file is needed because zipfile does not support streaming directly into it. If that is a problem, I can do it in memory with BytesIO, at the cost of slightly higher memory consumption.
With the current implementation it's nearly impossible to export a 100,000-line table to ODS in a memory-constrained environment (a VM with 1 GB of memory).
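The approach described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: it swaps loxun for the standard library's `XMLGenerator`, writes every cell as a string, and `write_ods`, `NAMESPACES` and `MANIFEST` are hypothetical names. Rows are consumed one at a time and streamed to a temporary `content.xml`, which is then added to the zip archive.

```python
import os
import tempfile
import zipfile
from xml.sax.saxutils import XMLGenerator

MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"
NAMESPACES = {
    "xmlns:office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "xmlns:table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "xmlns:text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "office:version": "1.2",
}
MANIFEST = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<manifest:manifest'
    ' xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"'
    ' manifest:version="1.2">\n'
    ' <manifest:file-entry manifest:full-path="/"'
    ' manifest:media-type="%s"/>\n'
    ' <manifest:file-entry manifest:full-path="content.xml"'
    ' manifest:media-type="text/xml"/>\n'
    '</manifest:manifest>\n'
) % MIMETYPE


def write_ods(target, rows, sheet_name="Sheet1"):
    """Stream `rows` (any iterable of iterables) into an ODS file at `target`."""
    with tempfile.TemporaryDirectory() as tmpdir:
        content_path = os.path.join(tmpdir, "content.xml")
        with open(content_path, "w", encoding="utf-8") as fd:
            gen = XMLGenerator(fd, encoding="utf-8")
            gen.startDocument()
            gen.startElement("office:document-content", NAMESPACES)
            gen.startElement("office:body", {})
            gen.startElement("office:spreadsheet", {})
            gen.startElement("table:table", {"table:name": sheet_name})
            for row in rows:  # only one row is held in memory at a time
                gen.startElement("table:table-row", {})
                for value in row:
                    gen.startElement("table:table-cell",
                                     {"office:value-type": "string"})
                    gen.startElement("text:p", {})
                    gen.characters(str(value))
                    gen.endElement("text:p")
                    gen.endElement("table:table-cell")
                gen.endElement("table:table-row")
            gen.endElement("table:table")
            gen.endElement("office:spreadsheet")
            gen.endElement("office:body")
            gen.endElement("office:document-content")
            gen.endDocument()
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
            # The mimetype entry goes first and is stored uncompressed.
            zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
            zf.writestr("META-INF/manifest.xml", MANIFEST)
            zf.write(content_path, "content.xml")
```

Passing a generator, for example `write_ods("out.ods", (("row %d" % i, i) for i in range(100000)))`, keeps the table itself out of memory; only the temporary file on disk grows with the number of rows.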