Wednesday, September 6. 2006
Splitting a large XML file on Linux 2.4
Comments
Hi Mike,
I found this blog entry by googling a little bit. I have the same problem as described here. A 200-megabyte XML file must be converted, and there are always out-of-memory errors. Do you have any tips and tricks for how I could split the big file into a few smaller files and then convert them? Greetings
Out of memory errors are different from "file too large" errors. File too large errors indicate that PHP/Perl/etc. can't even deal with the file. The out of memory errors suggest that the XML parser needs more memory to parse the file than the operating system has available.
Depending on your XML format, splitting the file might help. My file consisted of multiple records stored in one large file. I split that file into smaller files, each containing 1000 records. It would be fairly easy to write a PHP/Perl script to copy a group of records into a smaller XML file. The Unix command csplit might work for you too, but I think you have to have it create a file for each record; there's no way to group them. Good luck!
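As a rough illustration of the record-grouping script described above, here is a minimal sketch in Python (the comment mentions PHP/Perl; Python is used here only for illustration). It assumes the records are direct children of the root element. The names `record`, `records`, and `split_records` are hypothetical placeholders, not taken from the original file. It reads the input as a stream, so only one group of records is held in memory at a time.

```python
import xml.etree.ElementTree as ET


def split_records(src, out_pattern, record_tag="record",
                  root_tag="records", chunk_size=1000):
    """Stream `src` and write groups of `chunk_size` records to numbered files.

    Assumes every <record_tag> element is a direct child of the root.
    Returns the list of file names written.
    """
    written = []
    batch = []

    def flush():
        # Wrap the current batch in a fresh root element and write it out.
        root = ET.Element(root_tag)
        root.extend(batch)
        name = out_pattern.format(len(written) + 1)
        ET.ElementTree(root).write(name, encoding="utf-8",
                                   xml_declaration=True)
        written.append(name)
        batch.clear()

    context = ET.iterparse(src, events=("start", "end"))
    _, source_root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            batch.append(elem)
            # Detach finished records from the source tree so it stays small.
            source_root.clear()
            if len(batch) >= chunk_size:
                flush()
    if batch:
        flush()  # write the final, possibly shorter, group
    return written
```

Calling something like `split_records("big.xml", "part_{:04d}.xml")` would then produce `part_0001.xml`, `part_0002.xml`, and so on, each a well-formed file with up to 1000 records. Adjust the tag names to match your format.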
A friend of mine works on XML awk. You might find it useful:
http://xgawk.radlinux.org/ http://home.vrweb.de/%7Ejuergen.kahrs/gawk/XML/
A quick note: there is a *nix utility called split,
e.g. split --bytes=2147483645 --verbose YOURFILE
You're right, but as I noted in my post, split does not take into account what it's splitting...it will split a file right in the middle of an XML record. So it didn't work for my use case.