Auto-fix invalid XML
Monday, June 2nd, 2008 - 5:10pm
I have some "XML" that was exported from an application, except that it has many unquoted attributes and unclosed nodes. Does anyone know of a utility that can fix these? I don't need a validator that can point out the problems; I already know where the 6000+ problems are. I need something like Tidy that can actually fix them. (Before you suggest Tidy's -xml mode: this file makes Tidy cry, and then blow up.)
Thanks.
Xfire|XBL
gr.umpic.us|grumpicus.com
Commissioner, GWJFFL|GWJFFL2
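For the two problems described here (unquoted attributes and unclosed nodes), one do-it-yourself route is a lenient parser that re-serializes whatever it reads. The sketch below is a hypothetical `XmlRepairer` class built on Python 3's standard `html.parser.HTMLParser`, which tolerates unquoted attribute values. It quotes every attribute, gives valueless (minimized) attributes their own name as the value, and closes still-open elements when an enclosing element ends or the input runs out. Without the DTD it can only guess where SGML would have implied an end tag; it also lowercases element and attribute names and drops the DOCTYPE, so treat it as a starting point rather than a real converter.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class XmlRepairer(HTMLParser):
    """Hypothetical lenient re-serializer: quotes attributes, closes open elements.

    Sketch only: without the DTD it cannot know where SGML would imply an
    end tag, so an unclosed element is closed when an enclosing element
    ends (or at end of input). DOCTYPE and processing instructions are dropped.
    """

    def __init__(self, empty_tags=()):
        super().__init__(convert_charrefs=True)
        # Assumption: the caller supplies the EMPTY elements (e.g. from the DTD).
        self.empty_tags = set(empty_tags)
        self.stack = []  # names of currently open elements
        self.out = []    # pieces of the repaired document

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (SGML minimization, e.g. <opt selected>) get their
        # own name as the value; quoteattr() adds the missing quotes and escaping.
        parts = "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs
        )
        if tag in self.empty_tags:
            self.out.append(f"<{tag}{parts}/>")
        else:
            self.out.append(f"<{tag}{parts}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start tag: drop it
        # Close anything left open inside this element, then the element itself.
        while True:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, <, > in text

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def repair(self, text):
        self.feed(text)
        self.close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


if __name__ == "__main__":
    broken = "<list><item id=1>one</item><item id=2 flag>two & more</list>"
    print(XmlRepairer().repair(broken))
    # prints: <list><item id="1">one</item><item id="2" flag="flag">two &amp; more</item></list>
```

Elements that never have content can be passed through the hypothetical `empty_tags` argument so they come out as `<br/>` instead of swallowing their following siblings.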



Further research reveals that it's actually an SGML file, so if anyone knows of an easy-to-use SGML-to-XML converter, that would be great.
Xfire|XBL
gr.umpic.us|grumpicus.com
Commissioner, GWJFFL|GWJFFL2
Eclipse's Europa Java, Java EE, and RCP downloads include an XML editor whose Source menu has a Cleanup function that's pretty much designed for this. You'll need a Java 1.5 Runtime Environment handy to run it, though. I'm curious whether it will survive this file of yours.
PSN: Kurrelgyre | Raptr | Spore | Steam | Xbox Live
I had a similar problem recently, and unfortunately I just had to brute-force script it to fill in the blanks. I'd be curious to hear if the Eclipse tool solved the problem.
Xbox Live: hubbinsd
With some Perl-fu: there are modules on CPAN that can take epic-fail XML and turn it into something workable.
Mystic Violet wrote:
It was brute force, but with regular-expression search and replacement-argument support, Search and Replace for Windows let me do what I needed.
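The regular-expression brute force mentioned here can also be scripted. A minimal sketch using Python's standard `re` module, assuming no start tag contains a literal `>` inside an attribute value: it visits each start tag and wraps bare `name=value` pairs in double quotes, while skipping over values that are already quoted. The function name `quote_unquoted_attributes` is hypothetical, and the sketch does nothing about unclosed elements or valueless attributes.

```python
import re

# A start tag: "<", a letter, then anything up to the next ">".
# Assumption: no ">" appears inside an attribute value.
TAG = re.compile(r"<[A-Za-z][^<>]*>")
# Either an already-quoted value (group 1, left alone) or
# whitespace + name + "=" (group 2) followed by a bare value (group 3).
ATTR = re.compile(r"""("[^"]*"|'[^']*')|(\s[\w:.-]+\s*=\s*)([^\s"'<>=]+)""")


def _fix_attr(m):
    if m.group(1) is not None:
        return m.group(1)  # already quoted: keep as-is
    return f'{m.group(2)}"{m.group(3)}"'


def quote_unquoted_attributes(text):
    """Hypothetical helper: quote bare attribute values inside start tags only."""
    return TAG.sub(lambda tag: ATTR.sub(_fix_attr, tag.group(0)), text)


if __name__ == "__main__":
    print(quote_unquoted_attributes('<item id=1 title="p q=r" name=foo>x</item>'))
    # prints: <item id="1" title="p q=r" name="foo">x</item>
```

Matching quoted strings first in the alternation is what keeps `q=r` inside the existing `title="..."` from being rewritten.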
I couldn't even get Europa to open on my Win64 machine. It'd throw a JRE error every time. It didn't even get a chance at the file.
Xfire|XBL
gr.umpic.us|grumpicus.com
Commissioner, GWJFFL|GWJFFL2
Pity. A 32/64-bit error like that usually means the JRE being used doesn't match the native bits in the Eclipse install. The Europa packages for Microsoft Windows are all 32-bit. Linux users ran into the same thing with the prior year's collected release, Callisto, where only 32-bit packages were offered on Linux.
Generally the fix is to start with a Win64 download of the Eclipse SDK, which has at least reached the Release Candidate stage at this point. From there it should be possible to install release candidates of everything else that will be in Ganymede. We're less than three weeks away from the annual release.
PSN: Kurrelgyre | Raptr | Spore | Steam | Xbox Live
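One way to confirm the JRE/Eclipse mismatch described above is to read the machine type from each binary's PE header: `java.exe` and the native DLLs inside an Eclipse install are both Windows PE images, and the COFF file header's Machine field records whether each was built for 32-bit x86 or 64-bit x64. A minimal sketch using Python's standard `struct` module; the `pe_bitness` name is hypothetical, and the sketch only recognizes the three machine codes listed.

```python
import struct
import sys

# IMAGE_FILE_HEADER.Machine values from the PE/COFF specification.
MACHINES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)", 0x0200: "64-bit (Itanium)"}


def pe_bitness(data: bytes) -> str:
    """Hypothetical helper: report the target architecture of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # e_lfanew, at offset 0x3C, is the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the signature; its first field is Machine (uint16).
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINES.get(machine, f"unknown machine 0x{machine:04X}")


if __name__ == "__main__":
    # Usage: python pe_bitness.py <java.exe or a DLL from the Eclipse install> ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", pe_bitness(f.read()))
```

If `java.exe` reports 64-bit while the DLLs in the Eclipse install report 32-bit (or the reverse), that is the mismatch.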