Auto-fix invalid XML

Maximus Nofunicus
Donator V5.0
Grumpicus's picture
Location: Piedra Redonda, Tejas

I have some "XML" that was exported from an application, except it has many unquoted attributes and unclosed nodes. Does anyone know of a utility that can fix these? I don't need a validator that can point out the problems; I already know where the 6000+ problems are. I need something like Tidy that can fix them. (Before you suggest the -xml mode of Tidy, this file makes Tidy cry, and then blow up.)

Thanks.

Maximus Nofunicus
Donator V5.0
Grumpicus's picture
Location: Piedra Redonda, Tejas

Further research reveals that it's actually an SGML file, so if anyone knows of an easy-to-use SGML-to-XML converter, that would be great.

Junior Executive
Donator V2.0
Kurrelgyre's picture
Location: The disputed territories of Cary, NC

Eclipse's Europa Java, Java EE, and RCP downloads include an XML editor, the Source menu for which has a Cleanup function that's pretty much designed for this. You'll need a 1.5 Java Runtime Environment handy to run it, though. I'm curious whether it will survive this file of yours.


Goes to 11
Donator V5.0
hubbinsd's picture
Location: The Circus of Values

I had a similar problem recently, and unfortunately I just had to brute-force script it to fill in the blanks. I'd be curious to hear if the Eclipse tool solved the problem.


Gamer Chick
Donator V2.0
Azure Chicken's picture

With some Perl-fu, there are modules on CPAN that can take Epic Fail XML and turn it into something workable.


Maximus Nofunicus
Donator V5.0
Grumpicus's picture
Location: Piedra Redonda, Tejas

It was brute force, but Search and Replace for Windows, with its Regular Expression search and Replacement Arguments support, let me do what I needed.
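For anyone landing here with the same problem, here is a rough sketch of that kind of brute-force regex pass, written in Python rather than Search and Replace for Windows. It is not the actual set of expressions used on this file. The names fix_tags, TAG_RE, ATTR_RE and the empty_elements set are all made up for illustration, and it assumes you already know which elements are supposed to be empty. It quotes bare attribute values and self-closes the listed empty elements; it does not try to work out where a missing end tag for a container element belongs, and it will be fooled by a literal ">" inside a quoted attribute value.

```python
import re

# Start tags only, e.g. <item id=42 type=book>. End tags, comments and
# declarations begin with "/" or "!" after "<", so they never match.
TAG_RE = re.compile(r"<([A-Za-z][\w.:-]*)((?:\s+[^<>]*)?)>")

# name=value inside a start tag, where value is already quoted
# (double or single) or is a bare token with no whitespace or quotes.
ATTR_RE = re.compile(r"""([\w.:-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]+)""")


def _quote_attr(match):
    """Leave quoted values alone; wrap bare values in double quotes."""
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return f"{name}={value}"
    return f'{name}="{value}"'


def fix_tags(text, empty_elements=frozenset()):
    """Quote bare attribute values and self-close known empty elements.

    empty_elements is a hypothetical, caller-supplied set of element
    names that never have content (the SGML DTD would normally say).
    """
    def _fix(match):
        name, attrs = match.group(1), match.group(2)
        attrs = ATTR_RE.sub(_quote_attr, attrs)
        # Add the closing slash unless the tag already ends with one.
        if name in empty_elements and not attrs.rstrip().endswith("/"):
            return f"<{name}{attrs.rstrip()}/>"
        return f"<{name}{attrs}>"

    return TAG_RE.sub(_fix, text)


if __name__ == "__main__":
    broken = '<item id=42 type=book><br><name lang="en">X</name></item>'
    print(fix_tags(broken, empty_elements={"br"}))
```

One caveat on the design: a "/" glued directly onto an unquoted value (as in src=a.png/>) gets swallowed into the value, so it is worth running the output through a real XML parser afterwards to see what is still broken.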

Kurrelgyre wrote:
Eclipse's Europa Java, Java EE, and RCP downloads include an XML editor, the Source menu for which has a Cleanup function that's pretty much designed for this. You'll need a 1.5 Java Runtime Environment handy to run it, though. I'm curious whether it will survive this file of yours.

I couldn't even get Europa to open on my Win64 machine. It'd throw a JRE error every time. It didn't even get a chance at the file.

Junior Executive
Donator V2.0
Kurrelgyre's picture
Location: The disputed territories of Cary, NC

Grumpicus wrote:
I couldn't even get Europa to open on my Win64 machine. It'd throw a JRE error every time. It didn't even get a chance at the file.

Pity. The 32/64-bit stuff usually means that the JRE being used doesn't match the native bits in the Eclipse install. The Europa packages on Microsoft Windows are all 32-bit. Linux users ran into the same thing with the prior year's collected release, Callisto, where only 32-bit packages were offered on Linux.

Generally the fix is to start with a Win64 download of the Eclipse SDK, which is at least at the Release Candidate stage at this point. From there it should be possible to install release candidates of everything else that will be in Ganymede. We're fewer than 3 weeks shy of the annual release.
