I decided to run a real test with a delimited text driver and sample SAP IDoc XML of around 7000 rows and 80 users. I tried removing duplicate nodes using the IDM scripting language, the XSLT Muenchian method and a plain XSLT approach, and the differences are huge. I almost laughed when I got the results.

Unfortunately I could not get the Muenchian method with XSLT keys to succeed. It seems that recycling the key indexes does not work.

Full document: https://doc.pegasi.fi/wiki/doku.php?id=xml_cleaning.
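
The actual IDoc structure is in the linked document; for the XSLT sketches further down I will assume a simplified, hypothetical shape along these lines, where the rows under a person can repeat:

<persons>
  <person id="1">
    <row><field1>A</field1><field2>X</field2></row>
    <row><field1>A</field1><field2>X</field2></row><!-- duplicate of the first row -->
    <row><field1>B</field1><field2>Y</field2></row>
  </person>
  <!-- roughly 80 persons, about 7000 rows in total -->
</persons>

The element and field names (persons, person, row, field1, field2) are placeholders, not the real IDoc segment names.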

The results:

IDM scripting language method

Roughly 21 minutes
Very simple to do
Easily debuggable with IDM tracing

Muenchian method

Using unique keys per person, template matching each person
Roughly 6-7 minutes
Did not produce correct results due to key index recycling
Unusable

Muenchian method with globally indexed keys

Using unique keys globally, template matching each person
Did not complete within an hour; I did not wait longer
Unusable
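
For reference, a minimal sketch of how the Muenchian deduplication is usually written, with the key scoped per person by concatenating the parent's generate-id() into the key value. XSLT 1.0 keys are always indexed over the whole document, which is presumably the key index recycling problem mentioned above; the concat trick sidesteps it. Element names follow the hypothetical shape shown earlier, so this is not the stylesheet that produced the timings.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index rows by their owning person plus their content. The parent's
       generate-id() is concatenated in to scope the key per person. -->
  <xsl:key name="row-by-person-and-value"
           match="row"
           use="concat(generate-id(..), '|', field1, '|', field2)"/>

  <!-- Identity copy for everything else -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keep a row only if it is the first member of its key group -->
  <xsl:template match="row">
    <xsl:if test="generate-id() =
                  generate-id(key('row-by-person-and-value',
                    concat(generate-id(..), '|', field1, '|', field2))[1])">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>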

XSLT without keys, with sorting

Template matching each person
Comparing the results to the remaining nodes
56 seconds
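
My reading of this variant, as a sketch: sort each person's rows once (here on a single field), then keep a row only if none of the remaining rows in the sorted copy has the same content. The exsl:node-set() extension and the element names are assumptions, not the exact stylesheet behind the 56-second figure.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity copy -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="person">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Pass 1: copy this person's rows in sorted order. -->
      <xsl:variable name="sorted">
        <xsl:for-each select="row">
          <xsl:sort select="field1"/>
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </xsl:variable>
      <!-- Pass 2: keep a row only if no later row in the sorted copy
           has the same content. -->
      <xsl:for-each select="exsl:node-set($sorted)/row">
        <xsl:if test="not(following-sibling::row[field1 = current()/field1
                                             and field2 = current()/field2])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Because the sort here is only on one field, exact duplicates are not guaranteed to sit next to each other, which is why every row still has to be checked against all remaining rows.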

XSLT without keys, with two axis sorting

Template matching each person
Comparing the results to the next node
3.8 seconds
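
I read "two axis sorting" as sorting on two keys, so that identical rows always end up adjacent; each row then needs only one comparison, against the immediately following row. Again a sketch with assumed element names and the exsl:node-set() extension, not the original 3.8-second stylesheet.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity copy -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="person">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Pass 1: sort the rows on two keys; duplicates are now adjacent. -->
      <xsl:variable name="sorted">
        <xsl:for-each select="row">
          <xsl:sort select="field1"/>
          <xsl:sort select="field2"/>
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </xsl:variable>
      <!-- Pass 2: emit a row unless the next row in the sorted copy has
           the same content (one comparison per row instead of comparing
           against every remaining node). -->
      <xsl:for-each select="exsl:node-set($sorted)/row">
        <xsl:variable name="next" select="following-sibling::row[1]"/>
        <xsl:if test="not($next) or
                      concat($next/field1, '|', $next/field2) !=
                      concat(field1, '|', field2)">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Dropping a row whenever the next sorted row is identical keeps the last occurrence of each duplicate run, which gives the same result as keeping the first as long as the rows really are identical.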