
GML Complexity Re-visited

I have discussed the issue of GML complexity a number of times in this blog. Mostly we have looked at things like the number of tags, use of XML Schema, subject complexity and so forth. Most of it was pretty qualitative. We had no real measures of the complexity, nor comparisons to other established XML grammars to see how GML stacked up. Well, now some folks over at Microsoft, led by Stan Kitsis, have set about creating a number of XML Schema metrics and have applied these to a large number of schemas, GML among them. Their work used GML v3.1, which is close enough to the current release (GML v3.1.1) and the pending GML v3.2 that their results are reflective of the GML we are all working with or planning to work with. The paper is entitled "Analysis of XML Schema Usage" and begins by developing a variety of metrics for XML Schema size and complexity and for the utilization of particular XML Schema features (e.g. model-group operators, simple type features, occurrence features, subtyping and friends, mixed content, wildcards, identity constraints and modularization).

They then provide statistics on the application of these metrics to a set of 63 schema projects from different IT sectors. Some were internal to Microsoft and some were external, including of course GML. The projects comprised some 6,000 individual schema files, with roughly 82,000 global element names.

So how did GML stack up? There is not space to go over all of the findings, and I will leave that to Stan and the Microsoft folks. However, just a few items will give you the general idea.

Schema Size based on Lines of Code (LOC)

The table below shows how the 63 schemas are spread across the size categories, with GML's position marked.

 

LOC-based category    Definition           Schema count
Mini                  0 – 100              0
Small                 100 – 1,000          12
Medium                1,000 – 10,000       24
Large                 10,000 – 100,000     23 (includes GML, at 10,291 lines)
Huge                  100,000 – …          4

It is clear from this measure that GML is at the bottom end of the large schemas.
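As a rough illustration of how this banding works, here is a minimal Python sketch that maps a line count onto the categories in the table. The function name `loc_category` is my own, and treating each band as half-open (so that exactly 100 lines counts as Small rather than Mini) is an assumption, since the table does not say where boundary values fall.

```python
from bisect import bisect_right

# Upper bounds of the Mini, Small, Medium and Large bands from the table.
# Assumption: each band is half-open [low, high); the table does not say
# which band a boundary value such as 100 or 1,000 belongs to.
LOC_BOUNDS = [100, 1_000, 10_000, 100_000]
LOC_LABELS = ["Mini", "Small", "Medium", "Large", "Huge"]


def loc_category(loc: int) -> str:
    """Return the LOC-based size category for a schema with `loc` lines."""
    if loc < 0:
        raise ValueError("line count cannot be negative")
    # bisect_right counts how many upper bounds are <= loc, which is
    # exactly the index of the band that loc falls into.
    return LOC_LABELS[bisect_right(LOC_BOUNDS, loc)]
```

Calling `loc_category(10291)` returns `"Large"`, and the value sits only a few hundred lines above the 10,000-line boundary with Medium.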

Schema Size – Based on size in kilobytes.

The schemas in the study ranged from 6 Kbytes to 18 Mbytes. The largest group (26 of the 63) falls in the range of 100 Kbytes to 1 Mbyte, and this is indeed where we find GML, at 532 Kbytes. There were not many small schemas (only 6 under 10 Kbytes) and, as one might expect, not many really large schemas (only 11 in the largest size range).

Number of Complex Type Definitions:

Some people think GML is complex because it declares so many complex types – well does it?

According to the Microsoft study this metric ranged over the following:

 

#CT-based category    Definition       Schema count
Mini                  0 – 32           13
Small                 32 – 100         12
Medium                100 – 256        14
Large                 256 – 1,000      12
Huge                  1,000 – …        12

And GML? Well, 287 – so again at the bottom end of the large schemas.
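If you want to try this kind of count on your own schemas, the following is a small Python sketch using only the standard library. It is not the tooling used in the Microsoft study. In particular, counting every `xs:complexType` element, including anonymous ones nested inside element declarations, is my assumption about what the #CT metric covers, and the names `count_complex_types` and `ct_category` and the sample schema are purely illustrative.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace

# Upper bounds of the Mini, Small, Medium and Large bands from the #CT table.
# Assumption: bands are half-open [low, high).
CT_BOUNDS = [32, 100, 256, 1_000]
CT_LABELS = ["Mini", "Small", "Medium", "Large", "Huge"]


def count_complex_types(xsd_text: str) -> int:
    """Count xs:complexType elements in one schema document.

    Assumption: both named (global) and anonymous (nested) complex
    types are counted; the study may define the metric differently.
    """
    root = ET.fromstring(xsd_text)
    return sum(1 for _ in root.iter(XS + "complexType"))


def ct_category(count: int) -> str:
    """Map a complex-type count onto the categories in the table."""
    return CT_LABELS[bisect_right(CT_BOUNDS, count)]


# Tiny illustrative schema (not GML): one named and one anonymous complex type.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PointType"/>
  <xs:element name="Road">
    <xs:complexType><xs:sequence/></xs:complexType>
  </xs:element>
</xs:schema>"""

sample_count = count_complex_types(SAMPLE)
```

For a multi-file schema such as GML you would sum the counts over all of the schema documents before calling `ct_category`.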
