Initial code for version-aware schema validation

When I wrote my initial post on my new approach to XSD versioning, I promised that I'd post code. Here's the first cut:

using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class Program
{
  static void OriginalMain(string[] args)
  {
    // args[0] is the XML instance document, args[1] is the schema to validate against
    FileStream xml = new FileStream(args[0], FileMode.Open);
    FileStream xsd = new FileStream(args[1], FileMode.Open);

    XmlReader reader = null;
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.ValidationFlags = XmlSchemaValidationFlags.None; //ReportValidationWarnings;
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(XmlSchema.Read(xsd, null));
    settings.Schemas.Compile();

    // wire-up anonymous callback delegate; it shares these two flags with the read loop below
    int badDepth = -1;
    bool nextNodeInvalid = false;
    settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs ea)
    {
      Console.WriteLine("Event -- {0}: {1}", ea.Severity, ea.Message);
      Console.WriteLine("Event -- {0}\t{1}\t{2}\t{3}\t{4}", reader.NodeType, reader.Name, reader.Value, reader.SchemaInfo == null ? "<none>" : reader.SchemaInfo.Validity.ToString(), reader.Depth);

      if (reader.NodeType == XmlNodeType.Element &&
          reader.SchemaInfo.Validity == XmlSchemaValidity.NotKnown &&
          !nextNodeInvalid)
      {
        // error on a start tag: unexpected content begins here, so remember its depth
        Console.WriteLine("Filtering out unexpected stuff now...");
        nextNodeInvalid = true;
        badDepth = reader.Depth;
      }
      else if (reader.NodeType == XmlNodeType.EndElement &&
               reader.SchemaInfo.Validity == XmlSchemaValidity.Valid)
      {
        // error on an end tag: expected content is absent, which is ignored
        Console.WriteLine("Other stuff expected, ignoring...");
      }
    };

    reader = XmlReader.Create(xml, settings);
    while (reader.Read())
    {
      if (nextNodeInvalid)
      {
        // consume nodes until the reader is back at the parent of the unexpected element
        int targetDepth = badDepth - 1;
        while (reader.Depth > targetDepth)
        {
          reader.Read();
          Console.WriteLine("Filtering -- {0}\t{1}\t{2}\t{3}\t{4}", reader.NodeType, reader.Name, reader.Value, reader.SchemaInfo == null ? "<none>" : reader.SchemaInfo.Validity.ToString(), reader.Depth);
        }
        nextNodeInvalid = false;
        badDepth = -1;
      }
      Console.WriteLine("Main -- {0}\t{1}\t{2}\t{3}\t{4}", reader.NodeType, reader.Name, reader.Value, reader.SchemaInfo == null ? "<none>" : reader.SchemaInfo.Validity.ToString(), reader.Depth);
    }
  }
}

What this code does is pretty simple. It catches XSD validation errors, and if one occurs at the start tag of an element, which indicates that unexpected content is present, it ignores that error and any others that occur at that depth in the document. This version also happens to catch XSD errors that occur on the close tag of an element, which indicates that expected content is absent. With optional extensions this is never a problem, but it was interesting to make it work with extensions that aren't marked minOccurs="0".

When the code detects unexpected content, it reports that content as “filtered”, meaning that it could be removed from the XML stream and the resulting document could be assumed to be valid against your current schema (you'd really want to do another validation pass to be absolutely sure, but I wouldn't bother in general).

My next step is to wrap this in an XmlReader implementation so that it can be piped into a serializer more simply. It needs its own reader because it relies on control of the Read loop to filter stuff out. I'll post that when it's working correctly.

BTW, one thing to note about this code is that it only works with the sequence compositor. With the all compositor (xsd:all), where elements may appear in any order, you can't assume that an unknown element is the beginning of content from a later version. I don't have any problem with this limitation, but it has to be mentioned.
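The Read loop in the code above keeps reading until the depth drops below that of the unexpected element, so it also consumes anything that follows that element at the same depth. One possible way to confine the filtering to the unexpected element's own subtree is XmlReader.Skip(). The sketch below is not the code from this post: it uses a plain, non-validating reader, and a hypothetical set of element names (unknownNames) stands in for the validation event that would normally flag an element. SubtreeSkipSketch and KeptElements are illustrative names as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SubtreeSkipSketch
{
    // Returns the names of the elements that survive filtering, in document order.
    // ASSUMPTION: unknownNames is a stand-in for "the validator flagged this element".
    public static List<string> KeptElements(string xml, ICollection<string> unknownNames)
    {
        List<string> kept = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    unknownNames.Contains(reader.Name))
                {
                    // Skip() consumes this element (empty or not) together with its
                    // subtree and leaves the reader on the node that follows it,
                    // so no extra Read() is needed before the next iteration.
                    reader.Skip();
                    more = !reader.EOF;
                    continue;
                }

                if (reader.NodeType == XmlNodeType.Element)
                {
                    kept.Add(reader.Name);
                }
                more = reader.Read();
            }
        }
        return kept;
    }
}
```

For example, calling SubtreeSkipSketch.KeptElements with the document <root><newElement><child/></newElement><existingElement>x</existingElement><myNewNode/></root> and the names newElement and myNewNode returns root and existingElement: the known sibling after the unknown element is kept, and the empty unknown element is dropped. Whether Skip() behaves as cleanly on a validating reader that has just reported an error is something I'd verify before relying on it.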


Posted Apr 25 2006, 02:22 PM by tim-ewald

Comments

doc wrote re: Initial code for version-aware schema validation
on 09-25-2006 11:14 PM
*Nudge* Just wondering whether you ever wrapped this up into an XmlReader implementation. I have been playing around with this but I am not sure how to plug it all together given that in .NET 2.0 you are meant to create reader instances using the XmlReader.Create static function.

doc wrote re: Initial code for version-aware schema validation
on 09-26-2006 10:45 PM
I figured it out using this post:

http://www.tkachenko.com/blog/archives/000585.html

Note to others, there are a couple of bugs I have found in the above:

1. Doesn't properly handle ignoring new elements that are leaf nodes, e.g. <myNewNode/>

2. Will incorrectly ignore known elements if there is a new element above it at the same level, e.g.

<root>
<newElement>
...
</newElement>
<existingElement>
...
</existingElement>
</root>

The element "existingElement" gets ignored as well.

To fix the logic you need to look for a changing element name when you are at the "bad depth".

Ken Brubaker wrote Tim Ewald's solution for XML Schema versioning
on 11-28-2006 7:55 AM
Tim Ewald addresses the XML Schema versioning issue head on.
