
.NET, XML, SQL and Doing Things as Time Allows

Tuesday, November 28, 2006

In our previous blog article Processing XML with PowerShell we looked at using XPath expressions to do calculations that used XML as input. One of the things that this article pointed out was that you often can do an entire calculation within an XPath expression. Sometimes, however, you want to read the XML file, pull parts out of it and process them outside of the XML file. One of the ways of doing this is to use the dotted syntax and the Item ParameterizedProperty features of PowerShell so that you can treat XML as an object.

We will start by using the PowerShell object model for XML to pull apart an XML file so that we can process it. Then we will look at an extension function, aliased as xitems, that does the same thing, but typically with less effort and more capabilities. Lastly we will look at how the xitems function is implemented. You can download the script that defines this function, along with the sample files, at http://www.pluralsight.com/dan/samples/ProcessingXMLPowershell-2.zip. Note that the download includes the XSLT.ps1 script, updated for the xitems function, that was included in the ProcessingXMLPowerShell.zip file.

We will start by looking at a file named Stock.xml. The Stock.xml file looks like:

<GroceryList ID="A-24">
    <Stock>
        <Dept>
            <Area>Round</Area>
            <Name>Produce</Name>
        </Dept>
        <Name>Orange</Name>
        <Price>3.41</Price>
    </Stock>
    <Stock>
        <Dept>
            <Area>Beef</Area>
            <Name>Meat</Name>
        </Dept>
        <Name>Steak</Name>
        <Price>13.20</Price>
    </Stock>
    <Stock>
        <Dept>
            <Area>Leaf</Area>
            <Name>Produce</Name>
        </Dept>
        <Name>Lettuce</Name>
        <Price>1.36</Price>
    </Stock>
</GroceryList>

Note that each Dept element includes both an Area and a Name element. We want to find out the names of the departments in the Stock.xml file. We can do this by piping all of the department names into the select-object cmdlet with its -unique switch. Here is a script that does just that.

PS C:\Demos> [xml]$s = get-content C:\Demos\Stock.xml
PS C:\Demos> $s.GroceryList.Stock | %{$_.Dept.Name} | select-object -unique
Produce
Meat
PS C:\Demos>

Here we are using the dotted syntax to extract each Stock element from the Stock.xml file and pipe it into the following pipeline segment. In that second pipeline segment we use the dotted syntax again to extract the Name from the Dept element of the current pipeline object, which in this case is a Stock element. Then we pipe all the names we have found into the select-object cmdlet, which makes a unique list of those names.
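If you want to try this pipeline without the sample download, the document can be built inline. This is only a minimal sketch: it assumes nothing beyond the Stock.xml content shown above, and the variable names are ours.

```powershell
# Inline copy of Stock.xml so the example runs without the sample files.
[xml]$s = @'
<GroceryList ID="A-24">
  <Stock>
    <Dept><Area>Round</Area><Name>Produce</Name></Dept>
    <Name>Orange</Name><Price>3.41</Price>
  </Stock>
  <Stock>
    <Dept><Area>Beef</Area><Name>Meat</Name></Dept>
    <Name>Steak</Name><Price>13.20</Price>
  </Stock>
  <Stock>
    <Dept><Area>Leaf</Area><Name>Produce</Name></Dept>
    <Name>Lettuce</Name><Price>1.36</Price>
  </Stock>
</GroceryList>
'@

# Segment 1 emits each Stock element; segment 2 reads Dept.Name from it;
# segment 3 removes the duplicate department name.
$deptNames = $s.GroceryList.Stock |
    ForEach-Object { $_.Dept.Name } |
    Select-Object -Unique

$deptNames   # Produce, Meat
```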

It’s easy to come up with a rather wordy description of what this script is doing. It is saying something like “Give me all of the elements named GroceryList at the root of the $s XML document; then for each one of these get me all of its children whose name is Stock; then for each one of these get me all of its children whose name is Dept; then for each one of these get me all of its children whose name is Name.” This sort of description could be applied to almost anything that manages hierarchical data, including XPath, which we will be looking at later when we examine the xitems function.

Thinking of this hierarchical description makes it seem that the script below would be an alternate, simpler way to find the names of the departments.

PS C:\Demos> [xml]$s = get-content C:\Demos\Stock.xml
PS C:\Demos> $s.GroceryList.Stock.Dept.Name | select-object -unique
PS C:\Demos>

This seemingly obvious way of getting the children of the children, and so on, did not produce any results. The dotted syntax is somewhat limited because of the way PowerShell models XML. If we take a closer look at what is actually being returned for the GroceryList and Stock elements we will see why.

PS C:\Demos> $s.GroceryList.GetType()
IsPublic IsSerial Name       BaseType
-------- -------- ----       --------
True     False    XmlElement System.Xml.XmlLinkedNode
PS C:\Demos> $s.GroceryList.Stock.GetType()
IsPublic IsSerial Name       BaseType
-------- -------- ----       --------
True     True     Object[]   System.Array
PS C:\Demos>

The type of GroceryList is XmlElement, as you might suspect; after all, we are working with XML. However, the type of Stock is Array, not XmlElement or XmlElement[], and that is why it lacks a Dept property. Now let’s look at what happens when, instead of trying to access the Dept element from Stock, we pipe the array into another pipeline segment and see what the type is.

PS C:\Demos> $s.GroceryList.Stock | %{$_.GetType()}
IsPublic IsSerial Name        BaseType
-------- -------- ----        --------
True     False    XmlElement  System.Xml.XmlLinkedNode
True     False    XmlElement  System.Xml.XmlLinkedNode
True     False    XmlElement  System.Xml.XmlLinkedNode
PS C:\Demos>

It turns out that the Stock array is an array of XmlElements and piping it into a pipeline segment enumerates that array. The result is that inside the pipeline segment $_ is an XmlElement that has a Dept child element.
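The same checks can be written as type tests rather than read off GetType() output. The sketch below uses small throwaway documents of our own, not the sample files. It also shows that a child element that occurs only once comes back as a single XmlElement rather than an array, so the shape of the result depends on the document.

```powershell
# Two Stock children: the dotted syntax returns an array of XmlElements.
[xml]$s = '<GroceryList><Stock><Name>Orange</Name></Stock><Stock><Name>Steak</Name></Stock></GroceryList>'
$stock = $s.GroceryList.Stock

$isArray      = $stock -is [array]                     # True
$firstIsXml   = $stock[0] -is [System.Xml.XmlElement]  # True
$elementCount = $stock.Count                           # 2

# One Stock child: the same expression returns the XmlElement itself.
[xml]$one = '<GroceryList><Stock><Name>Orange</Name></Stock></GroceryList>'
$singleIsXml = $one.GroceryList.Stock -is [System.Xml.XmlElement]   # True
```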

So to drill into an XML hierarchy using the PowerShell object model you must break up what seems like a natural dotted syntax into pipeline segments, one pipeline segment for every two levels of depth you want to go into the XML hierarchy. The wordy description given earlier of what is happening here, however, is in effect the definition of an XPath construct called a LocationPath.

We used XPath expressions in Processing XML with PowerShell to process an XML file. XPath is really a simple language: an expression can only produce one of four datatypes, namely a number, a string, a boolean, or a node set. We were using the first three, the scalar types, to do calculations with the xeval function.
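The four result types can be seen directly with the standard .NET XPathNavigator.Evaluate method. This is a sketch that does not use xeval; the small inline document and the variable names are ours.

```powershell
[xml]$s = '<GroceryList><Stock><Name>Orange</Name><Price>3.41</Price></Stock><Stock><Name>Steak</Name><Price>13.20</Price></Stock></GroceryList>'
$nav = $s.CreateNavigator()

# Each expression is evaluated with the document root as the context node.
$number  = $nav.Evaluate('count(GroceryList/Stock)')        # number, returned as System.Double
$string  = $nav.Evaluate('string(GroceryList/Stock/Name)')  # string value of the first Name in document order
$boolean = $nav.Evaluate('count(GroceryList/Stock) > 1')    # boolean
$nodeSet = $nav.Evaluate('GroceryList/Stock/Name')          # node set, returned as an XPathNodeIterator
```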

A node set is what the name seems to imply: it is a set of XML nodes. It might be a set of XML elements or attributes, or a mixture of these and other kinds of XML nodes. The data model in the XPath Recommendation defines seven kinds of nodes that might be found in an XML document. A LocationPath is an XPath expression that produces a node set instead of a scalar value. It is given a special name because it is such a common idiom to use XPath to produce a node set. Here is a LocationPath that will produce a node set that consists of all of the Dept Name elements from the Stock.xml file:

GroceryList/Stock/Dept/Name

We can use this LocationPath with the xitems function to find the names of the departments in the stock.xml file, as we did at the beginning of this article.

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock/Dept/Name" |
 select-object -property value -unique
Value
-----
Produce
Meat
PS C:\Demos>

The xitems function requires at least two parameters. The first is the path to the XML file we want to process or, as we will see later, an XPathNavigator. The second is an XPath LocationPath. You can see that we are able to use a simpler model to specify the parts of the XML file we wish to process.
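If you do not have the xitems script loaded, a similar selection can be sketched with the standard XmlDocument.SelectNodes method. This is a stand-in for illustration, not how xitems itself works; the inline document mirrors Stock.xml.

```powershell
[xml]$s = @'
<GroceryList ID="A-24">
  <Stock><Dept><Area>Round</Area><Name>Produce</Name></Dept><Name>Orange</Name><Price>3.41</Price></Stock>
  <Stock><Dept><Area>Beef</Area><Name>Meat</Name></Dept><Name>Steak</Name><Price>13.20</Price></Stock>
  <Stock><Dept><Area>Leaf</Area><Name>Produce</Name></Dept><Name>Lettuce</Name><Price>1.36</Price></Stock>
</GroceryList>
'@

# One LocationPath walks all four levels; no intermediate pipeline segments are needed.
$deptNames = $s.SelectNodes('GroceryList/Stock/Dept/Name') |
    ForEach-Object { $_.InnerText } |
    Select-Object -Unique

$deptNames   # Produce, Meat
```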

In the Processing XML with PowerShell blog article we looked at processing XML files that contained namespaces and some of the issues you can run into when using the PowerShell object model of XML. The xitems function uses the same technique that xeval did: you pass it a dictionary that contains the mapping of prefixes to namespaces. Here is another file, stockNS.xml, that uses namespaces.

<GroceryList xmlns='urn:prices'
             xmlns:loc='urn:location'
             xmlns:ident='urn:identity'
             ID="A-24">
    <Stock xmlns="urn:inventory">
        <loc:Dept>3rd floor</loc:Dept>
        <ident:Dept>
            <Area>Round</Area>
            <Name>Produce</Name>
        </ident:Dept>
        <Name>Orange</Name>
        <Price>3.41</Price>
    </Stock>
    <Stock xmlns="urn:inventory">
        <loc:Dept>2nd floor</loc:Dept>
        <ident:Dept>
            <Area>Beef</Area>
            <Name>Meat</Name>
        </ident:Dept>
        <Name>Steak</Name>
        <Price>13.20</Price>
    </Stock>
    <Stock xmlns="urn:inventory">
        <loc:Dept>3rd floor</loc:Dept>
        <ident:Dept>
            <Area>Leaf</Area>
            <Name>Produce</Name>
        </ident:Dept>
        <Name>Lettuce</Name>
        <Price>1.36</Price>
    </Stock>
</GroceryList>

This file uses quite a few namespaces, just to make things interesting and because, when namespaces are used at all, they tend to be used a lot. Let’s do our department names calculation again, using the PowerShell XML object model.

PS C:\Demos> [xml]$s = get-content C:\Demos\StockNS.xml
PS C:\Demos> $s.grocerylist.stock |
  %{$_.Item("Dept", "urn:identity").Name} | select-object -unique
Produce
Meat
PS C:\Demos>

Here we have used the Item property that PowerShell adds to an XML element to make it possible to access elements that are distinguished by their namespace. Here is a script that uses xitems to get the same results:

PS C:\Demos> xitems C:\Demos\StockNS.xml `
  "p:GroceryList/i:Stock/id:Dept/i:Name" `
  @{p="urn:prices";i="urn:inventory";id="urn:identity"} |
  select-object -property value -unique
Value
-----
Produce
Meat

PS C:\Demos>

You might look at this and say that the PowerShell object model for XML is, in at least one respect, much easier to use than the XPath-based model that xitems uses, because it does not require you to specify the namespaces for the GroceryList, Stock and Name elements. However, namespaces are part of an XML file for a reason: to make sure that names in common usage can easily be distinguished. Look at this alternate version of the StockNS.xml file:

<GroceryList xmlns='urn:prices'
             xmlns:loc='urn:location'
             xmlns:ident='urn:identity'
             ID="A-24">
    <Stock xmlns="urn:inventory">
        <loc:Dept>3rd floor</loc:Dept>
        <ident:Dept>
            <Area>Round</Area>
            <Name>Produce</Name>
        </ident:Dept>
        <Name>Orange</Name>
        <Price>3.41</Price>
    </Stock>
    <Stock xmlns="urn:ignore">
        <loc:Dept>2nd floor</loc:Dept>
        <ident:Dept>
            <Area>Beef</Area>
            <Name>Meat</Name>
        </ident:Dept>
        <Name>Steak</Name>
        <Price>13.20</Price>
    </Stock>
    <Stock xmlns="urn:inventory">
        <loc:Dept>3rd floor</loc:Dept>
        <ident:Dept>
            <Area>Leaf</Area>
            <Name>Produce</Name>
        </ident:Dept>
        <Name>Lettuce</Name>
        <Price>1.36</Price>
    </Stock>
</GroceryList>

Notice that the second Stock element is in the “urn:ignore” namespace, not the “urn:inventory” namespace. The PowerShell dotted syntax would have included this element in its selection of department names, which may not be what was really intended. To be sure that you are processing XML in the namespace you intend with the PowerShell XML object model, you really have to use the Item method at every level of the XML hierarchy. Your mileage may vary, but using XPath to select items from an XML file will in general be easier, and more capable, than using the PowerShell object model of XML.
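The difference can be reproduced side by side. The sketch below does not use xitems; it substitutes the standard SelectNodes method with an XmlNamespaceManager, and the prefixes p, i and id are arbitrary choices of ours, just as they are in the xitems example.

```powershell
# Inline copy of the alternate StockNS.xml; the second Stock is in urn:ignore.
[xml]$s = @'
<GroceryList xmlns="urn:prices" xmlns:loc="urn:location" xmlns:ident="urn:identity" ID="A-24">
  <Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept><Area>Round</Area><Name>Produce</Name></ident:Dept>
    <Name>Orange</Name><Price>3.41</Price>
  </Stock>
  <Stock xmlns="urn:ignore">
    <loc:Dept>2nd floor</loc:Dept>
    <ident:Dept><Area>Beef</Area><Name>Meat</Name></ident:Dept>
    <Name>Steak</Name><Price>13.20</Price>
  </Stock>
  <Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept><Area>Leaf</Area><Name>Produce</Name></ident:Dept>
    <Name>Lettuce</Name><Price>1.36</Price>
  </Stock>
</GroceryList>
'@

# Dotted syntax matches Stock by local name only, so the urn:ignore element is included.
$dotted = $s.GroceryList.Stock |
    ForEach-Object { $_.Item('Dept', 'urn:identity').Name } |
    Select-Object -Unique

# XPath with explicit prefixes only matches Stock elements in urn:inventory.
$nm = New-Object System.Xml.XmlNamespaceManager($s.NameTable)
$nm.AddNamespace('p',  'urn:prices')
$nm.AddNamespace('i',  'urn:inventory')
$nm.AddNamespace('id', 'urn:identity')
$qualified = $s.SelectNodes('p:GroceryList/i:Stock/id:Dept/i:Name', $nm) |
    ForEach-Object { $_.InnerText } |
    Select-Object -Unique

$dotted      # Produce, Meat
$qualified   # Produce
```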

An XPath LocationPath can be thought of as a filter: you use it to filter out the parts of the document you are not interested in. This filter can be about as fine-grained as you would like. For example, what if we wanted the department names on the 3rd floor?

PS C:\Demos> xitems C:\Demos\StockNS.xml `
   "p:GroceryList/i:Stock[loc:Dept='3rd floor']/id:Dept/i:Name" `
   @{p="urn:prices";i="urn:inventory";id="urn:identity";loc="urn:location"} |
   select-object -property value -unique
Value
-----
Produce
PS C:\Demos>

The LocationPath used by this script has a predicate in it that selects only those Stock elements that have a loc:Dept child element whose value is “3rd floor”.
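Predicates are not limited to string comparisons. As a further sketch, here is a numeric predicate applied to the namespace-free Stock.xml content, using the standard SelectNodes method instead of xitems. The price threshold is an arbitrary value chosen for illustration.

```powershell
[xml]$s = @'
<GroceryList ID="A-24">
  <Stock><Dept><Area>Round</Area><Name>Produce</Name></Dept><Name>Orange</Name><Price>3.41</Price></Stock>
  <Stock><Dept><Area>Beef</Area><Name>Meat</Name></Dept><Name>Steak</Name><Price>13.20</Price></Stock>
  <Stock><Dept><Area>Leaf</Area><Name>Produce</Name></Dept><Name>Lettuce</Name><Price>1.36</Price></Stock>
</GroceryList>
'@

# [Price > 2] keeps only the Stock elements whose Price child is greater than 2;
# the trailing /Name then selects the item name, not the department name.
$expensive = $s.SelectNodes('GroceryList/Stock[Price > 2]/Name') |
    ForEach-Object { $_.InnerText }

$expensive   # Orange, Steak
```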

The xitems function produces XPathNavigators, and it can also take an XPathNavigator as input. This means you can use the results of one xitems call as input to another. Here is an example using Stock.xml, the file without the namespaces, to reduce the clutter:

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock" |
  %{xitems $_ "Dept/Name"} | Select-Object -property value -unique
Value
-----
Produce
Meat

PS C:\Demos>

This particular script is analogous to the first script that we presented in this article. We’ve broken the selection into two pipeline segments just to show that the second segment could use the output of the first as input. The first pipeline segment pipes an XPathNavigator into the second pipeline segment which uses that XPathNavigator as input to another xitems function.
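The same chaining can be sketched with nothing but the .NET XPathNavigator.Select method. This assumes only what the article has already shown, namely that piping the iterator returned by Select enumerates it into one XPathNavigator per node; the inline document mirrors Stock.xml.

```powershell
[xml]$s = @'
<GroceryList ID="A-24">
  <Stock><Dept><Area>Round</Area><Name>Produce</Name></Dept><Name>Orange</Name><Price>3.41</Price></Stock>
  <Stock><Dept><Area>Beef</Area><Name>Meat</Name></Dept><Name>Steak</Name><Price>13.20</Price></Stock>
  <Stock><Dept><Area>Leaf</Area><Name>Produce</Name></Dept><Name>Lettuce</Name><Price>1.36</Price></Stock>
</GroceryList>
'@
$nav = $s.CreateNavigator()

# Segment 1 emits a navigator positioned on each Stock element.
# Segment 2 runs a relative LocationPath from that position.
# Segment 3 reads the string value of each selected node.
$deptNames = $nav.Select('GroceryList/Stock') |
    ForEach-Object { $_.Select('Dept/Name') } |
    ForEach-Object { $_.Value } |
    Select-Object -Unique

$deptNames   # Produce, Meat
```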

From the Processing XML with PowerShell article we know that xeval can also process an XPathNavigator, so the output of the xitems function can also be passed into the xeval function. Let’s try that:

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock" |
  %{xeval $_ "string(Dept/Name)"} | Select-Object  -unique
Produce
Meat
PS C:\Demos>

In this example the second pipeline segment evaluates the result of the first pipeline segment. Note that in the third segment the Select-Object cmdlet is not using the -property value option. That is because the second segment is producing a string and a string does not have a value property.
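For comparison, here is the same select-then-evaluate pattern written against the .NET XPathNavigator directly. Select and Evaluate stand in for xitems and xeval here; this is an illustration of the idea, not the implementation of either function.

```powershell
[xml]$s = @'
<GroceryList>
  <Stock><Dept><Name>Produce</Name></Dept><Name>Orange</Name></Stock>
  <Stock><Dept><Name>Meat</Name></Dept><Name>Steak</Name></Stock>
  <Stock><Dept><Name>Produce</Name></Dept><Name>Lettuce</Name></Stock>
</GroceryList>
'@

# Evaluate runs relative to each Stock navigator and returns a plain string,
# so there is no Value property to select afterwards.
$deptNames = $s.CreateNavigator().Select('GroceryList/Stock') |
    ForEach-Object { $_.Evaluate('string(Dept/Name)') } |
    Select-Object -Unique

$deptNames   # Produce, Meat
```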

Lastly xitems is similar to xeval in that you can pass it an array of LocationPaths and it will apply all of them to an XML file.

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock/Dept/Area",
  "GroceryList/Stock/Dept/Name" | Group-object -property value
Count Name                      Group
----- ----                      -----
    1 Round                     {Area}
    2 Produce                   {Name, Name}
    1 Beef                      {Area}
    1 Meat                      {Name}
    1 Leaf                      {Area}
PS C:\Demos>

This may seem a strange query, but it does show us, for example, that there are two Stock elements that have Name children whose value is “Produce”.
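A comparable grouping can be sketched without xitems by writing the two LocationPaths as a single XPath expression joined with the "|" operator and passing it to the standard SelectNodes method. The order in which Group-Object lists the groups can differ between PowerShell versions, so the sketch looks a group up by name rather than by position.

```powershell
[xml]$s = @'
<GroceryList ID="A-24">
  <Stock><Dept><Area>Round</Area><Name>Produce</Name></Dept><Name>Orange</Name><Price>3.41</Price></Stock>
  <Stock><Dept><Area>Beef</Area><Name>Meat</Name></Dept><Name>Steak</Name><Price>13.20</Price></Stock>
  <Stock><Dept><Area>Leaf</Area><Name>Produce</Name></Dept><Name>Lettuce</Name><Price>1.36</Price></Stock>
</GroceryList>
'@

# Both paths are applied in one query; the text of every match is then grouped.
$groups = $s.SelectNodes('GroceryList/Stock/Dept/Area | GroceryList/Stock/Dept/Name') |
    ForEach-Object { $_.InnerText } |
    Group-Object

# Two Stock elements belong to the Produce department.
$produce = $groups | Where-Object { $_.Name -eq 'Produce' }
$produce.Count   # 2
```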

So far we have seen the basics of using the xitems function and that it shares much in common with xeval. Let’s now take a look at the implementation of xitems.

First of all, xitems is an alias for get-XSLT_XPathSelection. As the name implies, this function makes an XPath selection.

filter get-XSLT_XPathSelection
{
    param($nav, [array]$expressions, [hashtable]$namespaces)

    # A string is treated as the path to an XML file
    if($nav -is [string])
    {
        $nav = get-XSLT_XPathNavigator $nav
    }
    if($nav -isnot [System.Xml.XPath.XPathNavigator])
    {
        throw "String path or XPathNavigator required"
    }

    # Map the prefixes used in the expressions to their namespaces
    $nm = get-XSLT_NamespaceManager $nav.NameTable $namespaces

    # Combine the LocationPaths into one expression with the XPath "|" operator
    $xpathExpression = ""
    foreach($exp in $expressions)
    {
        if($xpathExpression -ne "")
        {
            $xpathExpression += " | "
        }
        $xpathExpression += $exp
    }

    # Run the selection against a clone of the navigator
    $nodes = $nav.Clone().Select($xpathExpression, $nm)
    $nodes
}

The xitems function starts off the same way as the xeval function that we looked at in the Processing XML with PowerShell article: its parameters are an untyped $nav, an array, and a hashtable. Just as xeval does, xitems converts a string to an XPathNavigator. It then builds a namespace manager and iterates through the expressions that were passed in, again just as xeval does. In fact the only real differences from xeval are that xitems concatenates the selection expressions using the XPath union operator, “|”, and calls Select instead of Evaluate on the XPathNavigator that was passed in.
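The helper functions get-XSLT_XPathNavigator and get-XSLT_NamespaceManager live in the download and are not shown here. To make the shape of the selection concrete without them, here is a standalone sketch that uses only standard .NET calls. The name Select-XPathSketch and its parameter names are hypothetical, the namespace handling is our guess at what such a helper would need to do, and the -join operator assumes PowerShell 2.0 or later.

```powershell
# Hypothetical stand-in for the selection logic; not the code from the download.
function Select-XPathSketch {
    param(
        [System.Xml.XPath.XPathNavigator]$Navigator,
        [string[]]$Expressions,
        [hashtable]$Namespaces = @{}
    )

    # Register each prefix-to-namespace mapping from the hashtable.
    $nm = New-Object System.Xml.XmlNamespaceManager -ArgumentList $Navigator.NameTable
    foreach ($prefix in $Namespaces.Keys) {
        $nm.AddNamespace($prefix, $Namespaces[$prefix])
    }

    # Join the LocationPaths with the XPath union operator.
    $union = $Expressions -join ' | '

    # Emitting the iterator enumerates it: one XPathNavigator per selected node.
    $Navigator.Select($union, $nm)
}

# Usage with a small document of our own that has a default namespace.
[xml]$doc = '<a xmlns="urn:x"><b>1</b><c>2</c><b>3</b></a>'
$nav = $doc.CreateNavigator()
$values = Select-XPathSketch $nav 'x:a/x:b', 'x:a/x:c' @{ x = 'urn:x' } |
    ForEach-Object { $_.Value }
```

All three elements are selected, because the two paths are evaluated as a single union expression.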

So, in conclusion, we can see that using xitems is easier and more consistent than using the dotted syntax and Item method of the PowerShell XML object model, and it is a lot more capable. Of course you will have to learn about XPath to fully exploit those capabilities, but it will be well worth your effort to do so.

posted @ 1:12 PM


 
   
 
© 2004 Pluralsight.