About UsCommunityTrainingContent DevelopmentContact

Blogs
Pluralsight
Course Schedule
Scott Allen
Craig Andera
Mark Baciak
Don Box
Keith Brown
John CJ
Tim Ewald
Jon Fancey
Jon Flanders
Vijay Gajjala
Kirill Gavrylyuk
Ian Griffiths
Martin Gudgin
Jim Johnson
John Justice
Mike Henderson
Joe Hummel
Matt Milner
Ted Neward
Fritz Onion
Brian Randell
Jeffrey Schlimmer
Aaron Skonnard
Dan Sullivan
Herb Sutter
Doug Walter
Jim Wilson
Mike Woodring

My Links
Home
Contact
Login

Blog Stats
Posts - 19
Stories - 0
Comments - 52
Trackbacks - 23

Archives
Mar, 2007 (1)
Feb, 2007 (1)
Nov, 2006 (4)
Oct, 2006 (3)
Sep, 2006 (1)
Aug, 2006 (2)
Jul, 2006 (2)
Apr, 2006 (1)
Jan, 2006 (1)
Dec, 2005 (1)
Sep, 2005 (2)

Post Categories
PowerShell(rss)
SQL Server(rss)
XML(rss)


.NET, XML, SQL and Doing Things as Time Allows

Monday, March 12, 2007

The next article in my series on PowerShell and SMO, PowerSMO At Work Part II is up on Simple-Talk.com now.

Dan

 

 

 

posted @ 10:36 AM | Feedback (0)

Monday, February 19, 2007

I'm working on a series of articles on PowerSMO, my combination of PowerShell and SMO, for http://www.simple-talk.com. The first few are on the site now.

Some of the topics in these articles are covered in the Applied SQL Server 2005 course.

Dan

 

posted @ 6:22 AM | Feedback (1)

Tuesday, November 28, 2006

In our previous blog article Processing XML with PowerShell we looked at using XPath expressions to do calculations that used XML as input. One of the things that this article pointed out was that you often can do an entire calculation within an XPath expression. Sometimes, however, you want to read the XML file, pull parts out of it and process them outside of the XML file. One of the ways of doing this is to use the dotted syntax and the Item ParameterizedProperty features of PowerShell so that you can treat XML as an object.

We are going to start off by looking at using the PowerShell object model for XML to pull apart an XML file so that we can process it, then we are going to look at an extension function aliased as xitems to do the same thing, but typically with less effort and more capabilities. Lastly we will look at how the xitems function is implemented. You can download the script for making this function and the sample files at http://www.pluralsight.com/dan/samples/ProcessingXMLPowershell-2.zip. Note that this will include the XSLT.ps1 script, updated for the xitems function, that was include in the ProcessingXMLPowerShell.zip file.

We will start by looking at a file named Stock.xml. The Stock.xml file looks like:

<GroceryList
ID = "A-24"
>
<Stock>
    <Dept>
    <Area>Round</Area>
    <Name>Produce</Name>
    </Dept>
    <Name>Orange</Name>
<Price>3.41</Price>
</Stock>
<Stock>
    <Dept>
    <Area>Beef</Area>
    <Name>Meat</Name>
    </Dept>
    <Name>Steak</Name>
    <Price >13.20</Price>
</Stock>
<Stock>
    <Dept>
    <Area>Leaf</Area>
    <Name>Produce</Name>
    </Dept>
    <Name>Lettuce</Name>
    <Price>1.36</Price>
</Stock>
</GroceryList>

Note that each Dept element includes both an Area and Name element. We want to find out the names of the departments in the stock.xml file. We can do this by piping all of the department names into a select-object -unique cmdlet. Here is a script that does just that.

PS C:\Demos> [xml]$s = get-content C:\Demos\Stock.xml
PS C:\Demos> $s.GroceryList.Stock | %{$_.Dept.Name} | select-object -unique
Produce
Meat
PS C:\Demos>

Here we are using the dotted syntax to extract each Stock element from the stock.xml file and pipe it into a following pipeline segment. In that second pipeline segment we use the dotted syntax again to extract the Name from the Dept element of current pipeline object; The current pipeline object in this case is the Stock element. Then we pipe all the names we have found into the select-object cmdlet that makes a unique list of those names.

It’s easy to come up with a rather wordy description of what this script is doing; It is saying something like “Give me all of the elements named GroceryList at the root of the $s XML document; then for each one of these get me all of its children whose name is Stock; then for each one of these get me all of its children whose name is Dept; then for each on of these get me all of its children whose name is Name.” This sort of description could be applied to almost anything that manages hierarchical data including XPath, which we will be looking at later when we examine the xitems function.

Thinking of this hierarchical description makes it seem that the script below would be alternate, more simple, way to find the names of the departments.

PS C:\Demos> [xml]$s = get-content C:\Demos\Stock.xml
PS C:\Demos> $s.GroceryList.Stock.Dept.Name | select-object -unique
PS C:\Demos>

This, seemingly obvious, way of getting the children of the children, etc. did not produce any results. The dotted syntax is somewhat limited because of the way PowerShell models XML. If we take closer look at what is actually being returned for the GroceryList and Stock elements we will see why.

PS C:\Demos> $s.GroceryList.GetType()
IsPublic IsSerial Name       BaseType
-------- -------- ----      --------
True     False    XmlElement System.Xml.XmlLinkedNode
PS C:\Demos> $s.GroceryList.Stock.GetType()
IsPublic IsSerial Name      BaseType
-------- -------- ----      --------
True     True     Object[]  System.Array
PS C:\Demos>

The type of the GroceryList is an XmlElement as you might suspect, after all we are working with XML. However the type of Stock is Array, not an XmlElement or XmlElement[], and that is why it lacks a Dept property. Now let’s look at what happens when instead of trying to access the Dept element from Stock we pipe the array into another pipeline segment and see what the type is.

PS C:\Demos> $s3.GroceryList.Stock | %{$_.GetType()}
IsPublic IsSerial Name        BaseType
-------- -------- ----        --------
True     False    XmlElement  System.Xml.XmlLinkedNode
True     False    XmlElement  System.Xml.XmlLinkedNode
True     False    XmlElement  System.Xml.XmlLinkedNode
PS C:\Demos>

It turns out that the Stock array is an array of XmlElements and piping it into a pipeline segment enumerates that array. The result is that inside the pipeline segment $_ is an XmlElement that has a Dept child element.

So to drill into an XML hierarchy using the PowerShell object model you must break up what seems like a natural dotted syntax into pipeline segments, one pipeline segment for every two levels of depth you want to go into the XML hierarchy. The word description earlier of what is happening here, however, is in effect the definition of an XPath construct called a LocationPath.

We used XPath expressions in Processing XML with PowerShell to do processing of an XML file. XPath is really a simple language, an expression can only produce one of four datatypes; A number, a string, a boolean, or a node set. We were using those first three scalar types to do calculations with the xeval function.

A node set is what the name seems to imply, it is a set of XML nodes. It might be a set of XML elements or attributes or a mixture of these an other kinds of XML nodes. The data model in the XPath Recommendation defines seven kinds of nodes that might be found in an XML documents. A LocationPath is an XPath expression that produces a node set instead of a scalar value. It is given a special name because it is such a common idiom to use XPath to produce a node set. Here is a LocationPath that will produce a nodeset that consists of all of the Name elements from the stock.xml file:

GroceryList/Stock/Dept/Name

We can use this LocationPath with the xitems function to find the names of the departments in the stock.xml file, as we did at the beginning of this article.

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock/Dept/Name" |
 select-object -property value -unique
Value
-----
Produce
Meat
PS C:\Demos>

The xitems function requires at least two parameters. The first is the path to the XML file we want to process, or as we will later see an XPathNavigator. The second argument is an XPath LocationPath. You can see we are able to use a more simple model to specify the parts of the XML file we wish to process.

In the Processing XML with PowerShell blog article we looked at processing XML files that contained namespaces and some of the issues you can run into when using the PowerShell object model of XML. The xitems function uses the same technique as the xeval did, you pass a dictionary that contains the mapping of prefixes and namespaces to it. Here is another file that uses namespaces, stockNS.xml.

<GroceryList xmlns='urn:prices'
xmlns:loc='urn:location'
xmlns:ident='urn:identity'
ID = "A-24"
>
<Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept>
    <Area>Round</Area>
    <Name>Produce</Name>
    </ident:Dept>
    <Name>Orange</Name>
<Price>3.41</Price>
</Stock>
<Stock xmlns="urn:inventory">
    <loc:Dept>2nd floor</loc:Dept>
    <ident:Dept>
    <Area>Beef</Area>
    <Name>Meat</Name>
    </ident:Dept>
    <Name>Steak</Name>
    <Price >13.20</Price>
</Stock>
<Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept>
    <Area>Leaf</Area>
    <Name>Produce</Name>
    </ident:Dept>
    <Name>Lettuce</Name>
    <Price>1.36</Price>
</Stock>
</GroceryList>

This file uses quite a few namespaces, just to make things interesting and because typically when namespaces are used they are often used a lot. Lets do our department names calculation again, using the PowerShell xml object model.

PS C:\Demos> [xml]$s = get-content C:\Demos\StockNS.xml
PS C:\Demos> $s.grocerylist.stock 
  | %{$_.Item("Dept", "urn:identity").Name} | select-object -unique
Produce
Meat
PS C:\Demos>

Here we have used the Item property that PowerShell adds to an XML element to make it possible to access elements that are distinguished by their namespace. Here is a script that uses xitems to get the same results:

PS C:\Demos> xitems C:\Demos\StockNS.xml `
  "p:GroceryList/i:Stock/id:Dept/i:Name" `
  @{p="urn:prices";i="urn:inventory";id="urn:identity"} |
  select-object -property value -unique
Value
-----
Produce
Meat

PS C:\Demos>

You might look at this and say that the PowerShell object model for XML is in at least one respect much easier to use than one based on XPath that xitems uses because in the it does not require the specification of the namespaces for the GroceryList, Stock and Name elements. However namespaces are part of an XML file for a reason, to make sure that names in common usage can easily be distinguished. Look at this alternate version of the StockNS.xml file:

<GroceryList xmlns='urn:prices'
xmlns:loc='urn:location'
xmlns:ident='urn:identity'
ID = "A-24"
>
<Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept>
    <Area>Round</Area>
    <Name>Produce</Name>
    </ident:Dept>
    <Name>Orange</Name>
<Price>3.41</Price>
</Stock>
<Stock xmlns="urn:ignore">
    <loc:Dept>2nd floor</loc:Dept>
    <ident:Dept>
    <Area>Beef</Area>
    <Name>Meat</Name>
    </ident:Dept>
    <Name>Steak</Name>
    <Price >13.20</Price>
</Stock>
<Stock xmlns="urn:inventory">
    <loc:Dept>3rd floor</loc:Dept>
    <ident:Dept>
    <Area>Leaf</Area>
    <Name>Produce</Name>
    </ident:Dept>
    <Name>Lettuce</Name>
    <Price>1.36</Price>
</Stock>
</GroceryList>

Notice that the second Stock element is in the “urn:ignore” namespace, not the “urn:inventory. The PowerShell dotted syntax would have included this element in its selection of department names, which may not be what was really intended. In order to be sure what you are processing XML in the namespace you intend with the PowerShell XML object model you really have to use the Item method at every level of the XML hierarchy. Your mileage may vary, but using XPath to select items from an XML file will in general be easier than using the PowerShell object model of XML and more capable.

An XPath LocationPath can be thought of as a filter; You use it to filter out the parts of the document you are not interested in. This filter can be about as fine-grained as you would like. For example what if we wanted the department names on the 3rd floor?

PS C:\Demos> xitems C:\Demos\StockNS.xml `
   "p:GroceryList/i:Stock[loc:Dept='3rd floor']/id:Dept/i:Name" `
   @{p="urn:prices";i="urn:inventory";id="urn:identity";loc="urn:location"} |
   select-object -property value -unique
Value
-----
Produce
PS C:\Demos>

The LocationPath used by this script has a predicate in it, that only selects Stock elements that have a loc:Dept child element whose value is “3rd floor”.

The xitems function is produces an XPath navigator, and it can also take a XPathNavigator as input. This means you can use the results of one xitems function as input to another. Here is an example using stock.xml, the file without the namespaces to reduce the clutter:

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock" |
  %{xitems $_ "Dept/Name"} | Select-Object -property value -unique
Value
-----
Produce
Meat

PS C:\Demos>

This particular script is analogous to the first script that we presented in this article. We’ve broken the selection into two pipeline segments just to show that the second segment could use the output of the first as input. The first pipeline segment pipes an XPathNavigator into the second pipeline segment which uses that XPathNavigator as input to another xitems function.

From the Processing XML with PowerShell we know that xeval can also process an XPathNavigator, so the output of the xitems function can also be passed into the xeval function. Let’s try that:

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock" |
  %{xeval $_ "string(Dept/Name)"} | Select-Object  -unique
Produce
Meat
PS C:\Demos>

In this example the second pipeline segment evaluates the result of the first pipeline segment. Note that in the third segment the Select-Object cmdlet is not using the -property value option. That is because the second segment is producing a string and a string does not have a value property.

Lastly xitems is similar to xeval in that you can pass it an array of LocationPaths and it will apply all of them to an XML file.

PS C:\Demos> xitems C:\Demos\Stock.xml "GroceryList/Stock/Dept/Area",
  "GroceryList/Stock/Dept/Name" | Group-object -property value
Count Name                      Group
----- ----                      -----
    1 Round                     {Area}
    2 Produce                   {Name, Name}
    1 Beef                      {Area}
    1 Meat                      {Name}
    1 Leaf                      {Area}
PS C:\Demos>

This may seem a strange query, but it does show us, for example, that there are two Stock elements that have Name children whose value is “Produce”.

So far we have seen the basics of using the xitems function and that it shares much in common with xeval. Let’s now take a look at the implementation of xitems.

First of all xitems is an alias for get-XSLT_XPathSelection. As the names implies this function is making an XPath selection.

filter get-XSLT_XPathSelection
{
param($nav, [array]$expressions, [hashtable]$namespaces)
if($nav -is [string])
{
$nav = get-XSLT_XPathNavigator $nav
}
if($nav -isnot [System.Xml.XPath.XPathNavigator]) 
{ throw "String path or XPathNavigator required"}
$nm = get-XSLT_NamespaceManager $nav.NameTable $namespaces
$xpathExpression = "";
foreach($exp in $expressions)
{
if($xpathExpression -ne "")
{
$xpathExpression += " | ";
}
$xpathExpression += $exp
}
$nodes = $nav.Clone().Select($xpathExpression, $nm);
$nodes;
}

The xitems function starts off the same as the xeval function that we looked at in the Processing XML with PowerShell article, it has a parameters an untyped $nav, an array and a hashtable. Just as xeval does, xitems converts a string to an XPathNavigator. It then builds a namespace manager and iterates through the expressions that were passed in, just like xeval does. In fact the only real differences from xeval is that xitems concatenates the selection expressions using the XPath alternate operator, “|” and uses Select instead of Evaluate on the XPathNavigator that was passed in.

So in conclusion we can see that using xitems is really easier and more consistent than using the dotted syntax and Item method that the PowerShell XML object model uses, and is a lot more capable. Of course you will have to learn about XPath to fully exploit those capabilities, but it will be well worth you effort doing so.

posted @ 1:12 PM | Feedback (3)

Saturday, November 25, 2006

There are a couple of powerful technologies for processing native XML, XPath and XSLT. People often avoid processing native XML but instead convert the XML to an object model in a language they are used to and do “conventional” programming on that model. Even PowerShell itself does this with its fairly straightforward dotted syntax for accessing parts of an XML document and of course .NET, web service technologies, and SQL Server have their own ways to morph XML into a familiar object model.

There are probably a number reasons for this not the least of which is syntactic comfort… with some practice you actually can drive nails with a screwdriver and then you only need to learn how to use one tool to build a house. XSLT itself is often criticized as being too verbose but that is not really the case. And lastly the programming models for XPath and XSLT are different than that used in languages like C# or VB.NET; They are much more like SQL in that you don’t actually write a program but instead write a set of rules and throw data at them.

However if you are going to bump into XML in your travels, and you can be pretty sure that you will, it is really worth your while to become comfortable with at least the basics of XPath and XSLT because that knowledge will make a lot of programming jobs a lot easier. Let’s take a look at a simple example to see this. Here is a grocery list, XML-style, in the file groceries.xml

<GroceryList>
<Item>
<Dept>Produce</Dept><Name>Orange</Name><Price>3.20</Price>
</Item>
<Item>
<Dept>Meat</Dept><Name>Steak</Name><Price>13.20</Price>
</Item>
<Item>
<Dept>Produce</Dept><Name>Lettuce</Name><Price>1.34</Price>
</Item>
<Item>
<Dept>Meat</Dept><Name>Ham</Name><Price>11.41</Price>
</Item>
</GroceryList>

We can calculate the total of all of groceries using the PowerShell object model of XML with the following script;

PS C:\Demos> [xml]$list = get-content .\groceries.xml
PS C:\Demos> $list.GroceryList.Item | &{begin {$sum=0}
 process{$sum += $_.Price} end {$sum}}  
29.15
PS C:\Demos>

There are other ways to do this in PowerShell, but all involve iterating through the items to produce a sum. It turns out there is a simple XPath expression that calculates sum of the prices of the items in the list:

sum(GroceryList/Item/Price)

In fact it would be kind of nice if we had a way to “execute” and XPath expression easily in PowerShell. How about this?

PS C:\Demos> xeval groceries.xml "sum(GroceryList/Item/Price)"
29.15
PS C:\Demos>

This blog article is about processing XML using PowerShell and the typical sorts of things you run into when you do this. It uses some extension functions, xeval and xnav are their aliases, to do this processing. The xeval function is used to process an XML file using XPath expressions. The xnav function is used to turn literal XML into an XPathNavigator. Later blog articles will cover other ways to process XML using PowerShell.

A script to build these functions and their associated aliases is in a file named XSLT.ps1. This file and the examples in this blog article are available at http://www.pluralsight.com/dan/samples/ProcessingXMLPowershell.zip. These extension functions are not really any harder to use than the XML support built into PowerShell but are quite a bit more capable in what they can accomplish. After we look at using these extension functions we will look inside of XSLT.ps1 and see how it works.

The XSLT.ps1 file actually has some other extension functions that are not discussed in this blog article but will be in a future one.

In the first example of using xeval we just looked at, the first argument to the xeval function is the file path for the XML file you want to process. The second argument is the XPath expression you want evaluated. Of course to make good use of xeval you will have to be familiar with XPath. XPath is a W3 recommendation and is at http://www.w3.org/TR/xpath.

The XPath recommendation is certainly worth reading and contains many example of XPath expressions. Another good source to have at your side is “Essential XML Quick Reference” published by Addison-Wesley and written by Aaron Skonnard and Martin Gudgen.

Let’s start by looking at one of the issues you run into when working with XML. XML is often treated as though it were text and that is how PowerShell treats it. But XML is not plain ol’ text and the following examples will show that. We have another version of our xml grocery list in a file named GroceriesUC.xml. Let’s use our PowerShell script to process it.

PS C:\Demos> [xml]$list = get-content groceriesuc.xml
Cannot convert value "System.Object[]" to type 
"System.Xml.XmlDocument". 
Error: "Root element is missing."
At line:1 char:11
+ [xml]$list  <<<< = get-content GroceriesUC.xml
PS C:\Demos>

Hmmmm, that generated an error. What’s going on here?

We often think of files as containing text, that is the characters we see on the printed page. But files don’t contain text, the are just a sequence of bytes. When someone gives you a “text” file you must know how that text was encoded into a sequence of bytes in order to be able to read it. PowerShell gives us a little help here in that the get-content cmdlet lets you specify the encoding of the file if you know it, or “unknown” if you don’t. Well we don’t know the encoding of the file so let’s tell PowerShell that the encoding is unknown and see what happens.

PS C:\Demos> [xml]$list = get-content GroceriesUC.xml -encoding unknown
Cannot convert value "????????????????????????????????????????????
?????????????????????????????????????????????????????????????????????
 to type "System.Xml.XmlDocument". Error: 
 "Data at the root level is invalid. Line 1, position 1."
At line:1 char:11
+ [xml]$list  <<<< = get-content GroceriesUC.xml -encoding unknown
PS C:\Demos>

Looks like we are out of luck here too. It turns out the encoding of the file is UTF-16BE. That’s a standard encoding used for XML files that is a sequence of words with the high-order byte of the word coming first. You might see it in XML that is generated on non-Intel compatible processors. Now that we know that we know actual encoding we can pass the information onto PowerShell.

PS C:\Demos> [xml]$list = get-content GroceriesUC.xml 
    -encoding BigEndianUnicode
PS C:\Demos> $list.GroceryList.Item | 
    &{begin {$sum=0} process{$sum += $_.Price} end {$sum}}
29.15
PS C:\Demos>

Bottom line is when it comes to text unless you know the actual encoding you can’t depend on being able to read it. Earlier we said that XML wasn’t really text. To see what this means lets try that xeval function again on the GroceriesUC.xml file.

PS C:\Demos> xeval GroceriesUC.xml "sum(GroceryList/Item/Price)"
29.15
PS C:\Demos>

It works just fine and we don’t have to tell it what the encoding is. The reason for this is a requirement of every XML processor, i.e. a support library for XML such as the one in .NET that xeval uses, must be able to unambiguously figure out the encoding used in an XML file without any outside help. This is thought by many to be the key feature of XML and certainly is one of the reasons for its wide use today. It works so well that most people don’t even know it is a feature!

The built in processing in PowerShell using get-content makes a non-compliant XML processor. In some cases this isn’t that important but you should keep in mind that in the general case it is not useful for processing XML. If you want do know the details of how this “self-encoding” in XML works there is an explanation of it in Appendix F of the W3 Extensible Markup Language XML recommendation at http://www.w3.org/TR/xml/.

There is another issue that comes up when you deal with XML, namespaces. There are some who feel that namespaces are an unnecessary complication to XML, but they are important enough to have their own specification, Namespaces in XML which is at http://www.w3.org/TR/xml-names/. For those with an interest in such things the Extensible Markup Language XML is really just a grammar with a little over 80 productions with lots of comments in it, and Namespaces in XML just adds a few productions to that grammar. Regardless of how you feel about namespaces you will have to deal with them. Here is a different version of our grocery list. It is in the file named GroceriesNS.xml.

<GroceryList xmlns="urn:foo"
xmlns:r="urn:retail"
xmlns:w="urn:wholesale" >
<Stock>
<Dept>Produce</Dept><Name>Orange</Name>
<w:Price>3.20</w:Price><r:Price>4.20</w:Price>
</Stock>
<Stock>
<Dept>Meat</Dept><Name>Steak</Name>
<r:Price >14.20</r:Price><w:Price >13.20</w:Price>
</Stock>
<Stock>
<Dept>Produce</Dept><Name>Lettuce</Name>
<w:Price>1.34</w:Price><r:Price>2.34</r:Price>
</Stock>
<Stock>
<Dept>Meat</Dept><Name>Ham</Name>
<w:Price>11.41</w:Price><r:Price>14.41</r:Price>
</Stock>
</GroceryList>

This file is different from the Groceries.xml file in three ways. One is that it uses namespaces. Another is that it contains both a wholesale and a retail price for each item. It also uses Stock elements instead of Item elements; We will see why in a second. The Price elements are distinguished by their namespace, the ones prefixed with “r” are retail prices. The prices in the r:Price elements are the same as the corresponding Price elements in the Groceries.xml file. Let’s use PowerShell’s object model of XML to calculate the sum of the retail prices. PowerShell sees two Price elements under the Stock element, so it makes an array out of them. We will have pick which one to sum up.

PS C:\Demos> $list = get-content GroceriesNS.xml
PS C:\Demos> $list.GroceryList.Stock | 
    &{begin {$sum=0} process{$sum += $_.Price[1]} end {$sum}}
30.15
PS C:\Demos>

The retail prices in the GroceriesNS.xml file are the same as the ones in the unqualified prices in the Groceries.xml file so we should get the same answer as before, but we don’t. The problem we have run into is that Price elements are distinguished only by their namespace and not by there position in the file. Note that in the second Stock element in the file the wholesale price comes after the retail price. So we have to make sure that we pick the correct price element.

To distinguish a Price element we have to use a ParameterizedProperty named Item that PowerShell adds to an XML element. In many cases you will find it difficult to process XML using the PowerShell object model if the XML contains any Item elements because PowerShell uses this name for the ParameterizedProperty it adds to XML elements. This is why we changed the name of the Item element to Stock. If we had not made this change we would not have been able to process this XML file using the PowerShell object model of XML.

In any case the Item property allows us to specify both the name and the namespace of the element we want.

PS C:\Demos> $list = get-content GroceriesNS.xml
PS C:\Demos> $list.GroceryList.Stock | 
  %{$_.Item("Price", "urn:retail")} | 
  &{begin {$sum = 0} process {$sum += $_.get_InnerText()} end {$sum}}
29.15
PS C:\Demos>

Now we get the 29.15 just as we did when we processed the Groceries.xml file.

Now let’s do the same thing using xeval function.

PS C:\Demos> xeval GroceriesNS.xml 
"sum(a:GroceryList/a:Stock/r:Price)" @{r="urn:retail";a="urn:foo"}
29.15
PS C:\Demos>

In this example xeval function has a third argument that is a dictionary that maps prefixes to the namespaces they represent in the XPath expression. You can see the GroceryList and Stock end up in the “urn:foo” namespace because of the “a” prefix and likewise Price ends up in the “urn:retail” namespace. Note that the prefix used in the XPath expression is not necessarily the same as that in the source XML file. There is no requirement the prefix used in an XPath expression be the same as that in the source XML file being processed; The key thing is that is specifies the proper namespace. Note that in the GroceriesNS.xml file the GroceryList and Stock element had no prefix but that the default namespace for the file was “urn:foo”.

Let’s look at some more things we can do with xeval. The second parameter of xeval may be an array of XPath expressions. xeval will evaluate each of these expressions.

PS C:\Demos> xeval GroceriesNS.xml "sum(a:GroceryList/a:Stock/r:Price)",
    "count(a:GroceryList/a:Stock)" @{a="urn:foo";r="urn:retail"}
29.15
4
PS C:\Demos>

Here we calculated the sum of the retail prices and number of Stock items. Note that this example makes use of the fact that in PowerShell the “,” operator makes an array of the arguments it joins. Let’s carry this one step further.

PS C:\Demos> xeval GroceriesNS.xml "sum(a:GroceryList/a:Stock/r:Price)",
    "count(a:GroceryList/a:Stock)",
    "sum(a:GroceryList/a:Stock/r:Price)
    div count(a:GroceryList/a:Stock)" @{a="urn:foo";r="urn:retail"}
29.15
4
7.2875
PS C:\Demos>

Here, beside the sum of the prices and the number of stock items, we calculate the average price of the stock items. The important point of these last few examples is that it is very common to calculate some value based on the content of an XML file. These calculations can be embedded in an XPath expression and you never have to “read”, i.e. pull out and interpret parts of, the XML file to do this.

You might think that all repeated a:GroceryList etc. might be inefficient or at least is tedious. First of all it’s not really inefficient at all to calculate a path multiple times in an XPath expression because the XPath engine that is evaluating these expression caches paths and reuses them when they appear again. As far as the tedium of typing them multiple times you can leverage PowerShell itself to simplify that.

PS C:\Demos> $s = "a:GroceryList/a:Stock"
PS C:\Demos> $p = "$s/r:Price"
PS C:\Demos> xeval GroceriesNS.xml "sum($p)",
    "count($s)",
    "sum($p) div count($s)" @{a="urn:foo";r="urn:retail"}
29.15
4
7.2875
PS C:\Demos>

Here we have made use of the fact that PowerShell will build a string out of a combination of literal text and variables. If the format of the XML file is pretty regular you can make the XPath expression used for the evaluation even more simple.

PS C:\Demos> xeval Groceries.xml "sum(//r:Price)" @{r="urn:retail"}
29.15
PS C:\Demos>

Of course here again you need some knowledge of XPath to simplify things. The “//” part of the XPath expression in this case really means “Find all the r:Price elements in the file.”

There is a hidden value in using XPath expressions to do calculations on an XML file; That expression can be used by anyone using any technology that implements XML support to do the same calculation on that file. In other words the XPath expression is a platform independent way of specifying how a calculation is done, it is not limited to PowerShell.

Sometimes you will have a literal string for your xml instead of a file. You can’t pass this directly to the xeval function because it will interpret that string as a file path and attempt to load a file.

The implementation of xeval internally uses an XPathNavigator to process the XML that is passed to it. This blog article isn’t going discuss the details of how XPathNavigator works, but xnav is an alias for a function that converts literal XML into an XPathNavigator. If the first parameter passed into xeval is an XPathNavigator it will use that navigator instead of interpreting it as a file path.

Here is an example of processing literal XML.

PS C:\Demos> $nav = xnav "<Stock><sku>ee-44</sku></Stock>"
PS C:\Demos> xeval $nav "string(//sku)"
ee-44
PS C:\Demos>

This example begins by using the xnav function to make an XPathNavigator out of some literal XML. This XPathNavigator is passed into the eval function. The XPath expression passed to xeval pulls out the stockroom unit from the literal XML.

Using the pipeline in PowerShell is a great way process XML. There are a number of grocery files with names like GroceriesNS1.xml, GroceriesNS2.xml and so on that we would like to process. We would like to calculate the value of each these files. This is what the GroceriesNS1.xml file looks like.

<GroceryList xmlns="urn:foo"
xmlns:r="urn:retail"
xmlns:w="urn:wholesale" 
ID = "A-24"
>
<Stock>
<Dept>Produce</Dept><Name>Orange</Name>
<w:Price>114.20</w:Price><r:Price>3.41</r:Price>
</Stock>
<Stock>
<Dept>Meat</Dept><Name>Steak</Name>
<r:Price >13.20</r:Price><w:Price >14.20</w:Price>
</Stock>
<Stock>
<Dept>Produce</Dept><Name>Lettuce</Name>
<w:Price>21.34</w:Price><r:Price>1.36</r:Price>
</Stock>
</GroceryList>

First of all it’s straightforward to get the names of these files.

PS C:\Demos> get-childitem C:\Demos\* |
 ?{$_.Name -match "GroceriesNS\d+.xml"}
    Directory: Microsoft.PowerShell.Core\FileSystem::C:\Demos

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        11/25/2006  10:15 AM        438 GroceriesNS1.xml
-a---        11/25/2006  10:14 AM        324 GroceriesNS2.xml
-a---        11/25/2006  10:16 AM        438 GroceriesNS3.xml

PS C:\Demos>

Note that the GroceryList element has an attribute name ID that identifies that list. We want to include that ID in our results.

PS C:\Demos> get-childitem C:\Demos\* |
 ?{$_.Name -match "GroceriesNS\d+.xml"} |
 %{xeval "$_" "string(f:GroceryList/@ID)", "sum(//r:Price)" `
 @{f="urn:foo";r="urn:retail"} }
A-24
17.97
31
54.4
109
57.97
PS C:\Demos>

In this example we pipe the file names into a script block that uses the xeval function. This uses XPath expressions to get both the ID of the GroceryList and the sum of its Price elements. Note the backtick and the end of the third line to insure the continuation of the command line.

The output we get is ID followed by sum. We might like something that produces a single line per GroceryList. We could pipe these results into another script block that aggregated these results by the pair… or we could use XPath to do the same thing.

PS C:\Demos> get-childitem C:\BlogArts\ProcessingXMLPowerShell\* |
 ?{$_.Name -match "GroceriesNS\d+.xml"} |
 %{xeval "$_" "concat(string(f:GroceryList/@ID), ' : ', sum(//r:Price))" `
 @{f="urn:foo";r="urn:retail"} }
A-24 : 17.97
31 : 54.400000000000006
109 : 57.97

Here we use the XPath concat function to produce a line per GroceryList report of the sum of the prices of each grocery list. You can produce some pretty fancy reports using just XPath expressions, but if they are much more complicated than the one in this example you will find it somewhat tedious to code them up. For more complicated reports XSLT is really a better choice and we will be looking at that in a later blog article. In any case this example has defined a report in terms of an XPath expression which anyone on any platform that implements XML can produce the same report. This example didn’t “code up” a report it made a rule that defined how the report was to be produced.

Now let’s look at the implementation. We will start with the eval function.

filter get-XSLT_XPathEvaluate
{
param($nav, [array]$computations, [hashtable]$namespaces)
if($nav -is [string])
{
$nav = get-XSLT_XPathNavigator $nav
}
if($nav -isnot [System.Xml.XPath.XPathNavigator]) 
  { throw "String file path or XPathNavigator required"}
$nm = get-XSLT_NamespaceManager $nav.NameTable $namespaces
foreach($n in $nav)
{
foreach($compute in $computations)
{
$n.Clone().Evaluate($compute, $nm)
}
}
}
set-alias xeval get-XSLT_XPathEvaluate

The xeval function uses three parameters. The first is a string or an XPathNavigator, the second is an array of XPath expressions, and the last is a dictionary of namespace mappings. It tests the first parameter to see if it is a string. If it is it uses the get-XSLTXPathNavigator function to make an XPathNavigator from the file path. We will look at the get-XSLTXPathNavigator function shortly.

Next it checks to make sure that the $nav variable is in fact an XPathNavigator and throws an error if it isn’t.

In order to use namespace with an XPathNavigator you need a construct called an XmlNamespaceManager. This construct holds the mappings of prefixes to namespaces. Both XPathNavigators and XmlDocuments store their associated XML in a non-textual, binary form for efficiency. Internally another construct, a NameTable, maintains a mapping between the names of elements and attributes, and their internal representation. The XmlNamespaceManager uses this NameTable in its constructor so that it can have the same mapping of names to internal representation that the XPathNavigator does.

Once the XmlNamespaceManager is constructed it is filled by get-XSLT_NamespaceManager function that we will look at shortly.

To do the computations the xeval function iterates through the array of XPath expression that are passed in. It uses a clone of the XPathNavigator to execute the expression. The reason it uses a clone of the XPathNavigator is the XPathNavigator is really a cursor on the XML file and we want to leave that XPathNavigator in its original state for each execution of XPath expressions being processed.

filter get-XSLT_XPathNavigator
{
param ($xml)
if($xml -is [string])
{
$xml = get-XSLT_XMLReader $xml;
$xml = get-XSLT_XPathDocument $xml
}
$nav = $xml.CreateNavigator();
$nav
}

The get-XSLT_XPathNavigator uses the string passed into it as a file path. It starts by converting the file path into an XmlReader, then uses that XmlReader to make an XPathDocument, which in turn is used to make an XPathNavigator.

filter get-XSLT_XMLReader
{
param ([string]$xmlFile)
[System.IO.FileStream]$fileStream = new-object System.IO.FileStream $xmlFile, 
    ([System.IO.FileMode]::Open),
([System.IO.FileAccess]::Read)
[System.Xml.XmlTextReader]$rdr = new-object System.Xml.XmlTextReader $fileStream
$rdr
}

The get-XSLT_XMLReader function opens a FileStream using the string passed in as the path to the file. Note that it is not using a StreamReader which would convert the file to text, it is instead reading the raw bytes in the file. The FileStream is used to make an XmlTextReader. Again, dispite its name, an XmlTextReader does not read text, it reads bytes from the FileStream and because it is a complient XML processor it is completely capable of determining the encoding of the XML that is in the byte stream.

filter get-XSLT_XPathDocument
{
param ([System.Xml.XmlReader]$xml)
$doc = new-object System.Xml.XPath.XPathDocument $xml;
$doc
}

The get-XSLT_XPathDocument function uses an XmlReader to make an XPathDocument. An XPathDocument is, in effect, a readonly XmlDocument except that the only thing you can do with it is make an XPathNavigator out of it. If all you are going to do is read the content of an XML file and not modify it, and XPathDocument may be a better choice because it may be more efficient at processing XPath than the XmlDocument.Select method is.

function get-XSLT_NamespaceManager 
([System.Xml.NameTable] $nameTable, [hashtable] $namespaces)
{
$nm = new-object System.Xml.XmlNamespaceManager $NameTable
foreach($key in $namespaces.keys)
{
$nm.AddNamespace($key, $namespaces.$key);
}
,$nm
}

The get-XSLT_NamespaceManager has two inputs, a NameTable and a dictionary of namespace mapping. It starts by making an XmlNamespaceManager. It then iterates through the keys in the dictionary and uses the key and it associated value to add namespace mappings to the XmlNamespaceManager. Note that it uses the “,” operator when it returns the XmlNamespaceManager. The XmlNamespaceManager implements IEnumerable and returning it inside of an array prevents the XmlNamespace itself from being enumerated by PowerShell when it is returned, which is what we want.

Lastly the get-XSLT_LiteralXPathNavigator function is used to make an XPathNavigator out of literal XML.

filter get-XSLT_LiteralXPathNavigator
{
param ([string]$literalXml)
[xml]$xml = $literalXml;
$xml.CreateNavigator();
}

This is a case where assuming the XML is in fact text is ok, becuase it is text and we use the PowerShell implementation of XML to get a navigator from the string.

You probably have noticed the all of these functions have an “XSLT_” internal prefix in them but this blog article doesn’t show anything about the use of XSLT. There is more comming in blog articles that follow this one…

So where are we at? The xeval function can handle a lot of the kinds of processing that is typically done with XML and has none of the limitations that the PowerShell implemenation of XML does. You do have to learn a bit about XPath, the references that were sited earlier are a good place to start and there are XPath tutorials all over the web. YMMV, but typically the best way to process XML is to process XML rather than turn it into an object model. It will be worth you effort at learning XPath… after all it is easier to learn XPath than to learn Perl:-).

posted @ 11:25 AM | Feedback (6)

Wednesday, November 08, 2006

Now that we have PowerSMO! we can start making use of it. The first example will be building a test database. Whenever I work on an new database application or write labs for a course that involves databases I need to make test database with some data in them. T-SQL is just fine for defining tables and such, but as soon as I want to fill those tables with some sample data it gets a bit tedious… I just want to directly inject some C# into my T-SQL to manipulate strings, generate random data and so on. Now with PowerSMO! I can, in effect, do just that.

This blog article assumes you have some familiarity with PowerSMO!, SQL Server, and PowerShell. The purpose of this blog article is to show how to make use of PowerSMO! to do some typical database operations.

We start by making a database.

PS C:\demos> $server = SMO_Server
PS C:\demos> $testdb = SMO_Database $server "TestDB_1"
PS C:\demos> $testdb.DatabaseOptions.RecoveryModel="Simple"
PS C:\demos> $testdb.Create()
PS C:\demos>

To start with we need a reference to an instance of SQL Server. We are using the get-SMO_Server function from PowerSMO! to get this reference. Note that short form of the function is used, dropping the “get-” prefix. This is a handy feature of PowerShell.

$testdb is the test database we are making. SMO_Database requires a reference to a server and a name. I usually have a common prefix for test database names, we will how that is useful shortly. Also I set the RecoveryMode to “Simple” so I don’t end up with a big log on database that is really a throwaway anyhow. Lastly the $testdb doesn’t really exist until Create() is called on it, that’s what makes SMO issue the appropriate T-SQL commands to the server to create the database.

Since I, and probably you too, have a standard way to create a test database we should we should capture our ad hoc script in a function so we can reuse it. Actually we are going to make two functions here, one to create a test database name, and another to make an actual test database. We will see shortly that this will make it a lot easier to maintain things over time.

function global:get-TestDatabaseName
([string]$suffix)
{
"TestDB_{0}" -f $suffix;
}
function global:new-TestDatabase
(
[Microsoft.SqlServer.Management.Smo.Server]$server,
[string]$name_suffix)
{
$name = get-TestDatabaseName $name_suffix;
$testdb = SMO_Database $server $name;
$testdb.DatabaseOptions.RecoveryModel="Simple";
$testdb.Create();
}

get-TestDatabaseName is used to create a name for a test database. All test databases have the prefix “TestDB_”. The -f operator is the PowerShell formatting operator. It replaces {0} with the first parameter that follows it and {1} with the second and so on.

The new-TestDatabase requires a Server and a string as input. The body of the function duplicates our ad hoc script but uses the get-TestDatabaseName to generate the name of the database we want to add. Let’s try it out…

PS C:\demos> new-TestDatabase $server 3
PS C:\demos> new-TestDatabase $server 4
PS C:\demos> new-TestDatabase $server 5

It’s pretty easy to add a lot of databases. In fact you will probably find that often you end up with a lot of trash databases you want to get rid of and sometimes you just want to clean out your test databases and start over. Because we have a standard prefix for out test database that fairly easy to do. To make things even easier let’s add a function that finds all of our test databases.

function global:get-TestDatabases
([Microsoft.SqlServer.Management.Smo.Server]$server)
{
$pat = get-TestDatabaseName "*";
$server.Databases | ?{$_.Name -like $pat}
}

The get-TestDatabases function makes a pattern for test database names using the get-TestDatabaseName function. It then passes the databases it finds in $server through a -like filter that uses this pattern to eliminate the database that are not test databases.

PS C:\demos> get-TestDataBases $server | %{$_.name}
TestDB_1
TestDB_2
TestDB_3
TestDB_4
TestDB_5
PS C:\demos>

Now we can see we have made a good sized population of test databases. However it is pretty easy to get rid of all of them.

PS C:\demos> TestDatabases $server | %{$_.Drop()}
PS C:\demos> TestDatabases $server | %{$_.name}
PS C:\demos>

We use the get-TestDatabases function to pipe each of the databases into a script that calls the Drop() method on each one. A quick check shows we were successful. Ok, let’s put our test database back into the server for the rest of the things we want to do.

PS C:\demos> new-TestDatabase $server 1
PS C:\demos> $testdb = $server.Databases[(TestDatabaseName 1)]
PS C:\demos> $testdb.Name
TestDB_1
PS C:\demos>

My test systems have a few Windows users, Dawn, Don, and SqlAdmin, that I use for testing. They are all ordinary Windows users with no special priveleges and the all have logins on the Sql Server instance I’m going to do testing with. We can check to see if they are there easily.

PS C:\demos> $server.logins |  %{$_.name}
Ambler
AuditLogin
BUILTIN\Administrators
Frank
Joe
MyAsmLogin
NT AUTHORITY\SYSTEM
PARSEC5\Administrator
PARSEC5\Dawn
PARSEC5\Don
PARSEC5\SqlAdmin
PARSEC5\SQLServer2005MSFTEUser$PARSEC5$MSSQLSERVER
PARSEC5\SQLServer2005MSSQLUser$PARSEC5$MSSQLSERVER
PARSEC5\SQLServer2005SQLAgentUser$PARSEC5$MSSQLSERVER
sa
PS C:\demos>

If you look about half-way down the list you will see Dawn, Don and SqlAdmin. Of course we can refine this a bit more to list only the logins we are interested in.

PS C:\demos> $server.logins | 
    ?{$_.name -like "*\Dawn" -or $_.name -like "*\Don"
    -or $_.name -like "*\SqlAdmin"} | %{$_.name}
PARSEC5\Dawn
PARSEC5\Don
PARSEC5\SqlAdmin
PS C:\demos>

Here we use the logical -or operator and the -like pattern matching operator to filter out the logins to just the standard ones we use. In fact we should make add this to a library of functions we use when we build test databases.

function global:Get-StandardTestLogins
([Microsoft.SqlServer.Management.Smo.Server]$server)
{
$server.logins | 
    ?{$_.name -like "*\Dawn" -or 
    $_.name -like "*\Don" -or
    $_.name -like "*\SqlAdmin"}
}

The Get-StandardTestLogin function is a bit different from the ad hoc script we put together; It outputs a login object instead of just a name. Note that it uses a typed parameter for input, it requries a Server as input parameter. We can still use it to get the list of names though, even though it outputs login objects.

PS C:\demos> StandardTestLogins $server | %{$_.name}
PARSEC5\Dawn
PARSEC5\Don
PARSEC5\SqlAdmin
PS C:\demos>

If it turns out our test logins are not on the system we could use PowerShell to add them. PowerShell integrates support for WMI, Window Management Instrumentation, and is a conventional way to script new users into a Windows system or enterprise. We are not going to cover those features in this blog article though.

There is one last thing we have to check for out test logins. As the name implies SqlAdmin is supposed to be in the sysadmin role for SqlServer. That’s straight forward to check.

PS C:\demos> $server.logins["PARSEC5\SqlAdmin"].IsMember("sysadmin")
True
PS C:\demos>

Looks like we are good to go for our test logins.

Now that we know that our standard test logins are there, lets add the corresponding users to our test database.

PS C:\demos> foreach ($login in StandardTestLogins $server)
>> {
>> $user = SMO_User $testdb $login.name
>> $user.login = $login.name
>> $user.Create()
>> }
>>
PS C:\demos> $testdb.users | %{$_.name}
dbo
guest
INFORMATION_SCHEMA
PARSEC5\Dawn
PARSEC5\Don
PARSEC5\SqlAdmin
sys
PS C:\demos>

We use the PowerShell foreach command to iterate through each of our test logins. We then make a new user, $user, in the $testdb with a name the same as the login name, which is a pretty typical way to add users to a database. Then we fill out the login property of $user with the corresponding login name. Last we call the Create() method on $user. A SMO user object, like just about all new objects in SMO, do not exist in the database until after the Create() method has been called on them.

We confirm that our test users were added by listing the names of the users in $testdb.

One of the nice things about having the Get-StandardTestLogins is that we can use it to create the users for our test databases. This allows us to to keep track of out standard test logins in one place, we don’t need to constantly copy the list of them everywhere we need them.

We should turn this foreach loop into a function so we can re-use for future test databases.

function global:new-StandardTestUsers
([Microsoft.SqlServer.Management.Smo.Database]$database)
{
foreach ($login in StandardTestLogins $database.Parent)
{
$user = SMO_User $database $login.name
$user.login = $login.name
$user.Create()
}
}

Here the new-StandardTestUsers functions requires that a database be passed into it. The body of the function is the same as the ad hoc script we wrote except that the reference to the server is gotten from the $database itself. Now, as you can see below, we just pass in a reference to our test database to the new-StandardTestUsers function to add all of our test users.

PS C:\demos> new-StandardTestUsers $testdb
PS C:\demos>

Now to finish out our test database we will add a table using SMO objects, then populate the table with some random data. The table will have an order number, a customer name and value column. The order number column will be the PRIMARY KEY.

First of all we need to make a table. In case you don’t remember to parameters to construct a table the get-SMO_ctors function will remind you.

PS C:\demos> SMO_ctors (SMOT_Table)
Table()
Table(Database database, String name)
Table(Database database, String name, String schema)
PS C:\demos>

We are not going to work with database schemas in this blog article, I’ll save that for a later one. We’ll use the second constructor.

PS C:\demos> $orders = SMO_Table $testdb "Orders"
PS C:\demos>

Now that we have a table we need to create the columns for it. Again get-SMOctors can be used to find out what parameters we need to pass to the get-SMOColumn function.

PS C:\demos> SMO_ctors (SMOT_column)
Column()
Column(SqlSmoObject parent, String name)
Column(SqlSmoObject parent, String name, DataType dataType)
PS C:\demos>

We can create a column, specify it name and type in one operation. We will need to make a DataType, so let’s check what the constructor options are for it.

PS C:\demos> smo_ctors (SMOT_DataType)
DataType()
DataType(SqlDataType sqlDataType)
DataType(SqlDataType sqlDataType, Int32 precisionOrMaxLength)
DataType(SqlDataType sqlDataType, Int32 precision, Int32 scale)
DataType(SqlDataType sqlDataType, String type)
DataType(SqlDataType sqlDataType, String type, String schema)
DataType(XmlSchemaCollection xmlSchemaCollection)
DataType(UserDefinedDataType userDefinedDataType)
DataType(UserDefinedType userDefinedType)

This, in turn requires us to make a SqlDataType, which is an enum. We can look to see what the possible enumerated values are for this too.

PS C:\demos> SMO_enum (SMOT_SqlDataType)
value__
None
BigInt
Binary
Bit
Char
DateTime
Decimal
Float
Image
Int
Money
NChar
NText
NVarChar
NVarCharMax
Real
SmallDateTime
SmallInt
SmallMoney
Text
Timestamp
TinyInt
UniqueIdentifier
UserDefinedDataType
UserDefinedType
VarBinary
VarBinaryMax
VarChar
VarCharMax
Variant
Xml
SysName
Numeric
PS C:\demos>

No big suprise here, it has all the SQL datatypes were are use to using in SQL Server. Now we can make some columns and add them to our $orders table.

PS C:\demos> $orders = SMO_Table $testdb "Orders"
PS C:\demos> $order_number = SMO_Column $orders "Order Number"      
    (SMO_DataType "Int")
PS C:\demos> $order_number = SMO_Column $orders "Order Number" 
    (SMO_DataType "Int")
PS C:\demos> $orders.Columns.Add($order_number)
PS C:\demos> $orders.Columns.Add($customer_name)
PS C:\demos> $value = SMO_Column $orders "Value"
    (SMO_DataType "Money")
PS C:\demos> $orders.Columns.Add($value)

First we create a table named “Orders”. A reference to the table is in variable $orders. Each column is made using the SMOColumn function. Note that the datatype for the column is defined using the SMODataType function.

Now that we have our table we need to make the $order_number column a primary key.

PS C:\demos> $pk = SMO_Index $orders "Orders__PK"
PS C:\demos> smo_enum (SMOT_IndexKeyType)
value__
None
DriPrimaryKey
DriUniqueKey
PS C:\demos> $pk.IndexKeyType=(SMOT_IndexKeyType)::DriPrimaryKey
PS C:\demos> $inx_col = SMO_IndexedColumn $pk "Order Number"
PS C:\demos> $pk.IndexedColumns.Add($inx_col)
PS C:\demos> $orders.Indexes.Add($pk)
PS C:\demos> $orders.Create()
PS C:\demos>

The index takes a little more work. First we make an SMOIndex.and name it “OrdersPK”. An index might include more than one column and those columns are specified in SMOIndexedColumn objects. We need to make an SMOIndexed column for each column in the index. This index is only a single column, however, so we only need to create a single SMOIndexedColumn. The SMOIndexedColumn is added to the index, $pk'. Once we have added the SMO_IndexedColumn to the SMO_Index we can add the index to the$orders` table.

Lastly we call the Create() method on $orders to make the table on the server.

Now we can add some data to the table. We can make use of the System.Random class from the .NET framework to generate some of the data. There aren’t any “row” objects for tables in SMO. We just use standard T-SQL to do inserts to add data to a table.

To fill out our table we need to produce some triples that consist of an order number, customer name, and a value. We have 30 customers with 1000 orders whose value ranges from 1 to 1000.00. With PowerShell we can use a pipeline to generate triples. Here is a simple example that generates 10 random triples.

PS C:\demos> (1..10) | %{"{0} : {1} : {2}" -f $_, 
    $rand.next(1,10), $rand.next()}
1 : 5 : 2119806400
2 : 6 : 1944455664
3 : 7 : 323953648
4 : 9 : 1680451911
5 : 4 : 820794382
6 : 4 : 1793724046
7 : 8 : 1206229307
8 : 5 : 1466323995
9 : 6 : 1123175179
10 : 1 : 1559943559
PS C:\demos>

The first part of the pipeline is an array, initialized with values from 1 to 10. The second part of the pipeline formats the numbers from the array along with some random numbers. Now we will use this technique generate the T-SQL insert statments. To get started let’s make a helper function that will create the insert statement from three input parameters.

PS C:\demos> function get-insert
>> ($number, $customer, $value)
>> {
>> "INSERT INTO Orders Values ({0}, '{1}', {2})" -f
>> $number, $customer, $value
>> }
>>
PS C:\demos> insert 1 "joe" 102.32
INSERT INTO Orders Values (1, 'joe', 102.32)
PS C:\demos>

The get-insert function uses the -f operator to build an insert statement from the three parameters passed into it.

PS C:\demos> (1..10) |
    %{insert $_ ("Name_" + $rand.next(1,30)) 
    ($rand.next(1,1000000)/100.1)}
INSERT INTO Orders Values (1, 'Name_15', 1127.23276723277)
INSERT INTO Orders Values (2, 'Name_12', 9753.04695304695)
INSERT INTO Orders Values (3, 'Name_23', 9560.27972027972)
INSERT INTO Orders Values (4, 'Name_15', 2843.98601398601)
INSERT INTO Orders Values (5, 'Name_13', 2798.22177822178)
INSERT INTO Orders Values (6, 'Name_13', 4531.40859140859)
INSERT INTO Orders Values (7, 'Name_11', 4825.18481518482)
INSERT INTO Orders Values (8, 'Name_26', 8539.9000999001)
INSERT INTO Orders Values (9, 'Name_22', 9863.88611388611)
INSERT INTO Orders Values (10, 'Name_10', 4556.79320679321)
PS C:\demos>

Here we have used the get-insert function generate ten insert’s. Next we need a way to execute those insert statements. A SMO database object has a number methods that can execute a SQL statement, much like a SqlCommand does in ADO.NET. In this case we will use the ExecuteNonQuery method of $database to insert 1000 orders.

PS C:\demos> (1..1000) | 
    %{insert $_ ("Name_" + $rand.next(1,30))
    ($rand.next(1,1000000)/100.1)} | 
    %{$testdb.ExecuteNonQuery($_)}
PS C:\demos>

We put an array with a thousand numbers into a pipeline, each number is used as an order number. The customer name is just the string “name_” with a suffix that is a random number from 1 to 30. And then for the value we generate a random number. These constructed values are passed into the get-insert function to make an insert statement. The insert statement is then used a the parameter to the ExectureNonQuery method on $testdb.

We can see the row count of the table is now 1000.

PS C:\demos> $orders.refresh()
PS C:\demos> $orders.rowcount
1000
PS C:\demos>

Note that before we check the rowcount we refresh the $orders table.

To finish up we will make use of the fact that can also execute a select statement against the Orders database.

PS C:\demos> $dataset = $testdb.ExecuteWithResults(
    "SELECT * FROM Orders")
PS C:\demos> $table = $dataset.tables[0]
PS C:\demos> $table.Rows | format-table -autosize
Order Number Customer Name     Value
------------ -------------     -----
           1 Name_29       1837.6124
           2 Name_27       3723.3666
           3 Name_1        3221.3886
...

The ExecuteWithResults method of $database returns an ADO Dataset. We can extract the rows from the first table in the Dataset and pipe them into a PowerShell format-table command to see what they contain.

We have seen that we can use PowerSMO! to create a test database, add tables to it and then fill these tables with some random data. Along the way we have seen how to use PowerSMO! to do a number of typical database operation like checking out what logins exist or adding users to database.

Well that’s it for this article, there will be more later.

Dan dan@pluralsight.com

posted @ 8:33 PM | Feedback (2)

Tuesday, November 07, 2006

Last year I wrote a blog article about using what was then called MSH with SQL Server Management Objects http://pluralsight.com/blogs/dan/archive/2005/12/29/17703.aspx. MSH is now called PowerShell and mixing some SMO with it makes PowerSMO!

SMO is a set of object models for SQL Server. With PowerSMO! you can manipulate those object models from the command line or with a script. The two object models probably of most interest to SQL Server developers and DBA’s are the Server and ManagedComputer object models and that’s what this article is going to use them to show how to use PowerSMO!.

This article assumes you are reasonably familiar with SQL Server and have some familiarity with PowerShell. You can look at some of the previous articles I have blogged about PowerShell to get familiar with PowerShell.

There are a number of reasons for using PowerSMO! You can script the management, testing, status, etc. of database objects. PowerSMO! also has pedagogical value too. I’ve been using PowerShell + SMO in the classes I teach because it provides such a quick way too cobble up ad hoc examples. It’s useful in development for the same reason, you can quickly try something out before you commit the time to edit/compile/test a C# program.

To use PowerSMO! you need to start up PowerShell then run the InitPowerSMO.ps1 script. This script is available at http://www.pluralsight.com/dan/samples/InitPowerSMO.zip.

Windows PowerShell
Copyright (C) 2006 Microsoft Corporation. All rights reserved.
PS C:\demos> . "C:\demos\InitPowerSMO.ps1"
PS C:\demos>

This script loads the SMO assemblies into PowerShell and a number of other things that we will look at later.

Let’s start with some basics. The new-object cmd-let is used to make an instance of a .NET class. SMO has a class named ManagedComputer. An instance of ManagedComputer represents the root of one of the object models in SMO, the one that is used to manage the services provided by SQL Server, such as the database engine or full-text search. BTW, watch out for the line-wraps in the examples that follow.

PS C:\demos> $mc = new-object "Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer"
PS C:\demos> $mc.services | %{$_.name}
MSFTESQL
MSSQLSERVER
SQLBrowser
SQLSERVERAGENT
PS C:\demos>

Here we used an instance of the ManagedComputer class find the SQL Server services on the local computer. We piped these services to a script that prints out the name of the service. Looks like a pretty typical SQL Server installation.

This is handy but, as you can see, it is a bit tedious dealing with the full name of the class that new-object requires. PowerShell deals with this issue in a number of places by using what it calls a “type accelerator” or “type alias”.

For example [wmi] is a type accelerator for System.Management.ManagementObject and you can use it instead of the fully spelled out class name when you need to specify that type. Unfortunately there is no provision in PowerShell for creating your own type aliases. PowerShell supports user defined aliases for cmdlets, functions, etc. but these are not useful as type accelerators.

To save on typing PowerSMO! creates helper functions that can be used to create instances of the classes used by SMO. One of the reasons you see a delay when you run the initPowerSMO.ps1 script is that it has to create all these functions. Below is a script that uses one of these functions and duplicates the operation of the previous example.

PS C:\demos> $mc = Get-SMO_ManagedComputer
PS C:\demos> $mc.Services | %{$_.name}
MSFTESQL
MSSQLSERVER
SQLBrowser
SQLSERVERAGENT
PS C:\demos>

All of these helper functions have a preface of “Get-SMO_” followed by the name of the class that they create.

All that the Get-SMO_ helper functions and new-object are doing is running the constructor for the class requested. In the previous two samples we used the parameter-less constructor to make a ManagedComputer object. When you do this the managed computer is the one PowerShell is running on.

One of the nice things about PowerShell is that you can always ask what kinds of things an object can do by piping it to the get-member cmdlet. For example:

PS C:\demos> $mc | get-member

   TypeName: Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer
Name                   MemberType Definition
----                   ---------- ----------
Equals                 Method     System.Boolean Equals(Object obj)
GetHashCode            Method     System.Int32 GetHashCode()
GetSmoObject           Method     Microsoft.SqlServer.Management.Smo.Wmi.Wmi...
...

PowerShell does not have the same level of builtin “What can you do for me?” capability for the constructors of .NET classes. .NET classes often have a number of constructors to choose from and it is often hard to remember them all. PowerSMO! has some helper functions to make it easier to see what constructors there are for a class.

PS C:\demos> Get-SMO_ctors (Get-SMOT_ManagedComputer)
ManagedComputer()
ManagedComputer(String machineName)
ManagedComputer(String machineName, String userName, String password)
ManagedComputer(String machineName, String userName, String password, 
ProviderArchitecture providerArchitecture)
PS C:\demos>

The Get-SMO_ctors function lists the constructors for the type passed, in parentheses to it. For every Get-SMO_ function there is a Get-SMOT_ that returns its type definition. The Get-SMO_ctors function requires a type definition, not the function that makes an instance of the type. Actually the Get-SMO_ctors function will work for any .NET type definition.

PS C:\demos> Get-SMO_ctors ([System.String])
String(Char* value)
String(Char* value, Int32 startIndex, Int32 length)
String(SByte* value)
String(SByte* value, Int32 startIndex, Int32 length)
String(SByte* value, Int32 startIndex, Int32 length, Encoding enc)
String(Char[] value, Int32 startIndex, Int32 length)
String(Char[] value)
String(Char c, Int32 count)
PS C:\demos>

The example above shows using the Get-SMO_ctors function to find the various constructors for System.String class from .NET.

All of the Get-SMO_ function will accept parameters for a constructor. You pass the parameters on the command line as you would for any other PowerShell function.

PS C:\demos> $mc = Get-SMO_ManagedComputer "PARSEC5"
PS C:\demos>

In the example above $mc is attached to a computer name “PARSEC5”.

Some constructors you will want to use require multiple parameters. The ServerConnection class serves a purpose similar to that for a SqlConnection in ADO.NET

PS C:\demos> Get-SMO_ctors (Get-SMOT_ServerConnection)
ServerConnection(SqlConnectionInfo sci)
ServerConnection(SqlConnection sqlConnection)
ServerConnection()
ServerConnection(String serverInstance)
ServerConnection(String serverInstance, String userName, String password)
ServerConnection(String serverInstance, String userName, SecureString password)
PS C:\demos>

Here you can see that there is a constructor that lets you can create a ServerConnection using a SQL login. It needs the name and password for that login.

PS C:\demos> $sc = Get-SMO_ServerConnection "PARSEC5" "ambler" "Ambler"
PS C:\demos> $sc.ConnectionString
server='PARSEC5';uid='ambler';password='Ambler';
multipleactiveresultset =false
PS C:\demos>

As you can see the ServerConnection just uses the user name and password to build the connection string it will need to connect to SQL Server.

A note about the Get-SMO_ function that is a bit different than using the new-object cmdlet. The constructor parameters for the Get-SMO_ functions are separated by spaces, unlike passing constructor parameters to the new-object cmdlet which requires them to be separted by commas.

Ok, let’s use PowerSMO! to create a database. To do this we will need to make use of two classes of SMO object, a Server and a Database. We will start by looking at the constructors for each.

PS C:\demos> Get-SMO_ctors (Get-SMOT_Server)
Server(ServerConnection serverConnection)
Server(String name)
Server()
PS C:\demos> Get-SMO_ctors(Get-SMOT_Database)
Database()
Database(Server server, String name)
PS C:\demos>

We see that to construct a Server can pass in the name of the instance of SQL Server we want to use, and for the Database we will have to pass in a reference to the Server and a the name we want for the new database.

PS C:\demos> $s = Get-SMO_Server "PARSEC5"
PS C:\demos> $db = Get-SMO_Database $s "DBDemoSample"
PS C:\demos> $db.create()
PS C:\demos> $s.Databases["DBDemoSample"].Name
DBDemoSample
PS C:\demos>

Here we have made the Server object for the default SQL Server instance on machine “PARSEC5”. Next we make the Database object by passing the variable that holds the Server object into the database constructor. Just to confirm it actually worked we look at the databases collection in the server and see if we can find a database named “DBDemoSample” and then print out its name. If there was no database named “DBDemoSample” this line would not have printed out anything.

Now let’s do a little database management. Typically when a database is created its recovery model is “Full” This is going to be a development database so we want to change its recovery mode to “Simple”. We will start by confirming our suspicions about its recovery mode.

PS C:\demos> $db = $s.DataBases["DBDemoSample"]
PS C:\demos> $db.DatabaseOptions.RecoveryModel
Full
PS C:\demos>

We use $s to get a reference to the “DBDemoSample” database. From $db we can look at the recovery model in database options and see that, in fact, it is “Full”.

The value of RecoverModel is one of the enumerated values from the Microsoft.SqlServer.Management.Smo.RecoveryModel enum. You can always find the possible values of an enum by making use of .NET reflection in PowerShell.

PS C:\demos> [Microsoft.SqlServer.Management.Smo.RecoveryModel].GetFields() | %{
$_.Name}
value__
Simple
BulkLogged
Full
PS C:\demos>

Again, similar to using new-object, you must type out the entire class name to find the enumerated values of an enum. PowerSMO! provides another helper function for finding the enumerated values.

PS C:\demos> Get-SMO_enum (Get-SMOT_RecoveryModel)
value__
Simple
BulkLogged
Full

The Get-SMO_emum function finds the enumerated values of an enum. It is similar to the Get-SMO_ctors function in that it requires a class definition, not the function that makes an instance. In this example we use the Get-SMOT_RecoveryModel to get the class definition.

Using an enumerated value once you know what it is, is fairly easy. You can just set it as though is were a string value and PowerShell will convert it to the appropriate enumeration.

PS C:\demos> $db.DatabaseOptions.RecoveryModel="Simple"
PS C:\demos> $db.Alter()
PS C:\demos>

Setting a value in a field of a SMO object does not affect the database behind it until you call the Alter() function. After the Alter() function completes the database has the new recovery mode.

Note you need not worry about setting an improper enumeration value, PowerShell will check that for you, as you can see below.

PS C:\demos> $db.DatabaseOptions.RecoveryModel = "Fullx"
Exception setting "RecoveryModel": "Cannot convert value "Fullx" to type "Micro
soft.SqlServer.Management.Smo.RecoveryModel" due to invalid enumeration values.
 Specify one of the following enumeration values and try again. The possible en
umeration values are "Full, BulkLogged, Simple"."
At line:1 char:21
+ $db.DatabaseOptions.R <<<< ecoveryModel = "Fullx"
PS C:\demos>

Now you have seen PowerSMO!. It consists of functions, prefaced by Get-SMO_ that you can use to create instances of SMO objects. It also includes Get-SMOT_ functions that you can use to retrieve the type definition for a particular kind of SMO object. The Get-SMO_ctors function gives you a list of the constructors for a class and the Get-SMO_enum function give you a list of the values for an enum.

If you take a peek inside of the initPowerSMO.ps1 file, except for the Get-SMO_enum and Get-SMO_ctors functions, you won’t see any of the functions we have used in this article. That’s because the initPowerSMO.ps1 script generates these functions from the assemblies that makeup SMO. How it does this will be the topic of a later blog article.

Dan

posted @ 1:55 PM | Feedback (10)

Monday, October 30, 2006

PowerShell has builtin support for XML, but the System.Xml namespace offers many additional capabilites for processing XML. This article looks at using System.Xml in PowerShell. This article assumes you know some of the basics of PowerShell programming and are familiar with the System.Xml namespace in .NET.

First of all the [xml] data type variable in PowerShell is an instance of an XmlDocument. Typically an [xml] variable is used by assigning a variable to it. For example:

PS C:\demos> [xml]$xmldata = "<order><line price='100' qty='3'>hammer</line></order>"

We are going to look at a different way to load XML into an [xml] variable.

PS C:\demos> $xmldata = new-object "System.Xml.XmlDocument"
PS C:\demos> $xmldata.LoadXml("<order><line price='100' qty='3'>hammer</line></order>")
PS C:\demos>

This way of initializing a variable produces the same result as assigning a string to an [xml] variable, but does not create a new instance of an XmlDocument. We don’t want to create a new instance of an XmlDocument so we can leverage XPathNavigators and XPathExpressions but that will become evident later.

You can get back the XML in text form by using the get_InnerXml() method.

PS C:\demos> $xmldata.get_InnerXml()
<order><line price="100" qty="3">hammer</line></order>
PS C:\demos>

If you were programming in C# you would use the InnerXml property of the $xmldata variable to retrieve the InnerXml, but you must use the underlying get_InnerXml method in PowerShell to do the same thing. This will be the case for the other properties in XmlDocument too, the properties are not available by just using their name as the are in C#.

The root of an XML document is called the DocumentElement and for our document the name the root element is “order”. You can get a reference to it through the DocumentElement property and use its Name property to find its name.

PS C:\demos> $xmldata.get_DocumentElement().get_Name()
order
PS C:\demos>

Here we used the get_DocumentElement method to get the DocumentElement and the get_Name method to get its name.

You can modify an XmlDocument by adding or removing XML nodes. A node is part of XML, for example an element is a node as is an attribute. Let’s add another line to our order.

PS C:\demos> $line = $xmldata.CreateElement("line")
PS C:\demos> $line.SetAttribute("price", 23)
PS C:\demos> $line.SetAttribute("qty", 4)
PS C:\demos> $line.set_InnerText("nail")
PS C:\demos> $d =$xmldata.get_DocumentElement().AppendChild($line)
PS C:\demos> $xmldata.get_InnerXml()
<order><line price="100" qty="3">hammer</line><line price="23" qty="4">nail</line></order>
PS C:\demos>

Elements for an XmlDocument are not created using a constructor. Instead a technique called a factory method is used to create them. This is typical of almost all XML processors on any platform. An XmlDocument contains factory methods to create the various kinds of nodes you find in an XmlDocument. To create a new line element we use the XmlDocument.CreateElement factory method.

An XmlElement has a SetAttibute that is used to add attibutes to that element. We use the SetAttribute to add a “price” and “qty” attibute to the line element we created. SetAttribute is really a shortcut method. We could use CreateAttribute and SetAttributeNode instead, but SetAttribute is more straightforward.

The content of the new line element is set using the set_InnerText method. Again, if you were programming in C# you would assigning the InnerText property a value, but for an XmlDocument you must use the set_InnerText method. This will be true for the other assignable properties in XmlDocument too.

Creating a element using a factory method does not add that element to the document. We use the AppendChild method of the DocumentElement of the document to add the line. Append child always makes the added element the last child. PrependElement will also add an element but will make it the first child element. AppendChild always returns a reference to the element that was appended. To prevent that returned value from “leaking” out of the script we capture it in the dummy $d variable.

Last we use the get_InnerXml method of the XmlDocument see that we have in fact added a new line element to the document.

Now that we have a document let’s do some processing of it. Each line has a price and qty attribute and the product of these two attributes is called the extended price. The value of an order is the sum of all of the extended prices in it. So let’s calculate the value of an order. In this example we are going to use the native [xml] support built into PowerShell.

To start with let’s just calculate the extended prices.

PS C:\demos> $xmldata.order.line | %{$_.price * $_.qty}
100100100
23232323
PS C:\demos>

Somehow the results don’t really look correct. Data in an XML document my be untyped or strongly typed. Untyped doesn’t really mean the data doesn’t have a type, it just means that each piece of data is considered to be a string even if it looks like a number. Strongly typed XML is produced by validating an XML document against an XML Schema. In this case the types of the pieces of data are known because they are defined in the XML Schema. Sometimes a validated XML document is called the Post Schema Validation Instance or PSVI.

Our $xmldata XML document is untyped so the price and quantity are considered to be strings. When the ‘*’ operator is used with strings the string on its right is converted to an integer, or produces an error if it cannot be converted. The value of this integer is used to replicate and concatonate the string on the left of the operator. That is why we see 100 repeated three times, the value of qty for the first line is “3”.

We have to cast the price to a double to get what we want.

PS C:\demos> $xmldata.order.line | %{[double]$_.price * $_.qty}
300
92
PS C:\demos>

Now we can use the sum function example from my previous blog article, PowerShell and XML and SQL Server, to find the value of the order.

PS C:\demos> function sumOrder {
>> begin {$value = 0}
>> process { $value += [double]$_.price * $_.qty}
>> end {$value}
>> }
>>
PS C:\demos> $xmldata.order.line | sumOrder
392
PS C:\demos>

We can use the SelectNodes method of XmlDocument to get the same result. It makes use of an XPath expression which is a bit more flexible, though more complicated, than the dotted syntax that PowerShell provides for [xml] variables.

PS C:\demos> $xmldata.SelectNodes("//line") | sumOrder
392
PS C:\demos>

The SelectNodes method returns a set of XML nodes from a document that meet some criterion specified by the XPath expression. This particular XPath expression returns all the line elements in the document. When a set of XML nodes is put into a pipeline PowerShell passes each one of the nodes one at a time into the pipe.

One of the nice things about using XPath is that you can bury a lot of selection logic right into the XPath expression. What if we want to know the value of only the expensive items in our order? Our definition of expensive is when the price is more than 99.

PS C:\demos> $xmldata.SelectNodes("//line[@price>99]") | sumOrder
300
PS C:\demos>

Here our XPath expression has a predicate, “[@price>99]”, that filters out any lines whose value is 99 or less. The following is the equivalent using the XML capabilities built into PowerShell.

PS C:\demos> $xmldata.order.line | ?{[double]$_.price -gt 99} | sumOrder
300
PS C:\demos>

Notice that in this case it was important to cast the price as a [double] otherwise PowerShell would have taken the filter to check to see if the price lexically sorted after the string “99”.

An important thing about XPath is its universality. The XPath expression we have used in this example, technically called LocationPath, is a criterion for selection. Virtually every language and every platform supports XPath. You can pass the XPath expression we used in this example, “line[@price>99}”, to almost any other program and it will select the same lines for processing as we did in this example.

You have to be careful reading XML. For example below is an XML file that is encoded as big endian UTF-16. You can’t see the actual encoding on this page but you can download this test file from http://www.pluralsight.com/dan/samples/PSXml.zip if you want to try it out.

<?xml version="1.0" encoding="UTF-16BE"?>
<Test/>

The get-content command is a way to read the content of a file. For example you might try to read the sample file into a builtin [xml] datatype in PowerShell like this:

PS C:\demos> [xml]$x = get-content "c:\demos\testdocs\test.xml"
Cannot convert value "System.Object[]" to type "System.Xml.XmlDocument". Error:
 "Root element is missing."
At line:1 char:8
+ [xml]$x  <<<< = get-content "c:\demos\testdocs\test.xml"
PS C:\demos>

What PowerShell is doing here is reading in test.xml as string, then assigning that string to then $x variable. Unfortunately when it does this it has to make a guess about the encoding of the file because I didn’t tell it what it was and it guessed wrong. In fact if I just ask it to read the file and tell it I just don’t know the encoding it will generate a lot of unknown characters because of an incorrect guess about the encoding of the file.

PS C:\demos> get-content "c:\demos\testdocs\test.xml" -encoding Unknown
??????????????????????????????????????????????????????
PS C:\demos>

However I happen to know the encoding for the file, as I said it big endian UTF-16, so I can do this:

PS C:\demos> [xml]$x = get-content "c:\demos\testdocs\test.xml" -encoding BigEndianUnicode
PS C:\demos> $x.get_InnerXml()
<?xml version="1.0" encoding="UTF-16BE"?><Test />
PS C:\demos>

Now we are able to read our big endian UTF-16 file. But this defeats one of the most important features of XML; You can read an XML file without knowing the encoding.

Fortunately because PowerShell supports all of the .NET framework we can get around this problem and read any XML file that the underlying .NET Framework can handle without knowing its encoding.

PS C:\demos> $doc = new-object "System.Xml.XmlDocument"
PS C:\demos> $doc.Load($filePath)
PS C:\demos> $doc.get_InnerXml()
<?xml version="1.0" encoding="UTF-16BE"?><Test />
PS C:\demos>

Here we initialize $doc as an XmlDocument then use the Load function an XmlDocument to load in the file. The argument for Load is a string that can be either a file path or a URL. This is the recommended way to load an XML document into a variable because you shouldn’t depend on knowing what encoding of an XML document is.

Now that we can read XML lets process some XML files. Microsoft Word 2003 can be saved as XML. We have a few files that have been saved this way.

PS C:\demos> dir c:\demos\testdocs\*.xml
    Directory: Microsoft.PowerShell.Core\FileSystem::C:\demos\testdocs
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        10/30/2006  10:29 AM        108 betest.xml
-a---        10/30/2006   9:29 AM      24982 Test Document 1.xml
-a---        10/30/2006   9:33 AM      29195 Test Document 2.xml
-a---        10/30/2006   9:30 AM      25105 Test Document 3.xml
-a---        10/30/2006  10:05 AM        108 test.xml
PS C:\demos>

Actually some of the files in this directory are not Office documents, so we need a way to distinguish them. All Office XML files have on thing in common, they start with something called a processing instruction that looks like:

<?mso-application progid="Word.Document"?>

Let’s build a filter that will skip over the files that are not Word documents.

PS C:\demos> dir c:\demos\testdocs\*.xml | ?{$x = new-object "System.Xml.XmlDocument";
>> $x.Load($_.FullName);
>> $x.SelectSingleNode("processing-instruction('mso-application')")}
>>
    Directory: Microsoft.PowerShell.Core\FileSystem::C:\demos\testdocs
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        10/30/2006   4:29 PM      32448 Test Document 1.xml
-a---        10/30/2006   4:11 PM      29935 Test Document 2.xml
-a---        10/30/2006   9:30 AM      25105 Test Document 3.xml
PS C:\demos>

Here we use the full path name for each file to load an XmlDocument. Then we use the SelectSingleNode method of XmlDocument to see if we can find the processing instruction we are looking for. If the processing function isn’t found then the file name is not passed out of the pipe so it does not get listed.

The XPath expression we used was bit more complicated than the first one we tried. If you are interested in an interactive tool for working with XPath you can download Aaron Skonnard’s XPath expression builder from http://www.pluralsight.com/toolcontent/xpath-expression-builder-4.zip. Also these test documents can be found at http://www.pluralsight.com/dan/samples/PSXml.zip.

Lastly we would like to show that these Word documents have been processed by PowerShell. After you open a document in Word if you go to File->Properties->Custom you will see that you can add custom properties of you own design to a word document. We would like to add a PowerShell custom property that indicates when the document was processed by PowerShell. These properties are embeded into the XML for the Word document.

Another thing about Word documents that we haven’t you looked at is that they make heavy use of XML namespaces. So before we try anything with a complete Word document let’s look at simple document that has namespaces in it.

PS C:\demos> [xml]$x = '<w:wordDocument
>> xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
>> xmlns:o="urn:schemas-microsoft-com:office:office">
>> <o:CustomDocumentProperties>
>> </o:CustomDocumentProperties>
>>  </w:wordDocument>'
>>
PS C:\demos> $x.get_InnerXml()
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:o="urn:schemas-microsoft-com:office:office"><o:CustomDocumentProperties><
/o:CustomDocumentProperties></w:wordDocument>
PS C:\demos>

This document is a mini-Word document with all the things we don’t care about stripped out of it.

Custom properties for a Word document are contained in a CustomDocumentProperties element from the “urn:schemas-microsoft-com:office:office” namespace. If the Word document doesn’t have any custom properties it will not have this element. So we will need a way to check to see if that element in the document. Let’s test our mini-Word document to see verify we can find it.

PS C:\demos> $custDoc = "/*/*[local-name()='CustomDocumentProperties' and namespace-uri()='urn:schemas-microsoft-com:office:office']"
icrosoft-com:office:office']
PS C:\demos> $x | ?{$_.SelectSingleNode($custDoc)}
wordDocument
------------
wordDocument

Here we have made an XPath expression and saved it into a variable so we can easily reuse it. It looks for an element whose name is CustomDocumentProperties and is in the office namespace. We can use it in a simple filter test and see that our test document can get through the filter.

Next let’s look at adding the CustomDocumentProperites if it is not there. First of all we will need an element to add.

PS C:\demos> $props = $x.CreateElement("CustomDocumentProperties", $custDoc)
PS C:\demos> $props.get_OuterXml()
<CustomDocumentProperties xmlns="/*/*[local-name()='CustomDocumentProperties' and namespace-uri()='urn:schemas-microsoft-com:office:office']" />
PS C:\demos>

To create an element in a particular namespace we use the second parameter of the CreateElement method to specify the desired namespace. To check to see if we got what we wanted we use the OuterXml property… there is no InnerXml for an element with no content. Let’s make a test document without a CustomDocumentProperties and try adding this element.

PS C:\demos> $x2.get_InnerXml()
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"/>
PS C:\demos> if ($x2.SelectSingleNode($custDoc)){}else
>> {
>> $props = $x2.CreateElement("CustomDocumentProperties", "urn:schemas-microsoft-com:office:office")
>> $x2.get_DocumentElement().AppendChild($props)
>> }
>>
PS C:\demos> $x2.get_InnerXml()
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<CustomerDocumentProperties xmlns="urn:schemas-microsoft-com:office:office" /><
/w:wordDocument>
PS C:\demos>

Here we start off with a new XML document, $x2 that contains just a wordDocument. We use an if construct to test the $x2 to see if it contains CustomDocumentProperties element. If if it does not we create on and add it. Then we check to make sure the element was added.

This will be a useful for what we do next so let’s save it as a function.

function addCustomProps
{
$cust = $_.SelectSingleNode("/*/*[local-name()='CustomDocumentProperties' and namespace-uri()='urn:schemas-microsoft-com:office:office']")
if($cust){$cust}else{
$props = $_.CreateElement("CustomDocumentProperties", "urn:schemas-microsoft-com:office:office")
$_.get_DocumentElement().AppendChild($props)
}
}

Note that the addCustomProps function always returns a CustomDocumentProperties.

Now we have everthing we need to modify a Word document by adding a custom property to it.

PS C:\demos> $filePath = "C:\demos\testdocs\test document 1.xml"
PS C:\demos> $doc = new-object "System.Xml.XmlDocument"
PS C:\demos> $doc.Load($filePath)
PS C:\demos> $prop = $doc.CreateElement("PowerShell", "urn:schemas-microsoft-com
:office:office")
PS C:\demos> $prop.SetAttribute("dt", "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882
", "string")
string
PS C:\demos> $prop.set_InnerText([System.DateTime]::Now)
PS C:\demos> $doc | %{addCustomProps} | %{$_.AppendChild($prop)}
dt                                      #text
--                                      -----
string                                  10/30/2006 16:07:57

PS C:\demos>
PS C:\demos> $doc.Save($filePath)

We start off by setting the $filePath variable to the path of a Word document. Next we load that Word document into the $doc variable.

We use the $doc variable to create a PowerShell element, and fill it out with the current time. We also add a dt attribute to specify that this is a string property and put the PowerShell element in the “urn:schemas-microsoft.com:office:office” namespace. Both of these are required for a custom property added to a Word document.

Finally we pass the $doc property through a pipeline our addCustomProps function. This function always returns the CustomDocumentProperty element so we can use the next segment of the pipeline to append our PowerShell property to it.

If you now open the “test document 1.xml” file in Word and navigate to its custom properties you will see that is now has a PowerShell property.

So we can use the full set of features available from the System.Xml namespace in .NET. The key to really making use of the is to become familiar with XPath. We really have just scratched the surface of its capabilities.

Dan

dan@pluralsight.com

posted @ 3:09 PM | Feedback (3)

Sunday, October 29, 2006

Lots of times when you are working with a SQL Server all you want to do is to poke at the data. You aren’t doing any real heavy duty transactional processing, you just want to get a look at what is going on. Of course you can fire up SSMS (SQL Server Management Studio) or SqlCmd and issue some ad hoc queries. Or you can copy all the tables you are interested in to a local instance of SQL Server on your system and do the poking there. Doing the latter is often times a lot more convienient but you need a copy of SQL Server locally and then you have to copy all the data, yadda, yadda, yadda.

Actually it’s pretty common to take the “work on the data locally” approach in C# applications. These applications use ADO.NET to build a DataSet in the application, then data bind, or add a little procedural programming or otherwise poke at it. But writing, compiling and running an ad hoc application each time you want to make a little change isn’t really practical. What does make this technique practical though is PowerShell.

In this article we’re going to look at the basics of using ADO.NET inside of PowerShell. This article assumes you know the basics of using ADO.NET in C#.

We will start with the DataSet. A DataSet is just a fancy container, just a bit more fancy than a hash table. It has tables and views and relations and queries sort of like a database does. You can use a DataSet without a database, but if you are going to use it with a database then the SqlDataAdapter is a handy way to make a snapshot of some data and save it into a dataset. Let’s start by using PowerShell to get a snapshot of the data in the Customers table of the Northwind database.

First of all we will need a query that returns all the data from the Customers table.

$query = "SELECT * FROM [Customers]"

We will also need a connection string. This one assumes a local SQL Server.

$connString = "server=.;integrated security;database=northwind"

Now we will need a DataSet to copy the customers data into. PowerShell does not initialize a variable just because it is declared so we will have to make an instance of the DataSet class.

$dataset = new-object "System.Data.DataSet" "MyDataSet"
PS C:\demos> $dataset
RemotingFormat          : Xml
SchemaSerializationMode : IncludeSchema
CaseSensitive           : False
DefaultViewManager      : {System.Data.DataViewManagerListItemTypeDescriptor}
EnforceConstraints      : True
DataSetName             : MyDataSet
Namespace               :
Prefix                  :
ExtendedProperties      : {}
HasErrors               : False
IsInitialized           : True
Locale                  : en-US
Site                    :
Relations               : {}
Tables                  : {}
Container               :
DesignMode              : False

We use the new-object command to make an instance of a DataSet. The second string passed to new-object is the name we want for the DataSet. Typing the name of a variable on the command line in PowerShell returns the properties of that object, in most cases. Here we see that the dataset has a DataSetName of “MyDataSet” as we wanted and that the Tables collection is empty.

Next we need a SqlDataAdapter so we can fill our dataset.

PS C:\demos> $da = new-object "System.Data.SqlClient.SqlDataAdapter" ($query, $connString)

Here we have used the new-object command again and passed into the SqlDataAdapter constructor both the query and the connection string. Note that when want to pass more than one parameter to a constructor for a .NET class in PowerShell, you must pass all the parameters in a single array of objects. That is why $query and $connString have been passed separated by commas and enclosed in parentheses. A comma separted list enclosed in parentheses in PowerShell creates an array of objects.

We can check to see that our DataAdapter was properly constructed by looking at the text for the SelectCommand and the ConnectionString for the connection the SelectCommand uses.

PS C:\demos> $da.SelectCommand.CommandText 
SELECT * FROM [Customers]
PS C:\demos> $da.SelectCommand.Connection.ConnectionString
server=.;integrated security=true;database=northwind

Now we have everything we need to get our snapshot.

PS C:\demos> $da.Fill($dataset)
92

All we did was to call the Fill method on $da and pass in our $dataset variable. Once we have done this we can take a look at what is changed about our dataset.

PS C:\demos> $dataset

RemotingFormat          : Xml
SchemaSerializationMode : IncludeSchema
CaseSensitive           : False
DefaultViewManager      : {System.Data.DataViewManagerListItemTypeDescriptor}
EnforceConstraints      : True
DataSetName             : MyDataSet
Namespace               :
Prefix                  :
ExtendedProperties      : {}
HasErrors               : False
IsInitialized           : True
Locale                  : en-US
Site                    :
Relations               : {}
Tables                  : {}
Container               :
DesignMode              : False

Nothing has really changed. Notice that the Table container is still empty. Don’t worry about this, is just an artifact of how PowerShell treats DataTables. If we ask how many DataTables are in $dataset we will see it contains one, as we would expect.

PS C:\demos> $dataset.Tables.Count
1
PS C:\demos>

It would be nice of the Tables container listed the names of the tables in the DataSet, but it doesn’t so we will have to write a little script to see what is there.

PS C:\demos> $dataset.Tables | %{$_.TableName}
Table

The Tables property of the $dataset is a container, so if we pass it into a pipe it sends each of its tables down the pipe one at a time. The script inside of %{} is executed once for each item that passes through the pipe and $_ is the item being passed. $_.TableName gets the name of each table that passes through the pipe. Of course there is only one in this case.

As expected, the first table that a DataAdapter loads into a DataSet is named “Table”. The Tables collection acts like an array so we can access the individual tables in the Tables collection and set their names to whatever we want.

PS C:\demos> $dataset.Tables[0].TableName = "Customers"
PS C:\demos> $dataset.Tables | %{$_.TableName}
Customers

If we want to make it a bit easier to work with the Customers table we can pull out a reference to if from the DataSet.

PS C:\demos> $customers = $dataset.Tables["Cu