I've been working on a project recently where I needed to randomly shuffle all of the rows in a DataTable. I wanted to shuffle the DataTable itself rather than randomize while populating it, for a couple of reasons: 1) I wanted to keep the DataTable in memory and shuffle it in place multiple times without going back to the source, and 2) the data was coming from multiple sources (SQL and XML), so I preferred to keep the randomization logic in one place. I also didn't want to copy all of the data (even though it was not a large amount) each time I shuffled, so I decided to use a DataView to present the data in shuffled order each time I needed it.

Here's the utility function I came up with - each time you call RandomizeDataTable it will return a newly shuffled DataView of all the data passed in through the DataTable. Note that because I reuse the added column "rndSortId" each time, any DataViews retrieved from previous calls to the method will have the new shuffle order. You could change this behavior by adding a new column each time with its own unique sort sequence.

As always, comments/improvements welcome – enjoy!

using System;
using System.Data;

public static class DataSetUtilities
{
    static Random _rand = new Random();

    public static DataView RandomizeDataTable(DataTable dt)
    {
        // Create array of indices and populate with ordinal values
        int[] indices = new int[dt.Rows.Count];
        for (int i = 0; i < indices.Length; i++)
            indices[i] = i;

        // Knuth-Fisher-Yates shuffle indices randomly
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int n = _rand.Next(i + 1);
            int tmp = indices[i];
            indices[i] = indices[n];
            indices[n] = tmp;
        }

        // Add new column to data table (if it's not there already)
        // to store shuffle index
        if (dt.Columns["rndSortId"] == null)
            dt.Columns.Add(new DataColumn("rndSortId", typeof(int)));
        int rndSortColIdx = dt.Columns["rndSortId"].Ordinal;
        for (int i = 0; i < dt.Rows.Count; i++)
            dt.Rows[i][rndSortColIdx] = indices[i];

        // Return a view over the same rows, sorted by the shuffle index
        DataView dv = new DataView(dt);
        dv.Sort = "rndSortId";
        return dv;
    }
}
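
The variation mentioned earlier, where each call adds its own sort column so that previously returned DataViews keep their order, might look roughly like the sketch below. This is only an illustration under stated assumptions: the class name PerViewShuffle, the method name RandomizeDataTableNewColumn, and the "rndSortId_N" column naming scheme are all hypothetical and not part of the original utility.

```csharp
using System;
using System.Data;

public static class PerViewShuffle
{
    static readonly Random _rand = new Random();

    // Hypothetical counter used to build a fresh column name per call
    static int _nextColumnId = 0;

    public static DataView RandomizeDataTableNewColumn(DataTable dt)
    {
        // Pick a column name that is not yet used on this table
        string colName;
        do
        {
            colName = "rndSortId_" + _nextColumnId++;
        } while (dt.Columns.Contains(colName));

        DataColumn col = dt.Columns.Add(colName, typeof(int));

        // Same Knuth-Fisher-Yates shuffle of ordinal indices as above
        int[] indices = new int[dt.Rows.Count];
        for (int i = 0; i < indices.Length; i++)
            indices[i] = i;
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int n = _rand.Next(i + 1);
            int tmp = indices[i];
            indices[i] = indices[n];
            indices[n] = tmp;
        }

        // Store this call's shuffle order in its own column
        for (int i = 0; i < dt.Rows.Count; i++)
            dt.Rows[i][col] = indices[i];

        // The view sorts on the column that belongs to this call only
        DataView dv = new DataView(dt);
        dv.Sort = colName;
        return dv;
    }
}
```

The trade-off is that the table gains one int column per call, so this only makes sense if you need a handful of independent orderings alive at the same time.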

posted on Wednesday, April 16, 2008 7:38 AM

  • # re: Randomizing rows in a DataTable
    Mike
    Posted @ 4/16/2008 8:49 AM
    Did you consider just assigning a GUID to the rndSortId column and sorting on that column?

    Because GUIDs are random you should get a fast, simple, and random sort.
  • # re: Randomizing rows in a DataTable
    Fritz Onion
    Posted @ 4/16/2008 8:57 AM
    Ha - great point Mike - I don't actually need to shuffle at all, I just need a decent distribution of random numbers in the shuffle column. That's why I post these things :)
    Here's what I'll do instead I think:
    dt.Rows[i][rndSortColIdx] = _rand.Next(int.MaxValue);

    No need to waste the space of a full GUID really.
  • # re: Randomizing rows in a DataTable
    Jason Coyne
    Posted @ 4/16/2008 3:37 PM
    Coding Horror made an excellent post on this deceptively simple problem. Depending on why you want numbers to be random, the formula used to generate the random numbers is very important.

    http://codinghorror.com/blog/archives/001015.html

    The GUID might be better than just the rand.Next, especially since GUIDs will never have the same value show up twice. I bet your solution will end up with at least one duplicate number in a large enough run, and then the sorting would fall back to some other order that is probably deterministic. If the randomness is for something like a gambling site, where having anything deterministic could be a security flaw, that would be a big issue.
  • # re: Randomizing rows in a DataTable
    Fritz Onion
    Posted @ 4/16/2008 6:28 PM
    Jason - that's a good point, and if you're using this technique for something where true randomness really matters, be advised. However, for my purposes the simplest solution suffices (I'm just re-ordering UI elements for presentation).
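
For reference, the random-key idea from the comments above, combined with a guard against the duplicate keys Jason mentions, could be sketched as follows. The class name RandomKeyShuffle and the method name RandomizeWithKeys are hypothetical, and the duplicate check via HashSet is an assumption of this sketch rather than something proposed in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RandomKeyShuffle
{
    static readonly Random _rand = new Random();

    public static DataView RandomizeWithKeys(DataTable dt)
    {
        // Reuse the same sort column as the original utility
        if (!dt.Columns.Contains("rndSortId"))
            dt.Columns.Add("rndSortId", typeof(int));
        int rndSortColIdx = dt.Columns["rndSortId"].Ordinal;

        // Track keys already handed out so no two rows tie in the sort
        HashSet<int> used = new HashSet<int>();
        foreach (DataRow row in dt.Rows)
        {
            int key;
            do
            {
                key = _rand.Next(); // non-negative, less than int.MaxValue
            } while (!used.Add(key)); // Add returns false for a duplicate
            row[rndSortColIdx] = key;
        }

        DataView dv = new DataView(dt);
        dv.Sort = "rndSortId";
        return dv;
    }
}
```

Because every key is distinct, the sort never has to fall back to another ordering to break ties. System.Random is still not a cryptographic generator, so this remains suitable for things like reordering UI elements and not for cases where predictability would be a security problem.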