
I've been working on a project recently where I had the need to randomly shuffle all of the rows in a DataTable. I wanted to do it with the DataTable itself instead of in the act of populating the DataTable for a couple of reasons: 1) I wanted to keep the DataTable in memory and shuffle it in place multiple times without going back to the source, and 2) I had multiple sources where data was coming from (SQL and XML) so I preferred to keep the randomization logic in one place. I also didn't want to copy all of the data (even though it was not a large amount) each time I shuffled, so I decided to use a DataView to display the data shuffled each time I needed it.
Here's the utility function I came up with - each time you call RandomizeDataTable it will return a newly shuffled DataView of all the data passed in through the DataTable. Note that because I reuse the added column "rndSortId" each time, any DataViews retrieved from previous calls to the method will have the new shuffle order. You could change this behavior by adding a new column each time with its own unique sort sequence.
As always, comments/improvements welcome – enjoy!
public static
class
DataSetUtilities
{
static
Random _rand = new
Random();
public
static
DataView RandomizeDataTable(DataTable dt)
{
// Create array of indices and populate with ordinal values
int[] indices = new
int[dt.Rows.Count];
for (int i = 0; i < indices.Length; i++)
indices[i] = i;
// Knuth-Fisher-Yates shuffle indices randomly
for (int i = indices.Length - 1; i > 0; i--)
{
int n = _rand.Next(i + 1);
int tmp = indices[i];
indices[i] = indices[n];
indices[n] = tmp;
}
// Add new column to data table (if it's not there already)
// to store shuffle index
if (dt.Columns["rndSortId"] == null)
dt.Columns.Add(new
DataColumn("rndSortId", typeof(int)));
int rndSortColIdx = dt.Columns["rndSortId"].Ordinal;
for (int i = 0; i < dt.Rows.Count; i++)
dt.Rows[i][rndSortColIdx] = indices[i];
DataView dv = new
DataView(dt);
dv.Sort = "rndSortId";
return dv;
}
}
posted on Wednesday, April 16, 2008 7:38 AM
-
# re: Randomizing rows in a DataTable
Posted @ 4/16/2008 8:49 AM
Did you consider just assing a GUID to the rndSortId column and Sorting on that column?
Because GUIDs are random you should get a fast, simple, and random sort.
-
# re: Randomizing rows in a DataTable
Posted @ 4/16/2008 8:57 AM
Ha - great point Mike - I don't actually need to shuffle at all, I just need a decent distribution of random numbers in the shuffle column. That's why I post these things :)
Here's what I'll do instead I think:
dt.Rows[i][rndSortIdx] = _rand.Next(int.MaxValue);
No need to waste the space of a full GUID really.
-
# re: Randomizing rows in a DataTable
Posted @ 4/16/2008 3:37 PM
Coding Horror made an excellent post on this deceptively simple problem. Depending on why you want numbers to be random, the formula used to generate the random numbers is very important.
http://codinghorror.com/blog/archives/001015.html
The GUID might be better than just the rand.Next, especially since the GUIDs will never have the same number show up twice, but your solution I bet will end up with at least one duplicate number in a run of any given size, and then the sorting would default to some other order, that is probably deterministic. If the randomness is something like a gambling site, where having anything deterministic could be a security flaw, that would be a big issue.
-
# re: Randomizing rows in a DataTable
Posted @ 4/16/2008 6:28 PM
Jason - that's a good point, and if you're using this technique for something where true randomness really matters, be advised. However, for my purposes the simplest solution suffices (I'm just re-ordering UI elements for presentation).
|