Selecting and Transforming Data with LINQ
Nov 14, 2018 • 6 Minute Read
The Problem This Solves
Before LINQ came out with C# 3.0, we wrote a LOT of code like this:
public MyEmployeeList Method()
{
MyEmployeeList aaEmployees = new MyEmployeeList();
foreach(var record in records)
{
if (record.Fields["GroupType"].Substring(0, 2) == "AA")
{
var newEmployee = new Employee();
newEmployee.Id = int.Parse(record.Fields["employee_id"].ToString());
newEmployee.FirstName = record.Fields["first_name"];
newEmployee.LastName = record.Fields["last_name"];
aaEmployees.AddEmployee(newEmployee);
}
}
return aaEmployees;
}
Here's what's happening here:
- We're creating an empty list of employees named aaEmployees.
- We're looping through a collection of records, that may be returned from a database or another data source.
- We're determining whether the record's group type field begins with "AA" – otherwise, we want to skip the record.
- If it did match "AA", we create a new employee record and set its properties to match the values in the record.
- Then we add the new employee object to our Employee list.
- Once we're looping, we return the collection of employee objects, filtered to only have employees with a GroupType that starts with "AA".
Compare that code to this:
return from record in records
where record.Fields["GroupType"].Substring(0, 2) == "AA"
select new Employee
{
Id = int.Parse(record.Fields["employee_id"].ToString()),
FirstName = record.Fields["first_name"],
LastName = record.Fields["last_name"]
};
A picky reader will observe that the return type is different between the two - the first returns a MyEmployeeList and the second an IEnumerable - but it's clear that life is better working with the second, more fluent syntax. I have a grammar checker which is smart enough to tell code from English - except with LINQ, where it attempts to fix my grammar, mistaking it for plain English. All of our code should be so comprehensible.
LINQ is a set manipulation query language - it automates much of the grunt work and code that is common to software by making set operations generic.
Our Scenario
We have been tasked with writing a business layer that will filter and organize a set of results returned from a query. The results have certain elements that need to be transformed, filtered, ordered, and grouped. It's important to understand that with LINQ, the results may have come from a database, they may have come from the file system, or by listening on a network socket, or they may be hard-coded. How the results got to us is largely immaterial - LINQ is designed to be utterly generic in how it performs the kind of set operations we're going to work with.
To begin with, we have a set of Employee objects, like the results of the query we looked at in the first section, but not filtered for having a group type beginning with AA. So our Employee object looks like this:
public class Employee
{
public string FirstName { get; set; }
public string LastName { get; set; }
public int Id { get; set; }
public string GroupCode { get; set; }
public List<string> Addresses { get; set; }
}
Very simple. For the purposes of this guide, we're creating a set of three Employees with this code:
List<Employee> employees = new List<Employee>
{
new Employee { Id = 1, FirstName = "Chris", LastName = "Behrens", GroupCode = "AA123", Addresses = new List<string>{"7814 Vandalia Avenue Apt 3b Los Angeles CA 91610"} },
new Employee { Id = 2, FirstName = "Sam", LastName = "Smith", GroupCode = "AA128", Addresses = new List<string>{ "7814 Vandalia Avenue Apt 3b Los Angeles CA 91610" } },
new Employee { Id = 3, FirstName = "Gill", LastName = "Bates", GroupCode = "ZZ867", Addresses = new List<string>{"2810 Pisces Ct Albuequerque, NM 87144"} }
};
And this set is contained in a List variable called Employees. We need an array of the employee IDs - we have a set of employee IDs that we want to filter from the results. This statement returns the entire set as an array:
var employeeArray = (from e in employees
select e).ToArray();
We always begin our LINQ statements with from - this defines the set we're working with and the variable we'll use for our set operations. Here, we're selecting the entire employee set, defining the set variable as , and selecting it, that is, returning the value for the query. Finally, we wrap the entire query and transform it into an array with ToArray().
This is great, but now we have an array of Employee objects - we want a simpler array of ints containing only the ID. To narrow the scope of what is being selected, we select the ID property on our set variable :
var employeeIdArray = (from e in employees
select e.Id).ToArray();
As simple as that.
LINQ Syntax - Fluent vs. Query
For the purposes of this guide, we're going to focus on the _fluent _syntax - the syntax which is close to T-SQL and English:
var employeeIdArray = (from e in employees
select e.Id).ToArray();
In addition to the fluent syntax, you can also invoke LINQ as function calls, which are made available to set types as extension methods:
var employeeIdArray = employees.Select(e => e.Id).ToArray();
In this case, we're expressing our set variable for our Select call as a lambda expression:
.Select(e => e.Id)
Just like with the fluent syntax you get Intellisense for the members of the set variable as you type. You can mix these two syntaxes as well:
var employeesAddresses = from e in employees select e.Addresses.First();
The .First() method does just what you think it does – it gets the first element in the list. This would return an IEnumerable of type string – the type of the addresses, and it would contain the first address listed for every employee in the employee set.