Tuesday, March 5, 2013
0 comments

Choosing a Data Access Implementation::Dapper

4:29 PM
Before we dive in and start coding serious consideration needs to be taken regarding any decisions about data access implementation. Data access is the most important, most complicated and most difficult aspect to change of any software system. Once our application is live and begins to grow any future changes we wish to make can become increasingly difficult.
The data access layer for our SMS quiz should adhere to the following requirements:
  • Must be performant: Our application will be subject to surges, so we need our database read/writes to be as performant as possible.
  • No use of code generation/designers or configuration files: I've had bad experiences with config/designers when these files are committed to source control and many developers are making changes to these files.
  • Convention over configuration and automation of common actions: A LOT of changes will be made to these classes/methods by potentially many developers so we want to promote convention.
  • To limit abstractions and have control over the domain specific language (DSL): It's always preferable to code in a specific DSL such a SQL especially when writing complex queries.
  • Use a relational database: our first implementation needs to run on SQL Server.
  • Allow a possible move to a cloud computing database provider in the future: To scale the database it may need to be moved to a cloud computing provider such as Azure and use a NoSql approach.
Now that we've established some ground rules, let's jump in a look at some code. We have the following method in a class called CompetitionService:
?
1
2
3
4
5
6
7
8
9
public GetCompetitionsResponse GetCompetitions(GetCompetitionsRequest request)
{
    GetCompetitionsResponse response = new GetCompetitionsResponse();
    IEnumerable<Competition> competitions = _repository.Find(GetQuery(request));
    response.Competitions = _mapper.Map<IEnumerable<Competition>,
                                        IEnumerable<CompetitionItem>>(competitions);
 
    return response;
}
This class serves as the main entry point to send requests and receive responses between layers of the application. In particular it hides away the complexities of our domain model and data access layer from the UI layers.
The next step is to realise our repository calls such as _repository.GetCompetitions() and have them read/write to a data store.
Let's begin with an integration test that will give us an idea of how we can call our repository methods. In this example we have a "get by id", "get list of items" and "insert item" method for illustration purposes, we would of course need, update and delete methods but these will be omitted for brevity:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
[TestMethod]
private void GetCompetitionsReturnsRecords()
{
    // Arrange
    ICompetitionRepository repository = new CompetitionRepository();
 
    // Act
    IEnumerable<Competition> items = repository.GetCompetitions();
 
    // Assert
    Assert.IsTrue(items.Count() > 0);
}
 
[TestMethod]
private void GetCompetitionByIDReturnsRecord()
{
    // Arrange
    ICompetitionRepository repository = new CompetitionRepository();
 
    // Act
    Competition item = repository.GetCompetitionByID(COMPETITION_ID);
 
    // Assert
    Assert.IsNotNull(item);
}
 
[TestMethod]
private void CreateCompetitionReturnsNewID()
{
    // Arrange
    Competition competition = new Competition();
    competition.ClosingDate = CLOSING_DATE;
    competition.CompetitionKey = COMPETITION_KEY;
 
    /* more properties removed for brevity */
 
    // Act
    _repository.CreateCompetition(competition);
 
    // Assert
    Assert.AreNotEqual(Guid.Empty, competition.ID);
}

Manual Implementation

This ICompetitionRepository implementation uses the System.Data.SqlClient library. This is data access code in its lowest form when using the ADO.Net library:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
public sealed class CompetitionRepository : ICompetitionRepository
{
    public IEnumerable<Competition> GetCompetitions()
    {
        List<Competition> competitions = new List<Competition>();
 
        using (SqlConnection cn = new SqlConnection(ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("SELECT * FROM Competitions", cn);
            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var competition = new Competition();
                    competition.Question = reader["Question"].ToString();
                    competition.ClosingDate = (DateTime)reader["ClosingDate"];
                    competition.CompetitionKey = reader["CompetitionKey"].ToString();
 
                    /* more properties removed for brevity */
 
                    competitions.Add(competition);
                }
            }
        }
 
        return competitions;
    }
 
    public Competition GetCompetitionByID(Guid id)
    {
        Competition competition = null;
 
        using (SqlConnection cn = new SqlConnection(ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("SELECT * FROM Competitions WHERE ID=@ID", cn);
            cmd.Parameters.Add("@ID", System.Data.SqlDbType.UniqueIdentifier).Value = id;
            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    competition = new Competition();
                    competition.Question = reader["Question"].ToString();
                    competition.ClosingDate = (DateTime)reader["ClosingDate"];
                    competition.CompetitionKey = reader["CompetitionKey"].ToString();
 
                    /* more properties removed for brevity */
                }
            }
        }
 
        return competition;
    }
 
    public void CreateCompetition(Competition item)
    {
        using (SqlConnection cn = new SqlConnection(ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("sps_Competitions_Insert", cn);
            cn.Open();
 
            cmd.Parameters.Add(new SqlParameter("@Question", item.Question));
            cmd.Parameters.Add(new SqlParameter("@ClosingDate", item.ClosingDate));
            cmd.Parameters.Add(new SqlParameter("@CompetitionKey", item.CompetitionKey));
 
            /* more properties removed for brevity */
 
            item.ID = (Guid)cmd.ExecuteScalar();
        }
    }
}

Pros

  • Complete control over code. We have 100% control over every line of code.
  • Explicit use of DSL i.e. SQL queries have to be manually constructed.
  • Allows easy switching between in-line SQL or stored procedures.
  • No code generators or configuration files are used.

Cons

  • Complete control over code. The fact that we have full control means we need to write every line of code.
  • Lots of repetitive code. Almost all objects that need to be saved to the data store will have similar looking CRUD operations which we will need to write.
  • We need to write a separate repository methods for all queries.
  • We need to manually code all parameter and property mapping for our SQL queries.
This is pure, good old-fashioned coding. But it can be repetitive and will require granular changes for any future developments.

ORM Implementation

Now we look at an implementation of ICompetitionRepository that uses Entity Framework Code First which is similar in principle to other ORMs such as LinqToSql and NHibernate:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
public sealed class CompetitionRepository : ICompetitionRepository
{
    private Context _context = new Context();
 
    public IEnumerable<Competition> GetCompetitions()
    {
        return _context.Competitions.ToList();
    }
 
    public Competition GetCompetitionByID(Guid id)
    {
        return _context.Competitions.SingleOrDefault(c => c.ID == id);
    }
 
    public void CreateCompetition(Competition item)
    {
        _context.Competitions.Add(item);
        _context.SaveChanges();
    }
}
The reduction and code in comparison with the manual implementation we looked at is dramatic. However, before we get too carried away we must remember that there is a lot of implementation detailed tucked away in a magic box somewhere. To make this work we also have to create the following Context class:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public class Context : DbContext
{
    public DbSet<Competition> Competitions { get; set; }
 
    public Context()
        : base("SmsQuizConnection")
    {
        Database.SetInitializer<Context>(null);
    }
 
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Conventions.Remove<OneToManyCascadeDeleteConvention>();
 
        /* custom mappers */
        modelBuilder.Configurations.Add(new Mapping.CompetionMapper());
    }
}
Plus we also have classes for custom mappers:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public class CompetionMapper : EntityTypeConfiguration<Competition>
{
    public CompetionMapper()
    {
        ToTable("Competitions");
        HasKey(t => t.ID);
        this.Property(t => t.ID)
            .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity)
            .HasColumnName("ID");
 
        this.Property(t => t.ClosingDate)
            .HasColumnName("ClosingDate");
 
        this.Property(t => t.CompetitionKey)
            .HasColumnName("CompetitionKey");
 
        /* more properties removed for brevity */
    }
}
Note: If the classes are straightforward POCOs these types of frameworks can automate the mapping, however in this example the domain entities do necessarily match the table structures in the database which forces us to write custom mapping code.
The non Code First version of Entity Framework can have its mapping and context file created from the database tables using a Drag and Drop designer. However this goes against our requirements since we do not want to use such designers as our domain model does not exactly match our database structure. Also our database structure is uncompleted at present.

Pros

  • Automates repetitive tasks such as parameter and property mapping.
  • The LINQ query to SQL syntax enables quick creation of query variants.
  • Abstracts away specific SQL Queries potentially making future database moves easier since we're not tied to any specific DSL.
  • Could be insanely productive for plain POCO classes.

Cons

  • Abstracts away specific SQL Queries. This can lead to some serious performance issues if we allow an abstraction to craft our SQL queries.
  • Has designer, configuration, or code generation files.
  • Is a black box. Do we really want the most critical part of our application to be heavily reliant on a black box we do not own?

The Biggest Trade Off

In my experience with using an ORM it really becomes a joy to use when you have lots of small queries with minor differences to create. For example, lets consider the following SQL queries:
?
1
2
3
4
5
-- #1
SELECT * FROM Competitions WHERE ClosingDate < @ClosingDate
 
-- #2
SELECT * FROM Competitions WHERE ClosingDate > @ClosingDate AND CompetitionKey = @CompetitionKey
With our manual implementation we will have three separate methods in our repository, for example:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
public IEnumerable<Competition> GetClosedCompetitions()
{
    var param = new SqlParameterCollection();
    param.Add("@ClosingDate", System.Data.SqlDbType.SmallDateTime).Value = DateTime.Now;
    return GetCompetitionsByQuery("SELECT * FROM Competitions WHERE ClosingDate < @ClosingDate", param);
}
 
public IEnumerable<Competition> GetOpenCompetitionsWithKey(string competitionKey)
{
    var param = new SqlParameterCollection();
    param.Add("@ClosingDate", System.Data.SqlDbType.SmallDateTime).Value = DateTime.Now;
    param.Add("@ClosingDate", System.Data.SqlDbType.SmallDateTime).Value = competitionKey
    return GetCompetitionsByQuery("SELECT * FROM Competitions WHERE ClosingDate > @ClosingDate AND CompetitionKey = @CompetitionKey ", param);
}
 
// Generic method to do the heavy query lifting
private IEnumerable<Competition> GetCompetitionsByQuery(string sql, SqlParameterCollection param)
{
    List<Competition> competitions = null;
 
    using (SqlConnection cn = new SqlConnection(ConnectionString))
    {
        SqlCommand cmd = new SqlCommand(sql, cn);
 
        foreach (var p in param)
        {
            cmd.Parameters.Add(p);
        }
 
        cn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                Competition competition = new Competition();
                competition.Question = reader["Question"].ToString();
                competition.ClosingDate = (DateTime)reader["ClosingDate"];
                competition.CompetitionKey = reader["CompetitionKey"].ToString();
 
                /* fields removed for brevity */
 
                competitions.Add(competition);
            }
        }
    }
 
    return competitions;
}
Now let us contrast this example with the an implementation using Entity Framework:
?
1
2
3
4
public IEnumerable<Competition> GetCompetitions(Expression<Func<T, bool>> predicate)
{
    return _context.Competitions.Where(predicate);
}
That's it! To get a better idea of how the GetCompetitions method works let's consider how we would consume if from the Service Layer methods:
?
1
2
3
4
5
// Get closed competitions
var competitions = _repository.GetCompetitions(c => c.ClosingDate < DateTime.Now);
 
// Get open competitions
var competitions = _repository.GetCompetitions(c => c.ClosingDate > DateTime.Now && c.CompetitionKey = "ABC");
This is very flexible and productive, and for me is the most beneficial feature of using ORMs. But word of caution, imagine the same scenario with complicated SQL queries with table joins, subqueries, order by columns etc. Would we be able to construct such a query using a lambda expression in this example code? And what the would resulting SQL look like?

The Best of Both Worlds?

The comparisons of these two implementations suggests that we have some kind of trade off to make. The manual approach meets some of our requirements as does the ORM approach.
In a perfect world, wouldn't it be great to automate the mundane and nontrivial stuff yet have full control over the more complicated aspects of implementation? That's exactly what we are going to try and implement.
ORMs are feature packed products that try to be all things to all men. This can lead to leaky abstractions and performance issues. A common criticism of ORMs is that the native SQL queries they generate for more complex scenarios can be inefficient. But as we've seen, the productivity boost for the nontrivial aspects is something that just cannot be ignored. What about Micro-ORMs?
A micro ORM seeks to do one thing very well. What you lose in features you gain in control and performance.
We are going to implement our repository layer using Dapper, an open source Micro-ORM that was written for and is used in production at StackOverflow.com. It is designed with performance as a feature which is what we need.

0 comments:

 
Toggle Footer