Channel: Fairway Technologies

Writing A Custom LINQ Provider With Re-linq


Overview

When LINQ was first released in 2007, I remember thinking that I wouldn’t use it; I was very happy using SQL to query the database. Well, once I started using it, I really liked being able to query lists, XML, and SQL Server or other databases, all with the same query syntax (for the most part, at least). So when we started the generic repository project that became SharpRepository, LINQ was a logical choice. If a backend had a LINQ provider, it was pretty simple to add it to our list of supported systems: Entity Framework, InMemory, Cache, RavenDB, MongoDB, and db4o all had LINQ providers we could find, whether they were developed by the same group or by a third party.

At some point I started thinking, “How can we integrate with a technology that doesn’t have a LINQ provider?” And the answer that came to me was, “I’ll just build one. It can’t be that hard, right?” Well, it turns out it’s not the easiest thing to do. After doing some research, I came across a great article by Ayende Rahien about creating the LINQ provider for RavenDB and, as you can tell by the title, the difficulties he faced: The Pain of Implementing LINQ Providers. In it, he refers to the re-linq framework, which takes some of the pain out of the process.

What is re-linq?

Instead of making you deal with the raw IQueryable expression tree (which is a real pain), re-linq gives you an abstract syntax tree that closely resembles the original LINQ query. When implementing a LINQ provider, you code against its QueryModel object, which is easier to consume than a LINQ Expression, and re-linq evaluates whatever expressions within the query it can before handing the model to you. re-linq is built around the Visitor pattern and provides a set of base visitor classes that you extend to perform your transformation.
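For comparison, the following sketch (plain System.Linq, no re-linq involved) looks at the raw tree that a hand-written IQueryProvider receives. The RawExpressionTreeDemo helper and its Product type are hypothetical names used only for illustration. The operators arrive nested inside-out, with the last call (Take) at the root and the data source at the bottom, so a provider built directly on expressions has to unwind that chain itself before it can translate anything.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class RawExpressionTreeDemo
{
    // Hypothetical entity used only for this illustration.
    public class Product
    {
        public string Category { get; set; }
    }

    // Builds an IQueryable over an in-memory list and returns the expression
    // tree that an IQueryProvider would be handed when the query runs.
    public static Expression BuildQueryExpression()
    {
        IQueryable<Product> products = new List<Product>().AsQueryable();
        return products
            .Where(x => x.Category == "Electronics")
            .Skip(10)
            .Take(5)
            .Expression;
    }

    // Unwinds the chain of Queryable calls. The root is the LAST operator
    // written (Take); each call's source sequence is its first argument.
    public static List<string> OperatorsInSourceOrder(Expression expression)
    {
        var names = new List<string>();
        for (var call = expression as MethodCallExpression;
             call != null;
             call = call.Arguments[0] as MethodCallExpression)
        {
            names.Add(call.Method.Name);
        }
        names.Reverse(); // Take, Skip, Where -> Where, Skip, Take
        return names;
    }
}
```

re-linq does this unwinding for you: in its QueryModel the Where call becomes a body clause, while Skip and Take become result operators (SkipResultOperator and TakeResultOperator).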

One of the downsides of re-linq is that there isn’t much documentation available. The most helpful resource I found was a CodeProject article titled re-linq|ishing the Pain: Using re-linq to Implement a Powerful LINQ Provider on the Example of NHibernate. It walks through the process of creating an NHibernate LINQ provider that translates LINQ into Hibernate Query Language (HQL), and it includes sample code.

My attempt at writing a LINQ Provider

One day I got the strange idea to create an ODataRepository for the SharpRepository project. Why? Partly for fun, partly to see if I could, and partly because it could actually come in handy, since WebAPI makes it easy to expose OData querying. (It also didn’t hurt that the OData URI Conventions are fairly simple and support only a limited subset of LINQ syntax, which makes a provider easier to implement.) I could create an HttpClient object, query “http://somewebsite.net/Products?$filter=Category eq 'Electronics'”, get the response, and use JSON.NET to translate it into my object models. But wouldn’t it be cool to be able to do this instead:

var repos = new ODataRepository<Product>("http://somewebsite.net/");
var products = repos.FindAll(x => x.Category == "Electronics");

My first attempt took me down the path of writing my own visitor class that accepts an Expression. I got a very simple proof of concept working, but quickly realized that expanding it to handle more advanced cases was going to be very tough. That is when I ran across re-linq and decided to scrap my initial design and give re-linq a try.

I’m not done with this LINQ to OData provider yet, but I have plugged away at it off and on for a while and have made some good progress. Mostly, I have enjoyed the challenge of learning something new and giving it a shot. You can see the full code on the odata branch of SharpRepository on GitHub. All of the LINQ-related code is located here.

Some code

I won’t try to go through all the code here, but I will point out a few things that should give an overview of what it is like to write a LINQ provider using re-linq. Please keep in mind that this code is in development and not a final product (in other words, be nice!). I based the overall structure on the CodeProject article mentioned above and tweaked it for my needs, which really helped get me going.

When parsing the syntax tree provided by re-linq, you will want to build up an object yourself that better represents the parts of your final query language.  After the tree parsing and visiting is all done, you will use this class to actually build the final query, in my case the OData querystring.  Here is the class I’m using for this purpose:

    public class QueryPartsAggregator
    {
        public QueryPartsAggregator()
        {
            FromParts = new List<string>();
            WhereParts = new List<string>();
        }

        public string SelectPart { get; set; }
        public List<string> FromParts { get; set; }
        public List<string> WhereParts { get; set; }
        public string OrderBy { get; set; }
        public int? Take { get; set; }
        public int? Skip { get; set; }
        public bool OrderByIsDescending { get; set; }
        public bool ReturnCount { get; set; }

        public void AddFromPart(IQuerySource querySource)
        {
            FromParts.Add(querySource.ItemName);
        }

        public void AddWherePart(string formatString, params object[] args)
        {
            WhereParts.Add(string.Format(formatString, args));
        }

        public void AddOrderByPart(string orderBy, bool isDescending)
        {
            OrderBy = orderBy;
            OrderByIsDescending = isDescending;
        }
    }

When it’s time to build the actual OData querystring for the request, this object maps onto it nicely, as you can see in the code that handles this part:

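        // Illustrative wrapper (the method name and parameter are not from the original
        // excerpt): collectionUrl is the base URL plus the collection name.
        private string BuildQueryString(string collectionUrl)
        {
            var querystring = collectionUrl;

            if (_queryParts.ReturnCount)
            {
                querystring += "/$count";
            }

            querystring += "?$format=json&";

            if (_queryParts.Take.HasValue)
                querystring += "$top=" + _queryParts.Take.Value + "&";

            if (_queryParts.Skip.HasValue)
                querystring += "$skip=" + _queryParts.Skip.Value + "&";

            if (!String.IsNullOrEmpty(_queryParts.OrderBy))
                querystring += "$orderby=" + _queryParts.OrderBy + "&";

            // Combine all where-clause fragments into a single $filter expression.
            var filter = SeparatedStringBuilder.Build(" and ", _queryParts.WhereParts);
            if (!String.IsNullOrEmpty(filter))
                querystring += "$filter=" + filter + "&";

            if (!String.IsNullOrEmpty(_queryParts.SelectPart))
                querystring += "$select=" + _queryParts.SelectPart + "&";

            // Drop the trailing '&' left by the last option.
            return querystring.TrimEnd('&');
        }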
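One possible tightening of that querystring code, sketched as a standalone method: it collects the options in a list and joins them once (so no trailing '&' is left), percent-encodes the option values with Uri.EscapeDataString, and appends " desc" when OrderByIsDescending is set, which the excerpt above does not do. The ODataUrlSketch and Parts names are hypothetical; Parts is a trimmed-down stand-in for QueryPartsAggregator, and the sketch assumes OrderBy holds only the property name, as AddOrderByPart suggests.

```csharp
using System;
using System.Collections.Generic;

public static class ODataUrlSketch
{
    // Hypothetical stand-in for QueryPartsAggregator, for illustration only.
    public class Parts
    {
        public List<string> WhereParts { get; } = new List<string>();
        public string OrderBy { get; set; }
        public bool OrderByIsDescending { get; set; }
        public int? Take { get; set; }
        public int? Skip { get; set; }
        public string SelectPart { get; set; }
        public bool ReturnCount { get; set; }
    }

    // collectionUrl is assumed to be the base URL plus the collection name.
    public static string Build(string collectionUrl, Parts parts)
    {
        var url = parts.ReturnCount ? collectionUrl + "/$count" : collectionUrl;

        // Gather "name=value" options and join them once at the end.
        var options = new List<string> { "$format=json" };
        if (parts.Take.HasValue) options.Add("$top=" + parts.Take.Value);
        if (parts.Skip.HasValue) options.Add("$skip=" + parts.Skip.Value);

        if (!string.IsNullOrEmpty(parts.OrderBy))
        {
            var orderBy = parts.OrderBy + (parts.OrderByIsDescending ? " desc" : "");
            options.Add("$orderby=" + Uri.EscapeDataString(orderBy));
        }

        if (parts.WhereParts.Count > 0)
        {
            // Filters contain spaces and quotes, so the value is percent-encoded.
            var filter = string.Join(" and ", parts.WhereParts);
            options.Add("$filter=" + Uri.EscapeDataString(filter));
        }

        if (!string.IsNullOrEmpty(parts.SelectPart))
            options.Add("$select=" + Uri.EscapeDataString(parts.SelectPart));

        return url + "?" + string.Join("&", options);
    }
}
```

Building the option list first also makes it easy to reorder or drop options without worrying about where the separators go.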

In re-linq, you implement an IQueryExecutor, which is in charge of running the query based on the QueryModel that re-linq hands it. Basically, this class takes the QueryModel and kicks off the visitor that walks through it and builds the QueryPartsAggregator. For the OData implementation, we need to give it the base URL and the name of the collection we are querying.

    // Called by re-linq when a query is to be executed.
    public class ODataQueryExecutor : IQueryExecutor
    {
        private readonly string _url;
        private readonly string _databaseName;

        public ODataQueryExecutor(string url, string databaseName)
        {
            _url = url;
            _databaseName = databaseName;
        }

        // Executes a query with a scalar result, i.e. a query that ends with a result operator such as Count, Sum, or Average.
        public T ExecuteScalar<T> (QueryModel queryModel)
        {
          return ExecuteCollection<T> (queryModel).Single();
        }

        // Executes a query with a single result object, i.e. a query that ends with a result operator such as First, Last, Single, Min, or Max.
        public T ExecuteSingle<T> (QueryModel queryModel, bool returnDefaultWhenEmpty)
        {
          return returnDefaultWhenEmpty ? ExecuteCollection<T> (queryModel).SingleOrDefault () : ExecuteCollection<T> (queryModel).Single ();
        }

        // Executes a query with a collection result.
        public IEnumerable<T> ExecuteCollection<T> (QueryModel queryModel)
        {
          var commandData = ODataApiGeneratorQueryModelVisitor.GenerateODataApiQuery(queryModel);
          var query = commandData.CreateQuery(_url, _databaseName);
          return query.Enumerable<T> ();
        }
    }

At this point we have defined a class structure to represent our OData query, written the logic to take that structure and translate it into the OData querystring to use when making the HTTP request, and we’ve wired up what re-linq needs to kick off the process.  So the only thing left is to write the actual Visitor that parses the QueryModel.  As you can see from the code above, I’ve called this ODataApiGeneratorQueryModelVisitor (I like to keep my class names really short as you can see).  This class inherits from re-linq’s QueryModelVisitorBase and we override the methods as needed.  I won’t show the whole class but this is where we begin to populate the QueryPartsAggregator.

For example, the VisitResultOperator method is passed a ResultOperatorBase. There are implementations of this base class for each result operator: FirstResultOperator when First() is called, TakeResultOperator when Take() is called, and so on. For my OData needs, the override looks like this:

        public override void VisitResultOperator(ResultOperatorBase resultOperator, QueryModel queryModel, int index)
        {
            if (resultOperator is FirstResultOperator)
            {
                _queryParts.Take = 1;
                return;
            }

            if (resultOperator is CountResultOperator || resultOperator is LongCountResultOperator)
            {
                _queryParts.ReturnCount = true;
                return;
            }

            if (resultOperator is TakeResultOperator)
            {
                var exp = ((TakeResultOperator)resultOperator).Count;

                if (exp.NodeType == ExpressionType.Constant)
                {
                    _queryParts.Take = (int)((ConstantExpression)exp).Value;
                }
                else
                {
                    throw new NotSupportedException("Currently not supporting methods or variables in the Skip or Take clause.");
                }

                return;
            }

            if (resultOperator is SkipResultOperator)
            {
                var exp = ((SkipResultOperator) resultOperator).Count;

                if (exp.NodeType == ExpressionType.Constant)
                {
                    _queryParts.Skip = (int)((ConstantExpression)exp).Value;
                }
                else
                {
                    throw new NotSupportedException("Currently not supporting methods or variables in the Skip or Take clause.");
                }

                return;
            }

            base.VisitResultOperator(resultOperator, queryModel, index);
        }
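The Take and Skip branches above reject any count that isn’t a ConstantExpression. If evaluating the count on the client is acceptable, that restriction could be relaxed with a helper along these lines. This is a minimal sketch: the CountArgument name is hypothetical, and it assumes the count expression does not depend on the query’s range variable (which holds for Take and Skip arguments).

```csharp
using System;
using System.Linq.Expressions;

public static class CountArgument
{
    // Returns the integer value of a Take/Skip count expression.
    public static int Evaluate(Expression countExpression)
    {
        // Constants (the only case the code above supports) are read directly.
        if (countExpression is ConstantExpression constant)
            return Convert.ToInt32(constant.Value);

        // Anything else (a captured variable, an arithmetic expression, a method
        // call) is wrapped in a parameterless lambda, compiled, and invoked locally.
        var body = countExpression.Type == typeof(int)
            ? countExpression
            : Expression.Convert(countExpression, typeof(int));
        return Expression.Lambda<Func<int>>(body).Compile()();
    }
}
```

With a helper like this, the Take branch could set _queryParts.Take = CountArgument.Evaluate(((TakeResultOperator)resultOperator).Count) instead of throwing. Compiling a lambda has a cost, which is why the constant case is checked first.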

The more complex parts come when expressions need to be parsed, as in the WhereClause, SelectClause, or OrderByClause. re-linq provides a base visitor class for parsing expressions called ThrowingExpressionTreeVisitor. As its name suggests, it throws an exception when some part of the LINQ syntax is not implemented in your visitor. That way, if you don’t implement the Contains() method, the user is told that it isn’t supported instead of the call being silently ignored as if it had been applied.

The most straightforward example of this is parsing a BinaryExpression. A BinaryExpression is made up of a Left side, a Right side, and a NodeType. This could be something like (x.Category == “Electronics”), where the Left side is the property x.Category, the Right side is the constant “Electronics”, and the NodeType is Equal. It could also be something more complex like (x.Category == “Electronics” && x.Status == true), where the Left side is the BinaryExpression (x.Category == “Electronics”), the Right side is the BinaryExpression (x.Status == true), and the NodeType is AndAlso.
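The shape described above can be checked directly by letting the C# compiler build the tree from a lambda. This sketch uses only System.Linq.Expressions; BinaryShapeDemo and its Product type are hypothetical names for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class BinaryShapeDemo
{
    // Hypothetical entity with the two properties used in the example.
    public class Product
    {
        public string Category { get; set; }
        public bool Status { get; set; }
    }

    // Returns the root node of the predicate's body.
    public static BinaryExpression Predicate()
    {
        Expression<Func<Product, bool>> predicate =
            x => x.Category == "Electronics" && x.Status == true;

        // Root: AndAlso. Left: Equal(x.Category, "Electronics"). Right: Equal(x.Status, true).
        return (BinaryExpression)predicate.Body;
    }
}
```

Each Equal node in turn has a MemberExpression on its Left and a ConstantExpression on its Right, which is why a visitor has to recurse into both sides.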

Here is the override of the VisitBinaryExpression method that I’ve written for my OData implementation:

        protected override Expression VisitBinaryExpression (BinaryExpression expression)
        {
            _expression.Append("(");

            VisitExpression (expression.Left);

            // In production code, handle this via lookup tables.
            switch (expression.NodeType)
            {
                case ExpressionType.Equal:
                    _expression.Append (" eq ");
                    break;

                case ExpressionType.NotEqual:
                    _expression.Append (" ne ");
                    break;

                case ExpressionType.GreaterThan:
                    _expression.Append (" gt ");
                    break;

                case ExpressionType.GreaterThanOrEqual:
                    _expression.Append (" ge ");
                    break;

                case ExpressionType.LessThan:
                    _expression.Append (" lt ");
                    break;

                case ExpressionType.LessThanOrEqual:
                    _expression.Append (" le ");
                    break;

                case ExpressionType.AndAlso:
                case ExpressionType.And:
                    _expression.Append (" and ");
                    break;

                case ExpressionType.OrElse:
                case ExpressionType.Or:
                    _expression.Append (" or ");
                    break;

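                // ExpressionType.Not is a unary node (UnaryExpression), so this case is never
                // reached from VisitBinaryExpression; negation belongs in VisitUnaryExpression.
                case ExpressionType.Not:
                    _expression.Append(" not ");
                    break;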

                case ExpressionType.Add:
                    _expression.Append (" add ");
                    break;

                case ExpressionType.Subtract:
                    _expression.Append (" sub ");
                    break;

                case ExpressionType.Multiply:
                    _expression.Append (" mul ");
                    break;

                case ExpressionType.Divide:
                    _expression.Append (" div ");
                    break;

                case ExpressionType.Modulo:
                    _expression.Append(" mod ");
                    break;

                default:
                    base.VisitBinaryExpression (expression);
                    break;
            }

            VisitExpression (expression.Right);

            _expression.Append(")");

            return expression;
        }

As you can see, I am translating each ExpressionType into the corresponding OData syntax: == becomes “eq”, >= becomes “ge”, and so on. The calls to VisitExpression(expression.Left) and VisitExpression(expression.Right) are both very important. They let you keep parsing either side of the expression when that side is itself a complex expression. So in my example above, where the Left side is (x.Category == “Electronics”), the call to VisitExpression visits that left side, which calls VisitBinaryExpression again. That recursion is what turns the original compound expression into the following OData filter: “((Category eq 'Electronics') and (Status eq true))”.
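To make the recursion concrete outside of re-linq, here is a minimal, self-contained sketch of the same translation built on .NET’s own System.Linq.Expressions.ExpressionVisitor rather than ThrowingExpressionTreeVisitor. The ODataFilterSketch and Product names are hypothetical. It uses a lookup table for the operators, escapes apostrophes in string constants by doubling them (as OData string literals require), and handles logical negation in VisitUnary, because C# compiles ! into a UnaryExpression. Unlike the real provider, it gets no help from re-linq’s evaluation of captured variables, so a comparison such as x.Category == someVariable throws NotSupportedException here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

public class ODataFilterSketch : ExpressionVisitor
{
    // Hypothetical entity used only for this illustration.
    public class Product
    {
        public string Category { get; set; }
        public bool Status { get; set; }
        public int Price { get; set; }
    }

    // Lookup table from .NET node types to OData operators.
    private static readonly Dictionary<ExpressionType, string> Operators =
        new Dictionary<ExpressionType, string>
        {
            [ExpressionType.Equal] = "eq", [ExpressionType.NotEqual] = "ne",
            [ExpressionType.GreaterThan] = "gt", [ExpressionType.GreaterThanOrEqual] = "ge",
            [ExpressionType.LessThan] = "lt", [ExpressionType.LessThanOrEqual] = "le",
            [ExpressionType.AndAlso] = "and", [ExpressionType.OrElse] = "or",
            [ExpressionType.Add] = "add", [ExpressionType.Subtract] = "sub",
            [ExpressionType.Multiply] = "mul", [ExpressionType.Divide] = "div",
            [ExpressionType.Modulo] = "mod",
        };

    private readonly StringBuilder _filter = new StringBuilder();

    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var visitor = new ODataFilterSketch();
        visitor.Visit(predicate.Body);
        return visitor._filter.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        if (!Operators.TryGetValue(node.NodeType, out var op))
            throw new NotSupportedException($"Operator {node.NodeType} is not supported.");

        // Recurse into both sides so nested expressions are translated too.
        _filter.Append('(');
        Visit(node.Left);
        _filter.Append(' ').Append(op).Append(' ');
        Visit(node.Right);
        _filter.Append(')');
        return node;
    }

    protected override Expression VisitUnary(UnaryExpression node)
    {
        // Logical negation arrives here, not in VisitBinary.
        if (node.NodeType == ExpressionType.Not && node.Type == typeof(bool))
        {
            _filter.Append("not ");
            Visit(node.Operand);
            return node;
        }
        throw new NotSupportedException($"Unary {node.NodeType} is not supported.");
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Only direct properties of the lambda parameter (x.Category) are supported.
        if (node.Expression is ParameterExpression)
        {
            _filter.Append(node.Member.Name);
            return node;
        }
        throw new NotSupportedException($"Member {node.Member.Name} is not supported.");
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (node.Value == null) _filter.Append("null");
        else if (node.Value is string s) _filter.Append('\'').Append(s.Replace("'", "''")).Append('\'');
        else if (node.Value is bool b) _filter.Append(b ? "true" : "false");
        else _filter.Append(Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }
}
```

For example, ODataFilterSketch.Translate<ODataFilterSketch.Product>(x => x.Category == "Electronics" && x.Status == true) produces ((Category eq 'Electronics') and (Status eq true)), the same filter text as in the paragraph above.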

Another example is handling method calls like Contains(“text”), StartsWith(“text”), EndsWith(“text”), ToLower(), and ToUpper(). To handle these, we override the cleverly named VisitMethodCallExpression(). Here is the code I wrote to handle those calls:

        protected override Expression VisitMethodCallExpression (MethodCallExpression expression)
        {
            // In production code, handle this via method lookup tables.
            if (expression.Method.Name == "Contains")
            {
                _expression.Append("substringof(");
                VisitExpression(expression.Arguments[0]);
                _expression.Append(",");
                VisitExpression(expression.Object);
                _expression.Append(") eq true");
                return expression;
            }

            if (expression.Method.Name == "StartsWith")
            {
                _expression.Append("startswith(");
                VisitExpression(expression.Object);
                _expression.Append(",");
                VisitExpression(expression.Arguments[0]);
                _expression.Append(") eq true");
                return expression;
            }

            if (expression.Method.Name == "EndsWith")
            {
                _expression.Append("endswith(");
                VisitExpression(expression.Object);
                _expression.Append(",");
                VisitExpression(expression.Arguments[0]);
                _expression.Append(") eq true");
                return expression;
            }

            if (expression.Method.Name == "ToLower")
            {
                _expression.Append("tolower(");
                VisitExpression(expression.Object);
                _expression.Append(")");
                return expression;
            }
            if (expression.Method.Name == "ToUpper")
            {
                _expression.Append("toupper(");
                VisitExpression(expression.Object);
                _expression.Append(")");
                return expression;
            }

            return base.VisitMethodCallExpression (expression); // throws
        }
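The comment in that code suggests method lookup tables for production use. A minimal sketch of that idea, again using only System.Linq.Expressions and with hypothetical names (StringMethodSketch, Product), maps each supported string method to an OData v3 template. It also checks that the method is declared on System.String, so a call such as Enumerable.Contains is not mistaken for string.Contains. Note that OData v4 dropped substringof in favor of contains(Name,'text'), so the templates would need to change for a v4 service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class StringMethodSketch
{
    // Hypothetical entity used only for this illustration.
    public class Product
    {
        public string Name { get; set; }
    }

    // OData v3 templates: {0} is the call's target, {1} its single argument.
    private static readonly Dictionary<string, string> Templates =
        new Dictionary<string, string>
        {
            ["Contains"] = "substringof({1},{0}) eq true",
            ["StartsWith"] = "startswith({0},{1}) eq true",
            ["EndsWith"] = "endswith({0},{1}) eq true",
        };

    public static string Translate(Expression<Func<Product, bool>> predicate)
    {
        // Only one-argument methods declared on System.String are handled.
        if (!(predicate.Body is MethodCallExpression call)
            || call.Method.DeclaringType != typeof(string)
            || call.Arguments.Count != 1
            || !Templates.TryGetValue(call.Method.Name, out var template))
        {
            throw new NotSupportedException("Unsupported method call: " + predicate.Body);
        }

        return string.Format(template, Operand(call.Object), Operand(call.Arguments[0]));
    }

    // Translates the target or argument of a string method call.
    private static string Operand(Expression e)
    {
        if (e is MemberExpression m && m.Expression is ParameterExpression)
            return m.Member.Name;

        if (e is ConstantExpression c && c.Value is string s)
            return "'" + s.Replace("'", "''") + "'";

        // ToLower()/ToUpper() map to OData's tolower()/toupper().
        if (e is MethodCallExpression inner && inner.Method.DeclaringType == typeof(string)
            && inner.Arguments.Count == 0
            && (inner.Method.Name == "ToLower" || inner.Method.Name == "ToUpper"))
            return inner.Method.Name.ToLowerInvariant() + "(" + Operand(inner.Object) + ")";

        throw new NotSupportedException("Unsupported operand: " + e);
    }
}
```

Adding support for another function then becomes a one-line change to the Templates table rather than a new if block.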

Conclusion

Hopefully this has given you a glimpse into the crazy world of writing a LINQ provider.  As Ayende Rahien writes, “As a consumer of LINQ, I absolutely adore it, despite some of the issues just discussed. But consuming LINQ is the part that is all sunshine and roses. The task of implementing a LINQ provider is one of Herculean proportion and involves much use of fertilizer, sweat, and hard work.”  I think that does a great job of summing it up actually.

