VWAP Performance: Was it Good or Bad?

Updated: Feb 11, 2022

Sept 9, 2020 Chris Sparrow, Co-founder, Director of Research and Data Science

In transaction cost analysis (TCA), it is common to measure the cost of implementing an order by comparing the average price of the order with a benchmark such as arrival price, or the volume weighted average price (VWAP) computed over the life of the order.
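As a minimal sketch of the measurement described above, the following Python computes a VWAP and a signed cost in basis points. The `(price, volume)` trade representation, the function names, and the sign convention (positive means worse than the benchmark) are assumptions made for illustration, not a description of any particular TCA system.

```python
from typing import Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def vwap(trades: Sequence[Trade]) -> float:
    """Volume weighted average price of a set of trades."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume <= 0:
        raise ValueError("trades must have positive total volume")
    return sum(price * volume for price, volume in trades) / total_volume


def cost_bps(avg_price: float, benchmark: float, side: str) -> float:
    """Signed cost versus a benchmark in basis points.

    Assumed convention: a positive cost is worse than the benchmark,
    so a buyer paying above VWAP and a seller receiving below VWAP
    both show a positive cost.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    return sign * (avg_price - benchmark) / benchmark * 1e4


# Illustrative usage with made-up trades.
market_trades = [(10.00, 100), (10.02, 300), (9.98, 100)]
our_fills = [(10.02, 50), (10.02, 50)]
order_cost = cost_bps(vwap(our_fills), vwap(market_trades), "buy")
```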

The VWAP cost that is reported is an ‘absolute’ measurement in that it reports the difference between the order price and the benchmark without reference to what else was going on as the order was being worked in the market. A key question that arises is whether the cost represents good or bad implementation performance.

What we really want is a way to answer the question: what was our relative performance? To measure relative performance, we need to consider more than just the benchmark and the average price; we need to include the activity of other participants in the market. In other words, we need to include context. How can we do this?

Figure 1 / Results for 4 different stocks. We see a large variation in the typical range of outcomes with ENB having a very tight range and SCL a very wide range. A 10 bps cost for an order in ENB is much worse than a 10 bps cost for an order in SCL.

One way to approach this problem is to estimate the range of outcomes that could have been expected given the liquidity characteristics and the activities of other market participants while our order was being implemented.

We can get a sense of the range of outcomes as follows: first, determine the start and end time of our order. Next, determine the percentage of the overall market volume our order represented, often called the participation rate.
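These first two steps could be sketched as follows. The `(timestamp, price, volume)` tuple layout and the function names are hypothetical, and the sketch assumes the market trade list already includes our own fills, so the denominator is total market volume over the life of the order.

```python
from typing import List, Sequence, Tuple

TimedTrade = Tuple[float, float, float]  # (timestamp, price, volume); assumed layout


def trades_in_interval(
    trades: Sequence[TimedTrade], start: float, end: float
) -> List[TimedTrade]:
    """Keep only the market trades printed between the order's start and end."""
    return [t for t in trades if start <= t[0] <= end]


def participation_rate(order_volume: float, interval: Sequence[TimedTrade]) -> float:
    """Order volume as a fraction of the market volume in the interval."""
    market_volume = sum(volume for _, _, volume in interval)
    if market_volume <= 0:
        raise ValueError("no market volume in the interval")
    return order_volume / market_volume
```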

The next step is to randomly choose trades from the full set of market trades across all participants during the time of our order. These will be different executions (and therefore have different average prices) – but will be randomly chosen from the market trades in a way that matches the overall participation rate of our order. We then compare the performance of the set of randomized orders versus our benchmark(s).

The procedure will result in a set of orders that experienced the same liquidity conditions as our order since each of the random orders had the same start and finish times as our order. Due to its random nature, the set will include orders that performed well and orders that performed poorly, giving us a good basis for comparison.
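One possible way to build a single randomized order is sketched below. The article does not specify the sampling scheme, so this is an assumption: we shuffle the interval's trades, take them in random sequence until the target volume is reached, and partially fill the last trade so the volume matches exactly. The name `random_order` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def random_order(
    trades: Sequence[Trade], target_volume: float, rng: random.Random
) -> List[Trade]:
    """Pick market trades at random until target_volume is filled."""
    total_volume = sum(volume for _, volume in trades)
    if not 0 < target_volume <= total_volume:
        raise ValueError("target_volume must be positive and at most market volume")
    shuffled = list(trades)
    rng.shuffle(shuffled)  # random sequence of the interval's trades
    remaining = target_volume
    fills: List[Trade] = []
    for price, volume in shuffled:
        take = min(volume, remaining)  # partial fill on the last trade
        fills.append((price, take))
        remaining -= take
        if remaining <= 0:
            break
    return fills
```

Because every random order is drawn from the same interval, it faces the same liquidity conditions as the real order; only the selection of trades differs.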

This method is an example of a Monte Carlo approach. The benefit of using a Monte Carlo approach is that we can directly sample the distribution driving the process. In this case, the underlying distribution is the price-volume distribution of stock trades that occurred while we were working our order. Simpler approaches to evaluating relative performance include measuring the percentage of volume that traded at better (or worse) prices than our average price. However, these methods miss the important detail of how much better (or worse) those other prices were and what the likelihood of those outcomes was.
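For comparison, the simpler measure mentioned above might look like the sketch below (function name and trade layout are assumptions). Note that it returns a single fraction and carries no information about how far away the better prices were.

```python
from typing import Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def fraction_of_volume_at_better_prices(
    trades: Sequence[Trade], avg_price: float, side: str
) -> float:
    """Share of market volume that traded at a better price than ours.

    'Better' means lower than our average price for a buyer and
    higher than our average price for a seller.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    total_volume = sum(volume for _, volume in trades)
    if total_volume <= 0:
        raise ValueError("trades must have positive total volume")
    if side == "buy":
        better = sum(volume for price, volume in trades if price < avg_price)
    else:
        better = sum(volume for price, volume in trades if price > avg_price)
    return better / total_volume
```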

We want to understand the range of costs versus benchmark(s) that our set of random orders produced. Once we have this, we can compare our performance versus the range obtained from the randomized orders. We now have a method of measuring relative performance that considers how other participants were trading during the same period as our order and can directly address the question: how did my incurred costs compare with this set of possible outcomes? Or put another way, was my performance good or bad?

This approach can be refined by imposing constraints on the randomized orders, such as requiring that the participation rate must be below a specified threshold, reflecting the original order’s instructions. In fact, we could use this approach to determine how much any constraints we imposed on our order impacted our performance.

We can also use a similar approach to get an idea of how liquidity analytics like spread, depth, volatility and volume can impact the expected range of cost outcomes. We can use the procedure as part of a pre-trade analysis by estimating the expected range of outcomes under liquidity conditions similar to those in which we expect to execute our order.

Consider an example:

1. Start with an order.
2. Get the start and end times of the order and compute its benchmark costs.
3. Construct a set of random orders from the trades that occurred in the market while our order was being worked, requiring each set of randomly chosen trades to sum to the same volume as we executed on our order.
4. Compare the average price of each random order with our benchmark of interest, such as the interval VWAP, and record the cost.
5. Use the resulting distribution of costs to determine the range of potential cost outcomes of random orders (or constrained semi-random orders) exposed to the same liquidity conditions as our order.
6. Compare the cost of our order with the distribution of cost outcomes from the random orders.
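The steps above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used for the results in this article: trades are `(price, volume)` tuples already restricted to the order's interval, random orders are built by taking trades in shuffled sequence with a partial fill on the last one, and a positive cost means worse than the interval VWAP. All names and default values are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def simulate_costs_bps(
    trades: Sequence[Trade],
    order_volume: float,
    side: str,
    n_orders: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Costs (bps versus interval VWAP) of n_orders randomly built orders."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    total_volume = sum(volume for _, volume in trades)
    if not 0 < order_volume <= total_volume:
        raise ValueError("order_volume must be positive and at most market volume")
    benchmark = sum(price * volume for price, volume in trades) / total_volume
    sign = 1.0 if side == "buy" else -1.0
    rng = random.Random(seed)
    costs = []
    for _ in range(n_orders):
        remaining, notional = order_volume, 0.0
        # Take trades in a random sequence until the order volume is filled.
        for price, volume in rng.sample(list(trades), len(trades)):
            take = min(volume, remaining)
            notional += price * take
            remaining -= take
            if remaining <= 0:
                break
        avg_price = notional / order_volume
        costs.append(sign * (avg_price - benchmark) / benchmark * 1e4)
    return costs


# Illustrative usage with made-up trades: the spread of the simulated
# costs is the context against which the real order's cost is judged.
market = [(10.0, 100), (10.1, 200), (9.9, 150), (10.05, 50)]
simulated = simulate_costs_bps(market, order_volume=100, side="buy", n_orders=500)
cost_range_bps = statistics.stdev(simulated)
```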

The range of outcomes from the random orders provides us with the context we need to determine whether our performance was good or bad. If our order is far from the middle of the range obtained from the random orders, we can conclude we had particularly good or bad performance (depending on which side of the distribution we are on and whether we were a buyer or seller).
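One way to quantify "far from the middle" is to locate our cost within the simulated distribution, for example as a percentile rank and as a number of standard deviations from the mean. Both summary statistics are our choice for this sketch; the function name is hypothetical, and costs are assumed to be signed so that larger means worse.

```python
import statistics
from typing import Sequence, Tuple


def relative_performance(
    order_cost_bps: float, simulated_costs_bps: Sequence[float]
) -> Tuple[float, float]:
    """Return (percentile rank, z-score) of our cost among the random orders.

    The percentile rank is the fraction of random orders whose cost was
    at or below ours; with 'larger is worse', a high rank is poor.
    """
    if len(simulated_costs_bps) < 2:
        raise ValueError("need at least two simulated costs")
    n = len(simulated_costs_bps)
    rank = sum(1 for c in simulated_costs_bps if c <= order_cost_bps) / n
    mean = statistics.fmean(simulated_costs_bps)
    spread = statistics.stdev(simulated_costs_bps)
    if spread == 0:
        raise ValueError("simulated costs have zero spread")
    return rank, (order_cost_bps - mean) / spread
```

The same cost in basis points can therefore produce a very different rank depending on how wide the simulated distribution is for that stock and interval.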

We applied this process to a group of stocks to illustrate the method. For each stock, several thousand orders were generated by randomly choosing trades that were executed while our order was being implemented. For each of these randomly constructed orders, the cost versus the interval volume weighted average price (VWAP) benchmark was computed and recorded. The standard deviation of the set of costs was then computed to quantify the range of results that can be used to put our order’s performance into context and let us answer the question of whether our performance was good or bad.

Some sample results are shown in Figure 1, where we show results for four stocks. We can see that ENB has a very tight distribution, while SCL has a very wide distribution. Missing the VWAP by 10 bps on ENB would be bad, while missing the VWAP by 10 bps on SCL would be quite reasonable. This is because for the ENB case, almost none of the random orders had costs of more than 10 bps, while on SCL a large percentage of orders had costs greater than 10 bps.

Figure 2 / Comparison of the range of cost outcomes using the Monte Carlo methodology with the intra-order volatility computed using the price-volume distribution. We observe a direct relationship between the two suggesting that the intra-order volatility is a good proxy for the Monte Carlo method.

An objection to this method may be that we need to generate the random orders, which adds computational overhead. The Monte Carlo method is a so-called ‘non-parametric’ approach since it makes no assumptions about the underlying distribution. To see how these results compare to a ‘parametric’ approach, we also compute the intra-order volatility by measuring the standard deviation of the price-volume distribution. A scatter plot of the range determined by the Monte Carlo approach versus the standard deviation of the price-volume distribution shows that there is good agreement between the two approaches. The good news is that the parametric approach is computationally efficient, and the chart in Figure 2 shows that we get similar results from the two approaches.
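A sketch of the parametric proxy follows. We interpret "standard deviation of the price-volume distribution" as the volume weighted standard deviation of trade prices around the interval VWAP, expressed in basis points; that interpretation, the function name, and the trade layout are assumptions.

```python
import math
from typing import Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def intra_order_volatility_bps(trades: Sequence[Trade]) -> float:
    """Volume weighted standard deviation of price around VWAP, in bps."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume <= 0:
        raise ValueError("trades must have positive total volume")
    mean_price = sum(price * volume for price, volume in trades) / total_volume
    variance = (
        sum(volume * (price - mean_price) ** 2 for price, volume in trades)
        / total_volume
    )
    return math.sqrt(variance) / mean_price * 1e4
```

This takes a single pass over the interval's trades, whereas the Monte Carlo approach needs one pass per random order.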

Figure 3 shows a performance gauge that incorporates the Monte Carlo analysis described above. The red range on the gauge represents the set of possible prices an order of a given size could have achieved given the trades that occurred in the market. The green range shows the region where most of the Monte Carlo results were obtained. We can then compare our result by looking at where the red needle points. The shorter blue needle shows the VWAP benchmark price. The gauge allows us to put the performance, typically represented as the signed difference between the needles, into context. Comparing where the red needle points relative to the green range allows us to see our relative performance.

Figure 3 / This gauge allows us to visualize performance. The red range indicates the lowest and highest average price our order could have achieved, while the green range represents the region from the Monte Carlo analysis that would be considered adequate performance.
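The red range on the gauge can be approximated with a short calculation: the lowest achievable average price comes from filling the order with the cheapest volume that traded, and the highest from the most expensive volume. The sketch below assumes `(price, volume)` trades from the order's interval and allows a partial fill on the last trade used; the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Trade = Tuple[float, float]  # (price, volume); an assumed representation


def _average_fill_price(ordered_trades: Iterable[Trade], order_volume: float) -> float:
    """Average price from filling order_volume in the given trade sequence."""
    remaining, notional = order_volume, 0.0
    for price, volume in ordered_trades:
        take = min(volume, remaining)
        notional += price * take
        remaining -= take
        if remaining <= 0:
            break
    return notional / order_volume


def achievable_price_range(
    trades: Sequence[Trade], order_volume: float
) -> Tuple[float, float]:
    """Lowest and highest average price an order of this size could have had."""
    total_volume = sum(volume for _, volume in trades)
    if not 0 < order_volume <= total_volume:
        raise ValueError("order_volume must be positive and at most market volume")
    by_price = sorted(trades, key=lambda trade: trade[0])
    lowest = _average_fill_price(by_price, order_volume)
    highest = _average_fill_price(reversed(by_price), order_volume)
    return lowest, highest
```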

This analysis is not a replacement for traditional post-trade TCA, but it does give additional insight into how our order performed within the context of the market, allowing us to directly answer the question of whether our performance was good or bad.

It also allows us to explore the effects that order constraints and execution instructions have on our range of potential outcomes, and it can be extended further to analyze the sensitivity of our execution strategy to various liquidity and market conditions.

This is an example of how Spacetime applies our philosophy of Better Than Best Practice to Best Execution.

Chris Sparrow

Director of Research and Data Science

chris.sparrow@spacetime.io

solutions@spacetime.io | spacetime.io | @spacetime.io
