Archive for the ‘Cluster’ Category

Fast Threshold Clustering Algorithm (FTCA) test

Today I want to share the test and implementation for the Fast Threshold Clustering Algorithm (FTCA) created by David Varadi. The implementation was developed and contributed by Pierre Chretien; I only made minor updates.

Let’s first replicate the results from the Fast Threshold Clustering Algorithm (FTCA) post:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data for ETFs
	#****************************************************************** 
	load.packages('quantmod')

	tickers = spl('XLY,XLP,XLE,XLF,XLV,XLI,XLB,XLK,XLU')
	
	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)		
	bt.prep(data, align='keep.all')

	#*****************************************************************
	# Helper function to compute portfolio allocation additional stats
	#****************************************************************** 
	portfolio.allocation.custom.stats.clusters <- function(x,ia) {
		return(list(
			clusters.FTCA = cluster.group.FTCA(0.5)(ia)			
		))
	}
	
	#*****************************************************************
	# Find clusters
	#****************************************************************** 		
	periodicity = 'months'
	lookback.len = 252
		
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len,
		min.risk.fns = list(EW=equal.weight.portfolio),
		custom.stats.fn = portfolio.allocation.custom.stats.clusters
	) 			
	
	clusters = obj$clusters.FTCA$EW	
	clusters['2012:05::']

The clusters are stable and match David’s results:

           XLB XLE XLF XLI XLK XLP XLU XLV XLY
2012-05-31   1   1   1   1   1   1   1   1   1
2012-06-29   1   1   1   1   1   1   1   1   1
2012-07-31   1   1   1   1   1   1   1   1   1
2012-08-31   1   1   1   1   1   1   1   1   1
2012-09-28   1   1   1   1   1   1   1   1   1
2012-10-31   1   1   1   1   1   1   1   1   1
2012-11-30   2   2   2   2   2   2   1   2   2
2012-12-31   2   2   2   2   2   2   1   2   2
2013-01-31   2   2   2   2   2   2   1   2   2
2013-02-28   1   1   1   1   1   1   1   1   1
2013-03-28   1   1   1   1   1   1   1   1   1
2013-04-30   1   1   1   1   1   1   1   1   1
2013-05-31   1   1   1   1   1   1   1   1   1
2013-06-28   1   1   1   1   1   1   1   1   1
2013-07-31   1   1   1   1   1   1   1   1   1
2013-08-30   1   1   1   1   1   1   1   1   1
2013-09-30   1   1   1   1   1   1   1   1   1
2013-10-31   1   1   1   1   1   1   1   1   1
2013-11-26   1   1   1   1   1   1   1   1   1

Next let’s compare the Cluster Portfolio Allocation Algorithm using K-means and FTCA:

	#*****************************************************************
	# Code Strategies
	#****************************************************************** 					
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len, 
		min.risk.fns = list(
			C.EW.kmeans = distribute.weights(equal.weight.portfolio, cluster.group.kmeans.90),
			C.EW.FTCA = distribute.weights(equal.weight.portfolio, cluster.group.FTCA(0.5))			
		)
	)
	
	models = create.strategies(obj, data)$models
						
	#*****************************************************************
	# Create Report
	#******************************************************************    
	barplot.with.labels(sapply(models, compute.turnover, data), 'Average Annual Portfolio Turnover')

Both clustering algorithms produced very similar results. One noticeable difference is turnover: since the Fast Threshold Clustering Algorithm (FTCA) produced more stable groups, it had lower turnover.
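
To give a rough idea of what is being measured here (this is an illustrative sketch, not necessarily how SIT’s compute.turnover is implemented), average annual turnover can be approximated from the weight history as the sum of absolute weight changes at each rebalance, totaled per year and averaged:

	# rough sketch of average annual turnover from a weight history (illustrative only)
	# weights: an xts object of portfolio weights, one row per rebalancing date
	annual.turnover.sketch <- function(weights) {
		trades = abs(weights - mlag(weights))		# weight changes at each rebalance (mlag is a SIT helper)
		turnover = xts(rowSums(trades, na.rm=T), index(weights))
		mean(apply.yearly(turnover, sum))		# average of the yearly totals
	}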

[Plot: Average Annual Portfolio Turnover for C.EW.kmeans vs. C.EW.FTCA]

The full source code and example for the cluster.group.FTCA() function is available in strategy.r at github.

Categories: Asset Allocation, Cluster, R

Cluster Risk Parity back-test

In the Cluster Portfolio Allocation post, I outlined the 3 steps to construct a Cluster Risk Parity portfolio. At each rebalancing period:

  • Create Clusters
  • Allocate funds within each Cluster using Risk Parity
  • Allocate funds across all Clusters using Risk Parity

I created a helper function, distribute.weights(), in strategy.r at github to automate these steps. It takes 2 parameters:

  • Function to allocate funds. For example, risk.parity.portfolio will use risk parity to allocate funds both within and across clusters.
  • Function to create clusters. For example, cluster.group.kmeans.90 will create clusters using the k-means algorithm (see the sketch after this list).
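
Conceptually, distribute.weights() wraps the 3 steps above into a portfolio-allocation function that portfolio.allocation.helper() can call. Below is a simplified sketch of the idea, not the exact SIT code (see strategy.r at github for the real implementation); it mirrors the step-by-step example from the Cluster Portfolio Allocation post:

	# conceptual sketch of what distribute.weights(fn, group.fn) does (simplified)
	distribute.weights.sketch <- function(fn, group.fn) {
		function(ia) {
			group = group.fn(ia)			# step 1: create clusters
			ngroups = max(group)
			weight = rep(NA, ia$n)
			hist.g = NA * ia$hist.returns[, 1:ngroups]

			# step 2: allocate funds within each cluster
			for(g in 1:ngroups) {
				index = group == g
				if( sum(index) == 1 ) {
					weight[index] = 1
					hist.g[,g] = ia$hist.returns[, index, drop=F]
				} else {
					ia.temp = create.historical.ia(ia$hist.returns[, index, drop=F], 252)
					weight[index] = fn(ia.temp)
					hist.g[,g] = ia.temp$hist.returns %*% weight[index]
				}
			}

			# step 3: allocate funds across clusters and combine
			group.weights = fn(create.historical.ia(hist.g, 252))
			for(g in 1:ngroups)
				weight[group == g] = weight[group == g] * group.weights[g]
			weight
		}
	}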

Here is an example of how to put it all together. Let’s first load historical prices for the 10 major asset classes:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data for ETFs
	#****************************************************************** 
	load.packages('quantmod')

	tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')

	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
		
	bt.prep(data, align='remove.na')

Next, let’s run the 2 versions of Cluster Portfolio Allocation using Equal Weight and Risk Parity algorithms to allocate funds:

	#*****************************************************************
	# Code Strategies
	#****************************************************************** 	
	periodicity = 'months'
	lookback.len = 250
	cluster.group = cluster.group.kmeans.90
	
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len,
		min.risk.fns = list(
				EW=equal.weight.portfolio,
				RP=risk.parity.portfolio,
						
				C.EW = distribute.weights(equal.weight.portfolio, cluster.group),
				C.RP=distribute.weights(risk.parity.portfolio, cluster.group)
			)
	) 		
	
	models = create.strategies(obj, data)$models

Finally, let’s examine the results:

	#*****************************************************************
	# Create Report
	#****************************************************************** 	
	strategy.performance.snapshoot(models, T)

[Plot: strategy performance summary for EW, RP, C.EW and C.RP]

The Cluster Portfolio Allocation versions produce portfolios with better risk-adjusted returns and smaller drawdowns.

To view the complete source code for this example, please have a look at the bt.cluster.portfolio.allocation.test1() function in bt.test.r at github.

Categories: Backtesting, Cluster, R

Cluster Portfolio Allocation

February 12, 2013

Today, I want to continue with the clustering theme and show how portfolio weights are determined in the Cluster Portfolio Allocation method. One example of the Cluster Portfolio Allocation method is Cluster Risk Parity (Varadi, Kapler, 2012).

The Cluster Portfolio Allocation method has 3 steps:

  • Create Clusters
  • Allocate funds within each Cluster
  • Allocate funds across all Clusters

I will illustrate all 3 steps below using the “Equal Weight” and “Risk Parity” portfolio allocation methods. Let’s start by loading historical prices for the 10 major asset classes.

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)
	#*****************************************************************
	# Load historical data for ETFs
	#****************************************************************** 
	load.packages('quantmod')

	tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')

	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
		
	bt.prep(data, align='remove.na')

	#*****************************************************************
	# Setup
	#****************************************************************** 
	# compute returns
	ret = data$prices / mlag(data$prices) - 1

	# setup period
	dates = '2012::2012'
	ret = ret[dates]

Next, let’s compute the “Plain” portfolio allocation (i.e. no clustering):

	fn.name = 'equal.weight.portfolio'				
	fn = match.fun(fn.name)

	# create input assumptions
	ia = create.historical.ia(ret, 252) 
	
	# compute allocation without cluster, for comparison
	weight = fn(ia)

Next, let’s create clusters and compute the portfolio allocation within each cluster:

	# create clusters
	group = cluster.group.kmeans.90(ia)
	ngroups = max(group)

	weight0 = rep(NA, ia$n)
			
	# store returns for each cluster
	hist.g = NA * ia$hist.returns[,1:ngroups]
			
	# compute weights within each group	
	for(g in 1:ngroups) {
		if( sum(group == g) == 1 ) {
			weight0[group == g] = 1
			hist.g[,g] = ia$hist.returns[, group == g, drop=F]
		} else {
			# create input assumptions for the assets in this cluster
			ia.temp = create.historical.ia(ia$hist.returns[, group == g, drop=F], 252) 

			# compute allocation within cluster
			w0 = fn(ia.temp)
			
			# set appropriate weights
			weight0[group == g] = w0
			
			# compute historical returns for this cluster
			hist.g[,g] = ia.temp$hist.returns %*% w0
		}
	}

Next, let’s compute the portfolio allocation across all clusters and compute the final portfolio weights:

	# create GROUP input assumptions
	ia.g = create.historical.ia(hist.g, 252) 
			
	# compute allocation across clusters
	group.weights = fn(ia.g)
				
	# multiply out group.weights by within group weights
	for(g in 1:ngroups)
		weight0[group == g] = weight0[group == g] * group.weights[g]
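
At this point, weight0 holds the final Cluster Portfolio weights. As a quick sanity check (my addition, not part of the original test function): the within-cluster weights sum to 1 in each cluster and the cluster weights sum to 1, so the combined weights should also sum to 1.

	# sanity check: combined weights should sum to 1
	round(sum(weight0), 6)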

Finally, let’s create reports and compare the portfolio allocations:

	#*****************************************************************
	# Create Report
	#****************************************************************** 			
	load.packages('RColorBrewer')
	col = colorRampPalette(brewer.pal(9,'Set1'))(ia$n)

	layout(matrix(1:2,nr=2,nc=1))
	par(mar = c(0,0,2,0))
	index = order(group)

	pie(weight[index], labels = paste(colnames(ret), round(100*weight,1),'%')[index], col=col, main=fn.name)

	pie(weight0[index], labels = paste(colnames(ret), round(100*weight0,1),'%')[index], col=col, main=paste('Cluster',fn.name))	

[Plot: equal.weight.portfolio allocation, Plain vs. Cluster]

The difference is most striking in the “Equal Weight” portfolio allocation method. The Cluster version first allocates 25% to each cluster, and then allocates equally within each cluster, while the Plain version allocates equally among all assets. The “Risk Parity” version below works in a similar way, but instead of equal weights the focus is on equal risk allocations. For example, UUP gets a much bigger allocation because it is far less risky than any other asset.
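
To make the “Equal Weight” arithmetic concrete, here is a small illustrative example (the cluster assignments below are hypothetical, not the actual clusters from this back-test; the 25% figure above implies 4 clusters):

	# illustrative example of the cluster equal-weight arithmetic
	group.example = c(1,1,1, 2, 3,3,3,3, 4,4)	# hypothetical assignment of 10 assets into 4 clusters

	ngroups.example = max(group.example)
	w.example = rep(NA, length(group.example))
	for(g in 1:ngroups.example)
		# each cluster gets 1/4 = 25%, split equally among its members
		w.example[group.example == g] = (1 / ngroups.example) / sum(group.example == g)

	round(100 * w.example, 1)
	# 8.3 8.3 8.3 25.0 6.2 6.2 6.2 6.2 12.5 12.5
	# versus a flat 10% for every asset in the Plain version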

[Plot: risk.parity.portfolio allocation, Plain vs. Cluster]

Next week, I will show how to back-test Cluster Portfolio Allocation methods.

To view the complete source code for this example, please have a look at the bt.cluster.portfolio.allocation.test() function in bt.test.r at github.

Categories: Cluster, R

Tracking Number of Historical Clusters in DOW 30 and S&P 500

February 5, 2013

In the Tracking Number of Historical Clusters post, I looked at how 3 different methods were able to identify clusters across the universe of 10 major asset classes. Today, I want to look at the impact of a larger universe on clustering. Below, I examine the historical time series of the number of clusters in the DOW 30 and S&P 500 indices.

I went back to 1970 for the companies in the DOW 30 index.

[Plots: historical number of clusters for the DOW 30, one chart per method]

I went back to 1994 for the companies in the S&P 500 index.

[Plots: historical number of clusters for the S&P 500, one chart per method]

Takeaways: The markets are changing, and correspondingly the diversification (i.e. number of clusters) goes through cycles, as can be seen in the charts. The results will vary across different methods and must be validated by the user. For example, some readers will consider an average of 10 clusters for the S&P 500 to be too small, while others might think that 10 clusters is sufficient.

Categories: Cluster, R

Tracking Number of Historical Clusters

January 27, 2013

In the prior post, Optimal number of clusters, we looked at methods of selecting the number of clusters. Today, I want to continue with the clustering theme and show the historical number-of-clusters time series produced by these methods.

In particular, I will look at the following methods of selecting the optimal number of clusters:

  • Minimum number of clusters that explain at least 90% of variance
  • Elbow method
  • Hierarchical clustering tree cut at 1/3 height

Let’s first load historical prices for the 10 major asset classes:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data for ETFs
	#****************************************************************** 
	load.packages('quantmod')

	tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')

	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
		
	bt.prep(data, align='remove.na')

Next, I created 3 helper functions to automate cluster selection. In particular, I used the following methods of selecting the optimal number of clusters:

  • Minimum number of clusters that explain at least 90% of variance – cluster.group.kmeans.90 function
  • Elbow method – cluster.group.kmeans.elbow function
  • Hierarchical clustering tree cut at 1/3 height – cluster.group.hclust function

To view the complete source code for these functions, please have a look at strategy.r at github.
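
To give a sense of the logic, here is an illustrative sketch of a “minimum number of clusters that explain at least 90% of variance” selector. This is not the exact SIT cluster.group.kmeans.90 code, just the general idea: use 1 minus correlation as the dissimilarity, map assets into two dimensions, and grow the number of k-means clusters until the explained variance reaches 90%.

	# illustrative sketch of a "90% of variance" k-means cluster selector
	cluster.group.kmeans.90.sketch <- function(ia) {
		correlation = cor(ia$hist.returns, use='complete.obs', method='pearson')
		distance = as.dist(1 - correlation)	# low correlation => large distance
		xy = cmdscale(distance)			# map assets into 2D for k-means

		n = ncol(correlation)
		for(i in 2:(n-1)) {
			fit = kmeans(xy, centers=i, iter.max=100, nstart=100)
			# stop at the smallest number of clusters explaining at least 90% of variance
			if( fit$betweenss / fit$totss > 0.9 ) break
		}
		fit$cluster
	}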

Let’s apply these functions to our data set every week, using a 250-day look-back to compute correlations.

	#*****************************************************************
	# Use following 3 methods to determine number of clusters
	# * Minimum number of clusters that explain at least 90% of variance
	#   cluster.group.kmeans.90
	# * Elbow method
	#   cluster.group.kmeans.elbow
	# * Hierarchical clustering tree cut at 1/3 height
	#   cluster.group.hclust
	#****************************************************************** 

	# helper function to compute portfolio allocation additional stats
	portfolio.allocation.custom.stats.clusters <- function(x,ia) {
		return(list(
			ncluster.90 = max(cluster.group.kmeans.90(ia)),
			ncluster.elbow = max(cluster.group.kmeans.elbow(ia)),
			ncluster.hclust = max(cluster.group.hclust(ia))
		))
	}
	
	#*****************************************************************
	# Compute # Clusters
	#****************************************************************** 		
	periodicity = 'weeks'
	lookback.len = 250
		
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len,
		min.risk.fns = list(EW=equal.weight.portfolio),
		custom.stats.fn = portfolio.allocation.custom.stats.clusters
	) 

Finally, let’s plot the historical number-of-clusters time series for each method:

	#*****************************************************************
	# Create Reports
	#****************************************************************** 		
	temp = list(ncluster.90 = 'Kmeans 90% variance',
		 ncluster.elbow = 'Kmeans Elbow',
		 ncluster.hclust = 'Hierarchical clustering at 1/3 height')	
	
	for(i in 1:len(temp)) {
		hist.cluster = obj[[ names(temp)[i] ]]
		title = temp[[ i ]]
	
		plota(hist.cluster, type='l', col='gray', main=title)
			plota.lines(SMA(hist.cluster,10), type='l', col='red',lwd=5)
		plota.legend('Number of Clusters,10 period moving average', 'gray,red', x = 'bottomleft')			
	}

[Plots: historical number of clusters, one chart per method: Kmeans 90% variance, Kmeans Elbow, Hierarchical clustering at 1/3 height]

All methods selected clusters a little differently, as expected. The “Minimum number of clusters that explain at least 90% of variance” method seems to produce the most stable results. I would suggest looking at a larger universe (for example, the DOW 30) and a longer period of time (for example, 1995 to present) to evaluate these methods.

Takeaways: As I mentioned in the Optimal number of clusters post, there are many different methods to create clusters, and I have barely scratched the surface. There is also another dimension that I have not explored yet: the distance matrix. Currently, I’m using a correlation matrix as a distance measure to create clusters. Matt Considine pointed out to me that there is an R interface to the Maximal Information-based Nonparametric Exploration (MINE) metric that can be used as a better measure of correlation.
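
For reference, here is how a correlation matrix is typically turned into a distance matrix for hierarchical clustering (an illustrative sketch, not the exact SIT code); an alternative association measure, such as MINE’s MIC, could be substituted at the marked line:

	# illustrative sketch: correlation-based distance for hierarchical clustering
	returns = na.omit(data$prices / mlag(data$prices) - 1)	# asset returns (mlag is a SIT helper)
	correlation = cor(returns, use='complete.obs')		# <- an alternative measure (e.g. MIC) could go here
	distance = as.dist(1 - correlation)			# highly correlated assets => small distance
	fit = hclust(distance)
	group = cutree(fit, h = max(fit$height) * 1/3)		# cut the tree at 1/3 of its height
	max(group)						# number of clusters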

To view the complete source code for this example, please have a look at the bt.cluster.optimal.number.historical.test() function in bt.test.r at github.

Categories: Cluster, R