### Archive

Archive for the ‘R’ Category

## Cluster Portfolio Allocation

Today, I want to continue with the clustering theme and show how the portfolio weights are determined in the Cluster Portfolio Allocation method. One example of the Cluster Portfolio Allocation method is Cluster Risk Parity (Varadi, Kapler, 2012).

The Cluster Portfolio Allocation method has 3 steps:

• Create Clusters
• Allocate funds within each Cluster
• Allocate funds across all Clusters

Below, I will illustrate all 3 steps using the “Equal Weight” and “Risk Parity” portfolio allocation methods. Let’s start by loading historical prices for the 10 major asset classes.

```
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
#*****************************************************************
# Load historical data for ETFs
#******************************************************************

tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')

data <- new.env()
getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)

bt.prep(data, align='remove.na')

#*****************************************************************
# Setup
#******************************************************************
# compute returns
ret = data$prices / mlag(data$prices) - 1

# setup period
dates = '2012::2012'
ret = ret[dates]
```

Next, let’s compute the “Plain” portfolio allocation (i.e. no clustering):

```
fn.name = 'equal.weight.portfolio'
fn = match.fun(fn.name)

# create input assumptions
ia = create.historical.ia(ret, 252)

# compute allocation without cluster, for comparison
weight = fn(ia)
```

Next, let’s create clusters and compute portfolio allocation within each Cluster

```
# create clusters
group = cluster.group.kmeans.90(ia)
ngroups = max(group)

weight0 = rep(NA, ia$n)

# store returns for each cluster
hist.g = NA * ia$hist.returns[,1:ngroups]

# compute weights within each group
for(g in 1:ngroups) {
	if( sum(group == g) == 1 ) {
		# single-asset cluster: the asset is the whole cluster
		weight0[group == g] = 1
		hist.g[,g] = ia$hist.returns[, group == g, drop=F]
	} else {
		# create input assumptions for the assets in this cluster
		ia.temp = create.historical.ia(ia$hist.returns[, group == g, drop=F], 252)

		# compute allocation within cluster
		w0 = fn(ia.temp)

		# set appropriate weights
		weight0[group == g] = w0

		# compute historical returns for this cluster
		hist.g[,g] = ia.temp$hist.returns %*% w0
	}
}
```

Next, let’s compute the portfolio allocation across all clusters and the final portfolio weights:

```
# create GROUP input assumptions
ia.g = create.historical.ia(hist.g, 252)

# compute allocation across clusters
group.weights = fn(ia.g)

# multiply out group.weights by within-group weights
for(g in 1:ngroups)
	weight0[group == g] = weight0[group == g] * group.weights[g]
```
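The multiply-out step can be checked by hand in the “Equal Weight” case. The sketch below uses plain base R and a made-up cluster assignment (the `group` vector is hypothetical, not the output of `cluster.group.kmeans.90`), so it only illustrates the arithmetic of the three steps.

```r
# Hypothetical cluster labels for the 10 tickers (illustrative only)
tickers <- c('GLD','UUP','SPY','QQQ','IWM','EEM','EFA','IYR','USO','TLT')
group <- c(1, 2, 3, 3, 3, 3, 3, 3, 1, 4)
ngroups <- max(group)

# Step 2: equal weight within each cluster (1 / cluster size)
within <- ave(rep(1, length(group)), group, FUN = function(x) x / sum(x))

# Step 3: equal weight across clusters
across <- rep(1 / ngroups, ngroups)

# Final weight = within-cluster weight * weight of the asset's cluster
weight0 <- within * across[group]
names(weight0) <- tickers

# Plain equal weight, for comparison
weight <- rep(1 / length(tickers), length(tickers))
names(weight) <- tickers

print(round(100 * cbind(Plain = weight, Cluster = weight0), 1))
```

With these labels, an asset that sits alone in its cluster receives the full cluster weight, while assets in a crowded cluster share it.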

Finally, let’s create reports and compare portfolio allocations

```
#*****************************************************************
# Create Report
#******************************************************************
col = colorRampPalette(brewer.pal(9,'Set1'))(ia$n)

layout(matrix(1:2,nr=2,nc=1))
par(mar = c(0,0,2,0))
index = order(group)

pie(weight[index], labels = paste(colnames(ret), round(100*weight,1),'%')[index], col=col, main=fn.name)

pie(weight0[index], labels = paste(colnames(ret), round(100*weight0,1),'%')[index], col=col, main=paste('Cluster',fn.name))
```

The difference is most striking in the “Equal Weight” portfolio allocation method. The Cluster version first allocates 25% to each of the four clusters, and then allocates equally within each cluster. The Plain version allocates equally among all assets. The “Risk Parity” version works in a similar way, but instead of equal weights, the focus is on equal risk allocations. For example, UUP gets a much bigger allocation because it is far less risky than any other asset.
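To see why a low-risk asset ends up with a large weight, here is a minimal sketch of a naive risk parity rule that weights each asset in proportion to the inverse of its volatility. This is an assumption made for illustration (it ignores correlations and may differ from the toolbox's own function), and the volatilities are made-up numbers.

```r
# Hypothetical annualized volatilities (made-up numbers, not market data)
vol <- c(UUP = 0.06, TLT = 0.15, SPY = 0.13, USO = 0.30)

# Naive risk parity: weight proportional to 1 / volatility
w <- (1 / vol) / sum(1 / vol)

# Stand-alone risk taken by each position (correlations ignored)
risk <- w * vol

print(round(100 * w, 1))            # weights in percent
print(round(risk / sum(risk), 3))   # share of total stand-alone risk
```

Under this rule every position carries the same stand-alone risk, so the least volatile asset needs the largest weight to keep up with the others.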

Next week, I will show how to back-test Cluster Portfolio Allocation methods.

To view the complete source code for this example, please have a look at the bt.cluster.portfolio.allocation.test() function in bt.test.r at github.

Categories: Cluster, R

## Tracking Number of Historical Clusters in DOW 30 and S&P 500

February 5, 2013

In the Tracking Number of Historical Clusters post, I looked at how 3 different methods identified clusters across a universe of 10 major asset classes. Today, I want to share the impact of clustering on a larger universe. Below, I examine the historical time series of the number of clusters in the DOW 30 and S&P 500 indices.

I went back to 1970 for the companies in the DOW 30 index.

I went back to 1994 for the companies in the S&P 500 index.

Takeaways: The markets are changing, and correspondingly the diversification (i.e. number of clusters) goes through cycles, as can be seen in the charts. The results will vary across different methods and must be validated by the user. For example, some readers will consider an average of 10 clusters for the S&P 500 too small, while others might consider 10 clusters sufficient.

Categories: Cluster, R

## An Example of Seasonality Analysis

Today, I want to demonstrate how easy it is to create a seasonality analysis study and produce a sample summary report. As an example study, I will use the S&P Annual Performance After a Big January post by Avondale Asset Management.

The first step is to load historical prices and find Big Januaries.

```
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

#*****************************************************************
# Load historical data
#******************************************************************

price = getSymbols('^GSPC', src = 'yahoo', from = '1900-01-01', auto.assign = F)

# convert to monthly
price = Cl(to.monthly(price, indexAt='endof'))
ret = price / mlag(price) - 1

#*****************************************************************
# Find Januaries with return > 4%
#******************************************************************
index =  which( date.month(index(ret)) == 1 & ret > 4/100 )

# create summary table with return in January and return for the whole year
temp = c(coredata(ret),rep(0,12))
out = cbind(ret[index], sapply(index, function(i) prod(1 + temp[i:(i+11)])-1))
colnames(out) = spl('January,Year')
```
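The same selection and compounding logic can be sketched without the toolbox. The example below uses base R only and made-up monthly returns (all numbers are hypothetical), arranged with one column per calendar year, to show how the January filter and the full-year return fit together.

```r
# Toy monthly returns for 3 years (made-up numbers), one column per year
years <- 2001:2003
ret <- matrix(c(
  0.05, rep(0.01, 11),     # 2001: big January
  0.01, rep(0.00, 11),     # 2002: small January
  0.06, rep(-0.01, 11)     # 2003: big January, weak rest of the year
), nrow = 12, dimnames = list(month.abb, as.character(years)))

# Januaries with a return above 4%
jan <- ret['Jan', ]
big <- jan > 4/100

# Compound the 12 monthly returns of each calendar year (January included)
year.ret <- apply(1 + ret, 2, prod) - 1

# Summary table: January return and whole-year return for the big Januaries
out <- cbind(January = jan[big], Year = year.ret[big])
print(round(100 * out, 1))
```

As in the code above, the yearly figure is the product of the 12 monthly growth factors starting with January, minus one.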

All the hard work is done now; let’s create a chart and a table to summarize the S&P Annual Performance After a Big January numbers.

```
#*****************************************************************
# Create Plot
#******************************************************************
# NOTE: 'col' is not defined in this snippet; two bar colors
# (January, Year) are assumed here so that the code runs on its own
col = c('black', 'gray')

pos = barplot(100*out, border=NA, beside=T, axisnames = F, axes = FALSE,
	col=col, main='Annual Return When S&P500 Rises More than 4% in January')
axis(1, at = colMeans(pos), labels = date.year(index(out)), las=2)
axis(2, las=1)
grid(NA, NULL)
abline(h= 100*mean(out$Year), col='red', lwd=2)
plota.legend(spl('January,Annual,Average'),  c(col,'red'))

# plot table
plot.table(round(100*as.matrix(out),1))
```

That is it, we are done.

Takeaways: It is very easy to create a seasonality analysis study. Next, you might want to schedule the study script to run at specific times throughout the year and send you a reminder email if the study conditions are met.

To view the complete source code for this example, please have a look at the bt.seasonality.january.test() function in bt.test.r at github.

Categories: R

## Tracking Number of Historical Clusters

In the prior post, Optimal number of clusters, we looked at methods of selecting the number of clusters. Today, I want to continue with the clustering theme and show the historical Number of Clusters time series produced by these methods.

In particular, I will look at the following methods of selecting optimal number of clusters:

• Minimum number of clusters that explain at least 90% of variance
• Elbow method
• Hierarchical clustering tree cut at 1/3 height

Let’s first load historical prices for the 10 major asset classes

```
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

#*****************************************************************
# Load historical data for ETFs
#******************************************************************

tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')

data <- new.env()
getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)

bt.prep(data, align='remove.na')
```

Next, I created 3 helper functions to automate cluster selection. In particular, I used the following methods of selecting the optimal number of clusters:

• Minimum number of clusters that explain at least 90% of variance – cluster.group.kmeans.90 function
• Elbow method – cluster.group.kmeans.elbow function
• Hierarchical clustering tree cut at 1/3 height – cluster.group.hclust function

To view the complete source code for these functions, please have a look at strategy.r at github.
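For intuition, two of these rules can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the toolbox's `cluster.group.*` functions: the returns are simulated from three hypothetical factors, dissimilarity is taken as one minus correlation, and the hierarchical tree uses R's default complete linkage. The Elbow method is left out.

```r
set.seed(1)
n.obs <- 250

# Simulated returns: 3 hypothetical factors, each driving a block of 3 assets
f <- matrix(rnorm(n.obs * 3), n.obs, 3)
block <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
ret <- f[, block] + 0.3 * matrix(rnorm(n.obs * 9), n.obs, 9)
colnames(ret) <- paste0('A', 1:9)

# Correlation-based dissimilarity
d <- as.dist(1 - cor(ret))

# Rule 1: smallest k whose k-means fit explains at least 90% of variance
xy <- cmdscale(d)                       # 2-D embedding of the dissimilarities
for (k in 2:6) {
  fit <- kmeans(xy, centers = k, iter.max = 100, nstart = 50)
  if (1 - fit$tot.withinss / fit$totss >= 0.9) break
}
n.kmeans <- max(fit$cluster)

# Rule 3: hierarchical clustering tree cut at 1/3 of its height
hc <- hclust(d)                         # complete linkage (R's default)
n.hclust <- max(cutree(hc, h = max(hc$height) / 3))

print(c(kmeans.90 = n.kmeans, hclust = n.hclust))
```

On data with such clean block structure both rules should recover the three simulated blocks; on real asset returns they can easily disagree, which is the point of tracking them side by side.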

Let’s apply these functions to our data set every week, using a 250-day look-back to compute correlations.

```
#*****************************************************************
# Use following 3 methods to determine number of clusters
# * Minimum number of clusters that explain at least 90% of variance
#   cluster.group.kmeans.90
# * Elbow method
#   cluster.group.kmeans.elbow
# * Hierarchical clustering tree cut at 1/3 height
#   cluster.group.hclust
#******************************************************************

# helper function to compute portfolio allocation additional stats
portfolio.allocation.custom.stats.clusters <- function(x,ia) {
	return(list(
		ncluster.90 = max(cluster.group.kmeans.90(ia)),
		ncluster.elbow = max(cluster.group.kmeans.elbow(ia)),
		ncluster.hclust = max(cluster.group.hclust(ia))
	))
}

#*****************************************************************
# Compute # Clusters
#******************************************************************
periodicity = 'weeks'
lookback.len = 250

obj = portfolio.allocation.helper(data$prices,
	periodicity = periodicity, lookback.len = lookback.len,
	min.risk.fns = list(EW=equal.weight.portfolio),
	custom.stats.fn = portfolio.allocation.custom.stats.clusters
)
```
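The rolling scheme itself (recompute at each period end, using only the trailing look-back window) can be sketched in base R as follows. Everything here is hypothetical: the returns are simulated, every 5th row stands in for a week end, and `n.clusters` is an illustrative helper that cuts a hierarchical tree at 1/3 of its height, not one of the toolbox functions.

```r
set.seed(42)
n.obs <- 400

# Simulated daily returns for 4 assets; assets 1 and 2 move together
ret <- matrix(rnorm(n.obs * 4, sd = 0.01), n.obs, 4)
ret[, 2] <- ret[, 1] + rnorm(n.obs, sd = 0.002)

lookback.len <- 250
# Hypothetical period ends: every 5th observation stands in for a week end
ends <- seq(lookback.len, n.obs, by = 5)

# Illustrative helper: number of clusters from a tree cut at 1/3 height
n.clusters <- function(x) {
  hc <- hclust(as.dist(1 - cor(x)))
  max(cutree(hc, h = max(hc$height) / 3))
}

# Recompute the cluster count at each period end on the trailing window
hist.cluster <- sapply(ends, function(i) {
  win <- ret[(i - lookback.len + 1):i, , drop = FALSE]
  n.clusters(win)
})

# 10-period trailing moving average of the cluster count
sma10 <- stats::filter(hist.cluster, rep(1 / 10, 10), sides = 1)

print(hist.cluster)
```

Each count uses only data available up to that period end, so the resulting series has no look-ahead.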

Finally, let’s plot the historical number of clusters for each method:

```
#*****************************************************************
# Create Reports
#******************************************************************
temp = list(ncluster.90 = 'Kmeans 90% variance',
	ncluster.elbow = 'Kmeans Elbow',
	ncluster.hclust = 'Hierarchical clustering at 1/3 height')

for(i in 1:len(temp)) {
	hist.cluster = obj[[ names(temp)[i] ]]
	title = temp[[ i ]]

	plota(hist.cluster, type='l', col='gray', main=title)
	plota.lines(SMA(hist.cluster,10), type='l', col='red',lwd=5)
	plota.legend('Number of Clusters,10 period moving average', 'gray,red', x = 'bottomleft')
}
```

All methods selected clusters a little bit differently, as expected. The “Minimum number of clusters that explain at least 90% of variance” method seems to produce the most stable results. I would suggest looking at a larger universe (for example, the DOW 30) and a longer period of time (for example, 1995 to present) to evaluate these methods.

Takeaways: As I mentioned in the Optimal number of clusters post, there are many different methods to create clusters, and I have barely scratched the surface. There is also another dimension that I have not explored yet: the distance matrix. Currently, I’m using a correlation matrix as a distance measure to create clusters. Matt Considine pointed out to me that there is an R interface to the Maximal Information-based Nonparametric Exploration (MINE) metric, which can be used as a better measure of correlation.

To view the complete source code for this example, please have a look at the bt.cluster.optimal.number.historical.test() function in bt.test.r at github.

Categories: Cluster, R

## Weekend Reading – S&P 500 Visual History

Michael Johnston at the ETF Database shared a very interesting post with me over the holidays. The S&P 500 Visual History is an interactive post that shows the top 10 components in the S&P 500 each year, going back to 1980.

On a different note, Judson Bishop contributed a plota.recession() function that adds recession bars to an existing plot. The recession dates are from the National Bureau of Economic Research. Following is a simple example of the plota.recession() function.

```
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# https://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

#*****************************************************************
# Load historical data for ETFs
#******************************************************************