Capturing Intraday data

I want to follow up on the Intraday data post with an example of how you can capture intraday data without too much effort by recording 1-minute snapshots of the market.

I will take market snapshots from Yahoo Finance using the following function, which downloads delayed market quotes with date and time stamps:

###############################################################################
# getSymbols interface to Yahoo's delayed quotes for today
# based on getQuote.yahoo from quantmod package
###############################################################################
getQuote.yahoo.today <- function(Symbols) {
    require('data.table')
    what = yahooQF(names = spl('Symbol,Last Trade Time,Last Trade Date,Open,Days High,Days Low,Last Trade (Price Only),Volume'))
    names = spl('Symbol,Time,Date,Open,High,Low,Close,Volume')

    # Yahoo limits the number of symbols per request, so query in chunks of 100
    all.symbols = lapply(seq(1, len(Symbols), 100), function(x) na.omit(Symbols[x:(x + 99)]))
    out = c()

    for(i in 1:len(all.symbols)) {
        # download
        url = paste('http://download.finance.yahoo.com/d/quotes.csv?s=',
            join(trim(all.symbols[[i]]), ','),
            '&f=', what[[1]], sep = '')

        txt = join(readLines(url), '\n')
        data = fread(paste0(txt, '\n'), stringsAsFactors = F, sep = ',')
        setnames(data, names)
        setkey(data, 'Symbol')
        out = rbind(out, data)
    }
    out
}

Next, we can run the getQuote.yahoo.today function every minute from 9:30 to 16:00 and record market snapshots. Please note that you will have to make some judgment calls about how you want to handle the highs and lows.

Symbols = spl('IBM,AAPL')

prev = c()
while(T) {
    out = getQuote.yahoo.today(Symbols)

    if (is.null(prev))
        # first snapshot: write the header and the opening bar for each symbol
        for(i in 1:nrow(out)) {
            cat(names(out), '\n', sep=',', file=paste0(out$Symbol[i],'.csv'), append=F)
            cat(unlist(out[i]), '\n', sep=',', file=paste0(out$Symbol[i],'.csv'), append=T)
        }
    else
        # subsequent snapshots: difference the cumulative volume and
        # roll the prior close into the open to form 1-minute bars
        for(i in 1:nrow(out)) {
            s0 = prev[Symbol == out$Symbol[i]]
            s1 = out[i]
            s1$Volume = s1$Volume - s0$Volume
            s1$Open = s0$Close
            s1$High = iif(s1$High > s0$High, s1$High, max(s1$Close, s1$Open))
            s1$Low  = iif(s1$Low  < s0$Low , s1$Low , min(s1$Close, s1$Open))
            cat(unlist(s1), '\n', sep=',', file=paste0(out$Symbol[i],'.csv'), append=T)
        }

    # copy
    prev = out

    # sleep 1 minute
    Sys.sleep(60)
}
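The loop above records unconditionally. If you only want to capture regular trading hours, one option is to guard the loop body with a small helper like the sketch below (an assumption on my part: it presumes your machine's clock is set to exchange time):

# minimal sketch: TRUE only between 9:30 and 16:00
# assumes the local clock is set to exchange time
in.market.hours <- function(now = Sys.time()) {
    minutes = as.numeric(format(now, '%H')) * 60 + as.numeric(format(now, '%M'))
    minutes >= 9*60 + 30 & minutes <= 16*60
}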

For example, here are some of the quotes I was able to save for AAPL:

Symbol   Time      Date    Open    High      Low   Close  Volume
  AAPL 2:57pm 3/10/2014 528.360 533.330 528.3391 531.340 5048146
  AAPL 2:58pm 3/10/2014 531.340 531.570 531.3400 531.570    7650
  AAPL 2:59pm 3/10/2014 531.570 531.570 531.5170 531.517    2223
  AAPL 3:00pm 3/10/2014 531.517 531.517 531.4500 531.450    5283
  AAPL 3:01pm 3/10/2014 531.450 531.450 531.2900 531.290    4413
  AAPL 3:02pm 3/10/2014 531.290 531.490 531.2900 531.490    2440

Unfortunately, there is no way to go back in history unless you buy historical intraday data. But if you want to start recording market moves yourself, the code above should get you started.
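Once you have recorded a few sessions, you can load a saved file back into R for analysis. Here is a minimal sketch, assuming the Symbol,Time,Date,Open,High,Low,Close,Volume layout written by the loop above:

# minimal sketch: read a recorded file back as an xts series
# assumes the column layout written by the recording loop above
library(xts)
raw = read.csv('AAPL.csv', stringsAsFactors = FALSE)
stamp = as.POSIXct(paste(raw$Date, toupper(raw$Time)), format = '%m/%d/%Y %I:%M%p')
bars = xts(raw[, c('Open','High','Low','Close','Volume')], order.by = stamp)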


Intraday data

In the Intraday Backtest post I showed an example of loading and working with Forex Intraday data from the FXHISTORICALDATA.COM. Recently, I came across another interesting source of Intraday data at the Bonnot Gang site. Please note that you will have to register to get access to the Intraday data; the registration is free.

Today, I want to examine the quality of the intraday data from the Bonnot Gang and show how it can be integrated into a backtest using the Systematic Investor Toolbox. For the example below, please first download and save the 1-minute intraday historical data for SPY and GLD. Next, let's load and plot the time series for SPY.

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data
	#****************************************************************** 
	load.packages('quantmod')	

	# data from http://thebonnotgang.com/tbg/historical-data/
	spath = 'c:/Desktop/'
	# http://stackoverflow.com/questions/14440661/dec-argument-in-data-tablefread
		Sys.localeconv()["decimal_point"]
		Sys.setlocale("LC_NUMERIC", "French_France.1252")
	
	data <- new.env()
	data$SPY = read.xts(paste0(spath,'SPY_1m.csv'), 
		sep = ';', date.column = 3, format='%Y-%m-%d %H:%M:%S', index.class = c("POSIXlt", "POSIXt"))

	data$GLD = read.xts(paste0(spath,'GLD_1m.csv'), 
		sep = ';', date.column = 3, format='%Y-%m-%d %H:%M:%S', index.class = c("POSIXlt", "POSIXt"))

	#*****************************************************************
	# Create plot for Nov 1, 2012 and 2013
	#****************************************************************** 
	layout(c(1,1,2))		
	plota(data$SPY['2012:11:01'], type='candle', main='SPY on Nov 1st, 2012', plotX = F)
	plota(plota.scale.volume(data$SPY['2012:11:01']), type = 'volume')	

	layout(c(1,1,2))		
	plota(data$SPY['2013:11:01'], type='candle', main='SPY on Nov 1st, 2013', plotX = F)
	plota(plota.scale.volume(data$SPY['2013:11:01']), type = 'volume')	

[Plot 1: SPY 1-minute candles and volume on Nov 1st, 2012]

[Plot 2: SPY 1-minute candles and volume on Nov 1st, 2013]

It jumps out right away that the data provider changed the time scale: in 2012, data was recorded from 9:30 to 16:00, while in 2013 it was recorded from 13:30 to 20:00.
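To see the shift directly, a quick sketch like the following tabulates the first and last bar time for each day:

	# sketch: first and last bar time per day reveals the recording window
	dates = index(data$SPY)
	day = format(dates, '%Y-%m-%d')
	session = tapply(format(dates, '%H:%M'), day, range)
	head(session); tail(session)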

Next, let’s check if there are any big gaps in the series Intraday.

	#*****************************************************************
	# Data check for Gaps in the series Intraday
	#****************************************************************** 
	i = 'GLD'
	dates = index(data[[i]])
	factor = format(dates, '%Y%m%d')
	gap = tapply(dates, factor, function(x) max(diff(x)))
	
	gap[names(gap[gap > 4*60])]
	data[[i]]['2013:02:19']

	i = 'SPY'
	dates = index(data[[i]])
	factor = format(dates, '%Y%m%d')
	gap = tapply(dates, factor, function(x) max(diff(x)))
	
	gap[names(gap[gap > 4*60])]
	data[[i]]['2013:02:19']	

Please see below the dates for GLD with gaps of over 4 minutes (gap lengths in minutes):

20120801   12
20121226   22
20130219   48
20130417    6
20130531    6
20130705    8
20131105    4
20131112    4
20140124   14
20140210   22
20140303    6

A detailed look at Feb 19th, 2013 shows a 48-minute gap between 14:54 and 15:42:

> data[[i]]['2013:02:19 14:50::2013:02:19 15:45']
                        open    high      low   close volume
2013-02-19 14:50:54 155.3110 155.315 155.3001 155.315   8900
2013-02-19 14:51:56 155.3100 155.310 155.3100 155.310 119900
2013-02-19 14:52:52 155.3100 155.330 155.3000 155.305 354600
2013-02-19 14:53:55 155.2990 155.300 155.2800 155.280      0
2013-02-19 14:54:54 155.2900 155.290 155.2659 155.279  10500
2013-02-19 15:42:57 155.3400 155.360 155.3400 155.350 587900
2013-02-19 15:43:57 155.3501 155.355 155.3300 155.332   8300
2013-02-19 15:44:59 155.3395 155.340 155.3200 155.340  10700
2013-02-19 15:45:55 155.3300 155.340 155.3300 155.340   5100

So there is definitely something going on with data acquisition at that time.

Next, let’s compare Intrada data with daily data:

	#*****************************************************************
	# Data check : compare with daily
	#****************************************************************** 
	data.daily <- new.env()
		quantmod::getSymbols(spl('SPY,GLD'), src = 'yahoo', from = '1970-01-01', env = data.daily, auto.assign = T)   
     
	layout(1)		
	plota(data$GLD, type='l', col='blue', main='GLD')
		plota.lines(data.daily$GLD, type='l', col='red')
	plota.legend('Intraday,Daily', 'blue,red')	
	

	plota(data$SPY, type='l', col='blue', main='SPY')
		plota.lines(data.daily$SPY, type='l', col='red')
	plota.legend('Intraday,Daily', 'blue,red')		

[Plot 3: GLD intraday vs. daily price series]

[Plot 4: SPY intraday vs. daily price series]

The intraday data matches the daily data very well.
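If you prefer a numeric check to an eyeball test, a sketch along these lines compares the last intraday close of each day against the daily close (an assumption here: the dates line up once the time component is dropped):

	# sketch: last intraday close of each day vs the daily close
	intra.close = Cl(data$GLD)[endpoints(data$GLD, 'days')]
		index(intra.close) = as.Date(index(intra.close))
	both = merge(intra.close, Cl(data.daily$GLD), join = 'inner')
	summary(as.numeric(both[,1]) - as.numeric(both[,2]))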

Please note that the raw intraday data comes with a seconds time stamp. For back-testing purposes, we will want to round the date-time up to the next minute, so that we can merge intraday series without introducing multiple entries for the same minute. In the example below, the first merge shows the series before rounding and the second one after:

	#*****************************************************************
	# Round to the next minute
	#****************************************************************** 
	GLD.sample = data$GLD['2012:07:10::2012:07:10 09:35']
	SPY.sample= data$SPY['2012:07:10::2012:07:10 09:35']
	
	merge( Cl(GLD.sample), Cl(SPY.sample) )
	
	# round to the next minute
	index(GLD.sample) = as.POSIXct(format(index(GLD.sample) + 60, '%Y-%m-%d %H:%M'), format = '%Y-%m-%d %H:%M')
	index(SPY.sample) = as.POSIXct(format(index(SPY.sample) + 60, '%Y-%m-%d %H:%M'), format = '%Y-%m-%d %H:%M')
	
	merge( Cl(GLD.sample), Cl(SPY.sample) )
> merge( Cl(GLD.sample), Cl(SPY.sample) )
                       close close.1
2012-07-10 09:30:59 155.0900 136.030
2012-07-10 09:31:59 155.1200 136.139
2012-07-10 09:32:58 155.1100      NA
2012-07-10 09:32:59       NA 136.180
2012-07-10 09:33:56 155.1400      NA
2012-07-10 09:33:59       NA 136.100
2012-07-10 09:34:59 155.0999 136.110
2012-07-10 09:35:59 155.0200 136.180

> merge( Cl(GLD.sample), Cl(SPY.sample) )
                       close close.1
2012-07-10 09:31:00 155.0900 136.030
2012-07-10 09:32:00 155.1200 136.139
2012-07-10 09:33:00 155.1100 136.180
2012-07-10 09:34:00 155.1400 136.100
2012-07-10 09:35:00 155.0999 136.110
2012-07-10 09:36:00 155.0200 136.180

I got the impression that this intraday data is not native exchange data, but was instead collected by taking intraday snapshots of the quotes that were later processed into one-minute bars. But I might be wrong.

Next, let’s clean the Intraday data, by removing any day with time gaps over 4 minutes and let’s round all times to the nearest minute:

	#*****************************************************************
	# Clean data
	#****************************************************************** 
	# remove dates with gaps over 4 min
	for(i in ls(data)) {
		dates = index(data[[i]])
		factor = format(dates, '%Y%m%d')
		gap = tapply(dates, factor, function(x) max(diff(x)))
		data[[i]] = data[[i]][ is.na(match(factor, names(gap[gap > 4*60]))) ]
	}		
	
	common = unique(format(index(data[[ls(data)[1]]]), '%Y%m%d'))
	for(i in ls(data)) {
		dates = index(data[[i]])
		factor = format(dates, '%Y%m%d')	
		common = intersect(common, unique(factor))
	}
	
	# remove days that are not present in both time series
	for(i in ls(data)) {
		dates = index(data[[i]])
		factor = format(dates, '%Y%m%d')
		data[[i]] = data[[i]][!is.na(match(factor, common)),]
	}
		
	#*****************************************************************
	# Round to the next minute
	#****************************************************************** 
	for(i in ls(data))
		index(data[[i]]) = as.POSIXct(format(index(data[[i]]) + 60, '%Y-%m-%d %H:%M'), tz = Sys.getenv('TZ'), format = '%Y-%m-%d %H:%M')

Once Intraday data is ready, we can test a simple equal weight strategy:

	#*****************************************************************
	# Setup
	#****************************************************************** 
	bt.prep(data, align='keep.all', fill.gaps = T)

	prices = data$prices   
	dates = data$dates
		nperiods = nrow(prices)
	
	models = list()

	#*****************************************************************
	# Benchmarks
	#****************************************************************** 							
	data$weight[] = NA
		data$weight$SPY = 1
	models$SPY = bt.run.share(data, clean.signal=F)

	data$weight[] = NA
		data$weight$GLD = 1
	models$GLD = bt.run.share(data, clean.signal=F)
	
	data$weight[] = NA
		data$weight$SPY = 0.5
		data$weight$GLD = 0.5
	models$EW = bt.run.share(data, clean.signal=F)

	
    #*****************************************************************
    # Create Report
    #******************************************************************    
    strategy.performance.snapshoot(models, T)	

[Plot 5: performance snapshot for the SPY, GLD and equal weight strategies]

In this post, I tried to outline the basic steps you need to take if you are planning to work with a new data source. Next, I plan to follow up with more examples of testing intraday strategies.

To view the complete source code for this example, please have a look at the bt.intraday.thebonnotgang.test() function in bt.test.r at github.


Probabilistic Momentum

February 17, 2014

David Varadi recently discussed an interesting strategy in the Are Simple Momentum Strategies Too Dumb? Introducing Probabilistic Momentum post. David also provided the Probabilistic Momentum Spreadsheet if you are interested in doing the computations in Excel. Today I want to show how you can test such a strategy using the Systematic Investor Toolbox:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)
	#*****************************************************************
	# Load historical data
	#****************************************************************** 
	load.packages('quantmod')
		
	tickers = spl('SPY,TLT')
		
	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
	bt.prep(data, align='remove.na', dates='2005::')
 
	
	#*****************************************************************
	# Setup
	#****************************************************************** 
	lookback.len = 60
	
	prices = data$prices
	
	models = list()
	
	#*****************************************************************
	# Simple Momentum
	#****************************************************************** 
	momentum = prices / mlag(prices, lookback.len)
	data$weight[] = NA
		data$weight$SPY[] = momentum$SPY > momentum$TLT
		data$weight$TLT[] = momentum$SPY <= momentum$TLT
	models$Simple  = bt.run.share(data, clean.signal=T) 	

The Simple Momentum strategy invests in SPY if SPY's momentum is greater than TLT's momentum, and invests in TLT otherwise.

	#*****************************************************************
	# Probabilistic Momentum
	#****************************************************************** 
	confidence.level = 60/100
	ret = prices / mlag(prices) - 1 

	ir = sqrt(lookback.len) * runMean(ret$SPY - ret$TLT, lookback.len) / runSD(ret$SPY - ret$TLT, lookback.len)
	momentum.p = pt(ir, lookback.len - 1)
		
	data$weight[] = NA
		data$weight$SPY[] = iif(cross.up(momentum.p, confidence.level), 1, iif(cross.dn(momentum.p, (1 - confidence.level)), 0,NA))
		data$weight$TLT[] = iif(cross.dn(momentum.p, (1 - confidence.level)), 1, iif(cross.up(momentum.p, confidence.level), 0,NA))
	models$Probabilistic  = bt.run.share(data, clean.signal=T) 	

The Probabilistic Momentum strategy uses the Probabilistic Momentum measure and a Confidence Level to decide on the allocation. The strategy invests in SPY if the SPY vs TLT Probabilistic Momentum is above the Confidence Level, and invests in TLT if the SPY vs TLT Probabilistic Momentum is below 1 - Confidence Level.
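To build some intuition for the measure, here is a small numeric illustration with made-up inputs (these values are not taken from the back-test): with a 60-day look-back, a mean daily SPY-TLT return spread of 5bp and a 1% daily spread volatility,

	# illustration with made-up inputs
	ir = sqrt(60) * 0.0005 / 0.01   # information ratio, used as a t-statistic (~0.39)
	pt(ir, 60 - 1)                  # ~0.65, i.e. above the 60% confidence level

so the strategy would be comfortable holding SPY in this case.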

To make the strategy a bit more attractive, I added a version that leverages the SPY allocation by 50%:

	#*****************************************************************
	# Probabilistic Momentum + SPY Leverage 
	#****************************************************************** 
	data$weight[] = NA
		data$weight$SPY[] = iif(cross.up(momentum.p, confidence.level), 1.5, iif(cross.dn(momentum.p, (1 - confidence.level)), 0, NA))
		data$weight$TLT[] = iif(cross.dn(momentum.p, (1 - confidence.level)), 1, iif(cross.up(momentum.p, confidence.level), 0, NA))
	models$Probabilistic.Leverage = bt.run.share(data, clean.signal=T) 	

	#*****************************************************************
	# Create Report
	#******************************************************************    
	strategy.performance.snapshoot(models, T)

[Plot 1: performance of the Simple, Probabilistic and Probabilistic.Leverage strategies]

The back-test results look very similar to the ones reported in the Are Simple Momentum Strategies Too Dumb? Introducing Probabilistic Momentum post.

However, I was not able to exactly reproduce the transition plots; it looks like my interpretation produces more whipsaw than desired.
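One way to quantify the whipsaw is to count how often each strategy switches its SPY allocation; a small sketch, using the weights stored by bt.run.share:

	# sketch: number of SPY allocation switches per strategy
	sapply(models, function(m)
		sum(abs(diff(as.numeric(m$weight$SPY > 0))) > 0, na.rm = TRUE))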

	#*****************************************************************
	# Visualize Signal
	#******************************************************************        
	cols = spl('steelblue1,steelblue')
	prices = scale.one(data$prices)

	layout(1:3)
	
	plota(prices$SPY, type='l', ylim=range(prices), plotX=F, col=cols[1], lwd=2)
	plota.lines(prices$TLT, type='l', plotX=F, col=cols[2], lwd=2)
		plota.legend('SPY,TLT',cols,as.list(prices))

	highlight = models$Probabilistic$weight$SPY > 0
		plota.control$col.x.highlight = iif(highlight, cols[1], cols[2])
	plota(models$Probabilistic$equity, type='l', plotX=F, x.highlight = highlight | T)
		plota.legend('Probabilistic,SPY,TLT',c('black',cols))
				
	highlight = models$Simple$weight$SPY > 0
		plota.control$col.x.highlight = iif(highlight, cols[1], cols[2])
	plota(models$Simple$equity, type='l', plotX=T, x.highlight = highlight | T)
		plota.legend('Simple,SPY,TLT',c('black',cols))	

[Plot 2: SPY and TLT prices with the Probabilistic and Simple strategy allocations highlighted]

David, thank you very much for sharing your great ideas. I would encourage readers to play with this strategy and report back.

To view the complete source code for this example, please have a look at the bt.probabilistic.momentum.test() function in bt.test.r at github.

Weekend Reading: F-Squared

December 7, 2013

Mebane Faber posted another interesting blog post, Building a Simple Sector Rotation on Momentum and Trend, that caught my interest. Today I want to show how you can test such a strategy using the Systematic Investor Toolbox:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data
	#******************************************************************    
	load.packages('quantmod')		
	
	data = new.env()
	# load historical market returns
	temp = get.fama.french.data('F-F_Research_Data_Factors', periodicity = '',download = T, clean = T)
		ret = cbind(temp[[1]]$Mkt.RF + temp[[1]]$RF, temp[[1]]$RF)
		price = bt.apply.matrix(ret / 100, function(x) cumprod(1 + x))
	data$SPY = make.stock.xts( price$Mkt.RF )
	data$SHY = make.stock.xts( price$RF )
	
	# load historical sector returns
	temp = get.fama.french.data('10_Industry_Portfolios', periodicity = '',download = T, clean = T)		
		ret = temp[[1]]
		price = bt.apply.matrix(ret[,1:9] / 100, function(x) cumprod(1 + x))
	for(n in names(price)) data[[n]] = make.stock.xts( price[,n] )
	
	# align dates
	data$symbolnames = c(names(price), 'SHY', 'SPY')
	bt.prep(data, align='remove.na', dates='2000::')

	# back-test dates
	bt.dates = '2001:04::'

	#*****************************************************************
	# Setup
	#****************************************************************** 	
	prices = data$prices  
	n = ncol(data$prices)
		
	models = list()
	
	#*****************************************************************
	# Benchmark Strategies
	#****************************************************************** 			
	data$weight[] = NA
		data$weight$SPY[1] = 1
	models$SPY = bt.run.share(data, clean.signal=F, dates=bt.dates)
			
	weight = prices
		weight$SPY = NA
		weight$SHY = NA
	
	data$weight[] = NA
		data$weight[] = ntop(weight[], n)
	models$EW = bt.run.share(data, clean.signal=F, dates=bt.dates)
	
	#*****************************************************************
	# Code Strategies
	# http://www.mebanefaber.com/2013/12/04/square-root-of-f-squared/
	#****************************************************************** 			
	sma = bt.apply.matrix(prices, SMA, 10)
	
	# create position score
	position.score = sma
	position.score[ prices < sma ] = NA
		position.score$SHY = NA	
		position.score$SPY = NA	
	
	# equal weight allocation
	weight = ntop(position.score[], n)	
	
	# number of invested funds
	n.selected = rowSums(weight != 0)
	
	# cash logic
	weight$SHY[n.selected == 0,] = 1
	
	weight[n.selected == 1,] = 0.25 * weight[n.selected == 1,]
	weight$SHY[n.selected == 1,] = 0.75
	
	weight[n.selected == 2,] = 0.5 * weight[n.selected == 2,]
	weight$SHY[n.selected == 2,] = 0.5
	
	weight[n.selected == 3,] = 0.75 * weight[n.selected == 3,]
	weight$SHY[n.selected == 3,] = 0.25
	
	# cbind(round(100*weight,0), n.selected)	
	
	data$weight[] = NA
		data$weight[] = weight
	models$strategy1 = bt.run.share(data, clean.signal=F, dates=bt.dates)	
	
	#*****************************************************************
	# Create Report
	#******************************************************************       	
	strategy.performance.snapshoot(models, one.page = T)

[Plot 1: performance snapshot of the SPY, equal weight and sector rotation strategies]

Mebane, thank you very much for sharing your great ideas. I would encourage readers to play with this strategy and report back.

Please note that I back-tested the strategy using monthly observations. The strategy's draw-down is around 17% using monthly data; if we switch to daily data, the draw-down goes to around 22%. There was one really bad month in 2002.
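For reference, draw-down figures like these can be computed directly from a model's equity curve; a minimal sketch:

	# sketch: maximum draw-down from a back-tested equity curve
	eq = as.numeric(models$strategy1$equity)
	min(eq / cummax(eq) - 1)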

To view the complete source code for this example, please have a look at the bt.mebanefaber.f.squared.test() function in bt.test.r at github.

Averaged Input Assumptions and Momentum

December 5, 2013

Today I want to share another interesting idea contributed by Pierre Chretien. Pierre suggested using Averaged Input Assumptions and Momentum to create a reasonably quiet strategy. The averaging techniques are used to avoid over-fitting to any particular frequency.

To create Averaged Input Assumptions we combine returns over different look-back periods, giving more weight to the recent returns, to form overall Input Assumptions.

create.ia.averaged <- function(lookbacks, n.lag) {
	# force evaluation so the arguments are captured in the closure
	lookbacks = lookbacks
	n.lag = n.lag

	function(hist.returns, index=1:ncol(hist.returns), hist.all)
	{
		nperiods = nrow(hist.returns)

		# stack the lagged return windows for each look-back on top of each other;
		# recent returns fall into every window and hence receive more weight
		temp = c()
		for (n.lookback in lookbacks)
			temp = rbind(temp, hist.returns[(nperiods - n.lookback - n.lag + 1):(nperiods - n.lag), ])
		create.ia(temp, index, hist.all)
	}
}

To create Averaged Momentum we take a look-back weighted average of momentums computed over different look-back periods.

momentum.averaged <- function(prices, 
	lookbacks = c(20,60,120,250) ,	# length of momentum look back
	n.lag = 3
) {
	momentum = 0 * prices
	for (n.lookback in lookbacks) {
		part.mom = mlag(prices, n.lag) / mlag(prices, n.lookback + n.lag) - 1
		momentum = momentum + 252 / n.lookback * part.mom
	}
	momentum / len(lookbacks)
}
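Note the 252 / n.lookback factor: it annualizes each momentum leg, so shorter look-backs carry proportionally more weight in the average. For example, with the default look-backs the effective weights are:

	# effective weight of each look-back in the averaged momentum
	round( (252 / c(20,60,120,250)) / 4, 2 )
	# 3.15 1.05 0.52 0.25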

Next let’s compare using historical Input Assumptions vs Averaged Input Assumptions and Momentum vs Averaged Momentum. I will consider Absolute Momentum (not cross sectional), for more information about relative and absolute momentum, please see

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)
 
	#*****************************************************************
	# Load historical data
	#****************************************************************** 
	load.packages('quantmod')
		
	# 10 funds
	tickers = spl('Us.Eq = VTI + VTSMX,
	Europe.Eq = IEV + FIEUX,
	Japan.Eq = EWJ + FJPNX,
	Emer.Eq = EEM + VEIEX,
	Re = RWX + VNQ + VGSIX,		
	Com = DBC + QRAAX,
	Gold = GLD + SCGDX,
	Long.Tr = TLT + VUSTX,
	Mid.Tr = IEF + VFITX,
	Short.Tr = SHY + VFISX') 
	
	start.date = 1998
	
	dates = paste(start.date,'::',sep='') 
	
	data <- new.env()
	getSymbols.extra(tickers, src = 'yahoo', from = '1980-01-01', env = data, set.symbolnames = T, auto.assign = T)
		for(i in data$symbolnames) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
	bt.prep(data, align='keep.all', dates=paste(start.date-2,':12::',sep=''), fill.gaps = T)

	#*****************************************************************
	# Setup
	#****************************************************************** 		
	prices = data$prices   
		n = ncol(prices)
		nperiods = nrow(prices)
		
	periodicity = 'months'
	period.ends = endpoints(prices, periodicity)
		period.ends = period.ends[period.ends > 0]
		
	max.product.exposure = 0.6	
	
	#*****************************************************************
	# Input Assumptions
	#****************************************************************** 	
	lookback.len = 40
	create.ia.fn = create.ia
	
	# input assumptions are averaged on 20, 40, 60 days using 1 day lag
	ia.array = c(20,40,60)
	avg.create.ia.fn = create.ia.averaged(ia.array, 1)

	#*****************************************************************
	# Momentum
	#****************************************************************** 	
	universe = prices > 0
	
	mom.lookback.len = 120	
	momentum = prices / mlag(prices, mom.lookback.len) - 1
	mom.universe = ifna(momentum > 0, F)
	
	# momentum is averaged on 20,60,120,250 days using 3 day lag
	mom.array = c(20,60,120,250)	
	avg.momentum = momentum.averaged(prices, mom.array, 3)
	avgmom.universe = ifna(avg.momentum > 0, F)

	#*****************************************************************
	# Algos
	#****************************************************************** 	
	min.risk.fns = list(
		EW = equal.weight.portfolio,
		MV = min.var.portfolio,
		MCE = min.corr.excel.portfolio,
				
		MV.RSO = rso.portfolio(min.var.portfolio, 3, 100, const.ub = max.product.exposure),
		MCE.RSO = rso.portfolio(min.corr.excel.portfolio, 3, 100, const.ub = max.product.exposure)
	)

	#*****************************************************************
	# Code Strategies
	#****************************************************************** 	
make.strategy.custom <- function(name, create.ia.fn, lookback.len, universe, env) {
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity,
		universe = universe,
		lookback.len = lookback.len,
		create.ia.fn = create.ia.fn,
		const.ub = max.product.exposure,
		min.risk.fns = min.risk.fns,
		adjust2positive.definite = F
	)
	env[[name]] = create.strategies(obj, data, prefix=paste(name,'.',sep=''))$models
}


	models <- new.env()	
	make.strategy.custom('ia.none'        , create.ia.fn    , lookback.len, universe       , models)
	make.strategy.custom('ia.mom'         , create.ia.fn    , lookback.len, mom.universe   , models)
	make.strategy.custom('ia.avg_mom'     , create.ia.fn    , lookback.len, avgmom.universe, models)
	make.strategy.custom('avg_ia.none'    , avg.create.ia.fn, 252         , universe       , models)
	make.strategy.custom('avg_ia.mom'     , avg.create.ia.fn, 252         , mom.universe   , models)
	make.strategy.custom('avg_ia.avg_mom' , avg.create.ia.fn, 252         , avgmom.universe, models)
	
	#*****************************************************************
	# Create Report
	#*****************************************************************		
strategy.snapshot.custom <- function(models, n = 0, title = NULL) {
	if (n > 0)
		models = models[ as.vector(matrix(1:len(models),ncol=n, byrow=T)) ]	

	layout(1:3)	
	plotbt(models, plotX = T, log = 'y', LeftMargin = 3, main = title)
		mtext('Cumulative Performance', side = 2, line = 1)
	plotbt.strategy.sidebyside(models)
	barplot.with.labels(sapply(models, compute.turnover, data), 'Average Annual Portfolio Turnover', T)	
}

	# basic vs basic + momentum => momentum filter has better results
	models.final = c(models$ia.none, models$ia.mom)
	strategy.snapshot.custom(models.final, len(min.risk.fns), 'Momentum Filter')

	# basic vs basic + avg ia => averaged ia reduce turnover
	models.final = c(models$ia.none, models$avg_ia.none)
	strategy.snapshot.custom(models.final, len(min.risk.fns), 'Averaged Input Assumptions')

	# basic + momentum vs basic + avg.momentum => mixed results for averaged momentum
	models.final = c(models$ia.mom, models$ia.avg_mom)
	strategy.snapshot.custom(models.final, len(min.risk.fns), 'Averaged Momentum')

	# basic + momentum vs avg ia + avg.momentum
	models.final = c(models$ia.mom, models$avg_ia.avg_mom)
	strategy.snapshot.custom(models.final, len(min.risk.fns), 'Averaged vs Base')	

Above, I compared results for the following 4 cases:

1. Adding the Momentum filter: all algos perform better.
[Plot 3: Momentum Filter comparison]

2. Input Assumptions vs Averaged Input Assumptions: returns are very similar, but using Averaged Input Assumptions helps reduce portfolio turnover.
[Plot 2: Averaged Input Assumptions comparison]

3. Momentum vs Averaged Momentum: returns are very similar, but using Averaged Momentum increases portfolio turnover.
[Plot 1: Averaged Momentum comparison]

4. Historical Input Assumptions + Momentum vs Averaged Input Assumptions + Averaged Momentum: results are mixed, with no consistent advantage to the Averaged methods.
[Plot 4: Averaged vs Base comparison]

Overall, the Averaged methods are a very interesting idea and I hope you will experiment with them and share your findings, like Pierre did. Pierre, again thank you very much for sharing.

The full source code and example for the bt.averaged.test() function is available in bt.test.r at github.

Fast Threshold Clustering Algorithm (FTCA) test

Today I want to share the test and implementation for the Fast Threshold Clustering Algorithm (FTCA) created by David Varadi. This implementation was developed and contributed by Pierre Chretien; I only made minor updates.

Let’s first replicate the results from the Fast Threshold Clustering Algorithm (FTCA) post:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	#*****************************************************************
	# Load historical data for ETFs
	#****************************************************************** 
	load.packages('quantmod')

	tickers = spl('XLY,XLP,XLE,XLF,XLV,XLI,XLB,XLK,XLU')
	
	data <- new.env()
	getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
		for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)		
	bt.prep(data, align='keep.all')

	#*****************************************************************
	# Helper function to compute portfolio allocation additional stats
	#****************************************************************** 
	portfolio.allocation.custom.stats.clusters <- function(x,ia) {
		return(list(
			clusters.FTCA = cluster.group.FTCA(0.5)(ia)			
		))
	}
	
	#*****************************************************************
	# Find clusters
	#****************************************************************** 		
	periodicity = 'months'
	lookback.len = 252
		
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len,
		min.risk.fns = list(EW=equal.weight.portfolio),
		custom.stats.fn = portfolio.allocation.custom.stats.clusters
	) 			
	
	clusters = obj$clusters.FTCA$EW	
	clusters['2012:05::']

The clusters are stable and match David's results:

           XLB XLE XLF XLI XLK XLP XLU XLV XLY
2012-05-31   1   1   1   1   1   1   1   1   1
2012-06-29   1   1   1   1   1   1   1   1   1
2012-07-31   1   1   1   1   1   1   1   1   1
2012-08-31   1   1   1   1   1   1   1   1   1
2012-09-28   1   1   1   1   1   1   1   1   1
2012-10-31   1   1   1   1   1   1   1   1   1
2012-11-30   2   2   2   2   2   2   1   2   2
2012-12-31   2   2   2   2   2   2   1   2   2
2013-01-31   2   2   2   2   2   2   1   2   2
2013-02-28   1   1   1   1   1   1   1   1   1
2013-03-28   1   1   1   1   1   1   1   1   1
2013-04-30   1   1   1   1   1   1   1   1   1
2013-05-31   1   1   1   1   1   1   1   1   1
2013-06-28   1   1   1   1   1   1   1   1   1
2013-07-31   1   1   1   1   1   1   1   1   1
2013-08-30   1   1   1   1   1   1   1   1   1
2013-09-30   1   1   1   1   1   1   1   1   1
2013-10-31   1   1   1   1   1   1   1   1   1
2013-11-26   1   1   1   1   1   1   1   1   1

Next let’s compare the Cluster Portfolio Allocation Algorithm using K-means and FTCA:

	#*****************************************************************
	# Code Strategies
	#****************************************************************** 					
	obj = portfolio.allocation.helper(data$prices, 
		periodicity = periodicity, lookback.len = lookback.len, 
		min.risk.fns = list(
			C.EW.kmeans = distribute.weights(equal.weight.portfolio, cluster.group.kmeans.90),
			C.EW.FTCA = distribute.weights(equal.weight.portfolio, cluster.group.FTCA(0.5))			
		)
	)
	
	models = create.strategies(obj, data)$models
						
	#*****************************************************************
	# Create Report
	#******************************************************************    
	barplot.with.labels(sapply(models, compute.turnover, data), 'Average Annual Portfolio Turnover')

Both clustering algorithms produced very similar results. One noticeable difference is turnover. Since the Fast Threshold Clustering Algorithm (FTCA) produced more stable groups, it had smaller turnover.

[Plot 1: average annual portfolio turnover for K-means vs FTCA clustering]

The full source code and example for the cluster.group.FTCA() function is available in strategy.r at github.


getSymbols Extra

November 26, 2013

The getSymbols function from the quantmod package is an easy and convenient way to bring historical stock prices into your R environment. You need to specify the list of tickers, the source of historical prices, and the dates. For example, the following commands will download historical stock prices from Yahoo Finance for the 'RWX', 'VNQ' and 'VGSIX' symbols:

	data <- new.env()
	getSymbols(c('RWX','VNQ','VGSIX'), src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)

Now, the data environment contains the historical stock prices. For example, you can create a plot for RWX using the code below:

	plot(data$RWX)

Sometimes, I find myself wishing the getSymbols functionality could be extended. For example, it would be nice to be able to rename a series: RWX is a real estate ETF, so we could ask the getSymbols function to get RWX but call the output Real.Estate. Another useful feature would be the ability to specify how to extend the data: RWX only started trading in 2007, so it would be convenient to extend the RWX time series back in time with VNQ and VGSIX.

I created the getSymbols.extra function to address these features. The getSymbols.extra function allows you to specify tickers in the following format:

  • RWX : the original functionality, i.e. get historical stock prices for RWX; it can be accessed with data$RWX
  • Real.Estate = RWX : get historical stock prices for RWX and rename the output to Real.Estate; it can be accessed with data$Real.Estate
  • RWX + VNQ + VGSIX : get historical stock prices for RWX, VNQ and VGSIX, extend RWX with VNQ, then extend the result with VGSIX (see the sketch after this list); it can be accessed with data$RWX
  • Real.Estate = RWX + VNQ + VGSIX : mix and match the above functionality
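For intuition, the '+' extension splices the proxy's longer history in front of the current series, scaling it so the two match where they meet. A rough sketch of the idea (an illustration only, not the actual getSymbols.extra implementation; it assumes single-column series that overlap on the splice date):

	# sketch of the idea behind series extension (not the actual SIT code)
	extend.series.sketch <- function(current, proxy) {
		first.date = index(current)[1]
		scale = as.numeric(current[1]) / as.numeric(proxy[first.date])
		rbind(proxy[index(proxy) < first.date] * scale, current)
	}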

Let’s look at the example:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
    source(con)
close(con)

	tickers = spl('REIT=RWX, RWX+VNQ, REIT.LONG=RWX+VNQ+VGSIX')
	data <- new.env()
		getSymbols.extra(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
	bt.start.dates(data)
	
	data$symbolnames = spl('REIT.LONG,RWX,REIT')
		for(i in data$symbolnames) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
	bt.prep(data, align='keep.all') 	
   
	plota.matplot(data$prices)

[Plot 1: REIT.LONG, RWX and REIT price series]
The REIT.LONG series is RWX extended using VNQ and then further extended using VGSIX.

It is also possible to provide custom data to the getSymbols.extra function, i.e. data that is not available through the getSymbols function. For example, to extend the GLD time series we might use historical gold prices from the Bundesbank:

	# Use external data
	raw.data <- new.env()
		raw.data$GOLD = bundes.bank.data.gold()
	
	tickers = spl('GLD, GLD.LONG=GLD+GOLD')
	data <- new.env()
		getSymbols.extra(tickers, src = 'yahoo', from = '1980-01-01', env = data, raw.data = raw.data, auto.assign = T)
	bt.start.dates(data)
	data$symbolnames = spl('GLD.LONG,GLD')
   		for(i in data$symbolnames) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
	bt.prep(data, align='keep.all') 	
   
	plota.matplot(data$prices)	

[Plot 2: GLD.LONG and GLD price series]

The main goal of the getSymbols.extra function is to simplify your data acquisition/preparation step. Please let me know if you run into any problems or have suggestions.

To view the complete source code for this example, please have a look at the getSymbols.extra.test() function in utils.r at github.
