### Archive

Archive for October, 2011

To incorporate Math Formulas to my WordPress posts, I found following solution that works really well for me. I hope you will find it useful.

Let’s start from the beginning. I wanted to add Math Formulas to my WordPress posts and after searching WordPress support, I found LaTeX tag. It sounds simple: embed your equation’s LaTeX syntax like this $latex your-latex-code-here$.

The only problem is I don’t know LaTeX syntax. I looked around for a stand alone application that can render LaTeX with preview pane. LEd looked promising until I realized that I would need to install an additional LaTeX typesetting, like MiKTeX – a whooping 160 MB.

Instead I decided to look for online solution and CodeCogs LaTeX Equation Editor works perfectly. It has buttons at the top for different operators and formulas and more importantly it renders, very fast, the formula as you type it.

I also found following articles with LaTeX examples very useful:

Categories: Uncategorized

## Maximum Loss and Mean-Absolute Deviation risk measures

During construction of typical efficient frontier, risk is usually measured by the standard deviation of the portfolio’s return. Maximum Loss and Mean-Absolute Deviation are alternative measures of risk that I will use to construct efficient frontier. I will use methods presented in Comparative Analysis of Linear Portfolio Rebalancing Strategies: An Application to Hedge Funds by Krokhmal, P., S. Uryasev, and G. Zrazhevsky (2001) paper to construct optimal portfolios.

Let x.i, i= 1,…,n be weights of instruments in the portfolio. We suppose that j= 1,…,T scenarios of returns with equal probabilities are available. I will use historical assets returns as scenarios. Let us denote by r.ij the return of i-th asset in the scenario j. The portfolio’s Maximum Loss (page 34) can be written as

$\max_{1\leq j \leq T}\left \{ -\sum_{i=1}^{n}r_{ij}x_i \right \}$

It can be formulated as a linear programming problem

$\min_{}{}w\newline\newline-\sum_{i=1}^{n}r_{ij}x_j\leq w , j=1,...,T$

This linear programming problem can be easily implemented

min.maxloss.portfolio <- function
(
ia,		# input assumptions
constraints	# constraints
)
{
n = ia$n nt = nrow(ia$hist.returns)

# objective : maximum loss, w
f.obj = c( rep(0, n), 1)

#  - [ SUM <over i> r.ij * x.i ] < w, for each j = 1,...,T
a = rbind( matrix(0, n, nt), 1)
a[1 : n, ] = t(ia$hist.returns) constraints = add.constraints(a, rep(0, nt), '>=', constraints) # setup linear programming f.con = constraints$A
f.dir = c(rep('=', constraints$meq), rep('>=', len(constraints$b) - constraints$meq)) f.rhs = constraints$b

# find optimal solution
x = NA
sol = try(solve.LP.bounds('min', f.obj, t(f.con), f.dir, f.rhs,
lb = constraints$lb, ub = constraints$ub), TRUE)

if(!inherits(sol, 'try-error')) {
x = sol$solution[1:n] } return( x ) }  The portfolio’s Mean-Absolute Deviation (MAD) (page 33) can be written as $\frac{1}{T}\sum_{j=1}^{T}\left | \sum_{i=1}^{n}r_{ij}x_{i} - \frac{1}{T}\sum_{k=1}^{T}\sum_{i=1}^{n}r_{ik}x_{i} \right |$ It can be formulated as a linear programming problem $\min_{}{}\frac{1}{T}\sum_{j=1}^{T}(u_{j}^{+}+u_{j}^{-})\newline\newline \sum_{i=1}^{n}r_{ij}x_{i} - \frac{1}{T}\sum_{j=1}^{T}\sum_{i=1}^{n}r_{ij}x_{i}=u_{j}^{+}-u_{j}^{-}, j=1,...,T\newline\newline u_{j}^{+},u_{j}^{-} \geq 0, j=1,...,T$ This linear programming problem can be implemented min.mad.portfolio <- function ( ia, # input assumptions constraints # constraints ) { n = ia$n
nt = nrow(ia$hist.returns) # objective : Mean-Absolute Deviation (MAD) # 1/T * [ SUM (u+.j + u-.j) ] f.obj = c( rep(0, n), (1/nt) * rep(1, 2 * nt) ) # adjust constraints, add u+.j, u-.j constraints = add.variables(2 * nt, constraints, lb = 0) # [ SUM <over i> r.ij * x.i ] - 1/T * [ SUM <over j> [ SUM <over i> r.ij * x.i ] ] = u+.j - u-.j , for each j = 1,...,T a = rbind( matrix(0, n, nt), -diag(nt), diag(nt)) a[1 : n, ] = t(ia$hist.returns) - repmat(colMeans(ia$hist.returns), 1, nt) constraints = add.constraints(a, rep(0, nt), '=', constraints) # setup linear programming f.con = constraints$A
f.dir = c(rep('=', constraints$meq), rep('>=', len(constraints$b) - constraints$meq)) f.rhs = constraints$b

# find optimal solution
x = NA
sol = try(solve.LP.bounds('min', f.obj, t(f.con), f.dir, f.rhs,
lb = constraints$lb, ub = constraints$ub), TRUE)

if(!inherits(sol, 'try-error')) {
x = sol$solution[1:n] } return( x ) }  Let’s examine efficient frontiers computed under different risk measures using historical input assumptions presented in the Introduction to Asset Allocation post: ############################################################################### # Create Efficient Frontier ############################################################################### n = ia$n

# 0 <= x.i <= 0.8
constraints = new.constraints(n, lb = 0, ub = 0.8)

# SUM x.i = 1
constraints = add.constraints(rep(1, n), 1, type = '=', constraints)

# create efficient frontier(s)
ef.risk = portopt(ia, constraints, 50, 'Risk')
ef.maxloss = portopt(ia, constraints, 50, 'Max Loss', min.maxloss.portfolio)

# Plot multiple Efficient Frontiers
layout( matrix(1:4, nrow = 2) )
plot.ef(ia, list(ef.risk, ef.maxloss, ef.mad), portfolio.risk, F)
plot.ef(ia, list(ef.risk, ef.maxloss, ef.mad), portfolio.maxloss, F)

# Plot multiple Transition Maps
layout( matrix(1:4, nrow = 2) )
plot.transition.map(ef.risk)
plot.transition.map(ef.maxloss)


The Mean-Absolute Deviation and Standard Deviation risk measures are very similar by construction – they both measure average deviation, so it is not a surprise that their efficient frontiers and transition maps are close. On the other hand, the Maximum Loss measures the extreme deviation and has very different efficient frontier and transition map.

To view the complete source code for this example, please have a look at the aa.test() function in aa.test.r at github.

## Introduction to Asset Allocation

This is the first post in the series about Asset Allocation, Risk Measures, and Portfolio Construction. I will use simple and naive historical input assumptions for illustration purposes across all posts.

In these series I plan to discuss:

The plan for this post is an Introduction to Asset Allocation.  I will show how to create and visualize input assumptions, set constraints, and create Markowitz mean-variance efficient frontier.

# load Systematic Investor Toolbox
setInternet2(TRUE)
source(gzcon(url('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', 'rb')))

# load historical prices from Yahoo Finance
symbols = spl('SPY,QQQ,EEM,IWM,EFA,TLT,IYR,GLD')
symbol.names = spl('S&P 500,Nasdaq 100,Emerging Markets,Russell 2000,EAFE,20 Year
Treasury,U.S. Real Estate,Gold')

getSymbols(symbols, from = '1980-01-01', auto.assign = TRUE)

# align dates for all symbols & convert to monthly
hist.prices = merge(SPY,QQQ,EEM,IWM,EFA,TLT,IYR,GLD)
month.ends = endpoints(hist.prices, 'months')
hist.prices = Cl(hist.prices)[month.ends, ]
colnames(hist.prices) = symbols

# remove any missing data
hist.prices = na.omit(hist.prices['1995::2010'])

# compute simple returns
hist.returns = na.omit( ROC(hist.prices, type = 'discrete') )


To create input assumptions, I will compute mean, standard deviation, and Pearson correlation using historical monthly returns:

# compute historical returns, risk, and correlation
ia = list()
ia$expected.return = apply(hist.returns, 2, mean, na.rm = T) ia$risk = apply(hist.returns, 2, sd, na.rm = T)
ia$correlation = cor(hist.returns, use = 'complete.obs', method = 'pearson') ia$symbols = symbols
ia$symbol.names = symbol.names ia$n = len(symbols)
ia$hist.returns = hist.returns # convert to annual, year = 12 months annual.factor = 12 ia$expected.return = annual.factor * ia$expected.return ia$risk = sqrt(annual.factor) * ia$risk # compute covariance matrix ia$risk = iif(ia$risk == 0, 0.000001, ia$risk)
ia$cov = ia$cor * (ia$risk %*% t(ia$risk))


Now its a good time to visualize input assumptions:

# visualize input assumptions
plot.ia(ia)

# display each asset in the Risk - Return plot
layout(1)
par(mar = c(4,4,2,1), cex = 0.8)
x = 100 * ia$risk y = 100 * ia$expected.return

plot(x, y, xlim = range(c(0, x)), ylim = range(c(0, y)),
xlab='Risk', ylab='Return', main='Risk vs Return', col='black')
grid();
text(x, y, symbols,	col = 'blue', adj = c(1,1), cex = 0.8)


There many problems with these input assumptions, to name a few:

• historical mean might not be a good proxy for expected returns
• weighted historical mean maybe a better choice because it puts more weight on the recent observations
• correlations are not stable
• volatility tends to cluster
• input assumptions for cash and bonds are better approximated by current yields and short-term variations

I will only use these simple and naive historical input assumptions for illustration purposes.

To create efficient frontier, I will consider portfolios with allocations to any asset class ranging between 0% and 80% and total portfolio weight equal to 100%. (I limited the maximum allocation to any asset class to 80%, just as an example)

# Create Efficient Frontier
n = ia$n # 0 <= x.i <= 0.8 constraints = new.constraints(n, lb = 0, ub = 0.8) # SUM x.i = 1 ( total portfolio weight = 100%) constraints = add.constraints(rep(1, n), 1, type = '=', constraints) # create efficient frontier consisting of 50 portfolios ef = portopt(ia, constraints, 50, 'Sample Efficient Frontier') # plot efficient frontier plot.ef(ia, list(ef))  The Transition Map displays portfolio weights as we move along the efficient frontier. I display portfolio risk along the X axis, and portfolio weights along the Y axis. The width of the slice represents the portfolio weight for the given risk level. For example, in the above Transition Map plot, the allocation to Gold (GLD – gray) was about 20% at the lower risk level and steadily grew to 80% at the higher risk level. Similarly, the allocation to Bonds (TLT – pink) was about 50% at the lower risk level and steadily shrank to 0% at the higher risk level. Finally I want to go over logic of “portopt” function that creates efficient frontier for us. The first step to create efficient frontier is to find the top,right (maximum return portfolio) and bottom,left (minimum risk portfolio). Next, I divide the return space between minimum risk portfolio and maximum return portfolio into nportfolios equally spaced points. For each point, I find minimum risk portfolio with additional constraint that portfolio return has to be equal target return for this point. The last step is to compute returns and risks for portfolio on efficient frontier. portopt <- function ( ia, # Input Assumptions constraints = NULL, # Constraints nportfolios = 50, # Number of portfolios name = 'Risk', # Name min.risk.fn = min.risk.portfolio # Risk Measure ) { # set up output out = list(weight = matrix(NA, nportfolios, ia$n))
colnames(out$weight) = ia$symbols

# find maximum return portfolio
out$weight[1, ] = max.return.portfolio(ia, constraints) # find minimum risk portfolio out$weight[nportfolios, ] = match.fun(min.risk.fn)(ia, constraints)

# find points on efficient frontier
out$return = portfolio.return(out$weight, ia)
target = seq(out$return[1], out$return[nportfolios], length.out = nportfolios)

constraints = add.constraints(ia$expected.return, target[1], type = '=', constraints) for(i in 2:(nportfolios - 1) ) { constraints$b[1] = target[i]
out$weight[i, ] = match.fun(min.risk.fn)(ia, constraints) } # compute risk / return out$return = portfolio.return(out$weight, ia) out$risk = portfolio.risk(out$weight, ia) out$name = name

return(out)
}


I will discuss a Maximum Loss risk measure and compare it to a traditional Risk, as measured by standard deviation, risk measure in the next post.

To view the complete source code for this example, please have a look at the aa.test() function in aa.test.r at github.

## Risk, Return and Analyst Ratings

Today I want to discuss a connection between Risk, Return and Analyst Ratings. Let’s start with defining our universe of stocks : 30 stocks from Dow Jones Industrial Average (^DJI) index. For each stock I will compute the number of Upgrades and Downgrades, Risk, and Return in 2010:2011. I will run a linear regression and compute correlation between the number of Upgrades and Downgrades and Risk and Return.

Let’s implement this plan using R and Systematic Investor Toolbox.

First, let’s load Systematic Investor Toolbox and quantmod package

# load Systematic Investor Toolbox
setInternet2(TRUE)
source(gzcon(url('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', 'rb')))



I will get the list of stocks in Dow Jones Industrial Average (^DJI) index from Yahoo Finance:

# download Dow Jones Components
url = 'http://finance.yahoo.com/q/cp?s=^DJI+Components'

temp = extract.table.from.webpage(txt, 'Symbol', hasHeader = T)

# Symbols
Symbols = temp[, 'Symbol']


I will get the Upgrades & Downgrades History for each stock from Yahoo Finance:

# Matrix with number of Upgrades/Downgrades in 2010:2011
up.down.stats = matrix( NA, nrow = len(Symbols), ncol = 3 )
rownames(up.down.stats) = Symbols
colnames(up.down.stats) = spl('N,Return,Risk')

# Get Upgrade/Downgrade statistics and compute Risk and Return for each symbol
for( Symbol in Symbols ) {

url = paste('http://finance.yahoo.com/q/ud?s=', Symbol, sep = '')

temp = extract.table.from.webpage(txt, 'Research Firm', hasHeader = T)

event.year = format(as.Date(temp[, 'Date'], '%d-%b-%y'), '%Y')
up.down.stats[Symbol, 'N'] = sum(event.year == '2010' | event.year == '2011')

data = getSymbols(Symbol, from = '1980-01-01', auto.assign = FALSE)
returns = ROC(Cl(data['2010::2011']), type = 'discrete')
returns = na.omit(returns)

# compute basic measures of Return and Risk
up.down.stats[Symbol, 'Return'] = 252 * mean(returns)
up.down.stats[Symbol, 'Risk'] = sqrt(252) * sd(returns)
}


Let’s have a look at the data:

# sort up.down.stats by number of events
up.down.stats = up.down.stats[ order(up.down.stats[,'N']), , drop = FALSE]
up.down.stats[, spl('Return,Risk')] = round(100 * up.down.stats[, spl('Return,Risk')], 1)

# plot table
plot.table(up.down.stats)

par(mar = c(5,4,2,1), cex = 0.8)
barplot( up.down.stats[, 'N'],
xlab = 'Symbols', ylab = 'Number of Upgrades & Downgrades in 2010:2011',
names.arg = rownames(up.down.stats), las = 2 )


Let’s run a linear regression and compute correlation between the number of Upgrades and Downgrades and Risk and Return:

# run linear regression and compute correlation between number of events and Returns / Risk
for( measure in spl('Risk,Return') ) {
x = up.down.stats[, 'N']
y = up.down.stats[, measure]

# linear regression
fit = lm(y ~ x)
print(summary(fit))

par(mar = c(5,4,2,1))
plot(x, y, xlab = 'Number of of Upgrades & Downgrades', ylab = measure,
main = paste(plota.format(100 * cor(x,y), 0, '', '%') , 'correlation between', measure, 'and Number of Events'))
grid()
text(x, y, rownames(up.down.stats), col = 'blue', adj = c(1,1), cex = 0.8)
abline(coef = coef(fit), lty=2)

# compute ellipsoid at 50% confidence level
d = dataEllipse(x, y, levels = c(0.5), draw = FALSE)
lines(d, col='red', lty=3)
}


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.1766     2.3879   8.449 3.45e-09 ***
x             0.6589     0.3022   2.180   0.0378 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.1451,     Adjusted R-squared: 0.1146
F-statistic: 4.753 on 1 and 28 DF,  p-value: 0.0378

There is a positive correlation between the number of Upgrades & Downgrades and Risk. The beta coefficient in linear regression is positive and significant at 5% confidence.

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.5098     4.5533   3.187  0.00352 **
x            -1.6238     0.5763  -2.818  0.00877 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.2209,     Adjusted R-squared: 0.1931
F-statistic:  7.94 on 1 and 28 DF,  p-value: 0.008769

There is a negative correlation between the number of Upgrades & Downgrades and Returns. The beta coefficient in linear regression is negative and significant at 1% confidence.

One could conclude from these observations that as the number of Upgrades & Downgrades increases the Risk goes up and Return goes down in 2010:2011 period. However, I see a few problems with this analysis:

• we examined all stocks in the same way; yet companies from different sectors might have naturally occurring different risk/return characteristics
• we treated all events in the same way; yet Upgrade/Downgrade/Initiated actions may have different influences on company’s stock price

Please tell me what else do you think is wrong with my analysis.

Categories: R

## Visualizing Tables with plot.table

October 7, 2011 1 comment

plot.table function in the Systematic Investor Toolbox is a flexible table drawing routine. plot.table has a simple interface and takes following parameters:

• plot.matrix – matrix with data you want to plot
• smain – text to draw in (top, left) cell; default value is blank string
• highlight – Either TRUE/FALSE to indicate if you want to color each cell based on its numeric value Or a matrix with colors for each cell
• colorbar – TRUE/FALSE flag to indicate if you want to draw colorbar

Here is a few examples how you can use plot.table function to create summary reports.

First, let’s load Systematic Investor Toolbox:

# load Systematic Investor Toolbox
setInternet2(TRUE)
source(gzcon(url('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', 'rb')))


To create basic plot.table:

# define row and column titles
mrownames = spl('row one,row two,row 3')
mcolnames = spl('col 1,col 2,col 3,col 4')

# create temp matrix with data you want to plot
temp = matrix(NA, len(mrownames), len(mcolnames))
rownames(temp) = mrownames
colnames(temp) = mcolnames
temp[,] = matrix(1:12,3,4)

# plot temp, display current date in (top, left) cell
plot.table(temp, format(as.Date(Sys.time()), '%d %b %Y'))


To create plot.table with colorbar:

# generate 1,000 random numbers from Normal(0,1) distribution
data =  matrix(rnorm(1000), nc=10)
colnames(data) = paste('data', 1:10, sep='')

# compute Pearson correlation of data and format it nicely
temp = compute.cor(data, 'pearson')
temp[] = plota.format(100 * temp, 0, '', '%')

# plot temp with colorbar, display Correlation in (top, left) cell
plot.table(temp, smain='Correlation', highlight = TRUE, colorbar = TRUE)


Next, I want to show a more practical example of plot.table function. I want to create a report page that will display a chart of IBM for 2010:2011 and a table with Valuation Measures from Key Statistics Yahoo Finance webpage.

# Load quantmod package to download price history for IBM

Symbol = 'IBM'

data = getSymbols(Symbol, from = '1980-01-01', auto.assign = FALSE)

url = paste('http://finance.yahoo.com/q/ks?s=', Symbol, sep = '')

temp = extract.table.from.webpage(txt, 'Market Cap', hasHeader = F)

# prepare IBM data for 2010:2011 and compute 50 days moving average
y = data['2010::2011']
sma50 = SMA(Cl(y), 50)

# plote candles and volume and table
layout(c(1,1,2,3,3))

plota(y, type = 'candle', main = Symbol, plotX = F)
plota.lines(sma50, col='blue')
plota.legend(c(Symbol,'SMA 50'), 'green,blue', list(y,sma50))

y = plota.scale.volume(y)
plota(y, type = 'volume')

plot.table(temp)