Portfolio Optimization: Part I

Money is interesting. Money is weird. Money can change ones life. The word money alone triggers all different kind of feelings in us. The feelings can be completely different from person to person. One might associate it with struggle and uncertainty, others might see it as a status symbol of their lives.

At some point in life most people ought to think about their finance status and future. Being financially educated can became import to build wealth and to be financially secure in later stages of ones life. Some well-known public speakers are Toni Robbins or Dace Ramsey who teach the basics to the public about finances. Their goal is to make their advise available to many people possible and thus simplify complicated ideas an present them in a very intuitive way to a broad audience.

In this article, my goal is to convey the basics of portfolio optimization in a simple manner. I am not a financial advisor, not aiming to be and never will be. However, my growing interest in finance and my passion for Computer Science and optimization makes me writing some articles about it. The following article starts from considering just a single asset, two asserts and finally multiple assets in a portfolio.

The simplest case is to look at only one asset. For the purpose of illustration let us take a look at the Vanguard Total Stock Market (VTSMX) which is an Index Fund designed to provide investors with exposure to the entire U.S. equity market, including small-, mid-, and large-cap growth and value stocks. First, let us load the data using the yahoo Finance interface through Pandas each month for the last 15 years.

[42]:
import pandas_datareader.data as web
df = web.get_data_yahoo('VTSMX','05/01/2005', '05/01/2020',interval='m')

Whenever working with data I recommend plotting them to make sure they are reasonable and to avoid any surprises later when working with them.

[43]:
import matplotlib.pyplot as plt

month, price = df.index, df["Close"].values
plt.plot(month, price, color="blue")
plt.show()
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_8_0.svg

Simple Moving Average (SMA)

A common way of cut down the amount of “noise” of such a chart is considering the moving averages. This helps to visualize the overall trend rather then short term volatility. The moving average can be implemented in a vectorized way using NumPy. The moving_avg creates first the indices for each window and then uses the mean function on the corresponding axes of the data set.

\begin{equation} \beta_t^{\rm (SMA)} = \frac{\sum_{k=1}^{m} v_{t – k – 1} }{m} \end{equation}

[44]:
import numpy as np

def moving_avg(v, m):
    y = []
    for k in range(1, len(v) + 1):
        i, j = max(0, k - m), k
        y.append(v[i:j].mean())
    return np.array(y)

Visualizing the moving average compared to the original prices looks as follows:

/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_14_0.svg

Weighted Moving Average (WMA)

The moving average makes the assumption that all observations in a period are considered equally important. However, it might be desired to give more importance to latter observations in the given period. This can be achieved by:

\begin{equation} \beta_t^{\rm (WMA)} = \frac{\sum_{k=1}^{m} \omega_k \cdot v_{t – k – 1} }{\sum_{k=1}^{m} \omega_k} \end{equation}

One important question is how to set the weights in each window? Obviously, more recent observation shall obtain a larger weight value. Thus, the weights need to be decayed. Commonly used decay strategy are linear and exponential decay.

\begin{align} \omega_k = k \cdot \left(\frac{1}{m}\right) \end{align}

[46]:
def linear_decay(m):
    return np.linspace(0.0, 1.0, m+1)[1:]
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_21_0.svg

Exponential decay requires an \(\alpha\) parameter which defines how fast the decay should take place.

[48]:
def exp_decay(m, alpha):
    w = (np.ones(m) - alpha) ** np.arange(m)
    return w[::-1]

The exponential decay \(\alpha\) determines how much the impact of the current the previous prices should the current average have.

/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_25_0.svg

When \(\alpha=0\) then the average moving average is applied as before, since each entry has an equal weight. If \(\alpha=1\) then basically nothing happens since the weighted average will just take the most recent value and thus copy the price vector. For \(\alpha=0.15\) and \(\alpha=4\) the different decays are shown above. Clearly, a large value for \(\alpha\) increases the importance of the most recent observation.

[50]:
def weighted_moving_avg(v, m, weights):
    y = []

    for k in range(1, len(v) + 1):
        i, j = max(0, k - m), k

        w = weights[-(j-i):]
        w = w / w.sum()

        y.append((v[i:j] * w).sum())

    return np.array(y)
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_28_0.svg

Exponential Weighted Moving Average (EWMA)

Instead of defining always a strict window, we can simply use the exponential weighting from the first to the current observation.

Probably an even more popular method is the Exponential Weighted Moving Average (EWMA) where the principle of exponential smoothing is used. Instead of defining a period what defines the number of observations all observations are used. However, the most recent observation always has the highest weight. The metric can easily be defined in a recursive manner:

\[\beta_1^{\rm (EWMA)} = v_{1}\]
\[\beta_t^{\rm (EWMA)} = \alpha \cdot v_{t} + (1-\alpha) \cdot \beta_{t-1}^{\rm (EWMA)}\]

Indirectly, the value of \(\alpha\) defines the impact of observations in the past. Commonly it is defined by having a period \(m\) in mind by:

\[\begin{split}\alpha = \frac{2}{(m+1)} \\\end{split}\]
[52]:
def default_alpha(m):
    return 2 / (m + 1)

The EWMA is implemented by using the its recursive definition by:

[53]:
def exp_moving_avg(v, alpha=None):
    alpha = default_alpha(period) if alpha is None else alpha

    y = [v[0]]
    for k in range(1, len(v)):
        y.append(alpha * v[k] + (1 - alpha) * y[-1])

    return np.array(y)
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_37_0.svg

Percentage Change

This gives us an idea how the asset has behaved in the passed. Moreover, it is of interest how big the change from one time step two another – here each month – was. One way of measuring that is to compare the relative difference of each time measurement. For instance, the asset has had a value of \(v_1=31.83\) in the first and a value of \(v_2=32.64\) in the second month. Thus the relative change is given by

\[\frac{v_2 – v_1}{ v_1} = \frac{32.64 – 31.83}{31.83} = 0.02545\]

In general an implementation in Python by making use of NumPy vectors is given by

[55]:
def pct_change(v):
    _prev = v[:-1]
    _next = v[1:]
    return (_next - _prev) / _prev

pct = pct_change(price)

Now let us look at some statistics of the percentage changes. For instance, it is for sure of interest how the asset changed in average and what was the worst drop that has happened from one month to another.

The percentage changes can be visualized in each time period. In the figure below a positive percentage change is illustrated in blue and a negative one in red.

[56]:
plt.bar(month[1:], pct * (pct >= 0), width=50, color='b')
plt.bar(month[1:], pct * (pct < 0), width=50, color='r')
plt.show()
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_45_0.svg

Moreover, the descriptive statistics in each month are an indicator how the value of the asset has behaved in the past.

[57]:
import pandas as pd
import numpy as np

np.round(pd.DataFrame(pct).describe()[0], 4)
[57]:
count    180.0000
mean       0.0061
std        0.0439
min       -0.1763
25%       -0.0151
50%        0.0108
75%        0.0334
max        0.1325
Name: 0, dtype: float64
[58]:
n, bins, patches = plt.hist(pct, bins='auto', density=True, color='#0504aa',alpha=0.75, rwidth=0.85)
plt.grid(axis='y', alpha=0.5)
plt.show()
/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_48_0.svg

A very common approach in finance is to model data using a probability distribution. The model can then be used in various ways. The model being used need to be chosen carefully because wrong assumptions initially might have a big impact in the further studies. In general, different kind of probability distributions can be used. Many studies use the Gaussian distribution because of its nice properties. Therefore, the arithmetic mean \(\mu\) and the standard deviation \(\sigma\) are estimated by:

[59]:
mu, sigma = np.mean(pct), np.std(pct)

The resulting normal distribution looks as follows:

/html/ipynb/moo-portfolio-01/_images/moo-portfolio-01_52_0.svg

Avatar

Julian Blank

Julian Blank is a PhD student in the Department of Computer Science and Engineering at Michigan State University. He received his B.Sc. in Business Information Systems from Otto von Guericke University, Germany in 2010. He was a visiting scholar for six months at the Michigan State University, Michigan, USA in 2015, and, finished his M.Sc. in Computer Science at Otto von Guericke University, Germany in 2016. He is the main developer of pymoo, an open source multi-objective optimization framework in Python. His research interests include evolutionary computation, multi-objective optimization, surrogate-assisted optimization and machine learning.

Leave a Reply

Your email address will not be published. Required fields are marked *