My two most recent blog posts were about Scaling Analytical Insights with Python; part 1 can be foundhereand part 2 can be foundhere. It has been several months since I wrote those, largely due to the fact that I relocated my family to Seattle to join Amazon in November; Ive spent most of the time on my primary project determining our global rollout plan and related business intelligence roadmap.

Prior to my departure at my former company, FloSports, we were in the process of overhauling our analytics reporting across the organization (data, marketing, product et al), and part of this overhaul included our financial reporting. While I left early on in that implementation, over the past several months Ive continued using Python extensively for financial analyses, particularlypandas. In this post, I will share how I leveraged some very helpful online resources, theYahoo Finance API(requires a work around and may require a future data source replacement), andJupyter notebookto largely automate the tracking and benchmarking of a stock portfolios performance.

As a quick background, I have been investing in my own stock portfolio since 2002 and developed a financial model for my portfolio a number of years ago. For years, I would download historical prices and load the data into the financial model while online brokers calculate realized and unrealized returns, as well as income and dividends, I like to have historical data in the model as I conduct my own analyses to evaluate positions. One view / report which Ive never found from online brokers and services is aPublic Market Equivalent-like analysis. In short, the Public Market Equivalent (PME) is a set of analyses used in the private equity industry to compare the performance of a private equity fund relative to an industry benchmark. Much more detailhere.

Related, the vast majority of equity portfolio managers are unable to select a portfolio of stocks which outperforms the broader market, e.g., S&P 500, over the long-term (~1 in 20 actively managed domestic funds beat index funds). Even when some individual stocks outperform, the underperformance of others often outweighs the better performing stocks, meaning overall an investor is worse off than simply investing in an index fund. During business school I learned about PME, and I incorporated a conceptually similar analysis into the evaluation of my current public equity holdings. To do this properly, you should measure the timing of investment inflows specific to each portfolio position (holding periods) relative to an S&P 500 equivalent dollar investment over the identical holding period. As an example, if you bought a stock on 6/1/2016 and you still own it, you would want to compare the stocks return over that period to the return of an equal dollar investment on 6/1/2016 in the S&P 500 (our benchmark example). Among other things, you may find that even if a stock has done relatively well it may still trail the S&P 500s return over the same time period.

In the past, I downloaded historical price data from Yahoo Finance and used INDEX and MATCH functions in excel to calculate the relative holding period performance of each position versus the S&P 500. While this is an OK way to accomplish this goal, conducting the same usingpandasin Jupyter notebook is more scalable and extensible. Whenever you download new data and load into excel, you inevitably need to modify some formulas and validate for errors. Usingpandas, adding new calculations, such as a cumulative ROI multiple (which Ill cover), takes almost no time to implement. And the visualizations, for which I usePlotly, are highly reproducible and much more useful in generating insights.

Disclosure:Nothing in this post should be considered investment advice. Past performance is not necessarily indicative of future returns. These are general examples about how to import data using pandas for a small sample of stocks across different time intervals and to benchmark their individual performance against an index. You should direct all investment related questions that you have to your financial advisor.

In addition to contributing this tutorial, Im continuing to revise and build upon this approach, and I outline some considerations for further development at the end of this post. I believe this post will be helpful for novice to intermediate-level data science oriented finance professionals, especially since this should extend to many other types of financial analyses.This approach is PME-like in the sense thats its measuring investment inflows over equal holding periods. As public market investments are much more liquid than private equity, and presuming you follow a trailing stop approach, from my perspective its more important to focus on active holdings its generally advisable to divest holdings which underperform a benchmark or which you no longer want to own for various reasons, while I take a long-term view and am happy to own outperforming stocks for as long as theyll have me.

I am a current DataCamp subscriber (future post forthcoming on DataCamp) andthis community tutorialon Python for Finance is great.

I have createda repo for this postincluding the Python notebookhere, and the excel filehere.

If you want to see the full interactive version (because Jupyter GitHub integration is awesome), you can view using nbviewerhere.

Import S&P 500 and sample ticker data, using the Yahoo Finance API

Create a merged portfolio master file which combines the sample portfolio dataframe with the historical ticker and historical S&P 500 data

Determine what the S&P 500 close was on the date of acquisition of each investment, which allows us to calculate the S&P 500 equivalent share position with the same dollars invested

Calculate the relative % and dollar value returns for the portfolio positions versus S&P 500 returns over that time

Calculate cumulative portfolio returns and ROI multiple, in order to assess how well this example portfolio compared to a market index

One of the more important items: dynamically calculate how each position is doing relative to a trailing stop, e.g., if a position closes 25% below its closing high, consider selling the position on the next trading day.

Total Return Comparisons % return of each position relative to index benchmark

Cumulative Returns Over Time $ Gain / (Loss) of each position relative to benchmark

Cumulative Investments Over Time given the above, how do the overall investment returns compare to the equal weighting and time period of S&P 500 investments?

Adjusted Close % off of High Comparison what is each positions most recent close relative to its adjusted closing high since purchased?

You will begin by importing the necessary Python libraries, import thePlotlyoffline module, and read in our sample portfolio dataframe.

Imports in order to be able to use Plotly offline.

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

Import the Sample worksheet with acquisition dates and initial cost basis:

read_excel(Sample stocks acquisition dates_costs.xlsx, sheetname

Now that you have read in the sample portfolio file, youll create a few variables which capture the date ranges for the S&P 500 and all of the portfolios tickers. Note that this is one of the few aspects of this notebook which requires an update each week (adjust the date range to include the most recent trading week here, we are running this off of prices through 3/9/2018).

The below will pull back stock prices from the start date until end date specified.

These are separate if for some reason want different date range than SP.

As mentioned in the Python Finance training post, thepandas-datareaderpackage enables us to read in data from sources like Google, Yahoo! Finance and the World Bank. Here Ill focus on Yahoo! Finance, although Ive worked very preliminarily with Quantopian and have also begun looking intoquandlas a data source. As also mentioned in the DataCamp post, the Yahoo API endpoint recently changed and this requires the installation of a temporary fix in order for Yahoo! Finance to work. Ive made this needed slight adjustment in the code below. I have noticed some minor data issues where the data does not always read in as expected, or the last trading day is sometimes missing. While these issues have been relatively infrequent, Im continuing to monitor whether or not Yahoo! Finance will be the best and most reliable data source going forward.

If youre following along with your own notebook, you should see something like the below once youve successfully read in the data from Yahoos API:

After loading in the S&P 500 data, youll see that I inspect the head and tail of the dataframe, as well as condense the dataframe to only include theAdj Closecolumn. The difference between theAdjusted Closeand theClosecolumns is that an adjusted close reflects dividends (see future areas for development below). When a company issues a dividend, the share price is reduced by the size of the dividend per share, as the company is distributing a portion of the companys earnings. For purposes of this analysis, you will only need to analyze this column. I also create a dataframe which only includes the S&Ps adjusted close on the last day of 2017 (start of 2018); this is in order to run YTD comparisons of individual tickers relative to the S&P 500s performance.

In the below code, you create an array of all of the tickers in our sample portfolio dataframe. You then write a function to read in all of the tickers and their relevant data into a new dataframe, which is essentially the same approach you took for the S&P500 but applied to all of the portfolios tickers.

As with the S&P 500 dataframe, youll create anadj_closedataframe which only has theAdj Closecolumn for all of your stock tickers. If you look at the notebook in the repo I link to above, this code is chunked out in more code blocks than shown below. For purposes of describing this here, Ive included below all of the code which leads up to our initialmerged_portfoliodataframe.

Also only pulling the ticker, date and adj. close columns for our tickers.

Grabbing the ticker close from the end of last year

Set portfolio index prior to merging with the adj close latest.

Merge the portfolio dataframe with the adj close dataframe; they are being joined by their indexes.

merge(portfolio_df, adj_close_latest, left_index

The below creates a new column which is the ticker return; takes the latest adjusted close for each position

and divides that by the initial share cost.

Depending on your level of familiarity withpandas, this will be very straightforward to slightly overwhelming. Below, Ill unpack what these lines are doing:

The overall approach you are taking is an example ofsplit-apply-combine(note this downloads a PDF).

line creates a new dataframe with only the columns provided in the list; here


column equals the date which you earlier specified in the

dataframes. I did this because this is how youll merge the two dataframes. The

function, very similar to SQL joins, is an extremely useful function which I use very often.

equal True, you are stating that the two dataframes share a common index and you will join both on this.

. This calculates the percent return for each stock position by dividing the

(initial purchase price for stock) and subtracting 1. This is similar to calculating a formula in excel and carrying it down, but in

You have taken the individual dataframes for the S&P 500 and individual stocks, and you are beginning to develop a master dataframe which well use for calculations, visualizations and any further analysis. Next, you continue to build on this master dataframe with further use of pandasmergefunction. Below, you reset the current dataframes index and begin joining your smaller dataframes with the master one. Once again, the below code block is broken out further in theJupyternotebook; here I take a similar approach to before where Ill share the code below and then break down the key callouts below the code block.

Here we are merging the new dataframe with the sp500 adjusted closes since the sp start price based on

each tickers acquisition date and sp500 close date.

merge(merged_portfolio, sp_500_adj_close, left_on

We will delete the additional date column which is created from this merge.

We then rename columns to Latest Date and then reflect Ticker Adj Close and SP 500 Initial Close.

Date_x: Latest Date, Adj Close_x: Ticker Adj Close

, Adj Close_y: SP 500 Initial Close, inplace

This new column determines what SP 500 equivalent purchase would have been at purchase date of stock.

merged_portfolio_sp[Equiv SP Shares]

merged_portfolio_sp[SP 500 Initial Close]

We are joining the developing dataframe with the sp500 closes again, this time with the latest close for SP.

merge(merged_portfolio_sp, sp_500_adj_close, left_on

Once again need to delete the new Date column added as its redundant to Latest Date.

Modify Adj Close from the sp dataframe to distinguish it by calling it the SP 500 Latest Close.

Adj Close: SP 500 Latest Close, inplace

in order to flatten the master dataframe and join on the smaller dataframes relevant columns.

line, you merge the current master dataframe (merged_portfolio) with the

; you do this in order to have the S&Ps closing price on each positions purchase date this allows you to track the S&P performance over the same time period that each position is held (from acquisition date to most recent market close date).

The merge here is slightly different than before, in that we join on the left dataframes

After completing this merge, you will have extra columns which you do not need since our master dataframe will eventually have a considerable number of columns for analysis, it is important to prune duplicative and unnecessary columns along the way.

There are several ways to remove unnecessary columns and perform various column name cleanups; for simplicity, I use

; and you distinguish the S&Ps initial adjusted close with

merged_portfolio_sp[Equiv SP Shares]

, you do so in order to be able to calculate the S&P 500s equivalent value for the close on the date you acquired each ticker position: if you spend $5,000 on a new stock position, you could have spent $5,000 on the S continuing the example, if the S&P 500 was trading at $2,500 per share at the time of purchase, you would have been able to purchase 2 shares. Later, if the S&P 500 is trading for $3,000 per share, your stake would be worth $6,000 (2 equivalent shares * $3,000 per share) and you would have $1,000 in paper profits over this comparable time period.

In the rest of the code block, you next perform a similar merge, this time joining on the S&P 500s latest close this provides the second piece needed to calculate the S&Ps comparable return relative to each positions holding period: the S&P 500 price on each tickers acquisition day and the S&P 500s latest market close.

You have now further developed your master dataframe with the following:

Each portfolio positions price, shares and value on the position acquisition day, as well as the latest markets closing price.

An equivalent S&P 500 price, shares and value on the equivalent position acquisition day for each ticker, as well as the latest S&P 500 closing price.

Given the above, you will next perform the requisite calculations in order to compare each positions performance, as well as the overall performance of this strategy / basket of stocks, relative to comparable dollar investment and holding times of the S&P 500.

Below is a summary of the new columns which you are adding to the master dataframe.

, you create a column which calculates the absolute percent return of the S&P 500 over the holding period of each position (note, this is an absolute return and is not an annualized return). In the second column (

, we calculate the dollar value (market value) equivalent based on the shares we hold multiplied by the latest adjusted close price (and subtract the S&P return from the ticker to calculate over / (under) performance).

columns calculate our unrealized dollar gain / loss on each position and comparable S&P 500 gain / loss; this allows us to compare the value impact of each position versus simply investing those dollars in the S&P 500.

Percent return of SP from acquisition date of position through latest trading day.

merged_portfolio_sp_latest[SP Return]

merged_portfolio_sp_latest[SP 500 Latest Close]

merged_portfolio_sp_latest[SP 500 Initial Close]

This is a new column which takes the tickers return and subtracts the sp 500 equivalent range return.

merged_portfolio_sp_latest[Abs. Return Compare]

merged_portfolio_sp_latest[ticker return]

merged_portfolio_sp_latest[SP Return]

This is a new column where we calculate the tickers share value by multiplying the original quantity by the latest close.

merged_portfolio_sp_latest[Ticker Share Value]


merged_portfolio_sp_latest[Ticker Adj Close]

We calculate the equivalent SP 500 Value if we take the original SP shares * the latest SP 500 share price.

merged_portfolio_sp_latest[SP 500 Value]

merged_portfolio_sp_latest[Equiv SP Shares]

merged_portfolio_sp_latest[SP 500 Latest Close]

This is a new column where we take the current market value for the shares and subtract the SP 500 value.

merged_portfolio_sp_latest[Abs Value Compare]

merged_portfolio_sp_latest[Ticker Share Value]

merged_portfolio_sp_latest[SP 500 Value]

This column calculates profit / loss for stock position.

merged_portfolio_sp_latest[Stock Gain / (Loss)]

merged_portfolio_sp_latest[Ticker Share Value]

merged_portfolio_sp_latest[Cost Basis]

This column calculates profit / loss for SP 500.

merged_portfolio_sp_latest[SP 500 Gain / (Loss)]

merged_portfolio_sp_latest[SP 500 Value]

merged_portfolio_sp_latest[Cost Basis]

You now have what you need in order to compare your portfolios performance to a portfolio equally invested in the S&P 500. The next two code block sections allow you to i) compare YTD performance of each position relative to the S&P 500 (a measure of momentum and how your positions are pacing) and ii) compare the most recent closing price for each portfolio position relative to its most recent closing high (this allows you to assess if a position has triggered a trailing stop, e.g., closed 25% below closing high).

Below, Ill start with the YTD performance code block and provide details regarding the code further below.

Merge the overall dataframe with the adj close start of year dataframe for YTD tracking of tickers.

merge(merged_portfolio_sp_latest, adj_close_start, on

Deleting date again as its an unnecessary column. Explaining that new column is the Ticker Start of Year Close.


Adj Close: Ticker Start Year Close, inplace

Join the SP 500 start of year with current dataframe for SP 500 ytd comparisons to tickers.

merge(merged_portfolio_sp_latest_YTD, sp_500_adj_close_start


Renaming so that its clear this column is SP 500 start of year close.

Adj Close: SP Start Year Close, inplace

merged_portfolio_sp_latest_YTD_sp[Share YTD]

merged_portfolio_sp_latest_YTD_sp[Ticker Adj Close]

merged_portfolio_sp_latest_YTD_sp[Ticker Start Year Close]

merged_portfolio_sp_latest_YTD_sp[SP 500 YTD]

merged_portfolio_sp_latest_YTD_sp[SP 500 Latest Close]

merged_portfolio_sp_latest_YTD_sp[SP Start Year Close]

dataframe, you are now merging the master dataframe with the

dataframe; as a quick reminder, you created this dataframe by filtering on the

; you do this because its how YTD (year-to-date) stock and index performances are measured; last years ending close is the following years starting price.

method to clarify the master dataframes newly added columns.

column) and calculate the YTD return for each (we also have an S&P 500 equivalent value for each value in the

In the below code block, you use thesort_valuesmethod to re-sort our master dataframe and then you calculate cumulative portfolio investments (sum of your position acquisition costs), as well the cumulative value of portfolio positions and the cumulative value of the theoretical S&P 500 investments. This allows you to be able to see how your total portfolio, with investments in positions made at different times across the entire period, compares overall to a strategy where you had simply invested in an index. Later on, youll use the[Cum Ticker ROI Mult]to help you visualize how much each investment contributed to or decreased your overall return on investment (ROI).

merged_portfolio_sp_latest_YTD_sp[Cum Invst]

merged_portfolio_sp_latest_YTD_sp[Cost Basis]

Cumulative sum of Ticker Share Value (latest FMV based on initial quantity purchased).

merged_portfolio_sp_latest_YTD_sp[Cum Ticker Returns]

merged_portfolio_sp_latest_YTD_sp[Ticker Share Value]

Cumulative sum of SP Share Value (latest FMV driven off of initial SP equiv purchase).

merged_portfolio_sp_latest_YTD_sp[Cum SP Returns]

merged_portfolio_sp_latest_YTD_sp[SP 500 Value]

Cumulative CoC multiple return for stock investments

merged_portfolio_sp_latest_YTD_sp[Cum Ticker ROI Mult]

merged_portfolio_sp_latest_YTD_sp[Cum Ticker Returns]

merged_portfolio_sp_latest_YTD_sp[Cum Invst]

You are now nearing the home stretch and almost ready to start visualizing your data and assessing the strengths and weaknesses of your portfolios individual ticker and overall strategy performance.

As before, Ive included the main code block for determining where positions are trading relative to their recent closing high; Ill then unpack the code further below.

Need to factor in that some positions were purchased much more recently than others.

Join adj_close dataframe with portfolio in order to have acquisition date.

delete_columns = [Quantity, Unit Cost, Cost Basis, Start of Year]

adj_close_acq_date[Start of Year]

Sort by these columns in this order in order to make it clearer where compare for each position should begin.

[Ticker, Acquisition Date, Date], ascending

Anything less than 0 means that the stock close was prior to acquisition.

adj_close_acq_date[Acquisition Date]

Modified the dataframe being evaluated to look at highest close which occurred after Acquisition Date (aka, not prior to purchase).

adj_close_acq_date[adj_close_acq_date[Date Delta]

This pivot table will index on the Ticker and Acquisition Date, and find the max adjusted close.

[Ticker, Acquisition Date], values

Merge the adj close pivot table with the adj_close table in order to grab the date of the Adj Close High (good to know).

Merge the Adj Close pivot table with the master dataframe to have the closing high since you have owned the stock.


merge(merged_portfolio_sp_latest_YTD_sp, adj_close_pivot_merged

[Ticker, Acquisition Date])

Renaming so that its clear that the new columns are closing high and closing high date.


Adj Close: Closing High Adj Close, Date: Closing High Adj Close Date, inplace

merged_portfolio_sp_latest_YTD_sp_closing_high[Pct off High]

merged_portfolio_sp_latest_YTD_sp_closing_high[Ticker Adj Close]

merged_portfolio_sp_latest_YTD_sp_closing_high[Closing High Adj Close]


dataframe; this is the third time that youve leveraged this

dataframe in order to conduct an isolated analysis which youll then combine with the overall master dataframe.

This initial merge is not particularly useful, as you have dates and adjusted close prices which pre-date your acquisition date for each position; as a result, well subset the data post our acquisition date, and then find the

to delete the merged dataframes unneeded columns; this is code I should refactor, as creating a list, e.g.,

, and then filtering the dataframe with this would be a better approach as an FYI, running the

code block more than once will throw an error and you would need to re-initialize your dataframe then run the

After removing the unnecessary columns, you then use the

columns (all ascending); you do this to make sure all of the ticker rows are sorted together, and we sort by Acquisition Date (in case weve purchased the same stock more than once) and Date ascending in order to filter out the dates prior to your positions acquisition dates. In other words, you are only concerned with the closing high since youve held the position.

In order to filter our dataframe, you create a new column

which is calculated by the difference between the Date and Acquisition Date columns.

You then convert this column into a numeric column, and you create a new dataframe called

is = 0. This ensures that you are only evaluating closing highs since the date that you purchased each position.

dataframe, well use a very powerful pandas function called

. If youre familiar with pivot tables in Excel, this function is similar in that you can pivot data based on a single or multi-index, specify values to calculate and columns to pivot on, and also use

function, we pivot on Ticker and Acquisition Date and specify that we would like to find the maximum (

for each position; this allows you to compare the recent Adjusted Close for each position relative to this High Adjusted Close.

dataframe, and you reset the index and join this once again on the

dataframe, which tells you when you purchased each position and the date on which it hit its closing high since acquisition.

Finally, we will combine our master dataframe with this last smaller dataframe,

After doing so, you are now able to calculate the final column needed,

and subtract 1. Note, that this percentage will always be negative, unless the stock happened to have its highest close (in this case it will be zero) on the most recent trading day evaluated (this is generally a very good sign if its the case).

This has been a pretty significant lift, and its now time for our long-awaited visualizations. If youve continued to follow along in your own notebook, you now have a very rich dataframe with a number of calculated portfolio metrics, as shown in the below:

For all of these visualizations youll usePlotly, which allows you to make D3 charts entirely without code. While I also useMatplotlibandSeaborn, I really value the interactivity ofPlotly; and once you are used to it, the syntax becomes fairly straightforward and dynamic charts are easily attainable.

Your first chart below compares each individual positions total return relative to the S&P 500 (same holding periods for the position and hypothetical investment in the S&P 500). In the below, youll see that over their distinct holding periods, 6 of the 8 positions outperformed the S&P. The last two, Twitter (which actually has had a negative return) and Walmart underperformed an equal timed investment in the S&P 500.

As each of these visualizations are relatively similar, Ill explain the code required to generate the above Plotly visualization, and for the remaining ones Ill only summarize observations from each visualization.


merged_portfolio_sp_latest_YTD_sp_closing_high[ticker return][0:10],


merged_portfolio_sp_latest_YTD_sp_closing_high[SP Return][0:10],

which will plot the x and y data you specify. Here, you specify in trace1 that you want to plot a bar chart, with each Ticker on the x-axis and each tickers return on the y-axis.

In trace2, to break up the data a bit, well use a Scatter line chart for the Ticker on the x-axis and the S&P Return on the y-axis.

Where the bar is above the line, the individual ticker (6 of 8 times) has outperformed the S&P 500.

You then create a data object with these traces, and then you provide a layout for the chart; in this case you specify a title, barmode, and the position of the legend; you also pass in a title and tick format (percent format to two decimal places) for the y-axis series.

, specifying the data and layout objects, which you previously named

The next chart below shows the gain / (loss) dollar amount for each position, relative to the S&P 500, as well as shows the Ticker Total Return %. While it is generally recommended that you allocate an equal position size to your positions (or potentially determine positition sizing based on implied volatility), this may not always be the case. For a less volatile investment, you may invest more than in a riskier position (or you may have other position sizing rules). Given this, this visualization shows both each positions return and the dollar value contribution to your overall portfoli