For the graduate machine learning course (CS7641) in the Computer Science Department at Georgia Tech, we are required to complete a project using supervised and unsupervised machine learning algorithms. Our group decided to develop a portfolio-building procedure for stocks, with the goal of exceeding robo-traders’ rate of return without requiring paid data. A video version of our initial proposal can be found [ HERE ]. Below the team members’ names we present a midterm report on our project. Feel free to reach out if you are interested.
Data-Driven Portfolio Creation
In business circles, the word ‘‘investment’’ refers to acquired assets or items that generate income or appreciate in value, creating future wealth. Given the social structure we live in (where money is needed to access goods and services), most people seek investments in high-performing assets to meet their future needs. This behavior is evidenced by the fact that more than 52% of American families have either direct or indirect investments in the stock market [1].
The performance of the stock market is evaluated by indices, which are defined as ‘‘statistically representative samplings of any set of observable securities in a given market segment’’. Examples of indices include the S&P 500 (a representation of the large-cap segment of the US equity market) and the MSCI Europe Information Technology Index (a representation of IT companies in large and mid-cap segments across 15 Developed Markets countries in Europe). Indices are used as benchmarks to measure the performance of an investment portfolio; however, they are not financial products themselves, so they cannot be bought or sold directly.
Optimizing an investment portfolio has always been an attractive problem because of its economic implications. In 1952 it was theorized that the stock market is the ‘‘perfect portfolio’’ because it allows the maximum diversification of assets [2]. In 1965 it was documented that it is extremely uncommon for active stock traders to outperform the market consistently over time [3]. In 1976 the first index mutual fund (the Vanguard 500 Index Fund, mimicking the S&P 500 index) was created, with the goal of providing a ‘‘passive’’ investment method that requires minimal transaction fees and generates returns that approximate the market’s return. Today there are thousands of index mutual funds and financial products linked to the performance of indices; however, the following question remains: can we use previous market information to create a portfolio that outperforms the market return?
Literature Review and Problem Definition
Several investment-related problems have been studied over the years; two of the most prominent are the stock trading problem and the portfolio optimization problem. Both are hard because they usually require accurate forecasts of the future performance or prices of stocks, which is itself a difficult problem: recent works that only attempt to forecast the direction of future prices (using generative neural networks [4] and deep learning [5]) report less than stellar results. Most studies of this forecasting sub-problem use historical data to perform a time-series analysis.
The stock trading problem seeks to maximize returns by buying and selling stocks over short periods of time, mimicking the behavior of a ‘‘day’’ trader or a ‘‘swing’’ trader. Recent techniques applied to it include deep learning [6] and deep reinforcement learning [7].
We focus on the portfolio optimization problem, which seeks to build a portfolio of stocks to be held over a longer period of time, mimicking ‘‘buy and hold’’ traders. This is also closer to the idea of passive investing. The problem has been studied extensively in the optimization literature, and we refer the reader to [8] for a comprehensive review of the methods used to solve it under several variants. Recently, machine learning methods have also been used to tackle this problem [9,10].
In simple terms, our problem statement is: given historical data for a set of stocks over a time period [a,b], can we leverage this information to create a portfolio, composed of a subset of those stocks, whose return matches or exceeds the market rate of return (performance) over some later period [b,c]?
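To make the problem statement concrete, here is a minimal sketch in Python. It is not our actual selection algorithm: the function names, the toy prices, the equal weighting, and the ‘‘pick the top k stocks by return over [a,b]’’ rule are all placeholder assumptions used only to show how a portfolio chosen on [a,b] would be compared against a benchmark on [b,c].

```python
from typing import Dict, List, Sequence


def simple_return(prices: Sequence[float], start: int, end: int) -> float:
    """Buy-and-hold return between two positions in a price series."""
    return prices[end] / prices[start] - 1.0


def select_top_k(prices: Dict[str, List[float]], a: int, b: int, k: int) -> List[str]:
    """Placeholder rule (an assumption): keep the k stocks with the
    highest return over the selection period [a, b]."""
    ranked = sorted(
        prices,
        key=lambda ticker: simple_return(prices[ticker], a, b),
        reverse=True,
    )
    return ranked[:k]


def portfolio_return(
    prices: Dict[str, List[float]], tickers: List[str], b: int, c: int
) -> float:
    """Equal-weight buy-and-hold return over the holding period [b, c]."""
    return sum(simple_return(prices[t], b, c) for t in tickers) / len(tickers)


# Toy data (made up for illustration): one price per period.
prices = {
    "AAA": [10.0, 12.0, 15.0],
    "BBB": [20.0, 21.0, 21.0],
    "CCC": [5.0, 4.0, 5.0],
}
benchmark = [100.0, 105.0, 110.0]  # hypothetical index levels

a, b, c = 0, 1, 2
chosen = select_top_k(prices, a, b, k=1)
held = portfolio_return(prices, chosen, b, c)
market = simple_return(benchmark, b, c)
excess = held - market  # positive means the portfolio beat the benchmark

print(chosen, round(held, 4), round(market, 4), round(excess, 4))
```

Only data from [a,b] is used to choose the stocks, and only data from [b,c] is used to score the choice, which keeps the evaluation out-of-sample.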
Results and Discussion
So far our results have exceeded our expectations: evaluated over a moving 10-year window, our portfolios beat the performance of the S&P 500, NASDAQ, and Dow Jones in 90% of the decision epochs. Our current focus is to develop a minimum viable product and evaluate the real-world performance of our algorithms by using them for trading.
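As a rough illustration of this kind of evaluation, the sketch below generates moving-window (a, b, c) index triples and computes the fraction of decision epochs in which a portfolio beats a benchmark. The helper names, the window lengths, and the return figures are hypothetical; this is not our evaluation code and it does not reproduce the 90% figure.

```python
from typing import Iterator, Sequence, Tuple


def moving_windows(
    n_obs: int, train_len: int, test_len: int, step: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (a, b, c) index triples: select on [a, b], evaluate on [b, c]."""
    a = 0
    # Stop once the evaluation end c would run past the last observation.
    while a + train_len + test_len < n_obs:
        b = a + train_len
        yield a, b, b + test_len
        a += step


def win_rate(
    portfolio_returns: Sequence[float], benchmark_returns: Sequence[float]
) -> float:
    """Fraction of decision epochs where the portfolio beat the benchmark."""
    wins = sum(p > m for p, m in zip(portfolio_returns, benchmark_returns))
    return wins / len(portfolio_returns)


# Six observations, select on 2 periods, hold for 1, slide by 1.
windows = list(moving_windows(n_obs=6, train_len=2, test_len=1, step=1))

# Made-up per-epoch returns, one pair per window.
portfolio = [0.10, 0.02, 0.07]
benchmark = [0.05, 0.03, 0.01]
rate = win_rate(portfolio, benchmark)

print(windows, rate)
```

Each window corresponds to one decision epoch, so the win rate is directly comparable to a statement such as ‘‘beat the index in X% of the decision epochs’’.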
References
- [1] Board of Governors of the Federal Reserve System. “Survey of Consumer Finances (SCF)”. link.
- [2] Harry Markowitz. “Portfolio Selection”. The Journal of Finance 7.1 (1952), pp. 77–91.
- [3] Eugene F. Fama. “The Behavior of Stock-Market Prices”. The Journal of Business 38.1 (1965), pp. 34–105. ISSN: 0021-9398, 1537-5374.
- [4] Jefferson Hernandez and Andres G. Abad. “Learning from multivariate discrete sequential data using a restricted Boltzmann machine model”. In 2018 IEEE 1st Colombian Conference on Applications in Computational Intelligence (ColCACI) (2018), pp. 1–6. doi:10.1109/ColCACI.2018.8484854.
- [5] Alexiei Dingli and Karl Fournier. “Financial Time Series Forecasting – A Deep Learning Approach”. International Journal of Machine Learning and Computing 7 (2017), pp. 118–122. doi:10.18178/ijmlc.2017.7.5.632.
- [6] Thomas G. Fischer and Christopher Krauss. “Deep learning with long short-term memory networks for financial market predictions”. European Journal of Operational Research 270 (2018), pp. 654–669.
- [7] Yue Deng et al. “Deep Direct Reinforcement Learning for Financial Signal Representation and Trading”. IEEE Transactions on Neural Networks and Learning Systems 28.3 (2017), pp. 653–664. doi:10.1109/TNNLS.2016.2522401.
- [8] Prisadarng Skolpadungket. “Portfolio management using computational intelligence approaches”. PhD thesis. University of Bradford, 2013.
- [9] Steve Y. Yang and Saud Almahdi. “An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown”. Expert Systems with Applications (2017).
- [10] Steve Y. Yang, Qiang Song and Anqi Liu. “Stock portfolio selection using learning-to-rank algorithms with news sentiment”. Neurocomputing (2017).
- [11] “Robo-Advisor Performance Is Only One Piece of the Puzzle”. link.