Data scientists are the darlings of the financial services industry. Hedge funds, asset managers, investment banks and quant funds are all competing for a select band of talent with global tech giants like Google. Universities can’t keep up with demand.
There is a shortage of candidates who combine a deep knowledge of financial markets, securities and investment strategies with fluency in programming languages. But which programming language do you need to understand if you want to work in data science? Five years ago, Bank of America was talking about Python as the next big thing for its trading platforms. Now Python is being combined with R.
“The same equivalent program in Python can be done with four times less code, and using the R language for statistical computing, you can run simulations easily, use the apply function and say in one statement what might take 10 lines of code in another language,” said Mark J. Bennett during a presentation at the Trading Show Chicago 2016.
Bennett is a lecturer for the graduate program in analytics at the University of Chicago and senior data scientist at Bank of America Merrill Lynch. He manages software engineering groups, performs live-trade economic analysis and designs statistical forecasting algorithms at the bank.
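To make the "one statement" point concrete, here is a minimal sketch in Python (standard library only) rather than R. The asset names and return figures are made up purely for illustration, and the comprehension is only similar in spirit to R's apply family, not a reproduction of anything Bennett showed.

```python
from statistics import stdev

# Hypothetical daily returns per asset (illustrative numbers, not real data).
returns_by_asset = {
    "asset_a": [0.010, -0.020, 0.015, 0.003],
    "asset_b": [0.004, 0.006, -0.001, 0.002],
}

# Explicit loop: build the result one asset at a time.
volatility_loop = {}
for name, series in returns_by_asset.items():
    volatility_loop[name] = stdev(series)

# One statement: apply the same function to every series at once.
volatility = {name: stdev(series) for name, series in returns_by_asset.items()}

print(volatility)
```

Both versions produce the same dictionary of sample standard deviations; the second simply states the whole operation in a single expression.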
R you ready to learn R?
Shops that do high-frequency/low-latency trading often use the statistical language R, and they need people who can take a systematic approach to developing programs that analyze investment data and manage risk. Important skills to learn include time-series analysis, forecasting, portfolio selection, covariance clustering, prediction and derivative securities.
“Analytics and data science are growing in importance in the financial sector,” Bennett said. “SAS is being replaced by R and Python.
“Using R, we can simulate extreme events and their effect on prices, conduct portfolio analysis and visualize the covariances in a portfolio that we built,” he said. “Business, investor and consumer-level problems are attacked with tools from statistics, computer science and finance.”
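The kind of analysis Bennett describes can be sketched in a few lines. The example below uses Python's standard library rather than R. It assumes normally distributed daily returns, models an "extreme event" as a single one-day shock, and uses hypothetical parameters and function names throughout. It is an illustration of the idea, not the bank's method.

```python
import random

def simulate_returns(n_days, mu, sigma, shock_day=None, shock_size=0.0, seed=0):
    """Draw n_days of normal daily returns; optionally add a one-day shock."""
    rng = random.Random(seed)
    returns = [rng.gauss(mu, sigma) for _ in range(n_days)]
    if shock_day is not None:
        returns[shock_day] += shock_size  # the assumed "extreme event"
    return returns

def price_path(start_price, returns):
    """Compound a series of simple returns into a price series."""
    prices = [start_price]
    for r in returns:
        prices.append(prices[-1] * (1 + r))
    return prices

def covariance(xs, ys):
    """Sample covariance of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def portfolio_variance(weights, series):
    """Portfolio variance: sum over i, j of w_i * w_j * cov(series_i, series_j)."""
    k = len(series)
    return sum(
        weights[i] * weights[j] * covariance(series[i], series[j])
        for i in range(k)
        for j in range(k)
    )

# Hypothetical two-asset example: asset A suffers a 20% one-day drop.
asset_a = simulate_returns(250, 0.0003, 0.010, shock_day=100, shock_size=-0.20, seed=42)
asset_b = simulate_returns(250, 0.0002, 0.015, seed=7)

print("final price of A:", price_path(100.0, asset_a)[-1])
print("cov(A, B):", covariance(asset_a, asset_b))
print("60/40 portfolio variance:", portfolio_variance([0.6, 0.4], [asset_a, asset_b]))
```

Rerunning with the same seed and no shock gives a baseline path, so the price effect of the event can be read off by comparing the two.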
Steep adoption curve of the R programming language
R is one of the most commonly used data-analytics and machine-learning languages in the world right now. In 2013, Dice and several other survey firms reported R as the best-paid skill, with a median income of $115k.
“As a free, open-source solution, with 6,000-plus packages – 20% growth from last year – covering thousands of use cases, R is attractive to most industries,” said Vivian Zhang, the founder and CTO of the NYC Data Science Academy. “R is at the forefront of machine learning, as leading statisticians and computer scientists prototype cutting-edge solutions in R, for example, XGBoost.”
As a result, computing communities for other languages such as Python and C++ quickly adapt these mature algorithms into their libraries.
The finance industry is still in the early days of adopting R.
“The finance industry’s need for validation and its sensitivity to regulations has slowed its adoption of R,” Zhang said.
Challenges that R is equipped to overcome
Many larger financial services firms, particularly those with trading desks and portfolio management operations, are concerned about using open-source languages in production. They depend on service-level agreements (SLAs) and paid support to provide a level of comfort that their needs will be met, Zhang said.
Google, Facebook and other Silicon Valley firms recognize the power of the R platform and use internal validation teams to prove features and functions using R, she said.
“Many finance firms don’t have this domain knowledge but instead turn to consulting firms to help validate their specific usages of R,” Zhang said.
Finance firms are also concerned about the processing efficiency of using R.
“Often, they have internal C++ teams to adapt algorithms and logics from the original R codes to speed it up,” she said.
Zhang cited a well-known case study in which American Century Investments revolutionized its investment analytics platform by aggregating all of its analytics into R.
The level of demand for developers, programmers and data scientists with a command of R
Over the last 30 months, the NYC Data Science Academy has seen 40% growth in individual and corporate training in the finance, risk and actuarial sectors.
“About half of our students are from managerial levels, including managing director, IT department manager, trading desk manager and asset managers,” Zhang said. “The other half is generally from technical and mathematical backgrounds, including quants and software engineers.”
Value proposition of open source
Many corporations, including financial services firms, don’t understand the value proposition of open-source solutions, Zhang said.
“They are lulled into complacency by corporate solutions sales professionals to believe that you get what you pay for and that expensive software and service agreements mean positive ROI,” she said. “R is among the many software solutions that disprove this fallacy.”
Photo credit: Wavebreak Media/Thinkstock