The State of Machine Learning in 2020
24 June 2020
It is remarkable how much progress has been made in the fields of machine learning, artificial intelligence, and deep learning in a few short decades. From its humble beginnings, AI and algorithmic functionality have expanded to a multitude of industries and enterprise use cases, with more and more companies adopting these approaches every day.
As evidence of this, a recent study by O’Reilly Media (host of the annual O’Reilly Strata Data and AI Conferences), “The State of Machine Learning in the Enterprise”, found that 49% of organizations were exploring or “just looking” into deploying machine learning, while a slight majority of 51% described themselves as early adopters (36%) or sophisticated users (15%).
This data, while not completely comprehensive, is indicative of the growing trends in 2020 and beyond for machine learning and AI applications. Certain aspects of machine learning (data mining, advanced algorithms, and predictive analytics) are in particular demand, reflecting the growing role that information processing and analysis play in the 21st-century business environment.
But while the state of machine learning is more robust and exciting than ever before, a variety of challenges remain for both enterprise users and researchers innovating in this arena. From the unique needs of various market verticals to the financial and technological constraints that still limit adoption by a broader base of enterprise users, machine learning development and integration have yet to reach their full audience. This guide highlights some of the more striking trends and significant challenges around the state of machine learning in 2020.
Understanding the Subsets and Aspects of Machine Learning
The phrase ‘machine learning’ has broadened significantly, as industry professionals have used the term to include newer and broader innovations around deep learning, AI, and predictive automation.
Perhaps one of the best definitions of the term was offered by Bernard Marr in his 2016 article What is the Difference Between Artificial Intelligence and Machine Learning. In it, he defines artificial intelligence as “the broader concept of machines being able to carry out tasks in a way that we would consider smart”, while machine learning is “a current application of AI based around the idea that we should really just be able to give machines access to data and let them learn for themselves”.
Marr’s definition is useful, because it captures the broad array of tools in existence as well as those being developed that fit this goal of allowing machines to improve upon human processes and decision-making on the basis of their exponentially stronger computing and processing power. This effort has been supercharged by the continued expansion of cloud services and simultaneously increasing processing and network speeds, which have continued to expand the resources available for fostering and improving the machines doing the learning.
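To make the idea of giving machines data and letting them “learn for themselves” a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a method drawn from Marr’s article: the function name, the variable names, and the sample figures are all hypothetical. Rather than hard-coding a rule, the program estimates the rule (a single slope) from example data using ordinary least squares.

```python
def fit_slope(xs, ys):
    """Estimate w in y ~ w * x by least squares (no intercept term)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


# Hypothetical training data: site visits per week vs. items purchased.
visits = [1.0, 2.0, 3.0, 4.0]
purchases = [2.0, 4.0, 6.0, 8.0]

# The "learning" step: the slope comes from the data, not from a programmer.
w = fit_slope(visits, purchases)


def predict(x):
    """Apply the learned slope to a new input."""
    return w * x
```

Real systems fit far richer models to far more data, but the pattern is the same: the behavior is derived from examples rather than written out by hand.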
In terms of specific subsets or “verticals” of machine learning, some of the major areas included under this umbrella are:
- Analytics platforms
- Predictive algorithms
- Artificial intelligence in physical applications (robotics, automated vehicles, etc.)
- Recommendation engines
- Geospatial analysis
- Neural networks
- Ensemble learning
- Video analysis and facial recognition software
- Model management and governance
- Hierarchical clustering
Sample Applications of Machine Learning
While the wealth of information collected through the Internet and mobile devices has always been known to experts in tech and in the broader business arena, the emergence of effective predictive algorithms and analytics tools has allowed that data to be interpreted, applied, and manipulated more than ever before.
Indicative of this, Dresner Advisory Services’ 6th annual Data Science and Machine Learning Market Study (2019) examined industry trends and reported several notable data points on the growth of data science needs in the enterprise:
- Data mining, advanced algorithms, and predictive analytics were among the highest-priority projects for enterprises adopting AI and machine learning in 2019. Reporting, dashboards, data integration, and advanced visualization are the leading technologies and initiatives strategic to Business Intelligence (BI) today.
- 40% of Marketing and Sales teams say data science (including AI and machine learning) is essential to continued success as a department. Business Intelligence Competency Centers (BICC), R&D, and executive management professionals are the next most engaged.
- Professionals throughout these sectors cite growing interest in (and demand for) analytics tools and predictive behavior modeling that allow for new potential revenue streams and better use of existing and potential customer data.
These data points merely confirm what many working within these specific subsets of machine learning and AI already know -- that businesses are no longer looking for tools that simply collect data, but for tools that actively and dynamically model and respond in real time.
Continued optimization of machine learning and data science tools in these areas gives businesses a growing opportunity to replace traditional (and archaic) marketing and sales strategies with data-driven alternatives. With improved predictive tools, companies can streamline spending on lead generation and customer retention, relying instead on machine learning and predictive algorithms to anticipate customer needs and problems and to drive decision-making before issues even arise.
Significant Challenges Facing the Machine Learning Industry
While the potential and continued innovations in machine learning fields are certainly promising, some substantial hurdles remain to even wider adoption and broader applications throughout all industries. Based on reports from a variety of major data science organizations and academic studies, the main challenges that still need to be solved to allow for continued expansion and growth include:
The Black Box Problem
Early iterations of machine learning often followed “decision tree algorithms” and adhered strictly to rules taught by human supervisors. While this left the processing fairly simplistic and limited, decision tree designs also ensured that supervisors knew how a machine was making its decisions and could adapt accordingly.
More modern deep learning algorithms, by contrast, develop a hierarchical representation of data, a design that allows them to create their own form of understanding. After analyzing large sets of data, neural networks can learn to recognize specific objects or targets with significantly more accuracy than their predecessors. However, supervisors are still figuring out how they do it, which is what we call “The Black Box Problem”.
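The contrast can be sketched in a few lines of Python. This is an illustrative toy, not a description of any production system: the rules, feature names, and network weights below are all hypothetical, and the weights simply stand in for values a real network would learn from data. The decision tree can report exactly which rules fired, while the small network returns only a score.

```python
import math


def tree_decide(age, visits):
    """Hand-written decision tree: returns the decision and the rules that led to it."""
    path = []
    if visits >= 5:
        path.append("visits >= 5")
        if age < 30:
            path.append("age < 30")
            return "offer discount", path
        path.append("age >= 30")
        return "offer upgrade", path
    path.append("visits < 5")
    return "no offer", path


# Hypothetical weights standing in for a trained two-unit hidden layer.
W_HIDDEN = [[0.8, -0.3], [-0.5, 0.9]]
W_OUT = [1.2, -0.7]


def network_score(age, visits):
    """Tiny feed-forward network: returns a score between 0 and 1, with no explanation."""
    inputs = [age / 100.0, visits / 10.0]
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in W_HIDDEN]
    z = sum(w * h for w, h in zip(W_OUT, hidden))
    return 1.0 / (1.0 + math.exp(-z))


decision, reasons = tree_decide(25, 6)   # reasons lists every rule that fired
score = network_score(25, 6)             # just a number; the "why" lives in the weights
```

Even in this toy, the only way to explain the network's score is to inspect its weights, and a real network can have millions of them.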
This inability to understand how machines are thinking plays into cultural and societal fears around artificial intelligence - in plain terms, people don’t trust machines that think and act like them when they don’t understand the process behind the behavior. This is a very real sociological challenge to further machine learning adoption - until machine learning is better understood, and that understanding is passed along to end users (say, someone considering their Netflix recommendations) the trust hurdle may continue to inhibit further widespread use.
Shortage of Skilled Labor
While the excitement around the potential of machine learning is widespread, unfortunately, the development of specialists actually versed in the field and technology has not quite caught up. With every major name in tech competing over a very limited pool of talent (by some estimates, as few as 10,000 true specialists), the bidding war over machine learning specialists has priced out most companies outside of the big players.
Not only does this mean fewer candidates for any machine learning needs, but also that less-qualified or less-talented candidates are still priced at exorbitant rates. A variety of studies of the sector have found that average machine learning specialists command salaries in the hundreds of thousands of dollars each year, with top-end technicians earning tens of millions. As evidence of this, in a court filing in 2017, Google revealed that one of the leaders of its self-driving-car division earned $120 million in incentives before he left for Google’s competitor, Uber.
While optimists will read this demand and talent shortage as evidence of the massive potential and rapid growth of possibilities for machine learning, the more pessimistic view is that the massive talent deficit is a severe limitation to broader development and applications of machine learning. Without more skilled specialists, average companies will be forced to rely on the spill-down of advances in the industry from corporate heavyweights and government entities.
Data Availability and Privacy Concerns
As mentioned earlier, machine learning’s growth has only been possible due to the rapid advances in cloud computing, data processing, and connectivity that have emerged in just the past few years. As advanced machine learning processes consume more and more data to operate, cloud services and data transfer will need to keep pace.
But this leaves out the main data issue when it comes to machine learning -- accessibility. While cloud services have significantly cut down on expenses and headaches around finding reliable server capacity and storage, issues have arisen around both accessing data appropriately and protecting privacy in the use and storage of personal data.
There is perhaps no clearer example of this than the European Union’s General Data Protection Regulation (GDPR), which took effect in 2018. It was fought fervently by major tech companies but established privacy parameters that severely limit previous abilities to use and collect personal data. While that regulation is specific to the European Union, similar regulatory questions are being debated in governments throughout the world.
Complexity and Limitations
An additional obstacle that is often forgotten when evaluating the growth of machine learning is just how new and untested many aspects truly are. It is true that the biggest tech companies spend significant capital on open source frameworks (for example, Google offers TensorFlow, and Microsoft cooperates with Facebook on developing Open Neural Network Exchange (ONNX)).
However, all these environments are very young. TensorFlow was open-sourced in late 2015 and reached version 1.0 in February 2017, while PyTorch, another popular library, was also released in 2017. In comparison, some of the more popular web application development frameworks are significantly older in “tech” years – Ruby on Rails is now fifteen years old, and the Python-based Django has been in use for over a decade.
This just illustrates that for many people, the dreams and visions of the potential around machine learning are far ahead of the practical capabilities of the technology right now. While there are seemingly limitless use cases (beyond the business enterprise examples above), there are still extensive financial, technological, and social barriers to reaching them. But this is not necessarily a condemnation of the technology -- it is simply a reality of the relative newness of the technology itself.
What’s Next for Machine Learning?
So if this is the status quo, where does machine learning go from here? What steps are needed from individuals, educational institutions, and corporate and governmental drivers of innovation in the field to continue its growth and overcome the limitations above? Here are a few initiatives that could help in the near term:
Expanded Investment in R&D
While many companies are reluctant to direct money into machine learning initiatives that may not pay off for years (or even decades), the long-term potential of applied machine learning offers the possibility of far higher returns. Companies must continue to take balanced risks in promoting further research, testing, and application of machine learning to keep expanding what the tools themselves can do.
Improved Educational and Training Pathways
While a number of leading universities offer high-quality research and academic opportunities for potential data scientists and technicians, that number isn’t nearly enough to counter the talent shortage described above that currently limits the qualified tech pool. Educational institutions and government policies must acknowledge this growing demand, and adapt curricula and training programs to drive more quality technicians into the talent pool, both to meet existing demand and to curb salary inflation so that more companies can enter the space.
Continue Building Public Trust
As long as machine learning is understood only by a few specialists and industry leaders, the public will not understand how it works, and the Black Box Problem will persist. Whether it is through actions like those of Netflix and others revealing how their predictive algorithms work, or through better publicity and engagement opportunities that make machine learning relatable and understandable to ordinary citizens and end users, building public support for investment and development in machine learning is key to spurring continued growth.