Machine learning and other statistical learning methods, like deep learning, have enabled computers to learn by being trained with gazillions of samples, thus learning from large amounts of data instead of being explicitly programmed. Machine learning methods are now being applied to vision, speech recognition, language translation, and other capabilities that not long ago seemed impossible but are now approaching or surpassing human levels in several domains. Scaling things up is the mainstream thinking, which has produced overhyped systems such as GPT-3, with 175B parameters and an astonishing lack of semantic understanding… but it works!
Deep Learning (DL) pioneer Geoff Hinton endorsed this approach as a general-purpose learning procedure during his 2019 Turing Award Lecture: “Show the computer lots of examples of inputs together with the desired outputs. Let the computer learn how to map inputs to outputs using a general-purpose learning procedure.” So, according to Hinton, big data is all we need for a successful AI. The famous computer scientist Andrew Ng has a different opinion: “the importance of big data is overhyped,” and for Prof. Pedro Domingos, “data alone is not enough.”
Over the years, AI has gone through several cyclic phases with reflection periods (the AI winters), moving from getting computers to do tasks for which we (humans) have codified rules and explicit knowledge (Symbolic AI) to getting computers to learn to do tasks for which we only have tacit knowledge [Kambhampati 2021]. The main reason for this wavering attitude, constantly moving back and forth between symbolic AI and statistical AI in the quest for a human-like AI, is the mainstream thinking of designing data-centric systems instead of knowledge-based systems. At least, until recently…
After many unsuccessful attempts over the years, our aspirational objective is still to build an AI that captures how humans think. However, despite the impressive results obtained, this cannot be achieved using statistical learning alone. At NeurIPS 2019, the DL pioneer Yoshua Bengio openly admitted: “We have machines that learn in a very narrow way. They need much more data to learn tasks than human examples of intelligence.”
In the same speech, Bengio made the exciting proposal to move from System 1 Deep Learning to System 2 Deep Learning, where “System 1 are the kinds of things that we do intuitively, unconsciously, that we can’t explain verbally; in the case of behavior, things that are habitual. That is what current deep learning is good at.” For System 2, “We want to have machines that understand the world, build good world models, understand cause and effect, and can act in the world to acquire knowledge.” [emphasis added]
Allow machines to learn the world by observing, like babies
– Yann LeCun (2022)
Hold on. Is Bengio saying that System 1 is the kind of thing we do “intuitively,” “unconsciously,” and that “we cannot explain”? But this is exactly what Polanyi [blog#2] described as tacit knowledge for humans: “intuitive,” “unconscious,” and “unexpressed.” So, we end up with the exciting metaphor that, just as humans act in the world with their tacit knowledge, so do machines with theirs. Machines can learn tacit knowledge, and storing that knowledge is possible, though it happens differently than it does with humans.
Deep Learning System 1, far from being just an inscrutable black box, enables computers to acquire tacit knowledge by being trained with lots and lots of sample inputs, thus learning by analyzing large amounts of data instead of being explicitly programmed. This point is much more important than the recurrent criticism of the opacity (the “Black Box” problem) and limitations of Deep Learning.
Clearly, Deep Learning System 2 is not a move back to Symbolic AI, i.e., to propositional knowledge (know-that), but an extension of System 1 for building causal knowledge (know-why), as well as relational knowledge (know-with), conditional knowledge (know-when), and declarative knowledge (know-about). System 2 holds the promise of being a meta-knowledge system that can “act in the world to acquire knowledge,” tacit and explicit.
In the pursuit of a knowledge-based system that integrates knowledge, not just raw data, we look at knowledge fusion and knowledge representation instead of limiting ourselves to the magic of scaling things up for better performance. Recall that knowledge representation is about how an intelligent agent’s beliefs, intentions, and judgments can be expressed for automated reasoning, and how it internally represents explicit knowledge to solve complex real-life problems. Knowledge representation in AI is not just about storing data in a database; it is about machines that learn from that knowledge and behave intelligently [Sayantini 2022].
We know from Polanyi that knowledge is simultaneously tacit and explicit. According to Collins, the knowledge domain is a continuum that includes three slightly overlapping instances of tacit knowledge: collective tacit knowledge (CTK), relational tacit knowledge (RTK), and somatic tacit knowledge (STK) [blog#2]. We have already seen that even the simplest DL classification system learns its internal tacit knowledge representation from labeled data and stores tacit knowledge (know-how). Now we need an approach for machines that learn and reason from that stored knowledge.
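To make the point concrete, here is a minimal, purely illustrative sketch (not any specific system discussed here): a tiny perceptron learns a linearly separable concept (logical OR) from labeled examples alone. The learned weights are the model's "tacit" knowledge: they map inputs to outputs correctly, yet encode no explicit, human-readable rule.

```python
# Illustrative sketch: a perceptron acquires "know-how" from labeled data
# instead of being explicitly programmed with a rule.

def train_perceptron(samples, epochs=20, lr=0.1):
    """Learn weights and bias from (input, label) pairs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred  # classic perceptron update rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Labeled data for logical OR -- only examples are shown, no rule is coded.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x1, x2) for (x1, x2), _ in data])  # -> [0, 1, 1, 1]
```

After training, the knowledge lives only in the numeric values of `w` and `b`: the system behaves correctly without being able to "express" the rule it follows.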
The philosophical approach to developing a machine that learns and reasons was described 20 years ago as Knowledge Infusion by Leslie Valiant.
This research, like many others in the AI field, aimed to make computers more useful to humans, empowering them with the ability to acquire and manipulate common-sense or non-axiomatized knowledge through knowledge infusion in terms of robust logic. Knowledge Infusion was a particular approach to handling non-axiomatized knowledge, defined to mean any process of knowledge acquisition by a computer that satisfies three properties:
We adapt the semantics of learning so that it also applies to the reasoning problem.
Leslie Valiant – Knowledge infusion (2006)
Knowledge infusion is a meta-knowledge approach to handling non-axiomatized knowledge. In the words of Valiant: “We adapt the semantics of learning so that it also applies to the reasoning problem. Good empirical performance on previously unseen examples is the accepted criterion of success in supervised learning. It is exactly this criterion that we believe needs to be achieved for reasoning systems to be viewed as robust” [Knowledge Infusion]. The resulting AI system should be able to learn and reason by performing the same tasks that a human would do, but differently. Incidentally, this is very similar to the objectives of DL System 2 as described by Y. Bengio.
Structured knowledge based on symbolic computing approaches that support reasoning has seen significant growth in recent years with the application of knowledge graphs (KG). A diligent integration of sub-symbolic and symbolic systems raises opportunities to develop Neuro-Symbolic learning approaches for AI, where conceptual and statistical representations are combined and interrelated. The overall neuro-symbolic model performs symbolic reasoning by either learning the relations between symbols or selecting symbols at a certain point using an attention mechanism. Here, we are referring to a particular neuro-symbolic system, the Graph Neural Network (GNN), or Neuro[Symbolic] type 6 in the taxonomy proposed by Henry Kautz.
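The core mechanism of a GNN can be sketched in a few lines. This is a deliberately simplified, hypothetical example (fixed mean aggregation, no learned weights, plain Python): each node updates its embedding by averaging its neighbors' embeddings with its own, so the "neural" representation is computed over the "symbolic" graph structure.

```python
# Minimal sketch of one message-passing step in a Graph Neural Network.
# A real GNN would apply learned weight matrices and nonlinearities;
# here the aggregation is a plain mean, for clarity.

def message_passing_step(adjacency, embeddings):
    """One round of mean aggregation over a graph given as an adjacency dict."""
    updated = {}
    for node, neighbors in adjacency.items():
        msgs = [embeddings[n] for n in neighbors] + [embeddings[node]]
        dim = len(embeddings[node])
        updated[node] = [sum(v[i] for v in msgs) / len(msgs) for i in range(dim)]
    return updated

# A toy knowledge graph: three linked entities with 2-d feature vectors.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
feats = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
print(message_passing_step(graph, feats))
```

Stacking several such steps lets information propagate along graph edges, which is how a GNN combines statistical node features with the relational structure of a knowledge graph.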
As an example, a high-level architecture for Knowledge-Infused Learning (KI-L) has been developed by Prof. Amit Sheth of the AI Institute of South Carolina (AIISC). The infusion of knowledge into ML/DL algorithms can happen at three different levels of depth, which induce a corresponding taxonomy for knowledge infusion as shallow, semi-deep, and deep infusion:
The incorporation of external knowledge will aid in supervising the learning of features for the model, and a deep infusion of representational knowledge (explicit and tacit, from the KG) within hidden layers will further enhance the learning process (Kursuncu 2020). The interesting points of the KI-L architecture can be summarized as follows:
Cognitive systems adopting the Deep Knowledge Infusion architecture enable the integration of top-down driven symbolic reasoning (empowering the AI assistant to attend to value states in the KG for compliance with constraints arising from social norms and values) with bottom-up driven statistical learning (empowering the AI assistant to learn the statistical representations in the KG and to adapt knowledge representations under the guidance of observations and social norms) (Sheth 2016).
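The shallowest level of the taxonomy is easy to illustrate. The following sketch is hypothetical (the entity names and feature names are invented for the example): features retrieved from an external knowledge graph are concatenated with the raw input features before they reach the learner. Semi-deep and deep infusion would instead inject such knowledge into the loss function or into the hidden layers themselves.

```python
# Illustrative sketch of "shallow" knowledge infusion: KG-derived features
# are appended to the raw data features at the input level.

# Toy knowledge graph: maps an entity to hand-curated attribute features.
KNOWLEDGE_GRAPH = {
    "aspirin": {"is_drug": 1.0, "is_food": 0.0},
    "apple":   {"is_drug": 0.0, "is_food": 1.0},
}

def infuse(entity, raw_features):
    """Augment raw input features with KG-derived features (shallow infusion)."""
    kg = KNOWLEDGE_GRAPH.get(entity, {"is_drug": 0.0, "is_food": 0.0})
    return raw_features + [kg["is_drug"], kg["is_food"]]

enriched = infuse("aspirin", [0.2, 0.7])
print(enriched)  # the learner now sees both data- and knowledge-derived features
```

The design point is that the downstream model needs no changes: the knowledge arrives as extra input dimensions, which is precisely why this level of infusion is called shallow.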
The KI-L architecture is an interesting approach that can deal with some forms of Relational Tacit Knowledge (via bottom-up neural network learning) and articulate some types of Somatic Tacit Knowledge (via top-down reasoning on a knowledge graph). But it will never deal with Collective Tacit Knowledge, which cannot be codified or mechanized. Machines cannot socialize or be meaningfully embedded in a social environment (as yet) because humans and machines differ in matter and form [Sanzogni 2017]. The same holds for autonomous cars [see also blog#2] and any other autonomous system that tries to replace human (explicit) knowledge without incorporating social/ethical interactions and moral rules [Heder 2020].
In the next blog, I will describe AI interpretability in the context of Tacit Knowledge as introduced by Michael Polanyi.
Personal views and opinions expressed are those of the author.