[AI@50]
41% strongly agree and 44% somewhat agree that vision shouldn't be treated as input channels converted to symbolic representations; instead, we should assume there might be intelligence going on earlier in the system
14 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (0) | TrackBack (0)
[AI@50]
Filene Auditorium, Moore Hall
Dartmouth College
Hanover, NH
Conference notes by Meg Houston Maker
Eric Grimson, MIT
Intelligent Medical Image Analysis: Computer Assisted Surgery and Disease Monitoring
Vision is primarily an inverse problem. The goal is to infer properties of objects in the world to support intelligent interactions. You want to recover some or all of the underlying information of the image.
Historical approaches are restricted to low-level processing: edge detection and shape recovery. We tended to ignore surface texture, etc. So you could have a representation of the geometry of the object, but then you have a search problem: matching the detected results against a set of knowns. Can a particular geometric shape be recognized within what's detected? This doesn't work well in cluttered scenes -- selection, grouping, and focus of attention are critical to controlling complexity.
Recognition of shapes not well described by geometry is elusive. E.g., why can't you just recognize a cat? Well, if you could render a cat as polyhedra, you could do a good job recognizing the cat. Many approaches were originally driven by the need to control complexity due to computational restrictions.
There has been a shift; we're now using more statistical methods for object recognition. E.g., SIFT methods (Lowe) or constellation methods (Malik, Perona). These are mostly just more complex edge detection, but we now have learning methods that allow us to train the system.
Examples from medical image analysis and image-guided surgery: AI assists in building accurate models of the patient (e.g., the tumor in the patient's brain). That model is then used while conducting the surgery. The model takes input from MRIs or similar patient scans and makes inferences about the structures it sees (gray matter, white matter, tumor, etc.). Statistical learning is a big part of these methods. Domain-specific knowledge is incorporated in a way that reflects the details of the domain. The system outperforms the radiologist on specific tasks.
Statistical learning methods will continue to be central elements of computer vision approaches. A critical problem is incorporating prior knowledge or expectations -- learning where to expect different types of structures or features in the object.
Q&A; Grimson's comments:
So, we still can't recognize a cat, but we are closer. A good vision system would actually watch the cat walking, and do some math to figure out it is a cat.
The medical work is high-stakes work. So we work with the surgeons closely and build-in error monitoring to alert them to likely modes of failure.
Takeo Kanade, Carnegie Mellon
Artificial Intelligence Vision: Progress and Non-progress
Kanade developed a Matrix-like replay system called EyeVision for the 2001 Super Bowl. The system comprised 30 cameras mounted around the upper decks of the stadium in Tampa, FL. Each camera was controlled remotely and tracked the action with zoom, focus, and tilt in real time; the composite allowed for 3-D reconstructions of the actual footage. With one camera, you can't see the hole through which a quarterback can make a pass. But when you swing the view around in replay, you can see a big hole for him to pass to the receiver.
Vision is difficult. Vision was one of the earliest problems in AI: blocks world, outdoor scene analysis (which was overly ambitious when we look back on it today). Why is it so difficult? Lots of data: one 1000 x 1000 image contains about 10^6 10 x 10 images. In a 10 x 10 block at 8 bits per pixel, the number of possible images is 256^(10 x 10), which is roughly 10^240. So, his theorem:
"Human kind has not seen all the 10x10 images yet." [laughter]
Context is another key difficulty. Why is that an image of a car? Because it's on a road. Why is that a road? Because there's a car on it!
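Returning to the counting argument behind the 10 x 10 theorem: the arithmetic is easy to check. A minimal sketch, assuming overlapping 10 x 10 windows and 8-bit grayscale pixels (the variable names are illustrative, not from the talk):

```python
IMAGE_SIDE = 1000   # pixels per side of the full image
PATCH_SIDE = 10     # pixels per side of each small patch
BITS_PER_PIXEL = 8  # assumption: 8-bit grayscale

# Number of distinct (overlapping) patch positions inside the full image.
patch_positions = (IMAGE_SIDE - PATCH_SIDE + 1) ** 2

# Number of distinct images a single 10 x 10 patch could show.
possible_patches = (2 ** BITS_PER_PIXEL) ** (PATCH_SIDE * PATCH_SIDE)

# Power of ten of that count, read off from the number of decimal digits.
exponent = len(str(possible_patches)) - 1

print(f"patch positions in one image: {patch_positions}")
print(f"possible 10 x 10 patches: about 10^{exponent}")
```

So one image supplies on the order of a million patches, against a space of roughly 10^240 possible patches.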
We need a more disciplined approach for searching and defining constraints:
- physical and 3D recovery (shape from X methods)
- geometrical reasoning
- statistical modeling and learning -- statistical properties are quite often enough to recognize an object
He's developed a method of, essentially, shaking the head (or vision system) just slightly to detect the ground layer and obstacles. Edge and color detection produces useless and noisy output, while this produces correct structured images of the meaningful objects in a scene, and allows the disambiguation of figure from context or background.
Vision is physics, geometry, and statistics; this approach is generating real results.
Q&A
The fact that animals move their heads to detect objects stereoscopically is interesting. It's great that this is in the real world, but the really interesting thing is that this is a rigorous method that works.
Terry Sejnowski, Salk Institute, UC San Diego
A Critique of Pure Vision
His goal: To understand how the brain works. But how will we know if we have the right principles, and if we really understand it? In his view, if you really understand it, then you can build it.
Sejnowski once asked Allen Newell why Newell had turned his back on neuroscience. Newell replied that at that time the field was too young to provide any useful information. That's all changed now, because we have much better techniques for analyzing the brain at all levels of investigation -- from the molecular level to the level of large structures.
Pure Vision: the idea of a feed-forward network that allows us to construct an image of the world. As we saw in Takeo's talk, if you have a motor system, you can do a few more interesting things.
In "A Critique of Pure Vision," Sejnowski and others argued there is weak evidence for the basic assumption that vision is a feed-forward system. Change blindness is an excellent argument against this approach. The test to show this consists of blinking an image on a screen in which one object in the scene is coming and going; observers have a hard time figuring out what part of the scene is changing. There's so much in the scene to pay attention to that there's no way to keep track of it all at once. We now know what mechanisms in the brain allow this to happen; it's an internal neurological phenomenon, and attention modulates the firing rate of the neurons in the recognition apparatus. So this is activity in an internal signal -- it's not coming in from the outside world.
Sejnowski cites many animal experiments that show temporal difference learning. The take-away is that techniques for studying neurons and reactions to stimuli are key to our understanding the cerebral cortex and the computations it's doing.
We don't know what the brain is doing. The only thing we have is the end of the computation -- what pops into our head. But all the heavy lifting is happening beneath the surface. We need to go beyond the surfaces and find the algorithms being used. Temporal difference learning is just one of these.
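For readers unfamiliar with the term: temporal difference learning adjusts a prediction using the difference between successive predictions. A minimal textbook-style TD(0) sketch on a made-up five-state chain follows; the chain, the reward, and the parameter values are assumptions for illustration, not anything taken from Sejnowski's experiments.

```python
def td0_chain(n_states=5, alpha=0.5, gamma=0.9, episodes=200):
    """Estimate state values on a deterministic left-to-right chain with TD(0)."""
    # One extra entry for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            # Reward arrives only on the step that reaches the terminal state.
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: the prediction one step later versus the prediction now.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[:n_states]


values = td0_chain()
print([round(v, 4) for v in values])
```

Each state's estimate settles near gamma raised to the number of steps remaining before the rewarded step, so states further from the reward predict less of it.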
Q&A
Q: There's not a lot of clarity about what's AI, what's cognitive science, what's neuroscience. What you're doing is not obviously AI.
A: My goal is to understand how the brain works. If all you have is the input and output, you're doomed. You need a way of understanding how the whole system is organized in between.
14 July 2006 in AI@50, HCI, Meaning
[AI@50]
30% strongly agree, 31% somewhat agree, and 27% somewhat disagree that the future of AI will be propositional or probabilistic vs. based on complex knowledge representations
38% somewhat agree with brute force solutions vs. modeling of human thought
36% strongly agree and 43% somewhat agree that successful development will combine machine learning with automation of deductive reasoning
14 July 2006 in AI@50, HCI, Meaning
[AI@50]
Rod Brooks, MIT
Intelligence and Bodies
Brooks uses a Mac
Cybernetics: 1948-1961: Control and communication in animals and machines, including the study of all possible behaviors of a machine, and modeling animals as machines. By 1952, Ashby advocated considering the animal coupled with the environment. These researchers built instantiations, but didn't use digital computers (because they weren't invented yet). Grey Walter built two robot "tortoises," Elsie and Elmer, and published articles in 1951 showing these robots interacting with the physical world, responding to light and touch.
So the key "Dartmouth Innovations" were:
- Modularity: an element of abstraction
- Ungrounded symbols (another element of abstraction): put the human observer in the role of connecting the program to the world; let researchers jump to much "higher level problems"
- Search: Turing, Michie, Shannon, and Wiener had all considered this for chess
- Jumping onto the digital train as the mechanism for expression of intelligence (vs. analog circuits)
"The core of our humanity, our ability to think and reason, was subject to our technological understanding."
CS and AI have largely merged over 50 years. The first technical approaches within the new framework got replaced later by new innovations. Initial "problems" ended up being category errors. There is an inconvenient truth that applications drive science, despite scientists' distaste for this fact.
The AI founders -- Turing, Michie, and others -- were mathematicians and were able to reason about logic problems, theorems, non-chance games. This is what intelligence was to them. They were smart! So they weren't as interested in solving "simpler" intelligence problems [like how you manage to walk downstairs and get yourself a cup of coffee].
LISP gave us recursion and manipulation of ungrounded symbols (at a time when computer architectures used intrinsically non-recursive call structures). It gave you a control stack. So search routines became easier to write than they were in Fortran, and symbols became easier to use because they were pre-parsed. So AI became search- and symbol-centric because of the availability of this tool.
However, all Brooks's robots use LISP! Including the vacuum cleaners -- and there are about 2 million of these out there right now running LISP.
If you review how the brain was thought about over history, it's always been described in metaphorical terms. In 1900 it was "photographic," in the 1950s it was "a telephone switching system," and in 2000 it's like the "world wide web."
Remember: descriptive is not generative, and the system is not the model.
Embodied AI: Robots in the World
Brooks shows clips of his robots interacting with objects, reacting to objects handed to them. They produce action in the world, without evident planning. Brooks argues that planning is an emergent property of the system. It's like designing a bridge -- you build the bridge to be very strong, but you don't put a "strength module" in the bridge.
The Future of AI
As Boomers age, this demographic trend is going to push for an increase in physical services -- more automation, more help with the physical world. AI research should focus on developing agents with:
- the visual object recognition of a 2-year-old
- the manual dexterity of a 6-year-old
- the language ability of a 4-year-old
- the social sophistication of a 10-year-old
Nils Nilsson, Stanford
Routes to the Summit
The "summit" is human-level artificial intelligence. There are probably some false summits. There may be more than one summit. There is no single route to the summit: search, reasoning, and reinforcement learning will all end up being necessary and incorporated in reaching it.
Early Pioneers: Turing thought computation could include everything human brains do, no matter how creative or original. The McCulloch-Pitts model of nerve nets was a forerunner of the "sub-symbolic" AI approach. And the PSS: a physical symbol system has the necessary and sufficient means for intelligent action (Newell and Simon).
PSS = Formal Symbol Manipulation + designation and interpretation
An expression designates an object if the system can affect the object (action) or behave in ways dependent on the object (perception)
Interpretation is running programs
So PSS doesn't necessarily preclude connection
Promising Approaches:
GOFAI Approach: AI programming looks at a specific domain (geology, medicine, chess, chemistry, etc.). But these approaches are brittle and domain-specific, so you need to incorporate common sense into the system.
Cognitive Substrate Approach
Build a core or cognitive substrate within the system that has the abilities of a child. We need to program enough of them so that through learning, education, training, etc., the system could eventually become expert in a field (geology, medicine, chess, chemistry, etc.)
Computational Models of the Neocortex
Bayes network structures, both shallow and deep nets: simple inputs are fed up the hierarchy to increasingly abstract representations of the sensory inputs. This is promising, because we don't just react to raw sensory data; we also (only?) react to abstractions of data.
Pandemonium Reprise
Selfridge's Pandemonium consisted of cognitive demons, computational demons, and data or image demons. The system was both hierarchical and parallel; weights were adjusted by hill-climbing. Demons were committed neither to symbols nor to neural nets; it was neutral on the question of whether the system should be symbolic or sub-symbolic. Building on this model, we might include not just sensory inputs, but add motor outputs. And drive action.
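To make "weights were adjusted by hill-climbing" concrete, here is a toy sketch in the spirit of that description. It is not Selfridge's original program: the three feature demons, the letter labels, and the tiny training set are all invented for illustration. Each cognitive demon "shouts" a weighted sum of the feature demons' outputs, a decision demon picks the loudest, and a random weight tweak is kept only if accuracy does not get worse.

```python
import random

# Hypothetical feature demons: [vertical_bar, horizontal_bar, closed_loop]
EXAMPLES = [
    ([1, 1, 0], "T"),
    ([1, 0, 0], "I"),
    ([0, 0, 1], "O"),
]
LABELS = ["T", "I", "O"]


def loudest(weights, features):
    """Decision demon: return the label whose cognitive demon shouts loudest."""
    shouts = {
        label: sum(w * f for w, f in zip(weights[label], features))
        for label in LABELS
    }
    return max(LABELS, key=lambda label: shouts[label])


def accuracy(weights):
    """Fraction of the toy examples that the current weights label correctly."""
    hits = sum(loudest(weights, feats) == label for feats, label in EXAMPLES)
    return hits / len(EXAMPLES)


def hill_climb(steps=500, seed=0):
    """Perturb one weight at a time; undo any change that lowers accuracy."""
    rng = random.Random(seed)
    weights = {label: [0.0, 0.0, 0.0] for label in LABELS}
    best = accuracy(weights)
    for _ in range(steps):
        label = rng.choice(LABELS)
        index = rng.randrange(3)
        old = weights[label][index]
        weights[label][index] = old + rng.uniform(-1.0, 1.0)
        score = accuracy(weights)
        if score >= best:
            best = score                 # keep the tweak
        else:
            weights[label][index] = old  # undo the tweak
    return weights, best


weights, best = hill_climb()
print(f"accuracy after hill-climbing: {best:.2f}")
```

Nothing here commits the demons to symbols or to neural nets; the only learning signal is whether the overall decision improved.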
Eric Horvitz, Microsoft Research
In Pursuit of Artificial Intelligence: Reflections on Challenges and Trajectories
We are pursuing intelligence amidst inescapable incompleteness: limitations in representation, time, and memory. Uncertainty is ubiquitous. So machinery for handling uncertainty and resource limitations is foundational in intelligence.
Beauty and the Bottleneck
A lot of AI is about looking at the bottlenecks and gaining insights. This forces us to look at the economics of computation and bounded optimality. Why should we compute? Why perceive and reason?
Continual Computation
How can the agent make best use of idle cycles to plan and reason about the future? We can generate and use algorithms that model the tradeoffs an agent makes between the expected value of a computation, the expected value of information, and the expected value of exploration and learning. It boils down to looking vs. thinking -- how does an agent make these computational trade-offs? Much remains to be done here in AI.
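The "looking vs. thinking" trade-off can be illustrated with a standard expected-value-of-information calculation. This is a generic decision-theory sketch, not Horvitz's own formulation; the umbrella scenario, the probabilities, and the utilities are made-up numbers.

```python
# Assumed toy decision problem (all numbers are illustrative).
PRIOR = {"rain": 0.3, "dry": 0.7}
ACTIONS = ["umbrella", "no_umbrella"]
UTILITY = {
    ("umbrella", "rain"): 8.0,
    ("umbrella", "dry"): 6.0,
    ("no_umbrella", "rain"): 0.0,
    ("no_umbrella", "dry"): 10.0,
}


def expected_utility(action, belief):
    """Average utility of an action under a belief over world states."""
    return sum(p * UTILITY[(action, state)] for state, p in belief.items())


def best_value(belief):
    """Expected utility of the best available action under this belief."""
    return max(expected_utility(action, belief) for action in ACTIONS)


def value_of_perfect_information(prior):
    """How much better the agent expects to do if it looks before acting."""
    act_now = best_value(prior)
    look_first = sum(
        p * best_value({s: (1.0 if s == state else 0.0) for s in prior})
        for state, p in prior.items()
    )
    return look_first - act_now


def should_look(prior, cost_of_looking):
    """Look only when the expected gain exceeds the cost of looking."""
    return value_of_perfect_information(prior) > cost_of_looking


print(round(value_of_perfect_information(PRIOR), 2))
print(should_look(PRIOR, cost_of_looking=1.0))
```

With these numbers, looking is worth about 2.4 utility units, so an agent should spend effort on perception only when that effort costs less than the expected gain.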
Intention Machines (e.g. web search)
Taking query inputs and making inferences based on those inputs about information goals. This can be used in travel interfaces, too.
AskMSR: a heuristic search system that takes a query and predicts the likelihood of a correct answer based on material already on the web.
Mixed initiative interactions: The machine and human make independent contributions to solve a problem. The human does what it can do, the machine does what it can do, and each queries the other in achieving the solution. These systems are integrative because they span multiple systems (within the context of HCI interfaces). E.g. MS Lookout.
14 July 2006 in AI@50, HCI, Meaning
[AI@50]
Oliver Selfridge, MIT
Learning and Education for Software: New Approaches in Machine Learning
Selfridge will address learning: learning many things at once, and the full concept of learning in the human experience.
We know there is learning in other creatures. For example, Alex, a 20-year-old African Gray parrot, has learned to speak English; he can count up to six and can name how many, e.g., small round objects are set before him. But this parrot uses a different kind of language than what Selfridge is really talking about.
And, Selfridge says he's also not going to talk about formalisms. There are enough people around here who will focus on formalisms [laughter]. Nor will he talk about neural networks.
Rather, he's focused on how we learn many things at once. We do learn things in order, but we don't have to learn them in order; sometimes we learn in parallel. For example, take a 3-month-old baby in a crib, with a sparkly mobile over him. He knows about reaching his arm, and about minimizing the visual angle so he can reach the object. And the first time he reaches he knocks the mobile away. But the next week, he may reach more gently. He knows to slow down a bit so he can control his movements. And much later, when the baby learns to kick a ball, he knows to slow down a bit before he hits it. It's a technique he's learned in other contexts. Selfridge says he doesn't care about how the baby does this formally. That's just not important.
But you can't apply a lesson learned about kicking a ball to how to eat spaghetti. There are complexities in learning. We don't read, e.g., just by recognizing characters. We look at a word or a phrase, and we connect it with meaning. We also look at a jar of mustard and we see Heinz, because we've connected past learning or understanding to that concept.
So, why do we learn at all? "To use technical terms, why do we give a shit?" The key role in learning is played by purposes. What are we trying to do? Goals and purposes play a large role in learning: if you don't learn, it's probably because you don't care.
And you never have just one purpose. You have a hierarchy of purposes. Every action or thought is a response to a purpose, something you're trying to do. When you learn chess, you don't learn just so you can win. You learn sub-structures that serve sub-purposes in the eventual service of winning. So when you learn the thing you're trying to learn, you learn lots of other things along the way. And this other kind of learning is crucial to the kind of responsive, rich behaviors we all have.
If you have a machine that plays chess, what use is that machine in teaching a child to play chess? When you teach a child to play chess, you lose so that the child will learn to take advantage of opportunities presented to her. This is a large part of ingenious education; to teach children to think for themselves and open the doors to new approaches to solve problems.
The role and importance of purposes cannot be overstated. Everything from the appreciation of beauty all the way down to the movement of our muscles can be expressed as control loops of purposes. Purposes get changed; circumstances arise. We have to fit the context around the purposes, and the purposes inside the context. Our notion of beauty when we are 8 is different from when we are 18.
And how do we get at the features that matter? There are relevant features that can't be taught, but which can be discovered. Information theory, which looks at channel capacity, was introduced in the 1930s. Now, young children know about it, and it's a feature that we know and use. The concept of the number zero is similar -- the feature of zero being part of a number was at one time a new concept.
How do we get new features? They can't just be generalized. So how do we learn new ones? Once you've gotten a feature, you can talk about it and teach about it. But if you don't have it, how do you get it? For example, how did Helen Keller learn to read? How do we get new concepts? What are the alternative ways of learning?
One thing about features is that if you want to make classifications and see what is similar between events, you need a set of features to describe them. How do we even learn what counts as a feature?
Be sensitive to the role of change. Life is always changing. And that change changes your point of view and your purposes. Part of learning has to be to understand that and to keep going anyhow. Marvin Minsky put it well when he said "the best is the enemy of the good." Meaning the best implies an optimization, which, when used in machine learning, implies that the world is standing still. Which is not a safe premise.
So, how do we express as a language for software not just what it should do but what it should try to do? At every stage the software should be told to try something and then to try something new. So instead of writing every detail of what we're trying to do, we should try to have the program work out what it should do. An important part of learning is that we don't learn nearly as much individually as we learn culturally. Let's put together a language to express many things we can't express now. Tell an AI program that it doesn't just want to do something, but to create a satisfaction parameter that will make it happier that it has done what it's tried to do. And remember, it's not just trying one thing, it's trying many things.
Ray Solomonoff, London
Machine Learning - Past and Future
Solomonoff is not interested in simulating humans. He's interested in creating machines that can solve problems much better than humans can. Like cure cancer. And these machines should follow Moore's Law.
At the beginning of AI, Solomonoff wasn't really impressed with the expert systems project. When you ask a human expert how they do what they do, they can't tell you. And these researchers had found that the more info you put into a system, the better it worked. This was contrary to most of science, in which when you get more info, you then can (or have to) reduce this to a set of rules or principles. This just wasn't happening in AI with expert systems.
Neural networks also seemed like an inefficient way to solve a problem. Given enough nodes, you can pretty much solve any problem, but it's cumbersome. Many other research projects are promising for some aspects of AI. But his approach was different. He developed an algorithmic probability concept. Basically, he showed that if you take a universal machine (a Turing machine works, but others do, too) and give it random data, the output is infinite but can be predicted.
Solomonoff's math eludes me, but you can learn a bit more from his Wikipedia entry and his own website.
Leslie Pack Kaelbling, MIT
Learning to Think About the World
Early in AI there was a lot of emphasis on thinking and symbolic representation. Recently there has been emphasis on embodied cognition and embodied action; agents that have to perform in the world. A lot of robots can get by being reactive. But we need robots that can act in random environments, "in the wild," as it were. What is needed for that?
1) estimating the state of the world
2) making guesses about what will happen if you take any particular action
This means we need ways to represent uncertainty. The set of world states is too big to be useful. The problem is "learning to live." You do a lot of things, and you know a lot of things, so how can you figure out what action to take based on all of the things you do and know?
Probabilistic dynamic outcomes: actions have probabilistic outcomes, but the frame stays constant, and you jettison irrelevancies and minor nuances (MHM editorial: that seems mighty convenient). So now you have something that actually can be modeled.
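The two needs listed above (estimating the state of the world, and guessing what an action will do) are what a discrete Bayes filter does. A minimal sketch, assuming a made-up two-state world with a door that is open or closed; the transition and sensor probabilities are invented for illustration and are not from Kaelbling's talk.

```python
def predict(belief, transition):
    """Guess the next state: transition[s][s2] = P(s2 | s, action taken)."""
    states = list(belief)
    return {
        s2: sum(belief[s] * transition[s][s2] for s in states)
        for s2 in states
    }


def update(belief, likelihood):
    """Fold in an observation: likelihood[s] = P(observation | s)."""
    unnormalized = {s: belief[s] * likelihood[s] for s in belief}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


# Assumed models for a "push the door" action and a noisy "looks open" reading.
PUSH = {
    "open": {"open": 1.0, "closed": 0.0},
    "closed": {"open": 0.8, "closed": 0.2},
}
SEES_OPEN = {"open": 0.6, "closed": 0.2}

belief = {"open": 0.5, "closed": 0.5}
belief = predict(belief, PUSH)      # what probably happened after pushing
belief = update(belief, SEES_OPEN)  # correct the guess with the sensor reading
print({state: round(p, 3) for state, p in belief.items()})
```

The belief is a probability for each state rather than a single guess, which is one simple way to represent the uncertainty the talk describes. It only stays tractable here because the state set is tiny.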
She's talking so fast I can't keep pace with her! She shows examples from robotics, block worlds, and other systems. Check her website for more. I probably will have to.
Peter Norvig, Google
Web Search as a Product of and Catalyst for AI
Google's mission: Organize the world's information and make it universally accessible and useful.
Machine learning and natural language processing are core technologies of searching now. Google offers a "predicted items" function that lets you do on-the-fly classification. E.g., if you type in Armani, you get Gucci, Prada, etc.
Trend history: maps the frequency of search terms over time. E.g., watermelon (spikes in summer, low in winter), full moon (monthly), and eclipse (predictably, when eclipses happen, plus a little noise from searches for the car and the programming language).
Natural language processing is a core technology. Here are some examples:
Spelling correction: E.g. "Britney Spears." Google takes a corpus-based approach. There is no "ground truth." Rather, they build a probabilistic model based on the vast data of spelling variation in the corpus. The advantage of the corpus approach is that you will get names not in a dictionary, and you can acquire spellings in context, related to the surrounding words.
Extraction from semi-structured text: it's possible to pull out facts from the text, too, like population data on a searched country: e.g., search on "what is population of japan" and you'll see how Google pulls out the population data and puts it at the top of search results.
Statistical Machine Translation (to, e.g., translate from Chinese or Arabic into English): There's no linguist on staff at Google who speaks these languages. They simply collect parallel texts (data that exist in both languages, e.g., from newspaper sites), then they do a probabilistic alignment; the results are extremely accurate.
Norvig takes a corpus-based approach, which is different from the traditional knowledge-base approach. Some concepts are not in reference books. The concept that "water flows downhill" is not something you can look up in the encyclopedia. Plus, even if you used existing knowledge, it would cost about $10,000 a page to encode, e.g., a chemistry textbook. So, how can we get knowledge or information without just paying someone to do it?
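As a rough illustration of the corpus-based idea (not Google's actual system), the sketch below counts words in a tiny made-up corpus and corrects a misspelling to the most frequent known word one edit away. The corpus text and function names are assumptions, and this version ignores the surrounding-word context that the notes mention.

```python
import re
from collections import Counter

# Tiny stand-in corpus; a real system would use vastly more text.
CORPUS = (
    "britney spears sings . britney spears tours . "
    "brittany is a region . the spears were sharp"
)
COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the word itself if seen, else the most frequent nearby word."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return word  # nothing in the corpus is close enough
    return max(candidates, key=COUNTS.get)


print(correct("britny"), correct("speers"))
```

Because the "dictionary" is just whatever appears in the corpus, a name like "britney" is handled even though no conventional dictionary lists it.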
With Google, there is a huge corpus, much larger than what might be in an encyclopedia or textbook. And there is lots more accuracy with a larger training corpus, because you have more opportunity for disambiguation. This is "The Power of a Billion" (after Banko and Brill, 2001: Effect of Training Corpus Size).
Google created a programming model (called MapReduce) to process data at scale. It's broadly useful within the Googleplex, including for running learning algorithms, and can be applied across projects.
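MapReduce is usually described as three steps: a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The single-process sketch below shows only that pattern; it is not Google's implementation, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every document and collect all key/value pairs."""
    pairs = []
    for document in documents:
        pairs.extend(mapper(document))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Combine each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(document):
    """Emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.lower().split()]


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


documents = ["water flows downhill", "water is wet"]
counts = reduce_phase(shuffle(map_phase(documents, word_mapper)), count_reducer)
print(counts)
```

In a real deployment the map and reduce calls run on many machines at once; the appeal is that the programmer writes only the mapper and the reducer.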
Norvig references Andy Clark's book, "Being There: Putting Brain, Body, and World Together Again." We're not as smart as we think we are. We've figured out some clever stuff we can do with language and mathematics -- but these are external systems, or props, that amplify our intelligence.
Making information accessible and matching it to human needs is what Google's trying to do. "In the future search engines should be as useful as HAL in 2001, but hopefully they won't kill people." (Sergey Brin)
Q&A:
Question: what about image processing?
Answer: we do very little image processing. Google Images is our second-most popular property, but the work is all based on the words surrounding the images. But we are starting to do a little image processing, signal processing, and the like.
13 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (0) | TrackBack (0)
[AI@50]
59% say connectionism is a promising model, with some restrictions or qualifications
13 July 2006 in AI@50, HCI, Meaning
[AI@50]
Geoffrey Hinton and Simon Osindero, Toronto
From Pandemonium to Graphical Models and Back Again
Osindero gives this talk solo
How to build a perceptual system -- a "pandemonium architecture" after Selfridge
The good old-fashioned neural network: input layer - hidden layer - output layer, plus a back-propagation algorithm that uses error derivatives for learning.
What's wrong with this model?
- you need a labeled training corpus
- it's not efficient; unless the weights are redundant, labels can't provide enough information
- the learning time doesn't scale well with more than 2 or 3 layers
- neurons need to transmit the signal forward and back-propagated information backward; real neurons don't do this
Overcoming These Limits -- using a Restricted Boltzmann Machine
A simple model is a 2-layer network (a shallow network): 1 layer of hidden units (with no connections between them) and 1 layer of visible units. Start with training data on the visible layer (images of reality), update the hidden layer, then update the visible units in parallel to get a "reconstruction" of reality. Then update the hidden units again. This network can learn to recognize features of the training data, but it tries to frame all perceptions in terms of that corpus (so if it were trained on the symbol 2 and were given a 3, it might recognize some features of the 3 that correlate with those of the 2, e.g., the top curve to the left).
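The update sequence just described (data on the visible layer, update hidden, reconstruct visible, update hidden again) matches what is commonly called one step of contrastive divergence. Below is a small pure-Python sketch of that step under stated assumptions: binary units, a made-up four-pixel data set, and a reconstruction that uses probabilities rather than sampled values. The class name and parameters are hypothetical, and this is not the speakers' code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Restricted Boltzmann machine: no hidden-hidden or visible-visible links."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.weights = [
            [self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
            for _ in range(n_visible)
        ]
        self.visible_bias = [0.0] * n_visible
        self.hidden_bias = [0.0] * n_hidden

    def hidden_probs(self, visible):
        return [
            sigmoid(self.hidden_bias[j]
                    + sum(v * row[j] for v, row in zip(visible, self.weights)))
            for j in range(len(self.hidden_bias))
        ]

    def visible_probs(self, hidden):
        return [
            sigmoid(self.visible_bias[i]
                    + sum(h * w for h, w in zip(hidden, self.weights[i])))
            for i in range(len(self.visible_bias))
        ]

    def cd1_step(self, data, rate=0.1):
        h0 = self.hidden_probs(data)                   # update hidden from data
        h0_sample = [1 if self.rng.random() < p else 0 for p in h0]
        v1 = self.visible_probs(h0_sample)             # the "reconstruction"
        h1 = self.hidden_probs(v1)                     # update hidden again
        for i in range(len(data)):
            for j in range(len(h0)):
                # Raise data-driven correlations, lower reconstruction-driven ones.
                self.weights[i][j] += rate * (data[i] * h0[j] - v1[i] * h1[j])
            self.visible_bias[i] += rate * (data[i] - v1[i])
        for j in range(len(h0)):
            self.hidden_bias[j] += rate * (h0[j] - h1[j])

    def reconstruct(self, visible):
        return self.visible_probs(self.hidden_probs(visible))


rbm = TinyRBM(n_visible=4, n_hidden=2)
patterns = [[1, 1, 0, 0], [0, 0, 1, 1]]  # assumed toy "images"
for _ in range(500):
    for pattern in patterns:
        rbm.cd1_step(pattern)
print([round(p, 2) for p in rbm.reconstruct([1, 1, 0, 0])])
```

Note that no labels appear anywhere in the training loop, which addresses the first limitation in the list above.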
It is also possible to create a more complex model that has multiple hidden layers. This seems more promising than back-propagation because there is only one type of signal sent through the system. This system can "fantasize" outputs that map to reality. So if you tell such a system that this symbol is a 2, it can create output shapes that look like 2. Likewise, if you give the system a hand-written 2, it can guess most of the time that it represents the numeral 2.
MHM editorial: There is no evidence that this system carries or expresses any understanding of the meaning of "two-ness."
Rick Granger, Dartmouth College
Essential Circuits of Cognition: The Brain's Basic Operations, Architecture, and Representations
Granger uses a Mac.
The Problem: Intelligence is undefined; it's only defined by example, by what we've learned. There's no formal spec. The only spec we have comes from brains. The question is how to scale from brain mechanisms to high-level faculties. One approach is an analysis of brain circuits and systems, and a derivation of algorithms and data structures, not just statistics, and anatomically structured systems that construct high-level cognition. It's not a debate between statistics and algorithmic analyses. The issue is to show how logic arises from statistical methods.
As mammalian brain size increases, specializations decrease. In large brains, the posterior and anterior cortex are heavily connected, and the output of the anterior cortex, which in small mammals drives the musculature, has feedback mechanisms back into the anterior cortex. These feedback architectures are repeated redundantly in the brain in nested clusters and drive more complex behavior/output.
Granger shows an example diagram of a core loop within the brain (a thalamocortical circuit) that acts as a feed-forward network in recognizing, in his example, a flower. As the output of one region becomes input for another region, more complex patterns are created, generating a kind of grammar within the system. So, at the end of the day, there are a few algorithms that can be derived from brain circuits, and these are fodder for AI research.
For more, see BrainEngineering.org
13 July 2006 in AI@50, HCI, Meaning
[AI@50]
71% of the AI@50 audience said statistical/probabilistic methods are most accurate in representing how the brain works.
79% said an accurate model of thinking is impossible without further discoveries about brain functioning.
13 July 2006 in AI@50, HCI, Meaning
[AI@50]
Ron Brachman, Yahoo! Research
A Large Part of Human Thought
From the original summer project proposal -- the proposition that "a large part of human thought" was bound up in language, but there is also a conflation of language and concepts that language replaces:
"It may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view, forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others. This idea has never been very precisely formulated nor have examples been worked out." (From the original proposal.)
Cognitive Penetrability: language can produce widely ranging human behavior.
Knowledge Matters: "the city council refused to grant the protesters a permit because they feared violence" vs. "the city council refused to grant the protesters a permit because they advocated violence." In each case we understand a different referent for the pronoun "they." Background knowledge of the universe changes our interpretation of linguistic inputs, even if the inputs are, grammatically, nearly identical.
A good portion of human thought involves a store of represented beliefs, and procedures that operate on them to produce new beliefs (inferences) that impact decisions about what we do. The focus is on what can be known, and how, and what follows from what is known. Knowledge representation and procedures for manipulating knowledge became an object of study in early AI.
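To make the "stored beliefs plus procedures that produce new beliefs" idea concrete, here is a minimal forward-chaining sketch. It is only an illustration, not anything Brachman presented: the fact strings, the rule format, and the function name `forward_chain` are all assumptions chosen for brevity.

```python
# Hypothetical belief store and rules, for illustration only.
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new beliefs can be derived."""
    beliefs = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already believed.
            if set(premises) <= beliefs and conclusion not in beliefs:
                beliefs.add(conclusion)
                changed = True
    return beliefs

facts = {"man(socrates)"}
rules = [
    (["man(socrates)"], "mortal(socrates)"),
    (["mortal(socrates)"], "will_die(socrates)"),
]

derived = forward_chain(facts, rules)
print(sorted(derived))
```

Each pass adds whatever follows from what is already believed, so the second rule can only fire after the first has produced its conclusion.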
The internal conceptual monologue: From the simplest idea (formed from a simple syllogism such as "Socrates is a man and all men are mortal therefore Socrates is mortal") you get to daydreaming, imagining other possibilities, envisioning and caring about your new vision, planning about it, and teaching or preserving it as an idea. And all the way out, this action begets writing, reading, culture, and science. Having knowledge representations in our heads that are disconnected from concrete reality begets intelligence, culture, and society [MHM: some editorial liberty in this last bit].
You can use knowledge structures to do forward-looking inference, come to a mental conclusion, and make a decision about what to do. This was an original idea of the conference 50 years ago, and since then, it's had a great impact on:
- Database management systems
- Expert systems
- Description logics, and the semantic web (or OWL, the Web Ontology Language)
- Cognitive robotics
- Planning systems (e.g. SIPE)
- Model-based reasoning (e.g. at NASA)
Language and perception create usable memories, useful for making decisions about what to do now, or to plan future moves in the abstract. It's about taking advantage of past action, or learning. The explicit formal representation of what a machine knows and believes is at the heart of creating a machine that can do what humans can do.
To sum:
- Actions are conditioned by what we know and believe;
- At least part of this knowledge arrives in linguistic form;
- This allows us to focus on what needs to be known;
- And gives rise to a "knowledge-based" approach.
This idea defined AI, as opposed to approaches like computational neuroscience.
Question: Much of the AI community has either forgotten this or would disagree with this focus. E.g. the natural language community.
Answer: These insights mentioned are still relevant, and essential to knit the whole of AI together.
Question: (actually a comment from John McCarthy): In some ways our proposal for a revolution was a proposal for a counter-revolution against the behaviorists that advocate a stimulus-response approach.
David Mumford, Brown University
What Is The Right Model for "Thought?"
Is thought logic/rule-based or is it stochastic models/statistical inference? This is an old question, but still being discussed. Logical deduction and statistical inference are really different, and maybe even incompatible models. Logic comes in many flavors: traditional, modal, temporal, etc. Stochastic models are differently structured. Logical deduction is brittle and unforgiving, whereas statistical inference is more adaptive and flexible.
"Logic/rule-based thinking is fun, but is it real? To grammarians, rules are the deep and very real way to understand natural language grammars, but language is a maze of exceptions within exceptions."
People think in a Bayesian (probabilistic) fashion. People make inferences based on a collection of hypotheses about the context. E.g., given the following test:
Actual sound (missing consonant): "The ?eel is on the shoe."
Perceived sound: "The heel is on the shoe."
People will not report they thought the word was wheel or steel. They know that shoes have heels, and so assume, absent other data, that the missing consonant was "h."
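The "?eel" example can be read as a small Bayesian update. The sketch below is a guess at the shape of that calculation, not Mumford's own model: the candidate words, the flat priors, and the context likelihoods are all made-up numbers.

```python
# Candidate completions of "?eel". Priors and likelihoods are invented for illustration.
priors = {"heel": 0.25, "wheel": 0.25, "steel": 0.25, "peel": 0.25}

# Assumed probability of hearing "... is on the shoe" given each word.
likelihood_given_shoe = {"heel": 0.90, "wheel": 0.03, "steel": 0.05, "peel": 0.02}

def posterior(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {word: priors[word] * likelihoods[word] for word in priors}
    total = sum(unnormalized.values())
    return {word: value / total for word, value in unnormalized.items()}

post = posterior(priors, likelihood_given_shoe)
best = max(post, key=post.get)
print(best, post)
```

With these invented numbers the context does all the work: the acoustic evidence is identical for every candidate, and only the "shoe" likelihood separates them.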
Implementing statistical inference: there are vast amounts of noisy data and many variables present. Statistical tables won't help you. So you must sample the entire probability distribution and do 'particle filtering' to create a result. But how does the brain do this? The brain isn't ever doing just one thing. There is no "grandmother cell" that fires when you think about your grandmother. The brain has to manage hundreds of variables at once, and filter. So choice is postponed.
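For readers who haven't met particle filtering, here is a minimal bootstrap particle filter tracking a single hidden number from noisy readings. It is a generic textbook sketch, not a claim about how the brain (or Mumford's models) does it; the noise levels, particle count, and observation values are assumptions.

```python
import math
import random

def particle_filter(observations, n_particles=500, process_noise=0.5,
                    obs_noise=1.0, seed=0):
    """Estimate a drifting 1-D hidden value from noisy observations."""
    rng = random.Random(seed)
    # Start with particles spread over a wide range of hypotheses.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: let each hypothesis drift a little.
        particles = [p + rng.gauss(0.0, process_noise) for p in particles]
        # Weight: score each hypothesis by how well it explains the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_noise) ** 2) for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Resample: keep many copies of good hypotheses, few of bad ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

observations = [3.2, 2.9, 3.1, 3.0, 2.8, 3.3, 3.0]  # made-up noisy readings
estimates = particle_filter(observations)
print(estimates[-1])
```

The filter never commits to a single answer; it carries a whole population of hypotheses forward, which is one way to read "choice is postponed."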
Stuart Russell, UC Berkeley
The Approach of Modern AI
Russell would love to see the various branches of AI research merge; they have been quite divergent (cognitive science, mathematics, other research projects).
Formalisms or algorithms for probabilistic knowledge bases: focus on concrete syntax, semantics, and completeness. A key component of any formalism is expressiveness. E.g. the rules of chess can be stated in about a page of first-order logic, whereas it takes 100,000 pages to do so in propositional logic. Humans operate on the 1-page version. The question is how we get there.
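A toy way to see the expressiveness gap: one general rule for how a rook moves on an empty board versus the list of every individual (from, to) fact that rule stands for. This is only an illustration of the idea, not Russell's page counts; the function and variable names are invented.

```python
from itertools import product

SQUARES = list(product(range(8), range(8)))  # (file, rank) pairs on an 8x8 board

def rook_can_move(src, dst):
    """One general rule: same file or same rank, but not the same square."""
    return src != dst and (src[0] == dst[0] or src[1] == dst[1])

# The propositional-style alternative: one ground fact per legal (from, to) pair.
ground_facts = [(src, dst) for src in SQUARES for dst in SQUARES
                if rook_can_move(src, dst)]

print(len(ground_facts))  # 64 squares x 14 destinations each
```

One quantified rule covers what would otherwise be hundreds of separate statements, and that is for a single piece on an empty board with no notion of time or turn.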
The focus of research projects should involve issues of various time scales, extended deliberation, a varied environment. A human-scale, dexterous 4-legged, seeing robot would be exceptionally valuable to the project.
13 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (0) | TrackBack (0)
[AI@50]
We use our polling dongles to vote on several questions. The most interesting question: When will computers be able to simulate every aspect of human intelligence? 41% of us said "More than 50 years," and 41% said "Never."
13 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (0) | TrackBack (0)
[AI@50]
Filene Auditorium, Moore Hall
Dartmouth College
Hanover, NH
Conference notes by Meg Houston Maker
John McCarthy
What Was Expected, What We Did, and AI Today
McCarthy corrects Heckman: "Artificial intelligence is not, by definition, simulation of human intelligence."
The symbolic role of the Dartmouth summer project of 1956 was more important than its specific results.
McCarthy's interest in AI started at a 1948 symposium at Caltech, wherein the brain and computers were compared. However, there were no computers yet, so it was all somewhat -- theoretical. Turing had proposed them at that point, but they hadn't been realized yet.
"Artificial intelligence" as a term was chosen "to nail the flag to the mast," because McCarthy was disappointed in how few research papers dealt with making machines behave intelligently. The original idea of the summer project was that participants would work together, but in the end, each had his own research agenda, and they came to campus at various times and for varying lengths of time. "But the real reason we didn't live up to grand hopes was that AI was harder than we thought."
When will we have human-level AI? This is the wrong question. The right question should be focused on the idea that we will reach human-level AI when someone solves basic problems.
Three classical problems of AI:
1. The frame problem -- how to avoid specifying what doesn't happen when an action occurs
2. The qualification problem -- how to avoid specifying every qualification for an action to be successful
3. The ramification problem -- how to avoid specifying all the side effects of an action
[paraphrasing] "All three have been solved in important contexts and for important applications, but none have been solved at the human level of intelligence. These ideas require extensions to logic," primarily non-monotonic logic. There are also probably several important problems nobody knows about yet.
To have human-level intelligence, we need the property of self-awareness.
Question from audience: Where are we in simulating human-year reasoning? Like a 1-year-old? A 2-year-old?
Answer: In some respects machines are ahead, in others they're not even up to a year. One of my complaints is that people in AI, psychology, and philosophy are too apt to focus on appearance rather than the reality that appears behind the appearance. Just drawing patterns of appearance (of things) is not enough. We are middle-sized objects, and we have the ability to recognize other middle-sized objects that existed before us.
Marvin Minsky
The Emotion Machine
Minsky uses a Mac.
An anecdote: Minsky tells a story about driving his daughter around in the car when she was 18 months old. She would periodically yell "Care!" Dad and mom couldn't figure this out. Turns out that she had seen TV ads for the organization CARE, whose logo is rendered as a stencil. And driving around, she had been seeing telephone poles with stenciled numbers on them, and had associated the stencil concept with the word Care. Logic and reasoning are complex!
Minsky is not a fan of ontologies, because reality isn't rigid. A plane is like a bird in many respects, but not in many others.
Much of the early AI work (60s and 70s) solved high-school and college-level logic and math problems. But today, there is still no vision program that can recognize the objects in a typical room, and there is no language program that can answer simple questions about a children's story.
AI researchers have "physics envy" -- they wish for very general formal principles that can be taken as theory of thinking. "AI-ers use Occam's Razor too much!" The brain has myriad architectures. Biology is messy, brains use many different procedures and representations. If you want to understand a cognitive phenomenon, you should make at least 3 theories because it's likely the brain does each of those things in several ways. Look for multiple theories rather than elegant, complex ones. You have to think of the mind as a big jigsaw puzzle, and you can maybe cover each part with an elegant mathematical theory, but you still have to link them together.
We have different ways to think about different problems (and that's what makes us so smart): Analogy, Planning, Simplify, Reformulate, Simulate to Anticipate, Contradiction. (See his new book, The Emotion Machine.) We need to formulate a machine that has reflective systems.
In conclusion, there are too many specialists, and we need more self-criticism about which methods are good for what problems.
13 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (1) | TrackBack (0)
[AI@50]
Filene Auditorium, Moore Hall
Dartmouth College
Hanover, NH
Conference notes by Meg Houston Maker
I check in and am handed an audience response dongle from Turning (not Turing) Technologies. These will be in heavy use, apparently.
Filene Auditorium has precisely two (2) single gang electrical outlets for audience members, one on each wing wall, about half-way down the rows. Result? Because I plan to be here all day, I've claimed a seat on the side so my PowerBook G4 can guzzle merrily as I blog. (It's called a POWERbook for a reason.) So, Dartmouth takes two points for failing to provide outlets in the floor every few seats when they built this place a few years back.
People are mingling, entering, settling. The crowd is heavily male, but there's lots of age diversity. There are about 100 of us. John McCarthy, Marvin Minsky, Trenchard More, Ray Solomonoff, and Oliver Selfridge are all here. Carol Folt enters, shakes hands with Jim Moor, and smiles. We'll begin soon.
The conference opens with a short film, which itself opens with, if you can believe it, a trumpet herald.
Speakers:
Jim Moor, Dartmouth College
Carol Folt, Dean of Faculty, Dartmouth College
Barry Scherr, Provost, Dartmouth College
Carey Heckman, Dartmouth College
Carol's remarks are gracious, welcoming, and perfunctory, but Barry's done his homework and provides a bit of history and context.
Carey Heckman, "Tonypandy and the Origin of a Science"
Concerning the history of the 1956 conference. Tonypandy means received wisdom; stories that are assumed to be true. There's quite a bit of this surrounding the original gathering. It wasn't really a conference; it was more like an 8-week period of glorified poster sessions, despite what you read if you Google it. The term "artificial intelligence" was coined in the proposal, not at the conference, and by the time of the conference it was a term with some currency. One of the greatest contributions was in bringing together the experts' disparate research paths into a more focused project on the simulation of human thought. And interestingly, what survives as the project's greatest contribution to the literature is not a report from this conference (which was never produced), but the proposal.
So, without further ado... let the conference begin.
13 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (0) | TrackBack (0)
[AI@50]
I'll be attending the AI@50 conference this week in Hanover, NH. This gathering celebrates, explores, and, to an extent, reprises the original Dartmouth Summer Research Project in artificial intelligence of 1956, which proceeded "on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." John McCarthy, then a Dartmouth mathematics professor, and his colleagues Marvin Minsky (Harvard), Nathaniel Rochester (IBM), and Claude Shannon (Bell Labs) coined the term "artificial intelligence" in their funding proposal to highlight the role computers may play in simulating (or bettering) human intelligence.
I'm planning to blog the conference. Stay tuned.
11 July 2006 in AI@50, HCI, Meaning | Permalink | Comments (4) | TrackBack (0)
Yesterday I received two email inquiries into my willingness to sell one of my domain names. This is not uncommon. The first email was typical: formulaic, slightly odd grammar, a few misspellings. Only one line about the domain name itself. Could easily have been generated algorithmically by an app walking a database.
The second one was different. It was personable. It was friendly. The grammar was good (huzzah). It made note of this weblog, and referenced actual posts. In short, it was generated by a living, breathing, human. It was believable.
Which is the point of the Turing test. The machine convinced me it was human. And so it was.
01 June 2006 in Information, Meaning | Permalink | Comments (0) | TrackBack (0)
Excerpts from my recent paper on semantic transparency - see previous post
Classical artificial intelligence (AI) holds that cognition is computational and so is primarily about information processing. Information is encoded within a cognitive system as symbols, manipulated according to rules to produce meaningful output. Symbols are persistent, reusable, and re-combinable elements that hold their meaning over time and across various contexts. The symbols are tokened in the system whenever they're needed to represent old concepts or to learn new ones. Systems that express these properties are called Physical Symbol System (PSS) architectures, because they run on physical hardware (whether biological or electro-mechanical) and rely on symbolic representation to model mental processes.
Symbols in PSS architectures carry meaning - express semantic properties - within the system. A system is said to be semantically transparent if the symbols express agreed-upon, natural language concepts. For example, in a semantically transparent PSS, the symbol "dog" is the symbol for a dog; "3" is the symbol for the quantity 3; and "John loves Mary" is an aggregation of symbols for "John," "loves," and "Mary." These symbols would be stored in the system's lookup tables or memory; the PSS architecture is structured so that these tokens are searchable and accessible. Simple or rudimentary ideas are thus represented by equally simple symbols, and the task of the system is to manipulate these, following the specified rules and requirements.
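As a rough picture of what "semantically transparent" means in practice, here is a tiny symbol store in which every token is a readable natural-language concept and a query is just a search over those tokens. The facts, the tuple layout, and the `query` function are invented for this sketch and are not taken from any particular PSS implementation.

```python
# Hypothetical symbol store: each fact is a (subject, relation, object) tuple of
# human-readable tokens, so the contents can be inspected directly.
facts = {
    ("John", "loves", "Mary"),
    ("Mary", "owns", "dog"),
}

def query(relation, obj):
    """Return every subject that stands in `relation` to `obj`."""
    return sorted(s for (s, r, o) in facts if r == relation and o == obj)

print(query("loves", "Mary"))
```

Anyone reading the store can see what the system "knows"; nothing has to be decoded.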
In semantically transparent AI systems, therefore, symbols closely map to the way humans articulate a concept. This highlights the significance of semantic transparency to researchers concerned with modeling human cognition. Clark (2001) notes, "These kinds of symbols reflect our own ideas about the task domain... they make it immediately obvious why the physical device is able to respect specific semantic regularities." Clark continues that since Classical AI holds that "intelligence resides at, or close to, the level of deliberative thought. This is... the theoretical motivation for the development of semantically transparent systems - ones that directly encode and exploit the kinds of information that a human agent might consciously access when trying to solve a problem."
Not all cognitive models rely on PSS architectures to represent information and carry meaning. In so-called connectionist systems, knowledge is not tokened in re-purposable symbols, as in the Classical AI model. Rather, information exists as a set of connection weights between the system's processing units. These units are connected to each other with various strengths, and each unit's activation (stimulatory or inhibitory) is a nonlinear function of the sum of all influences from the units feeding into it. Concepts are therefore expressed as a pattern of distributed activity; these models are often called parallel distributed processing (PDP) systems, acknowledging both the distributed nature of the activation patterns and their parallel (versus serial) processing methods.
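The phrase "a nonlinear function of the sum of all influences" can be sketched in a few lines. The logistic function used here is one common choice and is an assumption on my part; the sources cited do not commit to a specific nonlinearity, and the input and weight values are made up.

```python
import math

def unit_activation(inputs, weights, bias=0.0):
    """Activation of one unit: squash the weighted sum of its inputs into (0, 1)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))  # logistic (sigmoid) nonlinearity

# Positive weights act as stimulatory connections, negative ones as inhibitory.
excited = unit_activation([1.0, 1.0], [2.0, 1.5])
inhibited = unit_activation([1.0, 1.0], [-2.0, -1.5])
print(excited, inhibited)
```

Note that the unit's output is a graded value rather than an on/off token, which is what lets a pattern of such values carry shades of meaning.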
Such connectionist networks have been found to be successful at both representation and learning. In practice, researchers feed a PDP system data and, depending on the output, tune the connection weights to reduce errors. Once suitably trained, a network can produce accurate output. A simple network was able, for example, to discriminate sonar echoes from undersea mines (vector output <1,0>) and rocks (<0,1>) (see Churchland, 1990). A more complex network was able to convert English text into recognizable speech, "discovering" the 26 English phonemes and the vowel/consonant distinction in the process (see Sejnowski and Rosenberg, 1987).
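The "feed it data, tune the weights to reduce errors" loop looks roughly like this for a single unit. This is a deliberately tiny stand-in, not the multi-layer sonar network Churchland describes: the two-feature "echo" data, the learning rate, and the function names are all invented.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, features):
    """Output of a single logistic unit for one input pattern."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(samples, epochs=2000, lr=0.5):
    """Nudge the weights after each example in the direction that reduces error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Made-up echo features; target 1 = mine, 0 = rock.
samples = [([0.9, 0.2], 1), ([0.8, 0.3], 1), ([0.2, 0.9], 0), ([0.3, 0.7], 0)]
weights, bias = train(samples)
print(predict(weights, bias, [0.9, 0.2]), predict(weights, bias, [0.2, 0.9]))
```

After training, nothing in `weights` is labeled "mine" or "rock"; the knowledge lives in the numbers, which is the point the following paragraphs take up.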
These systems cannot be said, therefore, to be "semantically transparent" like Classical AI systems. One cannot peer into them and find tokens representing natural language inputs or outputs, like "coffee" or "rain." Clark (2001) observes, "whereas basic physical symbol system approaches displayed a kind of semantic transparency such that familiar words and ideas were rendered as simple inner symbols, connectionist approaches introduced a much greater distance between daily talk and the contents manipulated by the computational system."
However, it is not impossible to regain the concept of "coffee" from a connectionist system. Smolensky (1991) notes that connectionism is committed to the notion that "mental representations are vectors partially specifying the state of a dynamical system (the activities of units in a connectionist network), and that mental processes are specified by the differential equations governing the evolution of that dynamical system." Since the system's knowledge inheres in a set of connection weights, it's possible to mathematically transform the output to reveal its constituent parts.
Clark (2001) observes, "The activation of a given unit... thus signals a semantic fact: but it may be a fact that defies easy description using the words and phrases of daily language. The semantic structure represented by a large pattern of unit activity may be very rich and subtle indeed, and minor differences in such patterns may mark equally subtle differences in contextual nuance."
Connectionist systems, then, are not without semantics; they are merely without readily (human-) accessible semantics. And what's more, the ability within the system to represent a range of values, not simply "on" or "off" states for each tokened value, as in Classical systems, means that connectionist systems offer more flexibility in representing ambiguities and, concomitantly, possibility or potential meaning in a way that a simple activated symbol system could never afford. For example, a connectionist system might read a series of sonar echoes and determine that the readings are more "mine-like" than "rock-like," but not necessarily one or the other, with, e.g., a unit activation pattern of < .8, .2 > rather than < 1, 0 >. Such a system, in its capacity to represent these ambiguities, offers a broader expressive range than is possible with simple PSS architectures.
Churchland (1990) notes, "What we are confronting here is a possible conception of 'knowledge' or 'understanding' that owes nothing to the symbolic and sentential categories of current common sense and of traditional approaches in AI... An individual's overall theory-of-the-world, we might venture, is not a large collection or a long list of stored symbolic items. Rather, it is a specific point in that individual's synaptic weight space." In other words, even a system that does not represent knowledge with natural language may, nonetheless, be capable of inhering semantic content - content that is meaningful and possibly even more representative of the continuous or "gray-scale" nature of human knowledge.
Sources and Further Reading
Churchland, Paul M. (1990). Cognitive Activity in Artificial Neural Networks. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 198-216.
Clark, Andy (2001). Mindware. New York: Oxford University Press.
Fodor, Jerry (1975). The Language of Thought: First Approximations. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 51-68.
Fodor, Jerry and Brian P. McLaughlin (1990). Connectionism and the Problem of Systematicity: Why Smolensky's Solution Doesn't Work. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 273-285.
Marr, D. (1982). Vision. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 69-83.
Sejnowski, Terrence J. and Charles R. Rosenberg (1987). Parallel Networks that Learn to Pronounce English Text. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 259-272.
Smolensky, Paul (1991). Connectionism, Constituency, and the Language of Thought. In Robert Cummins and Denise Dellarosa Cummins (eds.), Minds, Brains, and Computers: The Foundations of Cognitive Science. Malden MA: Blackwell Publishers, Inc., 2000; pp. 286-306.
25 February 2006 in Meaning | Permalink | Comments (0) | TrackBack (0)
What is that?
That's the question I've been pondering for over a week, when it was floated as a possible paper topic for my cognitive science class. Well, I almost know how a computer scientist would define it, and a bit of googling reinforces that apprehension. But how would a philosopher define it? How would a neuroscientist?
Classical AI folks think knowledge is symbolic representation, and once you take care of the syntax, semantics will follow. Connectionist (neural network) folks aren't much more enlightened on the subject of syntax begetting semantics, except they believe there are no stored syntactic tokens, per se. I don't think you can ignore semantics for long if you're really interested in modeling the human mind.
But this isn't just semantics, it's semantic transparency. Meaning, I assume, that the meaning itself is carried (how?) between systems (of what kind?) in a lossless way. Seems like a tall order, for any system - maybe especially a biological one.
Anybody care to put in their two bits?
Update: I think I've partly figured it out...
15 February 2006 in Meaning | Permalink | Comments (2) | TrackBack (0)