atomic_fungus (atomic_fungus) wrote,

#2211: How we remember things

I was thinking about robots last night, and trying to figure out ways to approach the problems faced by anyone working with advanced cybernetic systems. One of the problems that seems intractable is that of pattern recognition: machines are not very good at it, because of how they are designed to function.

We humans have no trouble recognizing our friends (or enemies, or relatives, or children) the instant we see them. We can look at a table full of stuff and pick out the one item we want, usually with little or no error. We are able to use a variety of systems for visual communication; and in fact we can embellish these communication symbols to distraction and still recognize them.

A machine cannot. We use machine vision in factories, and there it works well--a machine can pick out bad examples of a product or material and reject them--but this role is really limited.

This is so because of how it's implemented. For products (rather than, say, fruit or vegetables) the machine generates an outline of the object being scanned and checks it against a database of outlines of similar objects; if the parameters fall within a certain range the product is not rejected. Anything outside this range is shunted to one side for human inspection. But the machine can only recognize the product it's programmed to inspect; anything which doesn't look like that product is automatically sent to the reject pile. If the machine is inspecting Pepsi cans and a can of tomatoes somehow comes down the pipe, the tomatoes go into the same chute as the bad Pepsi cans. The machine doesn't report, "Someone put a can of tomatoes in the Pepsi line" or anything. It doesn't know what tomatoes are, nor does it care. All it knows is that it's not a good Pepsi can.
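Sketched in code, the go/no-go logic looks something like this (a toy illustration -- the feature names and tolerances are made up, and a real system works on scanned outlines, not hand-entered measurements):

```python
# Hypothetical go/no-go inspection: the scanned outline is reduced to a
# few measurements and compared against one stored template with
# tolerances. The machine never identifies *what* an object is -- it
# only answers "good Pepsi can" or "not."

PEPSI_TEMPLATE = {"height_mm": (122.0, 1.0), "diameter_mm": (66.0, 0.5)}

def inspect(outline):
    """Return 'good' or 'reject' -- the only two answers it can give."""
    for feature, (nominal, tolerance) in PEPSI_TEMPLATE.items():
        if abs(outline.get(feature, 0.0) - nominal) > tolerance:
            return "reject"
    return "good"

good_can = {"height_mm": 122.3, "diameter_mm": 66.1}
dented_can = {"height_mm": 119.0, "diameter_mm": 66.0}
tomato_can = {"height_mm": 110.0, "diameter_mm": 75.0}

print(inspect(good_can))    # good
print(inspect(dented_can))  # reject
print(inspect(tomato_can))  # reject -- same chute as the dented can
```

Note that the tomatoes and the dented Pepsi can produce the exact same output; the distinction simply doesn't exist inside the machine.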

...trying to get the machine to do much more would only reduce its efficiency and speed. Because the machine can spend only a few milliseconds at best evaluating the can, and because searching the database for a match takes time, its ability to differentiate is limited to "good/not good".

(There is another way: the generated outline is treated as a set of vectors. This eliminates the need for the database; but then the computer must calculate the endpoints of the vectors and determine whether or not they fall into the acceptable error ranges. It still takes time.)

Bar codes are a method for making products almost instantly recognizable to machines: each product has its own code, and the code matches a description and price in a numeric database which is extremely simple to search quickly. But that's also of limited utility.
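The bar-code trick amounts to replacing shape recognition with one exact-match lookup in a keyed table (the codes and prices below are made up, obviously):

```python
# Hypothetical product database keyed by bar code. Recognition becomes
# a single dictionary lookup instead of any kind of shape comparison.
catalog = {
    "012000001291": ("Pepsi 12 oz can", 0.99),
    "041196010124": ("Tomatoes, diced, 14.5 oz", 1.49),
}

def scan(code):
    # Returns None if the code isn't in the database -- the machine
    # still has no idea what the object actually is.
    return catalog.get(code)

print(scan("012000001291"))
```

Fast, but only because someone pre-labeled every object in the world the scanner will ever see -- which is exactly why it's of limited utility.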

If you want to build a domestic robot, none of that is any good. In the former case, the robot will spend more time staring at things and trying to figure out what to do with them than it will in actually doing anything. The number of different objects found in the home is staggering; a robot just sorting the silverware would take all freakin' day because it would have to compare the fork in its hand to all kinds of objects. ("Not Chapstick. Not chart. Not dart. Not....") Even a context-sensitive search would take too much time.
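The serial bottleneck is easy to see in a toy sketch: identifying one fork means checking it against every known template, one comparison at a time (here string equality stands in for what would really be a slow shape comparison):

```python
# Toy illustration of serial template matching. With N known objects,
# the worst case is N comparisons -- and a household N is staggering.
known_objects = ["chapstick", "chart", "dart", "spoon", "knife", "fork"]

def identify(item, templates):
    comparisons = 0
    for candidate in templates:
        comparisons += 1
        if candidate == item:   # stand-in for an expensive shape match
            return candidate, comparisons
    return None, comparisons

print(identify("fork", known_objects))  # ('fork', 6) -- checked them all
```

Six comparisons is nothing; six hundred thousand, at a few milliseconds each, is the robot staring at a fork all freakin' day.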

This was the point I got to: how do you get around this? Our computer technology is serial in nature; the computer's processor reads a byte from memory and does something with it, then repeats the process. Modern processors actually fetch eight bytes at a time, since they're 64-bit processors; and multi-processor computers complicate the issue a bit. But it's still largely a serial process; it is not really parallel, not compared with, say, the human brain.

When you're sorting silverware, you don't need to stop and look at each piece and figure out what it is; the pattern recognition is automatic and near-instantaneous. ("Fork. Fork. Spoon. Salad fork. Knife. Watch--how the hell did my watch end up in there?") The human brain automatically recognizes the familiar, to the point of trying to apply known templates to unfamiliar objects before it acknowledges that it's never seen X or Y before.

Then I decided to reduce the problem to its simplest form; what about words? When you want to think of a specific word, how do you remember it out of all the words in your brain? You pretty rapidly filter out unrelated things and come to the word you want, but how?

Then it hit me.

Quantum computing promises to make solving complex problems very fast. The example I recall best likens it to finding your keys, randomly lost in a 100-room hotel. You don't know what room they're in and you must find them, else you're stuck there.

Present-day computing requires that you enter each room and look for your keys in it. Quantum computing essentially lets you search all the rooms at once by setting up a probability waveform, then allowing it to collapse; the dictates of quantum computing ensure that if the program is correctly written, the collapsed probability waveform will indicate the room with your keys in it.

NO, I have no freakin' clue how they do that. But it works; they've done it.

It's like Schroedinger's cat, which--ironically--was meant to be a refutation of quantum superposition but instead provides a very easily understood example of it. The search program applies superposition, then collapses the probability waveform, and there are your keys, in room 53.
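(The search described here is known as Grover's algorithm. You can fake the amplitude bookkeeping on a classical machine -- this is just a numpy toy that simulates the math, not actual quantum hardware, so it gains you no speed; but it shows the "all rooms at once, then collapse" structure:)

```python
import numpy as np

N = 100       # hotel rooms
target = 53   # the room with the keys (the oracle "knows", we don't)

# Start in an equal superposition over all rooms.
state = np.full(N, 1 / np.sqrt(N))

# Grover's algorithm needs only ~(pi/4)*sqrt(N) iterations, not N.
iterations = int(round(np.pi / 4 * np.sqrt(N)))  # 8 for N = 100
for _ in range(iterations):
    state[target] *= -1               # oracle: flip the marked amplitude
    state = 2 * state.mean() - state  # diffusion: reflect about the mean

# "Collapse": nearly all the probability now sits on the marked room.
probabilities = state ** 2
print(int(np.argmax(probabilities)))  # 53
```

Eight steps instead of a hundred, and the answer falls out of the collapsed waveform.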

And I realized, This has to be how the human brain does it.

It makes sense; there is no reason that biology cannot make use of quantum processes. In fact, not long ago I read something about the two halves of DNA nucleotides being held together with quantum entanglement.

How the brain does it, how brain chemistry is related to this function, how any of the things we know about the brain fit into this theory--I don't know. But I do know that it fits better than any other analogy we've come up with for how the brain sifts so rapidly through massive piles of data.

The brain doesn't just recognize patterns, though; the connection of a real-world object to its corresponding mental model brings with it instant access to everything the brain knows about that object: a fork is an eating utensil, usually metal but sometimes plastic or other materials. It's held so when used. It should be clean before it's used or put away. ("Clean": ....) It goes in this section of the drawer.

Everything known about "fork" is instantly accessible because the brain remembers the entire gamut as a unit--not just its shape and size, but its usages, its care, its storage, composition, weight--everything; and this works so well that a typical human brain can even fit the known properties of the fork against novel situations: "Damn it, this stupid trim fastener won't come loose. What I need is a special tool...hey, I bet a fork will get it out." Presto; and the new use for the fork is stowed along with the other information for future reference. (There is now a fork in my toolbox for exactly this purpose, BTW, so I know of what I speak.)
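One way to sketch "the entire gamut as a unit" is a single record that recognition hands you whole -- all fields at once, not separate lookups (the field names and values here are just illustrative):

```python
# Hypothetical "mental model" record: recognizing the object yields the
# whole bundle of known properties in one shot.
from dataclasses import dataclass, field

@dataclass
class ObjectModel:
    name: str
    category: str
    materials: list
    storage: str
    uses: list = field(default_factory=list)

fork = ObjectModel(
    name="fork",
    category="eating utensil",
    materials=["metal", "plastic"],
    storage="second slot of the silverware drawer",
    uses=["eating"],
)

# A novel use gets stowed with the rest for future reference.
fork.uses.append("prying loose trim fasteners")
print(fork.uses)
```

The point isn't the data structure -- it's that recognition and recall arrive together, as one operation.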

So to go back to our robot: now the robot can sort silverware pretty quickly. It selects a piece, sets up a quantum superposition, collapses it, determines that it's now holding spoon, and consults its internal database on spoon...and places the spoon in its proper place in the silverware drawer. Next up, knife....

Without an entirely new computational paradigm, the robot's memory won't work precisely like a human's; but it doesn't need to, because our processor technology operates far faster than the human brain does. The hard part is pattern recognition; get that out of the way and a multiprocessor or two running at 3 GHz is enough computing power to handle the rest.

* * *

As an aside: one of the holy grails of automation is the self-driving car. You'd pull out onto the highway and tell your car, "You drive for a while," and then take a nap or pull out a book or watch TV or whatever. But it's not easy to accomplish, because there are all kinds of things you can run into, and it's not easy to get a computer to tell the difference between road and not-road.

The technology is much better now than it was 25 years ago: back then a self-driving machine could only move at a few miles an hour, and it required a complex video scanning system. Now they can make it work at highway speed with regular video cameras. But it still relies on edge detection: figuring out where the road is and keeping the car's position within a set tolerance of those edges. And it's still not quite ready for mass production, because it still makes mistakes; and it won't be ready until it can do the right thing every time.
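The "within a set tolerance of those edges" part reduces to something like this (a deliberately crude sketch with made-up numbers -- real lane keeping involves perspective correction, filtering, and much else):

```python
# Toy lane-keeping logic: given detected left/right road edges (meters),
# nudge the car back toward the lane center whenever it drifts outside
# a set tolerance.
def steering_correction(left_edge, right_edge, car_position, tolerance=0.2):
    center = (left_edge + right_edge) / 2
    error = car_position - center
    if abs(error) <= tolerance:
        return 0.0      # within tolerance: hold course
    return -error       # proportional nudge back toward center

print(steering_correction(0.0, 3.6, 2.1))   # drifted right: steer left
print(steering_correction(0.0, 3.6, 1.85))  # close enough: hold course
```

Notice what's missing: there is no case for "cow." Everything that isn't edges-and-center falls outside the program entirely.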

Faster computers have given these systems the ability to do their pattern recognition at a really high speed, but it's not quite good enough; drop it into a novel situation and its only options are to alert the human at the wheel to take over, or do a panic stop, or both. If the "novel situation" is a cow in the roadway, it's not going to do the cow much good. (Or you, or the guy behind you.)

* * *

This is the kind of stuff I think about when I'm trying to get to sleep.
