Imagine this scene: an oak standing in a field of rye, a cloudless blue sky behind it, the sun hanging overhead. Anyone who reads that description will immediately form a picture in their mind. Turning text into a picture has always been a uniquely human ability. But that may soon change.
Hiroharu Kato and Tatsuya Harada of the University of Tokyo are now giving the same capability to computers. Before long, machines will be able to turn text into pictures.
A computer's imagination is, of course, limited: the pictures it produces are very simple, sometimes flawed, occasionally meaningless. But the technology is still significant, because it points to a future in which computers become more intelligent and perhaps even acquire human-like abilities.
For many years, computer scientists have been trying to give computers this kind of ability, hoping to use language to manage images. For example, you can type a word or a list of words into a search engine and find pictures that closely match those keywords.
We have, in fact, already implemented this feature, but the computer does not really understand our language; we have to attach labels to the images ourselves. Image search is useful, but at its core it has almost nothing to do with understanding images.
So a few years ago, computer scientists began to interrogate the image itself. They broke images down into small patches of pixels and used those patches to characterize the image. Different patches may represent different things, such as the edge of a cup, a patch of skin, or a piece of sky.
To a human, these pixel arrangements mean little on their own, but they let a computer recognize images: if a picture contains a large number of patches that represent sky, then the image's subject may well be the sky.
This also means the computer can compare pictures at a glance. When you need certain images, the computer can instantly search a database, comparing pixel arrangements to find the ones you want. Combined with traditional image tags, this makes searches far more accurate: if two images share the same labels and have similar pixel arrangements, the chance that they really are similar is very high. Researchers have in fact already made real breakthroughs in this area.
By analogy with written language, researchers call these pixel arrangements "visual words", and the new analysis method is known as the "bag-of-visual-words" technique: an image is analysed through the statistical distribution of the visual words it contains.
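To make the idea concrete, here is a minimal Python sketch of the bag-of-visual-words technique. The patch size, the codebook size, and the use of k-means clustering and histogram intersection are illustrative assumptions, not details taken from Kato and Harada's work.

```python
# A minimal bag-of-visual-words sketch (patch size, codebook size, and the
# choice of k-means and histogram intersection are assumptions).
import numpy as np
from sklearn.cluster import KMeans

PATCH = 8    # assumed patch size in pixels
STRIDE = 8   # assumed non-overlapping patches
K = 50       # assumed codebook size (number of visual words)

def extract_patches(img):
    """Cut a grayscale image (2-D array) into flattened PATCH x PATCH tiles."""
    h, w = img.shape
    patches = []
    for y in range(0, h - PATCH + 1, STRIDE):
        for x in range(0, w - PATCH + 1, STRIDE):
            patches.append(img[y:y + PATCH, x:x + PATCH].ravel())
    return np.array(patches)

def build_codebook(images):
    """Cluster patches from many images; each cluster centre is one 'visual word'."""
    all_patches = np.vstack([extract_patches(img) for img in images])
    return KMeans(n_clusters=K, n_init=10).fit(all_patches)

def bag_of_words(img, codebook):
    """Histogram of visual-word occurrences: the image's 'word distribution'."""
    words = codebook.predict(extract_patches(img))
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

def similarity(h1, h2):
    """Histogram intersection: higher means the images share more visual words."""
    return np.minimum(h1, h2).sum()

# Usage with random stand-in images (real use would load photographs):
images = [np.random.rand(128, 128) for _ in range(20)]
codebook = build_codebook(images)
h1, h2 = bag_of_words(images[0], codebook), bag_of_words(images[1], codebook)
print("similarity:", similarity(h1, h2))
```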
Kato and Harada want to solve the opposite problem.
Given a set of visual words, the computer must generate a complete picture, and this step is much harder. A visual word describes one part of a picture, but it says nothing about where that part belongs in the image or which other visual words should sit next to it.
The two researchers explain: "This step is a bit like doing a jigsaw puzzle. The visual words are all the puzzle pieces; the hard part is putting each piece in the right position so that together they form a complete picture."
The researchers attacked the problem in two completely different ways. The first is to compare how smoothly visual words transition into one another. For example, a visual word describing glass transitions most smoothly when it is placed next to another visual word that also describes glass.
This comparison is not as easy as it sounds, however: visual words are not puzzle pieces and have no fixed shape. So the researchers analysed and compared large numbers of visual words across two huge image databases in order to learn which transitions are smooth, along the lines of the sketch below.
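The article does not spell out how smoothness is measured, so the following sketch simply counts, across a training set, how often two visual words appear side by side and treats frequent neighbours as "smooth" transitions; the data structures and scoring here are assumptions, not the authors' method.

```python
# Assumed stand-in for the smoothness idea: visual words that often appear
# next to each other in real images are treated as 'smooth' neighbours.
from collections import Counter
import numpy as np

def neighbour_counts(word_grids):
    """word_grids: list of 2-D arrays of visual-word ids (one grid per image)."""
    counts = Counter()
    for grid in word_grids:
        h, w = grid.shape
        for y in range(h):
            for x in range(w):
                if x + 1 < w:
                    counts[(grid[y, x], grid[y, x + 1])] += 1  # horizontal pair
                if y + 1 < h:
                    counts[(grid[y, x], grid[y + 1, x])] += 1  # vertical pair
    return counts

def smoothness(a, b, counts):
    """Higher score = the two visual words have often been seen side by side."""
    return counts[(a, b)] + counts[(b, a)]

# Usage with random grids standing in for images labelled with visual words:
grids = [np.random.randint(0, 50, size=(13, 13)) for _ in range(100)]
counts = neighbour_counts(grids)
print(smoothness(3, 7, counts))
```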
The second approach is to estimate the best position in the picture for each visual word. A visual word describing sky, for example, most likely belongs near the top of the picture.
Because a visual word carries no location information of its own, Kato and Harada again turned to the image database. "Every visual word has its preferred position," they say. By comparing images in the database, they found the best position for each type of visual word.
Of course, the calculation depends on the computer's processing power, the size of the image database, and the number of visual words. In theory, the more data, the more accurate the result.
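One plausible way to estimate such a positional preference is to average, over a labelled image set, the normalised positions at which each visual word occurs. The sketch below is an assumed illustration of that idea, not the authors' actual method.

```python
# Assumed sketch of a positional prior: for every visual word, average the
# normalised (row, column) positions where it appears in a set of images.
import numpy as np
from collections import defaultdict

def position_prior(word_grids):
    """Return {word_id: (mean_row, mean_col)} with coordinates scaled to [0, 1]."""
    sums = defaultdict(lambda: np.zeros(2))
    counts = defaultdict(int)
    for grid in word_grids:
        h, w = grid.shape
        for y in range(h):
            for x in range(w):
                word = grid[y, x]
                sums[word] += (y / (h - 1), x / (w - 1))
                counts[word] += 1
    return {word: tuple(sums[word] / counts[word]) for word in sums}

# Usage: a visual word that mostly appears in the top rows of images
# (like 'sky') ends up with a small mean_row value.
grids = [np.random.randint(0, 50, size=(13, 13)) for _ in range(100)]
prior = position_prior(grids)
print(prior[0])
```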
Despite the difficulties, Kato and Harada made the breakthrough and demonstrated the approach to the world. They built a visual-word database from images of 101 different objects, resizing each image to 128x128 pixels and cutting it into 13x13-pixel visual words, with neighbouring patches overlapping by three-quarters.
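The numbers above pin down the patch layout fairly precisely, so it can be sketched directly; only the stride is an assumption (three-quarter overlap of a 13-pixel patch suggests a step of roughly 3 pixels).

```python
# Sketch of the patch layout described above: 128x128 images cut into
# 13x13 visual words with roughly three-quarter overlap. The 3-pixel stride
# is an assumption, since 13 does not divide evenly by 4.
import numpy as np

PATCH = 13
STRIDE = 3  # ~one quarter of the patch width

def visual_word_patches(img):
    """Return all overlapping 13x13 patches of a 128x128 grayscale image."""
    assert img.shape == (128, 128)
    patches = []
    for y in range(0, 128 - PATCH + 1, STRIDE):
        for x in range(0, 128 - PATCH + 1, STRIDE):
            patches.append(img[y:y + PATCH, x:x + PATCH])
    return np.array(patches)

img = np.random.rand(128, 128)          # stand-in for a resized photo
print(visual_word_patches(img).shape)   # (39 * 39, 13, 13) patches per image
```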
With the visual-word database in place, they could move on to the next step: letting the computer turn a description into an image.
The results are surprisingly good. Although some of the generated images are inaccurate or even impossible to make sense of, the system also produces many accurate ones, including umbrellas, wrenches, buckets, fish, and human faces.
The technology will probably lead to many interesting applications in the future.
One such use is in computer vision. Over the years, computer scientists have developed a number of automatic object-recognition algorithms that can identify all kinds of objects.
These algorithms are called "individual classifiers". They can identify objects with high accuracy, but errors still occur: objects the human eye recognizes effortlessly can stump a computer. In other words, today's computer vision is still not entirely reliable.
That situation may well be changed by Kato and Harada's research. Their visual words could make individual classifiers more accurate, bringing computer vision closer and closer to human vision.
Finally, Kato and Harada can use the technology to let a computer generate images from human language: text written by a person is translated into visual words, which the computer then turns into an image.
Turning human words into visual words is no easy task. The two researchers searched the image dataset for all images that came with text descriptions and then annotated those images with visual words. Once this tedious work was complete, the computer could convert human text into visual words and then into an image, along the lines of the sketch below.
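The article does not describe the text-to-visual-word mapping in detail, so the sketch below assumes a very simple version: each word in a caption is associated with the visual-word histograms of the images it describes, producing a rough lookup table from text words to visual words.

```python
# Assumed sketch of a text-to-visual-word mapping: each caption word is
# associated with the visual-word histograms of the images it describes.
import numpy as np
from collections import defaultdict

K = 50  # assumed codebook size

def word_to_visual_words(captioned_images):
    """captioned_images: list of (caption_string, visual_word_histogram) pairs."""
    totals = defaultdict(lambda: np.zeros(K))
    for caption, hist in captioned_images:
        for word in caption.lower().split():
            totals[word] += hist
    # Normalise so each text word maps to a distribution over visual words.
    return {w: h / h.sum() for w, h in totals.items() if h.sum() > 0}

# Usage with toy data: 'sky' ends up associated with whichever visual words
# dominate the sky images in the training set.
data = [("blue sky over a field", np.random.rand(K)),
        ("a fish in a bucket", np.random.rand(K))]
mapping = word_to_visual_words(data)
print(mapping["sky"][:5])
```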
"In our tests, some sentences produced completely meaningless images," Kato and Harada admit. The likely cause is that the current set of visual words is not rich enough and is too simple. But as the research develops, the computer-generated images should become more and more accurate.
For computer vision, the two researchers' invention is of great significance. Look up a definition of "imagination" and you will find something like this: "imagination is the ability to form, change, and create new ideas and images in the mind on the basis of existing images." In other words, Kato and Harada have created the world's first computer with imagination.