Identifying Auxiliary Web Images Using Combination of Analyses
Websites contain videos, images, and graphics that are not always relevant to the user. Because they cannot be ignored during printing, paper is wasted. This algorithm can swiftly identify and extract the informative content from any web page.
Tewson Seeoun, Choochart Haruechaiyasek, Toshiaki Kondo, Thailand
Hewlett Packard wants customers to enjoy a better and greener experience when they print web pages, so it sought a solution through an open innovation challenge for multimedia researchers across the world.
Today’s web pages are dripping with multimedia content: videos, images, graphics, and a plethora of hyperlinks that are often only peripherally related to the main content. Most users visit a page solely for its information, but when it comes to printing these pages, all the extra material comes along. Although data-mining applications are generally good at finding the most relevant text, multimedia content has proved a much taller order to sort out.
To eliminate this paper mountain, HP turned to an open innovation contest because of the expertise it would attract.
“Having a competition like this,” said Dr. Qian Lin, director of HP’s Multimedia Interaction and Understanding Lab, “is a great way to generate new ideas and find new approaches to resolving the most important questions that the industry faces over the next 2-5 years.”
The company is no stranger to open innovation; it readily opens itself up to contributions from customers, inventors, researchers, and partners to help it create products faster and with fewer wrong turns, and at a lower cost with far less risk. “The smart people are not all in the United States. We are tapping the best minds of people all over the world,” said HP Labs Director Prith Banerjee in 'Entrepreneur Journeys' by Forbes columnist Sramana Mitra.
To help ensure solutions would be tailored to its needs, HP clearly defined the sort of application it was after. Algorithms had to be 99% accurate on any web page in any language, printing only what people are interested in seeing on paper, for example news articles or maps. The computer models should be able to retrieve or label all of the informative content.
The challenge formed part of the Multimedia Grand Challenge 2009 organized by the Association for Computing Machinery, and the entrants with the best ideas were invited to demo their work in front of HP at the multimedia conference.
Submitted algorithms were tested by comparing them with manually labeled results from a group of preselected web pages.
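The scoring described above amounts to comparing a classifier's auxiliary/informative labels against a human-labeled reference set. The sketch below illustrates the idea; the `accuracy` function and the True/False labeling convention are assumptions for illustration, not HP's actual test harness.

```python
# Hypothetical scoring sketch: compare an algorithm's labels for each image
# on a web page against manually assigned ground-truth labels.

def accuracy(predictions, ground_truth):
    """Fraction of images whose auxiliary/informative label matches."""
    assert len(predictions) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Manually labeled set: True = auxiliary (skip when printing),
# False = informative (keep when printing).
ground_truth = [True, False, False, True, False]
predictions  = [True, False, True,  True, False]

print(f"Accuracy: {accuracy(predictions, ground_truth):.2%}")  # Accuracy: 80.00%
```

Averaging this score over a group of preselected pages gives a single figure that can be measured against the 99 percent target.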
The winners were Tewson Seeoun, Choochart Haruechaiyasek, and Toshiaki Kondo from Thailand. Their response to the challenge was a paper based on their research, “Identifying Auxiliary Web Images Using Combination of Analyses,” which describes an algorithm able to quickly recognize auxiliary content with an overall average accuracy of 94.08 percent. This earned them an honorable mention from HP. It did not reach the 99 percent target, but it came close, and the researchers will continue to refine their model, ironing out bottlenecks and improving some of the features.