Invited Speakers
IAPR/ICDAR Outstanding Achievements Award
A View on the Past and Future of Character and Document Recognition
Hiromichi Fujisawa
Hitachi R&D
Japan
Abstract:
The talk presents an industrial view of character and document recognition technology, looking from the past to the future. Since the birth of commercial Optical Character Readers (OCRs) in the 1950s in the US, character and document recognition technology has advanced tremendously, always supporting industrial and commercial applications. At the same time, these business applications have always supported investment in new technology development. We can see a virtuous cycle here. It seems, however, that the wave of information technology is surging over this area, which has been cultivated for more than fifty years. Namely, most information now seems to be born digital, possibly diminishing the demand for this technology. Other views are possible as well. For instance, search technologies, which have established their ubiquity, are expanding their territory to image documents. This talk attempts to discuss these opposing possibilities by reviewing some of the technical advances and future trends.
Biography:
Dr. Hiromichi Fujisawa joined the Central Research Laboratory, Hitachi, Ltd. in 1974. Since then, he has engaged in research and development on handwritten character recognition, document understanding including postal address recognition and forms processing, and document retrieval. He led research teams developing business OCR systems, postal address recognition engines, document management systems, full-text search machines, etc. He has been involved in such international activities as academic conference organization, technical journal editing and international standardization. He was a visiting scientist at Carnegie Mellon University in 1981 and at Stanford University in 2006. He is now a Corporate Chief Scientist at the Research & Development Group of Hitachi, Ltd., and head of the Global Standardization Office of Hitachi. Dr. Fujisawa is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the International Association for Pattern Recognition (IAPR).
Energy-Based Learning in Document Recognition and Computer Vision
Yann LeCun
Courant Institute of Mathematical Sciences, New York University
USA
Abstract:
Over the last few years, the Machine Learning and Natural Language Processing communities have devoted a considerable amount of work to learning models whose outputs are "structured", such as sequences of characters and words in a human language. The methods of choice include Conditional Random Fields, Hidden Markov SVMs, and Maximum Margin Markov Networks. These models can be seen as un-normalized versions of discriminative Hidden Markov Models. It may come as a surprise to the ICDAR community that this class of models was originally developed in the handwriting recognition community in the mid-1990s to train handwriting recognition systems discriminatively at the word level. The various approaches can be described in a unified manner through the concept of an "Energy-Based Model" (EBM). EBMs capture dependencies between variables by associating a scalar energy with each configuration of the variables. Given a set of observed variables (e.g. an image), inference in an EBM consists of finding configurations of unobserved variables (e.g. a recognized word or sentence) that minimize the energy. Training an EBM consists of designing a loss function whose minimization will shape the energy surface so that correct variable configurations have lower energies than incorrect configurations. The main advantage of the EBM approach is that it circumvents one of the main difficulties associated with training probabilistic models: keeping them properly normalized, a potentially intractable problem with complex models. Energy-Based learning has been applied with considerable success to such problems as handwriting recognition, natural language processing, biological sequence analysis, computer vision (object detection and recognition), image segmentation, image restoration, unsupervised feature learning, and dimensionality reduction.
Several specific applications will be described (and, for some, illustrated with real-time demonstrations) including: a check reading system; a real-time system for simultaneously detecting human faces in images and estimating their pose; an unsupervised method for learning invariant feature hierarchies; and a real-time system for detecting and recognizing generic object categories in images, such as airplanes, cars, animals, and people.
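The inference and training steps described in this abstract can be illustrated with a toy example. The sketch below is not taken from the talk: it assumes a linear energy E(x, y) = -W[y]·x over a small discrete label set and uses the perceptron loss E(x, y_true) - min_y E(x, y) as one simple choice of loss function. The function names and the toy data are hypothetical.

```python
# Toy energy-based model (EBM): an illustrative sketch, not the speaker's implementation.
# Assumption: linear energy E(x, y) = -dot(W[y], x) over a small discrete label set.

def energy(W, x, y):
    """Scalar energy of the configuration (x, y); lower means more compatible."""
    return -sum(w * xi for w, xi in zip(W[y], x))

def infer(W, x):
    """Inference: return the label y that minimizes the energy for input x."""
    return min(range(len(W)), key=lambda y: energy(W, x, y))

def train(data, num_labels, dim, lr=0.1, epochs=50):
    """Gradient steps on the perceptron loss E(x, y_true) - min_y E(x, y)."""
    W = [[0.0] * dim for _ in range(num_labels)]
    for _ in range(epochs):
        for x, y_true in data:
            y_hat = infer(W, x)
            if y_hat != y_true:
                # Lower the energy of the correct label, raise that of the wrong answer.
                for i, xi in enumerate(x):
                    W[y_true][i] += lr * xi
                    W[y_hat][i] -= lr * xi
    return W

# Hypothetical toy data: 2-D inputs with two labels.
DATA = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.0, 1.0], 1), ([0.2, 0.8], 1)]
W = train(DATA, num_labels=2, dim=2)
predictions = [infer(W, x) for x, _ in DATA]
```

No normalization constant is computed anywhere in this sketch; only the relative ordering of energies matters, which is the property the abstract highlights.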
Biography:
Yann LeCun was born near Paris in 1960. He received a Diplôme d'Ingénieur from the Ecole Superieure d'Ingénieur en Electrotechnique et Electronique (ESIEE), Paris in 1983, a Diplôme d'Etudes Approfondies (DEA) from Université Pierre et Marie Curie, Paris in 1984, and a PhD in Computer Science from the same university in 1987. His PhD thesis was entitled "Modèles connexionnistes de l'apprentissage" (connectionist learning models) and introduced an early version of the back-propagation algorithm for gradient-based machine learning. In 1987, he joined Geoff Hinton's group at the University of Toronto as a research associate. He then joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, NJ in 1988. In 1991, he spent six months with the Laboratoire Central de Recherche of Thomson-CSF in Orsay, France, after which he returned to Bell Labs. Shortly after AT&T's second breakup in 1996, he became head of the Image Processing Research Department, part of Larry Rabiner's Speech and Image Processing Research Lab at AT&T Labs-Research in Red Bank, NJ. In 2002, he became a Fellow of the NEC Research Institute (now NEC Labs America) in Princeton, NJ. He joined the Courant Institute of Mathematical Sciences at New York University as a Professor of Computer Science in 2003. Yann LeCun is associate editor of Pattern Recognition and Applications. He was associate editor of the Machine Learning Journal and of the IEEE Transactions on Neural Networks. Since 1997, he has served as general chair and organizer of the "Learning Workshop" held every year since 1986 in Snowbird, Utah. He has served as program co-chair for CVPR 2000, CIFED 98, NIPS 95, 94, 90, INNC 90, and IJCNN 89. He has served on the program committee of numerous conferences and workshops.
He was a plenary keynote speaker at CVPR 2000 and gave tutorial talks at the Newton Institute, Cambridge, 1997, ICPR 1994, the INRIA/CEA/EDF summer school 1994, NIPS 1993, AAAI 1990, the Cold Spring Harbor Neuroscience summer school 1990, Connectionism in Perspective, Zurich 1988, and the CMU Connectionist summer schools in 1988 and 1986. He has given over 20 invited talks at various international conferences and workshops. Yann LeCun has published over 90 technical papers and book chapters on neural networks, machine learning, computer vision, pattern recognition, handwriting recognition, image compression, document understanding, image processing, VLSI design, and information theory.
Google Book Search: Document Understanding on a Massive Scale
Luc Vincent
Google Inc.
USA
Abstract:
Unveiled in late 2004, Google Book Search is an ambitious program to make all the world's books discoverable online. The sheer scale of the problem brings a number of unique document analysis and understanding challenges that are outlined in this paper. We also go over some of the ways that Google is working with the document analysis research community to help push the state of the art.
Biography:
Luc Vincent is a recognized image analysis and computer vision expert, with 20 years of experience and over 60 publications. He joined Google in 2004 and is presently leading several engineering projects focusing on vision, image processing, document understanding and Optical Character Recognition. Before joining Google, Dr. Vincent was Chief Scientist, then Vice President of Document Imaging at LizardTech, a developer of advanced image compression software. Prior to this, he led a large research and development team at the Xerox Palo Alto Research Center. He was also Director of Software Development at ScanSoft (now Nuance) and held various other management and individual contributor positions at Xerox Corporation. In addition, he has been a consultant to a number of large organizations, including ChevronTexaco, Agilent, Alza, Mitsubishi and Carl Zeiss. Dr. Vincent served as an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and for the Journal of Electronic Imaging. He was chairman of SPIE's conference on Document Recognition from 1994 to 1997, general chair and organizer of the International Symposium on Mathematical Morphology (ISMM) in 2000, and has served on the program committee of numerous conferences and workshops. Dr. Vincent earned a B.S. from Ecole Polytechnique, an M.S. in Computer Science from University of Paris XI, and a PhD in Mathematical Morphology from Ecole des Mines, all in Paris, France.
Advances in Writer Identification and Verification
Lambert Schomaker
Rijksuniversiteit Groningen
The Netherlands
Abstract:
The behavioral-biometrics methods of writer identification and verification are currently enjoying renewed interest, with very promising results. This paper presents a general background and basis for handwriting biometrics. A range of current methods and applications is given. Results on a number of methods are summarized and a more in-depth example of two combined approaches is presented. By combining textural, allographic and placement features, modern systems are starting to display useful performance levels. However, user acceptance will be largely determined by explainability of system results and the integration of system decisions within a Bayesian framework of reasoning that is currently becoming forensic practice.
Biography:
Lambert Schomaker (born 19 February 1957) is professor of Artificial Intelligence and director of the AI research institute "ALICE" of Groningen University. His research concerns pattern-recognition problems in handwriting recognition, writer identification, handwritten-manuscript retrieval and related topics. Recent work involves large-scale historical handwriting retrieval on high-performance computers. He was the project coordinator of a large European project on multimodality in multimedia interfaces, and has collaborated on research projects with several industrial companies. Apart from research, his duties involve teaching courses in artificial intelligence and pattern classification. Prof. Schomaker has been involved in the organization of several conferences on handwriting recognition and modeling. He is a member of the IEEE Computer Society, the IAPR and the Belgian/Dutch AI Society BNVKI. He has been the chairman of IAPR/TC-11 "Reading Systems" and is currently chairman of the IAPR task force on quality control. He is chairman of the International Unipen Foundation for benchmarking of handwriting recognition systems. Within the Netherlands he has been a member of the Advisory Board of several research institutes. He has contributed to over 40 ISI and IEEE publications yielding 250 Thomson/ISI citations and 246 NEC/Citeseer citations and has given five invited lectures at international conferences.
Last Updated 10.07.2007