
June 04, 2007

Annotating Multimedia for Community Folksonomy and Ontology Building

 

Aim

To make multimedia resources easier to access, search, manage, and exploit for all students and teachers by supporting the creation of annotations, metadata and schema (ontologies) based on social interactions and freeform tagging.

Problem Area

Many exciting opportunities for learning will arise in ‘Web 2.0’ and ‘social software’ environments when information is communicated through speech and video rather than just text. While multimedia has become technically easier to create and offers many potential benefits for learning and teaching, it can also be difficult to access, search, manage, and exploit. The growing amount of informal knowledge available in speech and audio-visual form rather than text has therefore yet to achieve the level of interconnection and manipulation achieved for textual documents via the World Wide Web, which makes it particularly difficult for systems to reason about multimedia content. Deaf students can also be greatly disadvantaged by the lack of captioning (subtitling) of multimedia; the very high cost of manual captioning is often cited as the reason for non-compliance with web accessibility guidelines and national legislation[1].

Innovative Application

The provision of multimedia consisting of text captions synchronised with recorded speech and images/video enables all their communication qualities and strengths to be available as appropriate for different contexts, content, tasks, learning styles and preferences. Text can reduce the memory demands of spoken language; speech can better express subtle emotions; while images can communicate moods, relationships and complex information holistically. Since there is little evidence that students’ preferred learning media can be predicted reliably through learning style instruments[2], the availability of text captions and spoken output of text would enhance students’ choice of media.

 

Speech Recognition (SR) can provide a cost-effective way of automatically creating text captions synchronised with audio and video, allowing audio-visual material to be manipulated through searching, tagging, annotating, bookmarking and hyperlinking. Students and teachers can create tags, folksonomies, metadata, taxonomies and ontologies to support the structuring of elements of the multimedia. Links can be made to sections of the original multimedia, instead of creating copies, for reuse as learning objects or, as appropriate, to provide evidence for assessment and e-portfolios.
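As a rough illustration of how synchronised captions make a recording searchable and linkable, the sketch below stores each caption with its start and end time, searches the caption text, and builds a link to the matching section of the original recording rather than copying it. The `Caption` class, the function names, the example URL and the `#t=start,end` fragment (borrowed from the W3C Media Fragments URI syntax) are all assumptions made for illustration; they do not describe the prototype itself.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str     # caption text, e.g. as produced by speech recognition


def search_captions(captions, query):
    """Return the captions whose text contains the query (case-insensitive)."""
    needle = query.lower()
    return [c for c in captions if needle in c.text.lower()]


def section_link(media_url, caption):
    """Build a link to one section of the original recording (no copy is made)."""
    return f"{media_url}#t={caption.start:g},{caption.end:g}"


# Hypothetical captions for a short recording.
captions = [
    Caption(0.0, 4.5, "Welcome to the lecture on folksonomies"),
    Caption(4.5, 9.0, "Tags are created freely by students and teachers"),
    Caption(9.0, 14.2, "An ontology adds structure to those tags"),
]

hits = search_captions(captions, "tags")
links = [section_link("http://example.org/lecture.wav", c) for c in hits]
for link in links:
    print(link)
```

Because each link only names a time range in the original recording, the same section can be bookmarked, tagged or cited as evidence without duplicating the media file.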

The next generation of Web applications emphasises social interaction and user participation. This can take the form of social networks, collaborative content creation, and freeform metadata creation in the form of tagging[3]. Recently we have also seen the emergence of Semantic Wikis, which allow Wiki pages and links to be typed[4]. Such typing and tagging is currently used to aid search and to support recommendations, but it is necessarily unconstrained. This makes it a lightweight activity, but limits the usefulness of the tags because the relationships between tags are not understood (for example, synonyms are not modelled). Schemas or ontologies have a relatively high cost but allow for more powerful manipulation of resources than those tagged in a freeform way[5]. It is possible to examine the use of free tags and typing and extrapolate such a schema, called a Folksonomy, which reflects the evolving view of the community rather than the perspective of a particular design team[6]. This has the advantage that it is still lightweight, but also affords the kind of relationships necessary for advanced search and personalisation. The Folksonomy approach allows a community to reflect on their activities and develop tag richness, increasing the utility and reusability of resources. Folksonomy construction joins the advantage of the Web 2.0 model with the utility of more traditional approaches. It is an important consideration for the e-Framework, which must strike a balance between freeform tagging and structured annotation, and it supports the agile and evolutionary development of information models. An overview of the system is shown in Figure 1.
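To make the idea of extrapolating structure from free tags a little more concrete, here is a minimal sketch of one simple heuristic: count how often tags are applied to the same resource, and guess that tag A is broader than tag B when nearly every resource tagged B is also tagged A and A is used more widely. This is a generic co-occurrence and subsumption heuristic chosen for illustration, not the method of the work cited above; the function names, the 0.8 threshold and the sample data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(tagged):
    """Count how often each pair of tags is applied to the same resource."""
    counts = defaultdict(int)
    for tags in tagged.values():
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def broader_than(tagged, threshold=0.8):
    """Guess (broader, narrower) tag pairs from overlapping usage.

    Tag a is treated as broader than tag b when at least `threshold`
    of the resources tagged b are also tagged a, and a is used on
    more resources than b. The threshold is an arbitrary assumption.
    """
    resources = defaultdict(set)
    for resource, tags in tagged.items():
        for tag in tags:
            resources[tag].add(resource)

    pairs = []
    for a in resources:
        for b in resources:
            if a == b:
                continue
            overlap = len(resources[a] & resources[b]) / len(resources[b])
            if overlap >= threshold and len(resources[a]) > len(resources[b]):
                pairs.append((a, b))
    return sorted(pairs)


# Hypothetical free tags applied by a community to four recordings.
tagged = {
    "clip1": ["lecture", "speech-recognition"],
    "clip2": ["lecture", "captioning"],
    "clip3": ["lecture", "captioning", "accessibility"],
    "clip4": ["seminar"],
}

pair_counts = cooccurrence(tagged)
hierarchy = broader_than(tagged)
for broad, narrow in hierarchy:
    print(f"{broad} > {narrow}")
```

With so little data the guessed relations are fragile; the point is only that relationships between tags can be derived from how the community actually uses them, and then reviewed by that community.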

Expertise

The Learning Societies Lab research group in Southampton has extensive experience in information modelling, the social web, and mobile and ubiquitous computing. The members of the group have been involved in Hypertext, Web and Knowledge research for over fifteen years, and are internationally recognised for their application of these technologies to the domain of e-learning. In addition, the group has advised HEFCE, BECTA, and JISC on accessibility, disability and technology issues, and has worked with IBM and the International Liberated Learning (LL) consortium on researching and implementing the use of Speech Recognition (SR) engines to automatically create synchronised captions from live or recorded audio and video. A prototype application is currently under development to enable text captions synchronised with audio and video to be annotated by students and teachers.

Multimedia Presentation

A multimedia presentation (audio playback requires Internet Explorer) that uses text captions synchronised with audio and PowerPoint slides to demonstrate some of these ideas can be found at:

 

http://www.soton.ac.uk/~mw3/_uk__2007_05_30_0.WAV_frames.html

· The text is highlighted automatically in time with the audio, and selecting ‘Where am I?’ will ensure the text also scrolls automatically with the audio

· You can use the audio controls to move backwards or forwards through the presentation or to pause it

· You can use the ‘find’ facility in your browser to search for text and play from that position

· Clicking on a PowerPoint slide image inline with the text will open it full size in a separate window

· Clicking on a PowerPoint slide thumbnail image will move backwards or forwards through the presentation to that position

· You can resize the frames (and so enlarge the thumbnails)
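For readers curious how text can be highlighted in time with audio, one common approach is to keep the caption start times sorted and look up the current playback position with a binary search. The sketch below shows that lookup only; it is a guess at the general technique, not a description of how the presentation above is implemented, and the function name and sample timings are hypothetical.

```python
from bisect import bisect_right


def active_caption_index(starts, position):
    """Return the index of the caption to highlight at `position` seconds.

    `starts` must be sorted caption start times. Returns None if playback
    has not yet reached the first caption.
    """
    i = bisect_right(starts, position) - 1
    return i if i >= 0 else None


# Hypothetical start times (in seconds) for three captions.
starts = [0.0, 4.5, 9.0]

print(active_caption_index(starts, 5.0))
```

The same lookup works after the listener moves backwards or forwards with the audio controls, since it depends only on the current position.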

 



[1] Wald, M. (2006) Creating Accessible Educational Multimedia through Editing Automatic Speech Recognition Captioning in Real Time. International Journal of Interactive Technology and Smart Education: Smarter Use of Technology in Education 3(2) pp. 131-142.
[2] Coffield, F., Moseley, D., Hall, E. and Ecclestone, K. (2004) Learning styles and pedagogy in post-16 learning: A systematic and critical review. Learning and Skills Research Centre.
[3] Millard, D. and Ross, M. (2006) Web 2.0: Hypertext by Any Other Name? In Proceedings of ACM Conference on Hypertext and Hypermedia 2006, Odense, Denmark.
[4] Boulain, P. R., Parker, M. B., Millard, D. E. and Wills, G. B. (2006) Weerkat: An Extensible Semantic Wiki. In Proceedings of 8th Annual Conference on WWW Applications, Bloemfontein, Free State Province, South Africa.
[5] Millard, D., Tao, F., Doody, K., Woukeu, A. and Davis, H. (2006) The Knowledge Life Cycle for e-learning. International Journal of Continuing Engineering Education and Lifelong Learning: Special Issue on Application of Semantic Web Technologies in E-learning 16(1/2) pp. 110-121.
[6] Al-Khalifa, H. S. and Davis, H. C. (2006) Harnessing the Wisdom of Crowds: How to Semantically Annotate Web Resources using Folksonomies. In Proceedings of IADIS Web Applications and Research 2006 (WAR2006).

   

Posted by Mike Wald
