Colin's DITA blog: Coursework post

Evaluating and employing appropriate technologies for the digital representation of information.

There are many ways of digitally representing information, ranging from a simple webpage through to a complex database containing many tables of information. The method you use depends on the purpose of the information and the information need of the user.

From the sessions on Web 1.0 I feel that each of the methods, from HTML web pages to complex databases queried with SQL and the process of “Information Retrieval” (IR), is equally useful, but only when applied appropriately.

An HTML web page is a very useful way of displaying information that does not change a great deal, for example a simple list of contact numbers or addresses. HTML is a useful tool because it is accessible to anyone with an Internet connection, is relatively simple to write, and is fairly easy to maintain and update. Of course, if there is a large amount of data, for example lists of employees, then a form of IR or a database is more suitable. However, for a small business with fewer than, say, 10 employees, HTML would be a much simpler way of storing basic contact details. Another use of an HTML web page is as a public-facing data sheet. For example, it could hold all the contact details, company information and any other text, and with the use of CSS style sheets it can be made to look very interesting.

When you have a large amount of information that you own, and can therefore manipulate, a database is by far the best solution. It allows users to access information centrally, possibly in many formats, and makes it easy to develop queries that pull specific information from the database. In 1993 the UK Government published the following definition:

“SQL is the industry accepted Interface between applications and relational databases and is increasingly used to access non-relational data. It is therefore an important tool in achieving data integration across different databases.” CCTA 1993 (pg5)

The problems with this method are that SQL is a very difficult way of creating queries: it depends on pinpoint accuracy from the user, who must write the query using the correct terminology and spelling, and it relies heavily on a controlled vocabulary, all of which can lead to problems. The plus points are that it allows multi-user interaction and can be tailored to suit individuals’ needs. The Government Centre for Information Systems says of SQL implementations:

“SQL is a suitable language for applications requiring to store and manipulate data that can be represented as tables. Generally, implementations of SQL are targeted towards supporting larger, multi-user applications based on mainframes, mini-computers or large workstations.” CCTA 1993 (pg 15)
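
As a rough sketch of both points, the example below uses Python's built-in sqlite3 module with a hypothetical contacts table (the table name, columns and entries are all invented for illustration). It shows a query giving an exact answer when the term is spelt correctly, and the same query returning nothing when the term is misspelt.

```python
import sqlite3

def build_contacts_db():
    """Create an in-memory database with a small, invented contacts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, department TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [
            ("Alice Smith", "Sales", "020 7946 0001"),
            ("Bob Jones", "Sales", "020 7946 0002"),
            ("Carol White", "Accounts", "020 7946 0003"),
        ],
    )
    return conn

def names_in_department(conn, department):
    """Return the names of everyone whose department matches exactly."""
    rows = conn.execute(
        "SELECT name FROM contacts WHERE department = ? ORDER BY name",
        (department,),
    )
    return [row[0] for row in rows]

conn = build_contacts_db()
print(names_in_department(conn, "Sales"))  # correctly spelt term
print(names_in_department(conn, "Sails"))  # misspelt term finds nothing
```

The query is precise, but it only works if the user already knows the exact vocabulary stored in the table.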

The main difference between SQL databases and Information Retrieval (IR) is that databases hold a large amount of data stored in many tables and can give accurate answers to specific questions, whereas an IR system can hold as much information, organised into surrogates stored in tables, which can then be searched using non-specific keywords and natural language. Rosenfeld and Morville say that:

“The database model is best applied to subsets or collections of structured, homogenous information within a broader website.” Rosenfeld and Morville 1998 (pg41)

Therefore, when the information you have is not your own or is fairly unstructured, the best method of retrieving it is IR. This is where the user enters search terms into some kind of search engine, either as a natural language query or using a Boolean system, to search the system’s data for what they are looking for.
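
A minimal sketch of a natural language search might look like the following. It is only an illustration under my own assumptions: the surrogates, the stop word list and the scoring rule (count how many query terms each record matches) are invented here, and real IR systems use far more sophisticated ranking.

```python
from collections import defaultdict

# A small, assumed list of words to ignore in both records and queries.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "about", "by", "and", "to"}

def tokenize(text):
    """Lower-case the text and keep the words that are not stop words."""
    words = (w.strip(".,;:!?'\"()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def build_index(surrogates):
    """Map each term to the set of record ids whose surrogate contains it."""
    index = defaultdict(set)
    for doc_id, text in surrogates.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank record ids by how many distinct query terms each one matches."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Best score first; ties are broken by the lower record id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Invented surrogates standing in for catalogue records.
surrogates = {
    1: "A history of British hip-hop music",
    2: "Folk song collecting in England",
    3: "The history of folk music in America",
}
index = build_index(surrogates)
print(search(index, "books about the history of folk music"))
```

Unlike the SQL case, a loosely worded query still returns something useful, with the closest records ranked first.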

For example, a library catalogue would not use an SQL database, as the information searched for would have to be specific. It would use a form of IR, where users can search according to their knowledge base, whether for an exact item, a vague subject or anything in between, and still retrieve adequate and accurate information much faster than by using a database.

“Many studies indicated that users of information systems aren’t members of a single minded monolithic audience who want the same kinds of information delivered in the same ways. Some want just a little information, while others want detailed assessments of everything there is to know on a topic” Rosenfeld and Morville 1998 (pg102)

Managing data with appropriate information technologies.

For this section I am going to look at search methods in music information retrieval (MIR): how the different kinds of search can be used in a web search for a particular piece of information (song title, artist or composer), and how similar searches could be implemented on more specific music information resources such as a music organisation’s archive.

From what I have learnt about textual search methods, I feel that natural language queries (NLQ) would be the most common search method in MIR for the average (non-scholarly) music enthusiast. If the first NLQ were unsuccessful, they would follow it with Boolean operators to modify the search, as this returns more specific results in a type of search where many responses could well come up. An important point is that users have varying levels of knowledge of the subject matter they are searching for, and will therefore use a known item search (KIS), a fact search or a subject search to find the information they want, and this will affect their search terms.
“People often first encounter new music from various media (e.g., a movie, TV commercial, radio) without ever knowing the artist name and song title, or they often forget the information over the course of time. This brings up challenges to known-item-searches for music as those users must attempt to describe the sought music other than using the bibliographic information.” Jin Ha Lee 2010 (pg1025)
An example of this is a web search for the lyric ‘dress this city in flames’: Google brings up the correct song, but its main result is a different site. To get a more accurate result, Boolean operators could be used. Searching for ‘dress this city in flames AND lyrics’ means that every result page must also contain the word ‘lyrics’, so all the results should be song lyrics, and from there the user can find the song they are looking for. This would be a fact search, as the user knows a fact, and therefore has some knowledge of the subject, and wants to find more information.
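
The effect of adding ‘AND lyrics’ can be sketched with a very simple filter. This is not how Google works; it is a hypothetical illustration in which the page titles and text are invented and every phrase joined by AND must appear somewhere in the page.

```python
def matches(page_text, query):
    """True if the page contains every phrase joined by AND in the query."""
    text = page_text.lower()
    phrases = [p.strip().lower() for p in query.split(" AND ")]
    return all(p in text for p in phrases if p)

def boolean_search(pages, query):
    """Return the titles of the pages that satisfy the query, in order."""
    return [title for title, text in pages.items() if matches(text, query)]

# Invented pages, used only to show how the extra term narrows the results.
pages = {
    "News story": "Protesters dress this city in flames, say reports",
    "Song lyrics page": "Lyrics: ... dress this city in flames ...",
    "Band biography": "Biography of a band, with no lyrics quoted",
}

print(boolean_search(pages, "dress this city in flames"))
print(boolean_search(pages, "dress this city in flames AND lyrics"))
```

The first search returns both the news story and the lyrics page, while the second keeps only the page that also contains the word ‘lyrics’.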

NLQ and Boolean operators are, I feel, not only very good methods of fact searching but could also be used very successfully for a subject search. For example, a search for UK Hip-Hop could be refined using UK AND Hip-Hop. A KIS will more often than not be an NLQ, as the user knows what they are looking for and will search for it directly; for example, searching for “Paris in Flames by Thursday” will bring up the exact song. This method of KIS would be used on a site like iTunes, whose search also suggests the most likely answer as you type, speeding up the search, and speed is, along with accuracy of results, among the most important aspects of IR.
However, problems can occur when the user assumes they know something. In systems like iTunes the suggestion of titles can be extremely helpful, but if the user thought that ‘Paris in Flames’ by Thursday was by Weezer and called ‘City in flames’, the results returned would be inaccurate, and the user would have to fall back on a fact search or subject search, which are not supported on a system like iTunes.
“If someone were looking for music they previously heard, but all of the information they think is relevant to finding the item and is attempting to use in the search is incorrect, the search does seem to be a known-item search yet it is difficult to say the user really knows the object beyond its existence.” Jin Ha Lee 2010 (pg1025)
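
To show why a misremembered title defeats a suggest-as-you-type search, here is a small sketch based on simple prefix matching. I am assuming this approach purely for illustration; it is not a description of how iTunes actually works, and apart from ‘Paris in Flames’ by Thursday the catalogue entries are made up.

```python
def suggest(catalogue, typed, limit=5):
    """Return up to `limit` catalogue entries that start with the typed text."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    ordered = sorted(catalogue, key=str.lower)  # case-insensitive A to Z
    hits = [entry for entry in ordered if entry.lower().startswith(prefix)]
    return hits[:limit]

# A tiny catalogue; the two "Example Artist" entries are invented.
catalogue = [
    "Paris in Flames - Thursday",
    "Paris Nights - Example Artist",
    "City Lights - Example Artist",
]

print(suggest(catalogue, "paris"))           # correct start of the title
print(suggest(catalogue, "City in flames"))  # misremembered title
```

When the user types the real start of the title the song is suggested straight away, but the misremembered title matches nothing, so the user is left to try another kind of search.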

On more specialised music sites further search methods can be employed, but because of the gap in knowledge between the average music searcher and the academic researcher, the more advanced systems will often be aimed at the academic researcher, who might wish to access a more complex collection that holds musical scores or even the music itself. This opens up search systems more complex than text searches, for example searching by the intervals or scales that occur in pieces of music to find a collection of related material.
This method is described in a paper by Peter van Kranenburg et al. at Utrecht University when discussing folk song research in 2009:
“The Colonial Music Institute which promotes research in early American music and dance, offers an index for about 75,000 instrumental and vocal pieces from the period 1589-1839 (sic), including social dance tunes and songs. From each melody an incipit is present in the database. There are three way to browse these incipits: a sequence of scale degrees of all notes a sequence of scale degrees of stressed notes, and a sequence of intervals” (pg 27)
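
Searching by a sequence of intervals can be sketched as follows. This is my own illustration rather than the Colonial Music Institute's system: the tunes are invented, and I am assuming each incipit is stored as a list of MIDI note numbers (where 60 is middle C) so that intervals can be measured in semitones.

```python
def intervals(pitches):
    """Convert a list of MIDI pitch numbers into semitone steps between notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def contains(sequence, pattern):
    """True if `pattern` occurs as a contiguous run inside `sequence`."""
    n = len(pattern)
    return any(sequence[i:i + n] == pattern for i in range(len(sequence) - n + 1))

def find_by_intervals(incipits, query_pitches):
    """Return titles whose incipit contains the query's interval pattern."""
    pattern = intervals(query_pitches)
    return [title for title, pitches in incipits.items()
            if contains(intervals(pitches), pattern)]

# Invented incipits as MIDI note numbers (60 = middle C).
incipits = {
    "Tune A": [60, 62, 64, 65, 67],  # C D E F G
    "Tune B": [67, 65, 64, 62, 60],  # G F E D C
    "Tune C": [65, 67, 69, 70, 72],  # F G A B-flat C (Tune A moved up a fourth)
}

# Query: D E F-sharp G, which gives the steps +2, +2, +1.
print(find_by_intervals(incipits, [62, 64, 66, 67]))
```

Because only the steps between notes are compared, the search finds Tune A and Tune C even though the query starts on a different note from either of them, which a plain text search on note names could not do.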

In conclusion, I believe that IR is a very useful tool in music libraries, bibliographies, databases and music web searches. The methods that I feel suit MIR best in these scenarios are NLQ and Boolean operators, which should both be included in any form of music catalogue. As shown in my example of web searching, the user’s perception of their level of knowledge affects the success of their searches and can cause the wrong results to be retrieved, so any search that does not allow Boolean operators would be severely hindered by lacking the ability to modify and clarify a query to improve results.
The use of non-text searches of music catalogues is very interesting. It really goes beyond the scope of this paper, but if it could be built into a normal search of a catalogue or the web, it would lead to much higher accuracy and relevance in the information retrieved.

References

CCTA, 1993. Database language SQL explained. London: HMSO.

van Kranenburg, P., Garbers, J., Volk, A., Wiering, F., Grijp, L. P. and Veltkamp, R. C., 2010. Collaboration Perspectives for Folk Song Research and Music Information Retrieval: The Indispensable Role of Computational Musicology. Journal of Interdisciplinary Music Studies, 4 (1), 17-48. Available from: http://www.musicstudies.org/JIMS2010/Kranenburg_JIMS_10040102.pdf [Accessed 27th October 2010].

Lee, J. H., 2010. Analysis of User Needs and Information Features in Natural Language Queries Seeking Music Information. Journal of the American Society for Information Science and Technology, 61 (5). Available from: http://0-web.ebscohost.com.wam.city.ac.uk/ehost/detail?vid=4&hid=105&sid=0c6a4d05-6167-42ba-a6bd-9e2c29d95c42%40sessionmgr111&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ== - db=eoah&AN=21523896 [Accessed 27th October 2010].

Rosenfeld, L. and Morville, P., 1998. Information Architecture for the World Wide Web. Sebastopol: O’Reilly & Associates.

http://colinbeard.blogspot.com/ last accessed 28/10/10

Saturday, 30 October 2010