Thursday, 9 December 2010

Coursework post 2


Using the Internet and evolving standards and technologies associated with the WWW to publish information in effective and accessible ways.

For this I am going to look at Web 2.0 and its use in libraries. From the lectures we received and the reading I have done in the area of Web 2.0 and the much-touted ‘Library 2.0’, I learned that many libraries that may previously have been ignoring, avoiding or fighting Web 2.0 are now implementing some 2.0 activity. One of the most common steps is having a Facebook page for the library, along with publishing blogs and wikis and tweeting about upcoming events, guides, FAQs and so on. Some also make use of a virtual ‘ask a librarian’ reference service. This is sometimes labelled Library 2.0, where “emphasis is placed on user-centered change and participation in the creation of content and community based services.” (Peltier-Davis, 2009)

One of the major components of Web 2.0 in the library is the ability for users to ‘tag’ items in the catalogue. This allows users to expand on the metadata provided by the library and to create a network of resources. The positive side is that the metadata surrounding the catalogue is added to by the users of the information, creating a richer set of data that can make an item easier to find. An example is the ability to tag a book in the main library catalogue with a module code for a particular course, allowing a different user to search for that module code and retrieve the items other users found useful for that course. Many university libraries are now implementing systems like this to augment the data surrounding their catalogues. It can also lead to book linking, where a user who finds a useful information resource tags it so that it appears alongside similar books they have found useful on the same topic. For example, if a user researching Information Law finds a useful book and adds an appropriate tag, the search results widen to include items that are not solely focused on Information Law but perhaps contain relevant chapters. As mentioned in my blog, the negatives with this, and with all Web 2.0 applications and interactions, are that there can be inappropriate tagging, where items are tagged incorrectly or maliciously, and that there is a lack of controlled vocabulary. In a traditional catalogue both problems are eliminated by the use of proper library metadata, but in Web 2.0 they can be addressed either by manual checking or by automated checking that maps any words entered onto a controlled vocabulary, for example with a simple ‘did you mean’ function on the input page.
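
As a rough illustration of the automated checking mentioned above, here is a minimal sketch in Python (the controlled vocabulary and the tag are invented for the example): a submitted tag is accepted if it is in the controlled vocabulary, otherwise the closest match is offered as a ‘did you mean’ suggestion.

import difflib

# Invented controlled vocabulary a library might maintain
CONTROLLED_VOCAB = ["information law", "copyright", "data protection", "metadata"]

def check_tag(tag):
    # Accept the tag if it is already in the controlled vocabulary,
    # otherwise suggest the closest term ('did you mean ...').
    tag = tag.strip().lower()
    if tag in CONTROLLED_VOCAB:
        return tag, None
    close = difflib.get_close_matches(tag, CONTROLLED_VOCAB, n=1, cutoff=0.6)
    return None, (close[0] if close else None)

accepted, suggestion = check_tag("Informaton Law")   # user typo
if suggestion:
    print("Did you mean '%s'?" % suggestion)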

Another innovative way libraries can interact with their users via the Internet is a virtual librarian chat service. When the user is on the library’s website, and in particular searching the catalogue, an instant-messaging-style box pops up and allows the user to ask questions, much as they would when visiting the physical library. This adds value to a library website: it is a unique service that could help persuade a user to use the library rather than relying on Google Scholar and Google Books. In an era where libraries need to prove their worth over systems like these to both users and funders, a virtual librarian could be a very important tool that utilises the power of Web 2.0 in an effective and accessible way.



Identifying appropriate and innovative methods of digital data representation and organisation and assessing their potential for use in the information sciences.

For this I am going to look at the use of the Semantic Web in the library setting. The Semantic Web is a term coined by Tim Berners-Lee to describe “an extension of the current one (WWW), in which information is given a well defined meaning, better enabling computers and people to work in cooperation” (cited in Rubin, 2010). There are many different definitions of the Semantic Web from many different sources, but I believe this is a simple and easy-to-understand explanation from the man who first envisioned it. The idea is to create richer relationships between pieces of information that are machine-readable, rather than the human-readable information of the current WWW. This allows unique links to be made between pieces of information based on new kinds of connection such as “works for, is author of, depends on” (Rubin, 2010), rather than the current simple method where one piece of data linking to another is the only connection.

This new method of ascertaining links between information relies on the Resource Description Framework (RDF): “In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as ‘is a sister of,’ ‘is the author of’) with certain values (another person, another Web page)” (Berners-Lee et al., 2001, p. 40). As described in my blog postings, RDF statements are made up of triples containing a subject, an object and a predicate. These triples then form a web of data, with objects becoming subjects for further triples, so the resulting web contains many subjects and objects all interlinking in some way. From there we can develop an RDF schema, which describes the taxonomy for the RDF statements in whatever domain the schema covers. By using the Web Ontology Language (OWL), the taxonomy and rules can express the links between information, for example: if x is true then y must be true. Tim Berners-Lee describes this as the ‘Semantic Web Stack’.
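
To make the triple idea concrete, here is a minimal sketch assuming the Python rdflib package is available (the namespace, names and predicates are invented for illustration): each add() call asserts one subject-predicate-object statement, and the object of one triple can become the subject of another, which is how the web of data builds up.

from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")   # invented namespace for the example
g = Graph()

# "Alice is the author of Book1" and "Book1 depends on Book2"
g.add((EX.Alice, EX.isAuthorOf, EX.Book1))
g.add((EX.Book1, EX.dependsOn, EX.Book2))
g.add((EX.Book2, EX.title, Literal("Foundations of Library Science")))

# Serialise the resulting web of triples as Turtle to inspect it
print(g.serialize(format="turtle"))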

The use of the Semantic Web in libraries could come from linking the library catalogue, with the richness of information resources it contains, to the web. Libraries already use a wealth of metadata in their catalogues, and library workers understand the need for it, so it would make sense that if any progress is to be made in implementing a Semantic Web then these channels, alongside computer science workers, with all their expertise in metadata and cataloguing, should be the way forward, to begin with at least. In Karen Coyle’s 2010 paper ‘Library Data in the Web World’ she discusses this: “With Web based data, we can use the vast information resources there to enhance our data by creating relationships between Library data and Information resources. This will not only increase opportunities for users to discover the library and its resources, but will also increase the value of the data by allowing its use in a wide variety of contexts.” (Coyle, 2010). The Dublin Core Metadata Initiative has been pivotal in bringing metadata skills to the Semantic Web, and it has produced its own set of metadata standards, which can also be implemented in a Semantic Web.

The issues with implementing this are that it would take a huge effort by whoever decides to take on the task of creating the RDF descriptions, and although in limited fields the benefits could be huge, medical research for example, the average user is happy with the current way information is displayed on the web, so the investment needed does not outweigh the benefit gained at this moment in time. There is also the issue of trust: the data being used must be correct and must not mislead or contain false information, as in the MMR controversy. Coupled with the fact that information could also be marked with incorrect metadata, these provide considerable hurdles to the Semantic Web working in any domain.


Utilising recent advances in information and communications technology to support the successful completion of a wide range of information related tasks with proficiency in an online digital environment.

For this I am going to look at mobile information and mobile devices. From the lectures on this topic and my blog post, I believe there are many ways in which these allow users to complete a wide range of information-related tasks from anywhere in the world with nothing but a smartphone.

The advances in the last three years in mobile information technology have been vast. Since Apple announced the iPhone in 2007, the technological advances and possibilities in this area have grown exponentially. In these three short years Apple has released four versions of the iPhone running iOS, Google has released its own mobile OS, Android, for use on a variety of phones, Microsoft has its own Windows Phone 7 for smartphones, and BlackBerry and Nokia have continued their earlier development of smartphone technologies. The advances of these technologies have led to new ways of searching for and utilising information, which have become ingrained in modern life. In a Forbes online blog in 2009, Ewalt claimed that Apple alone had sold 50 million iOS-capable devices (iPhone and iPod Touch) (Ewalt, 2009). These devices utilise recent advances in information and communication technology by giving users full access to the Internet wherever they are via Wi-Fi and 3G networks, using context awareness to add richness to any information gathered, and allowing the transfer of files via Bluetooth.

I believe that the use of this in a library setting could take many forms, from providing a mobile version of the library website to a mobile version of the catalogue. This would allow customers to access the full library catalogue from their mobile device rather than using an OPAC terminal in the library or the full website on their device. As discussed in my blog, full websites can run slowly on mobile devices due to high graphical content and the need to do a large amount of scrolling. Another way academic or public libraries can utilise mobile devices and mobile information to satisfy some of their users’ information needs is to provide an app for their organisation.

As we discussed in the lab for this session, and as mentioned in my blog, this app could take many forms and provide many functions. It could include links to the mobile version of the catalogue, allow users to view their current items on loan and renew them, and offer a floor map of the library as well as a map that uses GPS to show the route to the library from wherever the user is. Future versions of the app could also allow users to check out material with their phone using a barcode scanner that exploits the mobile device’s camera, and could use augmented reality to guide the customer around the library to where they need to be (if current GPS technology is improved or abandoned for the more accurate Galileo system or similar).

Further features of mobile devices could replicate the user’s library card by containing an RFID chip that could be scanned to allow access to the library. From there the single mobile device could provide the user with everything they need: gain access to the library, search the catalogue, find where items are in the stock, be guided to the item, check the item out and renew it in the future. Of course some of these possibilities are closer than others, and they depend on the user having a smartphone and on the technology being compatible across four or five mobile operating systems. The library would also need to keep the mobile versions of the website and catalogue constantly up to date alongside the full versions.




References

Coyle, K., 2010. Library Data in the Web World. Library Technology Reports, 46(2), 5-11. Available from: http://ehis.ebscohost.com/eds/pdfviewer/pdfviewer?vid=2&hid=121&sid=bc989ed8-8c7c-4b61-830f-f600d48e16d0%40sessionmgr112 [Accessed 8th December 2010]

Ewalt, D. M., 2009. Apple’s Shocking App Store Numbers. Digital Download (Forbes), 4th November 2009. Available from: http://blogs.forbes.com/digitaldownload/2009/11/04/apples-shocking-app-store-numbers/ [Accessed 8th December 2010]


Peltier-Davis, C., 2009. Web 2.0, Library 2.0, Library User 2.0, Librarian 2.0: Innovative Services for Sustainable Libraries. Computers in Libraries, 29(10), 16-21. Available from: http://0-web.ebscohost.com.wam.city.ac.uk/ehost/detail?vid=1&hid=113&sid=72761674-081e-429b-a03e-0ff1ceff362d%40sessionmgr114&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d [Accessed 8th December 2010]


Rubin, R. E., 2010. Foundations of library and information science. 3rd ed. New York: Neal-Schuman Publishers, Inc.






DITA session 10

Information Architecture part 2

We looked at organisation systems for data and the schemes, structures and labelling systems used.

Schemes can be exact or ambiguous

Structures are the technology that support the schemes

Labelling systems give meaning to the schemes so that users know how they work

Exact schemes are easier to make but are rarely used, as the user has to know the exact piece of information they want to find. They are organised alphabetically, chronologically or geographically.

Ambiguous schemes are much more common, as the user will browse until they find what they need. They can be:
topical or subject based
task orientated
or audience specific

Labelling systems are the language used in the organisation scheme; labels should be narrowed in scope to specific audiences and should be consistent.
Controlled vocabularies give meaning to the words, and the relationships between the words should be defined (synonyms, antonyms etc.).

Then we looked at a search task where we had a picture of a vegetable but no idea of what it was and how we could find out using these search methods.

To do this we looked at Navigation and Searching styles.

The perfect catch (known item + exact)

Lobster trapping/Berry picking where there is an information need, a search takes place, and the information need is adjusted in regard to the search results and possibly a new search takes place.

Driftnetting where the user is randomly looking for things that do not relate to each other.

We then looked at how websites can help navigation by telling the user where they are, what’s related, where they have been and where to go next.



Finally we looked at the theory behind visual design, how important graphical design is on a website as first impressions count, and how personalisation and customisation work on the net.

Personalisation is where the website reads your cookies and history to try and provide information that would be useful to you, but most often it is used in advertising.

Customisation is where the user alters the system to show the information they need.

DITA Session 9

Open Data and Information Architectures part 1.

We looked at open source software: software released and developed for free for users to utilise and enhance. Most is released under the General Public Licence (GPL), which means that you cannot use any part of the programme in a future programme that is sold for profit; all future incarnations must also be released under the GPL.

Open data is a government and public body initiative to provide the data they collect for free on the Internet. This removes the need for costly and time-consuming Freedom of Information Act requests and also promotes transparency in government. There is also a push for this data to be released using the RDF system of the Semantic Web.

Open data websites like data.gov.uk and data.gov (US) release their datasets as searchable databases and encourage users to create applications that exploit the data. Unlike open source, these applications can use the data for free but also charge users to use the applications. This leads to people creating open data mashups that provide information on many aspects of life, such as local school information and historical instances of traffic build-up.

Another valuable set of open data is the Ordnance Survey’s datasets, which could prove very valuable to programmers who can use the map information to create some very useful applications.

A downside of this is the possibility of the data being used for political means by being taken out of context.

Information Architecture part 1

We looked at the progress on the Internet in the last 10 years and how it was/is viewed, using the Rosenfeld and Morville book as an example by comparing the two editions.

We looked at the theory behind web design, how a site needs to work well and look good to be successful, and how documents and their interlinking should be like rooms and doors linking together.

DITA session 8

The Semantic Web and Web 3.0

In this session we looked at what the Semantic Web is and its relation to Web 3.0, and the differences between Web 1.0 (read), Web 2.0 (read/write) and Web 3.0 (read/write/execute).

The Semantic Web has been promoted since the early days of the web by the W3C consortium led by Tim Berners-Lee. It aims to give richer meaning to information and make that information machine readable, which allows the information to become unambiguous.

We looked at how RDF triples are made up of a subject (a resource), an object (a property of the subject) and a predicate (the relationship between the two).

We looked at how the Dublin Core Metadata Initiative is implementing a set of metadata rules for the predicates. Then we saw how RDF triples can form webs of data by linking together, using each other’s objects as subjects.

Then we looked at the taxonomies involved with RDF that allow a schema to be produced, and how most taxonomies are hierarchical but not all are.

Then we took a look at OWL (Web Ontology Language) and how it sets out the rules for the taxonomies and how they create relationships.

Then the Semantic Web Stack was looked at and how it is made up of:
Web resources
RDF (metadata)
RDFS (taxonomies)
OWL (ontologies)

The advantage of the Semantic Web is that it allows for emergent behaviour, where lots of facts + a few rules of inference = surprisingly sophisticated results. But this only works in limited domains, it takes a huge input of effort for a very small output of data, and there are also issues of trust in both the validity of the data put in and the metadata used.
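
As a toy illustration of ‘lots of facts + a few rules of inference’, here is a minimal sketch in plain Python with invented facts: one rule applied to two stated triples derives a third fact that was never stated directly.

# Invented facts as (subject, predicate, object) triples
facts = {
    ("alice", "worksFor", "city_library"),
    ("city_library", "locatedIn", "london"),
}

# One rule of inference: if X worksFor Y and Y locatedIn Z then X basedIn Z
def infer(facts):
    derived = set()
    for (x, p1, y) in facts:
        for (y2, p2, z) in facts:
            if p1 == "worksFor" and p2 == "locatedIn" and y == y2:
                derived.add((x, "basedIn", z))
    return derived

print(infer(facts))   # {('alice', 'basedIn', 'london')}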

Wednesday, 24 November 2010

DITA session 7

Mobile information

For this session we looked at the pros and cons of mobile information services, e.g. context awareness for GPS locations and the limitations of screen and keyboard sizes.

Context awareness: GPS can provide web searches with local results, with satellites pinpointing the location of the hardware to within 40 metres. Most smartphones contain this capability and also have compass and accelerometer capabilities, so they know which direction you are facing, which is useful for getting directions via a piece of mapping software. It also allows you to geotag pictures and access local information via Wikipedia or similar.

We also looked at how Bluetooth can be used to send advertising to people discreetly, and at the privacy issues with Bluetooth.

We covered the problems of limited screen size: how websites provide mobile versions of their sites that allow easier navigation on a mobile device, and how servers can throw away information not needed by the user. A server can also recognise the OS of the device and could in theory send only compatible information to it, but the technology isn’t quite there yet.
Mobile sites should be designed with the mobile device in mind, keeping the need to scroll to a minimum, keeping graphics low, and removing all but basic navigation.

Keyboard size: the trade-off between button numbers and size, the use of virtual keyboards on touchscreen devices, how there are different keyboards for different tasks, the use of auto-complete and the emergence of gesture control.

We then looked at what people actually use their mobile devices for: mostly trivia and local information.

Finally we looked at the combination of social media and location services, how this can add rich metadata to photos etc. but also allow for serious privacy breaches.

During the lab session we discussed what a City student would want out of a mobile information app. We talked about using augmented reality to help guide someone around the campus, access to the library catalogue, and the inclusion of a social network to discuss lectures and receive timetable changes.

DITA session 6

Web services and APIs

We looked at the future of software as a service, where the programme is not stored locally on a hard drive but is held on the net and the user accesses it via a web portal, paying only for what they use, thus reducing cost as you wouldn’t need to buy a whole suite.

We also covered the possibilities of cloud computing, where all the user’s data is stored on the net with very little held on hard drives, leaving the home computer merely as a window to the Internet through which to access all your files.

On web services, we looked at the use of XML: the difference between a web page and a web service; how XML is not a language despite its name but is in fact a set of conventions for creating a language similar to HTML; how XML documents consist of elements; how each element contains other elements or text and can also have attributes; and how a document has one root element.
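
As a minimal sketch of that structure, the snippet below uses Python’s built-in xml.etree module to parse a tiny invented XML document: one root element, child elements nested inside it, an attribute, and text content.

import xml.etree.ElementTree as ET

# A tiny invented XML document: one root element containing other elements
doc = """
<catalogue>
  <book id="1">
    <title>Information Architecture for the World Wide Web</title>
    <author>Rosenfeld and Morville</author>
  </book>
</catalogue>
"""

root = ET.fromstring(doc)              # the single root element
for book in root.findall("book"):      # child elements of the root
    print(book.get("id"), book.find("title").text)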

APIs hide the internal complexity of web services and allow programmers to build on existing functionality with ease. Almost all programming is done via APIs.

Finally we looked at mashups, which use APIs and web services to create new, innovative systems: how little or no programming experience is needed, how JavaScript can be used to manipulate web services and APIs, and how many services publish code that can be used in HTML.

An example of my mashup can be found here.

DITA session 5

This session focused on what web 2.0 is and the impact it has had on the way that the Internet is used.
We used the definition that Web 1.0 was the read web and Web 2.0 is the read/write web as used by the majority of users.

We covered how it offers a rich user experience, encourages user participation, has dynamic content, uses metadata and promotes openness and freedom.

We looked at the limitations of HTML in terms of delivering rich user experiences and the promise of HTML5 to provide this; how multiple users interacting can give a site purpose; how the ability to tag items brings together and adds metadata to more items without the need for a more formal library-style metadata system, but also how this can be abused; how social interaction affects censorship and freedom of speech; and how the narrative of a site shapes its social constraints.

We then went on to look at Facebook and its impact on the web: how it is a non-specific tool on which you can do most social networking things; how the lack of avatars and handles gives a real online personality; its dependence on mutual agreements between ‘friends’; how your online activity becomes visible and can be commented on; how it utilises an IM and email system; and finally the privacy issues contained within Facebook.

Wikipedia is another form of Web 2.0 we looked at: how anyone can contribute within the set editorial constraints, the pros and cons of this method of gathering information, the wider contexts of a wiki over a traditional encyclopedia, how it provides anonymity and how it can create a hive-mind effect.

We covered blogs and how they are chronological pieces of short writing, their birth as diaries and how they have progressed to micro-journalism and professional self-promotion tools; how interlinked blogospheres allow cross-communication; and the impact of micro-blogging, i.e. Twitter.

Finally we looked at the negatives/criticisms of Web 2.0: how it can lead to buzz and hype, its promotion of narcissism and amateurism, and how it enhances the fickle nature of people’s personalities.

Saturday, 30 October 2010

Coursework post

Evaluating and employing appropriate technologies for the digital representation of information.

There are many ways of digitally representing information ranging from a simple webpage through to a complex database containing many tables of information. The method you use depends on the purpose of the information and the information need of the user.

From the sessions on Web 1.0 I feel that each of the methods, from HTML web pages to complex databases using SQL queries and the process of Information Retrieval (IR), is equally useful, but only when applied appropriately.

To visually display information that may not change a great deal, an HTML web page is a very useful way of digitally representing the information, for example a simple list of contact numbers or addresses. HTML is a useful tool because it is accessible to anyone with the Internet, it is relatively simple to write, and it is of fairly low complexity to maintain and update. Of course, if the information contains a large amount of data, for example lists of employees, then a form of IR or a database is more suitable. However, for a small business with, say, fewer than 10 employees, HTML would be a much simpler way of storing basic contact details. Another use of an HTML web page is as a public-facing data sheet. For example, it could hold all the contact details, company information and any text information, and with the use of CSS style sheets it can be made to look very interesting.

When you have a large amount of information that you own, and therefore can manipulate, a database is by far the best solution, as it allows users to access information centrally, in whatever formats it is held, and easily allows queries to be developed to pull specific information from the database. In 1993 the UK Government published the following definition:

“SQL is the industry accepted Interface between applications and relational databases and is increasingly used to access non-relational data. It is therefore an important tool in achieving data integration across different databases.” (CCTA, 1993, p. 5)

The problems with this method are that SQL is a difficult way of creating queries: it depends on pinpoint accuracy by the user, who must write their query with the correct terminology and spelling, and it relies heavily on a controlled vocabulary, all of which can lead to problems. The plus points are that it allows multi-user interaction and can be tailored to suit individuals’ needs. The government centre for information systems says of SQL implementations:

“SQL is a suitable language for applications requiring to store and manipulate data that can be represented as tables. Generally, implementations of SQL are targeted towards supporting larger, multi-user applications based on mainframes, mini-computers or large workstations.” (CCTA, 1993, p. 15)
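
As a rough sketch of the table-based querying these quotes describe, the example below uses Python’s built-in sqlite3 module with an invented employees table; a real multi-user implementation would run on a database server rather than an in-memory store.

import sqlite3

conn = sqlite3.connect(":memory:")      # throwaway in-memory database
cur = conn.cursor()

# One table of invented contact details
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, phone TEXT)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("A. Smith", "Cataloguing", "1234"), ("B. Jones", "Reference", "5678")],
)

# An SQL query pulling specific information from the central store
for row in cur.execute("SELECT name, phone FROM employees WHERE department = ?",
                       ("Reference",)):
    print(row)                           # ('B. Jones', '5678')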

The main difference between SQL databases and Information Retrieval (IR) is that a database holds a large amount of data stored in many tables and can give accurate answers to specific questions, whereas an IR system holds information organised into surrogates stored in tables, which can then be searched using non-specific keywords and natural language. Rosenfeld and Morville say that:

“The database model is best applied to subsets or collections of structured, homogenous information within a broader website.” (Rosenfeld and Morville, 1998, p. 41)

Therefore, when the information you have is not your own, or is fairly unstructured, the best method of retrieving it is to use IR. This is where the user inputs search terms into some kind of search engine, either as a natural language query or using a Boolean system, to search the system’s data and find what they are looking for.

For example, a library catalogue would not use an SQL database, as the information searched for would have to be too specific; it would use a form of IR, where the user can search according to their knowledge base, from an exact item to a vague subject search and everything in between, and still retrieve adequate and accurate information at a much higher speed than using a database.

“Many studies indicated that users of information systems aren’t members of a single minded monolithic audience who want the same kinds of information delivered in the same ways. Some want just a little information, while others want detailed assessments of everything there is to know on a topic” (Rosenfeld and Morville, 1998, p. 102)

Managing data with appropriate information technologies.

For this section I am going to look at search methods in music information retrieval (MIR): how different kinds of search can be implemented in a web search for a particular piece of information (song title/artist/composer), and how similar searches could be implemented on more specific music information resources like a music organisation’s archive.

From what I have learnt about textual search methods, I feel the use of natural language queries (NLQ) would be the most common search method in MIR for the average (non-scholarly) music enthusiast, followed by the use of Boolean operators to perform search modifications if the first NLQ was unsuccessful, as this allows more specific results to be returned in a type of search where many responses could well come up. An important point is that any user will have a varying level of knowledge of the subject matter they are searching for, and will therefore use either a known item search (KIS), a fact search or a subject search to find the information they desire, and this will affect their search terms.
“People often first encounter new music from various media (e.g., a movie, TV commercial, radio) without ever knowing the artist name and song title, or they often forget the information over the course of time. This brings up challenges to known-item-searches for music as those users must attempt to describe the sought music other than using the bibliographic information.” (Lee, 2010, p. 1025)
An example of this is a web search for the lyric ‘dress this city in flames’: Google brings up the correct song but also brings up a different site as the main result. To obtain a more accurate search, Boolean operators can be used. By searching for ‘dress this city in flames AND lyrics’, the use of ‘AND lyrics’ means that all the result pages must also contain the word lyrics, so all the results should be song lyrics, and from there the user can find the song they are looking for. This type of search would be a fact search, as the user knows a fact, and therefore has some knowledge of the subject, and wants to find more information.

NLQ and Boolean operators are, I feel, not only very good methods of fact searching but could also be used very successfully for a subject search. For example, ‘UK Hip-Hop’ could be refined using ‘UK AND Hip-Hop’. A KIS will more often than not be an NLQ, as the user knows what they are looking for and will search for it directly; for example, searching for “Paris in Flames by Thursday” will bring up the exact song they are looking for. This method of KIS would be used on a site like iTunes, and their search method also suggests the most likely answer as you type, speeding up the search time, which, along with accuracy of results, is among the most important aspects of IR.
However, problems can occur when the user assumes they know something. In systems like iTunes the suggestion of titles can be extremely helpful, but if, for example, the user thought the song ‘Paris in Flames’ by Thursday was by Weezer and called ‘City in Flames’, then the results returned would be inaccurate and the user would have to fall back on a fact search or subject search, which are not supported on a system like iTunes.
“If someone were looking for music they previously heard, but all of the information they think is relevant to finding the item and is attempting to use in the search is incorrect, the search does seem to be a known-item search yet it is difficult to say the user really knows the object beyond its existence.” (Lee, 2010, p. 1025)
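
As a very simple sketch of the type-ahead suggestion idea discussed above (the song list is invented), the snippet below offers titles that start with whatever the user has typed so far; it speeds up a correct known-item search but cannot rescue one built on wrong ‘facts’.

# Invented catalogue of songs
SONGS = ["Paris in Flames - Thursday",
         "Cross Out the Eyes - Thursday",
         "Buddy Holly - Weezer"]

def suggest(prefix):
    # Return titles starting with what the user has typed so far
    prefix = prefix.lower()
    return [s for s in SONGS if s.lower().startswith(prefix)]

print(suggest("paris"))     # ['Paris in Flames - Thursday']
print(suggest("city in"))   # [] - a mistaken 'known item' gets no help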

On more specific music sites, further search methods can be employed, but because of the varying degrees of knowledge between the average music searcher and the academic researcher, the more advanced systems will often be aimed at the academic researcher, who might wish to access a more complex collection that specifically holds musical scores or even the music itself. This opens up search systems more complex than text searches, for example being able to search using the intervals or scales that occur in pieces of music to find a collection of related material.
This method is described in a paper by Peter van Kranenburg et al. at Utrecht University discussing folk song research:
“The Colonial Music Institute which promotes research in early American music and dance, offers an index for about 75,000 instrumental and vocal pieces from the period 1589-1839 (sic), including social dance tunes and songs. From each melody an incipit is present in the database. There are three way to browse these incipits: a sequence of scale degrees of all notes a sequence of scale degrees of stressed notes, and a sequence of intervals” (van Kranenburg et al., 2010, p. 27)

In conclusion, I believe that IR is a very useful tool in music libraries, bibliographies, databases and music web searches. The methods that suit MIR best in these scenarios are, I feel, NLQ and Boolean operators, both of which should be included in any form of music catalogue. As shown in my example of web searching, the user’s perception of their level of knowledge affects the success of their searches, which can cause poor results to be retrieved; therefore any search system that does not allow Boolean operators would be severely hindered by missing the ability to modify and clarify queries to improve results.
The use of non-text searches of music catalogues is very interesting, and something that goes beyond the scope of this paper, but if it could be implemented into a normal search of a catalogue or the web it would lead to much higher accuracy and relevance of the information retrieved.

References

CCTA, 1993. Database language SQL explained. London: HMSO.

van Kranenburg, P., Garbers, J., Volk, A., Wiering, F., Grijp, L. P. and Veltkamp, R. C., 2010. Collaboration Perspectives for Folk Song Research and Music Information Retrieval: The Indispensable Role of Computational Musicology. Journal of Interdisciplinary Music Studies, 4(1), 17-48. Available from: http://www.musicstudies.org/JIMS2010/Kranenburg_JIMS_10040102.pdf [Accessed 27th October 2010]

Lee, J. H., 2010. Analysis of User Needs and Information Features in Natural Language Queries Seeking Music Information. Journal of the American Society for Information Science and Technology, 61(5). Available from: http://0-web.ebscohost.com.wam.city.ac.uk/ehost/detail?vid=4&hid=105&sid=0c6a4d05-6167-42ba-a6bd-9e2c29d95c42%40sessionmgr111&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ== - db=eoah&AN=21523896 [Accessed 27th October 2010]

Rosenfeld, L. and Morville, P., 1998. Information Architecture for the World Wide Web. Sebastopol: O’Reilly & Associates.

http://colinbeard.blogspot.com/ [Last accessed 28th October 2010]

Wednesday, 27 October 2010

DITA session 4


Information retrieval

In this session we looked at Information retrieval and how it works.

We started by looking at different definitions of Information Retrieval (IR) and the three different views: the user view, in which the user has a need for information; the system view, which looks at the IR system and its components; and the sources view, which concerns the presentation of owned information.

We then looked at information needs, and in particular searching on the web. We looked at a taxonomy of web searches which included navigational queries, where a user looks for, for example, the home page of something; transactional queries, where the user is looking for a place to use a service, e.g. Amazon; and informational queries, where a user is doing a general search on a subject without expecting a known result.

We also looked at multimedia search needs and a possible taxonomy for this style of searching, be it on the web or in a catalogue. This consists of known item retrieval, where the user knows what they want, for example ‘I want the song Enter Sandman by Metallica’; fact retrieval, where the user has some knowledge but not all, for example ‘who won the FA Cup in 1999’; subject retrieval, where you have a topic but not many facts for the search, for example ‘headline acts at Glastonbury’; and exploratory retrieval, where the user has very little knowledge of the subject and enters a broad topic like ‘what films do you have’. All of these schemes depend on the user’s knowledge base and are very subjective, with searches often blending between the schemes.

We then progressed to looking at the indexing of information to allow for accurate IR. For information to be retrievable, the media must be in a specific format, be that HTML, XML or MPEG; this allows the data to be processed correctly. Then you must identify the fields that will be searched; this allows users of, for example, a library catalogue to search authors, titles, genre etc. In text preparation the words must be analysed to make sure the text is searchable; this can be done automatically by computer and includes looking at non a-z characters and the use of numbers. The removal of stop words is an important step, as a word index that is too full may not work properly, so high-frequency words such as ‘the’, ‘and’, ‘to’ and ‘be’ are usually taken out. Finally you have to look at stemming, where plurals and other suffixes have to be either included or excluded; for example, if you were to index the word ‘sun’ then it would make sense to include ‘sunny’ and ‘sunshine’. Synonyms may also be considered, which becomes tricky as these usually have to come from a controlled vocabulary, so some may be included while others are not.
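
A rough sketch of these text preparation steps in Python (the stop-word list and suffix rules are deliberately tiny and invented): lower-casing, keeping only a-z characters, removing stop words, and a crude stemmer.

import re

STOP_WORDS = {"the", "and", "to", "be", "a", "of"}    # tiny example list
SUFFIXES = ("ny", "shine", "ing", "s")                # crude, invented stemming rules

def prepare(text):
    words = re.findall(r"[a-z]+", text.lower())            # keep a-z characters only
    words = [w for w in words if w not in STOP_WORDS]      # remove stop words
    stemmed = []
    for w in words:
        for suffix in SUFFIXES:
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        stemmed.append(w)
    return stemmed

print(prepare("The sunny sunshine and the sun"))   # ['sun', 'sun', 'sun']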

When looking at index structures we mostly looked at inverted file structures, as these are the most widely used and give the fastest results, which is the main point of IR.
Surrogates are documentary records of a document: for example, a bibliographic record is a surrogate, as it holds metadata about the item without holding the item’s internal data, and an index is a list of surrogates, for example a library catalogue, which holds the bibliographic records for the books the library holds. The keyword file contains the index terms, and the postings file contains, for each keyword, a list of the documents that contain it.

We then went on to look at the two different ways of searching: using a Boolean logic model and using natural language. Boolean searching uses words such as AND, NOT and OR to specify whether certain words should be included, excluded or possibly included; they fit between your search words and tell the engine what to include. For example, ‘Electronic AND music’ will only bring up results that include both words, ‘Electronic NOT music’ will only bring up results that do not contain the term music, and ‘Electronic OR music’ will bring up results that contain either. Another type of Boolean entry is the double quote, which means that all the words contained must appear in that order; for example, “Electronic music by Aphex Twin” will only bring up results that contain that phrase. The information is then usually displayed to the user in ranked order of relevance.
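
Putting the inverted file and the Boolean model together, here is a minimal sketch in Python (the document collection is invented): the keyword file maps each term to a postings list of document IDs, and AND, OR and NOT become set operations on those postings.

# Invented document collection
docs = {1: "electronic music by aphex twin",
        2: "classical music history",
        3: "electronic circuits handbook"}

# Build the inverted index: term -> set of document IDs (the postings)
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

print(index["electronic"] & index["music"])   # Electronic AND music -> {1}
print(index["electronic"] | index["music"])   # Electronic OR music  -> {1, 2, 3}
print(index["electronic"] - index["music"])   # Electronic NOT music -> {3}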
We then went on to look at natural language queries (NLQ) and how they use phrases and prose to define searches. This is most commonly used on the web but can give false results. If an NLQ is unsuccessful, users may add or delete terms or switch to a Boolean query to narrow the search; this is called query modification, and some sites do it automatically for you by adding a ‘did you mean’ function to their search engine.

We then briefly looked at how you can evaluate users’ searches and the relevance of the documents they retrieve. To do this you first need to define a document’s relevance, which is an entirely subjective judgement, often made by asking the user how helpful the document was; then you can look at how many of the retrieved documents were relevant and how fast the system was at displaying them. We also looked at the correlation between precision and recall and how there is an inverse relationship between the two.
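
A small worked example of the precision and recall measures mentioned above (the document IDs and judgements are invented): precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.

retrieved = {1, 2, 3, 4, 5}    # documents the system returned
relevant = {2, 5, 8, 9}        # documents the user judged relevant

hits = retrieved & relevant              # relevant documents that were retrieved
precision = len(hits) / len(retrieved)   # 2/5 = 0.4
recall = len(hits) / len(relevant)       # 2/4 = 0.5
print(precision, recall)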

Tuesday, 26 October 2010

DITA session 3

Databases

In this session we talked about databases and SQL.
The first point we looked at was how, before databases were commonplace, people’s information needs within a company were hindered by different departments all having their own information stored in their own way, so if another department needed the information it would be difficult to get hold of and possibly in an incompatible format. This led to redundancy and inconsistency in people’s data.

A database allows data to be stored in a central place and allows users to access it (via a database management system) from many locations; this resolves the incompatibility and redundancy issues.

This is a good way of dealing with data when you own the information as you can structure the database to suit the needs of the users and can fairly easily create search systems.

It is, however, not suitable if the data is not your own, as the information is often unstructured and heterogeneous.

We briefly looked at Entity Relationship Modelling, which is a little beyond the scope of this course, but basically it allows you to describe the content of a database at the design level. It is done before any database is created and looks at the relationships between the things (entities) that will be in the database.

We then progressed onto SQL and how it is a language that allows communication between a user and the database management system, to query what is in the database and to allow housekeeping of the information.

A database is a collection of two-dimensional data tables with rows and columns; the complicated part is what you do with the tables.

We then looked at the relationships between the entities within a database and how there can be three kinds of relationship: one-to-one, where one entity relates to only one other, e.g. a painting can only be in one gallery; many-to-one, where many items relate to one entity, e.g. many paintings in one gallery; and many-to-many, where lots of entities relate to lots of other entities, e.g. many painters with works in different galleries.

An entity is basically any thing, and it can have lots of attributes, infinite in fact, but a database should only collect relevant information; for example, a personnel database would need to know your name, address and phone number but probably not your hair colour or shoe size.

When you have more than one kind of entity, problems occur with a single table, so you need a table for each. Using the art gallery example from class, we would have one table containing artworks, with the artist and date painted in it, and another table for the galleries, which could have the address, city and country as attributes. Each row has a uniquely identifying ID so the two tables can relate to each other: for example, if the Mona Lisa is ID 1 and its gallery field is 3 (where 3 is an ID in the galleries table), then the query can select the information from the right line in the galleries table to assign the correct gallery.

We then looked at how you query a database and a few of the keywords that have to be used:
SELECT columns
FROM tables
WHERE something is true

For example, again using the gallery model:
mysql> select name, country from galleries where country = "uk";

The quote marks indicate you are looking for a text value, not a column.

Another example would be
mysql> select title
    -> from artworks
    -> where date > 1800;

You can split the query over many lines, as the query only stops at a semicolon, and in this case we have used > to mean greater than 1800.

We then went into the lab to work with a real database to try and ascertain information about different aspects of the database.

This brought together the relatively simple theory and showed how complicated it can get when working with a real database. The software is very particular and everything has to be done in a certain way: it is case sensitive, and everything has to use the correct terminology, so for example typing Gallery when the table is called galleries will respond with an error. It also showed how you have to understand the nature and layout of the tables before you can begin to query the database, as if you don’t know which table an entity is in then it is impossible to stumble across the information you need. It is very precise and there is no way to work it out as you go; you have to know the layout well.

To do this you can first type
show tables; - which will show you the different tables in the database
desc authors; - where authors is a table, this will show the details of that table

This is vitally important as without this information the rest of the exploration is impossible.

We then had to develop queries using all that we had learnt to ascertain different pieces of information from the database.

Sunday, 10 October 2010

DITA session 2

The Internet and the World Wide Web

This session was on how the Internet and the world wide web work and interact with each other.

We talked about how the Internet is a collection of WANs (wide area networks).

We covered what different domain name suffixes in the DNS (domain name system) mean, e.g. .com = company, .org = organisation, .ac.uk = academic institution in the UK.

How the Internet is a collection of networks and the www is the documents they contain.

We discussed how it has become a disruptive technology and changed the way we and industry work; for example, publishing has moved from paper-based to online, and open source software has advanced.

We discussed how a URL (uniform resource locator) works, from the HTTP (hypertext transfer protocol) scheme through the DNS to the local path, and how the hierarchy runs from right to left in the domain name and left to right in the local path.

We then looked at hypertext and HTML, and at how HTML is written using a variety of tags and how a basic web page can be made using it.

Then we took a brief look at CSS style sheets and how they can add to an HTML web page.

Then, in our lab session, we wrote two simple web pages using HTML, one linking to the other, and hosted them in our student web space.

http://www.student.city.ac.uk/~abjd609/DITA

Monday, 27 September 2010

Session 1

Today I missed the first lecture because of Network Rail being incapable of getting a train to London Waterloo by 8.15.


My interpretation of the lecture is based solely on the lab session and the lecture notes and PowerPoint slides that I have read.

The basic overview seems to be that it was about setting up the structure of the module and to
"Introduce the nature and potential of computing and digital information represented at a number of levels and to provide practical experience of using and managing digital data represented in a number of forms and formats"
This was done by describing how computers use information, the different kinds of information, how they are represented and how we can use them.

Topics covered included bits and bytes, binary, how file formats work and what their suffix means, e.g. .txt, .jpeg, .html, the importance of understanding ASCII and how different file formats interact with ASCII.

The coursework was also discussed: we need to keep a blog of our progress through the course, and two blog entries will make up the coursework for this module; they will be about Web 1.0 and Web 2.0/3.0.

They also looked at how the lab sessions will work and what we are expected to do during these sessions.

Today the lab consisted of experimenting with file extensions and how they affect ASCII. This included creating a Word text file, converting it to .docx, adding formatting, viewing it in a text viewer to see how non-ASCII characters are shown, adding a picture, and saving it as an HTML file so it can be viewed as a web page.