Tuesday, November 25, 2008

Unit 13 Readings

The YouTube video was taken down due to a copyright violation.

"EPIC - TIA"
  • Latest News: EPIC has urged scrutiny of the Department of Homeland Security's proposed Office of Screening Coordination and Operations because it would oversee vast databases of digital fingerprints, photographs, eye scans, and personal information on American citizens and legal foreign visitors, yet the office does not explain how it will protect privacy rights
  • In 2002, DARPA founded TIA (Total Information Awareness), intended to detect terrorists by analyzing enormous amounts of information
  • It worked through projects such as Project Genoa and the Human ID at a Distance program; the main goal of all of its projects was to build a "virtual, centralized, grand database"
  • 2003 - Congress eliminated the funding for the controversial project

"No Place to Hide"

(The link on CourseWeb gave me an error saying it could not find the page, so here are two websites that I used instead: a review of the book, http://www.simonsays.com/content/book.cfm?sid=33&pid=503479, and a site with information about the book, http://www.noplacetohide.net/ )

From the book review...

  • A book written by Robert O'Harrow Jr.
  • Gives details of how private data and technology companies and the government are creating an industrial complex or a national intelligence infrastructure.
  • Explains how the government depends on a large reservoir of information about aspects of everyone's lives to promote homeland security and fight terrorists
  • The example of the grocery discount card is unsettling
  • The book explores the impact of this new security system on our privacy, autonomy, liberties, and traditions

From the "No Place to Hide" Site

  • Gives updates on new developments related to the issues No Place to Hide raises: privacy and personal information
  • Lists the people who are or were responsible for dramatic measures that call our liberties into question, such as the Patriot Act
  • Allows you to read the final chapter - chapter 10 of No Place to Hide

Tuesday, November 18, 2008

Unit 12

"Using a Wiki to Manage a Library Instruction Program"

By using a wiki, librarian instructors will be better informed because they can share resources and classroom ideas/materials and collaborate with one another more easily. Using wikis for library instruction has two main uses: sharing knowledge and cooperating to create resources. For example, at the Charles C. Sherrod Library at East Tennessee State University, the library instruction program uses a wiki to share knowledge (such as assignment directions that turned out to be unclear) and to collaborate on resources (placing pre-existing handouts on the wiki and updating them to reflect new information).

"Creating the Academic Library Folksonomy"

I thought this article was interesting because it gave us a view of del.icio.us beyond our own experience (using CiteULike for an assignment and the hands-on exercise with del.icio.us). Sites such as CiteULike let you "bookmark" pages on the Internet so you don't lose your bookmarks when you move from one computer to another. "Such sites allow users to share these tags and discover new Internet resources through common subject headings," called a folksonomy: a taxonomy created by ordinary folks. Applied to library use, librarians could create pages pointing students to "good," solid academic articles that they might otherwise have trouble finding. I was eager to start my own account and, when I do a field placement at a public library, possibly to introduce the idea as a project.
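
To make the folksonomy idea concrete, here is a minimal sketch (not how del.icio.us or CiteULike is actually implemented) of user-chosen tags acting as shared subject headings; the user names and URLs are made up for illustration.

```python
from collections import defaultdict

# A folksonomy is simply a mapping from user-chosen tags to the
# resources people have labeled with them.
folksonomy = defaultdict(set)

def bookmark(user, url, tags):
    """Record that `user` saved `url` under each of `tags`."""
    for tag in tags:
        folksonomy[tag].add((user, url))

def discover(tag):
    """Return every URL anyone has filed under `tag`."""
    return {url for _user, url in folksonomy[tag]}

bookmark("librarian1", "http://example.edu/ir-article", ["open-access", "repositories"])
bookmark("student42", "http://example.org/oa-primer", ["open-access"])

print(discover("open-access"))  # both URLs surface under the shared tag
```

Because the tags are shared, each person's bookmarking also becomes a discovery tool for everyone else, which is the effect the article describes.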

"Weblogs"


  • A weblog, or blog, is a website that resembles a personal journal and is updated with individual entries or postings
  • A distinguishing feature of blogs is that entries are automatically archived
  • They cover almost anything; a big trend is people writing about current issues that are academic in nature, which is causing a shift in how some groups communicate
  • Blogs are so popular because of the simple "out-of-the-box" tools that allow for their creation and maintenance. There are many kinds, from free and simple blogs to "robust packages."
  • Librarians can use blogs for projects, training, scheduling, reference, and team management. Librarians can also encourage students to use blogs for their assignments or projects

"Jimmy Wales: How a Ragtag Band Created Wikipedia"

  • Wikipedia (WP) was a radical idea
  • How it works from the inside: Wikipedia is run by a nonprofit organization whose goal is to get Wikipedia to everyone in order to help them make better-informed decisions; that means reaching people beyond the Internet, which is why they chose a free licensing model so that anyone can copy and distribute it
  • Cost: $5,000 a year for bandwidth, plus one employee, Brian, the software developer
  • Quality: not perfect, and there are weaknesses, but much better quality than you would expect
  • How do they manage quality? Mostly through social policies, especially the neutrality rule: they don't argue about "truth" or "objectivity," and they get a lot more work done when they don't get bogged down in the controversial aspects of a topic
  • When something changes, the edit shows up on a changes page, and a page can be reverted to an earlier version, for example in cases of vandalism (see the sketch after this list)
  • Jimmy needs some control, but he sticks to the rules (as do the other people who are voted into positions of authority in the wiki community)
  • Jimmy stated that there is a widely held belief that all academics and teachers hate it, which he says is not true, but I think there is an overwhelming amount of evidence that this is still the case at many colleges
  • Wiki People Project: it may take 20 years or so to give people all over the world, in all kinds of social positions and at all learning levels, materials that they can use and understand (you can't give someone who never had the chance to go to high school an encyclopedia written at the college level and expect them to get something out of it)
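
To picture the changes-page and revert mechanism in the abstract, here is a toy sketch of a page that keeps its full revision history; this is an illustration only, not how MediaWiki actually stores pages.

```python
class WikiPage:
    """Toy wiki page that archives every revision, so any edit
    (for example, vandalism) can be rolled back to an earlier version."""

    def __init__(self, text=""):
        self.revisions = [text]          # revision 0 is the original

    def edit(self, new_text):
        self.revisions.append(new_text)  # every change is kept in the history

    @property
    def current(self):
        return self.revisions[-1]

    def revert(self, to_revision=0):
        """Restore an earlier revision by saving it as a new edit, so the
        vandalized version remains visible in the page history."""
        self.edit(self.revisions[to_revision])

page = WikiPage("Wikipedia is a free encyclopedia.")
page.edit("VANDALIZED!!!")
page.revert(0)                           # back to the original text
print(page.current)
```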

Muddiest Point: This has to do with this week's hands-on assignment. What is the big difference (in academic opinion) between CiteULike and del.icio.us?

Wednesday, November 12, 2008

Unit 11 Readings

"Dewey Meets Turing"

  • A more intriguing aspect of the DLI (Digital Libraries Initiative) was its ability to unite librarians and computer scientists
  • Computer scientists could see how the DLI projects could help current library functions move forward
  • Libraries saw the projects as a way to bring in needed funds
  • Libraries also saw that information technologies were important for having an impact on scholarly work and would raise the expertise of the library community to a level not seen before
  • The Web changed many of the DLI projects' plans: it blurred the distinction between consumers and producers of information, which undermined the common ground that had brought computer scientists and librarians together
  • For librarians, the web was much more difficult to integrate into their work
  • Contrary to some predictions about the WWW, the core functions of librarianship remain today because information still needs to be organized, collated, and presented
  • The accomplishments of the DLI have broadened opportunities for library science rather than marginalizing the field

"Digital Libraries"

  • There is a big difference between providing access to discrete sets of digital collections and providing digital services
  • Information providers designed enhanced gateway and navigation services to address these concerns
  • "Aggregate, virtually collocate, federate" became the mantra for digital library projects
  • Offshoots of the DLI projects led to things such as Google and the OAI-PMH

"Institutional Repositories"

  • Institutional Repository (IR)- a set of services that a university offers to the community for the management and dissemination of digital materials created by the institution and its community members
  • Represents a collaboration among librarians, IT, archives and record managers, faculty, and university administrators and policymakers
  • An essential prerequisite is preservability: the institution must be able to claim that materials are a permanent part of the collection
  • Unclear of what commitments are being made to preserve supplemental materials - but they should be part of the record
  • The IR is a complement to and a supplement for traditional publication venues
  • IR allow dissemination of materials and encourage the exploration of new forms of digital and scholarly communication
  • Need to guarantee both short- and long-term accessibility, which means continuing to adopt new, accessible forms of technology (updating often)
  • Cautions: IRs used as tools of control over intellectual work; IR information overload; and hastily built IRs, created without much real institutional commitment just to get the project done quickly

Muddiest Point: The IR article describes federation as being just in its infant stages, and the "Dewey Meets Turing" reading treats "federate" as part of the mantra for digital library projects. Where does the ability to federate digital materials stand today in the digital library realm?

Monday, November 10, 2008

Assignment Number 6

My Website:

http://littleelmquist.synthasite.com/

I had a few problems with the website: it wouldn't let me edit my 2600 page, so things like "easy acessible" should read "easily accessible" and "to" should be "two," but it gives me errors and won't allow me to touch it.

I didn't have a hard time uploading it with FileZilla, but just as it won't let me view the sample webpage on our assignment sheet, it won't let me view my own, telling me that I do not have permission to access the site from Pitt.

Wednesday, November 5, 2008

Unit 10 Readings

"Web Searching Engines: Part 1"



  • Search engines crawl and index around 400 terabytes of data
  • A full crawl would saturate a 10-Gbps network link for more than 10 days
  • The simplest crawling algorithm uses a queue of URLs, initialized with one or more "seed" URLs (see the sketch after this list)
  • But this simple method fetches only about one page per second, or 86,400 pages per day; at that rate it would take roughly 634 years to crawl 20 billion pages. Crawler parallelism is one solution, but on its own it is still not sufficient to achieve the necessary crawling rate, and it could bombard web servers and overload them
  • To make crawling more effective, a priority queue replaces the simple queue
  • Ranking depends heavily upon link information
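
To make the crawling bullets concrete, here is a minimal sketch of the simple seed-URL queue approach described above; the fetching and link-extraction details are toy placeholders, not the techniques from the article, and the seed URL is hypothetical.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
import re

def crawl(seed_urls, max_pages=50):
    """Toy frontier-queue crawler: start from seed URLs, fetch pages, and
    enqueue newly discovered links.  Real crawlers replace this FIFO queue
    with a priority queue, run many fetchers in parallel, and respect
    robots.txt and per-host politeness delays."""
    frontier = deque(seed_urls)      # the URL queue ("frontier")
    seen = set(seed_urls)            # never enqueue the same URL twice
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                 # skip pages that fail to fetch
        fetched.append(url)
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, href)
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched

# Example with a hypothetical seed:
# print(crawl(["https://example.com/"], max_pages=10))
```

At one fetch per second, a single loop like this is exactly the 86,400-pages-per-day crawler the article says would need centuries to cover the web, which is why real systems parallelize and prioritize the frontier.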

"Web Searching Engines: Part 2"

  • Search engines use an inverted file to rapidly identify indexing terms
  • An inverted file is a concatenation of the postings lists for each distinct term
  • Indexers create inverted files in two phases: scanning and inversion (see the sketch after this list)
  • For high-quality rankings, indexers store additional information in the postings
  • Search engines can reduce demands on disk space and memory by using compression algorithms
  • PageRank assigns different weights to links depending on the rank of the page the link comes from
  • Most search engines combine many factors, such as link popularity, spam score, click counts, etc.
  • To speed up query processing: skipping (for example, over very common words such as "and" and "the"); early termination; careful assignment of document numbers; and caching
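
As a concrete illustration of the inverted-file idea (a simplified sketch, not the data structures described in the article), here is a tiny indexer that builds postings lists in the two phases mentioned above and answers a conjunctive query:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Phase 1 (scanning): read each document and note (term, doc_id) pairs.
    Phase 2 (inversion): group the pairs into a postings list per term.
    Real indexers also store positions, frequencies, and other information
    in the postings to support high-quality ranking."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings[term].append(doc_id)   # postings list for this term
    return postings

def and_query(index, *terms):
    """Return the doc ids containing every query term (conjunctive query)."""
    results = [set(index.get(t.lower(), ())) for t in terms]
    return set.intersection(*results) if results else set()

docs = [
    "web search engines crawl the web",
    "an inverted file maps terms to documents",
    "search engines use an inverted file",
]
index = build_inverted_index(docs)
print(and_query(index, "inverted", "file"))   # -> {1, 2}
```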

"Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting"

  • Initially released in 2001; developed as a means to federate access to diverse e-print archives through metadata harvesting and aggregation (see the harvesting sketch after this list)
  • Mission is to "develop and promote interoperability standards that aim to facilitate the efficient dissemination of content."
  • Others have begun to use the protocol to aggregate metadata relative to their needs
  • Provides users with browsing and searching capabilities as well as accessibility to machine processing
  • The OAI world is divided into data providers (which maintain repositories) and service providers (which harvest the metadata)
  • Examples: the Sheet Music Consortium and the National Science Digital Library (NSDL has the broadest vision for OAI services)
  • Issues: completeness, searchability and browsability, and amenability to machine processing
  • Ongoing challenges: variations and problems in data provider implementations; problems with the metadata itself; and, third, a lack of communication between service and data providers
  • (interesting connection to the Dublin Core - controlled vocabularies will be more important as providers try to cope with all the data)
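
To show what metadata harvesting looks like in practice, here is a minimal sketch of a single OAI-PMH ListRecords request that pulls Dublin Core titles out of the response; the repository endpoint is hypothetical, and a real harvester would also follow resumptionToken elements, handle OAI error codes, and harvest incrementally with from/until dates.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"   # Dublin Core element namespace

def harvest_titles(base_url, metadata_prefix="oai_dc"):
    """Issue one OAI-PMH ListRecords request against a data provider and
    return the Dublin Core titles found in the response."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=30) as resp:
        tree = ET.parse(resp)
    return [el.text for el in tree.iter(f"{DC_NS}title")]

# Hypothetical repository endpoint, for illustration only:
# print(harvest_titles("https://repository.example.edu/oai")[:5])
```

A service provider such as the Sheet Music Consortium or NSDL is essentially running requests like this against many repositories and aggregating the harvested records into one searchable pool.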

"The Deep Web"

  • Traditional search engines do not probe beneath the surface - the deep web is hidden
  • Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct query, which makes them laborious to search by hand
  • BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multi-threading (see the sketch after this list)
  • Described as a "directed-query engine," it is the only search technology so far that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content
  • The deep web is about 500 times larger than the surface web, and 97.4% of deep websites are publicly available without restriction, yet they receive only about half the traffic of a surface website
  • Deep websites tend to return about 10% more documents than surface websites and nearly triple their quality
  • Quality = both the quality of the search and ability to cover the subject requested
  • Most important finding: there is a large amount of meaningful content that conventional search technology cannot discover, and a general lack of awareness that this content exists
  • If deep websites were easily searchable, users could make better judgments about the information
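
As a rough illustration of the directed-query idea (not BrightPlanet's actual technology), here is a sketch that sends the same query to several searchable databases on parallel threads and collects whatever each one returns; the source endpoints and their query parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.parse
import urllib.request

# Hypothetical deep-web sources: each is a searchable database that only
# returns content in response to a direct query.
SOURCES = [
    "https://catalog.example.edu/search",
    "https://archive.example.org/query",
    "https://stats.example.gov/find",
]

def direct_query(base_url, query):
    """Send one direct query to a single searchable database."""
    url = f"{base_url}?{urllib.parse.urlencode({'q': query})}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return base_url, resp.read()
    except Exception:
        return base_url, None            # source unreachable or no results

def directed_query(query):
    """Fan the query out to every source on its own thread, the way a
    directed-query engine would before qualifying, classifying, and
    organizing the results."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        return dict(pool.map(lambda src: direct_query(src, query), SOURCES))

# results = directed_query("institutional repositories")
```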

Muddiest Point: Why hasn't the deep web come to light sooner? Why haven't IS students and other tech-savvy students been using it before? Are there problems with BrightPlanet's search engine? The last article does not seem to address any problems with deep search engines.