Yale Access to Knowledge Conference

Boing Boing mentions Yale’s upcoming Access to Knowledge conference (April 27-29) addressing policy issues raised by new IT developments. Remote participation via the A2K Wiki is encouraged.

We now have the ability to easily share knowledge with everyone in the world. I have talked a bit about the problematic transition from a closed information ecosystem to an open model — the most pressing issue in college IT. We’re looking for ways to preserve the expertise that academia has accumulated and which, to a large extent, has been encoded in professional culture.

Ironically, the principle of free access to knowledge, and the practices that support it, developed only within closed-access institutions. The project now is to decode those practices into explicit policies and put our money where our mouth is. Naturally, these new policies run against the grain of some of the protectionist policies of the closed model.

This is especially true of the law, e.g. “intellectual property”, where educational institutions have special status: we need to make sure that leveling the playing field means increasing protections for the public rather than decreasing protections for educational institutions. A similar reevaluation is going on in many different areas. Do bloggers deserve the same First Amendment protections as professional and institutional journalists? (See EFF’s Bloggers’ Rights resources.) Do publishers have the right to control all copying of their work? (See Lawrence Lessig’s Free Culture.)

In each case, a deal was struck at some point in the past that gave rights to a limited group of people. Now that the tools are available to all, we have to revisit that deal and see whether the limitations on the group were a key factor in striking the balance or simply a historical accident. We probably do need to expand our concept of a free press to include bloggers. As with other First Amendment rights, the more speech the better. Copyright, on the other hand, probably should not be extended to cover the majority of use of creative works. Historically, non-commercial use was generally unregulated; the absolute power of publishers over their work was limited by its scope.

New technology has shifted the balance in a wide range of areas, and now we need to renegotiate the policy deals. The A2K Wiki provides a good overview of these areas and some policy directions.


Laura Quilter Copyright Talk

Our Library Copyright group was addressed by librarian, copyright attorney, and researcher Laura Quilter today. She did a great job of giving the group a background on copyright law and how it affects college libraries. She also answered our questions about how we currently try to avoid copyright infringement. I think it went well — participants, please add your comments! Thanks to Sha Fagan and Julie Auster for making Laura’s visit possible.

Here are Laura’s PowerPoint slides, so I won’t summarize the whole talk. But I did come away with some interesting points that I don’t think were explicit in her notes:

  • Because copyright is privately enforced (there are no copypolice), copyright holders have an incentive to be as aggressive as possible. Therefore their vehemence doesn’t reflect, and shouldn’t be used to measure, the merits of their claims.
  • Copyright is a quasi-property right. Property rights have powerful imagery and we should beware carrying all the implications of property across an analogy to “intellectual property”.
  • Restricting access to on-campus users probably keeps us safe within “educational purposes”, which are specially protected under copyright law.
  • Forwarding an email, if you don’t get permission first, is a prima facie copyright infringement. We engage in many practices that could get us in trouble. We should not concern ourselves with eliminating all behavior that could conceivably lead to a lawsuit, but with reducing the amount and severity of risk.
  • Educational institutions making good-faith efforts to respect copyright can have infringement damages waived and simply be enjoined from the infringing use, if infringement is found. Given the privileged status of education in copyright law, together we are in a very strong legal position.
  • Every effort we make to comply is a “plus point” to be considered with the other circumstances of possible infringement. The important thing is that we make a good-faith effort to respect copyright.
  • Thinking through our policy is one of the best ways to bolster our case, should we get in trouble.

See also: Copyright Resources


Social Search

Found a neat summary of social search by Arnaud Fischer. Web technology has gone through a few distinct phases. First (early-to-mid 1990s) came digitizing and hyperlinking information, making its interconnectedness literal. Second, Google (1998) revolutionized search: you no longer need to know where information is in order to get it. But, as I’ve previously posted, there are benefits to cataloging information rather than just sifting through an undifferentiated mess. It seems that any algorithm less complex than an intelligent agent is not only less effective at finding good results but also susceptible to manipulation.

Throughout the past decade, a search engine’s most critical success factors – relevance, comprehensiveness, performance, freshness, and ease of use – have remained fairly stable. Relevance is more subjective than ever and must take into consideration the holistic search experience one user at a time. Inferring each user’s intent from a mere 2.1 search terms remains at the core of the relevance challenge.

Social search addresses relevance head-on. After “on-the-page” and “off-the-page” criteria, web connectivity and link authority, relevance is now increasingly augmented by implicit and explicit user behaviors, social networks and communities.

Attempts to literally harness human judgement to do the work of a cataloging engine (see the Mechanical Turk) don’t scale to internet proportions. What we need is a way to collect social information without imposing a burden on users. Some sites have succeeded in providing a platform where users freely contribute linked content, e.g. Wikipedia, and some have further gotten their users to add cataloging information — e.g. YouTube’s tags. Visualizations like tag clouds make these implementations easier to use, but no deeper. And they still require intentional effort, and therefore goodwill and trust — two of the least scalable human resources.
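Part of what makes tag-based cataloging so lightweight is that the aggregation is trivial. As a rough sketch (the tags here are made up, and any real site’s weighting scheme will differ), a tag cloud is little more than a frequency count mapped onto font sizes:

```python
from collections import Counter

# Hypothetical user-supplied tags for a handful of videos
tags = ["lecture", "physics", "lecture", "demo", "physics", "lecture"]

counts = Counter(tags)
lo, hi = min(counts.values()), max(counts.values())

def font_size(n, smallest=10, largest=24):
    """Scale a tag's count linearly onto a font-size range (in points)."""
    if hi == lo:
        return smallest
    return round(smallest + (largest - smallest) * (n - lo) / (hi - lo))

cloud = {tag: font_size(n) for tag, n in counts.items()}
print(cloud)  # {'lecture': 24, 'physics': 17, 'demo': 10}
```

The catch, of course, is not the computation but getting users to supply the tags in the first place.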

I fundamentally agree with Fischer’s conclusion that using self-organizing groups to generate consensus is a much better way to measure relevance. The big question is how to balance the needs of social data collection with the freedom of association that it depends on. The public, machine-readable nature of most web forums amplifies any chilling effect into a snowstorm of suppression. Further, when a web service becomes popular, there is a strong temptation to monetize that popularity with ads, sponsored content, selling user information to marketers, etc. That background probably skews the opinions expressed in that forum, and by extension, the extractable metadata.

But even more fundamentally, there is something different about web discourse that makes participants espouse opinions they normally wouldn’t. Try researching software on a users’ forum: many seemingly sincere opinions are based not on first-hand experience but are really meta-opinions about what the consensus answer is. In fact, I bet most posters are sincere — I have caught myself doing this. This reification is what actually creates consensus, and having posted on the accepted side of an issue recursively increases a poster’s reputation. There is no check on this tendency because we lack the normal social status indicators online. I would bet that posters regress toward the mean opinion much faster than in offline discourse. Any social scientists out there want to test this?
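For what it’s worth, the regression-toward-consensus hypothesis is easy to render as a toy simulation. This is purely illustrative — the conformity parameter and the uniform opinion model are invented, not measured — but it shows the mechanism: the more each poster echoes the visible consensus, the tighter the stated opinions cluster.

```python
import random

random.seed(42)

def simulate(conformity, n_posters=200):
    """Each poster states a blend of a private opinion (uniform on [0, 1])
    and the running mean of what has already been posted."""
    posted = [random.random()]  # the first poster has no consensus to echo
    for _ in range(n_posters - 1):
        private = random.random()
        consensus = sum(posted) / len(posted)
        posted.append((1 - conformity) * private + conformity * consensus)
    return posted

def spread(xs):
    """Population variance: how widely the stated opinions scatter."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Higher conformity -> much smaller spread of stated opinions
print(spread(simulate(0.0)), spread(simulate(0.8)))
```

A real study would need actual forum data, but even this sketch shows how quickly a visible running consensus can swamp private opinion.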

Speaking of which, there are already many empirically supported social biases which affect our judgements and opinions. Are we ready for Search by Stereotype? The tyranny of the majority and the tragedy of the commons are as dangerous to the freedom of (digital) association as government suppression or market influences.

The web as it currently stands, the non-semantic web, is largely non-judgemental in its handling of information. Divergent opinions have equal opportunity to be tagged, linked, and accepted. This non-function is actually a feature. Before we trust online consensus to generate relevance measurements for our social search engines, we need to understand and try to compensate for its biases.


Learning 2.0

Wired runs a story about Learning 2.0, a self-directed, web-based learning tool. It was designed for the Public Library of Charlotte and Mecklenburg County to help their librarians get more web-savvy.

Using tools is the best way to learn them, and this project teaches about web 2.0 technologies by using those technologies. It gives you a nice list of tasks like starting a blog, using Technorati, and playing with YouTube. It’s not just for librarians — this looks like a great way to get familiar with new web tech for anyone who feels a little left behind by some of this new stuff.


GPL v.3 Draft Released

The new version of the GNU General Public License (GPL) moved one step closer to completion today with the release of the third draft for public discussion. The GPL is the basic legal tool by which free software is kept free, and it has held up for many years (v.1 was released in 1989). But recent developments in the commercialization of open source software have exposed some loopholes and weaknesses in the license.

Richard Stallman, president of the FSF and principal author of the GNU GPL, said, “The GPL was designed to ensure that all users of a program receive the four essential freedoms which define free software. These freedoms allow you to run the program as you see fit, study and adapt it for your own purposes, redistribute copies to help your neighbor, and release your improvements to the public. The recent patent agreement between Microsoft and Novell aims to undermine these freedoms. In this draft we have worked hard to prevent such deals from making a mockery of free software.”

Incorporating public and expert comments on previous drafts (the second draft was released in July 2006), this latest draft is open to comment for 60 days and then will be made official 30 days later.

For background on the Microsoft/Novell agreement, see Novell’s take vs. the protest. See also: a nice allegory explaining the situation.


Drawbacks of Multitasking

I took a break from IMing with a coworker and turned down my music for a minute to read this NYTimes piece about multitasking. I didn’t get through the whole thing because I got an email but I gather it’s about the downsides of trying to concentrate on several things at once.

The point is well taken when any one task, such as driving, requires quick reaction time. Our brains can only afford enough resources to concentrate on one thing at a time. On the other hand, that email can wait a minute while I surf the web. And newer communications methods like IM tolerate asynchrony better.

Both the technology and the social protocol expect up-to-the-minute, but not up-to-the-second, updates. When using these new technologies, we do sacrifice real-time responsiveness, but in return we get multiple collaborative modalities of near-real-time communication.


Lessig: Make Way for Copyright Chaos

Lawrence Lessig comments on the Viacom v. YouTube lawsuit (see my previous post) in this NY Times editorial. Aside from the question of whether the DMCA is a good law, Lessig’s point is that asking the courts to reinterpret the “safe harbor” clause at this point is inappropriate and will lead to years of chilling uncertainty.

The DMCA was basically a huge gift to the content industry, allowing it to keep its old business model in the face of technological obsolescence. Now even those sweeping legal subsidies are not enough. The content industry wants to shift the burden of policing copyright infringement onto service providers (which, by the way, include educational institutions).

The scary thing is that given the legal, social, and rhetorical circumstances, such judicial burden shifting is not only possible but has recent precedent — the Grokster case.

[B]y setting the precedent that the court is as entitled to keep the Copyright Act “in tune with the times” as Congress, it has created an incentive for companies like Viacom, no longer satisfied with a statute, to turn to the courts to get the law updated. Congress, of course, is perfectly capable of changing or removing the safe harbor provision to meet Viacom’s liking. But Viacom recognizes there’s no political support for the change it wants. It thus turns to a policy maker that doesn’t need political support — the Supreme Court.


Crowdsourcing Journalism

Wired and NewAssignment.net have started a cool new open collaborative journalism project called Assignment Zero. “Crowdsourcing” is the method, originated by open source software developers, of freely collaborating on projects; SourceForge is a good example of software development by self-organized teams of interested participants. Using web technology, that method has quickly spread to other subject areas, and it is especially useful in journalism, where the audience collectively usually knows more than the writer.

Assignment Zero will take crowdsourcing as its model and its first topic. They built an attractive and functional-looking web platform for the project and already have several leads and next steps outlined. Anyone who is interested can join and work on a subtopic; they suggest that teachers can assign their classes to a chunk of the story. There are still editors but what is investigated will not be centrally controlled. The contents will be under a Creative Commons license rather than owned.

One of the most interesting aspects of crowdsourcing is how social dynamics take over in the absence of explicit power relations among the team members. There is definitely less coercion involved in, say, a SourceForge project than within a commercial software development company. In journalism, there is fairly little focus (at least explicitly) on the story as a solution to an engineering problem; rather journalists still think of themselves as exposing the truth. I wonder how this idea will change when the story itself becomes radically dependent upon a diversity of subjectivities.

I’m very curious what Assignment Zero will come up with. Whether the result is good or bad, something will be learned about the workings of the process, laying the groundwork for a whole new way of clarifying and putting together our collective knowledge.


Data Explosion Continues

Wired runs this AP piece about the latest estimate of the amount of digital data on Earth: we’re up to 161 exabytes! (161 billion gigabytes.) The previous best estimate was 5 exabytes in 2003, so data has increased roughly 32-fold in four years. Pretty impressive — and impossible to catalog.
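The implied growth rate is worth spelling out. Taking the two estimates at face value, 32-fold growth over four years compounds to a doubling time of under ten months:

```python
import math

old, new, years = 5, 161, 4  # exabytes: 2003 estimate vs. 2007 estimate

growth = new / old                         # ~32x overall
annual = growth ** (1 / years)             # compound annual growth factor
doubling_months = 12 * math.log(2) / math.log(annual)

print(f"{growth:.0f}x total, {annual:.1f}x per year, "
      f"doubling every {doubling_months:.1f} months")
```

In other words, by the time any comprehensive catalog of the world’s data was finished, the data would have doubled.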

Of course, that curve should level off somewhat as more fields complete their transition to digital. We can be fairly confident that some next-generation storage technology (e.g. holographic storage) will keep us from running out of physical room for all this data. But just as we have found that you can’t just throw computers at a knowledge problem, you also can’t just throw storage at a data problem.

Not every piece of data is suitable to be posted on the web where Google can crawl and index it. Not every search needs an algorithmic best guess based on keywords, incoming links, tags, or user profiling. How we search and catalog our 161 exabytes of data is the next big question… and getting bigger.


Supreme Court Considers Software Patents

Ars Technica covers the latest software patent case, Microsoft v. AT&T, which concerns liability for patent infringement when software is distributed outside the United States. At stake is the legal status of software: is it more like machines or more like math?

The Ars piece links to an amicus brief filed by the Software Freedom Law Center:

“In contrast to the Federal Circuit, the Supreme Court has maintained limits on patentable subject matter throughout U.S. history,” said Eben Moglen, Executive Director of SFLC. “The Supreme Court has consistently ruled that algorithms and mathematics cannot be patented. Since software is expressed as mathematical algorithms, it should not be patentable.”