Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011)

In conjunction with the 20th International World Wide Web Conference in Hyderabad, India.

March 28, 2011

News: WebQuality 2012 is planned to be held in conjunction with WWW2012


The objective of the workshop is to provide the research communities working on web spam, abuse, credibility, and reputation topics with a survey of current problems and potential solutions. It will present an opportunity for close interaction between practitioners who may have focused on more isolated sub-areas previously. We also want to gather crucial feedback for the academic community from participants representing major industry players on how web content quality research can contribute to practice.

On the one hand, the joint workshop will cover the more blatant and malicious attempts to degrade web quality, such as spam, plagiarism, and various forms of abuse, together with ways to prevent them or to neutralize their impact on information retrieval. On the other hand, it will also provide a venue for exchanging ideas on quantifying finer-grained issues of content credibility and author reputation, and on modeling them in web information retrieval.


Themes and Topics

The main themes of the workshop are evaluating the credibility of web information and identifying and combating qualitatively extreme content (and related behavior), such as spam. These themes encompass a large set of often-related topics and subtopics, listed below.

Assessing the credibility of content and people on the web and social media

    Measuring quality of web content
  • Information quality and credibility of web search results, on social media sites, of online mass-media and news, and on the Web in general
  • Estimation of information age, provenance, validity, coverage, and completeness or depth
  • Formation, change, and evolution of opinions
  • Sociological and psychological aspects of information credibility estimation
  • User studies of information credibility evaluation
    Uncovering distorted and biased content
  • Detecting disagreement and conflicting opinions
  • Detecting disputed or controversial claims
  • Uncovering distorted or biased, inaccurate or false information
  • Uncovering common misconceptions and false beliefs
  • Search models and applications for finding factually correct information on the Web
  • Comparing and evaluating online reviews and product or service testimonials
    Modeling author identity, trust, and reputation
  • Estimating authors' and publishers' reputation
  • Evaluating authors' qualifications and credentials
  • Transparent ranking/reputation systems
  • Author intent detection
  • Capturing personal traits and sentiment
  • Modeling author identity, authorship attribution, and writing style
  • Systems for managing author identity on the Web
  • Revealing hidden associations between authors, commenters, reviewers, etc.
    Role of groups and communities
  • Role of groups, communities, and invisible colleges in the formation of opinions on the Web
  • Social-network-based credibility evaluation
  • Analysis of information dissemination on the Web
  • Common cognitive or social biases in user behavior (e.g., herd behavior)
  • Credibility in collaborative environments (e.g., on Wikipedia)
    Multimedia content credibility
  • Detecting deceptive manipulation or distortion of images and multimedia
  • Hiding content in images
  • Detecting incorrect labels or captions of images on the Web
  • Detecting mismatches between online images and the represented real objects
  • Credibility of online maps

Fighting spam, abuse, and plagiarism on the Web and social media

    Reducing web spam
  • Detecting various types of search engine spam (e.g., link spam, content spam, or cloaking)
  • Uncovering social network spam (e.g., serial sharing and lobbying) and spam in online media (e.g., blog, forum, wiki spam, or tag spam)
  • Identifying review and rating spam
  • Characterizing trends in spamming techniques
    Reducing abuses of electronic messaging systems
  • Detecting e-mail spam
  • Detecting SPIT (spam over Internet telephony) and SPIM (spam over instant messaging)
    Detecting abuses in internet advertising
  • Click fraud detection
  • Measuring information credibility in online advertising and monetization
    Uncovering plagiarism and multiple-identity issues
  • Detecting plagiarism in general, and in web communities, social networks, and cross-language environments in particular
  • Identifying near-duplicate and versioned content of all kinds (e.g., text, software, image, music, or video)
  • High-similarity retrieval technologies (e.g., fingerprinting and similarity hashing)
    Promoting cooperative behavior in social networks
  • Monitoring vandalism, trolling, and stalking
  • Detecting fake friendship requests with spam intentions
  • Creating incentives for good behavior in social networks
  • User studies of misuse of the Web
    Security issues with online communication
  • Detecting phishing and identity theft
  • Flagging malware (e.g., viruses and spyware)
  • Web forensics

Other adversarial issues

  • Modeling and anticipating responses of adversaries to counter-measures
  • New web infringements
  • Web content filtering
  • Bypassing censorship on the Web
  • Blocking online advertisements
  • Reverse engineering of ranking algorithms
  • Stealth crawling

Accepted Papers

"Spam Detection in Online Classified Advertisements"
Hung Tran, Thomas Hornbeck, Viet Ha-Thuc, James Cremer, Padmini Srinivasan

"Improving Malicious URL Re-Evaluation Scheduling Through an Empirical Study of Malware Download Centers"
Kyle Zeeuwen, Matei Ripeanu, Konstantin Beznosov

"Got Traffic? An Evaluation of Click Traffic Providers"
Qing Zhang, Thomas Ristenpart, Stefan Savage, Geoffrey M. Voelker

"Web Information Analysis for Open-domain Decision Support: System Design and User Evaluation"
Takuya Kawada, Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Yutaka Leon, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

"Web Spam Classification: a Few Features Worth More"
Miklós Erdélyi, András Garzó, András A. Benczúr

"Characterizing the Uncertainty of Web Data: Models and Experiences"
Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

"Modeling and Evaluating Credibility of Web Applications"
Adriano Pereira, Sara Guimarães, Arlei Silva, Wagner Meira Jr.

Program

[8:30 - 10:00]

Invited Talk Session:

  • Assuring Data Trustworthiness - Concepts and Research Challenges

[10:00 - 10:30]

** Coffee and Tea Break **

[10:30 - 12:00]

Web Spam Session:

  • "Web Spam Classification: a Few Features Worth More"
  • "Spam Detection in Online Classified Advertisements"
  • "Improving Malicious URL Re-Evaluation Scheduling Through an Empirical Study of Malware Download Centers"

[12:00 - 13:30]

** Lunch Break **

[13:30 - 15:20]

Web Quality Session:

  • "Characterizing the Uncertainty of Web Data: Models and Experiences"
  • "Modeling and Evaluating Credibility of Web Applications"
  • "Got Traffic? An Evaluation of Click Traffic Providers"
  • "Web Information Analysis for Open-domain Decision Support: System Design and User Evaluation"

Invited Talk

Title: Assuring Data Trustworthiness - Concepts and Research Challenges

Speaker: Elisa Bertino

Abstract: Today, more than ever, there is a critical need for organizations to share data within and across organizational boundaries so that analysts and decision makers can analyze and mine the data and make effective decisions. However, for analysts and decision makers to produce accurate analyses, make effective decisions, and take action, the data must be trustworthy. It is therefore critical to investigate data trustworthiness issues, which also include data quality, provenance, and lineage, for organizational data sharing, situation assessment, multi-sensor data integration, and numerous other functions that support decision makers and analysts. Providing trustworthy data to users is an inherently difficult problem that requires articulated solutions combining different methods and techniques. In the talk, we will first elaborate on the data trustworthiness challenge and discuss a trust fabric framework to address it. The framework is centered on the need for trustworthiness and risk management for decision makers and analysts, and includes four key components: identity management, usage management, provenance management, and attack management. We will then present an initial approach for assessing the trustworthiness of streaming data and discuss open research directions.

Short bio: Elisa Bertino is a professor in the Department of Computer Sciences at Purdue University and Research Director of CERIAS. Her main research interests cover many areas in the fields of information security and database systems. Her research combines theoretical and practical aspects and also addresses applications in a number of domains, such as medicine and the humanities. She is co-editor-in-chief of the VLDB Journal and currently serves on the editorial boards of several international journals, including ACM Transactions on Information and System Security, IEEE Internet Computing, IEEE Security & Privacy, and Acta Informatica.


Organizers:
Carlos Castillo (Yahoo! Research)
Zoltan Gyongyi (Google Research)
Adam Jatowt (Kyoto University)
Katsumi Tanaka (Kyoto University)

PC Members:
James Caverlee (Texas A&M University)
Gordon Cormack (University of Waterloo)
Matt Cutts (Google)
Brian Davison (Lehigh University)
Dennis Fetterly (Microsoft)
Andrew Flanagin (University of California, Santa Barbara)
Miriam Metzger (University of California, Santa Barbara)
Andrew Tomkins (Google)
Masashi Toyoda (University of Tokyo)
Steve Webb (Georgia Institute of Technology)
Min Zhang (Tsinghua University)
Xiaofang Zhou (University of Queensland)



Email: adam [at] dl [dot] kuis [dot] kyoto-u [dot] ac [dot] jp
Phone and Fax: +81-75-753-5909