Submission

iUnit Retrieval Subtask

Input

The input of the iUnit retrieval subtask is a TSV file that contains a list of queries:

  • MC1-E-Queries.tsv (for English), or
  • MC1-J-Queries.tsv (for Japanese)

Each line of these query files has the following format:

[qid][TAB][query]

where [qid] is a query ID and [query] is the query text.

For example:

MC1-E-0001[TAB]marlon brando acting style
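A query file can be loaded into a dictionary keyed by [qid]. The following is a minimal Python sketch. It assumes that query files are UTF-8 encoded like the run files. The names parse_queries and read_query_file are hypothetical helpers, not part of any distributed toolkit.

```python
from typing import Dict, Iterable


def parse_queries(lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to its query text ("[qid] TAB [query]" per line)."""
    queries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines such as a trailing empty line
        # Split on the first tab only; a line without a tab raises ValueError.
        qid, query = line.split("\t", 1)
        queries[qid] = query
    return queries


def read_query_file(path: str) -> Dict[str, str]:
    # Assumption: query files are UTF-8 encoded, like the run files.
    with open(path, encoding="utf-8") as f:
        return parse_queries(f)
```

For example, read_query_file("MC1-E-Queries.tsv") would return a mapping such as {"MC1-E-0001": "marlon brando acting style", ...}.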

Output

The output of the iUnit retrieval subtask must be stored in a single TSV file for each system.

File Name Format

The filename must be

RET-[teamID]-[language]-[runtype]-[integer].tsv

where

  • [teamID] is your registered team ID, e.g. MSRA
  • [language] is either E (English) or J (Japanese)
  • [runtype] is the identifier of a run type and must be either “MAND” (MANDATORY) or “OPEN”.
    • MANDATORY: Organizers will provide baseline search results and their page contents for each query. Participants must generate the list of iUnits using only these contents. Note that any data resources may be used for estimating the importance of each iUnit.
    • OPEN: Participants may choose to search the live web on their own to generate a list of iUnits. Any run that extracts iUnits from at least some privately obtained web search results is considered an OPEN run, even if it also uses the baseline data.
  • [integer] is a unique integer for each of a team’s runs, starting from 1, and it represents the priority of the run file. If the organizers do not have enough resources to evaluate all the submissions, run files with a smaller [integer] will be evaluated first. Note that at least one MANDATORY run, namely the team’s highest-priority MANDATORY run, will be evaluated regardless of its [integer].

Some example run names for a team “MSRA” would be:

RET-MSRA-E-MAND-1.tsv
RET-MSRA-E-OPEN-2.tsv
RET-MSRA-E-MAND-3.tsv
RET-MSRA-E-OPEN-4.tsv
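You can check run file names before submission. The following sketch assumes two things the guidelines do not state: team IDs contain only ASCII letters and digits, and [integer] has no leading zeros. The names parse_run_name and evaluation_order are hypothetical helpers. evaluation_order only sorts one team's runs by [integer]. It does not model the rule that the highest-priority MANDATORY run is always evaluated.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: team IDs are ASCII letters/digits and [integer] has no leading zeros.
RUN_NAME = re.compile(
    r"RET-(?P<team>[A-Za-z0-9]+)-(?P<lang>[EJ])-(?P<runtype>MAND|OPEN)-(?P<priority>[1-9][0-9]*)\.tsv"
)


class RunName(NamedTuple):
    team: str
    lang: str
    runtype: str
    priority: int


def parse_run_name(filename: str) -> Optional[RunName]:
    """Return the parsed fields, or None if the name does not follow the format."""
    m = RUN_NAME.fullmatch(filename)
    if m is None:
        return None
    return RunName(m["team"], m["lang"], m["runtype"], int(m["priority"]))


def evaluation_order(filenames: List[str]) -> List[str]:
    """Sort one team's run files by [integer], smallest (highest priority) first."""
    parsed = [(parse_run_name(name), name) for name in filenames]
    invalid = [name for run, name in parsed if run is None]
    if invalid:
        raise ValueError(f"invalid run file names: {invalid}")
    priorities = [run.priority for run, _ in parsed]
    if len(set(priorities)) != len(priorities):
        raise ValueError("[integer] must be unique across a team's runs")
    return [name for run, name in sorted(parsed, key=lambda pair: pair[0].priority)]
```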

File Content Format

All run files should be encoded in UTF-8. [TAB] is used as the separator.

Each run file begins with exactly one system description line, which should be in the following format:

SYSDESC[TAB][brief one-sentence system description in English]

Please make sure the description text does not contain a newline character.

Below the system description line, there must be one output line for each iUnit. Each output line contains a query ID, an iUnit for the query, the score of the iUnit, and the knowledge source of the iUnit. The required format is:

[qid][TAB][iUnit][TAB][score][TAB][source]

[score] is the score of the iUnit. The iUnits for each query are sorted by this score in descending order. If your system does not assign scores to iUnits, please use the reciprocal rank of each iUnit (1/rank) as its score.
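A system may produce only an ordered list of iUnits. In that case, the reciprocal-rank fallback assigns the score 1/i to the i-th iUnit. The following is a minimal sketch; the function name is hypothetical.

```python
from typing import List, Tuple


def reciprocal_rank_scores(ranked_iunits: List[str]) -> List[Tuple[str, float]]:
    """Pair each iUnit, given best first, with the score 1/rank (rank starts at 1)."""
    return [(iunit, 1.0 / rank) for rank, iunit in enumerate(ranked_iunits, start=1)]
```

Because these scores strictly decrease, sorting by score in descending order reproduces the system's original ranking.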

[source] is the knowledge source from which you extracted the iUnit. It will be used to investigate what kinds of knowledge sources the participating teams have utilized. [source] must be either a URL (for OPEN runs) or a filename (for MANDATORY runs).

The order of the output lines is interpreted as a ranking of iUnits. For example, the evaluation system interprets the lines below as ranking x higher than y.

MC1-E-0001  x   0.9 http://example.com/index1.html
MC1-E-0001  y   0.8 http://example.com/index2.html
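The following Python sketch writes a retrieval run file. It emits the SYSDESC line first, followed by one tab-separated line per iUnit. Within each query, the lines are sorted by descending score. Queries are written in the order they appear in the input dictionary. The function names and the Record layout are hypothetical. The sketch rejects any field containing a tab or line break, because such a field would break the TSV format.

```python
from typing import Dict, List, Tuple

# One output record: (iUnit text, score, source URL or filename).
Record = Tuple[str, float, str]


def _check_field(field: str) -> str:
    # A tab or line break inside a field would corrupt the TSV layout.
    if any(ch in field for ch in "\t\r\n"):
        raise ValueError(f"field contains a tab or line break: {field!r}")
    return field


def format_run(sysdesc: str, results: Dict[str, List[Record]]) -> str:
    """Render a run: one SYSDESC line, then one line per iUnit."""
    lines = [f"SYSDESC\t{_check_field(sysdesc)}"]
    for qid, records in results.items():
        # Within each query, emit iUnits in descending score order, since the
        # line order is what the evaluation treats as the ranking.
        for iunit, score, source in sorted(records, key=lambda r: r[1], reverse=True):
            lines.append("\t".join([qid, _check_field(iunit), str(score), _check_field(source)]))
    return "\n".join(lines) + "\n"


def write_run(path: str, sysdesc: str, results: Dict[str, List[Record]]) -> None:
    # Run files must be UTF-8; newline="\n" avoids platform-specific line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(format_run(sysdesc, results))
```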

iUnit Summarization Subtask

Input

The input of the iUnit summarization subtask is the query file described for the iUnit retrieval subtask:

  • MC1-E-Queries.tsv (for English), or
  • MC1-J-Queries.tsv (for Japanese)

Participating teams can use a list of iUnits distributed by the organizers:

  • MC1-E-iUnits.tsv (for English), or
  • MC1-J-iUnits.tsv (for Japanese)

The format of these files is the same as that of the output of the iUnit retrieval subtask.
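Because the distributed iUnit files share the run-file format, they can be parsed line by line and grouped by query. The following sketch makes one assumption: the distributed files may or may not begin with a SYSDESC line, so the parser skips that line if present. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, str]  # (iUnit text, score, source)


def parse_iunits(lines: Iterable[str]) -> Dict[str, List[Record]]:
    """Group "[qid] TAB [iUnit] TAB [score] TAB [source]" lines by query ID, in file order."""
    by_query: Dict[str, List[Record]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and, if present, the SYSDESC header line.
        if not line or line.startswith("SYSDESC\t"):
            continue
        qid, iunit, score, source = line.split("\t")
        by_query[qid].append((iunit, float(score), source))
    return dict(by_query)


def read_iunit_file(path: str) -> Dict[str, List[Record]]:
    # Assumption: the distributed files are UTF-8 encoded, like the run files.
    with open(path, encoding="utf-8") as f:
        return parse_iunits(f)
```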

Output

The output of the iUnit summarization subtask must be stored in a single XML file for each system.

File Name Format

The filename must be

SUM-[teamID]-[language]-[runtype]-[integer].xml

The variables [teamID], [language], and [integer] are defined as in the file name format of the iUnit retrieval subtask.

[runtype] is the identifier of a run type and must be either “MAND” (MANDATORY) or “OPEN”.

  • MANDATORY: Participants must generate summaries using only the iUnit list distributed by the organizers. Note that any data resources may be used for estimating the importance of each iUnit.
  • OPEN: Participants may choose to search the live web on their own to generate summaries. Any run that uses contents from at least some privately obtained web search results is considered an OPEN run, even if it also uses the baseline data.

Some example run names for the team “MSRA” would be:

SUM-MSRA-E-MAND-1.xml
SUM-MSRA-E-OPEN-2.xml
SUM-MSRA-E-MAND-3.xml
SUM-MSRA-E-OPEN-4.xml

File Content Format

The output of the iUnit summarization subtask consists of X-strings. For each query, the system outputs one X-string, which is a two-layered summary as defined in the MobileClick task.

  • The XML file includes a [results] node as the root node.
  • The [results] node contains exactly one [sysdesc] node. Please briefly describe your system in English here.
  • The [results] node also contains [result] nodes, each of which corresponds to an X-string and has a [qid] attribute.
  • A [result] node contains one [firstlayer] node and zero or more [secondlayer] nodes.
  • The [firstlayer] node contains text and [link] nodes. Each [link] node represents a link to a [secondlayer] node, like an ‘a’ tag in HTML.
  • A [link] node has an attribute [id], which specifies the [secondlayer] node to be linked.
  • Each [secondlayer] node has an attribute [id] and contains text.
  • NOTE: The length of text in the first layer is limited to L characters, and the length of text in each second layer is also limited to L characters. L is 280 characters for the English iUnit Summarization Subtask and 140 characters for the Japanese iUnit Summarization Subtask. Symbols (such as ‘,’ and ‘(’) are not counted. Text exceeding the limit will be truncated in evaluation.
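Before submission, you can estimate each layer's length under the rule above. The following sketch makes an assumption: a character counts toward L only if Python's str.isalnum() is true for it. Under this rule, punctuation and whitespace are not counted, while Japanese kana and kanji are. The official evaluation may define "symbols" differently. The helper names are hypothetical. truncate_to_limit cuts the text just before the first character that would exceed the limit.

```python
# Limits L per language, as stated in the guidelines.
LIMITS = {"E": 280, "J": 140}


def counted_length(text: str) -> int:
    """Number of characters assumed to count toward L (letters and digits only)."""
    return sum(1 for ch in text if ch.isalnum())


def truncate_to_limit(text: str, limit: int) -> str:
    """Cut text right before the (limit + 1)-th counted character."""
    counted = 0
    for i, ch in enumerate(text):
        if ch.isalnum():
            if counted == limit:
                return text[:i]
            counted += 1
    return text
```

For example, a first layer passes the English limit under this assumption if counted_length(text) <= LIMITS["E"].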

The DTD is shown below:

 
 <!ELEMENT results (sysdesc, result*)>
 <!ELEMENT sysdesc (#PCDATA)>
 <!ELEMENT result (firstlayer, secondlayer*)>
 <!ELEMENT firstlayer (#PCDATA | link)*>
 <!ELEMENT secondlayer (#PCDATA)>
 <!ELEMENT link (#PCDATA)>
 <!ATTLIST result qid ID #REQUIRED>
 <!ATTLIST link id CDATA #REQUIRED>
 <!ATTLIST secondlayer id CDATA #REQUIRED>
 

Please look at the example below:

 
  <results>
    <sysdesc>This result comes from the sample file</sysdesc>
    <result qid="MC-SAMPLE-E-0001">
      <firstlayer>
        Marlon Brando brought the techniques of method acting to prominence along with his
        Stanislavski System training and improvisational skills.
        <link id="1">Notable Related Films</link>
        <link id="2">Effects on Others</link>
      </firstlayer>
      <secondlayer id="1">
        He brought method acting to prominence in the films A Streetcar Named Desire and On the
        Waterfront, both directed by Elia Kazan in the early 1950s. Brando was voted the Academy
        Award for Best Actor for his intelligent performance; once again, he improvised
        important details that lent more humanity to what could otherwise have been a cliched role.
      </secondlayer>
      <secondlayer id="2">
        His acting style, combined with his public persona as an outsider uninterested in the
        Hollywood of the early 1950s, had a profound effect on a generation of actors, including
        James Dean and Paul Newman, and later stars, including Robert De Niro.
      </secondlayer>
    </result>
    ...
  </results>
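Building the XML with a library guarantees that the output is well-formed and properly escaped. The following sketch uses Python's standard xml.etree.ElementTree module. The names add_result and new_results, and the representation of the first layer as a list of plain-text parts and (link id, anchor text) pairs, are hypothetical. Text that follows a link is stored in that link's tail, which is how ElementTree represents mixed content. The sketch also rejects links with no matching [secondlayer]. It does not check lengths or validate against the DTD.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

# A first-layer part is either plain text or a (link id, anchor text) pair.
Part = Union[str, Tuple[str, str]]


def new_results(sysdesc: str) -> ET.Element:
    """Create the <results> root with its single <sysdesc> child."""
    results = ET.Element("results")
    ET.SubElement(results, "sysdesc").text = sysdesc
    return results


def add_result(results: ET.Element, qid: str,
               first_layer: List[Part], second_layers: Dict[str, str]) -> None:
    """Append one <result> (an X-string) to the <results> root."""
    result = ET.SubElement(results, "result", qid=qid)
    first = ET.SubElement(result, "firstlayer")
    last_link = None  # text after a <link> goes into that link's .tail
    for part in first_layer:
        if isinstance(part, str):
            if last_link is None:
                first.text = (first.text or "") + part
            else:
                last_link.tail = (last_link.tail or "") + part
        else:
            link_id, anchor = part
            if link_id not in second_layers:
                raise ValueError(f"link {link_id!r} has no matching secondlayer")
            last_link = ET.SubElement(first, "link", id=link_id)
            last_link.text = anchor
    for layer_id, text in second_layers.items():
        ET.SubElement(result, "secondlayer", id=layer_id).text = text
```

To write the finished tree as UTF-8 with an XML declaration, call ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True).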