Re: trecvid: other feature evaluation
- Subject: Re: trecvid: other feature evaluation
- From: "John R Smith" <jsmith@us.ibm.com>
- Date: Thu, 28 Mar 2002 10:09:24 -0500
- Content-type: text/plain; charset=us-ascii
Hi Paul,
Thank you for your feedback. Please again see my comments below.
Thanks,
John
---------------------------------------------------
Manager, Pervasive Media Management
IBM T. J. Watson Research Center
30 Saw Mill River Road
Hawthorne, NY 10532
(914) 784-7320; jrsmith@watson.ibm.com
---------------------------------------------------
From: Paul Over <over@nist.gov>
Sent by: trecvid@nist.gov
Date: 03/28/2002 08:27 AM
Please respond to: trecvid
To: Multiple recipients of list <trecvid@nist.gov>
Subject: trecvid: other feature evaluation
John,
Follow-up...
Rereading your suggestion, it seems clear you're NOT thinking
about evaluation using an annotated subcollection. Right?
(Feature definitions will of course still be needed.)
[JRS: We are not thinking about detector evaluation using annotated
sub-collections, but rather about something tied into the overall
retrieval exercise.]
When you say "Given enough submissions, we could evaluate precision-recall
of the detection results using a pooling method.", do you mean
(assuming shared shot boundaries to enable pooling) that NIST assessors
should check each unique shot-feature-value submission for correctness?
[JRS: We should assume shared shot boundaries. By pooling, I mean it in
the traditional TREC sense: the top detection results from the
contributing systems (assuming the detector results can be ranked, e.g.,
by confidence score) would be combined to determine the ground truth.
This would then be used to give an estimate of precision vs. recall for
each of the contributing systems. Given priorities, allowing exchange of
detection results between participants before the retrieval exercise is
completed is most important. Scoring of detection results would be nice
to have if it can be done cheaply and with minimal human intervention.]
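To make the pooled-ground-truth idea concrete, here is a rough Python
sketch. The pooling depth K, the data shapes, and all names are
illustrative assumptions, not anything settled in this thread:

    # Sketch of TREC-style pooling for feature-detection results.
    # Hypothetical data shapes: submissions maps each system name to a
    # list of shot IDs ranked by detector confidence; "relevant" is the
    # set of pooled shots judged to truly contain the feature.

    K = 100  # pooling depth (an assumed value, not a settled number)

    def build_pool(submissions):
        # Union of the top-K shots from every contributing system;
        # this is the set that would be judged to form ground truth.
        pool = set()
        for ranked_shots in submissions.values():
            pool.update(ranked_shots[:K])
        return pool

    def precision_recall(ranked_shots, relevant):
        # Score one system's top-K results against the judged truth.
        retrieved = ranked_shots[:K]
        hits = sum(1 for shot in retrieved if shot in relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

Given submissions = {"sysA": [...], "sysB": [...]}, build_pool() yields
the shots to judge, and precision_recall() then scores each system
against whatever subset of the pool is judged true.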
I ask because this may be a problem for NIST, since no assessor time or
assessment tools have been budgeted for this. I had been thinking about
adding additional feature information to the shot annotation - something
we have someone to help with.
[JRS: The above pooling method can be done automatically, without human
assessment, if necessary - again, given enough submissions.]
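One way to run that fully automatically (a sketch only; the
majority-vote rule and threshold below are assumptions, not anything
proposed above) is to treat a pooled shot as a true instance of the
feature whenever enough systems rank it in their top-k:

    def consensus_truth(submissions, k=100, min_votes=None):
        # Pseudo ground truth without assessors: a shot counts as a
        # true instance of the feature if at least min_votes systems
        # rank it in their top-k. Simple majority vote by default.
        if min_votes is None:
            min_votes = len(submissions) // 2 + 1
        votes = {}
        for ranked_shots in submissions.values():
            for shot in ranked_shots[:k]:
                votes[shot] = votes.get(shot, 0) + 1
        return {shot for shot, n in votes.items() if n >= min_votes}

The obvious caveat is that consensus truth rewards systems for agreeing
with the pool rather than with reality, so it only approximates what
assessor judgments would give.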
- Paul
--
Paul Over - Retrieval Group
Information Access Division
Information Technology Laboratory
National Institute of Standards and Technology
Bldg. 225 Rm. A211 (Mailstop 8940)
Gaithersburg, MD 20899-8940 USA
Voice: 301 975-6784 Fax: 301 975-5287