Bruce has everyone do a round of introductions. Tomorrow: some presentations and general issues. Today: discuss the document -- 11-12 slides that cover the paper, clarification questions during, discussion at the end.

Bruce talks on the existing draft. [See slides -- I only note discussion and some clarification points; I don't repeat slide text here.]

On metrics and measurements: some metrics are more difficult than others, and some have multiple ways of being measured. This draft uses METRIC when we mean the quantity itself, and MEASUREMENT when we mean an observation of that metric.

On the evolution of this document: we are not interested in specifying a single tool for a job -- the goal is to describe the measurements. As a first step, we have tried to place the metrics, measurements, and tools into a kind of type system (see graph in slides). We don't try to define measurements.

Question: Say you have a triangle of measurement points, two using one scheme and one using another. How do you compare them if you don't define the measurements?
Answer: We don't try to define NEW measurements -- for example, we don't write a new document, create a new tool, and then put a paper in SIGCOMM. We are trying to catalog techniques and create a common lingua franca. We also don't want to be approving standard measurements -- no NMWG stamp of approval.

Question: You can do something close to approval by listing positives and negatives.
Answer: Yes -- we do want to have a frank discussion of the "pros" and "cons".

This document will be a framework for metric schemas and organization. There will be a companion tool and measurement list maintained on-line, in order to allow the list to reflect current technology. We'd like to end up with an annotated hierarchy, or relationship, with tools at the leaves. Some tools can be used for two different metrics, but they are predominantly one or the other.

Question: Is this an inheritance relationship, or is that not necessarily true?
Answer: We would like it to be true, since a clear relationship makes the world cleaner.
We think it can be done to some degree; we're not sure it can be done 100%, but that's the goal.

The talk ended with a plea for volunteers to help with the next steps:
- finish the list of metrics / definitions
- work on the document
- start the tool categorization

-=-=-=-

Discussion:

[People I know are cited as much as possible, but take all such citations with a grain of salt: any errors are those of the note taker.]

Tiz: The draft describes metrics as a 'natural quantity', not as the performance of a given application; in general, a metric could give an estimate for a given application. Throughput is the best example of this.
Brian: It depends on the app. How do you distinguish the network from the disk? You have no idea whether the number has to do with the app.
Q: How do you define an app? What's not an application -- the hardware?
Brian: An "application measurement working group" could define that as well.
Bruce: Think in terms of tools. The output of a tool measures something; if we can identify what a tool measures, we can group tools. What a tool is trying to measure is the metric; the various tools are the measurement definitions.
Q: That's a circular definition. To me, a metric is the standard that a measurement is based on.
Richard: What is an app? If the primary aim is to do computation and use the network, "user app" is not particularly relevant to measuring app performance. FTP is what users want -- a good way to frame a measurement that users will find useful. (Tiz: then you measure all the layers below that -- disk structures and everything.)
Allan: This echoes a discussion we had before (in Rome) between low-level measurement folks and application folks. We have different levels; the focus is on network measurement, but we are developing measurements useful to applications. Low-level: network. High-level: application. We want to couple both, but don't yet know how to couple them.
Peter: These questions are really asking who the consumer of this document is. As written now, you can't tell who is meant to use it. The EU DataGrid has a document like this, with the same headings -- how is this different from EU deliverable 7.2? What are you trying to get out of it? A per-tool summary is per se not a good use of the WG.
Brian: The customer is the grid scheduler -- any grid middleware,
with optimization in mind. That document (as I remember it) has a list of tools; we're going beyond that: categorize, map to metrics, combine tools a, b, and c and come up with a useful number.
Q: Is combining tools in this document, or in a follow-on document?
Bruce: If we can import that (EU) document as-is, then let's do it. I'm not aware of a document precise enough for building the catalog.
Peter: I'm not pushing any particular document; I just want to find the difference.
Peter: Do you intend to address the suggestions about what might appear in information services (object definitions)?
Brian: Fold that into the GMA event-naming WG. That's general, and needs networking involved too.
Q: Can you define the scope of what you mean by "network"? If you have a network, you need to define security boundaries. Does this document address security issues?
Bruce: No. The only fuzzy thing contemplated right now: there are security implications, depending on the technique/measurement. Some of these techniques measure the performance of a firewall.
Brian: That goes under the pros/cons of tools.
Bruce: In terms of end-to-end performance, everything is involved: app memory, the OS, OS parameters, the Ethernet card -- all involved. Some tools will measure everything; an SNMP query to fetch the capacity of a hop just measured that hop.
Q: So, a recommendation that for a certain type of app you need a certain level of visibility? [my paraphrase]
Rao: Performance is now getting tied to security. Firewalls do rate control, so the definition of bandwidth has to take that into account -- TCP vs. UDP vs. ICMP. We can't get away from it entirely; it must be considered.
Q: This recalls the old art of supercomputer benchmarking. A group in Utrecht found that the final say is with the app that will run -- hence in-band and out-of-band measurement tools. The app could still get lower performance, just because of itself or TCP options or something; you need in-band measurement, or to run on that machine. We should tackle these issues: how measurements relate and map to real applications.
Brian: Most people use iperf, and iperf is incredibly intrusive -- so there are tradeoffs.
Les: Regarding the sample use of network measurements -- the
particular example, file transfer throughput, is not a good one: disks can limit a 400 Mbps transfer link to 80 Mbps. I think you might be overselling.
Bruce: I had a big example, but it would have taken another 10 pages.
Brian: How about putting in the caveat "assuming infinite disk bandwidth"?
Group: That's good.
Q: What about visualization, especially if you can measure many aspects at once? A certain visualization of a metric might point out disk bandwidth, versus others that might not bring up that point. So, extend the document to visualizations?
Bruce: Right now we have 3 measurements in there in detail; providing a list of "here's how you might want to analyze this data you collect" would be very useful.
Brian: We need to keep to goals that can be completed in 12 months. It's easy to come up with next-step scenarios -- watch out!
Pdinda: A couple of things. The queries an app can make impose load on resources, so it might be helpful to constrain the queries an app can make, and let someone else worry about composition.
Brian: Yes.
Pdinda: Is there an idea of the queries an app might make? What's the idea for these tools?
Brian: Vague ideas.
Bruce: We have ideas, but they're not part of this document. There is a need to develop usage scenarios; I'd like 3-4 in this doc: a tools perspective, an app perspective, queries that might be of interest. The full scope of what an app might do with the data is outside our scope.
Pdinda: Constrain it severely so you can work on the metrics.
Martin: Describe the hierarchy via longest string match: grid/network/tcp, grid/network/tcp/iperf -- a query that gives a pool of metrics along a given path.
Brian: Next step: deriving one number versus a query that returns 5.
Tiziana: We want to keep things simple, but not too simple: a common language and terminology. One fundamental question is how to use the metrics. People writing middleware ask networking people. In DataGrid, we define cost functions for the optimization of different grid services -- in my view a high-level, compound metric, based on the metrics in this document. How to use the metrics is the contribution that is missing, and what the community needs.
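[Note-taker's aside: Martin's longest-string-match idea might look something like the sketch below. The hierarchical names (grid/network/tcp, grid/network/tcp/iperf) come from his remark, but the registry contents, metric names, and lookup function are illustrative assumptions, not anything agreed by the group.]

```python
# Illustrative sketch of a longest-prefix lookup over a hierarchy of
# metric names. The registry contents are hypothetical examples only.
REGISTRY = {
    "grid/network/tcp": ["throughput (BTC)"],
    "grid/network/tcp/iperf": ["iperf achievable throughput"],
    "grid/network/delay": ["one-way latency"],
}

def longest_prefix_match(query: str) -> list[str]:
    """Return the metrics filed under the longest registered prefix of `query`."""
    parts = query.split("/")
    for i in range(len(parts), 0, -1):       # try longest prefix first
        prefix = "/".join(parts[:i])
        if prefix in REGISTRY:
            return REGISTRY[prefix]
    return []                                # nothing in the hierarchy matches

# A query for an unregistered tool falls back to its parent in the tree:
print(longest_prefix_match("grid/network/tcp/ttcp"))
```

A query for grid/network/tcp/iperf returns the iperf-specific entry, while grid/network/tcp/ttcp falls back to the generic TCP-throughput pool -- one way to get Martin's "pool of metrics along a given path" behavior.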
This is very important.
Bruce: A metric is, for example, a number for the closeness of data replicas -- in their sense, not necessarily a base quantity. I wanted to refer to metrics here as network characteristics, and refer to measurements as metrics. (States the IPPM terminology from RFC 2330.)
Q: Are we explicitly measuring only Internet performance, as opposed to other types of networks -- "inside of a supercomputer", Myrinet?
Bruce: We want the answer whether it's over the Internet or over internal connectivity -- where possible, any means of connecting independent processors. On the other hand, we don't want 25 pages on MPP interconnects.
Q: Even with Myrinet, loss is possible. Going back to metric vs. measurement: by what's written, there are very few metrics (like 4). With measurements, we want to define them in quite a precise way -- one-way latency, back-to-back dispersion -- which come close to precise definitions of a measurement. I personally don't like the word "measurement": ruler-and-string is a measurement. Perhaps "observation of a measurement."
Jms: If another group has defined the terms, just stay with how they defined them.
Q: Define the metrics, and precisely what the measurements are, and that will be the bulk of the document. Is jitter a metric or a measurement? You derive jitter from a series of other measurements.
Mrt: Stick in more examples. If there are 4 or 5, put them all in -- at the beginning. People get impatient.
Thomas N: On extensions of definitions: there isn't one definition by which everyone defines the same quantity. Extend the definitions -- in the app world, delay is not the same delay that a person at layer 3 would see.
Brian: It should be extendable.
Thomas: What is the bigger entity above this WG? The performance area. This would be a perfect thing to spin off to another WG: specifying measurements at the app level. People should volunteer to work on that.
Brian: We need to finish this document by July. A conversions document comes next: aggregations and converting between different metrics. Controversial, but really necessary -- it tries to map tools to each other and make them work together in useful ways. There's room for lots of researchers to do work here; it's not ready for a WG, more a research group.
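[Note-taker's aside on the "jitter derived from a series of other measurements" question: one common reading, in the spirit of IPDV from RFC 3393, derives jitter as the differences between consecutive one-way delay observations. The sketch below is illustrative only -- the sample values are made up, and this is not a definition the group adopted.]

```python
# Illustrative: deriving a "jitter" series from one-way delay
# observations, as deltas between consecutive delays (IPDV-style,
# in the spirit of RFC 3393). Sample values are hypothetical.
def ipdv(delays_ms: list[float]) -> list[float]:
    """Delay variation: difference between each pair of consecutive delays."""
    return [b - a for a, b in zip(delays_ms, delays_ms[1:])]

samples = [10.0, 12.5, 11.0, 15.0]   # hypothetical one-way delays (ms)
print(ipdv(samples))                 # prints [2.5, -1.5, 4.0]
```

This illustrates the point raised in the room: jitter here is not observed directly, it is computed from a series of latency measurements, which is why its metric-vs-measurement status was debated.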
Q: Measurement: the closeness quantity; metric: does it fit within the scope of the document, or outside it? Aggregating measurements.
Q: Focus first on "what can I observe", and next on "how can I use it" -- that naturally leads to two documents. The second is not less important!
--> Brian declares agreement on measurement/metric terminology; no one argues.
Tiz: (slide 10) Throughput and file transfer only differ by methodology and the frequency you would give to the user. TCP throughput measured every given number of seconds is not terribly different from file transfer; in principle you can do it with both. What varies is the frequency. They were broken out because those are things people do.
[Discussion of singletons vs. samples.] A TCP throughput measurement over a fixed length of time at some frequency is easy to define as a statistical sample. File transfers occur over different windows in time; what it means to do this every 30 seconds is unclear.
Brian: Les has data for iperf/ftp/bbcp/... showing that they don't correlate well for the most part.
Rao: If someone is doing active window control, the numbers won't be correlated.
Richard: It's important to know that, though. If you need to decide where to get data, that function of the path might mean the path is not suitable. This slide merges capacity and availability measurements onto one slide.
Rao: If someone tunes buffers and you measure according to that -- is it "derived"? "converted"?
Bruce: Converting between them might be hard. You do your best shot at converting without error; to specify it accurately, you need to know all kinds of information.
[Discussion of how networks might tweak BTC. I noted that these were thoughts; Bruce noted that some ISPs thought they could control NORMOS (CMU project).]
Thomas: On TCP parameters: the work of Web100. There are lots of parameters that might be useful; ongoing research on performance measurements should be tracked as part of the next steps.
Brian: wants a volunteer for the tool taxonomy.
Thomas: What about the CAIDA taxonomy?
Bruce: It doesn't go to the right level of detail, but it's a good starting point.
Brian: NLANR has one too.
Jimf: We have the ones that end users could take most advantage of.
Brian: Those are closest to the ones this community needs.
Jimf: They may not have the right level of detail.
Q: The problem here is that as soon as you go into more detail, you get into value judgments about what a tool can do -- the pros/cons. At a high level you can do without value judgments; deep into pros/cons, you can't.
Brian: Some are objective: iperf is intrusive.
Q: (skepticism about how deep to go)
Brian: But we have to do it. Isn't this a good thing? It encourages people to fix things, argue, or respond -- like research papers being willing to admit limitations.
PPDG/GriPhyN put out a call for what people use. The goal of the iVDGL project is to come up with a common set of tools to deploy across the common testbeds used by the physics folks.
Brian: We could contact the authors, but might not get a response.
Richard: DataGrid developed a set of tools, put them into a framework, and distributed them around the European DataGrid. Looking for correlations is happening now -- young work.
Brian: A group at LBL is doing something similar for Net100.
Bruce: This is really useful. Four groups doing the same thing? If we can put it together, we can minimize that duplication in the future.
Les: pipechar, iperf, pathrate, ..., bbcp, GridFTP. Pathrate fails over 100 Mbps -- the time granularity isn't right.
Les was "volunteered" to help. Someone from the back volunteered too (Ratilal Haria).
Looking for volunteers for the document:
- Martin Swany
- Thilo Kielmann (would like to volunteer someone from WP7)
- Richard Hughes-Jones

-=-

Brian: Any glaring omissions? Bandwidth: capacity, availability, utilization... the TCP thing is not a metric yet?!
Q: I'm surprised you don't have the marginal increase in throughput given an extra TCP stream.
Q: An unfairness metric? Its dual would be multipath: the decrease in throughput of the ensemble given a large number of streams. Maybe there are no tools to measure it, but it creates a category of thought: the increase in total throughput from adding a TCP stream; the impact on other users of the link; the impact of increasing/decreasing the window. Surely that is a measurement. [measurement in terms of ]
Q: You can do it roughly. You want utility: is it better to have more than 1 stream versus 1 stream?
Bandwidth with multiple TCP streams is a measurement, not a metric: a function of (capacity of the link, utilization of the link, characteristics of other apps, and TCP itself). It's very dynamic, not static -- a characteristic of a dynamic network. On the stability of measurements over a time series: we all invest knowing that past performance is no guarantee.
Metric: the marginal utility of a TCP stream. That is a metric. Is that metric derived from other metrics? It's related to utilization. A metric is in some sense "indivisible" -- you have to measure it. Or it can be measured by way of other metrics. I'm at a loss; I haven't studied that sufficiently well to understand. The point of writing this down is to have a discussion. Perhaps: distribution of traffic on the link, number of flows, types of flows.
Deb: If you want to do that, you need to add a metric to availability; you need to make assumptions about the other flows. Static and dynamic stuff.
Rao: Availability and usability.
Deb: Availability depends on how you go after it. What's not known is how to.
Richard: Availability: what a program gets. The conditions define the measurement. All measurements are of metrics. [losses & max bandwidth... # of flows]

-=-

Peter: There is no notion of error.

The meeting ended because of time constraints.

=============

KEY to the people I know in the discussion above:
Brian: Brian Tierney
Bruce: Bruce Lowekamp
Deb: Deborah Agarwal
Jimf: Jim Ferguson
Jms: Jennifer Schopf
Les: Les Cottrell
Martin: Martin Swany
Mrt: Mary Thompson
Pdinda: Peter Dinda
Peter: Peter Clarke
Q: any person that I didn't know or didn't catch who said it
Rao: Nageswara Rao
Richard: Richard Hughes-Jones
Tiz: Tiziana Ferrari
Thomas: Thomas Ndousse