Bruce has everyone do a round of introductions. Tomorrow: some presentations and general issues. Today: discuss the document -- 11-12 slides that cover the paper, clarification questions during, discussion at the end.

Bruce talks on the existing draft. [See slides -- I only note discussion and some clarification points; I don't repeat slide text here.]

On metrics and measurements: some metrics are more difficult than others, and some have multiple ways of being measured. This draft uses METRIC when we mean the quantity itself, and MEASUREMENT when we mean an observation of that metric.

On the evolution of this document: we are not interested in specifying a single tool for a job -- the goal is to describe the measurements. As a first step, we have tried to place the metrics, measurements, and tools into a kind of type system (see graph in slides). We don't try to define measurements.

Question: Say you have a triangle of measurement points, two using one scheme and one using another. How do you compare them if you don't define the measurements?
Answer: We don't try to define NEW measurements -- for example, we don't write a new document, create a new tool, and then put a paper in SIGCOMM. We are trying to catalog techniques and create a common lingua franca. We also don't want to be approving standard measurements -- no NMWG stamp of approval.

Question: You can do something close to approval by listing positives and negatives.
Answer: Yes -- we do want to have a frank discussion of the "pros" and "cons".

This document will be a framework for metric schemas and organization. There will be a companion tool and measurement list maintained on-line, in order to allow the list to reflect current technology. We'd like to end up with an annotated hierarchy, or relationship, with tools at the leaves. Some tools can be used for two different metrics, but they are predominantly one or the other.

Question: Is this an inheritance relationship, or is that not necessarily true?
Answer: We would like it to be true, since a clear relationship makes the world cleaner.
We think it can be done to some degree; we're not sure it can be done 100%, but that's the goal.

The talk ended with a plea for volunteers to help with the next steps:
- finish the list of metrics / definitions
- work on the document
- start the tool categorization

-=-=-=-

Discussion:

[People I know are cited as much as possible, but take all such citations with a grain of salt: any errors are those of the note taker.]

Tiz: The draft describes metrics as a 'natural quantity', not as the performance of a given application; in general, a metric could give an estimate for a given application. Throughput is the best example of this.
Brian: It depends on the app. How do you distinguish the network from the disk? You have no idea whether the number has to do with the app.
Q: How do you define an app? What's not an application -- the hardware?
Brian: An "application measurement working group" could define that as well.
Bruce: Think in terms of tools. The output of a tool measures something; if we can identify what a tool measures, we can group tools. What a tool is trying to measure is the metric; the various tools are the measurement definitions.
Q: That's a circular definition. To me, a metric is the standard that a measurement is based on.
Richard: What is an app? If the primary aim is to do computation and use the network, "user app" is not particularly relevant to measuring app performance. FTP is what users want -- a good way to frame a measurement that users will find useful. (Tiz: then you measure all the layers below that -- disk structures and everything.)
Allan: This echoes a discussion we had before (in Rome) between low-level measurement folks and application folks. We have different levels; the focus is on network measurement, but we are developing measurements useful to applications. Low-level: network. High-level: application. We want to couple both, but don't yet know how to couple them.
Peter: These questions are really asking who the consumer of this document is. As written now, you can't tell who is meant to use it. The EU DataGrid has a document like this, with the same headings -- how is this different from EU deliverable 7.2? What are you trying to get out of it? A per-tool summary is per se not a good use of the WG.
Brian: The customer is the grid scheduler -- any grid middleware,
with optimization in mind. That document (as I remember it) has a list of tools; we're going beyond that: categorize, map to metrics, combine tools a, b, and c and come up with a useful number.
Q: Is combining tools in this document, or in a follow-on document?
Bruce: If we can import that (EU) document as-is, then let's do it. I'm not aware of a document precise enough for building the catalog.
Peter: I'm not pushing any particular document; I just want to find the difference.
Peter: Do you intend to address the suggestions about what might appear in information services (object definitions)?
Brian: Fold that into the GMA event-naming WG. That's general, and needs networking involved too.
Q: Can you define the scope of what you mean by "network"? If you have a network, you need to define security boundaries. Does this document address security issues?
Bruce: No. The only fuzzy thing contemplated right now: there are security implications, depending on the technique/measurement. Some of these techniques measure the performance of a firewall.
Brian: That goes under the pros/cons of tools.
Bruce: In terms of end-to-end performance, everything is involved: app memory, the OS, OS parameters, the Ethernet card -- all involved. Some tools will measure everything; an SNMP query to fetch the capacity of a hop just measured that hop.
Q: So, a recommendation that for a certain type of app you need a certain level of visibility? [my paraphrase]
Rao: Performance is now getting tied to security. Firewalls do rate control, so the definition of bandwidth has to take that into account -- TCP vs. UDP vs. ICMP. We can't get away from it entirely; it must be considered.
Q: This recalls the old art of supercomputer benchmarking. A group in Utrecht found that the final say is with the app that will run -- hence in-band and out-of-band measurement tools. The app could still get lower performance, just because of itself or TCP options or something; you need in-band measurement, or to run on that machine. We should tackle these issues: how measurements relate and map to real applications.
Brian: Most people use iperf, and iperf is incredibly intrusive -- so there are tradeoffs.
Les: Regarding the sample use of network measurements -- the
particular example, file transfer throughput, is not a good one: disks can limit a 400 Mbps transfer link to 80 Mbps. I think you might be overselling.
Bruce: I had a big example, but it would have taken another 10 pages.
Brian: How about putting in the caveat "assuming infinite disk bandwidth"?
Group: That's good.
Q: What about visualization, especially if you can measure many aspects at once? A certain visualization of a metric might point out disk bandwidth, versus others that might not bring up that point. So, extend the document to visualizations?
Bruce: Right now we have 3 measurements in there in detail; providing a list of "here's how you might want to analyze this data you collect" would be very useful.
Brian: We need to keep to goals that can be completed in 12 months. It's easy to come up with next-step scenarios -- watch out!
Pdinda: A couple of things. The queries an app can make impose load on resources, so it might be helpful to constrain the queries an app can make, and let someone else worry about composition.
Brian: Yes.
Pdinda: Is there an idea of the queries an app might make? What's the idea for these tools?
Brian: Vague ideas.
Bruce: We have ideas, but they're not part of this document. There is a need to develop usage scenarios; I'd like 3-4 in this doc: a tools perspective, an app perspective, queries that might be of interest. The full scope of what an app might do with the data is outside our scope.
Pdinda: Constrain it severely so you can work on the metrics.
Martin: Describe the hierarchy via longest string match: grid/network/tcp, grid/network/tcp/iperf -- a query that gives a pool of metrics along a given path.
Brian: Next step: deriving one number versus a query that returns 5.
Tiziana: We want to keep things simple, but not too simple: a common language and terminology. One fundamental question is how to use the metrics. People writing middleware ask networking people. In DataGrid, we define cost functions for the optimization of different grid services -- in my view a high-level, compound metric, based on the metrics in this document. How to use the metrics is the contribution that is missing, and what the community needs.
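[Note-taker's aside: Martin's longest-string-match idea might look something like the sketch below. The hierarchical names (grid/network/tcp, grid/network/tcp/iperf) come from his remark, but the registry contents, metric names, and lookup function are illustrative assumptions, not anything agreed by the group.]

```python
# Illustrative sketch of a longest-prefix lookup over a hierarchy of
# metric names. The registry contents are hypothetical examples only.
REGISTRY = {
    "grid/network/tcp": ["throughput (BTC)"],
    "grid/network/tcp/iperf": ["iperf achievable throughput"],
    "grid/network/delay": ["one-way latency"],
}

def longest_prefix_match(query: str) -> list[str]:
    """Return the metrics filed under the longest registered prefix of `query`."""
    parts = query.split("/")
    for i in range(len(parts), 0, -1):       # try longest prefix first
        prefix = "/".join(parts[:i])
        if prefix in REGISTRY:
            return REGISTRY[prefix]
    return []                                # nothing in the hierarchy matches

# A query for an unregistered tool falls back to its parent in the tree:
print(longest_prefix_match("grid/network/tcp/ttcp"))
```

A query for grid/network/tcp/iperf returns the iperf-specific entry, while grid/network/tcp/ttcp falls back to the generic TCP-throughput pool -- one way to get Martin's "pool of metrics along a given path" behavior.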
This is very important.
Bruce: A metric is, for example, a number for the closeness of data replicas -- in their sense, not necessarily a base quantity. I wanted to refer to metrics here as network characteristics, and refer to measurements as metrics. (States the IPPM terminology from RFC 2330.)
Q: Are we explicitly measuring only Internet performance, as opposed to other types of networks -- "inside of a supercomputer", Myrinet?
Bruce: We want the answer whether it's over the Internet or over internal connectivity -- where possible, any means of connecting independent processors. On the other hand, we don't want 25 pages on MPP interconnects.
Q: Even with Myrinet, loss is possible. Going back to metric vs. measurement: by what's written, there are very few metrics (like 4). With measurements, we want to define them in quite a precise way -- one-way latency, back-to-back dispersion -- which come close to precise definitions of a measurement. I personally don't like the word "measurement": ruler-and-string is a measurement. Perhaps "observation of a measurement."
Jms: If another group has defined the terms, just stay with how they defined them.
Q: Define the metrics, and precisely what the measurements are, and that will be the bulk of the document. Is jitter a metric or a measurement? You derive jitter from a series of other measurements.
Mrt: Stick in more examples. If there are 4 or 5, put them all in -- at the beginning. People get impatient.
Thomas N: On extensions of definitions: there isn't one definition by which everyone defines the same quantity. Extend the definitions -- in the app world, delay is not the same delay that a person at layer 3 would see.
Brian: It should be extendable.
Thomas: What is the bigger entity above this WG? The performance area. This would be a perfect thing to spin off to another WG: specifying measurements at the app level. People should volunteer to work on that.
Brian: We need to finish this document by July. A conversions document comes next: aggregations and converting between different metrics. Controversial, but really necessary -- it tries to map tools to each other and make them work together in useful ways. There's room for lots of researchers to do work here; it's not ready for a WG, more a research group.
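[Note-taker's aside on the "jitter derived from a series of other measurements" question: one common reading, in the spirit of IPDV from RFC 3393, derives jitter as the differences between consecutive one-way delay observations. The sketch below is illustrative only -- the sample values are made up, and this is not a definition the group adopted.]

```python
# Illustrative: deriving a "jitter" series from one-way delay
# observations, as deltas between consecutive delays (IPDV-style,
# in the spirit of RFC 3393). Sample values are hypothetical.
def ipdv(delays_ms: list[float]) -> list[float]:
    """Delay variation: difference between each pair of consecutive delays."""
    return [b - a for a, b in zip(delays_ms, delays_ms[1:])]

samples = [10.0, 12.5, 11.0, 15.0]   # hypothetical one-way delays (ms)
print(ipdv(samples))                 # prints [2.5, -1.5, 4.0]
```

This illustrates the point raised in the room: jitter here is not observed directly, it is computed from a series of latency measurements, which is why its metric-vs-measurement status was debated.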
Q: Measurement: the closeness quantity; metric: does it fit within the scope of the document, or outside it? Aggregating measurements.
Q: Focus first on "what can I observe", and next on "how can I use it" -- that naturally leads to two documents. The second is not less important!
--> Brian declares agreement on measurement/metric terminology; no one argues.
Tiz: (slide 10) Throughput and file transfer only differ by methodology and the frequency you would give to the user. TCP throughput measured every given number of seconds is not terribly different from file transfer; in principle you can do it with both. What varies is the frequency. They were broken out because those are things people do.
[Discussion of singletons vs. samples.] A TCP throughput measurement over a fixed length of time at some frequency is easy to define as a statistical sample. File transfers occur over different windows in time; what it means to do this every 30 seconds is unclear.
Brian: Les has data for iperf/ftp/bbcp/... showing that they don't correlate well for the most part.
Rao: If someone is doing active window control, the numbers won't be correlated.
Richard: It's important to know that, though. If you need to decide where to get data, that function of the path might mean the path is not suitable. This slide merges capacity and availability measurements onto one slide.
Rao: If someone tunes buffers and you measure according to that -- is it "derived"? "converted"?
Bruce: Converting between them might be hard. You do your best shot at converting without error; to specify it accurately, you need to know all kinds of information.
[Discussion of how networks might tweak BTC. I noted that these were thoughts; Bruce noted that some ISPs thought they could control NORMOS (CMU project).]
Thomas: On TCP parameters: the work of Web100. There are lots of parameters that might be useful; ongoing research on performance measurements should be tracked as part of the next steps.
Brian: wants a volunteer for the tool taxonomy.
Thomas: What about the CAIDA taxonomy?
Bruce: It doesn't go to the right level of detail, but it's a good starting point.
Brian: NLANR has one too.
Jimf: We have the ones that end users could take most advantage of.
Brian: Those are closest to the ones this community needs.
Jimf: They may not have the right level of detail.
Q: The problem here is that as soon as you go into more detail, you get into value judgments about what a tool can do -- the pros/cons. At a high level you can do without value judgments; deep into pros/cons, you can't.
Brian: Some are objective: iperf is intrusive.
Q: (skepticism about how deep to go)
Brian: But we have to do it. Isn't this a good thing? It encourages people to fix things, argue, or respond -- like research papers being willing to admit limitations.
PPDG/GriPhyN put out a call for what people use. The goal of the iVDGL project is to come up with a common set of tools to deploy across the common testbeds used by the physics folks.
Brian: We could contact the authors, but might not get a response.
Richard: DataGrid developed a set of tools, put them into a framework, and distributed them around the European DataGrid. Looking for correlations is happening now -- young work.
Brian: A group at LBL is doing something similar for Net100.
Bruce: This is really useful. Four groups doing the same thing? If we can put it together, we can minimize that duplication in the future.
Les: pipechar, iperf, pathrate, ..., bbcp, GridFTP. Pathrate fails over 100 Mbps -- the time granularity isn't right.
Les was "volunteered" to help. Someone from the back volunteered too (Ratilal Haria).
Looking for volunteers for the document:
- Martin Swany
- Thilo Kielmann (would like to volunteer someone from WP7)
- Richard Hughes-Jones

-=-

Brian: Any glaring omissions? Bandwidth: capacity, availability, utilization... the TCP thing is not a metric yet?!
Q: I'm surprised you don't have the marginal increase in throughput given an extra TCP stream.
Q: An unfairness metric? Its dual would be multipath: the decrease in throughput of the ensemble given a large number of streams. Maybe there are no tools to measure it, but it creates a category of thought: the increase in total throughput from adding a TCP stream; the impact on other users of the link; the impact of increasing/decreasing the window. Surely that is a measurement. [measurement in terms of ]
Q: You can do it roughly. You want utility: is it better to have more than 1 stream versus 1 stream?
Bandwidth with multiple TCP streams is a measurement, not a metric: a function of (capacity of the link, utilization of the link, characteristics of other apps, and TCP itself). It's very dynamic, not static -- a characteristic of a dynamic network. On the stability of measurements over a time series: we all invest knowing that past performance is no guarantee.
Metric: the marginal utility of a TCP stream. That is a metric. Is that metric derived from other metrics? It's related to utilization. A metric is in some sense "indivisible" -- you have to measure it. Or it can be measured by way of other metrics. I'm at a loss; I haven't studied that sufficiently well to understand. The point of writing this down is to have a discussion. Perhaps: distribution of traffic on the link, number of flows, types of flows.
Deb: If you want to do that, you need to add a metric to availability; you need to make assumptions about the other flows. Static and dynamic stuff.
Rao: Availability and usability.
Deb: Availability depends on how you go after it. What's not known is how to.
Richard: Availability: what a program gets. The conditions define the measurement. All measurements are of metrics. [losses & max bandwidth... # of flows]

-=-

Peter: There is no notion of error.

The meeting ended because of time constraints.

=============

KEY to the people I know in the discussion above:
Brian: Brian Tierney
Bruce: Bruce Lowekamp
Deb: Deborah Agarwal
Jimf: Jim Ferguson
Jms: Jennifer Schopf
Les: Les Cottrell
Martin: Martin Swany
Mrt: Mary Thompson
Pdinda: Peter Dinda
Peter: Peter Clarke
Q: any person that I didn't know or didn't catch who said it
Rao: Nageswara Rao
Richard: Richard Hughes-Jones
Tiz: Tiziana Ferrari
Thomas: Thomas Ndousse