ohohlfeld.com : blog
With SIGCOMM 2008, held in Seattle this year, approaching, I noticed that the accepted papers are now available online. They can be accessed here. Several researchers from my group at Deutsche Telekom Laboratories will present their Time Machine, which allows later inspection of network activity that becomes interesting only in retrospect.
Edit: Several papers are reviewed in the blog of Michael Mitzenmacher.
Just a quick side note: The recently elected spokesman of GI/ITG’s MMB section, Prof. Markus Siegle, has suited the action to the word. Papers that were published at the 13th (2006) and 14th (2008) GI/ITG Conference on Measurement, Modelling and Evaluation of Computer and Communication Systems (MMB) have been added to the DBLP academic library, run by Michael Ley at the University of Trier. Thus, these publications can now be included more easily in typical author performance and reputation measures.
Thanks to the Internet, it is easy to “steal” parts or all of the work of others (e.g. essays, theses or other works assigned to students) and to reuse them without labeling them as the work of others, i.e. without citing. Writing an essay by using the cut & paste technique to copy text blocks from the Internet is easy and quick. Why should a student spend much time on writing an essay that has already been written before? According to a report by the BBC, student plagiarism is common in the UK and probably becoming more so. In order to limit plagiarism, universities publish guidelines on how to avoid it. But what exactly is plagiarism? Wikipedia defines it as follows:
Plagiarism is the practice of claiming or implying original authorship of (or incorporating material from) someone else’s written or creative work, in whole or in part, into one’s own without adequate acknowledgement.
Can there be such a thing as self-plagiarism? Can we steal something from our own work? Yes, in some sense, and it is a problem in academia. I recently reported that I’m currently involved in the review process for an academic conference. A couple of days ago, one of the reviewers, who worked on a paper that was also assigned to me, claimed to have found a case of self-plagiarism and notified the conference chairs. Subsequently, the chairs asked the reviewers to check this claim and revisit their reviews if needed. In the end, the paper was rejected due to self-plagiarism.
What happened here and why is it bad to steal from oneself? As a first step, I’m going to redefine the term “to steal” in the context of self-plagiarism. It may be adequate when speaking about plagiarism in the sense of stealing a text, but authors cannot steal their own work. I only used this term to highlight the problem of plagiarism in the introduction of this post. According to Roig, “self-plagiarism occurs when authors reuse their own previously written work or data in a ‘new’ written product without letting the reader know that this material has appeared elsewhere” [Roi06]. Thus, self-plagiarism is more about (deceitful and fraudulent) concealment than stealing.
But why is it a problem in academia when authors reuse previously written work without citing it? Well, it is a problem because scientific papers are expected to be novel. A research paper should present something new, something that was not known before: a new result, a new algorithm, whatever. This makes it interesting and justifies a new publication. Thus, reusing an existing paper means knowingly publishing a known fact while claiming to present something new, e.g. in order to increase one’s Google Scholar rating. Academic conferences want to publish and discuss unpublished work, and thus self-plagiarism is a problem. (It is alright to publish an extended version, or an article based on several conference papers, in an academic journal.)
And why is self-plagiarism tempting? Well, reusing a previously published paper is much less work than doing original research, and it increases the number of published papers. The number of published papers is a simple metric that may be used to guess the “competence” of a researcher (as discussed in a previous post). Thus, the more papers published, the better: publish or perish! This may entice an author into doing so.

The workshop is finally over and I’m back in Germany. All in all, I have to say that IWQoS was a very interesting workshop with contributions of very high quality. I want to present a brief résumé here, but I’m not giving an extensive review and thus recommend that you take a look at the program yourself.
- Two-state Markov models for describing transmission channels are still popular (e.g. used by Liu et al.)
- Algorithms in the field of Pre-Congestion Notification (PCN) are subject to performance evaluations, which is a good thing in general: evaluations of RED active queue management were published only after RED had already been widely deployed and thus came too late to be taken into account. It seems that this is not the case for PCN.
- An interesting contribution has been made to the field of profile-based traffic classification in the work of Hu et al., where data mining techniques are applied to generate distinct behavioral application profiles. The authors present an evaluation of a rule set for BitTorrent and PPLive. In contrast to the techniques presented in our talk about Spam and Traffic Profiling techniques in 2006, this approach seems to be more flexible, at least at first sight.
- YouTube has again been the subject of an extensive evaluation. In contrast to the papers presented at the Internet Measurement Conference in 2007, this paper discusses the social networks formed in YouTube and their small world character.
- The invited talk given by a colleague of David Hutchison, entitled QoS: (Still) a Grand Challenge?, reviewed the evolution of QoS techniques starting from ATM and Broadband ISDN. The conclusion drawn from this talk is that QoS is still a considerable challenge and that security and resilience issues need to be taken more seriously, which seems reasonable. However, it remains to be seen whether the delivery of 100 Mbit/s to the home really changes the world as much as highlighted in the talk. What I know about ADSL service providers is that most users do not make extensive use of the big pipe they pay for and rather stick with occasionally using HTTP and checking their mail. In the first days of ADSL deployment, those access lines were used extensively by power users and thus resulted in a high increase of traffic in the core. Nowadays, however, traffic in the core increases much more slowly with an increasing number of ADSL users, as most users do not use their access link very extensively. I’m wondering if this will be similar for 100 Mbit/s access links in the future.
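Goedenavond,
ik ben aan de Universiteit Twente sinds gisteren middag. Het weer is goed en de conferentie interessant. Gisteren was een groot feest met bekende DJ’s. (In English: Good evening, I have been at the University of Twente since yesterday afternoon. The weather is good and the conference is interesting. Yesterday there was a big party with well-known DJs.)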

Well, in order to address my target audience, I had better switch from Dutch to English. I’m currently at the University of Twente for IWQoS 2008, which is a three-day workshop focusing on Quality of Service in telecommunication networks. As the chair mentioned during the opening session, 40 % of the participants are from the USA/Canada, 40 % from Europe and 20 % from Asia/Australia.

At IWQoS, I will present my regular paper, which addresses stochastic packet loss models as used for generating Quality of Experience impairments. This research is motivated by the study of the perceptual quality of video sequences that are impaired by transmission failures (packet loss). In this work, we analytically derive the second-order statistics for the number of packet losses on multiple time scales from finite-state Markovian point processes, so that the model can be adapted to the packet loss pattern observed in measurements.
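To give a rough idea of what "loss counts on multiple time scales" means, here is a minimal simulation sketch. It is not the analytical derivation from the paper: it assumes a simple two-state (good/bad) Markov chain, and the function names, the transition probabilities `p_gb` and `p_bg`, and the window sizes are all made up for illustration. The variance of the loss counts is simply estimated from the simulated trace.

```python
import random


def simulate_gilbert(n, p_gb, p_bg, loss_good=0.0, loss_bad=1.0, seed=0):
    """Return n loss indicators (1 = lost) from a two-state Markov chain.

    p_gb: probability of moving from the good to the bad state (assumed value)
    p_bg: probability of moving from the bad to the good state (assumed value)
    """
    rng = random.Random(seed)
    bad = False  # the chain starts in the good state
    losses = []
    for _ in range(n):
        # State transition first, then the loss decision for this packet.
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
        loss_prob = loss_bad if bad else loss_good
        losses.append(1 if rng.random() < loss_prob else 0)
    return losses


def window_counts(losses, window):
    """Count losses in consecutive, non-overlapping windows of packets."""
    usable = len(losses) - len(losses) % window  # drop the incomplete tail
    return [sum(losses[i:i + window]) for i in range(0, usable, window)]


def mean_and_variance(values):
    """Empirical mean and (population) variance of a non-empty list."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


if __name__ == "__main__":
    trace = simulate_gilbert(100_000, p_gb=0.01, p_bg=0.2)
    for window in (10, 100, 1000):
        mean, variance = mean_and_variance(window_counts(trace, window))
        print(window, round(mean, 3), round(variance, 3))
```

Comparing the variance across window sizes shows how bursty the losses are at each time scale; fitting a model would then mean choosing its parameters so that these statistics match the ones measured in a real trace.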


The University of Twente has more the style of an American campus than of a European one; the campus is located outside of the city and contains student housing, a supermarket and restaurants, and can therefore be considered a city of its own. We probably do not have many similar campuses in Europe and I really like this design. One really feels like being at a university and not just somewhere in a city center, where some academic facilities happen to be placed.

When I arrived at the hotel yesterday, a huge party (citymoves) was going on at the campus. I guess around 10,000 people must have attended this open air event, where several DJs who are very famous in the Netherlands (Armin van Buuren, Marco V, ATB, …) spun Trance music.
The first day of the conference was quite interesting. I was quite surprised that several talks addressed the topic of small buffers and buffer sizing in core routers. One talk considered the introduction of small world networks in BitTorrent trackers in order to maximise the clustering coefficient. Although this is good news for the P2P community, service providers might see it as bad news, as BitTorrent clients will more likely establish non-regional connections, which will cause more traffic on expensive peerings.
Another talk presented findings from the analysis of a proprietary P2P video streaming system and highlighted the demand for quality of service in such an unreliable multicast network. It was surprising for me to see that 80,000 users were not able to join the stream at all.
When the last session ended at 6 PM, we had a little welcome reception, which helped us get to know each other. A good place to meet interesting people. I’m really looking forward to the dinner tomorrow evening.
Papers submitted to academic conferences typically need to pass a peer review process, where a single paper is assigned to several reviewers who judge the novelty of the proposed study/solution, its quality, understandability and so on. The peer review process is a good thing, as authors can benefit from the comments of the reviewers and typically the best submitted papers are accepted for publication. (Peer review has its drawbacks, but that is a different matter.)
I’m currently reviewing papers for an IEEE conference and I’m shocked by the poor quality of some submitted papers. Some conferences are pleasant for the reviewer because the papers to be reviewed are typically of high quality. Others aren’t. In the remainder of this post, I present some things which bothered me the most about the current submissions.
Some things that bother me as a reviewer:
- The quality of the presentation is poor: When a paper is sloppily typeset, it can be hard to read and leaves a bad taste in my mouth, suggesting that the author also did sloppy work during the evaluation of the presented study. Some papers teem with typing errors. Would it be too much to ask the authors to run a spell checker before submission? Moreover, some authors seem to be inexperienced with the style of technical research papers, which results in strange typesetting and unclear formulas.
- The discussion of related work is sometimes omitted. This always gives one the feeling that the authors are not familiar with the literature and the state of the art in their field, as they are unable to distinguish their study from existing work.
- Please, do not label your study as “extensive”, if it was not extensive or if no study was carried out at all!
- Check the scope of the conference before submission (read the call for papers!) and do not submit papers that are not at all related to the conference (this is just spam).
- Provide reasonable justifications for your assumptions and don’t take them for granted.
- Your assumptions should be reflected in the experiment design!!
- When proposing a new approach, compare its performance to traditional methods to highlight its benefits!
- The methodology / experiment design should be well explained and justified, as this influences the results obtained and their verification. A brief overview of the experiment without mentioning important parameters is simply not enough.
Some notes on the presentation quality:
- Use a spell checker
- Typeset formulas correctly (variables are written in italic, use subscripts, use the usual notation, e.g. a Sigma when denoting sums, …)
- Learn how to use your word processor
- Do not vary the font size (e.g. within the same paragraph) or the font style (e.g. by changing to sans-serif fonts); this looks like the plagiarism of an inexperienced undergraduate student.
- Creating understandable plots is more work than just somehow plotting your data! Spend some time on formatting your plot, highlight relevant parts, choose an appropriate design, …
- Prefer vector graphics whenever possible and do not include screenshots of your Matlab desktop!
Hi everyone,
I’m glad to announce that I don’t belong to the 82 % of authors whose papers were rejected by IEEE IWQoS 2008 (acceptance rate 18 % for full papers in 2006). So I’m currently preparing the camera-ready version of my paper and looking forward to a trip to the Netherlands in order to present it in June. I’ll post more information regarding the paper as soon as I’m done with the camera-ready version.
Just a brief announcement: the papers presented at IMC 07 are available on the web. There are many interesting publications and it is worth looking at some of the papers.
Two papers cover YouTube content [1] and traffic [2]. The first one received the best paper award. The paper by Cha et al. [1] is devoted to the analysis of user generated content offered at YouTube. Content production patterns, user participation and the way web surfers find content are examined. It was interesting to me that the authors also analysed content aliasing, i.e. the presence of multiple copies of the same video. They stated that “Most videos have 1 to 4 aliases, while the maximum number of aliases is 89 (…) A large number of aliases are uploaded on the same day as the original video or within a week.” (cf. Section 6.1). Moreover, they showed that simple caching of the most popular videos can offload server traffic by as much as 50 %.
In contrast, Gill et al. [2] characterise YouTube traffic measured at the edge (a university network) during 85 consecutive days. YouTube traffic was responsible for 4.6 % of the total traffic on the campus Internet link (625,593 videos viewed). The authors also highlight that local (in-network) caches could shrink the traffic, as 50 % of the video requests relate to previously requested videos. They state that caching could reduce YouTube traffic on the campus Internet link by a factor of 2, translating to 3.19 TB. However, it was quite interesting to see that although YouTube imposes a limit of 100 MB on the maximum video file size, 0.1 % of the analysed videos were larger than that limit. Only 10 % of the analysed videos were larger than 21.9 MB. The file size should reflect the short duration of most videos: “the mean video duration observed on campus is 4.15 minutes with a median of 3.33 minutes (…) 52.3 % of the videos in the all time popular category are between 3 and 5 minutes long.” They also evaluated the age and rating of the videos as well as the encoding bit rate of the served videos, which suggests that the target audience is broadband users. Social networks were also the subject of [3].
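The "50 % of the video requests relate to previously requested videos" figure is essentially the hit ratio of an ideal cache. As a minimal sketch (not the method used in [2]), the following counts how many requests in a trace refer to a video that has been requested before. It assumes a hypothetical list of video IDs as input and a cache of unlimited size that never evicts anything; it counts requests, not bytes, so it only approximates the traffic saving when video sizes differ.

```python
def repeat_request_ratio(requests):
    """Fraction of requests for a video that was already requested earlier.

    requests: iterable of video identifiers in arrival order (hypothetical
    trace format). Models an unlimited cache that never evicts.
    """
    seen = set()
    hits = 0
    total = 0
    for video_id in requests:
        total += 1
        if video_id in seen:
            hits += 1  # would be served from the cache
        else:
            seen.add(video_id)  # first request has to be fetched
    return hits / total if total else 0.0
```

For a toy trace such as `["a", "b", "a", "c", "a", "b"]`, three of the six requests are repeats, so half of the requests could be answered locally.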
Dischinger et al. [4] presented a nice analysis of residential broadband access networks (focusing on cable and DSL links) by sending ICMP ping probes and TCP reset packets to sinks. The main research questions were: “1.) what are the typical bandwidth, latency and loss characteristics of residential broadband links? 2.) how do the characteristics of broadband networks differ from those of academic or corporate networks and 3.) what are the implications of broadband-network properties for future protocol and system designers?” Some of the findings were that “many cable links show high variation in link bandwidths over short timescales. Packet transmissions over cable suffer [from?] high jitter as a result of cable’s time-slotted access policy. DSL links show large last-hop delays and considerable deployment of active queue management policies such as random early detection (RED).”
All in all, there are many highly interesting papers and I suggest taking a look at them.
References:
[1] Cha et al.: “I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System” (2007)
[2] Gill et al.: “YouTube Traffic Characterization: A View From the Edge” (2007)
[3] Mislove et al.: “Measurement and Analysis of Online Social Networks” (2007)
[4] Dischinger et al.: “Characterizing Residential Broadband Networks” (2007)
This year’s ACM SIGCOMM conference is held in Kyoto, Japan. There is a paper about BubbleStorm, a flexible P2P system for meta data distribution and lookup, and another about the analysis of Skype traffic. The paper by Xie et al. addresses the dynamics of IP addresses. Oliveira et al. studied the evolution of the AS topology. These were the most interesting papers to me.
Update (2008-02-24): some theory-related papers presented at ACM SIGCOMM’07 are reviewed here.
© 2001-2008 by Oliver Hohlfeld, B.Sc.