SYNC -- Scheduling Your Network Connections

PEOPLE INVOLVED

FUNDING PROVIDED BY

NSF Career Grant CCR-0133077
NSF ITR 02-168
Cisco Systems via Pittsburgh Digital Greenhouse grant 00-1
Spinnaker Networks via Pittsburgh Digital Greenhouse grant 01-1
A gift from the EMC Corporation

SYNOPSIS

The purpose of this project is to resolve the following question: Is it possible to reduce the expected response time of every static HTTP request at a Web server, simply by changing the order in which we schedule the requests?

We find that the answer is largely yes. Our idea is to replace the traditional FAIR (Processor-Sharing) scheduling policy used by Web servers with a different policy, SRPT (Shortest Remaining Processing Time), which gives preference to requests for small files or requests with short remaining file size. The implementation is at the kernel level and involves controlling the order in which socket buffers are drained into the network. Specifically we are scheduling the server's uplink bandwidth according to SRPT.

In [TOCS 2003], we demonstrate via implementation that SRPT scheduling vastly improves mean response times over the traditional FAIR scheduling policy, without penalizing requests for large files, or only negligibly penalizing these. Experiments use Apache and Flash web servers running over Linux under LAN and WAN settings.

In [SIGMETRICS 2001], we consider the same question analytically for an M/G/1 queue. We compare the M/G/1/SRPT queue (SRPT scheduling) with the M/G/1/PS queue (Processor-sharing scheduling). We expect to see that the largest jobs prefer M/G/1/PS as compared with M/G/1/SRPT. However we find that this is not true. In particular we make the following two points:

We prove that when the load is less than 1/2, then all jobs, large and small, have lower (or equal) expected response times under SRPT as compared with PS. This theorem holds for all job size distributions.
We show that the cutoff of "1/2" in the above theorem is higher for heavy-tailed distributions, and can be above ".99" in practice.

In [ITC 2003] (implementation) and [TR 2001] (theory) we consider the scenario of time-varying load, where the Web server alternates between periods of overload and periods of low load. We find that the traditional FAIR scheduling algorithm used in Web servers performs very badly under such a workload. After only a few seconds of overload, web servers start dropping client requests, clients experience timeouts, and client response times soar. Even when the period of overload ends, the effects of overload linger -- the number of connections at the server stay high, and so do the client response times. We show both under implementation [ITC 2003] and in theory [TR 2001] that SRPT scheduling dramatically improves response times under workloads with time-varying load, without penalizing requests for large files. Implementation results are validated under a wide range of real-world conditions including lossy links, geographically-dispersed clients, and various user abort/reload behaviors.

SOME OF OUR RELEVANT PAPERS

[TOCS 2003] : Mor Harchol-Balter, Bianca Schroeder, Nikhil Bansal, Mukesh Agrawal. "Size-based Scheduling to Improve Web Performance." ACM Transactions on Computer Systems , Vol. 21, No. 2, May 2003. Postscript .

[ITC 2003] : Bianca Schroeder and Mor Harchol-Balter. "Web servers under overload: How scheduling can help." 18th International Teletraffic Congress . Berlin, Germany. September 2003. (Original Technical report Number CMU-CS-02-143.) Postscript .

[SIGMETRICS 2001] : Nikhil Bansal and Mor Harchol-Balter "Analysis of SRPT Scheduling: Investigating Unfairness." Proceedings of ACM Sigmetrics 2001 Conference on Measurement and Modeling of Computer Systems. Postscript and Abstract.

[TR 2001] : Nikhil Bansal and Mor Harchol-Balter. "Scheduling Solutions for Coping with Transient Overload" CMU Technical report Number CMU-CS-01-134. Postscript

[USITS 1999] : Mark Crovella, Bob Frangioso, and Mor Harchol-Balter, "Connection Scheduling in Web Servers," USENIX Symposium on Internet Technologies and Systems (USITS '99), Boulder, Colorodo, October '99, pp. 243-254. Postscript and Abstract.

Back to Mor's Home Page .