CS237 Distributed Systems Middleware
Lecture Notes
Set 1: Middleware and Distributed Systems Introduction
Set 2: Fundamentals: Time, State and Coordination in Distributed Systems
Set 3: Distributed Computing Architectures
Set 4: Messaging Middlewares and Pub/Sub Systems
Set 5: Fault Tolerance in Distributed Systems
Set 6: Early Middleware Frameworks (DCE, Corba, Hadoop)
Set 7: Java-based Technologies
Set 8: Service Oriented Middleware
Set 9: Cloud and Virtualization Platforms
Homeworks
Submitting Homeworks: Homeworks should be submitted as a single PDF document via Gradescope. If you wrote your assignment by hand, please scan it and submit a PDF. When uploading assignments, please ensure that there is at most one problem per page. (One problem can span multiple pages, but you should not have two problems on one page).
Homework 1: PDF (due Monday, April 18, 2022 at 11:59pm, via Gradescope)
Homework 2: PDF (due Wednesday, May 4, 2022 at 11:59pm, via Gradescope; ignore the due date listed in the PDF)
Homework 3: Paper Summaries from Summary Set 3 (due Wednesday, May 25, 2022 at 11:59pm, via Gradescope)
Homework 4: Paper Summaries from Summary Set 4 (due Monday, June 6, 2022 at 11:59pm, via Gradescope)
Course Reading Materials
How to read a paper:
- How to Read a Paper S. Keshav David R. Cheriton School of Computer Science, University of Waterloo Waterloo, ON, Canada
Reference Books:
- Coulouris et al Distributed Systems: Concepts & Design,4th ed. ISBN: 0-321-26354-5.
- Tanenbaum & van Steen Distributed Systems: Principles and Paradigms, 2nd ed. ISBN: 0-132-39227-5.
- Ben-Ari Principles of Concurrent and Distributed Programming Prentice-Hall International Series in Computer Science, 1990.
- Sape Mullender Distributed Systems Second Edition, Addison-Wesley, 1998.
- Haggit Attiya and Jennifer Welch Distributed Computing: Fundamentals, Simulations and Advanced Topics
- McGraw Hill, 1998.
- Robert Orfali and Dan Harkey Client/Server Programming with Java and CORBA, Second Edition John Wiley and Sons Inc., 1998
Middleware and Distributed Systems Introduction (NO REVIEW REQUIRED):
- Middleware David E. Bakken: Encyclopedia of Distributed Computing, Kluwer Academic Publisher.
- Managing Complexity: Middleware Explained Andrew T. Campbell, Geoff Coulson, and Michael E. Kounavis IT Professional, Vol. 1, No. 5, September/October 1999.
- Middleware a model for distributed system services Philip A. Bernstein; Commun. ACM 39, 2 (Feb. 1996), Pages 86 - 98
- An architecture for next generation middleware G. S. Blair, G. Coulson, P. Robin, and M. Papathomas. 2009. In IFIP
- Fallacies of Distributed Computing Explained by Arnon Rotem-Gal-Oz
Coordination in Distributed Systems (SUMMARY SET 1):
Time, State and Coordination:
- Lamport, "Time, Clocks and the Ordering of Events in a Distributed System", Communications of the ACM, 1978
- Mattern, "Virtual Time and Global States of Distributed Systems", Proc. Workshop on Parallel and Distributed Algorithms, 1989
- M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 1985
- Kshemkalyanit, M. Raynalt and M. Singhals, "An introduction to snapshot algorithms in distributed computing,1999.
- Raynal M. and Singhal M., Logical time: Capturing causality in distributed systems, Computer, vol. 29, pp. 49-56, 1996.
- Hunt, Patrick, et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems. USENIX annual technical conference. Vol. 8. No. 9. 2010.
- Burrows, Mike.The Chubby lock service for loosely-coupled distributed systems. Proceedings of the 7th symposium on Operating systems design and implementation. 2006.
Recent Systems:
Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system.", ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
Corbett, James C., et al. "Spanner: Google’s globally distributed database.", ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 1-22.
Gao, Roshan Sumbaly Jay Kreps Lei, and Alex Feinberg Chinmay Soman Sam Shah. "Serving Large-scale Batch Computed Data with Project Voldemort."
Additional Papers:
Cristian and C. Fetzer, "Fault-tolerant external clock synchronization", ICDCS 1995
C.l Fidge, Timestamps in message-passing systems that preserve the partial ordering , Australian Computer Sci. Comm. 10 (I) (February 1988) 56-66.
C.l Fidge, Fundamentals of distributed system observation, IEEE Software 13 (6) (November 1996) 77-83.
C.l Fidge, A limitation of vector timestamps for reconstructing distributed computations, in: Elsevier Science, 1998, Information Processing Letters 87-91.
Schwarz, R. and Mattern, F. "Detecting causal relationships in distributed computations, Distributed Computing, 1994.
Distributed Computing Architectures(SUMMARY SET 2):
- Y. Chawathe, S. Ratnaswamy, L. Breslau, N. Lanham, S. Shenker, "Making Gnutella-like P2P Systems Scalable", ACM Sigcomm 2003
- Daniel Stutzbach, Reza Rejaie, and Subhabrata Sen, "Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems", Networking, 2008. (Gnutella related)
- Dongyu Qiu and R. Srikant, "Modeling and performance analysis of BitTorrent-like peer-to-peer networks", Proceedings of ACM Sigcomm, 2004.
- I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, H.Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", Proc. ACM Sigcomm, August 2001.
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, "A Scalable Content-Addressable Network", Proceedings of ACM Sigcomm, August, 2001.
- A. Rowstron, P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), November, 2001.
- J Kubiatowicz, "OceanStore: an architecture for global-scale persistent storage", SIGPLAN 2000
- DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store.", ACM SIGOPS 2007.
- Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker, "Search and Replication in Unstructured Peer-to-Peer Networks", 16th international conference on Supercomputing. 2002.
- Mishra, Asit K., et al. "Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters.", ACM SIGMETRICS 2010.
- Liu, Yang, Jogesh K. Muppala, and Malathi Veeraraghavan. "A survey of data center network architectures.", 2014.
- Satyanarayanan, Mahadev. "The emergence of edge computing.", Computer 50.1, 2017.
- Karger, David, et al."Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web.", 29th ACM symposium on Theory of computing. 1997.
Messaging Middlewares, Pub/Sub Systems (SUMMARY SET 3):
- Banavar, Guruduth, et al. "A Case for Message Oriented Middleware", International Symposium on Distributed Computing 1999
- Fabret, Françoise, et al. "Filtering algorithms and implementation for very fast publish/subscribe systems.", SIGMOD 2001
- Banerjee, B. Bhattacharjee and C. Kommareddy, "Scalable Application Layer Multicast", ACM SIGCOMM 2002
- Castro, Miguel, et al. "SCRIBE: A large-scale and decentralized application-level multicast infrastructure", 2002
- Pietzuch, Peter R., and Jean M. Bacon. "Hermes: A distributed event-based middleware architecture", ICDCSW 2002
- Eugster, Patrick Th, et al. "The Many Faces of Publish/Subscribe", ACM computing surveys 2003
- Jacobsen, Hans-Arno, et al. "The PADRES publish/subscribe system.", 2006
- Jacobsen, Hans-Arno, et al. "Content-based publish-subscribe over structured overlay networks", ICDCS 2005
- Chockler, Gregory, et al. "Spidercast: a scalable interest-aware overlay for topic-based pub/sub communication", DEBS 2007
- Setty, Vinay, et al. "Poldercast: Fast, robust, and scalable architecture for P2P topic-based pub/sub" ACM International Conference on Distributed Systems Platforms and Open Distributed Processing 2012
- Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra and Nalini Venkatasubramanian. "CCD: Efficient Customized Content Dissemination in Distributed Publish/Subscribe.", Middleware 2009
- Hojjat Jafarpour, Sharad Mehrotra and Nalini Venkatasubramanian. " A Fast and Robust Content-based Publish/Subscribe Architecture", IEEE NCA 2008
- Kreps, Jay, Neha Narkhede, and Jun Rao. "Kafka: A distributed messaging system for log processing.", Proceedings of the NetDB 2011
- Carbone, Paris, et al. "Apache flink: Stream and batch processing in a single engine", Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 2015
- M. Y. S. Uddin, and N. Venkatasubramanian. "Edge Caching for Enriched Notifications Delivery in Big Active Data", ICDCS 2018
- Destounis, Apostolos, Georgios S. Paschos, and Iordanis Koutsopoulos. "Streaming big data meets backpressure in distributed network computation", INFOCOM 2016
- Tatbul, Nesime, Uǧur Çetintemel, and Stan Zdonik. "Staying fit: Efficient load shedding techniques for distributed stream processing", Proceedings of the 33rd international conference on Very large data bases 2007
Additional Papers:
- Pardo-Castellote, Gerardo. "Omg data-distribution service: Architectural overview.", ICDCSW 2003
- Object Management Group Data Distribution Service™
- Apache Pulsar
- Apache Storm
Fault Tolerance in Distributed Systems (SUMMARY SET 4):
Time, State and Coordination:
- Lamport, "Time, Clocks and the Ordering of Events in a Distributed System", Communications of the ACM, 1978
- Mattern, "Virtual Time and Global States of Distributed Systems", Proc. Workshop on Parallel and Distributed Algorithms, 1989
- M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 1985
- Kshemkalyanit, M. Raynalt and M. Singhals, "An introduction to snapshot algorithms in distributed computing,1999.
- Raynal M. and Singhal M., Logical time: Capturing causality in distributed systems, Computer, vol. 29, pp. 49-56, 1996.
- Hunt, Patrick, et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems. USENIX annual technical conference. Vol. 8. No. 9. 2010.
- Burrows, Mike.The Chubby lock service for loosely-coupled distributed systems. Proceedings of the 7th symposium on Operating systems design and implementation. 2006.
Recent Systems:
- Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system.", ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
- Corbett, James C., et al. "Spanner: Google’s globally distributed database.", ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 1-22.
- Gao, Roshan Sumbaly Jay Kreps Lei, and Alex Feinberg Chinmay Soman Sam Shah. "Serving Large-scale Batch Computed Data with Project Voldemort."
Additional Papers:
- Cristian and C. Fetzer, "Fault-tolerant external clock synchronization", ICDCS 1995
- C.l Fidge, Timestamps in message-passing systems that preserve the partial ordering , Australian Computer Sci. Comm. 10 (I) (February 1988) 56-66.
- C.l Fidge, Fundamentals of distributed system observation, IEEE Software 13 (6) (November 1996) 77-83.
- C.l Fidge, A limitation of vector timestamps for reconstructing distributed computations, in: Elsevier Science, 1998, Information Processing Letters 87-91.
- Schwarz, R. and Mattern, F. "Detecting causal relationships in distributed computations, Distributed Computing, 1994.
Fault Tolerance and Reliability :
Consensus
- J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process", Journal of ACM, 1985
- Dolev, C. Dwork, L. Stockmeyer, "On the Minimal Synchronism Needed for Distributed Consensus", Journal of ACM, 1987.
- Lamport, Leslie, "Paxos made simple", ACM Sigact News 32.4 (2001).
- Ongaro, Diego, and John Ousterhout, "In search of an understandable consensus algorithm", 2014 USENIX Annual Technical Conference. --- RAFT ---
- Howard, Heidi, and Richard Mortier. "Paxos vs Raft: Have we reached consensus on distributed consensus? ", Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data. 2020
Failure Detectors
- D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems", Journal of ACM, 1985
- D. Chandra, V. Hadzilacos and S. Toueg, "The Weakest Failure Detector for Solving Consensus", Journal of ACM, 1996
- K. Aguilera, W. Chen, and S. Toueg, "Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication", Cornel, 1997
Replication
- S. Sandhu and S. Zhou, "Cluster-based file replication in large-scale distributed systems", ACM SIGMETRICS, 1992
- Gray, P. Helland, P. Neil and D. Shasha , "The dangers of replication and a solution", ACM SIGMOD, 1996
Logging
- P. Sistla and J. L. Welch, "Efficient distributed recovery using message logging", ACM SIOPS, 1989
Middleware Frameworks (PRESENTATIONS):
Early Distributed Computing Frameworks and Object-based Middlewares:
- DCE
- The DCE security service, Hewlett-Packard Journal, 1995.
- CORBA specification, www.omg.org
- RT CORBA: Realt time CORBA
- Fault tolerance CORBA: A Fault Tolerance Framework for CORBA
- ZEN: Optimizing the ORB Core to Enhance Real-time CORBA Predictability and Performance
- Data Access and Integration: ODBC/JDBC
- Java Jini: "Architectural Overview", Sun Microsystems
- Java RMI: "Java RMI Tutorial"
- EJB: "Enterprise JavaBeans Technology", Sun Developer Network
- J2EE: "Overview", Sun Developer Network
Service Oriented Architectures and Web Services:
- Web services by M. Fisher
- .NET: "The .NET Framework"
- SOAP Web Service: (http://www.w3.org/TR/soap/)
- A comparison of SOAP and REST implementations of a service based interaction independence middleware framework
- SOAP-binQ: high-performance SOAP with continuous quality management
- SOA: Service-Oriented Computing: State of the Art and Research Challenges
- Restful Web-Service: Original work is done by Roy Fielding at UCI as his Ph.D thesis(http://roy.gbiv.com/vita.html)
- Principled design of the modern Web architecture
- ActiveMQ, Rabbit MQ
Cloud Computing, Mobile Cloud Computing Platforms:
- Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso,Calling the Cloud: Enabling mobile phones as interfaces to cloud applications, Journal of ACM, 1985.
- Chun, S. Ihm, P. Maniatis, M. Naik, A. Patti,CloneCloud: Elastic Execution between Mobile Device and Cloud, To appear in Proceedings of the 6th European Conference on Computer Systems (EuroSys 2011), April 2011.
- Wen, W. Zhang,and H. Luo, "Energy Optimal Mobile Application Execution: Taming Resource-Poor Mobile Devices with Cloud Clones", In IEEE INFOCOM 2012.
- Michael P. Papazoglou, "Cloud Blueprints for Integrating and Managing Cloud Federations", In Springer Software Service and Application Engineering, 2012.
- Tobias Kurze, Markus Klemsy, David Bermbachy, Alexander Lenkz, Stefan Taiy and Marcel Kunze, "Cloud Federation".
- "Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters".
- "CloudNaaS: A Cloud Networking Platform for Enterprise Applications".
- "Effects of virtualization and cloud computing on data center networks".
- "The Case for Enterprise-Ready Virtual Private Clouds".
- What is (isn't) Google App Engine?, https://developers.google.com/appengine/training/intro/whatisgae
- Introducing Azure, http://azure.microsoft.com/en-us/documentation/articles/fundamentals-introduction-to-azure/
Big Data Systems:
- "The Hadoop Distributed File System: Architecture and Design".
- "MapReduce: Simplified Data Processing on Large Clusters".
- Yahoo! Hadoop Tutorial
- Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing.", 9th Symposium on Networked Systems Design and Implementation NSDI 2012 --- (SPARK)
- Apache Spark: A Unified Engine for
Big Data Processing
- Nishtala, Rajesh, et al. "Scaling memcache at facebook.", 10th Symposium on Networked Systems Design and Implementation NSDI 13.