“计算机性能让我着迷,”这项发现的第一作者之一、埃默里大学研究生张亚卓(音译)在访问瑞士时说。 张将于5月获得博士学位,她接受了苏黎世联邦理工学院的博士后研究金。 计算机科学家发明了一种高效但极其简单的算法,可以决定从网络缓存中丢弃哪些项目,为新项目腾出空间。 被称为sieve的新开源算法有可能大规模改变网络流量管理。 sieve是埃默里大学、卡内基梅隆大学和佩利坎基金会计算机科学家的联合项目。 该团队关于sieve的论文将于4月在加利福尼亚州圣克拉拉举行的第21届usenix网络系统设计与实现研讨会上发表。 这篇论文的预印本已经引起了轰动。 sieve成为黑客新闻的热门话题,也是颇具影响力的科技通讯《tldr》的专题报道,吸引了数万人访问sieve网站。 这篇论文的第一作者之一、埃默里大学博士生张亚卓说:“筛子比我们更大更大。”。 “它已经表现得很好了,但我们得到了很多好的建议,让它变得更好。 这就是开源世界的美妙之处。 张与杨俊成(jasonyang)是论文的第一作者,杨俊成在埃默里大学获得了计算机科学硕士学位,现在是卡内基梅隆大学的博士生。 埃默里大学计算机科学系副教授ymir vigfusson说:“sieve是对一种久经考验的缓存驱逐算法的简单改进,该算法已经使用了几十年,就像在计算世界中使用了几个世纪一样。”。 vigfusson和卡内基·梅隆计算机科学系副教授rashmi vinayak是这篇论文的共同高级作者。 来自培利坎基金会的计算机工程师姚跃也是一位合著者。 除了它的速度和有效性之外,引发人们对sieve兴趣的一个关键因素是它的简单性,使它具有可扩展性。 vigfusson说:“简单就是极致的精致。”。 “在一个设计为在几分之一秒内为数十亿人服务的系统中,这些部件越简单,就越容易有效地实施和维护该系统。 “保持‘热物’的便利性许多人都明白定期整理衣橱的价值。 从不使用的物品可以扔掉,很少使用的可以搬到阁楼或其他偏远的地方。 这使得最常见的磨损物品触手可及,因此可以快速找到,而无需四处翻找。 缓存就像一个组织良好的计算机数据储藏室。 缓存中充满了用户请求的最流行对象的副本,或者it术语中的“热门对象”。 缓存与计算机网络的主数据库分开维护这一小部分热门对象,主数据库就像一个巨大的仓库,里面装满了系统可以提供的所有信息。 缓存热对象可以使网络系统更高效地运行,快速响应用户的请求。 web应用程序可以有效地处理更多的流量,方法是弹出一个方便的壁橱来获取用户想要的大多数对象,而不是去仓库搜索每个请求的庞大数据库。 张说:“缓存无处不在。”。 “这对每一家使用web应用程序的公司来说都很重要,无论大小。 每个网站都需要一个缓存系统。 “然而,缓存在计算机科学领域的研究相对不足。 张为sieve设计的wondera标志,将较热的物体用红色描绘,较冷的物体用蓝色描绘。 张还为sieve设计了一个网站,包括一个演示其工作原理的运动图。 张在家乡广州的大学获得了本科和硕士学位,她开始主修软件工程。 “编写代码和制作网站很有趣,”她说,“但一旦你学会了如何做,就不会从根本上具有挑战性。”。 我想对技术的主干有更多的了解。 计算机性能使我着迷。 张向emory申请与vigfusson合作,因为他专注于计算机安全和缓存等基本主题,并且擅长简单地谈论这些主题。 她说:“让复杂的想法易于理解是很重要的。”。 反过来,vigfusson欣赏张如何以一种惊奇的感觉来处理棘手的问题。 他说:“她做科学是有正当理由的。”。 “她对探索的过程和穿越未知的边界感到高兴。 2016年,vigfusson获得了国家科学基金会教员早期职业发展计划(职业)拨款,用于探索缓存系统。 杨在读纪念硕士时就带头做了这个项目。 作为卡内基梅隆大学的博士生,杨继续与vigfusson合作,并在张2019年来到埃默里时帮助指导她。 缓存是如何工作的虽然缓存可以被认为是计算机的一个组织良好的壁橱,但当数以百万计的人,随着需求的不断变化,正在使用它时,很难知道这个壁橱里应该放什么。 高速缓存的快速内存运行起来很昂贵,但对web用户的良好体验至关重要。 目标是将最有用的、未来的信息保存在缓存中。 其他物体必须不断地被筛选出来,或者在技术术语中被“驱逐”,为不断变化的热物体阵列腾出空间。 缓存逐出算法决定抛出什么对象以及何时抛出。 fifo,或称“先进先出”,是20世纪60年代开发的一种经典驱逐算法。 想象一下传送带上排列的物体。 新请求的对象在左侧输入,最旧的对象到达右侧行的末尾时会被逐出。 在lru或“最近最少使用”算法中,对象也会沿着直线移动,最后被驱逐。 但是,如果一个物体在传送带上向下移动时再次被请求,它就会被移回生产线的头部。 驱逐算法有数百种变体,但为了提高效率,它们往往具有更大的复杂性。 这通常意味着它们是不透明的,需要很高的维护,尤其是在处理大量工作负载时。 张解释说:“如果一个算法非常复杂,它往往会有更多的错误,所有这些错误都需要修复。”。 sieve是一个简单的理想化lru和其他一些算法,它对基本的fifo方案进行了简单的调整。 sive最初将请求的对象标记为“零”。 如果物体在传送带上向下移动时再次被请求,其状态将变为“一。 当一个标记为“一”的对象到达行的末尾时,它会自动重置为“零”并被逐出。 指针或“移动的手”也会在物体沿着直线移动时扫描它们。 指针从线的末端开始,然后跳到头,在一个连续的圆圈中移动。 每当指针碰到标记为“零”的对象时,该对象就会被逐出。 张说:“尽快驱逐不受欢迎的物体很重要,而筛子在这项任务中速度很快。”。 除了这种对象的快速降级之外,sieve还设法以最小的计算工作量维护缓存中的流行对象,在计算机术语中被称为“惰性提升”。 研究人员认为,sieve是最简单的缓存驱逐算法,可以有效地实现快速降级和延迟升级。 较低的未命中率缓存的另一个目的是实现较低的不命中率——必须从“仓库”中提取的请求对象的一部分。 为了评估sieve,研究人员对meta、wikimedia、x和其他四个大型数据集的开源网络缓存痕迹进行了实验。 结果表明,sieve在超过45%的轨迹上实现了比九种最先进的算法更低的未命中率。 次优算法的未命中率仅为15%。 筛的简单易用引发了一个问题,为什么以前没有人想出这种方法。 张认为,sieve团队对近年来网络流量模式变化的关注可能产生了影响。 “例如,”她说,“新商品现在很快变得‘热门’,但也很快消失。”。 人们不断地对事物失去兴趣,因为新事物不断出现。 “web缓存工作负载往往遵循所谓的广义zipfian分布,其中一小部分对象占请求的很大比例。 sive可能已经达到了当前工作负载的最佳状态。 vigfusson说:“对于我们理解网络缓存驱逐来说,这显然是一个变革性的时刻。”。 “它改变了一个长期盲目使用的结构。 他指出,管理大量网络流量的大型公司正在进行调查,并补充道,“即使对网络缓存系统进行微小的改进,也可以在大型数据中心节省数百万美元。 张和杨有望在五月获得博士学位。 “他们正在做令人难以置信的工作,”vigfusson说。 “可以肯定地说,他们两人现在都是网络缓存驱逐方面的世界专家。 ”。 “computer performance fascinates me,” says emory graduate student yazhuo zhang, co-first author of the discovery, shown on a visit to switzerland. set to receive her phd in may, zhang accepted a post-doctoral fellowship at the federal institute of technology zurich (eth zurich).computer scientists have invented a highly effective, yet incredibly simple, algorithm to decide which items to toss from a web cache to make room for new ones. known as sieve, the new open-source algorithm holds the potential to transform the management of web traffic on a large scale.sieve is a joint project of computer scientists at emory university, carnegie mellon university and the pelikan foundation. the team’s paper on sieve will be presented at the 21st usenix symposium on networked systems design and implementation (nsdi) in santa clara, california, in april.a preprint of the paper is already making waves. sieve became a hot topic on hacker news and the subject of a feature in the influential tech newsletter tldr, driving tens of thousands of visits to the sieve website.“sieve is bigger and greater than just us,” says yazhuo zhang, an emory phd student and co-first author of the paper. “it is already performing well but we are getting a lot of good suggestions to make it even better. that’s the beauty of the open-source world.”zhang shares first authorship of the paper with juncheng (jason) yang, who received his master’s degree in computer science at emory and is now a phd candidate at carnegie mellon.“sieve is an easy improvement of a tried-and-true cache-eviction algorithm that’s been in use for decades which is literally like centuries in the world of computing,” says ymir vigfusson, associate professor in emory’s department of computer science.vigfusson is co-senior author of the paper, along with rashmi vinayak, an associate professor in carnegie mellon’s computer science department. yao yue, a computer engineer at the pelikan foundation, is also a co-author.in addition to its speed and effectiveness, a key factor sparking interest in sieve is its simplicity, lending it scalability.“simplicity is the ultimate sophistication,” vigfusson says. “the simpler the pieces are within a system designed to serve billions of people within a fraction of a second, the easier it is to efficiently implement and maintain that system.”keeping ‘hot objects’ handymany people understand the value of regularly reorganizing their clothing closet. items that are never used can be tossed and those that are rarely used can be moved to the attic or some other remote location. that leaves the items most commonly worn within easy reach so they can be found quickly, without rummaging around.a cache is like a well-organized closet for computer data. the cache is filled with copies of the most popular objects requested by users, or “hot objects” in it terminology. the cache maintains this small collection of hot objects separately from a computer network’s main database, which is like a vast warehouse filled with all the information that could be served by the system.caching hot objects allows a networked system to run more efficiently, rapidly responding to requests from users. a web application can effectively handle more traffic by popping into a handy closet to grab most of the objects users want rather than traveling down to the warehouse and searching through a massive database for each request.“caching is everywhere,” zhang says. “it’s important to every company, big or small, that is using web applications. every website needs a cache system.”and yet, caching is relatively understudied in the computer science field.a sense of wondera logo for sieve, created by zhang, portrays hotter objects in shades of red and colder ones in shades of blue. zhang also designed a web site for sieve, including a motion graphic demonstrating how it works.zhang, who received her undergraduate and master’s degrees at universities in her hometown of guangzhou, china, started off majoring in software engineering. “it’s fun to code and to make a website,” she says, “but it’s not fundamentally challenging once you learn how to do it. i wanted to gain more understanding of the backbone of technology. computer performance fascinates me.”zhang applied to emory to work with vigfusson, given his focus on fundamental topics such as computer security and caching, and his skill at talking about them in simple terms. “it’s important to make complex ideas easy to understand,” she says.in turn, vigfusson appreciates how zhang approaches intractable problems with a sense of wonder.“she’s doing science for all the right reasons,” he says. “she is delighted by the process of exploration and by traversing the frontiers of the unknown.”in 2016, vigfusson received a national science foundation faculty early career development program (career) grant to explore cache systems. yang took the lead on the project while he was an emory master’s student. as a phd student at carnegie mellon, yang continued to collaborate with vigfusson and helped to mentor zhang when she arrived at emory in 2019.how caching workswhile caching can be thought of as a well-organized closet for a computer, it is difficult to know what should go into that closet when millions of people, with constantly changing needs, are using it.the fast memory of the cache is expensive to run yet critical to a good experience for web users. the goal is to keep the most useful, future information within the cache. other objects must be continuously winnowed out, or “evicted” in tech terminology, to make room for the changing array of hot objects.cache-eviction algorithms determine what objects to toss and when to do so.fifo, or “first-in, first-out,” is a classic eviction algorithm developed in the 1960s. imagine objects lined up on a conveyor belt. newly requested objects enter on the left and the oldest objects get evicted when they reach the end of the line on the right.in the lru, or “least recently used,” algorithm the objects also move along the line toward eviction at the end. however, if an object is requested again while it moves down the conveyor belt, it gets moved back to the head of the line.hundreds of variations of eviction algorithms exist but they have tended to take on greater complexity to gain efficiency. that generally means they are opaque to reason about and require high maintenance, especially when dealing with massive workloads.“if an algorithm is very complicated, it tends to have more bugs, and all of those bugs need to be fixed,” zhang explains.a simple idealike lru and some other algorithms, sieve makes a simple tweak on the basic fifo scheme.sieve initially labels a requested object as a “zero.” if the object is requested again as it moves down the belt, its status changes to “one.” when an object labeled “one” makes it to the end of the line it is automatically reset to “zero” and evicted.a pointer, or “moving hand,” also scans the objects as they travel down the line. the pointer starts at the end of the line and then jumps to the head, moving in a continuous circle. anytime the pointer hits an object labeled “zero,” the object is evicted.“it’s important to evict unpopular objects as quickly as possible, and sieve is very fast at this task,” zhang says.in addition to this quick demotion of objects, sieve manages to maintain popular objects in the cache with minimal computational effort, known as “lazy promotion” in computer terminology. the researchers believe that sieve is the simplest cache-eviction algorithm to effectively achieve both quick demotion and lazy promotion.a lower miss ratiothe purpose of caching is to achieve a low miss ratio the fraction of requested objects that must be fetched from “the warehouse.”to evaluate sieve, the researchers conducted experiments on open-source web-cache traces from meta, wikimedia, x and four other large datasets. the results showed that sieve achieves a lower miss ratio than nine state-of-the-art algorithms on more than 45% of the traces. the next best algorithm has a lower miss ratio on only 15%.the ease and simplicity of sieve raises the question of why no one came up with the method before. the sieve team’s focus on how patterns of web traffic have changed in recent years may have made the difference, zhang theorizes.“for example,” she says, “new items now become ‘hot’ quickly but also disappear quickly. people continuously lose interest in things because new things keep coming up.”web-cache workloads tend to follow what are known as generalized zipfian distributions, where a small subset of objects account for a large proportion of requests. sieve may have hit a zipfian sweet spot for current workloads.“it is clearly a transformative moment for our understanding of web-cache eviction,” vigfusson says. “it changes a construct that’s been used blindly for so long.”marquee companies that manage massive amounts of web traffic are making inquiries, he notes, adding, “even a tiny improvement in a web-caching system can save millions of dollars at a major data center.”zhang and yang are on track to receive their phds in may.“they are doing incredible work,” vigfusson says. “it’s safe to say that both of them are now among the world experts on web-cache eviction.”.
埃默里大学留学推荐:
本文来源: 计算机科学家创造了一种简单的方法来加速缓存筛选