News 2018

December 2018

Whittaker Named Among Top Pittsburghers of All Time

Byron Spice

Pittsburgh magazine has named William "Red" Whittaker, the Fredkin University Professor of Robotics, as one of "The 50 Greatest Pittsburghers of All Time," putting him in the company of Andrew Carnegie, Rachel Carson, Jonas Salk, Fred Rogers and Chuck Noll. For the magazine's January 2019 issue, editors selected the 50 men and women from more than 200 years of Pittsburgh history, based on their contributions in fields ranging from sports to technology and on how they put the national spotlight on Pittsburgh. The issue coincides with the magazine's 50th year of publishing. Considered the father of field robotics, Whittaker made contributions that propelled robots from research curiosities, mostly found bolted to factory floors or relegated to laboratories, to mobile, autonomous units capable of working outdoors in harsh and challenging environments. Field robotics incorporates innovations in controls, sensing, propulsion, suspension, communications and navigation to enable devices that can operate autonomously in changing and uncertain environments. After the nuclear reactor meltdown at Three Mile Island near Harrisburg, Pa., in 1979, Whittaker and his team developed robots to inspect the damaged reactor's basement and perform repairs. This work led to the formation of Carnegie Mellon's Field Robotics Center, which he continues to direct. His other innovations include Dante II, a walking robot that explored an active volcano; Nomad, which searched for meteorites in Antarctica; and Tugbot, which surveyed an 1,800-acre area of Nevada for buried hazards. He led the Tartan Racing team and its Boss self-driving vehicle to victory in the $2 million Defense Advanced Research Projects Agency Urban Challenge in 2007, creating the template for today's burgeoning autonomous vehicle industry. He also founded Astrobotic, a Pittsburgh company that plans to deliver payloads to the moon's surface.
Another Pittsburgh company, RedZone Robotics, is using robots to revolutionize the inspection of sewer lines.

Thwarting Bias in AI Systems

Alexandra George

Artificial intelligence systems are at work in many areas where we might not realize it: making decisions about credit, what ads to show us and which job applicants to hire. While these systems are very good at systematically combing through large amounts of data to detect patterns and optimize decisions, the biases held by humans can be transmitted to these systems through the training data.

A team of researchers from Carnegie Mellon University, including CyLab's Anupam Datta, professor of electrical and computer engineering at CMU Silicon Valley; Matt Fredrikson, assistant professor of computer science; and Ph.D. student Samuel Yeom, is detecting what factors directly or indirectly affect decision outcomes and correcting them when they are used inappropriately.

Bias often appears in AI systems through factors like race or gender that aren't directly input into the system but still have a strong influence on its decisions. Discrimination can happen when one of these attributes is strongly correlated with information that is directly used by the system. For example, suppose a system that makes decisions about credit uses zip code as a factor. Direct information about race is not given to the system, but zip code is strongly correlated with race since many neighborhoods are still segregated. By using zip code, the system would be indirectly making decisions based on race. In this case, zip code is a proxy for race.

"If zip code is encoding race and is being used to make decisions about credit, then it's not a defensible proxy," said Datta. "That's what our method can uncover. It can look inside these machine learning models and discover proxies that are influential in the decisions of the model."

To detect bias and repair algorithms that may be making inappropriate decisions, the researchers have developed detection algorithms that identify the variables in a system that may be exhibiting proxy use in an unfair way.
The algorithm combs through the model to detect the variables that are correlated with a protected feature (like race, age, or gender) and heavily influence the decision outcome.

The concept of proxy use in machine learning models was formally studied in an earlier paper by a Carnegie Mellon team including Datta, Fredrikson, Ko, Mardziel, and Sen. The first proxy detection algorithm they created was a slow, brute-force algorithm that works in the context of simple decision tree and random forest models, two classes of machine learning models. Most recently, Yeom, Datta, and Fredrikson developed an algorithm that works on linear regression models and scales to numerous applications where these kinds of models are used in the real world, detailed in a paper presented at NeurIPS 2018.

"Our recent results show that, in the case of linear regression, we can simply treat the input attributes as vectors in a low-dimensional space," said Yeom, "and this allows us to use an existing convex optimization technique to identify a proxy quickly."

Once the algorithm has detected the influential variables, it shares them with a human domain expert who decides if the proxy is used in a way that is unjustified. To demonstrate that the algorithm works in practice, they ran it on a model used by a large police department to predict who is likely to be involved in a shooting incident. This model did not have any strong proxies, but gang membership was found to be a weak proxy for race and gender. A domain expert would then consult this information and decide whether it is a justified proxy. Their method is broadly applicable to machine learning models that are widely used in such high-stakes applications.

Not all instances of proxy use are negative, either. For example, debt-to-income ratio is also strongly associated with race. But if debt-to-income ratio can separately be justified as a strong predictor of creditworthiness, then it is legitimate to use.
That is why it is important for a human domain expert to make the final decision once the algorithm has flagged the proxy use.

Detecting and correcting biases in AI systems is just the beginning. As AI systems continue to make important decisions for us, we need to make sure they are fair. Hiring tools, criminal justice systems, and insurance companies all use AI to make decisions, and many other domain areas are continuing to incorporate artificial intelligence in new ways.

"Being able to explain certain aspects of a model's predictions helps not only with identifying sources of bias, but also with recognizing decisions that may at first appear biased, but are ultimately justified," said Fredrikson. "We believe that this ability is essential when applying the approach to real applications, where the distinction between fair and unfair use of information is not always clear-cut."
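
The two conditions described in this article, association with a protected feature and influence on the outcome, can be illustrated with a toy linear model. The sketch below is not the researchers' algorithm (which relies on convex optimization); it is a simplified stand-in that checks each input on its own. The data, the weights, the thresholds and every function name are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def find_proxy_candidates(rows, weights, protected,
                          assoc_min=0.5, influence_min=0.2):
    """Flag inputs that are both associated with the protected attribute
    and influential on a linear model's output (illustrative thresholds)."""
    # Influence: how much each input's weighted contribution varies,
    # as a share of the variation contributed by all inputs.
    spread = {name: abs(w) * pstdev([r[name] for r in rows])
              for name, w in weights.items()}
    total = sum(spread.values()) or 1.0
    flagged = []
    for name in weights:
        assoc = abs(pearson([r[name] for r in rows], protected))
        influence = spread[name] / total
        if assoc >= assoc_min and influence >= influence_min:
            flagged.append((name, assoc, influence))
    return flagged


# Hypothetical applicants: a zip-code group flag and income (in $1,000s).
rows = [
    {"zip_group": 1, "income": 40}, {"zip_group": 1, "income": 55},
    {"zip_group": 1, "income": 50}, {"zip_group": 0, "income": 45},
    {"zip_group": 0, "income": 60}, {"zip_group": 0, "income": 50},
]
protected = [1, 1, 0, 0, 0, 0]                 # hypothetical protected attribute
weights = {"zip_group": -30.0, "income": 1.0}  # hypothetical credit-score model

for name, assoc, influence in find_proxy_candidates(rows, weights, protected):
    print(f"{name}: association={assoc:.2f}, influence={influence:.2f}")
```

In this toy data only the zip-code flag passes both checks. As the article stresses, a flagged variable is a candidate for review by a domain expert, not an automatic verdict.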

CDC Says Carnegie Mellon's Flu Forecasts Once Again Most Accurate

CMU's Epidemiological Forecasting Systems Get Top Marks Four Years in a Row

Byron Spice

The U.S. Centers for Disease Control and Prevention has announced that Carnegie Mellon University's forecasts of national and regional influenza activity during the 2017-2018 flu season were the most accurate of the 30 systems in its flu forecasting initiative. Carnegie Mellon's Delphi Research Group has proven the most accurate four years in a row and for four of the five years that the CDC has run the forecasting initiative. Last season, 21 research groups participated, testing 30 different forecasting systems. "The CDC seems very happy with this entire initiative," said Roni Rosenfeld, Delphi leader and head of the Machine Learning Department. "Flu forecasts were actually used in official CDC communications for the first time in the 2017-2018 season." Delphi fielded two systems. The first uses machine learning to make predictions based on both past patterns and input from the CDC's domestic flu surveillance system. The second system bases its predictions on the judgments of human volunteers who submit their own weekly predictions — the so-called wisdom-of-crowds approach. CMU is using updated versions of both systems to forecast the current season, which thus far is off to a mercifully slow start. The forecasts are available through the Delphi website, and the group welcomes input from anyone who wants to contribute to the weekly crowdsourced forecasts. The flu forecasts by CMU and other participants suggest flu activity will rise in the next few weeks, but Rosenfeld said it's too soon to say when flu activity will peak or how high it might get. Unlike flu surveillance, which tracks flu activity based on reports of flu-like illnesses from physicians, flu forecasting attempts to look into the future, much like a weather forecast, so health officials can plan ahead. Delphi and the other forecasting groups make weekly predictions throughout the flu season. The CDC combines those efforts in its FluSight Network, with forecasts available online. 
In addition to CMU's Delphi team, the CDC's forecasting initiative includes groups from entities such as Columbia University, Los Alamos National Laboratory and the University of Massachusetts-Amherst. Initially, the CDC asked participants to forecast flu-related visits to physician offices in each of 10 regions in the nation. But last year, the CDC added two additional forecasting challenges — flu-related visits to physician offices for each state and flu-related hospitalization nationwide for each of five age groups. Delphi co-leader Ryan Tibshirani, associate professor of statistics and machine learning, said forecasting by state is important because flu activity can be very local, so wide differences are possible even within a region. Forecasting hospitalization is important, he added, because some strains of flu can result in more severe illness than others, so flu activity levels in hospitals can differ markedly from activity levels in doctors' offices. CMU participated in all three challenges, submitting the most accurate forecasts for each one. The machine learning method proved most accurate for the state-by-state and hospitalization forecasts, while the wisdom-of-crowds method was most accurate for regional forecasts. "Virtually all members of the Delphi group contributed to our success, with theory, algorithmic development, implementation and weekly participation in the crowdsourcing system," Rosenfeld said. "Special kudos to the students who oversaw the competition weekly, if not daily, during the unusually long flu season." Those students included Logan Brooks, a Ph.D. student in the Computer Science Department; Aaron Rumack, a Ph.D. student in machine learning; and Jiaxian Sheng, a senior computer science major. The Delphi group unites students and faculty from the Machine Learning, Statistics and Data Science, Computer Science and Computational Biology departments. 
The forecasting efforts are supported by the Machine Learning for Good fund, established at the School of Computer Science by Uptake; the Defense Threat Reduction Agency; and the National Institute of General Medical Sciences' Models of Infectious Disease Agent Study (MIDAS).

SCS Professors Reimagine What It Takes To Code

Aisha Rashid (DC 2019)

David Kosbie and Mark Stehlik believe anyone can code. As course instructors for Principles of Computing — better known to Carnegie Mellon University students by its course number, 15-110 — that belief comes in handy. One of two introductory courses offered in the School of Computer Science, 15-110 covers programming constructs along with history and current events in computer science, tailored to students with little to no computer science background. This fall semester, Kosbie and Stehlik switched up elements of the course, with the goal of enhancing students' experiences in 15-110. While the course provided students with the necessary tools and resources they'd need as novice coders, the instructors wanted to take it to the next level and showcase it to a wider audience. Using creative approaches and techniques inside and outside of the classroom, they hope to transform what might at first seem like complicated principles of computer science into collaborative and interactive problem-solving tools applicable to any and all fields of study. Their mission wasn't without challenges — the first being that their course is mandatory for many CMU students. "Most of the students taking 15-110 aren't there because they want to be, they're there because someone chose for them to be there," said Kosbie, an associate teaching professor in SCS. "Maybe they didn't come in kicking and screaming, and maybe they're okay with it, but nonetheless, they may not see this as a future. They may see this as a check in whatever box they have to check." Kosbie and Stehlik also know they have to manage their students' expectations. Often, new students may not be the most receptive to the basics of computer science theory and simple coding exercises. "Earlier in the course, you don't have the computational tools to do grandiose stuff," said Stehlik, teaching professor of computer science and assistant dean for outreach. 
"When students think about graphics, their thoughts immediately go to things like video games, like 'Halo,' or 'Fortnite.' But you have to start small, and a lot of the times when you start small, you start simple both in the domain and in the coding techniques." The instructors also recognize that they must tailor the course to a primarily freshman audience, as 15-110 is often taken by CMU students during their first two semesters. "Courses designed for a predominantly freshman audience have responsibilities to move students from a high school mindset to a Carnegie Mellon mindset, and that's true regardless of the course," Stehlik said. "It gets students to understand what it means to be in this environment, and what it means four years from now, to be a graduate from this environment." So striking a balance between keeping the class engaging for students while ensuring legitimate CMU-quality outcomes is one of the chief focuses Kosbie and Stehlik have for the course. How, then, are the duo re-envisioning 15-110 to meet these goals? Team homeworks, which are collaborative weekly group homework sessions led by a teaching assistant, make the content more approachable. With this addition, students participate in an extra two hours outside of lecture and watch interactive videos, practice coding challenges to prepare for the upcoming week, and even have opportunities to conduct research on the role of computer science in current events. This enables them to more qualitatively discuss the course material in a collaborative learning environment. Kosbie and Stehlik have also incorporated more guest lectures from CMU faculty as a part of the curriculum. Guest lecturers this semester included Distinguished Career Professor of Computer Science Lenore Blum, Bruce Nelson Professor of Computer Science Manuel Blum, Department Head and L.L. 
Thurstone Professor of Philosophy and Psychology David Danks, Angel Jordan Professor of Computer Science Tuomas Sandholm, Computer Science Professor Roger Dannenberg and Art Professor Golan Levin. "The guest lecturers give students a sense of a much bigger picture," Stehlik said. "When I taught 15-110 in the spring of 2017, we had one guest lecturer the entire semester. So this, in a sense, is opening up that box a lot wider to illustrate these concepts less from our own knowledge and more from the experts at CMU who are immersed in that knowledge." Finally, to ensure students are engaged with the content of the course as much as possible, they've also incorporated a new term project. This final project allows students to fuse their technical and creative skills and create an interactive game or activity about computer science in order to both show off what they've learned throughout the semester and educate their audience. The project's end goal is to help students better understand how the core technical skills they learned all semester can fundamentally change how they approach problem-solving, regardless of their profession. With these additions to the 15-110 curriculum, Kosbie and Stehlik both express a clear desire to ensure students get the most out of their exposure to computer science. "I'd like the biggest takeaway of the course to be that students not be afraid of coding and understand that it's very relevant to their future careers," Stehlik said. "We want to entice students who think that they may not be able to do this to believe that they can, and then to continue to do so."

RFID Tag Arrays Track Body Movements, Shape Changes

Washable, Battery-Free Tags Could Be Cheaply Embedded in Clothing

Byron Spice

Carnegie Mellon University researchers have found ways to track body movements and detect shape changes using arrays of radio-frequency identification (RFID) tags. RFID-embedded clothing thus could be used to control avatars in video games, much like in the movie "Ready Player One." Embedded clothing could also tell you when you should sit up straight, much like your mother.

RFID tags are nothing new, which is part of their appeal for these applications, said Haojian Jin, a Ph.D. student in CMU's Human-Computer Interaction Institute (HCII). They are cheap, battery-free and washable.

What's new is the method that Jin and his colleagues devised for tracking the tags, and monitoring movements and shapes. RFID tags reflect certain radio frequencies. It would be possible, but not practical, to use multiple antennas to track this backscatter and triangulate the locations of the tags. Rather, the CMU researchers showed they could use a single, mobile antenna to monitor an array of tags without any prior calibration.

Just how this works varies based on whether the tags are used to track the body's skeletal positions or to track changes in shape. For body-movement tracking, arrays of RFID tags are positioned on either side of the knee, elbow or other joints. By keeping track of the ever-so-slight differences in when the backscattered radio signals from each tag reach the antenna, it's possible to calculate the angle of bend in a joint.

"By attaching these paper-like RFID tags to clothing, we were able to demonstrate millimeter accuracy in skeletal tracking," Jin said.

The researchers call this embedded clothing RF-Wear and described it earlier this year at the UbiComp 2018 conference in Singapore. It could be an alternative to systems such as Kinect, which use a camera to track body movements and can only work when the person is in the camera's line of sight.
It also could be an alternative to existing wearables, which generally depend on inertial sensors that are expensive, difficult to maintain and power hungry, Jin said.

RFID-embedded clothes might also be an alternative to wrist-worn devices, such as Fitbit, for activity tracking or sports training.

The technology for monitoring changes in curves or shapes, called WiSh (for Wireless Shape-aware world), also uses arrays of RFIDs and a single antenna, but relies on a more sophisticated algorithm for interpreting the backscattered signals to infer the shape of a surface.

WiSh was presented earlier this year at Mobisys, the International Conference on Mobile Systems, Applications and Services, in Munich, by Jin and Jingxian Wang, a Ph.D. student in CMU's Electrical and Computer Engineering (ECE) Department. It could be incorporated into smart fabrics and used to track a user's posture. It could also be embedded in a variety of objects.

"We can turn any soft surface in the environment into a touch screen," Wang said. Smart carpets, for instance, could detect the presence and locations of people, or be used to control games or devices. Soft toys could respond to or otherwise register squeezes and bends. Smart pillows might help track sleep quality.

WiSh also could be used to monitor the structural health of bridges or other infrastructure, Wang noted. The researchers measured the curvature of Pittsburgh's 10th Street Bridge by using a robot to drag a string of 50 RFID tags along the bridge's sidewalk.

"We're really changing the way people are thinking about RF sensing," Jin added.

"Weaving these tags into clothing will only add a minimal cost, under $1," Jin said. The most expensive part of these measurement systems is the antenna. But smartphones already use 13 MHz antennas for services such as Apple Pay.
Adding a 900 MHz antenna for RFID-related applications might be feasible in future smartphones, eliminating the need for a separate device, he suggested.

In addition to Jin and Wang, the research team included Jason Hong, associate professor in the HCII; Swarun Kumar, assistant professor of ECE; and Zhijian Yang of Tsinghua University in China. The National Science Foundation and Google provided support for this research.
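
The geometry behind the joint-angle idea in this article can be sketched in a few lines. This is not the RF-Wear algorithm, which the article does not detail. It assumes two things not stated in the source: that differences in signal arrival show up as phase differences obeying the textbook round-trip backscatter relation, and that tag positions on each limb segment have already been estimated. All names and numbers are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_difference_m(phase_diff_rad, freq_hz):
    """Convert a phase difference between two tags' backscatter into a
    difference in antenna-to-tag distance. The signal travels out and
    back, hence 4*pi rather than 2*pi. Unambiguous only when the true
    difference is under a quarter wavelength."""
    wavelength = SPEED_OF_LIGHT_M_S / freq_hz
    return phase_diff_rad * wavelength / (4 * math.pi)


def joint_angle_deg(upper_tags, lower_tags):
    """Bend angle between two limb segments. Each segment is a list of
    (x, y) tag positions in meters, ordered along the limb away from the
    body. 0 degrees means the limb is straight."""
    ux = upper_tags[-1][0] - upper_tags[0][0]
    uy = upper_tags[-1][1] - upper_tags[0][1]
    lx = lower_tags[-1][0] - lower_tags[0][0]
    ly = lower_tags[-1][1] - lower_tags[0][1]
    dot = ux * lx + uy * ly
    cross = ux * ly - uy * lx
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical tag positions above and below an elbow.
straight = joint_angle_deg([(0.0, 0.0), (0.1, 0.0)], [(0.2, 0.0), (0.3, 0.0)])
bent = joint_angle_deg([(0.0, 0.0), (0.1, 0.0)], [(0.2, 0.0), (0.2, 0.1)])
print(f"straight arm: {straight:.1f} degrees, bent arm: {bent:.1f} degrees")

# At 900 MHz, a half-cycle phase difference is a quarter wavelength.
print(f"{range_difference_m(math.pi, 900e6) * 100:.1f} cm")
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers near 0 and 180 degrees, which matters when the bend is slight.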

Autism Risk-Factors Identified in 'Dark Matter' of Human Genome

Abby Simmons

Using cutting-edge statistical models to analyze data from nearly 2,000 families with an autistic child, a multi-institute research team discovered tens of thousands of rare mutations in noncoding DNA sequences and assessed whether these contribute to autism spectrum disorder. Published Dec. 14 in the journal Science, the study is the largest to date for whole-genome sequencing in autism. It included 1,902 families comprising both biological parents, a child affected with autism and an unaffected sibling. Scientists representing Carnegie Mellon University; the University of California, San Francisco; the University of Pittsburgh School of Medicine; Massachusetts General Hospital; Harvard Medical School and the Broad Institute led the research team. The study is one of 13 being released Dec. 14 as part of the first round of results to emerge from the National Institute of Mental Health's PsychENCODE consortium, a nationwide research effort that seeks to decipher how noncoding DNA, often referred to as the 'dark matter' of the human genome, contributes to psychiatric diseases such as autism, bipolar disorder and schizophrenia. Over the past decade, scientists have identified dozens of genes associated with autism by studying so-called "de novo" mutations, newly arising changes to the genome found in children but not their parents. To date, most de novo mutations linked to autism have been found in protein-coding genes. It has proven far more difficult for scientists to identify autism-associated mutations in noncoding regions of the genome. "Protein-coding genes clearly play an important role in human disorders like autism, yet their expression is regulated by the 'noncoding' genome, which covers the remaining 98.5 percent of the genome and remains somewhat mysterious," said Carnegie Mellon's Kathryn Roeder, corresponding author and UPMC Professor of Statistics and Life Sciences in the Statistics and Data Science and Computational Biology departments.
"Because the genome comprises 3 billion nucleotides, identifying which portions of the noncoding genome, when mutated, enhance the risk of autism is as challenging as looking for a needle in a haystack." Using a novel bioinformatics framework, the researchers were able to compress the search from billions of nucleotides to tens of thousands of functional categories that potentially contribute to autism. Working with these categories, they used machine learning tools to build statistical models to predict autism risk from a subset of the families in the study. They then applied this model to an independent set of families and successfully predicted patterns of risk in the noncoding genome. Though rare de novo mutations were found in many noncoding regions of the genome, the strongest signals arose from promoters — noncoding DNA sequences that control gene transcription. These risk-conferring promoters were most often located far from the genes under their control. They were also found to be largely conserved across species, suggesting that any rare mutations that might arise in these promoters are more likely to disrupt normal biology. "For years, scientists have used genome-wide studies to find common variants that confer disease risk. Our group has now focused on creating a computational framework that's capable of finding rare, high-impact variants associated with a human disorder, looking across all the noncoding regions of the genome," said Stephan Sanders, corresponding author and professor of psychiatry at the UCSF Weill Institute for Neurosciences and Institute for Human Genetics. The team's findings have practical implications for future research on model organisms, like mice, as attempts are made to move toward genetically informed therapies for autism. But the value of studying the noncoding genome extends well beyond autism. "We were particularly interested in the elements of the genome that regulate when, where and to what degree genes are transcribed. 
Understanding this noncoding sequence could provide insights into a variety of human disorders," said Bernie Devlin, corresponding author and professor of psychiatry at the University of Pittsburgh School of Medicine. "We are just scratching the surface of what there is to learn about noncoding regulatory variation in human disease, and the new methods this team has developed will catalyze an important step forward into larger and more comprehensive studies," said Michael Talkowski of Massachusetts General Hospital, Harvard Medical School and the Broad Institute, who also served as corresponding author on the study. Lead authors on the paper are Joon-Yong An and Donna Werling of the UCSF Weill Institute for Neurosciences, and Kevin Lin and Lingxue Zhu of CMU's Department of Statistics and Data Science. The National Institutes of Health, the Simons Foundation Autism Research Initiative and the Broad Institute's Stanley Center for Psychiatric Research provided funding for this research.  
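
The article's definition of a de novo mutation, a change found in a child but in neither parent, amounts to a set difference. The sketch below illustrates only that definition, with made-up variants written as (chromosome, position, reference base, alternate base). It is not the study's pipeline, and real analyses must also account for sequencing errors and coverage, which are ignored here.

```python
def de_novo_variants(child, mother, father):
    """Return variants seen in the child but in neither parent."""
    return child - (mother | father)


# Hypothetical variant calls for one family.
mother = {("chr1", 1000, "A", "G"), ("chr3", 3000, "C", "T")}
father = {("chr1", 1500, "T", "C")}
child = {
    ("chr1", 1000, "A", "G"),  # inherited from the mother
    ("chr1", 1500, "T", "C"),  # inherited from the father
    ("chr2", 2000, "G", "A"),  # in neither parent: de novo candidate
}

candidates = de_novo_variants(child, mother, father)
print(sorted(candidates))
```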

Alumna Q&A: Alexandra Johnson

Susie Cribbs

Carnegie Mellon University doesn't always consider itself cool. But this year, Seventeen magazine begged to differ, naming CMU one of its 2018 "Cool Schools." Their reasons? Our gender parity in STEM fields and strong community of female coders.

One person who has helped make CMU cool is Alexandra Johnson (CS 2014). The Washington state native played integral roles in strengthening programs like SCS Day and Women @ SCS, and worked hard to improve multiple areas of campus life. And she did it all while earning her bachelor's in computer science; completing internships at Duolingo, Facebook and Rent the Runway; being active in Greek life; and serving as an undergraduate teaching assistant.

Alexandra works at a startup in San Francisco now, but was on campus earlier this semester for a Women @ SCS panel discussion. We caught up with her to learn more about her time at CMU, how organizations like Women @ SCS transformed her CMU experience, what she's doing now and what her plans are for the future.

Washington state's a long way from Pittsburgh. How did you end up at Carnegie Mellon?

I'd known for a long time that I wanted to major in computer science. I liked math, and my parents said I should work in startups in Silicon Valley. I always had it in my head that I wanted to major in computer science, so I only applied to schools that I knew were top programs in CS. Carnegie Mellon was far and away the one with the best resources.

Once you arrived on campus, how did you feel about SCS?

SCS gives you enough theory to really impress interviewers, and so you get great internships right off the bat. And you get practical experience. Spending four years understanding the theory behind why we do something and what it means for code to run a certain way and what it means for code to be modular, I think that's actually important. Because then you can drop into any situation and you can learn any language and any new paradigm. SCS teaches you how to learn about programming.

Did anything at CMU disappoint you?

I tried to help create the experience I felt was missing by getting involved in clubs and activities. I ran SCS Day for three years. I was also really involved in Women @ SCS, which is such a valuable group for SCS. Another student and I put a lot of effort into making sure there were always upperclassmen at the freshmen events, getting people to come to the meetings, letting people know how they could get involved. Both SCS Day and Women @ SCS have sustained a lot of momentum in the years after my peers and I left CMU, and I'm so proud of the work the students are doing today.

What are you doing now?

I work at SigOpt. We provide an API that can help you fine-tune the hyperparameters of your machine learning models. I do a lot of the full stack engineering, a lot of the process to make the building of the models repeatable. I just finished development on a big machine learning platform project called Orchestrate, which contained a lot of exciting technical and logistical challenges.

What did you do at CMU that best trained you for your job at SigOpt?

Certainly Operating Systems was a great crash course in "manage a project over four months and don't hit any deadlines." Doing that project and having a partner on it taught me a lot. When I was at work and I had a chance to manage a project as the tech lead, I gave myself generous deadlines, all of which I hit. I also kept some of the ethic of that class. OS taught me that sometimes, in a large project with many small, moving pieces, the most important thing is that the project gets done.

How do you feel about California?

There's a lot of energy in San Francisco. I organize a meetup now, Women in Machine Learning and Data Science. It's an extension of the work I did with Women @ SCS and SCS Day.

How do you find women in tech? How does it compare to being a woman in SCS?

I definitely think SCS is probably the best place to go as a female undergrad. I don't think I've found something that strong anywhere else. It was just so great. I really wanted to be involved in the community and there WAS a community to be involved in. Some of that extends out to the Bay Area, but it's different, because you're not all on campus together.

Let's talk about your time at CMU. What are some of your favorite memories?

Some of my best computer-science related memories were when I was taking Operating Systems and we would sneak into the conference rooms in the upper levels of the Gates-Hillman Centers that overlook Pittsburgh and get a really good sunrise view. It was in the fall semester, so everything was snowy. Those times when we were doing that, but we were all together, they were great. My favorite class at CMU was actually not in SCS. It was the History of Clothing in the School of Drama, taught by Barbara Anderson. It was a class at the apex of what it should be. I couldn't have gone to another school and majored in computer science and gotten that same experience. That's an experience unique to CMU: that I majored in computer science and literally walked across the Pausch Bridge every day to the School of Drama and took my class from their costume expert.

You've been out of school for a little more than four years. What's your career plan?

I thought I had a plan when I graduated and I have less of a plan now. I'm learning that life and career doesn't work in five- or 10-year plans. If you'd asked me this eight months ago, I would have said I want to make a really great technical contribution to my company, which I felt like I hadn't necessarily made at the time. Now, I feel like I've worked on something that is interesting and my plans are to just keep working on that. It reminds me of what I like about tech. My work allowed me to learn about a piece of technology that's relatively newish in the industry, and I want to keep working with and learning about it. I want to write a couple blog posts. I want to give a couple talks. And then I'll see where I am.

What advice would you give prospective students applying to SCS?

I would tell them to really go look at the curriculum. Look at the advantages. Look at things like if they want to start programming right off the bat. If they do, it's a great place. Is being around a lot of other really intelligent students who want to talk about computer science and think about computer science and breathe computer science, is having that really important to them? Because SCS is the place where you can have that.

Parrot Genome Analysis Reveals Insights Into Longevity, Cognition

Genome of blue-fronted Amazon parrot compared with 30 other bird species

Byron Spice

Parrots are famously talkative, and a blue-fronted Amazon parrot named Moises – or at least its genome – is telling scientists volumes about the longevity and highly developed cognitive abilities that give parrots so much in common with humans. Perhaps someday, it will also provide clues about how parrots learn to vocalize so well. Morgan Wirthlin, a BrainHub post-doctoral fellow in Carnegie Mellon University’s Computational Biology Department and first author of a report to appear in the Dec. 17 issue of the journal Current Biology, said she and her colleagues sequenced the genome of the blue-fronted Amazon and used it to perform the first comparative study of parrot genomes. By comparing the blue-fronted Amazon with 30 other long- and short-lived birds – including four additional parrot species – she and colleagues at Oregon Health and Science University (OHSU), the Federal University of Rio de Janeiro and other entities identified a suite of genes previously not known to play a role in longevity that deserve further study. They also identified genes associated with longevity in fruit flies and worms. "In many cases, this is the first time we’ve connected those genes to longevity in vertebrates," she said. Wirthlin, who began the study while a Ph.D. student in behavioral neuroscience at OHSU, said parrots are known to live up to 90 years in captivity – a lifespan that would be equivalent to hundreds of years for humans. The genes associated with longevity include telomerase, responsible for DNA repair of telomeres (the ends of chromosomes), which are known to shorten with age. Changes in these DNA repair genes can potentially turn cells malignant. The researchers have found evidence that changes in the DNA repair genes of long-lived birds appear to be balanced with changes in genes that control cell proliferation and cancer.
The researchers also discovered changes in gene-regulating regions of the genome – which seem to be parrot-specific – that were situated near genes associated with neural development. Those same genes are also linked with cognitive abilities in humans, suggesting that both humans and parrots evolved similar methods for developing higher cognitive abilities. "Unfortunately, we didn’t find as many speech-related changes as I had hoped," said Wirthlin, whose research is focused on the evolution of vocal behaviors, including speech. Animals that learn songs or speech are relatively rare – parrots, hummingbirds, songbirds, whales, dolphins, seals and bats – which makes them particularly interesting to scientists, such as Wirthlin, who hope to gain a better understanding of how humans evolved this capacity. "If you’re just analyzing genes, you hit the end of the road pretty quickly," she said. That’s because learned speech behaviors are thought to be more of a function of gene regulation than of changes in genes themselves. Doing comparative studies of these "non-coding" regulatory regions, she added, is difficult, but she and Andreas Pfenning, assistant professor of computational biology, are working on the computational and experimental techniques that may someday reveal more of their secrets. This work was supported through the Brazilian Avian Genome Consortium and by the National Institutes of Health/National Institute on Deafness and Other Communication Disorders. See coverage of this research by New Scientist.

Bible Readings Help Create New Multilingual Dataset

New Resource Can Be Used To Build Text-to-Speech Systems for Hundreds of Languages

Byron Spice

It's the Christmas season, which means that beloved Bible verses are being read and recited innumerable times — and in a vast number of languages. The Bible's global reach as evidenced this time of year has enabled a Carnegie Mellon University professor to create a language resource that could enhance communication in hundreds of languages. By tapping online text and audio recordings of the New Testament in more than 700 languages, Alan Black, a professor in CMU's Language Technologies Institute, has created a dataset that can be used to build text-to-speech computer systems and other modern speech technologies for so-called low-resource languages. These languages, such as Kaqchikel in central Guatemala, Lun Bawang of Malaysia and Indonesia, and Mamprusi in northern Ghana, often are spoken by relatively small groups of people and generally lack the kind of technological tools for recognizing or translating language that are routinely available for high-resource languages such as English, Spanish or Mandarin Chinese. Black said it generally isn't profitable to build such systems — or often even basic tools such as dictionaries or pronunciation guides — for low-resource languages. But that never mattered to Christian missionaries, he added. "They don't care about commercial aspects," Black explained. "They care about the Word." In many cases, what few resources exist for these languages are the work of missionaries. "I suspect that for some of these languages these are the only written texts that exist." Black was able to tap one of those evangelical resources — an online service called Bible.is that provides recordings of the New Testament in more than a thousand languages — to create what he calls the CMU Wilderness Multilingual Speech Dataset. This dataset, available for free download online via GitHub, includes audio, word pronunciations and other tools necessary to build text-to-speech systems. 
From Bible.is, Black downloaded recordings of more than 700 languages for which both audio and text were available. That represents about 10 percent of the world's languages, he noted. "They are languages that missionaries would care about," Black said, including those spoken in areas such as Central and South America, West and East Africa, and Southeast Asia. He then set about aligning the text with the audio, determining which words in the text corresponded with spoken words. By so doing, he was able to establish pronunciation rules that make it possible to vocalize any word in that language, not just those included in the Bible. To make those alignments, Black and his CMU students were aided by the similar spelling and pronunciations across languages of three Hebrew names — Jesus, David and Abraham — and the first verse of the Book of Matthew: "The book of the genealogy of Jesus Christ, the son of David, the son of Abraham." "I now probably know that first sentence in Matthew better than anyone else," Black added. A computer program that makes a best guess at pronunciation helps create an initial alignment of text and audio. This first attempt often is incomprehensible, Black noted, but a machine learning program then analyzes the alignment and fine-tunes it. Thus far, he and his students have completed alignments for 600 of the languages and hope to finish the remaining, more troublesome languages soon. In some cases, poor quality recordings, misidentified languages and unrecognizable writing systems have thwarted their efforts. Development of the dataset was an outgrowth of a Defense Advanced Research Projects Agency program called Lorelei, which sought ways to develop speech recognition tools for low-resource languages within a matter of hours or days. Such tools would be useful, for instance, in responding to epidemic outbreaks or other humanitarian crises. 
Rather than build such tools on demand — which requires intensive work — Black worked to identify existing resources, such as Bible.is, that could be tapped to create these tools inexpensively in advance. He and his students have demonstrated that tools such as a speech synthesizer can indeed be created using the Wilderness dataset. Tools for processing and translating speech are particularly important for low-resource languages because many of their speakers are illiterate, Black explained. The dataset also should be useful for linguists, he added, noting it makes it possible to do studies of how languages vary across the planet. For instance, the dataset includes about 100 languages from the Amazon basin, enabling studies of how words are formed and how they relate to words in other languages.
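
As a rough illustration of the "best guess" initial alignment described above, the sketch below spreads the words of a verse across a recording in proportion to their letter counts. This is not Black's method. The function name `initial_alignment`, the letter-count heuristic (a crude stand-in for a guessed pronunciation) and the 4-second duration are all assumptions made for this example. In the real pipeline, a machine learning step would then refine such a first pass.

```python
def initial_alignment(words, audio_seconds):
    """Give each word a (start, end) time span proportional to its letter count.

    Hypothetical helper: letter count is only a naive proxy for how long
    a word takes to say, so the result is a starting point, not a final
    alignment.
    """
    total_letters = sum(len(word) for word in words)
    if total_letters == 0:
        return []
    spans = []
    elapsed_letters = 0
    for word in words:
        start = audio_seconds * elapsed_letters / total_letters
        elapsed_letters += len(word)
        end = audio_seconds * elapsed_letters / total_letters
        spans.append((word, start, end))
    return spans


# Opening words of the first verse of Matthew; the duration is made up.
verse = "The book of the genealogy of Jesus Christ".split()
for word, start, end in initial_alignment(verse, audio_seconds=4.0):
    print(f"{word:10s} {start:5.2f} - {end:5.2f}")
```

Names that sound similar across languages, such as Jesus, David and Abraham, could in principle serve as fixed points for correcting a rough pass like this one, but that step is not shown here.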

Hodgins Named 2018 ACM Fellow

Byron Spice

The Association for Computing Machinery (ACM) has named Jessica Hodgins, professor of robotics and computer science, one of 56 new ACM fellows honored for their significant contributions to computer science. Hodgins, who leads the Facebook AI Research lab in Pittsburgh in addition to her faculty duties, was cited by the ACM for her contributions to character animation, human simulation and humanoid robotics. Hodgins's research focuses on computer graphics, animation and robotics with an emphasis on generating and analyzing human motion. Formerly vice president for research at Disney Research, she last year was elected president of the ACM's Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH). She has received numerous awards, including SIGGRAPH's Steven Anson Coons Award for Outstanding Creative Contributions to Computer Graphics. She received her Ph.D. in computer science at CMU in 1989. Former CMU professor Bruce Maggs, now a professor of computer science at Duke University, also was named an ACM fellow "for contributions to the development of content distribution networks and the theory of computer networks." ACM will formally recognize its 2018 fellows at its annual awards banquet, to be held June 15 in San Francisco. Additional information about new and current ACM fellows is available on the ACM website.

Three SCS Faculty Members Named 2019 IEEE Fellows

Byron Spice

Three School of Computer Science faculty members — Venkatesan Guruswami, Mor Harchol-Balter and Eric Xing — have been elevated to fellows in the Institute of Electrical and Electronics Engineers (IEEE), the world's largest technical professional organization. Fellow status is a distinction reserved for select members who have demonstrated extraordinary accomplishments in an IEEE field of interest. Guruswami, a professor in the Computer Science Department (CSD), was cited "for contributions to list error-correction and algorithmic coding theory." His research spans a number of topics in theoretical computer science, including the theory of error-correcting codes, probabilistically checkable proofs, computational complexity theory and algebraic algorithms. He joined the CMU faculty in 2009. Harchol-Balter, a professor in CSD since 1999, was cited "for contributions to performance analysis and design of computer systems." Her work on designing new resource-allocation policies includes load-balancing policies, power-management policies and scheduling policies for distributed systems. She is heavily involved in the SIGMETRICS/PERFORMANCE research community and is the author of a popular textbook, "Performance Analysis and Design of Computer Systems." Xing, a professor in the Machine Learning Department since 2004, was cited "for contributions to machine learning algorithms and systems." His research interests lie in machine learning, computational biology and statistical methodology. He and his collaborators have developed a framework called Petuum for distributed machine learning with massive data, big models and a wide spectrum of algorithms. Petuum is now established as a company, of which Xing is founder, CEO and chief scientist. The total number of fellows selected in any one year cannot exceed one-tenth of one percent of the total voting IEEE membership. A complete list of the Class of 2019 is available on the IEEE site.

SCS Master's Student Named Schwarzman Scholar

Susie Cribbs

School of Computer Science master's student Hima Tammineedi has been named to the 2020 class of Schwarzman Scholars, a highly competitive graduate fellowship inspired by the Rhodes Scholarships that features one year of study at Tsinghua University in China. Launched in 2016, the Schwarzman Scholars program prepares future global leaders to meet the geopolitical challenges of the 21st century. During their year of study, the world's best young minds explore the economic, political and cultural factors that have contributed to China's growth as a global power. Tammineedi is the second CMU student to be named a Schwarzman Scholar. Chrystal Thomas, who graduated from the Mellon College of Science, earned the award in 2016. "The program brings the world's future leaders to China because it is and will continue to be one of the largest factors influencing the future of the world — politically, economically and technologically," said Tammineedi, who earned his bachelor's degree in computer science with a minor in machine learning from CMU this past May. "Having an understanding of the country will be essential in order to be a global leader. This directly aligned with my interests for my future, and given that I already had a big interest in China, it's the perfect opportunity for me." Tammineedi is one of 147 students worldwide selected for the program. More than 2,800 students applied for the fellowship, which begins in August. "Our newest class includes a diverse group of future leaders from around the world," said Stephen A. Schwarzman, co-founder and CEO of Blackstone and chair of Schwarzman Scholars. "They join a global network of scholars who have committed themselves to being a force for change, regardless of where their professional or personal passions take them. My hope is that a year in Beijing will inspire and challenge these students in ways they haven't even imagined. I look forward to seeing how this new class will leave its mark." 
Schwarzman Scholars spend a year in Beijing, where they earn a master's degree in global affairs from Tsinghua's Schwarzman College. In addition to taking classes in the core curriculum, scholars pursue an individually designed concentration in public policy, international studies, or economics and business. Outside the classroom, scholars gain exposure to China through internships, mentorship opportunities, special speakers and travel. Tammineedi, who will earn a master's degree in machine learning in May 2019, will study public policy as a Schwarzman Scholar. He first became interested in the scholarship when it caught his eye on the website for CMU's Fellowships and Scholarships Office. He then worked with Richelle Bernazzoli, the office's assistant director, to complete the grueling application process that included in-person interviews. "Hima is a budding expert in artificial intelligence whose interests span machine learning, transportation and urban issues. It was clear from our first meeting that he has been on an impressive trajectory since high school," Bernazzoli said. "Every step of the way, Hima has approached his work with humility, thoughtfulness and a sense of responsibility to society. These qualities will make him an excellent Schwarzman Scholar and leader in his field for years to come." While Tammineedi said he's excited to learn from global leaders and the strong group of peers that will surround him during his scholar experience, he's also looking forward to expanding his education beyond computer science. "I've studied computer science and machine learning at CMU, and that's where a lot of my interests lie. But my ultimate goals involve effecting change in the world at a large level," he said. "While I do believe that a deep understanding of tech could make this change possible, I don't think just knowing tech can lead to the best changes. 
I want to develop a better understanding of how the world and its countries and governments function in order to understand the best ways technology can help." For more about this year's class of Schwarzman Scholars, visit the organization's website.