Research is integral to the Personal Health Informatics doctoral degree, as students and faculty work together to find impactful solutions to today’s health and wellness challenges. The transition from centralized, provider-centric treatment to personalized, patient-focused care requires researchers to cross the boundaries of public health, medicine, and social and computer sciences, which in turn prepares students for excellence in a specific research area of personal health informatics. It is this interdisciplinary experience that sets the program apart.
This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity.
As the cost of computing and communication resources has plummeted, applications have become data-centric, with data products growing explosively in both number and size. Although accessing such data using the compute power necessary for its analysis and processing is cheap and readily available via cloud computing (intuitive, utility-style access to vast resource pools), doing so currently requires significant expertise, experience, and time (for customization, configuration, deployment, etc.). This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity. To enable this, this research explores new programming language, runtime, and distributed systems techniques and technologies that integrate the R programming language environment with open source cloud platform-as-a-service (PaaS) in ways that simplify processing massive datasets, sharing datasets across applications and users, and tracking and enforcing data provenance. The PIs’ plans for research, outreach, integrated curricula, and open source release of research artifacts have the potential to make cloud computing more accessible to a much wider range of users, in particular the data analytics community who use the R statistical analysis environment to apply their techniques and algorithms to important problems in areas such as biology, chemistry, physics, political science, and finance, by enabling them to use cloud resources transparently for their analyses and to share their scientific data and results in a way that enables others to reproduce and verify them.
The Applied Machine Learning Group is working with researchers from Harvard Medical School to predict outcomes for multiple sclerosis patients. A focus of the research is how best to interact with physicians to use both human expertise and machine learning methods.
Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate networks of trust, family, friendship, and professional association, apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market,” reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks, and regulatory agencies.
These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.
The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience.
Although journal and conference articles are recognized as the most formal and enduring forms of scientific communication, oral presentations are central to science because they are the means by which researchers, practitioners, the media, and the public hear about the latest findings, thereby becoming engaged and inspired, and they are where scientific reputations are made. Yet despite decades of technological advances in computing and communication media, the fundamentals of oral scientific presentations have not advanced since software such as Microsoft’s PowerPoint was introduced in the 1980s. The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience. The PI’s pilot studies have demonstrated that audiences are receptive to this concept, and that the technology is especially effective for individuals who are non-native speakers of English (who may make up as much as 21% of the population of the United States). Project outcomes will be initially deployed and evaluated in higher education, both as a teaching tool for delivering STEM lectures and as a training tool for students in the sciences to learn how to give more effective oral presentations (which may inspire future generations to engage in careers in the sciences).
This research will be based on a theory of human-agent collaboration, in which the human presenter is monitored using real-time speech and gesture recognition, audience feedback is also monitored, and the agent, presentation media, and human presenter (cued via an intelligent wearable teleprompter) are all dynamically choreographed to maximize audience engagement, communication, and persuasion. The project will make fundamental, theoretical contributions to models of real-time human-agent collaboration and communication. It will explore how humans and agents can work together to communicate effectively with a heterogeneous audience using speech, gesture, and a variety of presentation media, amplifying the abilities of scientist-orators who would otherwise be “flying solo.” The work will advance both artificial intelligence and computational linguistics, by extending dialogue systems to encompass mixed-initiative, multi-party conversations among co-presenters and their audience. It will impact the state of the art in virtual agents, by advancing the dynamic generation of hand gestures, prosody, and proxemics for effective public speaking and turn-taking. And it will also contribute to the field of human-computer interaction, by developing new methods for human presenters to interact with autonomous co-presenter agents and their presentation media, including approaches to cueing human presenters effectively using wearable user interfaces.
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program.
ABSTRACT
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program. Of the 20 students, 5 are in Computer Science, 6 are in Electrical and Computer Engineering, and 9 are in Information Assurance. These students come with different backgrounds that vary from political science and criminal justice to computer science and engineering. The University, with its nationally-recognized Cooperative Education, is well-positioned to attract and educate strong students in cybersecurity.
The SFS program at Northeastern succeeds in recruiting a diverse group of under-represented students to the program, and is committed to sustaining this level of diversity in future recruiting. Northeastern University is also reaching out to the broader community by leading Capture-the-Flag and Collegiate Cyber Defense competitions, and by actively participating in the New England Advanced Cyber Security Center, an organization composed of academia, industry, and government entities.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Sun, E. and Kaeli, D. “Aggressive Value Prediction on a GPU,” Journal of Parallel Processing, 2012, p. 1-19.
Azmandian, F., Dy, J. G., Aslam, J. A., and Kaeli, D. “Local Kernel Density Ratio-Based Feature Selection for Outlier Detection,” Journal of Machine Learning Research, v.25, 2012, p. 49-64.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes. The findings will inform decisions of leaders interested in optimizing all forms of collaboration in fields such as open-source software development, academic projects, and business. System designers will be able to identify interpersonal dynamics and develop new features for opinion aggregation and effective collaboration. In addition, the research will inform managers on how best to use crowdsourcing solutions to support innovation and marketing strategies including peer-to-peer marketing to translate activity within online communities into sales.
This research will analyze digital trace data that enable studies of population-level human interaction on an unprecedented scale. Understanding such interaction is crucial for anticipating impacts in our social, economic, and political lives as well as for system design. One site of such interaction is crowdsourcing systems – socio-technical systems through which online communities comprised of diverse and distributed individuals dynamically coordinate work and relationships. Many crowdsourcing systems not only generate creative content but also contain a rich community of collaboration and evaluation in which creators and adopters of creative content interact among themselves and with artifacts through overlapping relationships such as affiliation, communication, affinity, and purchasing. These relationships constitute multidimensional networks and create structures at multiple levels. Empirical studies have yet to examine how multidimensional networks in crowdsourcing enable effective large-scale collaboration. The data derive from two distinctly different sources, thus providing opportunities for comparison across a range of online creation-oriented communities. One is a crowdsourcing platform and ecommerce website for creative garment design, and the other is a platform for participants to create innovative designs based on scrap materials. This project will analyze both online community activity and offline purchasing behavior. The data provide a unique opportunity to understand overlapping structures of social interaction driving peer influence and opinion formation as well as the offline economic consequences of this online activity. This study contributes to the literature by (1) analyzing multidimensional network structures of interpersonal and socio-technical interactions within these socio-technical systems, (2) modeling how success feeds back into value-creation processes and facilitates learning, and (3) developing methods to predict the economic success of creative products generated in these contexts. The application and integration of various computational and statistical approaches will provide significant dividends to the broader scientific research community by contributing to the development of technical resources that can be extended to other forms of data-intensive inquiry. This includes documentation about best practices for integrating methods for classification and prediction; courses to train students to perform large-scale data analysis; and developing new theoretical approaches for understanding the multidimensional foundations of cyber-human systems.
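To make the notion of a multidimensional network concrete, the sketch below (in Python, with invented node and layer names) shows one minimal way to store overlapping relationship types as separate layers and to compute a simple measure of how two layers overlap; it is an illustration, not the project’s analysis code.

```python
from collections import defaultdict

# Minimal sketch of a multidimensional network: one edge set per relationship
# type ("layer"). All node and layer names below are invented for illustration.
class MultidimensionalNetwork:
    def __init__(self):
        self.layers = defaultdict(set)          # layer name -> set of undirected edges

    def add_edge(self, layer, u, v):
        self.layers[layer].add(frozenset((u, v)))

    def degree(self, node, layer):
        """Number of edges touching `node` within a single relationship layer."""
        return sum(1 for edge in self.layers[layer] if node in edge)

    def layer_overlap(self, layer_a, layer_b):
        """Fraction of layer_a's edges that also appear in layer_b -- one crude
        way to quantify how relationship types co-occur between the same pairs."""
        a, b = self.layers[layer_a], self.layers[layer_b]
        return len(a & b) / len(a) if a else 0.0

net = MultidimensionalNetwork()
net.add_edge("communication", "designer_1", "designer_2")
net.add_edge("affiliation", "designer_1", "designer_2")
net.add_edge("purchasing", "customer_7", "designer_2")
print(net.degree("designer_2", "communication"))          # 1
print(net.layer_overlap("communication", "affiliation"))  # 1.0
```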
Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains and for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images.
Our understanding of brain functions is hindered by the lack of detailed knowledge of synaptic connectivity in the underlying neural network. While synaptic connectivity of small neural circuits can be determined with electron microscopy, studies of connectivity on a larger scale, e.g., the whole mouse brain, must be based on light microscopy imaging. It is now possible to fluorescently label subsets of neurons in vivo and image their axonal and dendritic arbors in 3D from multiple brain tissue sections. The overwhelming remaining challenge is neurite tracing, which must be done automatically due to the high-throughput nature of the problem. Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains and for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images. Our goal is to revolutionize the existing functionalities of the software, making it possible to: (i) automatically reconstruct axonal and dendritic arbors of sparsely labeled populations of neurons from multiple stacks of images and (ii) automatically track and quantify changes in the structures of presynaptic boutons and dendritic spines imaged over time. We propose to utilize the latest machine learning and image processing techniques to develop multi-stack tracing, feature detection, and computer-guided trace editing capabilities of the software. All tools and datasets created as part of this proposal will be made available to the research community.
Public Health Relevance
At present, accurate methods of analysis of neuron morphology and synaptic connectivity rely on manual or semi-automated tracing tools. Such methods are time consuming, can be prone to errors, and do not scale up to the level of large brain-mapping projects. Thus, it is proposed to develop open-source software for accurate, automated reconstruction of structure and dynamics of large neural circuits.
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user.
ABSTRACT
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user. This “personalization” may often be of value to the user, but it also has the potential to distort search results and manipulate the perceptions and behavior of the user. Given the popularity of personalization across a variety of Web-based services, this research has the potential for extremely broad impact. Being able to quantify the extent to which Web-based services are personalized will lead to greater transparency for users, and the development of tools to identify personalized content will allow users to access information that may be hard to access today.
Personalization is now a ubiquitous feature on many Web-based services. In many cases, personalization provides advantages for users because personalization algorithms are likely to return results that are relevant to the user. At the same time, the increasing levels of personalization in Web search and other systems are leading to growing concerns over the Filter Bubble effect, where users are only given results that the personalization algorithm thinks they want, while other important information remains inaccessible. From a computer science perspective, personalization is simply a tool that is applied to information retrieval and ranking problems. However, sociologists, philosophers, and political scientists argue that personalization can result in inadvertent censorship and “echo chambers.” Similarly, economists warn that unscrupulous companies can leverage personalization to steer users towards higher-priced products, or even implement price discrimination, charging different users different prices for the same item. As the pervasiveness of personalization on the Web grows, it is clear that techniques must be developed to understand and quantify personalization across a variety of Web services.
This research has four primary thrusts: (1) To develop methodologies to measure personalization of mobile content. The increasing popularity of browsing the Web from mobile devices presents new challenges, as these devices have access to sensitive content like the user’s geolocation and contacts. (2) To develop systems and techniques for accurately measuring the prevalence of several personalization trends on a large number of e-commerce sites. Recent anecdotal evidence has shown instances of problematic sales tactics, including price steering and price discrimination. (3) To develop techniques to identify and quantify personalized political content. (4) To measure the extent to which financial and health information is personalized based on location and socio-economic status. All four of these thrusts will develop new research methodologies that may prove effective in other areas of research as well.
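As a rough illustration of the measurement idea (not the project’s actual methodology), personalization can be quantified by issuing the same query from a control account and from a user profile and comparing the returned result sets; the data below are invented.

```python
def jaccard(results_a, results_b):
    """Set overlap between two result lists, ignoring rank order."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def personalization_score(control_results, profile_results):
    """0.0 means the profile saw exactly the control results; 1.0 means the
    result sets were completely disjoint."""
    return 1.0 - jaccard(control_results, profile_results)

# Hypothetical result URLs returned for the same query.
control = ["news.example/a", "shop.example/b", "blog.example/c"]
profile = ["shop.example/b", "ads.example/d", "news.example/a"]
print(personalization_score(control, profile))  # 0.5
```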
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis,” Science, v.343, 2014, p. 1203.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers.
Users today have access to a broad range of free, web-based social services. All of these services operate under a similar model: Users entrust the service provider with their personal information and content, and in return, the service provider makes their service available for free by monetizing the user-provided information and selling the results to third parties (e.g., advertisers). In essence, users pay for these services by providing their data (i.e., giving up their privacy) to the provider.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers. All user data is encrypted and not exposed to any third parties, users retain control over their information, and users access the service via a web browser as normal.
The incredible popularity of today’s web-based services has led to significant concerns over privacy and user control over data. Addressing these concerns requires a re-thinking of the current popular web-based business models, and, unfortunately, existing providers are dis-incentivized from doing so. The impact of this project will potentially be felt by the millions of users of today’s popular services, who will be provided with an alternative to today’s business models.
The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time.
Evaluating the performance of information retrieval systems such as search engines is critical to their effective development. Current “gold standard” performance evaluation methodologies generally rely on the use of expert assessors to judge the quality of documents or web pages retrieved by search engines, at great cost in time and expense. The advent of “crowd sourcing,” such as available through Amazon’s Mechanical Turk service, holds out the promise that these performance evaluations can be performed more rapidly and at far less cost through the use of many (though generally less skilled) “crowd workers”; however, the quality of the resulting performance evaluations generally suffers greatly. The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time. For example, one might ask inexpensive crowd workers what are likely to be “easy” questions while reserving what are likely to be “hard” questions for the expensive experts. While the project focuses on the performance evaluation of search engines as its use case, the techniques developed will be more broadly applicable to many domains where one wishes to efficiently and effectively harness experts and crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will be developed within which one can quantify the uncertainty about a performance evaluation as well as the cost and expected utility of asking any assessor (expert or crowd worker) any question (e.g. a nominal judgment for a document or a preference judgment between two documents) at any time. The goal is then to ask the “right” question of the “right” assessor at any time in order to maximize the expected utility gained per unit cost incurred and then to optimally aggregate such responses in order to efficiently and effectively evaluate performance.
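A minimal sketch of the kind of greedy selection rule such a framework could support appears below; the utility and cost figures are invented, and a full treatment would update these estimates probabilistically after every response.

```python
def next_question(candidates):
    """Greedily pick the (assessor, question) pair with the highest expected
    utility gained per unit cost. Each candidate carries hypothetical
    'expected_utility' and 'cost' estimates."""
    return max(candidates, key=lambda c: c["expected_utility"] / c["cost"])

candidates = [
    {"assessor": "crowd_worker", "question": "is doc 12 relevant?", "expected_utility": 0.3, "cost": 0.05},
    {"assessor": "expert",       "question": "is doc 47 relevant?", "expected_utility": 0.9, "cost": 1.00},
]
print(next_question(candidates)["assessor"])  # crowd_worker (6.0 vs 0.9 utility per unit cost)
```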
This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
Advanced programming languages, based on dependent types, enable program verification alongside program development, thus making them an ideal tool for building fully verified, high assurance software. Recent dependently typed languages that permit reasoning about state and effects—such as Hoare Type Theory (HTT) and Microsoft’s F*—are particularly promising and have been used to verify a range of rich security policies, from state-dependent information flow and access control to conditional declassification and information erasure. But while these languages provide the means to verify security and correctness of high-level source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level target code. Unfortunately, even when compilers for such advanced languages exist, they come with no formal guarantee of correct compilation, let alone any guarantee of secure compilation—i.e., that compiled components will remain as secure as their high-level counterparts when executed within arbitrary low-level contexts. This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
The intellectual merit of this project is the development of a powerful new proof architecture for realistic yet secure compilation of dependently typed languages that relies on contracts to ensure that target-level contexts respect source-level security guarantees and leverages these contracts in a formal model of how source and target code may interoperate. The broader impact is that this research will make it possible to compose high-assurance software components into high-assurance software systems, regardless of whether the components are developed in a high-level programming language or directly in assembly. Compositionality has been a long-standing open problem for certifying systems for high-assurance. Hence, this research has potential for enormous impact on how high-assurance systems are built and certified. The specific goal of the project is to develop a verified multi-pass compiler from Hoare Type Theory to assembly that is type preserving, correct, and secure. The compiler will include passes that perform closure conversion, heap allocation, and code generation. To prove correct compilation of components, not just whole programs, this work will use an approach based on defining a formal semantics of interoperability between source components and target code. To guarantee secure compilation, the project will use (static) contract checking to ensure that compiled code is only run in target contexts that respect source-level security guarantees. To carry out proofs of compiler correctness, the project will develop a logical relations proof method for Hoare Type Theory.
The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source.
Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project’s broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language software, (ii) a target language that extends LLVM—increasingly the target of choice for modern compilers—with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.
The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t’) should behave the same as s interoperating with t’. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.
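Schematically, and using notation invented here rather than the project’s formal definitions, the compositional correctness property can be written as:

```latex
\mathit{compile}(s) = t
\;\Longrightarrow\;
\forall t'.\;\; t'\,\{\, t \,\}
\;\approx\;
t'\,\{\, \mathcal{ST}(s) \,\}
```

where t'{ · } denotes linking a component into the target context t', ST(s) embeds the source component s into the source–target interoperability semantics, and ≈ is behavioral equivalence.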
This project will support a plugin architecture for transparent checkpoint-restart.
Society’s increasingly complex cyberinfrastructure creates a concern for software robustness and reliability. Yet, this same complex infrastructure is threatening the continued use of fault tolerance. Consider when a single application or hardware device crashes. Today, in order to resume that application from the point where it crashed, one must also consider the complex subsystem to which it belongs. While in the past, many developers would write application-specific code to support fault tolerance for a single application, this strategy is no longer feasible when restarting the many inter-connected applications of a complex subsystem. This project will support a plugin architecture for transparent checkpoint-restart. Transparency implies that the software developer does not need to write any application-specific code. The plugin architecture implies that each software developer writes the necessary plugins only once. Each plugin takes responsibility for resuming any interrupted sessions for just one particular component. At a higher level, the checkpoint-restart system employs an ensemble of autonomous plugins operating on all of the applications of a complex subsystem, without any need for application-specific code.
The plugin architecture is part of a more general approach called process virtualization, in which all subsystems external to a process are virtualized. It will be built on top of the DMTCP checkpoint-restart system. One simple example of process virtualization is virtualization of ids. A plugin maintains a virtualization table and arranges for the application code of the process to see only virtual ids, while the outside world sees the real id. Any system calls and library calls using this real id are extended to translate between real and virtual id. On restart, the real ids are updated with the latest value, and the process memory remains unmodified, since it contains only virtual ids. Other techniques employing process virtualization include shadow device drivers, record-replay logs, and protocol virtualization. Some targets of the research include transparent checkpoint-restart support for the InfiniBand network, for programmable GPUs (including shaders), for networks of virtual machines, for big data systems such as Hadoop, and for mobile computing platforms such as Android.
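The id-virtualization idea can be sketched in a few lines (this toy model is illustrative and is not DMTCP’s implementation): the application only ever holds virtual ids, the wrapped system calls consult a translation table, and restart only updates that table.

```python
class IdVirtualizationTable:
    """Toy model of id virtualization: application code sees only virtual ids,
    while wrapped system/library calls translate them to the current real ids."""
    def __init__(self):
        self._virt_to_real = {}
        self._next_virt = 1

    def register(self, real_id):
        virt = self._next_virt
        self._next_virt += 1
        self._virt_to_real[virt] = real_id
        return virt                          # the application stores only this

    def real(self, virt_id):
        return self._virt_to_real[virt_id]   # used inside wrapped calls

    def update_on_restart(self, virt_id, new_real_id):
        # After restart the real id changes, but process memory, which holds
        # only virtual ids, does not need to be modified.
        self._virt_to_real[virt_id] = new_real_id

table = IdVirtualizationTable()
vpid = table.register(4242)            # hypothetical real pid before checkpoint
table.update_on_restart(vpid, 5173)    # hypothetical real pid after restart
print(table.real(vpid))                # 5173; the application still uses vpid
```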
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Kapil Arya and Gene Cooperman. “DMTCP: Bringing Interactive Checkpoint-Restart to Python,” Computational Science & Discovery, v.8, 2015, 16 pages. doi:10.1088/issn.1749-4699
Jiajun Cao, Matthieu Simoni, Gene Cooperman, and Christine Morin. “Checkpointing as a Service in Heterogeneous Cloud Environments,” Proc. of 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15), 2015, p. 61–70. doi:10.1109/CCGrid.2015.160
This project will focus on the development of REDEX, a lightweight domain-specific tool for modeling programming languages that is useful for software development. Originally developed as an in-house tool for a small group of collaborating researchers, REDEX escaped the laboratory several years ago and acquired a dedicated user community; new users now wish to use it for larger and more complicated programming languages than originally envisioned. Using this framework, a programmer articulates a programming language model directly as a software artifact with just a little more effort than paper-and-pencil models. Next, the user invokes diagnostic tools to test a model’s consistency, explore its properties, and check general claims about it.
This award funds several significant improvements to REDEX: (1) a modular system that allows its users to divide up the work, (2) scalable performance so that researchers can deal with large models, and (3) improvements to its testing and error-detection system. The award also includes support for the education of REDEX’s quickly growing user community, e.g., support for organizing tutorials and workshops.
This project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem.
The “Gradual Typing Across the Spectrum” project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. They routinely use JavaScript for interactive web pages, Ruby on Rails for server-side software, Python for data science, and so on. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem. In turn, the project’s broader significance and importance are the deployment of new technologies to assist the programmer who maintains code in scripting languages, the creation of novel technologies that preserve the advantages of these scripting frameworks, and the development of curricular materials that prepare the next generation of students for working within these frameworks.
A few years ago, the PIs launched programming language research efforts to address this problem. They diagnosed the lack of sound types in scripting languages as one of the major factors. With types in conventional programming languages, programmers concisely communicate design information to future maintenance workers; soundness ensures the types are consistent with the rest of the program. In response, the PIs explored the idea of gradual typing, that is, the creation of a typed sister language (one per scripting language) so that (maintenance) programmers can incrementally equip systems with type annotations. Unfortunately, these efforts have diverged over the years and would benefit from systematic cross-pollination.
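Python’s optional annotations give a small, concrete feel for the incremental-annotation workflow, although the PIs’ typed sister languages additionally enforce soundness at the boundary between typed and untyped code, which plain annotations do not; the function below is a made-up example.

```python
from typing import Optional

# Legacy, untyped prototype code: nothing documents or checks what kinds of
# values callers are expected to pass.
def lookup_price(catalog, item):
    return catalog.get(item)

# The same function after one maintenance pass: the annotations record design
# information for future maintainers and let an external checker flag
# inconsistent call sites, while unannotated callers keep working unchanged.
def lookup_price_typed(catalog: dict[str, float], item: str) -> Optional[float]:
    return catalog.get(item)

print(lookup_price_typed({"widget": 9.99}, "widget"))  # 9.99
```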
With support from this grant, the PIs will systematically explore the spectrum of their gradual typing system with a three-pronged effort. First, they will investigate how to replicate results from one project in another. Second, they will jointly develop an evaluation framework for gradual typing projects with the goal of diagnosing gaps in the efforts and needs for additional research. Third, they will explore the creation of new scripting languages that benefit from the insights of gradual typing research.
This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data.
Speech is a complex and intricately timed task that requires the coordination of numerous muscle groups and physiological systems. While most children acquire speech with relative ease, it is one of the most complex patterned movements accomplished by humans and thus susceptible to impairment. Approximately 2% of Americans have imprecise speech either due to mislearning during development (articulation disorder) or as a result of neuromotor conditions such as stroke, brain injury, Parkinson’s disease, cerebral palsy, etc. An equally sizeable group of Americans have difficulty with English pronunciation because it is their second language. Both of these user groups would benefit from tools that provide explicit feedback on speech production clarity. Traditional speech remediation relies on viewing a trained clinician’s accurate articulation and repeated practice with visual feedback via a mirror. While these interventions are effective for readily viewable speech sounds (visemes such as /b/p/m/), they are largely unsuccessful for sounds produced inside the mouth. The tongue is the primary articulator for these obstructed sounds and its movements are difficult to capture. Thus, clinicians use diagrams and other low-tech means (such as placing edible substances on the palate or physically manipulating the oral articulators) to show clients where to place their tongue. While sophisticated research tools exist for measuring and tracking tongue movements during speech, they are prohibitively expensive, obtrusive, and impractical for clinical and/or home use. The PIs’ goal in this exploratory project, which represents a collaboration across two institutions, is to lay the groundwork for a Lingual-Kinematic and Acoustic sensor technology (LinKa) that is lightweight, low-cost, wireless and easy to deploy both clinically and at home for speech remediation.
PI Ghovanloo’s lab has developed a low-cost, wireless, and wearable magnetic sensing system, known as the Tongue Drive System (TDS). An array of electromagnetic sensors embedded within a headset detects the position of a small magnet that is adhered to the tongue. Clinical trials have demonstrated the feasibility of using the TDS for computer access and wheelchair control by sensing tongue movements in up to 6 discrete locations within the oral cavity. This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data. To this end, the team will extend the TDS to track tongue movements during running speech, which are quick, compacted within a small area of the oral cavity, and often overlap for several phonemes, so the challenge will be to accurately classify movements for different sound classes. To complement this effort, pattern recognition of sensor spatiotemporal dynamics will be embedded into an interactive game to offer a motivating, personalized context for speech motor (re)learning by enabling audiovisual biofeedback, which is critical for speech modification. To benchmark the feasibility of the approach, the system will be evaluated on six individuals with neuromotor speech impairment and six healthy age-matched controls.
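As a rough sketch of the classification step only (not the project’s pipeline), windows of multi-channel sensor readings could be reduced to summary features and passed to an off-the-shelf classifier; all data and dimensions below are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def window_features(window):
    """Collapse one window of raw samples (samples x channels) into simple
    per-channel summary statistics."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

# Stand-in data: 200 windows of 50 samples from 12 hypothetical sensor channels,
# each labeled with one of 4 made-up sound classes.
windows = rng.normal(size=(200, 50, 12))
labels = rng.integers(0, 4, size=200)

X = np.array([window_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:5]))  # predicted sound classes for the first five windows
```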
The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The “Compiler Coaching” (Dialog) project represents an investment in programming language tools and technology. Software engineers use high-level programming languages on a daily basis to produce the apps and applications that everyone uses and that control everybody’s lives. Once a programming language translator accepts a program as grammatically correct, it creates impenetrable computer codes without informing the programmer how well (fast or slow, small or large, energy hogging or efficient) these codes will work. Indeed, modern programming languages employ increasingly sophisticated translation techniques and have become obscure black boxes to the working engineer. The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The PIs will explore the Dialog idea in two optimizing compiler settings, one on the conventional side and one on the modern one: for the Racket language, a teaching and research vehicle that they can modify as needed to create the desired channel, and the JavaScript programming language, the standardized tool for existing Web applications. The intellectual merits concern the fundamental principles of creating such communication channels and frameworks for gathering empirical evidence on how these channels benefit the working software engineer. These results should enable the developers of any programming language to implement similar channels of communication to help their clients. The broader impacts are twofold. On one hand, the project is likely to positively impact the lives of working software engineers as industrial programming language creators adapt the Dialog idea. On the other hand, the project will contribute to a two-decade-old, open-source programming language project with a large and longstanding history of educational outreach at multiple levels. The project has influenced hundreds of thousands of high school students in the past and is likely to do so in the future.
While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive. This project aims to provide algorithms and tools for answering these questions.
Software, including common examples such as commercial applications or embedded device firmware, is often delivered as closed-source binaries. While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive.
This project aims to provide algorithms and tools for answering these questions. Leveraging prior work on emulator-based dynamic analyses, we propose techniques for scaling this high-fidelity analysis to capture and extract whole-system behavior in the context of embedded device firmware and closed-source applications. Using a combination of dynamic execution traces collected from this analysis platform and binary code analysis techniques, we propose techniques for automated structural analysis of binary program artifacts, decomposing system and user-level programs into logical modules through inference of high-level semantic behavior. This decomposition provides as output an automatically learned description of the interfaces and information flows between each module at a sub-program granularity. Specific activities include: (a) developing software-guided whole-system emulator for supporting sophisticated dynamic analyses for real embedded systems; (b) developing advanced, automated techniques for structurally decomposing closed-source software into its constituent modules; (c) developing automated techniques for producing high-level summaries of whole system executions and software components; and (d) developing techniques for automating the reverse engineering and fuzz testing of encrypted network protocols. The research proposed herein will have a significant impact outside of the security research community. We will incorporate the research findings of our program into our undergraduate and graduate teaching curricula, as well as in extracurricular educational efforts such as Capture-the-Flag that have broad outreach in the greater Boston and Atlanta metropolitan areas.
The close ties to industry that the PIs collectively possess will facilitate transitioning the research into practical defensive tools that can be deployed into real-world systems and networks.
This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.
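For concreteness, a toy push-gossip simulation on a static graph is shown below; the dynamic-network protocols the project targets would, among other things, change the edge set between rounds. The network and parameters are invented.

```python
import random

def push_gossip(neighbors, source, rounds, seed=0):
    """Each round, every informed node pushes the rumor to one uniformly
    random neighbor. `neighbors` maps node -> list of adjacent nodes."""
    random.seed(seed)
    informed = {source}
    for _ in range(rounds):
        newly_informed = set()
        for node in informed:
            if neighbors[node]:
                newly_informed.add(random.choice(neighbors[node]))
        informed |= newly_informed
    return informed

# A 6-node ring as a stand-in network.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(push_gossip(ring, source=0, rounds=4)))
```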
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning.
The control of epidemics, broadly defined to range from human diseases such as influenza and smallpox to malware in communication networks, relies crucially on interventions such as vaccinations and anti-virals (in human diseases) or software patches (for malware). These interventions are almost always voluntary directives from public agencies; however, people do not always adhere to such recommendations, and make individual decisions based on their specific “self-interest”. Additionally, people alter their contacts dynamically, and these behavioral changes have a huge impact on the dynamics and the effectiveness of these interventions, so that “good” intervention strategies might, in fact, be ineffective, depending upon the individual response.
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning. Models will be proposed to capture the complexity of static and temporal interactions and patterns of information exchange, including the possibility of failed interventions and the potential for moral hazard. The project will also study specific policies posed by public agencies and network security providers for controlling the spread of epidemics and malware, and will develop resource constrained mechanisms to implement them in this framework.
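A toy discrete-time SIR simulation on a contact network, with a single voluntary-vaccination parameter, illustrates the kind of coupled dynamics (individual decisions plus network spread) that the proposed models must capture; the network and parameter values are invented.

```python
import random

def simulate_sir(neighbors, beta, vaccination_rate, steps, seed=1):
    """Discrete-time SIR on a contact graph: a fraction of nodes opts in to
    vaccination (removed up front); infection then spreads along edges with
    probability `beta` per contact per step."""
    random.seed(seed)
    nodes = list(neighbors)
    vaccinated = set(random.sample(nodes, int(vaccination_rate * len(nodes))))
    susceptible = set(nodes) - vaccinated
    infected = {susceptible.pop()} if susceptible else set()
    recovered = set()
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in neighbors[u]:
                if v in susceptible and random.random() < beta:
                    newly_infected.add(v)
        susceptible -= newly_infected
        recovered |= infected
        infected = newly_infected
    return len(recovered | infected)   # total number of nodes ever infected

chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 30] for i in range(30)}
print(simulate_sir(chain, beta=0.4, vaccination_rate=0.2, steps=25))
```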
This project will integrate approaches from Computer Science, Economics, Mathematics, and Epidemiology to give intellectual unity to the study and design of public health policies, and it has the potential to support strong dissertation work in all these areas. Education and outreach is an important aspect of the project, and includes curriculum development at both the graduate and undergraduate levels. A multi-disciplinary workshop is also planned as part of the project.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography.
Computational inefficiency is a common experience: the computer cannot complete a certain task due to lack of resources such as time, memory, or bandwidth. Computational complexity theory classifies — or aims to classify — computational tasks according to their inherent inefficiency. Since tasks requiring excessive resources must be avoided, complexity theory is often indispensable in the design of a computer system. Inefficiency can also be harnessed to our advantage. Indeed, most modern cryptography and electronic commerce rely on the (presumed) inefficiency of certain computational tasks.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography. Building on the principal investigator’s (PI’s) previous works, the main proposed directions are:
This research is closely integrated with a plan to achieve broad impact through education. The PI is reshaping the theory curriculum at Northeastern on multiple levels. At the undergraduate level, the PI is working on, and using in his classes, a set of lecture notes aimed at students lacking mathematical maturity. At the Ph.D. level, the PI is incorporating current research topics, including some of the above, into core classes. Finally, the PI will continue to do research working closely with students at all levels.
This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases.
The significant advances realized in recent years in the study of complex networks are severely limited by an almost exclusive focus on the behavior of single networks. However, most networks in the real world are not isolated but are coupled and hence depend upon other networks, which in turn depend upon other networks. Real networks communicate with each other and may exchange information, or, more importantly, may rely upon one another for their proper functioning. A simple but real example is a power station network that depends on a computer network, and the computer network depends on the power network. Our social networks depend on technical networks, which, in turn, are supported by organizational networks. Surprisingly, analyzing complex systems as coupled interdependent networks alters the most basic assumptions that network theory has relied on for single networks. A multidisciplinary, data-driven research project will: 1) Study the microscopic processes that rule the dynamics of interdependent networks, with a particular focus on the social component; 2) Define new mathematical models/foundational theories for the analysis of the robustness/resilience and contagion/diffusive dynamics of interdependent networks. This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases. These databases will be made publicly available, except for special cases where it is illegal to do so.
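The following toy cascade, in which every node requires a partner node (possibly in the other network) to keep functioning, gives a feel for why interdependence changes robustness so drastically; it is a deliberately simplified illustration, not one of the project’s models.

```python
def dependency_cascade(depends_on, initial_failures):
    """Iterate to a fixed point: a node fails once the node it depends on has
    failed. `depends_on` maps each node to the single node it requires."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for node, needed in depends_on.items():
            if node not in failed and needed in failed:
                failed.add(node)
                changed = True
    return failed

# Hypothetical coupling between power nodes p0-p2 and computer nodes c0-c2:
# each power node needs a computer node and vice versa.
depends_on = {"p0": "c0", "p1": "c1", "p2": "c2",
              "c0": "p1", "c1": "p2", "c2": "p0"}
print(sorted(dependency_cascade(depends_on, {"p0"})))  # one failure takes down all six nodes
```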
This research has important implications for understanding the social and technical systems that make up a modern society. A recent US Scientific Congressional Report concludes: “No currently available modeling and simulation tools exist that can adequately address the consequences of disruptions and failures occurring simultaneously in different critical infrastructures that are dynamically inter-dependent.” Understanding the interdependence of networks and its effect on system robustness and on structural and functional behavior is crucial for properly modeling many real-world systems and applications, from disaster preparedness, to building effective organizations, to comprehending the complexity of the macro economy. In addition to these intellectual objectives, the research project includes the development of an extensive outreach program to the public, especially K-12 students.
This research targets the design and evaluation of protocols for secure, privacy-preserving data analysis in an untrusted cloud.
With these protocols, the user can store and query data in the cloud while preserving the privacy and integrity of outsourced data and queries. The PIs specifically address a real-world cloud framework: Google’s prominent MapReduce paradigm.
Traditional solutions for single-server setups, as well as related work on, e.g., fully homomorphic encryption, are computationally too heavy and uneconomical, and they offset the advantages of the cloud. The PIs’ rationale is to design new protocols tailored to the specifics of the MapReduce computing paradigm. The PIs’ methodology is twofold. First, the PIs design new protocols that allow the cloud user to specify data-analysis queries for typical operations such as searching, pattern matching, or counting. For this, the PIs extend privacy-preserving techniques such as private information retrieval and order-preserving encryption. Second, the PIs design protocols guaranteeing the genuineness of data retrieved from the cloud. Using cryptographic accumulators, users can verify that data has not been tampered with. Besides the design work, the PIs also implement a prototype that is usable in a realistic setting with MapReduce.
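The integrity-checking idea is easiest to see with a toy example. The sketch below uses per-record HMAC tags as a simplified stand-in for the cryptographic accumulators mentioned above (an accumulator additionally supports compact membership proofs, which plain tags do not); the key, record contents, and layout are illustrative only, not part of the PIs’ protocols.

```python
import hmac, hashlib

KEY = b"client-side secret key"  # illustrative; never leaves the user

def tag(record: bytes) -> bytes:
    # Per-record authentication tag computed by the client before upload.
    return hmac.new(KEY, record, hashlib.sha256).digest()

def verify(record: bytes, stored_tag: bytes) -> bool:
    # On retrieval, recompute the tag and compare in constant time.
    return hmac.compare_digest(tag(record), stored_tag)

# Upload: the client sends (record, tag(record)) pairs to the cloud.
outsourced = [(r, tag(r)) for r in [b"row-1", b"row-2"]]

# Later: any tampering by the cloud is detected on retrieval.
tampered = (b"row-1 (modified)", outsourced[0][1])
assert verify(*outsourced[0]) and not verify(*tampered)
```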
The outcome of this project enables privacy-preserving operations and secure data storage in a widely used cloud computing framework, thus removing one major adoption obstacle and making cloud computing available to a larger community.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries.
Wi-Fi has emerged as the technology of choice for Internet access. Thus, virtually every smartphone or tablet is now equipped with a Wi-Fi card. Concurrently, and as a means to maximize spectral efficiency, Wi-Fi radios are becoming increasingly complex and sensitive to wireless channel conditions. The prevalence of Wi-Fi networks, along with their adaptive behaviors, makes them an ideal target for denial of service attacks at a large, infrastructure level.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries. The research blends theory with experimentation and prototyping, and spans a range of disciplines including protocol design and analysis, coding and modulation, on-line algorithms, queuing theory, and emergent behaviors.
The anticipated benefits of the project include: (1) a deep understanding of threats facing Wi-Fi along several dimensions, via experiments and analysis; (2) a set of mitigation techniques and algorithms to strengthen existing Wi-Fi networks and emerging standards; (3) implementation into open-source software that can be deployed on wireless network cards and access points; (4) security training of the next-generation of scientists and engineers involved in radio design and deployment.
The objective of this research is to develop a comprehensive theoretical and experimental cyber-physical framework to enable intelligent human-environment interaction capabilities by a synergistic combination of computer vision and robotics.
Specifically, the approach is applied to examine individualized remote rehabilitation with an intelligent, articulated, and adjustable lower-limb orthotic brace to manage knee osteoarthritis, where a visual-sensing/dynamical-systems perspective is adopted to: (1) track and record patient/device interactions with internet-enabled commercial-off-the-shelf computer-vision devices; (2) abstract the interactions into parametric and composable low-dimensional manifold representations; (3) link to quantitative biomechanical assessment of the individual patients; (4) facilitate development of individualized user models and exercise regimens; and (5) aid the progressive parametric refinement of exercises and adjustment of bracing devices. This research and its results will enable us to understand underlying human neuro-musculo-skeletal and locomotion principles by merging quantitative data acquisition with lower-order modeling and individualized feedback. Beyond efficient representation, the quantitative visual models offer the potential to capture fundamental underlying physical, physiological, and behavioral mechanisms grounded in biomechanical assessments, and thereby afford insights into the generative hypotheses of human actions.
Knee osteoarthritis is an important public health issue because of the high costs associated with treatment. The ability to leverage a quantitative paradigm, both in terms of diagnosis and prescription, to improve mobility and reduce pain in patients would be a significant benefit. Moreover, the home-based rehabilitation setting offers not only immense flexibility but also access to a significantly greater portion of the patient population. The project is also integrated with extensive educational and outreach activities to serve a variety of communities.
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases.
This is a proposal for a multi-institutional MIDAS Center of Excellence called the Center for Statistics and Quantitative Infectious Diseases (CSQUID). The mission of the Center is to provide national and international leadership. The lead institution is the Fred Hutchinson Cancer Research Center (FHCRC). Other participating institutions are the University of Florida, Northeastern University, University of Michigan, Emory University, University of Washington (UW), University of Georgia, and Duke University. The proposal includes four synergistic research projects (RPs) that will develop cutting-edge methodologies applied to solving epidemiologic, immunologic, and evolutionary problems important for public health policy in influenza, dengue, polio, TB, and other infectious agents: RP1: Modeling, Spatial, Statistics (Lead: I. Longini, U. Florida); RP2: Dynamic Inference (Lead: P. Rohani, U. Michigan); RP3: Understanding transmission with integrated genetic and epidemiologic inference (Co-Leads: E. Kenah, U. Florida, and T. Bedford, FHCRC); RP4: Dynamics and Evolution of Influenza Strain Variation (Lead: R. Antia, Emory U.). The Software Development and Core Facilities (Lead: A. Vespignani, Northeastern U.) will provide leadership in software development, access, and communication. The Policy Studies group (Lead: J. Koopman, U. Michigan) will provide leadership in communicating our research results to policy makers, as well as conducting novel research into policy making. The Training, Outreach, and Diversity Plans include ongoing training of 9 postdoctoral fellows and 5.25 predoctoral research assistants each year, support for participants in the Summer Institute for Statistics and Modeling in Infectious Diseases (UW), and ongoing Research Experience for Undergraduates programs at two institutions, among others. All participating institutions and the Center are committed to increasing diversity at all levels. Center-wide activities include Career Development Awards for junior faculty, annual workshops and symposia, outside speakers, and participation in the MIDAS Network meetings. Scientific leadership will be provided by the Center Director, a Leadership Committee, and an external Scientific Advisory Board, as well as the MIDAS Steering Committee.
Public Health Relevance
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases. The research is motivated by multiscale problems such as immunologic, epidemiologic, and environmental drivers of the spread of infectious diseases with the goal of understanding and communicating the implications for public health policy.
Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.
The over-arching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, a better understanding of the molecular underpinnings and the role of the environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental, and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists of three Specific Aims, each comprising three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.
In Specific Aim 1, we will use the Asthma BRIDGE datasets to define the interactome sub-module perturbed in poor asthma control, to identify the regulatory variants that modulate this asthma-control module, and to develop a predictive model of asthma control.
In Specific Aim 2, we will study the effects of exposure to air pollution and environmental tobacco smoke on the asthma-control networks, testing for environment-dependent alterations in network dynamics.
In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on the network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS and those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies in bronchial epithelial and Jurkat T-cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies with co-treatment using dexamethasone will be performed to validate findings from Aim 3. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control and define targets for biomarker development and therapeutic targeting.
Public Health Relevance
Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using this data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.
Crowdsourcing measurement of mobile Internet performance, now the engine for Mobiperf.
Mobilyzer is a collaboration between Morley Mao’s group at the University of Michigan and David Choffnes’ group at Northeastern University.
Mobilyzer provides the following components:
Measurements, analysis, and system designs to reveal how the Internet’s most commonly used trust systems operate (and malfunction) in practice, and how we can make them more secure.
Research on the SSL/TLS Ecosystem
Every day, we use Secure Sockets Layer (SSL) and Transport Layer Security (TLS) to secure our Internet transactions such as banking, e-mail, and e-commerce. Along with a public key infrastructure (PKI), they allow our computers to automatically verify that our sensitive information (e.g., credit card numbers and passwords) is hidden from eavesdroppers and sent only to trustworthy servers.
In mid-April 2014, a software vulnerability called Heartbleed was announced. It allows malicious users to capture information that would let them masquerade as trusted servers and potentially steal sensitive information from unsuspecting users. The PKI provides multiple ways to prevent such an attack, and we should expect website operators to use these countermeasures.
In this study, we found that the overwhelming majority of sites (more than 73%) did not do so, meaning visitors to their sites are vulnerable to attacks such as identity theft. Further, the majority of sites that attempted to address the problem (60%) did so in a way that leaves customers vulnerable.
Practical and powerful privacy for network communication (led by Stevens Le Blond at MPI).
Entails several threads that cover Internet measurement, modeling and experimentation.
Understanding the geographic nature of Internet paths and their implications for performance, privacy and security.
This study sheds light on this issue by measuring how and when Internet traffic traverses national boundaries. To do this, we ask you to run our browser applet, which visits various popular websites, measures the paths taken, and identifies their locations. By running our tool, you will help us understand if and how Internet paths traverse national boundaries, even when the two endpoints are in the same country. And we’ll show you these paths, helping you to understand where your Internet traffic goes.
This project will develop methodologies and tools for conducting algorithm audits. An algorithm audit uses controlled experiments to examine an algorithmic system, such as an online service or big data information archive, and ascertain (1) how it functions, and (2) whether it may cause harm.
Examples of documented harms by algorithms include discrimination, racism, and unfair trade practices. Although there is rising awareness of the potential for algorithmic systems to cause harm, actually detecting this harm in practice remains a key challenge. Given that most algorithms of concern are proprietary and non-transparent, there is a clear need for methods to conduct black-box analyses of these systems. Numerous regulators and governments have expressed concerns about algorithms, as well as a desire to increase transparency and accountability in this area.
This research will develop methodologies to audit algorithms in three domains that impact many people: online markets, hiring websites, and financial services. Auditing algorithms in these three domains will require solving fundamental methodological challenges, such as how to analyze systems with large, unknown feature sets, and how to estimate feature values without ground-truth data. To address these broad challenges, the research will draw on insights from prior experience auditing personalization algorithms. Additionally, each domain also brings unique challenges that will be addressed individually. For example, novel auditing tools will be constructed that leverage extensive online and offline histories. These new tools will allow examination of systems that were previously inaccessible to researchers, including financial services companies. Methodologies, open-source code, and datasets will be made available to other academic researchers and regulators. This project includes two integrated educational objectives: (1) to create a new computer science course on big data ethics, teaching how to identify and mitigate harmful side-effects of big data technologies, and (2) production of web-based versions of the auditing tools that are designed to be accessible and informative to the general public, that will increase transparency around specific, prominent algorithmic systems, as well as promote general education about the proliferation and impact of algorithmic systems.
This project aims to investigate the development of procedural narrative systems using crowd-sourcing methods.
This project will create a framework for simulation-based training that supports a learner’s exploration and replay and exercises theory of mind skills, in order to deliver the full promise of social skills training. The term Theory of Mind (ToM) refers to the human capacity to use beliefs about the mental processes and states of others. To train social skills, there has been rapid growth in narrative-based simulations that allow learners to role-play social interactions. However, the design of these systems often constrains the learner’s ability to explore different behaviors and their consequences. Attempts to support more generative experiences face a combinatorial explosion of alternative paths through the interaction, presenting an overwhelming challenge for developers who must create content for all the alternatives. As a result, training systems are often designed around exercising specific behaviors in specific situations, hampering the learning of more general skills in using ToM. This research seeks to solve this problem through three contributions: (1) a new model for conceptualizing narrative and role-play experiences that addresses generativity, (2) new methods that facilitate content creation for those generative experiences, and (3) an approach that embeds theory of mind training in the experience to allow for better learning outcomes. This research is applicable to complex social skill training across a range of settings: in schools, communities, the military, police, homeland security, and ethnic conflict.
The research begins with a paradigm shift that re-conceptualizes social skills simulation as a learner rehearsing a role instead of performing a role. This shift will exploit Stanislavsky’s Active Analysis (AA), a performance rehearsal technique that explicitly exercises Theory of Mind skills. Further, AA’s decomposition into short rehearsal scenes can break the combinatorial explosion over long narrative arcs that exacerbates content creation for social training systems. The research will then explore using behavior fitting and machine learning techniques on crowd-sourced data as a way to semi-automate the development of multi-agent simulations for social training. The research will assess, quantitatively and qualitatively, the ability of this approach to (a) provide experiences that support exploration and foster ToM use and (b) support acquiring crowd-sourced data that can be used to craft those experiences using automatic methods.
This project is unique in combining cutting-edge work in modeling theory of mind, interactive environments, performance rehearsal, and crowd sourcing. The multidisciplinary collaboration will enable development of a methodology for creating interactive experiences that pushes the boundaries of the current state of the art in social skill training. Reliance on crowd sourcing provides an additional benefit of being able to elicit culturally specific behavior patterns by selecting the relevant crowd, allowing for both culture-specific and cross-cultural training content.
Evidence Based Medicine (EBM) aims to systematically use the best available evidence to inform medical decision making. This paradigm has revolutionized clinical practice over the past 30 years. The most important tool for EBM is the systematic review, which provides a rigorous, comprehensive and transparent synthesis of all current evidence concerning a specific clinical question. These syntheses enable decision makers to consider the entirety of the relevant published evidence.
Systematic reviews now inform everything from national health policy to bedside care. But producing these reviews requires researchers to identify the entirety of the relevant literature and then extract from it the information to be synthesized, a hugely laborious and expensive exercise. Moreover, the unprecedented growth of the biomedical literature has increased the burden on those trying to make sense of the published evidence base. Concurrently, more systematic reviews are being conducted every year to synthesize the expanding evidence base; tens of millions of dollars are spent annually conducting these reviews.
RobotReviewer aims to mitigate this issue by (semi-) automating evidence synthesis using machine learning and natural language processing.
View the RobotReviewer page to read more.
Software development is facing a paradigm shift towards ubiquitous concurrent programming, giving rise to software that is among the most complex technical artifacts ever created by humans. Concurrent programming presents several risks and dangers for programmers who are overwhelmed by puzzling and irreproducible concurrent program behavior, and by new types of bugs that elude traditional quality assurance techniques. If this situation is not addressed, we are drifting into an era of widespread unreliable software, with consequences ranging from collapsed programmer productivity, to catastrophic failures in mission-critical systems.
This project will take steps against a concurrent software crisis, by producing verification technology that assists non-specialist programmers in detecting concurrency errors, or demonstrating their absence. The proposed technology will confront the concurrency explosion problem that verification methods often suffer from. The project’s goal is a framework under which the analysis of programs with unbounded concurrency resources (such as threads of execution) can be soundly reduced to an analysis under a small constant resource bound, making the use of state space explorers practical. As a result, the project will largely eliminate the impact of unspecified computational resources as the major cause of complexity in analyzing concurrent programs. By developing tools for detecting otherwise undetectable misbehavior and vulnerabilities in concurrent programs, the project will contribute its part to averting a looming software quality crisis.
The research will enable the auditing and control of personally identifiable information (PII) leaks, addressing the key challenges of how to identify and control PII leaks when users’ PII is not known a priori, nor is the set of apps or devices that leak this information. First, to enable auditing through improved transparency, we are investigating how to use machine learning to reliably identify PII in network flows, and how to design algorithms that incorporate user feedback to adapt to the changing landscape of privacy leaks. Second, we are building tools that allow users to control how their information is (or is not) shared with other parties. Third, we are investigating the extent to which our approach extends to privacy leaks from IoT devices. Besides adapting our system to the unique formats of leaks across a variety of IoT devices, our work investigates PII exposed indirectly through time-series data produced by IoT-generated monitoring.
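As a rough illustration of the machine-learning step, the sketch below trains an off-the-shelf classifier on a handful of hand-crafted per-flow features with user-confirmed labels. The features, labels, and model choice are assumptions made for illustration, not the project’s actual pipeline.

```python
# Toy sketch: flag network flows likely to contain PII from simple per-flow
# features. Feature set and model are illustrative placeholders only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [bytes_sent, email_regex_hit, imei_like_token, n_key_value_pairs]
X = np.array([
    [512,  1, 0, 12],
    [128,  0, 0,  3],
    [2048, 0, 1, 20],
    [256,  0, 0,  5],
])
y = np.array([1, 0, 1, 0])  # 1 = user confirmed a PII leak, 0 = clean flow

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_flow = np.array([[1024, 1, 0, 15]])
print("PII leak probability:", clf.predict_proba(new_flow)[0, 1])
```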
The purpose of this project is to develop a conversational agent system that counsels terminally ill patients in order to alleviate their suffering and improve their quality of life.
Although many interventions have now been developed to address palliative care for specific chronic diseases, little has been done to address the overall quality of life for older adults with serious illness, spanning not only the functional aspects of symptom and medication management, but the affective aspects of suffering. In this project, we are developing a relational agent to counsel patients at home about medication adherence, stress management, advanced care planning, and spiritual support, and to provide referrals to palliative care services when needed.
When deployed on smartphones, virtual agents have the potential to deliver life-saving advice regarding emergency medical conditions, as well as provide a convenient channel for health education to improve the safety and efficacy of pharmacotherapy.
We are developing a smartphone-based virtual agent that provides counseling to patients with Atrial Fibrillation. Atrial Fibrillation is a highly prevalent heart rhythm disorder and is known to significantly increase the risk of stroke, heart failure and death. In this project, a virtual agent is deployed in conjunction with a smartphone-based heart rhythm monitor that lets patients obtain real-time diagnostic information on the status of their atrial fibrillation and determine whether immediate action may be needed.
This project is a collaboration with University of Pittsburgh Medical Center.
The last decade has seen an enormous increase in our ability to gather and manage large amounts of data; business, healthcare, education, economy, science, and almost every aspect of society are accumulating data at unprecedented levels. The basic premise is that by having more data, even if uncertain and of lower quality, we are also able to make better-informed decisions. To make any decisions, we need to perform “inference” over the data, i.e. to either draw new conclusions, or to find support for existing hypotheses, thus allowing us to favor one course of action over another. However, general reasoning under uncertainty is highly intractable, and many state-of-the-art systems today perform approximate inference by reverting to sampling. Thus for many modern applications (such as information extraction, knowledge aggregation, question-answering systems, computer vision, and machine intelligence), inference is a key bottleneck, and new methods for tractable approximate inference are needed.
This project addresses the challenge of scaling inference by generalizing two highly scalable approximate inference methods and complementing them with scalable methods for parameter learning that are “approximation-aware.” Thus, instead of treating the (i) learning and the (ii) inference steps separately, this project uses the approximation methods developed for inference also for learning the model. The research hypothesis is that this approach increases the overall end-to-end prediction accuracy while simultaneously increasing scalability. Concretely, the project develops the theory and a set of scalable algorithms and optimization methods for at least the following four sub-problems: (1) approximating general probabilistic conjunctive queries with standard relational databases; (2) learning the probabilities in uncertain databases based on feedback on rankings of output tuples from general queries; (3) approximating the exact probabilistic inference in undirected graphical models with linearized update equations; and (4) complementing the latter with a robust framework for learning linearized potentials from partially labeled data.
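To give a flavor of what “linearized update equations” can look like, the toy sketch below propagates node beliefs over a small graph with a linear fixed-point iteration and compares the result to the closed-form solution. This is only an illustration of the general idea; the graph, prior, and coupling strength are invented, and the project’s actual formulation may differ substantially.

```python
# Toy illustration of linearized updates for approximate inference: beliefs are
# propagated by a linear operator instead of full nonlinear message passing.
import numpy as np

A = np.array([[0, 1, 0],            # adjacency matrix of a 3-node path graph
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
prior = np.array([0.8, 0.0, -0.8])  # centered prior beliefs per node
eps = 0.2                           # small coupling strength (ensures convergence)

b = prior.copy()
for _ in range(100):                # fixed-point iteration: b = prior + eps * A b
    b = prior + eps * A @ b

closed_form = np.linalg.solve(np.eye(3) - eps * A, prior)
print(b, closed_form)               # the two agree when eps*A has spectral radius < 1
```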
Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules, and metabolites in complex biological mixtures. The technology evolves rapidly and generates datasets of increasingly large complexity and size. This rapid evolution must be matched by an equally fast evolution of the statistical methods and tools developed for the analysis of these data. Ideally, new statistical methods should leverage the rich resources available from the over 12,000 packages implemented in the R programming language and its Bioconductor project. However, technological limitations currently hinder their adoption for mass spectrometric research. In response, the ROCKET project builds an enabling technology for working with large mass spectrometric datasets in R and for rapidly developing new algorithms, while benefiting from advancements in other areas of science. It also offers an opportunity for the recruitment and retention of Native American students to work with R-based technology and research, and helps prepare them for careers in STEM.
Instead of implementing yet another data processing pipeline, ROCKET builds an enabling technology for extending the scalability of R, and streamlining manipulations of large files in complex formats. First, to address the diversity of the mass spectrometric community, ROCKET supports scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory), as well as scaling up (i.e., executing a workflow on a cloud or on a multiprocessor). Second, ROCKET generates an efficient mixture of R and target code which is compiled in the background for the particular deployment platform. By ensuring compatibility with mass spectrometry-specific open data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general analytical methods. Therefore, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists, and create a blueprint for a new paradigm for R-based computing with large datasets. The outcomes of the project will be documented and made publicly available at https://olga-vitek-lab.khoury.northeastern.edu/.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Northeastern University proposes to organize a Summer School ‘Big Data and Statistics for Bench Scientists.’ The Summer School will train life scientists and computational scientists in designing and analyzing large-scale experiments relying on proteomics, metabolomics, and other high-throughput biomolecular assays. The training will enhance the effectiveness and reproducibility of biomedical research, such as discovery of diagnostic biomarkers for early diagnosis of disease, or prognostic biomarkers for predicting therapy response.
Northeastern University requests funds for a Summer School entitled Big Data and Statistics for Bench Scientists. The target audience for the School is graduate and post-graduate life scientists who work primarily in wet labs and who generate large datasets. Unlike other educational efforts that emphasize genomic applications, this School targets scientists working with other experimental technologies. Mass spectrometry-based proteomics and metabolomics are our main focus; however, the School is also appropriate for scientists working with other assays, e.g., nuclear magnetic resonance (NMR) spectroscopy, protein arrays, etc. This large community has been traditionally under-served by educational efforts in computation and statistics, and this proposal aims to fill that void. The Summer School is motivated by feedback from smaller short courses previously co-organized or co-instructed by the PI, and will cover theoretical and practical aspects of the design and analysis of large-scale experimental datasets. The Summer School will have a modular format, with eight 20-hour modules scheduled in two parallel tracks during two consecutive weeks. Each module can be taken independently. The planned modules are: (1) Processing raw mass spectrometric data from proteomic experiments using Skyline, (2) Beginner’s R, (3) Processing raw mass spectrometric data from metabolomic experiments using OpenMS, (4) Intermediate R, (5) Beginner’s guide to statistical experimental design and group comparison, (6) Specialized statistical methods for detecting differentially abundant proteins and metabolites, (7) Statistical methods for discovery of biomarkers of disease, and (8) Introduction to systems biology and data integration. Each module will introduce the necessary statistical and computational methodology and contain extensive practical hands-on sessions. Each module will be organized by instructors with extensive interdisciplinary teaching experience and supported by several teaching assistants. We anticipate the participation of 104 scientists, each taking two modules on average. Funding is requested for three yearly offerings of the School, and includes funds to provide US participants with 62 travel fellowships per year and 156 registration fee waivers per module. All the course materials, including videos of the lectures and of the practical sessions, will be made publicly available free of charge.
Different individuals experience the same events in vastly different ways, owing to their unique histories and psychological dispositions. For someone with social fears and anxieties, the mere thought of leaving the home can induce a feeling of panic. Conversely, an experienced mountaineer may feel quite comfortable balancing on the edge of a cliff. This variation of perspectives is captured by the term subjective experience. Despite its centrality and ubiquity in human cognition, it remains unclear how to model the neural bases of subjective experience. The proposed work will develop new techniques for statistical modeling of individual variation, and apply these techniques to a neuroimaging study of the subjective experience of fear. Together, these two lines of research will yield fundamental insights into the neural bases of fear experience. More generally, the developed computational framework will provide a means of comparing different mathematical hypotheses about the relationship between neural activity and individual differences. This will enable investigation of a broad range of phenomena in psychology and cognitive neuroscience.
The proposed work will develop a new computational framework for modeling individual variation in neuroimaging data, and use this framework to investigate the neural bases of one powerful and societally meaningful subjective experience, namely, of fear. Fear is a particularly useful assay because it involves variation across situational contexts (spiders, heights, and social situations), and dispositions (arachnophobia, acrophobia, and agoraphobia) that combine to create subjective experience. In the proposed neuroimaging study, participants will be scanned while watching videos that induce varying levels of arousal. To characterize individual variation in this neuroimaging data, the investigators will leverage advances in deep probabilistic programming to develop probabilistic variants of factor analysis models. These models infer a low-dimensional feature vector, also known as an embedding, for each participant and stimulus. A simple neural network models the relationship between embeddings and the neural response. This network can be trained in a data-driven manner and can be parameterized in a variety of ways, depending on the experimental design, or the neurocognitive hypotheses that are to be incorporated into the model. This provides the necessary infrastructure to test different neural models of fear. Concretely, the investigators will compare a model in which fear has its own unique circuit (i.e. neural signature or biomarker) to subject- or situation-specific neural architectures. More generally, the developed framework can be adapted to model individual variation in neuroimaging studies in other experimental settings.
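A minimal sketch of the modeling idea, under illustrative assumptions (embedding dimension, a tanh readout, random data): each participant and each stimulus receives a low-dimensional embedding, and a small neural network maps the pair to a predicted neural response. In the proposed work these quantities would be inferred from fMRI data within a probabilistic program; here only the generative direction is shown.

```python
# Toy forward model: participant and stimulus embeddings feed a small neural
# network that predicts a voxel-wise response. All shapes and values are
# illustrative assumptions, not the investigators' actual model.
import numpy as np

rng = np.random.default_rng(0)
n_participants, n_stimuli, n_voxels, k = 10, 20, 500, 3

P = rng.normal(size=(n_participants, k))   # participant embeddings
S = rng.normal(size=(n_stimuli, k))        # stimulus embeddings
W1 = rng.normal(size=(2 * k, 16))          # small neural network mapping
W2 = rng.normal(size=(16, n_voxels))       # embeddings to voxel responses

def predict(p_idx: int, s_idx: int) -> np.ndarray:
    z = np.concatenate([P[p_idx], S[s_idx]])   # joint embedding for this trial
    return np.tanh(z @ W1) @ W2                # predicted neural response

y_hat = predict(3, 7)
print(y_hat.shape)   # (500,) simulated voxel responses for one participant/stimulus pair
```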
Easy Alliance, a nonprofit initiative, has been instituted to solve complex, long-term challenges in making the digital world a more accessible place for everyone.
Computer networking and the internet have revolutionized our societies, but they are plagued with security problems that are difficult to tame. Serious vulnerabilities are constantly being discovered in network protocols that affect the work and lives of millions. Even some protocols that have been carefully scrutinized by their designers and by the computer engineering community have later been shown to be vulnerable. Why is developing secure protocols so hard? This project seeks to address this question by developing novel design and implementation methods for network protocols that make it possible to identify and fix security vulnerabilities semi-automatically. The project serves the national interest, as cyber-security costs the United States many billions of dollars annually. Besides making technical advances to the field, this project will also have broader impacts in education and curriculum development, as well as in helping to bridge the gap between several somewhat fragmented scientific communities working on the problem.
Technically, the project will follow a formal approach building upon a novel combination of techniques from security modeling, automated software synthesis, and program analysis to bridge the gap between an abstract protocol design and a low-level implementation. In particular, the methodology of the project will be based on a new formal behavioral model of software that explicitly captures how the choice of a mapping from a protocol design onto an implementation platform may result in different security vulnerabilities. Building on this model, this project will provide (1) a modeling approach that cleanly separates the descriptions of an abstract design from a concrete platform, and allows the platform to be modeled just once and reused, (2) a synthesis tool that will automatically construct a secure mapping from the abstract protocol to the appropriate choice of platform features, and (3) a program analysis tool that leverages platform-specific information to check that an implementation satisfies a desired property of the protocol. In addition, the project will develop a library of reusable platform models, and demonstrate the effectiveness of the methodology in a series of case studies.
Most computer programs process vast amounts of numerical data. Unfortunately, due to space and performance demands, computer arithmetic comes with its own rules. Making matters worse, different computers have different rules: while there are standardization efforts, efficiency considerations give hardware and compiler designers much freedom to bend the rules to their taste. As a result, the outcome of a computer calculation depends not only on the input, but also on the particular machine and environment in which the calculation takes place. This makes programs brittle and non-portable, and causes them to produce results that cannot be trusted. This project addresses these problems by designing methods to detect inputs to computer programs that exhibit too much platform dependence, and to repair such programs by making their behavior more robust.
Technical goals of this project include: (i) automatically warning users of disproportionately platform-dependent results of their numeric algorithms; (ii) repairing programs with platform instabilities; and (iii) proving programs stable against platform variations. Platform-independence of numeric computations is a form of robustness whose lack undermines the portability of program semantics. This project is one of the few to tackle the question of non-determinism in the specification (IEEE 754) of the theory (floating-point arithmetic) that machines are using today. This work requires new abstractions that soundly approximate the set of values of a program variable against a variety of compiler and hardware behaviors and features that may not even be known at analysis time. The project involves graduate and undergraduate students.
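The root of the problem is that floating-point arithmetic is not associative, so any reordering performed by a compiler or hardware unit (vectorization, fused multiply-add, parallel reduction) can change the computed result. A small, self-contained illustration of this order dependence, shown here in Python rather than at the compiler level:

```python
# Floating-point addition is not associative: the same mathematical sum can
# yield different results depending on evaluation order, which is exactly the
# freedom that compilers and hardware exploit when they reorder operations.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0  (the 1.0 is absorbed before it can survive)

# The effect compounds in long reductions: a naive left-to-right sum loses a
# term that an exactly rounded summation keeps.
import math
xs = [1e16, 1.0, -1e16, 1.0]
print(sum(xs))        # 1.0
print(math.fsum(xs))  # 2.0 (exactly rounded sum of the same numbers)
```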
Side-channel attacks (SCA) have been a realistic threat to various cryptographic implementations that do not feature dedicated protection. While many effective countermeasures have been found and applied manually, they are application-specific and labor intensive. In addition, security evaluation tends to be incomplete, with no guarantee that all the vulnerabilities in the target system have been identified and addressed by such manual countermeasures. This SaTC project aims to shift the paradigm of side-channel attack research, and proposes to build an automation framework for information leakage analysis, multi-level countermeasure application, and formal security evaluation against software side-channel attacks.
The proposed framework provides common sound metrics for information leakage, methodologies for automatic countermeasures, and formal and thorough evaluation methods. The approach unifies power analysis and cache-based timing attacks into one framework. It defines new metrics of information leakage and uses them to automatically identify possible leakage in a given cryptosystem at an early stage, with no implementation details required. The conventional compilation process is extended along the new dimension of optimizing for security, to generate side-channel-resilient code and ensure its secure execution at run time. Side-channel security is guaranteed to a certain confidence level with formal methods. The three investigators on the team bring complementary expertise to this challenging interdisciplinary research, to develop the advanced automation framework and the associated software tools, metrics, and methodologies. The outcome significantly benefits security system architects and software developers alike in their quest to build verifiable SCA security into a broad range of applications they design. The project also builds new synergy among fundamental statistics, formal methods, and practical system security. The automation tools, when introduced in new courses developed by the PIs, will greatly improve students’ hands-on experience. The project also leverages the experiential education model of Northeastern University to engage undergraduates, women, and minority students in independent research projects.
Nontechnical Description: Artificial intelligence, especially deep learning, has enabled many breakthroughs in both academia and industry. This project aims to create a generative and versatile design approach based on novel deep learning techniques to realize integrated, multi-functional photonic systems, and to provide proof-of-principle demonstrations in experiments. Compared with traditional approaches using extensive numerical simulations or inverse design algorithms, deep learning can uncover the highly complicated relationship between a photonic structure and its properties from the dataset, and hence substantially accelerate the design of novel photonic devices that simultaneously encode distinct functionalities in response to the designated wavelength, polarization, angle of incidence, and other parameters. Such multi-functional photonic systems have important applications in many areas, including optical imaging, holographic display, biomedical sensing, and consumer photonics with high efficiency and fidelity, to benefit the public and the nation. The integrated education plan will considerably enhance outreach activities and educate students in grades 7-12, empowered by the successful experience and partnership previously established by the PIs. Graduate and undergraduate students participating in the project will learn the latest developments in the multidisciplinary fields of photonics, deep learning, and advanced manufacturing, and will gain real-world knowledge by engaging industrial collaborators in tandem with Northeastern University’s renowned cooperative education program.
Technical Description: Metasurfaces, which are two-dimensional metamaterials consisting of a planar array of subwavelength designer structures, have created a new paradigm to tailor optical properties in a prescribed manner, promising superior integrability, flexibility, performance and reliability to advance photonics technologies. However, so far almost all metasurface designs rely on time-consuming numerical simulations or stochastic searching approaches that are limited in a small parameter space. To fully exploit the versatility of metasurfaces, it is highly desired to establish a general, functionality-driven methodology to efficiently design metasurfaces that encompass distinctly different optical properties and performances within a single system. The objective of the project is to create and demonstrate a high-efficiency, two-level design approach enabled by deep learning, in order to realize integrated, multi-functional meta-systems. Proper deep learning methods, such as Conditional Variational Auto-Encoder and Deep Bidirectional-Convolutional Network, will be investigated, innovatively reformulated and tailored to apply at the single-element level and the large-scale system level in combination with topology optimization and genetic algorithm. Such a generative design approach can directly and automatically identify the optimal structures and configurations out of the full parameter space. The designed multi-functional optical meta-systems will be fabricated and characterized to experimentally confirm their performances. The success of the project will produce transformative photonic architectures to manipulate light on demand.
Critical infrastructure systems are increasingly reliant on one another for their efficient operation. This research will develop a quantitative, predictive theory of network resilience that takes into account the interactions between built infrastructure networks, and the humans and neighborhoods that use them. This framework has the potential to guide city officials, utility operators, and public agencies in developing new strategies for infrastructure management and urban planning. More generally, these efforts will untangle the roles of network structure and network dynamics that enable interdependent systems to withstand, recover from, and adapt to perturbations. This research will be of interest to a variety of other fields, from ecology to cellular biology.
The project will begin by cataloging three built infrastructures and known interdependencies (both physical and functional) into a “network of networks” representation suitable for modeling. A key part of this research lies in also quantifying the interplay between built infrastructure and social systems. As such, the models will incorporate community-level behavioral effects through urban “ecometrics” — survey-based empirical data that capture how citizens and neighborhoods utilize city services and respond during emergencies. This realistic accounting of infrastructure and its interdependencies will be complemented by realistic estimates of future hazards that it may face. The core of the research will use network-based analytical and computational approaches to identify reduced-dimensional representations of the (high-dimensional) dynamical state of interdependent infrastructure. Examining how these resilience metrics change under stress to networks at the component level (e.g., as induced by inundation following a hurricane) will allow identification of weak points in existing interdependent infrastructure. The converse scenario, in which deliberate alterations to a network might improve resilience or hasten recovery of already-failed systems, will also be explored.
Students will be working on building a library of cache-oblivious data structures and measuring the performance under different workloads. We will first implement serial versions of the algorithms, and then implement the parallel version of several known cache oblivious data structures and algorithms. Read more.
The training plan is to bring in students (ideally in pairs) who are currently sophomores or juniors and have taken a Computer Systems course using C/C++. Students need not have any previous research experience, but generally will have experience using threads (e.g., pthreads) and will have taken an algorithms course.
[1] (2 weeks) Students will first work through the basics of Cache-Oblivious Algorithms and Data Structures from: http://erikdemaine.org/papers/BRICS2002/paper.pdf
[2] (2 weeks) Students will then work through select lectures and exercises on caches from here: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/
[3] (1 week) Students will then learn the basics of profiling
[4] (2 weeks) Next, students will implement a few data structures and algorithms (a sketch of the recursive structure typical of these algorithms appears after this list).
[5] (4 weeks) Students will work to find good real-world benchmarks, mining GitHub repositories for benchmarks that suffer from performance problems related to false sharing.
[6] The remaining time will be writing up and polishing collected results.
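For reference, the divide-and-conquer structure that makes an algorithm cache-oblivious can be sketched in a few lines. The version below (matrix transpose, written in Python purely for readability, with an assumed base-case cutoff) shows the recursion students would implement in C/C++ in step [4]: recursing on the larger dimension keeps sub-blocks small enough to fit in cache at some level of the hierarchy, without the code ever knowing the cache size.

```python
# Sketch of a cache-oblivious matrix transpose: recursive blocking with no
# explicit cache parameters. (Actual coursework implementations are in C/C++.)
def co_transpose(A, B, ri, ci, rows, cols, base=16):
    """Write the transpose of A[ri:ri+rows, ci:ci+cols] into B."""
    if rows <= base and cols <= base:              # base case: small block
        for r in range(ri, ri + rows):
            for c in range(ci, ci + cols):
                B[c][r] = A[r][c]
    elif rows >= cols:                             # split the larger dimension
        half = rows // 2
        co_transpose(A, B, ri, ci, half, cols, base)
        co_transpose(A, B, ri + half, ci, rows - half, cols, base)
    else:
        half = cols // 2
        co_transpose(A, B, ri, ci, rows, half, base)
        co_transpose(A, B, ri, ci + half, rows, cols - half, base)

n, m = 70, 45
A = [[r * m + c for c in range(m)] for r in range(n)]
B = [[0] * n for _ in range(m)]
co_transpose(A, B, 0, 0, n, m)
assert all(B[c][r] == A[r][c] for r in range(n) for c in range(m))
```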
The key research questions we are investigating in the Mon(IoT)r research group are:
Our methodology entails recording and analyzing all network traffic generated by a variety of IoT devices that we have acquired. We not only inspect traffic for PII in plaintext, but attempt to man-in-the-middle SSL connections to understand the contents of encrypted flows. Our analysis allows us to uncover how IoT devices are currently protecting users’ PII, and determine how easy or difficult it is to mount attacks against user privacy.
Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify—effectively making it look as if you are using those apps. As a result, if an Internet service provider (ISP) tries to slow down YouTube, Wehe would see the same behavior. We then send the same app’s Internet traffic, but with the content replaced by randomized bytes, which prevents the ISP from classifying the traffic as belonging to the app. Our hypothesis is that the randomized traffic will not trigger application-specific differentiation (e.g., throttling or blocking), but the original traffic will. We repeat these tests several times to rule out noise from bad network conditions, and we tell you at the end whether your ISP is giving different performance to an app’s network traffic.
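Conceptually, the final decision reduces to a two-sample comparison between throughput measurements from the original replay and the randomized replay. The sketch below runs a Kolmogorov–Smirnov test on synthetic samples to show the shape of that comparison; it is a simplified illustration, not Wehe’s actual statistical pipeline, and the throughput numbers are made up.

```python
# Simplified illustration of detecting traffic differentiation: compare
# throughput samples from the "original" replay (classifiable by the ISP)
# against the "randomized" replay (not classifiable).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
original_mbps = rng.normal(loc=3.0, scale=0.4, size=30)     # looks throttled
randomized_mbps = rng.normal(loc=9.0, scale=0.8, size=30)   # full speed

result = ks_2samp(original_mbps, randomized_mbps)
if result.pvalue < 0.01 and original_mbps.mean() < randomized_mbps.mean():
    print("Likely differentiation: the original replay is significantly slower.")
else:
    print("No evidence of differentiation in these samples.")
```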
Type-safe programming languages report errors when a program applies operations to data of the wrong type—e.g., a list-length operation expects a list, not a number—and they come in two flavors: dynamically typed (or untyped) languages, which catch such type errors at run time, and statically typed languages, which catch type errors at compile time before the program is ever run. Dynamically typed languages are well suited for rapid prototyping of software, while static typing becomes important as software systems grow since it offers improved maintainability, code documentation, early error detection, and support for compilation to faster code. Gradually typed languages bring together these benefits, allowing dynamically typed and statically typed code—and more generally, less precisely and more precisely typed code—to coexist and interoperate, thus allowing programmers to slowly evolve parts of their code base from less precisely typed to more precisely typed. To ensure safe interoperability, gradual languages insert runtime checks when data with a less precise type is cast to a more precise type. Gradual typing has seen high adoption in industry, in languages like TypeScript, Hack, Flow, and C#. Unfortunately, current gradually typed languages fall short in three ways. First, while normal static typing provides reasoning principles that enable safe program transformations and optimizations, naive gradual systems often do not. Second, gradual languages rarely guarantee graduality, a reasoning principle helpful to programmers, which says that making types more precise in a program merely adds in checks and the program otherwise behaves as before. Third, time and space efficiency of the runtime casts inserted by gradual languages remains a concern. This project addresses all three of these issues. The project’s novelties include: (1) a new approach to the design of gradual languages by first codifying the desired reasoning principles for the language using a program logic called Gradual Type Theory (GTT), and from that deriving the behavior of runtime casts; (2) compiling to a non-gradual compiler intermediate representation (IR) in a way that preserves these principles; and (3) the ability to use GTT to reason about the correctness of optimizations and efficient implementation of casts. The project has the potential for significant impact on industrial software development since gradually typed languages provide a migration path from existing dynamically typed codebases to more maintainable statically typed code, and from traditional static types to more precise types, providing a mechanism for increased adoption of advanced type features. The project will also have impact by providing infrastructure for future language designs and investigations into improving the performance of gradual typing.
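To make the notion of an inserted runtime check concrete, the toy sketch below (in Python, where type annotations are not enforced at run time) shows a hand-written cast that guards the boundary between less precisely and more precisely typed code. The cast_to_int_list helper is hypothetical and far simpler than the contract and proxy machinery real gradual languages generate, but it illustrates the kind of check described above.

```python
# Toy illustration of a runtime cast at an untyped/typed boundary: when a value
# of a less precise type flows into a context expecting a more precise type,
# a check is inserted and fails loudly if the value does not fit.
from typing import Any, List

def cast_to_int_list(x: Any) -> List[int]:
    if not isinstance(x, list) or not all(isinstance(v, int) for v in x):
        raise TypeError(f"cast failed: expected List[int], got {x!r}")
    return x

def typed_sum(xs: List[int]) -> int:      # "statically typed" component
    return sum(xs)

untyped_value: Any = [1, 2, 3]            # produced by dynamically typed code
print(typed_sum(cast_to_int_list(untyped_value)))   # check passes: 6

bad_value: Any = [1, "two", 3]
try:
    typed_sum(cast_to_int_list(bad_value))           # check fails at the boundary
except TypeError as err:
    print(err)
```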
The project team will apply the GTT approach to investigate gradual typing for polymorphism with data abstraction (parametricity), algebraic effects and handlers, and refinement/dependent types. For each, the team will develop cast calculi and program logics expressing better equational reasoning principles than previous proposals, with certified elaboration to a compiler intermediate language based on Call-By-Push-Value (CBPV) while preserving these properties, and design convenient surface languages that elaborate into them. The GTT program logics will be used for program verification, proving the correctness of program optimizations and refactorings.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
When building large software systems, programmers should be able to use the best language for each part of the system. But when a component written in one language becomes part of a multi-language system, it may interoperate with components that have features that don’t exist in the original language. This affects programmers when they refactor code (i.e., make changes that should result in equivalent behavior). Since programs interact after compilation to a common target, programmers have to understand details of linking and target-level interaction when reasoning about correctly refactoring source components. Unfortunately, there are no software toolchains available today that support single-language reasoning when components are used in a multi-language system. This project will develop principled software toolchains for building multi-language software. The project’s novelties include (1) designing language extensions that allow programmers to specify how they wish to interoperate (or link) with conceptual features absent from their language through a mechanism called linking types, and (2) developing compilers that formally guarantee that any reasoning the programmer does at source level is justified after compilation to the target. The project has the potential for tremendous impact on the software development landscape as it will allow programmers to use a language close to their problem domain and provide them with software toolchains that make it easy to compose components written in different languages into a multi-language software system.
The project will evaluate the idea of linking types by extending ML with linking types for interaction with Rust, a language with first-class control, and a normalizing language, and developing type preserving compilers to a common typed LLVM-like target language. The project will design a rich dependently typed LLVM-like target language that can encapsulate effects from different source languages to support fully abstract compilation from these languages. The project will also investigate reporting of cross-language type errors to aid programmers when composing components written in different languages.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Modern programming languages ranging from Java to Matlab rely on just-in-time compilation techniques to achieve performance competitive with languages such as C or C++. What sets just-in-time compilers apart from batch compilers is that they can observe a program’s actions as it executes and inspect its state. Knowledge of the program’s state and past behavior allows the compiler to perform speculative optimizations that improve performance. The intellectual merits of this research are to devise techniques for reasoning about the correctness of the transformations performed by just-in-time compilers. The project’s broader significance and importance are its implications for industrial practice. The results of this research will be applicable to commercial just-in-time compilers for languages such as JavaScript and R.
This project develops a general model of just-in-time compilation that subsumes deployed systems and allows systematic exploration of the design space of dynamic compilation techniques. The research questions that will be tackled in this work lie along two dimensions: Experimental—explore the design space of dynamic compilation techniques and gain an understanding of trade-offs; Foundational—formalize key ingredients of a dynamic compiler and develop techniques for reasoning about correctness in a modular fashion.
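A toy model of the speculation-and-deoptimization cycle described above, sketched in Python (this is only an illustration, not the formal model the project will develop): the wrapper profiles argument types, installs a guarded fast path once a type assumption looks stable, and falls back to the generic code when the guard fails.

    class SpeculativeCompiler:
        """Toy just-in-time speculation: profile, specialize under a guard, deoptimize."""

        def __init__(self, generic_fn):
            self.generic_fn = generic_fn
            self.observed = []
            self.fast_path = None

        def __call__(self, a, b):
            if self.fast_path is not None:
                hit, result = self.fast_path(a, b)
                if hit:
                    return result
                self.fast_path = None            # guard failed: deoptimize
            self.observed.append((type(a), type(b)))
            if len(self.observed) >= 10 and set(self.observed) == {(int, int)}:
                def int_fast_path(a, b):
                    if type(a) is int and type(b) is int:   # the guard
                        return True, a + b                   # specialized code
                    return False, None
                self.fast_path = int_fast_path
            return self.generic_fn(a, b)

    add = SpeculativeCompiler(lambda a, b: a + b)    # generic, dynamically dispatched
    for i in range(20):
        add(i, i)                                    # warm-up triggers speculation on ints
    add("a", "b")                                    # guard fails, generic path is used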
The goal of this project is to provide open-source, interoperable, and extensible statistical software for quantitative mass spectrometry, enabling experimentalists and developers of statistical methods to respond rapidly to changes in the evolving biotechnological landscape.
This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity.
As the cost of computing and communication resources has plummeted, applications have become data-centric, with data products growing explosively in both number and size. Although accessing such data with the compute power necessary for its analysis and processing is cheap and readily available via cloud computing (intuitive, utility-style access to vast resource pools), doing so currently requires significant expertise, experience, and time (for customization, configuration, deployment, etc.). This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity. To enable this, the research explores new programming language, runtime, and distributed systems techniques and technologies that integrate the R programming language environment with open-source cloud platform-as-a-service (PaaS) in ways that simplify processing massive datasets, sharing datasets across applications and users, and tracking and enforcing data provenance. The PIs’ plans for research, outreach, integrated curricula, and open-source release of research artifacts have the potential to make cloud computing accessible to a much wider range of users, in particular the data analytics community who use the R statistical analysis environment to apply their techniques and algorithms to important problems in areas such as biology, chemistry, physics, political science, and finance, by enabling them to use cloud resources transparently for their analyses and to share their scientific data and results in a way that enables others to reproduce and verify them.
The Applied Machine Learning Group is working with researchers from Harvard Medical School to predict outcomes for multiple sclerosis patients. A focus of the research is how best to interact with physicians to use both human expertise and machine learning methods.
Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate trust-family-friendship-professional-association based networks apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market”, reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks and regulatory agencies.
These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.
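One concrete calculation from this line of work, sketched below under the assumption that the networkx library is available (the toy edge list is invented), is the minimum number of driver nodes needed for structural controllability, obtained from a maximum matching of the directed network.

    import networkx as nx

    def minimum_driver_nodes(nodes, edges):
        # Structural controllability: the number of driver nodes equals
        # max(N - |maximum matching|, 1), computed on the bipartite
        # "out-copy -> in-copy" representation of the directed network.
        B = nx.Graph()
        out_side = [("out", u) for u in nodes]
        B.add_nodes_from(out_side, bipartite=0)
        B.add_nodes_from((("in", v) for v in nodes), bipartite=1)
        B.add_edges_from((("out", u), ("in", v)) for u, v in edges)
        matching = nx.bipartite.hopcroft_karp_matching(B, top_nodes=out_side)
        matched = len(matching) // 2          # the matching dict records each pair twice
        return max(len(nodes) - matched, 1)

    # A 4-node toy network (1 -> 2, 2 -> 3, 2 -> 4) needs two driver nodes.
    print(minimum_driver_nodes([1, 2, 3, 4], [(1, 2), (2, 3), (2, 4)]))   # 2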
The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience.
Although journal and conference articles are recognized as the most formal and enduring forms of scientific communication, oral presentations are central to science because they are the means by which researchers, practitioners, the media, and the public hear about the latest findings thereby becoming engaged and inspired, and where scientific reputations are made. Yet despite decades of technological advances in computing and communication media, the fundamentals of oral scientific presentations have not advanced since software such as Microsoft’s PowerPoint was introduced in the 1980’s. The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience. The PI’s pilot studies have demonstrated that audiences are receptive to this concept, and that the technology is especially effective for individuals who are non-native speakers of English (which may be up to 21% of the population of the United States). Project outcomes will be initially deployed and evaluated in higher education, both as a teaching tool for delivering STEM lectures and as a training tool for students in the sciences to learn how to give more effective oral presentations (which may inspire future generations to engage in careers in the sciences).
This research will be based on a theory of human-agent collaboration, in which the human presenter is monitored using real-time speech and gesture recognition, audience feedback is also monitored, and the agent, presentation media, and human presenter (cued via an intelligent wearable teleprompter) are all dynamically choreographed to maximize audience engagement, communication, and persuasion. The project will make fundamental, theoretical contributions to models of real-time human-agent collaboration and communication. It will explore how humans and agents can work together to communicate effectively with a heterogeneous audience using speech, gesture, and a variety of presentation media, amplifying the abilities of scientist-orators who would otherwise be “flying solo.” The work will advance both artificial intelligence and computational linguistics, by extending dialogue systems to encompass mixed-initiative, multi-party conversations among co-presenters and their audience. It will impact the state of the art in virtual agents, by advancing the dynamic generation of hand gestures, prosody, and proxemics for effective public speaking and turn-taking. And it will also contribute to the field of human-computer interaction, by developing new methods for human presenters to interact with autonomous co-presenter agents and their presentation media, including approaches to cueing human presenters effectively using wearable user interfaces.
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program.
ABSTRACT
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program. Of the 20 students, 5 are in Computer Science, 6 are in Electrical and Computer Engineering, and 9 are in Information Assurance. These students come with different backgrounds that vary from political science and criminal justice to computer science and engineering. The University, with its nationally-recognized Cooperative Education, is well-positioned to attract and educate strong students in cybersecurity.
The SFS program at Northeastern succeeds in recruiting a diverse group of under-represented students to the program, and is committed to sustaining this level of diversity in future recruiting. Northeastern University is also reaching out to the broader community by leading Capture-the-Flag and Collegiate Cyber Defense competitions, and by actively participating in the New England Advanced Cyber Security Center, an organization composed of academia, industry, and government entities.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Sun, E. and Kaeli, D. “Aggressive Value Prediction on a GPU,” Journal of Parallel Processing, 2012, p. 1-19.
Azmandian, F., Dy, J. G., Aslam, J. A., and Kaeli, D. “Local Kernel Density Ratio-Based Feature Selection for Outlier Detection,” Journal of Machine Learning Research, v.25, 2012, p. 49-64.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes. The findings will inform decisions of leaders interested in optimizing all forms of collaboration in fields such as open-source software development, academic projects, and business. System designers will be able to identify interpersonal dynamics and develop new features for opinion aggregation and effective collaboration. In addition, the research will inform managers on how best to use crowdsourcing solutions to support innovation and marketing strategies including peer-to-peer marketing to translate activity within online communities into sales.
This research will analyze digital trace data that enable studies of population-level human interaction on an unprecedented scale. Understanding such interaction is crucial for anticipating impacts in our social, economic, and political lives as well as for system design. One site of such interaction is crowdsourcing systems – socio-technical systems through which online communities comprised of diverse and distributed individuals dynamically coordinate work and relationships. Many crowdsourcing systems not only generate creative content but also contain a rich community of collaboration and evaluation in which creators and adopters of creative content interact among themselves and with artifacts through overlapping relationships such as affiliation, communication, affinity, and purchasing. These relationships constitute multidimensional networks and create structures at multiple levels. Empirical studies have yet to examine how multidimensional networks in crowdsourcing enable effective large-scale collaboration. The data derive from two distinctly different sources, thus providing opportunities for comparison across a range of online creation-oriented communities. One is a crowdsourcing platform and ecommerce website for creative garment design, and the other is a platform for participants to create innovative designs based on scrap materials. This project will analyze both online community activity and offline purchasing behavior. The data provide a unique opportunity to understand overlapping structures of social interaction driving peer influence and opinion formation as well as the offline economic consequences of this online activity. This study contributes to the literature by (1) analyzing multidimensional network structures of interpersonal and socio-technical interactions within these socio-technical systems, (2) modeling how success feeds back into value-creation processes and facilitates learning, and (3) developing methods to predict the economic success of creative products generated in these contexts. The application and integration of various computational and statistical approaches will provide significant dividends to the broader scientific research community by contributing to the development of technical resources that can be extended to other forms of data-intensive inquiry. This includes documentation about best practices for integrating methods for classification and prediction; courses to train students to perform large-scale data analysis; and developing new theoretical approaches for understanding the multidimensional foundations of cyber-human systems.
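To make the notion of a multidimensional network concrete, the Python sketch below (using networkx; the layer names and participants are invented for illustration) keeps one graph per relationship type over the same set of participants and reads off a simple cross-layer feature of the kind such analyses combine.

    import networkx as nx

    # One layer per relationship type over the same participants.
    layers = {
        "communication": nx.Graph([("ann", "bo"), ("bo", "cy")]),
        "affiliation":   nx.Graph([("ann", "cy"), ("bo", "cy")]),
        "purchasing":    nx.Graph([("ann", "bo")]),
    }

    def cross_layer_profile(user):
        # A minimal multidimensional feature: the user's degree in every layer.
        return {name: (g.degree(user) if user in g else 0) for name, g in layers.items()}

    print(cross_layer_profile("bo"))   # {'communication': 2, 'affiliation': 1, 'purchasing': 1}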
Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images.
Our understanding of brain functions is hindered by the lack of detailed knowledge of synaptic connectivity in the underlying neural network. While synaptic connectivity of small neural circuits can be determined with electron microscopy, studies of connectivity on a larger scale, e.g. whole mouse brain, must be based on light microscopy imaging. It is now possible to fluorescently label subsets of neurons in vivo and image their axonal and dendritic arbors in 3D from multiple brain tissue sections. The overwhelming remaining challenge is neurite tracing, which must be done automatically due to the high-throughput nature of the problem. Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images. Our goal is to revolutionize the existing functionalities of the software, making it possible to: (i) automatically reconstruct axonal and dendritic arbors of sparsely labeled populations of neurons from multiple stacks of images and (ii) automatically track and quantify changes in the structures of presynaptic boutons and dendritic spines imaged over time. We propose to utilize the latest machine learning and image processing techniques to develop multi-stack tracing, feature detection, and computer-guided trace editing capabilities of the software. All tools and datasets created as part of this proposal will be made available to the research community.
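A heavily simplified version of one step in such a pipeline, assuming numpy and scikit-image and using a synthetic volume in place of real microscopy data, is to threshold the stack and reduce the labeled neurites to one-voxel-wide centerlines that later stages could link into arbors.

    import numpy as np
    from skimage import filters, morphology

    def centerlines(stack):
        # Threshold the 3D fluorescence stack, drop tiny specks, and skeletonize.
        foreground = stack > filters.threshold_otsu(stack)
        foreground = morphology.remove_small_objects(foreground, min_size=16)
        # skeletonize handles 3D volumes in recent scikit-image releases
        # (older releases expose the same operation as skeletonize_3d).
        return morphology.skeletonize(foreground)

    # Synthetic stack: low background noise plus one bright, straight "neurite".
    rng = np.random.default_rng(0)
    stack = 0.05 * rng.random((32, 64, 64))
    stack[16, 32, 8:56] = 1.0
    print(int(centerlines(stack).sum()), "centerline voxels")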
Public Health Relevance
At present, accurate methods of analysis of neuron morphology and synaptic connectivity rely on manual or semi-automated tracing tools. Such methods are time consuming, can be prone to errors, and do not scale up to the level of large brain-mapping projects. Thus, it is proposed to develop open-source software for accurate, automated reconstruction of structure and dynamics of large neural circuits.
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user.
ABSTRACT
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user. This “personalization” may often be of value to the user, but it also has the potential to distort search results and manipulate the perceptions and behavior of the user. Given the popularity of personalization across a variety of Web-based services, this research has the potential for extremely broad impact. Being able to quantify the extent to which Web-based services are personalized will lead to greater transparency for users, and the development of tools to identify personalized content will allow users to access information that may be hard to access today.
Personalization is now a ubiquitous feature on many Web-based services. In many cases, personalization provides advantages for users because personalization algorithms are likely to return results that are relevant to the user. At the same time, the increasing levels of personalization in Web search and other systems are leading to growing concerns over the Filter Bubble effect, where users are only given results that the personalization algorithm thinks they want, while other important information remains inaccessible. From a computer science perspective, personalization is simply a tool that is applied to information retrieval and ranking problems. However, sociologists, philosophers, and political scientists argue that personalization can result in inadvertent censorship and “echo chambers.” Similarly, economists warn that unscrupulous companies can leverage personalization to steer users towards higher-priced products, or even implement price discrimination, charging different users different prices for the same item. As the pervasiveness of personalization on the Web grows, it is clear that techniques must be developed to understand and quantify personalization across a variety of Web services.
This research has four primary thrusts: (1) To develop methodologies to measure personalization of mobile content. The increasing popularity of browsing the Web from mobile devices presents new challenges, as these devices have access to sensitive content like the user’s geolocation and contacts. (2) To develop systems and techniques for accurately measuring the prevalence of several personalization trends on a large number of e-commerce sites. Recent anecdotal evidence has shown instances of problematic sales tactics, including price steering and price discrimination. (3) To develop techniques to identify and quantify personalized political content. (4) To measure the extent to which financial and health information is personalized based on location and socio-economic status. All four of these thrusts will develop new research methodologies that may prove effective in other areas of research as well.
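One building block of such measurements, sketched here in Python with made-up result lists, is simply to issue the same query under two controlled "personas" (for example, different locations, or a logged-in versus a fresh browser profile) and quantify how much the returned rankings differ.

    def jaccard(results_a, results_b):
        # Overlap of two result sets, ignoring rank (1.0 means identical sets).
        a, b = set(results_a), set(results_b)
        return len(a & b) / len(a | b)

    # Hypothetical top-10 URLs returned for the same query to two personas.
    persona_a = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8", "u9", "u10"]
    persona_b = ["u1", "u3", "u2", "u11", "u5", "u12", "u7", "u8", "u9", "u13"]

    print(f"result overlap: {jaccard(persona_a, persona_b):.2f}")   # 0.54

A rank-aware statistic (for example, Kendall's tau over the shared results) would additionally capture reordering, which matters when users only ever see the top few positions.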
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis,” Science, v.343, 2014, p. 1203.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers.
Users today have access to a broad range of free, web-based social services. All of these services operate under a similar model: Users entrust the service provider with their personal information and content, and in return, the service provider makes their service available for free by monetizing the user-provided information and selling the results to third parties (e.g., advertisers). In essence, users pay for these services by providing their data (i.e., giving up their privacy) to the provider.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers. All user data is encrypted and not exposed to any third-parties, users retain control over their information, and users access the service via a web browser as normal.
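The privacy property rests on client-side encryption: the provider stores only ciphertext while keys stay with the user. A minimal sketch using the Python cryptography package (an illustration of the principle, not the project's actual protocol) looks like this:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()          # generated and kept by the user
    box = Fernet(key)

    ciphertext = box.encrypt(b"status update: visible only to friends")
    # Only `ciphertext` is ever uploaded to the cloud provider.
    assert box.decrypt(ciphertext) == b"status update: visible only to friends"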
The incredible popularity of today’s web-based services has led to significant concerns over privacy and user control over data. Addressing these concerns requires a rethinking of today’s popular web-based business models, and, unfortunately, existing providers have little incentive to do so. The impact of this project will potentially be felt by the millions of users of today’s popular services, who will be provided with an alternative to those business models.
The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time.
Evaluating the performance of information retrieval systems such as search engines is critical to their effective development. Current “gold standard” performance evaluation methodologies generally rely on the use of expert assessors to judge the quality of documents or web pages retrieved by search engines, at great cost in time and expense. The advent of “crowd sourcing,” such as that available through Amazon’s Mechanical Turk service, holds out the promise that these performance evaluations can be performed more rapidly and at far less cost through the use of many (though generally less skilled) “crowd workers”; however, the quality of the resulting performance evaluations generally suffers greatly. The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time. For example, one might ask inexpensive crowd workers what are likely to be “easy” questions while reserving what are likely to be “hard” questions for the expensive experts. While the project focuses on the performance evaluation of search engines as its use case, the techniques developed will be more broadly applicable to many domains where one wishes to efficiently and effectively harness experts and crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will be developed within which one can quantify the uncertainty about a performance evaluation as well as the cost and expected utility of asking any assessor (expert or crowd worker) any question (e.g. a nominal judgment for a document or a preference judgment between two documents) at any time. The goal is then to ask the “right” question of the “right” assessor at any time in order to maximize the expected utility gained per unit cost incurred and then to optimally aggregate such responses in order to efficiently and effectively evaluate performance.
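A much-simplified stand-in for that framework, with invented accuracies and costs, models each document's relevance as a Beta posterior and scores every (assessor, document) pairing by the expected drop in posterior variance per dollar, taking the assessor's judgment at face value:

    from dataclasses import dataclass

    def beta_var(a, b):
        return a * b / ((a + b) ** 2 * (a + b + 1))

    @dataclass
    class Assessor:
        name: str
        accuracy: float   # chance the returned label matches the true label
        cost: float       # dollars per judgment

    def gain_per_cost(a, b, assessor):
        # Expected reduction in Beta(a, b) variance from one binary judgment,
        # discounted by the assessor's accuracy and divided by the cost.
        theta = a / (a + b)
        expected_after = theta * beta_var(a + 1, b) + (1 - theta) * beta_var(a, b + 1)
        return assessor.accuracy * (beta_var(a, b) - expected_after) / assessor.cost

    assessors = [Assessor("expert", 0.98, 1.00), Assessor("crowd", 0.75, 0.05)]
    documents = {"well-understood doc": (9.0, 1.0), "contested doc": (3.0, 3.0)}

    for doc, (a, b) in documents.items():
        scores = {w.name: round(gain_per_cost(a, b, w), 5) for w in assessors}
        print(doc, scores)
    # A selection policy would repeatedly ask the (assessor, document) question
    # with the highest score, subject to the remaining budget.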
This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
Advanced programming languages, based on dependent types, enable program verification alongside program development, thus making them an ideal tool for building fully verified, high assurance software. Recent dependently typed languages that permit reasoning about state and effects—such as Hoare Type Theory (HTT) and Microsoft’s F*—are particularly promising and have been used to verify a range of rich security policies, from state-dependent information flow and access control to conditional declassification and information erasure. But while these languages provide the means to verify security and correctness of high-level source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level target code. Unfortunately, even when compilers for such advanced languages exist, they come with no formal guarantee of correct compilation, let alone any guarantee of secure compilation—i.e., that compiled components will remain as secure as their high-level counterparts when executed within arbitrary low-level contexts. This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
The intellectual merit of this project is the development of a powerful new proof architecture for realistic yet secure compilation of dependently typed languages that relies on contracts to ensure that target-level contexts respect source-level security guarantees and leverages these contracts in a formal model of how source and target code may interoperate. The broader impact is that this research will make it possible to compose high-assurance software components into high-assurance software systems, regardless of whether the components are developed in a high-level programming language or directly in assembly. Compositionality has been a long-standing open problem for certifying systems for high-assurance. Hence, this research has potential for enormous impact on how high-assurance systems are built and certified. The specific goal of the project is to develop a verified multi-pass compiler from Hoare Type Theory to assembly that is type preserving, correct, and secure. The compiler will include passes that perform closure conversion, heap allocation, and code generation. To prove correct compilation of components, not just whole programs, this work will use an approach based on defining a formal semantics of interoperability between source components and target code. To guarantee secure compilation, the project will use (static) contract checking to ensure that compiled code is only run in target contexts that respect source-level security guarantees. To carry out proofs of compiler correctness, the project will develop a logical relations proof method for Hoare Type Theory.
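In spirit only (the project works with dependently typed source languages and assembly-level targets, not Python), the contract-checking idea resembles wrapping a component in its source-level obligations so that arbitrary callers cannot violate the assumptions under which it was verified:

    def contract(pre, post):
        # Wrap a component with its source-level obligations so that untrusted
        # contexts are rejected at the boundary rather than corrupting it.
        def wrap(fn):
            def checked(*args):
                if not pre(*args):
                    raise AssertionError("context violated the component's precondition")
                result = fn(*args)
                if not post(result):
                    raise AssertionError("component broke its postcondition")
                return result
            return checked
        return wrap

    @contract(pre=lambda n: isinstance(n, int) and n >= 0, post=lambda r: r >= 1)
    def factorial(n):
        return 1 if n == 0 else n * factorial(n - 1)

    print(factorial(5))          # 120
    try:
        factorial(-1)            # an ill-behaved "low-level context"
    except AssertionError as err:
        print("rejected:", err)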
The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source.
Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project’s broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language software, (ii) a target language that extends LLVM—increasingly the target of choice for modern compilers—with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.
The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t’) should behave the same as s interoperating with t’. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.
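Written out in LaTeX notation, with + standing for linking (interoperation) and the equivalence defined by the formal semantics of source-target interoperability described above, the first theme's correctness statement reads:

    \[
      \forall s,\ t,\ t'.\qquad \mathsf{compile}(s) = t \;\Longrightarrow\; t + t' \;\approx\; s + t'
    \]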
This project will support a plugin architecture for transparent checkpoint-restart.
Society’s increasingly complex cyberinfrastructure creates a concern for software robustness and reliability. Yet, this same complex infrastructure is threatening the continued use of fault tolerance. Consider when a single application or hardware device crashes. Today, in order to resume that application from the point where it crashed, one must also consider the complex subsystem to which it belongs. While in the past, many developers would write application-specific code to support fault tolerance for a single application, this strategy is no longer feasible when restarting the many inter-connected applications of a complex subsystem. This project will support a plugin architecture for transparent checkpoint-restart. Transparency implies that the software developer does not need to write any application-specific code. The plugin architecture implies that each software developer writes the necessary plugins only once. Each plugin takes responsibility for resuming any interrupted sessions for just one particular component. At a higher level, the checkpoint-restart system employs an ensemble of autonomous plugins operating on all of the applications of a complex subsystem, without any need for application-specific code.
The plugin architecture is part of a more general approach called process virtualization, in which all subsystems external to a process are virtualized. It will be built on top of the DMTCP checkpoint-restart system. One simple example of process virtualization is virtualization of ids. A plugin maintains a virtualization table and arranges for the application code of the process to see only virtual ids, while the outside world sees the real id. Any system calls and library calls using this real id are extended to translate between real and virtual id. On restart, the real ids are updated with the latest value, and the process memory remains unmodified, since it contains only virtual ids. Other techniques employing process virtualization include shadow device drivers, record-replay logs, and protocol virtualization. Some targets of the research include transparent checkpoint-restart support for the InfiniBand network, for programmable GPUs (including shaders), for networks of virtual machines, for big data systems such as Hadoop, and for mobile computing platforms such as Android.
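A toy Python model of the id-virtualization idea described above (real DMTCP plugins are written in C/C++ and interpose on system calls; the class and method names here are invented) keeps the translation table outside the application's view:

    class IdVirtualization:
        """Application code sees only stable virtual ids; the table maps them to
        the real ids the operating system happens to assign, and is refreshed
        on restart so process memory never needs to change."""

        def __init__(self):
            self._virt_to_real = {}
            self._next_virtual = 1000

        def register(self, real_id):
            virtual_id = self._next_virtual
            self._next_virtual += 1
            self._virt_to_real[virtual_id] = real_id
            return virtual_id                    # what the application stores

        def real(self, virtual_id):
            # Interposed around system/library calls: translate virtual -> real.
            return self._virt_to_real[virtual_id]

        def on_restart(self, remap):
            # After restore the real ids differ; only the table is updated.
            self._virt_to_real = {v: remap[r] for v, r in self._virt_to_real.items()}

    table = IdVirtualization()
    child = table.register(real_id=4242)     # e.g., a pid returned by fork()
    table.on_restart({4242: 5151})           # the restarted process received a new pid
    assert table.real(child) == 5151         # the application-held virtual id still works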
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Kapil Arya and Gene Cooperman. “DMTCP: Bringing Interactive Checkpoint-Restart to Python,” Computational Science & Discovery, v.8, 2015, 16 pages. doi:10.1088/issn.1749-4699
Jiajun Cao, Matthieu Simoni, Gene Cooperman, and Christine Morin. “Checkpointing as a Service in Heterogeneous Cloud Environments,” Proc. of 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15), 2015, p. 61–70. doi:10.1109/CCGrid.2015.160
This project will focus on the development of the REDEX tool, a lightweight domain-specific tool for modeling programming languages useful for software development. Originally developed as an in-house tool for a small group of collaborating researchers, REDEX escaped the laboratory several years ago and acquired a dedicated user community; new users now wish to use it for larger and more complicated programming languages than originally envisioned. Using this framework, a programmer articulates a programming language model directly as a software artifact with just a little more effort than paper-and-pencil models. Next, the user invokes diagnostic tools to test a model’s consistency, explore its properties, and check general claims about it.
This award funds several significant improvements to REDEX: (1) a modular system that allows its users to divide up the work, (2) scalable performance so that researchers can deal with large models, and (3) improvements to its testing and error-detection system. The award also includes support for the education of REDEX’s quickly growing user community, e.g., support for organizing tutorials and workshops.
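REDEX models are written in Racket, so the Python fragment below is only an analogy for the workflow the tool supports: write down a small-step reduction relation for a toy language, then randomly test a claim about it (here, that evaluation always reaches an integer value).

    import random

    def step(term):
        # One left-to-right reduction step over terms of a toy arithmetic
        # language: a term is either an int or a tuple (op, left, right).
        op, left, right = term
        if isinstance(left, tuple):
            return (op, step(left), right)
        if isinstance(right, tuple):
            return (op, left, step(right))
        return left + right if op == "+" else left * right

    def evaluate(term):
        while isinstance(term, tuple):
            term = step(term)
        return term

    def random_term(depth=3):
        if depth == 0 or random.random() < 0.3:
            return random.randint(0, 9)
        return (random.choice("+*"), random_term(depth - 1), random_term(depth - 1))

    for _ in range(1000):                      # property-based consistency check
        assert isinstance(evaluate(random_term()), int)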
This project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem.
The “Gradual Typing Across the Spectrum” project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. They routinely use JavaScript for interactive web pages, Ruby on Rails for server-side software, Python for data science, and so on. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem. In turn, the project’s broader significance and importance are the deployment of new technologies to assist the programmer who maintains code in scripting languages, the creation of novel technologies that preserve the advantages of these scripting frameworks, and the development of curricular materials that prepare the next generation of students for working within these frameworks.
A few years ago, the PIs launched programming language research efforts to address this problem. They diagnosed the lack of sound types in scripting languages as one of the major factors. With types in conventional programming languages, programmers concisely communicate design information to future maintenance workers; soundness ensures the types are consistent with the rest of the program. In response, the PIs explored the idea of gradual typing, that is, the creation of a typed sister language (one per scripting language) so that (maintenance) programmers can incrementally equip systems with type annotations. Unfortunately, these efforts have diverged over the years and would benefit from systematic cross-pollination.
With support from this grant, the PIs will systematically explore the spectrum of their gradual typing system with a three-pronged effort. First, they will investigate how to replicate results from one project in another. Second, they will jointly develop an evaluation framework for gradual typing projects with the goal of diagnosing gaps in the efforts and needs for additional research. Third, they will explore the creation of new scripting languages that benefit from the insights of gradual typing research.
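A small experiment of the kind such an evaluation framework would systematize, written as a Python sketch in which an invented enforce decorator stands in for compiler-inserted boundary checks, measures how much slower a typed/untyped boundary becomes once checks are switched on:

    import timeit
    from functools import wraps

    def enforce(fn):
        # Stand-in for the run-time checks a sound gradually typed language
        # inserts where untyped code calls into type-annotated code.
        hints = [(name, t) for name, t in fn.__annotations__.items() if name != "return"]
        @wraps(fn)
        def checked(*args):
            for value, (name, expected) in zip(args, hints):
                if not isinstance(value, expected):
                    raise TypeError(f"{name} must be {expected.__name__}")
            return fn(*args)
        return checked

    def norm(x: float, y: float) -> float:
        return (x * x + y * y) ** 0.5

    checked_norm = enforce(norm)
    plain = timeit.timeit(lambda: norm(3.0, 4.0), number=200_000)
    guarded = timeit.timeit(lambda: checked_norm(3.0, 4.0), number=200_000)
    print(f"boundary-check slowdown: {guarded / plain:.1f}x")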
This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data.
Speech is a complex and intricately timed task that requires the coordination of numerous muscle groups and physiological systems. While most children acquire speech with relative ease, it is one of the most complex patterned movements accomplished by humans and thus susceptible to impairment. Approximately 2% of Americans have imprecise speech either due to mislearning during development (articulation disorder) or as a result of neuromotor conditions such as stroke, brain injury, Parkinson’s disease, cerebral palsy, etc. An equally sizeable group of Americans have difficulty with English pronunciation because it is their second language. Both of these user groups would benefit from tools that provide explicit feedback on speech production clarity. Traditional speech remediation relies on viewing a trained clinician’s accurate articulation and repeated practice with visual feedback via a mirror. While these interventions are effective for readily viewable speech sounds (visemes such as /b/p/m/), they are largely unsuccessful for sounds produced inside the mouth. The tongue is the primary articulator for these obstructed sounds and its movements are difficult to capture. Thus, clinicians use diagrams and other low-tech means (such as placing edible substances on the palate or physically manipulating the oral articulators) to show clients where to place their tongue. While sophisticated research tools exist for measuring and tracking tongue movements during speech, they are prohibitively expensive, obtrusive, and impractical for clinical and/or home use. The PIs’ goal in this exploratory project, which represents a collaboration across two institutions, is to lay the groundwork for a Lingual-Kinematic and Acoustic sensor technology (LinKa) that is lightweight, low-cost, wireless and easy to deploy both clinically and at home for speech remediation.
PI Ghovanloo’s lab has developed a low-cost, wireless, and wearable magnetic sensing system, known as the Tongue Drive System (TDS). An array of electromagnetic sensors embedded within a headset detects the position of a small magnet that is adhered to the tongue. Clinical trials have demonstrated the feasibility of using the TDS for computer access and wheelchair control by sensing tongue movements in up to 6 discrete locations within the oral cavity. This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data. To this end, the team will extend the TDS to track tongue movements during running speech, which are quick, compacted within a small area of the oral cavity, and often overlap for several phonemes, so the challenge will be to accurately classify movements for different sound classes. To complement this effort, pattern recognition of sensor spatiotemporal dynamics will be embedded into an interactive game to offer a motivating, personalized context for speech motor (re)learning by enabling audiovisual biofeedback, which is critical for speech modification. To benchmark the feasibility of the approach, the system will be evaluated on six individuals with neuromotor speech impairment and six healthy age-matched controls.
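The classification step can be pictured with a scikit-learn sketch on synthetic data (real experiments would use recorded TDS signals, carefully engineered features, and likely more sophisticated fusion models; the array shapes below are invented):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Pretend data: short windows of magnetic-sensor readings, flattened to 36
    # features each, labeled with the sound class being articulated.
    rng = np.random.default_rng(0)
    n_windows, n_features, n_classes = 600, 36, 6
    y = rng.integers(0, n_classes, size=n_windows)
    X = rng.normal(size=(n_windows, n_features)) + 0.5 * y[:, None]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    print("held-out accuracy:", round(model.score(X_test, y_test), 2))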
The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The “Compiler Coaching” (Dialog) project represents an investment in programming language tools and technology. Software engineers use high-level programming languages on a daily basis to produce the apps and applications that everyone uses and that control everybody’s lives. Once a programming language translator accepts a program as grammatically correct, it creates impenetrable computer codes without informing the programmer how well (fast or slow, small or large, energy hogging or efficient) these codes will work. Indeed, modern programming languages employ increasingly sophisticated translation techniques and have become obscure black boxes to the working engineer. The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The PIs will explore the Dialog idea in two optimizing compiler settings, one on the conventional side and one on the modern one: for the Racket language, a teaching and research vehicle that they can modify as needed to create the desired channel, and the JavaScript programming language, the standardized tool for existing Web applications. The intellectual merits concern the fundamental principles of creating such communication channels and frameworks for gathering empirical evidence on how these channels benefit the working software engineer. These results should enable the developers of any programming language to implement similar channels of communication to help their clients. The broader impacts are twofold. On one hand, the project is likely to positively impact the lives of working software engineers as industrial programming language creators adapt the Dialog idea. On the other hand, the project will contribute to a two-decades old, open-source programming language project with a large and longstanding history of educational outreach at multiple levels. The project has influenced hundreds of thousands of high school students in the past and is likely to do so in the future.
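A very loose Python analogy (the project itself targets the Racket and JavaScript toolchains and aims at a genuine two-way channel rather than a raw dump) is that even seeing what the translator produced is already informative:

    import dis

    def dot(xs, ys):
        total = 0.0
        for x, y in zip(xs, ys):
            total += x * y
        return total

    # The translator's output is normally invisible to the working programmer;
    # printing it is a crude, one-way version of the proposed feedback channel.
    dis.dis(dot)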
While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive. This project aims to provide algorithms and tools for answering these questions.
Software, including common examples such as commercial applications or embedded device firmware, is often delivered as closed-source binaries. While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive.
This project aims to provide algorithms and tools for answering these questions. Leveraging prior work on emulator-based dynamic analyses, we propose techniques for scaling this high-fidelity analysis to capture and extract whole-system behavior in the context of embedded device firmware and closed-source applications. Using a combination of dynamic execution traces collected from this analysis platform and binary code analysis techniques, we propose techniques for automated structural analysis of binary program artifacts, decomposing system and user-level programs into logical modules through inference of high-level semantic behavior. This decomposition provides as output an automatically learned description of the interfaces and information flows between each module at a sub-program granularity. Specific activities include: (a) developing software-guided whole-system emulator for supporting sophisticated dynamic analyses for real embedded systems; (b) developing advanced, automated techniques for structurally decomposing closed-source software into its constituent modules; (c) developing automated techniques for producing high-level summaries of whole system executions and software components; and (d) developing techniques for automating the reverse engineering and fuzz testing of encrypted network protocols. The research proposed herein will have a significant impact outside of the security research community. We will incorporate the research findings of our program into our undergraduate and graduate teaching curricula, as well as in extracurricular educational efforts such as Capture-the-Flag that have broad outreach in the greater Boston and Atlanta metropolitan areas.
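The structural-decomposition step can be sketched as community detection over an observed call graph (the trace below is fabricated; the project would derive such edges from whole-system emulation of real binaries, and its actual techniques go well beyond this):

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Hypothetical dynamic trace: caller -> callee edges with observed call counts.
    call_edges = [
        ("parse_hdr", "read_byte", 40), ("parse_hdr", "check_crc", 10),
        ("check_crc", "read_byte", 10), ("send_reply", "encrypt", 25),
        ("encrypt", "key_sched", 5),    ("send_reply", "write_sock", 25),
    ]

    G = nx.Graph()
    for caller, callee, count in call_edges:
        G.add_edge(caller, callee, weight=count)

    # Functions that call each other heavily end up in the same candidate module.
    for i, module in enumerate(greedy_modularity_communities(G, weight="weight")):
        print(f"module {i}:", sorted(module))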
The close ties to industry that the collective PIs possess will facilitate transitioning the research into practical defensive tools that can be deployed into real-world systems and networks.
This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.
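As a flavor of the first ingredient of that toolkit, the sketch below (Python with networkx, on a static random graph rather than a truly dynamic network) simulates synchronous push gossip and counts the rounds needed to inform every node.

    import random
    import networkx as nx

    def push_gossip_rounds(G, source, seed=0):
        # Each round, every informed node pushes the rumor to one random neighbor.
        rng = random.Random(seed)
        informed = {source}
        rounds = 0
        while len(informed) < G.number_of_nodes():
            rounds += 1
            for node in list(informed):
                informed.add(rng.choice(list(G[node])))
        return rounds

    G = nx.erdos_renyi_graph(200, 0.05, seed=1)
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    print("rounds to inform all nodes:", push_gossip_rounds(G, source=min(G.nodes)))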
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning.
The control of epidemics, broadly defined to range from human diseases such as influenza and smallpox to malware in communication networks, relies crucially on interventions such as vaccinations and anti-virals (in human diseases) or software patches (for malware). These interventions are almost always voluntary directives from public agencies; however, people do not always adhere to such recommendations, and make individual decisions based on their specific “self interest”. Additionally, people alter their contacts dynamically, and these behavioral changes have a huge impact on the dynamics and the effectiveness of these interventions, so that “good” intervention strategies might, in fact, be ineffective, depending upon the individual response.
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning. Models will be proposed to capture the complexity of static and temporal interactions and patterns of information exchange, including the possibility of failed interventions and the potential for moral hazard. The project will also study specific policies posed by public agencies and network security providers for controlling the spread of epidemics and malware, and will develop resource constrained mechanisms to implement them in this framework.
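A baseline that the project's game-theoretic models refine (this sketch ignores individual incentives and behavioral change entirely, and the parameters are arbitrary) is a discrete-time SIR process on a contact network with a fraction of nodes vaccinated in advance:

    import random
    import networkx as nx

    def attack_rate(G, vaccinated_fraction, beta=0.3, seed=0):
        # Vaccinated nodes are immune; infection spreads to each susceptible
        # neighbor with probability beta, and nodes recover after one step.
        rng = random.Random(seed)
        vaccinated = set(rng.sample(sorted(G.nodes), int(vaccinated_fraction * len(G))))
        susceptible = set(G.nodes) - vaccinated
        patient_zero = rng.choice(sorted(susceptible))
        infected, recovered = {patient_zero}, set()
        susceptible.discard(patient_zero)
        while infected:
            newly = {v for u in infected for v in G[u]
                     if v in susceptible and rng.random() < beta}
            susceptible -= newly
            recovered |= infected
            infected = newly
        return len(recovered) / max(len(G) - len(vaccinated), 1)

    G = nx.barabasi_albert_graph(2000, 3, seed=2)
    for frac in (0.0, 0.2, 0.4):
        print(f"vaccinated {frac:.0%}: attack rate {attack_rate(G, frac):.2f}")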
This project will integrate approaches from Computer Science, Economics, Mathematics, and Epidemiology to give intellectual unity to the study and design of public health policies and has the potential for strong dissertation work in all these areas. Education and outreach is an important aspect of the project, and includes curriculum development at both the graduate and under-graduate levels. A multi-disciplinary workshop is also planned as part of the project.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography.
Computational inefficiency is a common experience: the computer cannot complete a certain task due to lack of resources such as time, memory, or bandwidth. Computational complexity theory classifies — or aims to classify — computational tasks according to their inherent inefficiency. Since tasks requiring excessive resources must be avoided, complexity theory is often indispensable in the design of a computer system. Inefficiency can also be harnessed to our advantage. Indeed, most modern cryptography and electronic commerce rely on the (presumed) inefficiency of certain computational tasks.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography. Building on the principal investigator’s (PI’s) previous work, the main proposed directions are:
This research is closely integrated with a plan to achieve broad impact through education. The PI is reshaping the theory curriculum at Northeastern on multiple levels. At the undergraduate level, the PI is developing, and using in his classes, a set of lecture notes aimed at students who lack mathematical maturity. At the Ph.D. level, the PI is incorporating current research topics, including some of the above, into core classes. Finally, the PI will continue to do research working closely with students at all levels.
This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases.
The significant advances realized in recent years in the study of complex networks are severely limited by an almost exclusive focus on the behavior of single networks. However, most networks in the real world are not isolated but are coupled and hence depend upon other networks, which in turn depend upon other networks. Real networks communicate with each other and may exchange information, or, more importantly, may rely upon one another for their proper functioning. A simple but real example is a power network that depends on a computer network, while the computer network in turn depends on the power network. Our social networks depend on technical networks, which, in turn, are supported by organizational networks. Surprisingly, analyzing complex systems as coupled interdependent networks alters the most basic assumptions that network theory has relied on for single networks. A multidisciplinary, data driven research project will: 1) Study the microscopic processes that rule the dynamics of interdependent networks, with a particular focus on the social component; 2) Define new mathematical models/foundational theories for the analysis of the robustness/resilience and contagion/diffusive dynamics of interdependent networks. This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases. These databases will be made publicly available, except for special cases where it is illegal to do so.
This research has important implications for understanding the social and technical systems that make up a modern society. A recent US Scientific Congressional Report concludes: “No currently available modeling and simulation tools exist that can adequately address the consequences of disruptions and failures occurring simultaneously in different critical infrastructures that are dynamically inter-dependent.” Understanding the interdependence of networks and its effect on system robustness and on structural and functional behavior is crucial for properly modeling many real world systems and applications, from disaster preparedness, to building effective organizations, to comprehending the complexity of the macro economy. In addition to these intellectual objectives, the research project includes the development of an extensive outreach program to the public, especially K-12 students.
This research targets the design and evaluation of protocols for secure, privacy-preserving data analysis in an untrusted cloud.
These protocols allow users to store and query data in the cloud while preserving the privacy and integrity of outsourced data and queries. The PIs specifically address a real-world cloud framework: Google’s prominent MapReduce paradigm.
Traditional solutions for single-server setups and related work on, e.g., fully homomorphic encryption are computationally too heavy and uneconomical, and offset the advantages of the cloud. The PIs’ rationale is to design new protocols tailored to the specifics of the MapReduce computing paradigm. The PIs’ methodology is twofold. First, the PIs design new protocols that allow the cloud user to specify data analysis queries for typical operations such as searching, pattern matching, or counting. For this, the PIs extend privacy-preserving techniques such as private information retrieval and order-preserving encryption. Second, the PIs design protocols guaranteeing the genuineness of data retrieved from the cloud. Using cryptographic accumulators, users can verify that data has not been tampered with. Besides the design, the PIs also implement a prototype that is usable in a realistic setting with MapReduce.
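To make the integrity-verification idea concrete, here is a minimal Python sketch that uses a Merkle tree as a simple stand-in for a cryptographic accumulator: the user keeps a single digest and can detect any tampering with blocks returned by an untrusted store. This is an illustration of the general principle only, not the PIs' construction (which must also work within MapReduce and support efficient per-block proofs).

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash the data blocks pairwise up to a single root digest."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# The user keeps only the root; the (untrusted) cloud stores the blocks.
blocks = [b"record-%d" % i for i in range(8)]
root = merkle_root(blocks)

# Later, the user re-derives the root from the blocks the cloud returns and
# compares it to the stored one; any tampering changes the root.
tampered = list(blocks)
tampered[3] = b"record-3-modified"
assert merkle_root(blocks) == root
assert merkle_root(tampered) != root
```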
The outcome of this project enables privacy-preserving operations and secure data storage in a widely used cloud computing framework, thus removing a major adoption obstacle and making cloud computing available to a larger community.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries.
Wi-Fi has emerged as the technology of choice for Internet access. Thus, virtually every smartphone or tablet is now equipped with a Wi-Fi card. Concurrently, and as a means to maximize spectral efficiency, Wi-Fi radios are becoming increasingly complex and sensitive to wireless channel conditions. The prevalence of Wi-Fi networks, along with their adaptive behaviors, makes them an ideal target for denial of service attacks at a large, infrastructure level.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries. The research blends theory with experimentation and prototyping, and spans a range of disciplines including protocol design and analysis, coding and modulation, on-line algorithms, queuing theory, and emergent behaviors.
The anticipated benefits of the project include: (1) a deep understanding of threats facing Wi-Fi along several dimensions, via experiments and analysis; (2) a set of mitigation techniques and algorithms to strengthen existing Wi-Fi networks and emerging standards; (3) implementation into open-source software that can be deployed on wireless network cards and access points; (4) security training of the next-generation of scientists and engineers involved in radio design and deployment.
The objective of this research is to develop a comprehensive theoretical and experimental cyber-physical framework to enable intelligent human-environment interaction capabilities by a synergistic combination of computer vision and robotics.
Specifically, the approach is applied to examine individualized remote rehabilitation with an intelligent, articulated, and adjustable lower limb orthotic brace to manage Knee Osteoarthritis, where a visual-sensing/dynamical-systems perspective is adopted to: (1) track and record patient/device interactions with internet-enabled commercial off-the-shelf computer-vision devices; (2) abstract the interactions into parametric and composable low-dimensional manifold representations; (3) link to quantitative biomechanical assessment of the individual patients; (4) facilitate development of individualized user models and exercise regimens; and (5) aid the progressive parametric refinement of exercises and adjustment of bracing devices. This research and its results will enable us to understand underlying human neuro-musculo-skeletal and locomotion principles by merging quantitative data acquisition with lower-order modeling and individualized feedback. Beyond efficient representation, the quantitative visual models offer the potential to capture fundamental underlying physical, physiological, and behavioral mechanisms grounded in biomechanical assessments, and thereby afford insights into the generative hypotheses of human actions.
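As a generic illustration of what a low-dimensional representation of recorded motion can look like (not the project's manifold models), the following numpy sketch reduces synthetic knee-angle trajectories to two principal-component scores per repetition; the synthetic data and the choice of PCA are assumptions made only for this example.

```python
import numpy as np

# Synthetic stand-in data: 40 exercise repetitions, each a 100-sample
# knee-angle trajectory (degrees), with small per-repetition variation.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
reps = np.array([60 * np.sin(np.pi * t) * (1 + 0.05 * rng.standard_normal())
                 + rng.standard_normal(100) for _ in range(40)])

# Principal component analysis via the SVD: each repetition is summarized by a
# handful of scores, a crude low-dimensional representation of the motion.
centered = reps - reps.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = (S ** 2) / np.sum(S ** 2)
scores = centered @ Vt[:2].T        # 2-D embedding of each repetition

print("variance explained by 2 components:", explained[:2].sum().round(3))
print("embedding shape:", scores.shape)
```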
Knee osteoarthritis is an important public health issue, because of high costs associated with treatments. The ability to leverage a quantitative paradigm, both in terms of diagnosis and prescription, to improve mobility and reduce pain in patients would be a significant benefit. Moreover, the home-based rehabilitation setting offers not only immense flexibility, but also access to a significantly greater portion of the patient population. The project is also integrated with extensive educational and outreach activities to serve a variety of communities.
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases.
This is a proposal for a multi-institutional MIDAS Center of Excellence called the Center for Statistics and Quantitative Infectious Diseases (CSQUID). The mission of the Center is to provide national and international leadership. The lead institution is the Fred Hutchinson Cancer Research Center (FHCRC). Other participating institutions are the University of Florida, Northeastern University, University of Michigan, Emory University, University of Washington (UW), University of Georgia, and Duke University. The proposal includes four synergistic research projects (RPs) that will develop cutting-edge methodologies applied to solving epidemiologic, immunologic, and evolutionary problems important for public health policy in influenza, dengue, polio, TB, and other infectious agents: RP1: Modeling, Spatial, Statistics (Lead: I. Longini, U. Florida); RP2: Dynamic Inference (Lead: P. Rohani, U. Michigan); RP3: Understanding transmission with integrated genetic and epidemiologic inference (Co-Leads: E. Kenah, U. Florida, and T. Bedford, FHCRC); RP4: Dynamics and Evolution of Influenza Strain Variation (Lead: R. Antia, Emory U.). The Software Development and Core Facilities component (Lead: A. Vespignani, Northeastern U.) will provide leadership in software development, access, and communication. The Policy Studies component (Lead: J. Koopman, U. Michigan) will provide leadership in communicating our research results to policy makers, as well as conducting novel research into policy making. The Training, Outreach, and Diversity Plans include ongoing training of 9 postdoctoral fellows and 5.25 predoctoral research assistants each year, support for participants in the Summer Institute for Statistics and Modeling in Infectious Diseases (UW), and ongoing Research Experience for Undergraduates programs at two institutions, among others. All participating institutions and the Center are committed to increasing diversity at all levels. Center-wide activities include Career Development Awards for junior faculty, annual workshops and symposia, outside speakers, and participation in the MIDAS Network meetings. Scientific leadership will be provided by the Center Director, a Leadership Committee, and an external Scientific Advisory Board, as well as the MIDAS Steering Committee.
Public Health Relevance
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases. The research is motivated by multiscale problems such as immunologic, epidemiologic, and environmental drivers of the spread of infectious diseases with the goal of understanding and communicating the implications for public health policy.
Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.
The over-arching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, a better understanding of the molecular underpinnings and the role of environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists of three Specific Aims, each consisting of three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.
In Specific Aim 1, we will use the Asthma BRIDGE datasets to define the interactome sub-module perturbed in poor asthma control and the regulatory variants that modulate this asthma-control module, and to develop a predictive model of asthma control.
In Specific Aim 2, we will study the effects of exposure to air pollution and environmental tobacco smoke on the asthma-control networks, testing for environment-dependent alterations in network dynamics.
In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on the network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS and those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies in bronchial epithelial and Jurkat T-cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies will be performed with co-treatment using dexamethasone to validate findings from Aim 3. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control, and define targets for biomarker development and therapeutic targeting.
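For readers unfamiliar with network-based "omics" analysis, the sketch below shows the generic flavor of the module-definition step on synthetic data: build a co-expression graph by thresholding pairwise correlations and report connected components as candidate modules. It is not the Asthma BRIDGE pipeline; the data, threshold, and module criterion are invented for illustration.

```python
import numpy as np
import networkx as nx

# Synthetic expression matrix: 200 subjects x 50 genes, with one correlated block.
rng = np.random.default_rng(1)
expr = rng.standard_normal((200, 50))
shared = rng.standard_normal((200, 1))
expr[:, :10] += 0.9 * shared            # genes 0-9 form a co-expressed module

# Build a co-expression network by thresholding absolute pairwise correlation,
# then report connected components of size > 1 as candidate modules.
corr = np.corrcoef(expr, rowvar=False)
g = nx.Graph()
g.add_nodes_from(range(corr.shape[0]))
edges = [(i, j) for i in range(50) for j in range(i + 1, 50)
         if abs(corr[i, j]) > 0.4]
g.add_edges_from(edges)
modules = [sorted(c) for c in nx.connected_components(g) if len(c) > 1]
print("candidate modules:", modules)
```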
Public Health Relevance
Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using this data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.
Crowdsourcing measurement of mobile Internet performance, now the engine for Mobiperf.
Mobilyzer is a collaboration between Morley Mao’s group at the University of Michigan and David Choffnes’ group at Northeastern University.
Mobilyzer provides the following components:
Measurements, analysis, and system designs to reveal how the Internet’s most commonly used trust systems operate (and malfunction) in practice, and how we can make them more secure.
Research on the SSL/TLS Ecosystem
Every day, we use Secure Sockets Layer (SSL) and Transport Layer Security (TLS) to secure our Internet transactions such as banking, e-mail, and e-commerce. Along with a public key infrastructure (PKI), they allow our computers to automatically verify that our sensitive information (e.g., credit card numbers and passwords) is hidden from eavesdroppers and sent only to trustworthy servers.
In mid-April, 2014, a software vulnerability called Heartbleed was announced. It allows malicious users to capture information that would allow them to masquerade as trusted servers and potentially steal sensitive information from unsuspecting users. The PKI provides multiple ways to prevent such an attack from occurring, and we should expect Web site operators to use these countermeasures.
In this study, we found that the overwhelming majority of sites (more than 73%) did not do so, meaning visitors to their sites are vulnerable to attacks such as identity theft. Further, the majority of sites that attempted to address the problem (60%) did so in a way that leaves customers vulnerable.
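One countermeasure the study examines is reissuing certificates after the disclosure. As a rough, self-contained illustration (not the study's methodology, which also considers revocation and key reuse), the Python sketch below fetches a site's certificate and checks whether its notBefore date falls after the Heartbleed disclosure; the host name is a placeholder.

```python
import socket
import ssl
from datetime import datetime, timezone

HEARTBLEED_DISCLOSURE = datetime(2014, 4, 7, tzinfo=timezone.utc)

def cert_issued_after_heartbleed(host, port=443):
    """Fetch a site's certificate and check whether it was issued after the
    Heartbleed disclosure date (a rough proxy for 'was it reissued')."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    not_before = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notBefore"]), tz=timezone.utc)
    return not_before > HEARTBLEED_DISCLOSURE, not_before

# Example (placeholder host): (True/False, certificate's notBefore timestamp)
print(cert_issued_after_heartbleed("example.com"))
```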
Practical and powerful privacy for network communication (led by Stevens Le Blond at MPI).
This project entails several threads that cover Internet measurement, modeling, and experimentation.
Understanding the geographic nature of Internet paths and their implications for performance, privacy and security.
This study sheds light on this issue by measuring how and when Internet traffic traverses national boundaries. To do this, we ask you to run our browser applet, which visits various popular websites, measures the paths taken, and identifies their locations. By running our tool, you will help us understand if and how Internet paths traverse national boundaries, even when two endpoints are in the same country. We will also show you these paths, helping you understand where your Internet traffic goes.
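A minimal sketch of the kind of path measurement involved is shown below: it runs the system traceroute and extracts the hop IP addresses, which could then be geolocated against a local GeoIP database. This is an illustration only (the actual study uses a browser applet); it assumes a Unix-like system with traceroute installed.

```python
import re
import subprocess

def traceroute_hops(host):
    """Run the system traceroute (-n: numeric output, -w 2: 2 s per probe)
    and return the list of hop IP addresses on the path to `host`."""
    out = subprocess.run(["traceroute", "-n", "-w", "2", host],
                         capture_output=True, text=True, timeout=300).stdout
    hops = []
    for line in out.splitlines()[1:]:            # skip the header line
        m = re.search(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b", line)
        hops.append(m.group(1) if m else "*")    # '*' for unresponsive hops
    return hops

print(traceroute_hops("example.com"))
# Mapping each hop to a country (e.g., with a local GeoIP database) would then
# show whether the path crosses national boundaries.
```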
This project will develop methodologies and tools for conducting algorithm audits. An algorithm audit uses controlled experiments to examine an algorithmic system, such as an online service or big data information archive, and ascertain (1) how it functions, and (2) whether it may cause harm.
Examples of documented harms by algorithms include discrimination, racism, and unfair trade practices. Although there is rising awareness of the potential for algorithmic systems to cause harm, actually detecting this harm in practice remains a key challenge. Given that most algorithms of concern are proprietary and non-transparent, there is a clear need for methods to conduct black-box analyses of these systems. Numerous regulators and governments have expressed concerns about algorithms, as well as a desire to increase transparency and accountability in this area.
This research will develop methodologies to audit algorithms in three domains that impact many people: online markets, hiring websites, and financial services. Auditing algorithms in these three domains will require solving fundamental methodological challenges, such as how to analyze systems with large, unknown feature sets, and how to estimate feature values without ground-truth data. To address these broad challenges, the research will draw on insights from prior experience auditing personalization algorithms. Additionally, each domain also brings unique challenges that will be addressed individually. For example, novel auditing tools will be constructed that leverage extensive online and offline histories. These new tools will allow examination of systems that were previously inaccessible to researchers, including financial services companies. Methodologies, open-source code, and datasets will be made available to other academic researchers and regulators. This project includes two integrated educational objectives: (1) to create a new computer science course on big data ethics, teaching how to identify and mitigate harmful side-effects of big data technologies, and (2) production of web-based versions of the auditing tools that are designed to be accessible and informative to the general public, that will increase transparency around specific, prominent algorithmic systems, as well as promote general education about the proliferation and impact of algorithmic systems.
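To give a flavor of what a black-box audit looks like in code (a toy, not the project's methodology), the sketch below issues the same query from a "control" and a "treatment" profile several times and compares the overlap of the returned results; fetch_results is a hypothetical stand-in for instrumented browser sessions or API clients.

```python
def jaccard(a, b):
    """Overlap between two result sets (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def audit_query(query, fetch_results, n_trials=5):
    """Black-box personalization audit for one query: compare results returned
    to a control and a treatment profile over several trials, so that ordinary
    noise (A/B tests, load balancing) can be separated from profile-dependent
    differences."""
    overlaps = []
    for _ in range(n_trials):
        control = fetch_results(query, profile="control")
        treatment = fetch_results(query, profile="treatment")
        overlaps.append(jaccard(control, treatment))
    return sum(overlaps) / n_trials

# Example with a stand-in fetcher; a real audit would drive instrumented
# browser sessions or API clients here.
def fetch_results(query, profile):
    base = [f"{query}-result-{i}" for i in range(10)]
    return base if profile == "control" else base[:8] + [f"{query}-ad-1", f"{query}-ad-2"]

print(audit_query("laptop", fetch_results))   # ~0.67: treatment sees different items
```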
This project aims to investigate the development of procedural narrative systems using crowd-sourcing methods.
This project will create a framework for simulation-based training that supports a learner’s exploration and replay, and exercises theory of mind skills in order to deliver the full promise of social skills training. The term Theory of Mind (ToM) refers to the human capacity to use beliefs about the mental processes and states of others. In order to train social skills, there has been a rapid growth in narrative-based simulations that allow learners to role-play social interactions. However, the design of these systems often constrains the learner’s ability to explore different behaviors and their consequences. Attempts to support more generative experiences face a combinatorial explosion of alternative paths through the interaction, presenting an overwhelming challenge for developers to create content for all the alternatives. Rather, training systems are often designed around exercising specific behaviors in specific situations, hampering the learning of more general skills in using ToM. This research seeks to solve this problem through three contributions: (1) a new model for conceptualizing narrative and role-play experiences that addresses generativity, (2) new methods that facilitate content creation for those generative experiences, and (3) an approach that embeds theory of mind training in the experience to allow for better learning outcomes. This research is applicable to complex social skill training across a range of situations: in schools, communities, the military, police, homeland security, and ethnic conflict.
The research begins with a paradigm shift that re-conceptualizes social skills simulation as a learner rehearsing a role instead of performing a role. This shift will exploit Stanislavsky’s Active Analysis (AA), a performance rehearsal technique that explicitly exercises Theory of Mind skills. Further, AA’s decomposition into short rehearsal scenes can break the combinatorial explosion over long narrative arcs that exacerbates content creation for social training systems. The research will then explore using behavior fitting and machine learning techniques on crowd-sourced data as a way to semi-automate the development of multi-agent simulations for social training. The research will assess quantitatively and qualitatively the ability of this approach to (a) provide experiences that support exploration and foster ToM use and (b) support acquiring crowd-sourced data that can be used to craft those experiences using automatic methods.
This project is unique in combining cutting-edge work in modeling theory of mind, interactive environments, performance rehearsal, and crowd sourcing. The multidisciplinary collaboration will enable development of a methodology for creating interactive experiences that pushes the boundaries of the current state of the art in social skill training. Reliance on crowd sourcing provides an additional benefit of being able to elicit culturally specific behavior patterns by selecting the relevant crowd, allowing for both culture-specific and cross-cultural training content.
Evidence Based Medicine (EBM) aims to systematically use the best available evidence to inform medical decision making. This paradigm has revolutionized clinical practice over the past 30 years. The most important tool for EBM is the systematic review, which provides a rigorous, comprehensive and transparent synthesis of all current evidence concerning a specific clinical question. These syntheses enable decision makers to consider the entirety of the relevant published evidence.
Systematic reviews now inform everything from national health policy to bedside care. But producing these reviews requires researchers to identify the entirety of the relevant literature and then extract from it the information to be synthesized, a hugely laborious and expensive exercise. Moreover, the unprecedented growth of the biomedical literature has increased the burden on those trying to make sense of the published evidence base. Concurrently, more systematic reviews are being conducted every year to synthesize the expanding evidence base; tens of millions of dollars are spent annually conducting these reviews.
RobotReviewer aims to mitigate this issue by (semi-) automating evidence synthesis using machine learning and natural language processing.
View the RobotReviewer page to read more.
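As a rough illustration of the underlying idea (not RobotReviewer's actual models), the sketch below trains a small text classifier of the kind that can rank citations by their likelihood of describing a randomized controlled trial, reducing manual screening effort; the labeled abstracts are toy examples and scikit-learn is assumed to be available.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled abstracts: 1 = randomized controlled trial, 0 = not.
abstracts = [
    "Patients were randomly assigned to receive the intervention or placebo.",
    "A double blind randomized trial of drug X versus standard care.",
    "We review the literature on risk factors for cardiovascular disease.",
    "A retrospective cohort study of outcomes after surgery.",
]
labels = [1, 1, 0, 0]

# TF-IDF features plus logistic regression: a simple screening classifier that
# ranks unseen citations by their estimated probability of being an RCT.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(abstracts, labels)

new = ["Participants were randomized to two arms and followed for one year."]
print(model.predict_proba(new)[0][1])   # estimated probability of 'RCT'
```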
Software development is facing a paradigm shift towards ubiquitous concurrent programming, giving rise to software that is among the most complex technical artifacts ever created by humans. Concurrent programming presents several risks and dangers for programmers who are overwhelmed by puzzling and irreproducible concurrent program behavior, and by new types of bugs that elude traditional quality assurance techniques. If this situation is not addressed, we are drifting into an era of widespread unreliable software, with consequences ranging from collapsed programmer productivity, to catastrophic failures in mission-critical systems.
This project will take steps against a concurrent software crisis, by producing verification technology that assists non-specialist programmers in detecting concurrency errors, or demonstrating their absence. The proposed technology will confront the concurrency explosion problem that verification methods often suffer from. The project’s goal is a framework under which the analysis of programs with unbounded concurrency resources (such as threads of execution) can be soundly reduced to an analysis under a small constant resource bound, making the use of state space explorers practical. As a result, the project will largely eliminate the impact of unspecified computational resources as the major cause of complexity in analyzing concurrent programs. By developing tools for detecting otherwise undetectable misbehavior and vulnerabilities in concurrent programs, the project will contribute its part to averting a looming software quality crisis.
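The sketch below illustrates, in miniature, why interleavings explode and what explicit-state exploration buys: it enumerates all schedules of two threads that each perform a non-atomic increment (load then store) and shows that a lost update is reachable. It is an illustration of the general state-space-exploration idea, not of the project's reduction.

```python
from itertools import permutations

# Each thread performs a non-atomic increment: load, then store(loaded + 1).
# We enumerate every interleaving of the two threads' steps and collect the
# reachable final values of the shared counter.
def run(schedule):
    shared = 0
    local = {0: None, 1: None}
    for tid, op in schedule:
        if op == "load":
            local[tid] = shared
        else:                      # "store"
            shared = local[tid] + 1
    return shared

steps = [(0, "load"), (0, "store"), (1, "load"), (1, "store")]

finals = set()
for perm in set(permutations(steps)):
    # keep only schedules where each thread's load precedes its own store
    if all(perm.index((t, "load")) < perm.index((t, "store")) for t in (0, 1)):
        finals.add(run(perm))

print(finals)   # {1, 2}: the lost-update bug (final value 1) is reachable
```

With more threads or more steps the number of interleavings grows combinatorially, which is exactly why sound reductions to a small constant resource bound matter.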
The research will enable the auditing and control of personally identifiable information leaks, addressing the key challenges of how to identify and control PII leaks when users’ PII is not known a priori, nor is the set of apps or devices that leak this information. First, to enable auditing through improved transparency, we are investigating how to use machine learning to reliably identify PII from network flows, and identify algorithms that incorporate user feedback to adapt to the changing landscape of privacy leaks. Second, we are building tools that allow users to control how their information is (or not) shared with other parties. Third, we are investigating the extent to which our approach extends to privacy leaks from IoT devices. Besides adapting our system to the unique format for leaks across a variety of IoT devices, our work investigates PII exposed indirectly through time-series data produced by IoT-generated monitoring.
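As a simplified illustration of the detection step (the system described above learns indicators from labeled flows and user feedback rather than relying on fixed rules), the sketch below scans a decoded flow payload with a few hand-written PII patterns; the patterns and the sample request are invented for the example.

```python
import re

# Simple detectors for a few common PII types in decoded HTTP payloads.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "mac_address": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "location": re.compile(r"lat=(-?\d+\.\d+)&lon=(-?\d+\.\d+)"),
}

def find_pii(payload: str):
    """Return a dict of PII types found in one flow payload."""
    return {name: pat.findall(payload)
            for name, pat in PII_PATTERNS.items() if pat.search(payload)}

flow = "GET /track?user=alice@example.com&lat=42.3398&lon=-71.0892 HTTP/1.1"
print(find_pii(flow))
# {'email': ['alice@example.com'], 'location': [('42.3398', '-71.0892')]}
```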
The purpose of this project is to develop a conversational agent system that counsels terminally ill patients in order to alleviate their suffering and improve their quality of life.
Although many interventions have now been developed to address palliative care for specific chronic diseases, little has been done to address the overall quality of life for older adults with serious illness, spanning not only the functional aspects of symptom and medication management, but the affective aspects of suffering. In this project, we are developing a relational agent to counsel patients at home about medication adherence, stress management, advanced care planning, and spiritual support, and to provide referrals to palliative care services when needed.
When deployed on smartphones, virtual agents have the potential to deliver life-saving advice regarding emergency medical conditions, as well as provide a convenient channel for health education to improve the safety and efficacy of pharmacotherapy.
We are developing a smartphone-based virtual agent that provides counseling to patients with Atrial Fibrillation. Atrial Fibrillation is a highly prevalent heart rhythm disorder and is known to significantly increase the risk of stroke, heart failure and death. In this project, a virtual agent is deployed in conjunction with a smartphone-based heart rhythm monitor that lets patients obtain real-time diagnostic information on the status of their atrial fibrillation and determine whether immediate action may be needed.
This project is a collaboration with University of Pittsburgh Medical Center.
The last decade has seen an enormous increase in our ability to gather and manage large amounts of data; business, healthcare, education, economy, science, and almost every aspect of society are accumulating data at unprecedented levels. The basic premise is that by having more data, even if uncertain and of lower quality, we are also able to make better-informed decisions. To make any decisions, we need to perform “inference” over the data, i.e. to either draw new conclusions, or to find support for existing hypotheses, thus allowing us to favor one course of action over another. However, general reasoning under uncertainty is highly intractable, and many state-of-the-art systems today perform approximate inference by reverting to sampling. Thus for many modern applications (such as information extraction, knowledge aggregation, question-answering systems, computer vision, and machine intelligence), inference is a key bottleneck, and new methods for tractable approximate inference are needed.
This project addresses the challenge of scaling inference by generalizing two highly scalable approximate inference methods and complementing them with scalable methods for parameter learning that are “approximation-aware.” Thus, instead of treating the (i) learning and the (ii) inference steps separately, this project uses the approximation methods developed for inference also for learning the model. The research hypothesis is that this approach increases the overall end-to-end prediction accuracy while simultaneously increasing scalability. Concretely, the project develops the theory and a set of scalable algorithms and optimization methods for at least the following four sub-problems: (1) approximating general probabilistic conjunctive queries with standard relational databases; (2) learning the probabilities in uncertain databases based on feedback on rankings of output tuples from general queries; (3) approximating the exact probabilistic inference in undirected graphical models with linearized update equations; and (4) complementing the latter with a robust framework for learning linearized potentials from partially labeled data.
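The following toy sketch conveys the spirit of linearized updates as in sub-problem (3), in a greatly simplified form: node beliefs on a small graph are computed by iterating a linear fixed-point equation instead of full message passing. The graph, priors, and coupling weight are invented for illustration.

```python
import numpy as np

# Toy graph (adjacency matrix) and prior node beliefs centered around 0:
# positive = leaning label A, negative = leaning label B, 0 = unknown.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
A = np.maximum(A, A.T)                      # make the adjacency symmetric
prior = np.array([1.0, 0.0, 0.0, 0.0, -1.0])

# Linearized update: b <- prior + w * A @ b, iterated to a fixed point.
# For a small coupling w this converges (spectral radius of w*A below 1) and
# approximates the result of full message-passing inference.
w = 0.15
b = prior.copy()
for _ in range(100):
    b_new = prior + w * A @ b
    if np.max(np.abs(b_new - b)) < 1e-9:
        break
    b = b_new

print(np.round(b, 3))   # nodes near node 0 end up positive, near node 4 negative
```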
Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules and metabolites in complex biological mixtures. The technology evolves rapidly and generates datasets of increasing complexity and size. This rapid evolution must be matched by an equally fast evolution of statistical methods and tools developed for the analysis of these data. Ideally, new statistical methods should leverage the rich resources available from over 12,000 packages implemented in the R programming language and its Bioconductor project. However, technological limitations now hinder their adoption for mass spectrometric research. In response, the project ROCKET builds an enabling technology for working with large mass spectrometric datasets in R, and for rapidly developing new algorithms, while benefiting from advancements in other areas of science. It also offers an opportunity for the recruitment and retention of Native American students to work with R-based technology and research, and helps prepare them for careers in STEM.
Instead of implementing yet another data processing pipeline, ROCKET builds an enabling technology for extending the scalability of R, and streamlining manipulations of large files in complex formats. First, to address the diversity of the mass spectrometric community, ROCKET supports scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory), as well as scaling up (i.e., executing a workflow on a cloud or on a multiprocessor). Second, ROCKET generates an efficient mixture of R and target code which is compiled in the background for the particular deployment platform. By ensuring compatibility with mass spectrometry-specific open data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general analytical methods. Therefore, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists, and create a blueprint for a new paradigm for R-based computing with large datasets. The outcomes of the project will be documented and made publicly available at https://olga-vitek-lab.khoury.northeastern.edu/.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Northeastern University proposes to organize a Summer School ‘Big Data and Statistics for Bench Scientists.’ The Summer School will train life scientists and computational scientists in designing and analyzing large-scale experiments relying on proteomics, metabolomics, and other high-throughput biomolecular assays. The training will enhance the effectiveness and reproducibility of biomedical research, such as discovery of diagnostic biomarkers for early diagnosis of disease, or prognostic biomarkers for predicting therapy response.
Northeastern University requests funds for a Summer School, entitled Big Data and Statistics for Bench Scientists. The target audience for the School is graduate and post-graduate life scientists who work primarily in wet labs and who generate large datasets. Unlike other educational efforts that emphasize genomic applications, this School targets scientists working with other experimental technologies. Mass spectrometry-based proteomics and metabolomics are our main focus; however, the School is also appropriate for scientists working with other assays, e.g., nuclear magnetic resonance spectroscopy (NMR), protein arrays, etc. This large community has been traditionally under-served by educational efforts in computation and statistics. This proposal aims to fill this void. The Summer School is motivated by feedback from smaller short courses previously co-organized or co-instructed by the PI, and will cover theoretical and practical aspects of the design and analysis of large-scale experimental datasets. The Summer School will have a modular format, with eight 20-hour modules scheduled in 2 parallel tracks during 2 consecutive weeks. Each module can be taken independently. The planned modules are (1) Processing raw mass spectrometric data from proteomic experiments using Skyline, (2) Beginner’s R, (3) Processing raw mass spectrometric data from metabolomic experiments using OpenMS, (4) Intermediate R, (5) Beginner’s guide to statistical experimental design and group comparison, (6) Specialized statistical methods for detecting differentially abundant proteins and metabolites, (7) Statistical methods for discovery of biomarkers of disease, and (8) Introduction to systems biology and data integration. Each module will introduce the necessary statistical and computational methodology, and contain extensive practical hands-on sessions. Each module will be organized by instructors with extensive interdisciplinary teaching experience, and supported by several teaching assistants. We anticipate the participation of 104 scientists, each taking on average 2 modules. Funding is requested for three yearly offerings of the School, and includes funds to provide US participants with 62 travel fellowships per year, and 156 registration fee waivers per module. All the course materials, including videos of the lectures and of the practical sessions, will be publicly available free of charge.
Different individuals experience the same events in vastly different ways, owing to their unique histories and psychological dispositions. For someone with social fears and anxieties, the mere thought of leaving the home can induce a feeling of panic. Conversely, an experienced mountaineer may feel quite comfortable balancing on the edge of a cliff. This variation of perspectives is captured by the term subjective experience. Despite its centrality and ubiquity in human cognition, it remains unclear how to model the neural bases of subjective experience. The proposed work will develop new techniques for statistical modeling of individual variation, and apply these techniques to a neuroimaging study of the subjective experience of fear. Together, these two lines of research will yield fundamental insights into the neural bases of fear experience. More generally, the developed computational framework will provide a means of comparing different mathematical hypotheses about the relationship between neural activity and individual differences. This will enable investigation of a broad range of phenomena in psychology and cognitive neuroscience.
The proposed work will develop a new computational framework for modeling individual variation in neuroimaging data, and use this framework to investigate the neural bases of one powerful and societally meaningful subjective experience, namely, of fear. Fear is a particularly useful assay because it involves variation across situational contexts (spiders, heights, and social situations), and dispositions (arachnophobia, acrophobia, and agoraphobia) that combine to create subjective experience. In the proposed neuroimaging study, participants will be scanned while watching videos that induce varying levels of arousal. To characterize individual variation in this neuroimaging data, the investigators will leverage advances in deep probabilistic programming to develop probabilistic variants of factor analysis models. These models infer a low-dimensional feature vector, also known as an embedding, for each participant and stimulus. A simple neural network models the relationship between embeddings and the neural response. This network can be trained in a data-driven manner and can be parameterized in a variety of ways, depending on the experimental design, or the neurocognitive hypotheses that are to be incorporated into the model. This provides the necessary infrastructure to test different neural models of fear. Concretely, the investigators will compare a model in which fear has its own unique circuit (i.e. neural signature or biomarker) to subject- or situation-specific neural architectures. More generally, the developed framework can be adapted to model individual variation in neuroimaging studies in other experimental settings.
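A bare-bones numpy sketch of the embedding idea is given below (it is not the investigators' deep probabilistic programming model): each participant and each stimulus receives a low-dimensional vector, their inner product predicts the response, and both sets of embeddings are fit by gradient descent on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_stim, k = 20, 15, 3          # participants, stimuli, embedding size

# Synthetic "neural responses": generated from true low-dimensional embeddings
# plus noise, so the fitted model has something recoverable to find.
true_s = rng.standard_normal((n_subj, k))
true_m = rng.standard_normal((n_stim, k))
Y = true_s @ true_m.T + 0.1 * rng.standard_normal((n_subj, n_stim))

# Fit participant embeddings S and stimulus embeddings M so that S @ M.T
# reconstructs Y; plain gradient descent on the squared reconstruction error.
S = 0.1 * rng.standard_normal((n_subj, k))
M = 0.1 * rng.standard_normal((n_stim, k))
lr = 0.01
for step in range(2000):
    R = S @ M.T - Y                     # residual
    S -= lr * (R @ M)
    M -= lr * (R.T @ S)

print("reconstruction RMSE:", np.sqrt(np.mean((S @ M.T - Y) ** 2)).round(3))
```

In the proposed work this linear combination is replaced by a small neural network and the embeddings are treated probabilistically, but the role of the per-participant and per-stimulus vectors is the same.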
Easy Alliance, a nonprofit initiative, has been instituted to solve complex, long-term challenges in making the digital world a more accessible place for everyone.
Computer networking and the Internet have revolutionized our societies, but are plagued with security problems which are difficult to tame. Serious vulnerabilities are constantly being discovered in network protocols that affect the work and lives of millions. Even some protocols that have been carefully scrutinized by their designers and by the computer engineering community have been shown to be vulnerable afterwards. Why is developing secure protocols so hard? This project seeks to address this question by developing novel design and implementation methods for network protocols that allow developers to identify and fix security vulnerabilities semi-automatically. The project serves the national interest, as cyber-security costs the United States many billions of dollars annually. Besides making technical advances to the field, this project will also have broader impacts in education and curriculum development, as well as in helping to bridge the gap between several somewhat fragmented scientific communities working on the problem.
Technically, the project will follow a formal approach building upon a novel combination of techniques from security modeling, automated software synthesis, and program analysis to bridge the gap between an abstract protocol design and a low-level implementation. In particular, the methodology of the project will be based on a new formal behavioral model of software that explicitly captures how the choice of a mapping from a protocol design onto an implementation platform may result in different security vulnerabilities. Building on this model, this project will provide (1) a modeling approach that cleanly separates the descriptions of an abstract design from a concrete platform, and allows the platform to be modeled just once and reused, (2) a synthesis tool that will automatically construct a secure mapping from the abstract protocol to the appropriate choice of platform features, and (3) a program analysis tool that leverages platform-specific information to check that an implementation satisfies a desired property of the protocol. In addition, the project will develop a library of reusable platform models, and demonstrate the effectiveness of the methodology in a series of case studies.
Most computer programs process vast amounts of numerical data. Unfortunately, due to space and performance demands, computer arithmetic comes with its own rules. Making matters worse, different computers have different rules: while there are standardization efforts, efficiency considerations give hardware and compiler designers much freedom to bend the rules to their taste. As a result, the outcome of a computer calculation depends not only on the input, but also on the particular machine and environment in which the calculation takes place. This makes programs brittle and un-portable, and causes them to produce untrusted results. This project addresses these problems, by designing methods to detect inputs to computer programs that exhibit too much platform dependence, and to repair such programs, by making their behavior more robust.
Technical goals of this project include: (i) automatically warning users of disproportionately platform-dependent results of their numeric algorithms; (ii) repairing programs with platform instabilities; and (iii) proving programs stable against platform variations. Platform-independence of numeric computations is a form of robustness whose lack undermines the portability of program semantics. This project is one of the few to tackle the question of non-determinism in the specification (IEEE 754) of the theory (floating-point arithmetic) that machines are using today. This work requires new abstractions that soundly approximate the set of values of a program variable against a variety of compiler and hardware behaviors and features that may not even be known at analysis time. The project involves graduate and undergraduate students.
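The following short Python example (the project itself targets compiled numeric code) shows the root phenomenon: floating-point addition is not associative, so merely changing evaluation order, as a compiler or parallel runtime may do, changes the computed result.

```python
import math

# Floating-point addition is not associative, so a compiler that reassociates
# a reduction (or a platform that splits it across threads differently) can
# change the result even though the mathematical sum is identical.
xs = [1e16, 1.0, -1e16, 1.0] * 1000

left_to_right = 0.0
for x in xs:
    left_to_right += x

reordered = sum(sorted(xs))            # the same numbers, added in a different order
exact = math.fsum(xs)                  # correctly rounded reference sum

print(left_to_right, reordered, exact)
# On a typical IEEE-754 double platform this prints 1.0 0.0 2000.0:
# three different answers for the same data, differing only in evaluation order.
```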
Side-channel attacks (SCA) have been a realistic threat to various cryptographic implementations that do not feature dedicated protection. While many effective countermeasures have been found and applied manually, they are application-specific and labor intensive. In addition, security evaluation tends to be incomplete, with no guarantee that all the vulnerabilities in the target system have been identified and addressed by such manual countermeasures. This SaTC project aims to shift the paradigm of side-channel attack research, and proposes to build an automation framework for information leakage analysis, multi-level countermeasure application, and formal security evaluation against software side-channel attacks.
The proposed framework provides common sound metrics for information leakage, methodologies for automatic countermeasures, and formal and thorough evaluation methods. The approach unifies power analysis and cache-based timing attacks into one framework. It defines new metrics of information leakage and uses them to automatically identify possible leakage of a given cryptosystem at an early stage with no implementation details. The conventional compilation process is extended along the new dimension of optimizing for security, to generate side-channel resilient code and ensure its secure execution at run-time. Side-channel security is guaranteed to be at a certain confidence level with formal methods. The three investigators on the team bring complementary expertise to this challenging interdisciplinary research, to develop the advanced automation framework and the associated software tools, metrics, and methodologies. The outcome significantly benefits security system architects and software developers alike, in their quest to build verifiable SCA security into a broad range of applications they design. The project also builds new synergy among fundamental statistics, formal methods, and practical system security. The automation tools, when introduced in new courses developed by the PIs, will greatly improve students’ hands-on experience. The project also leverages the experiential education model of Northeastern University to engage undergraduates, women, and minority students in independent research projects.
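A deliberately simplified example of the vulnerability class being targeted: the early-exit comparison below counts its loop iterations (a deterministic stand-in for execution time), and that count reveals the length of the matching prefix of a secret. The constant-time alternative from the standard library is shown for contrast. This illustrates the leakage, not the project's analysis framework.

```python
import hmac

def leaky_compare(secret: bytes, guess: bytes):
    """Early-exit comparison: returns (equal?, iterations executed).
    The iteration count, a stand-in for execution time, reveals how long the
    matching prefix is, letting an attacker recover the secret byte by byte."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps

secret = b"hunter2!"
for guess in (b"aaaaaaaa", b"huaaaaaa", b"huntaaaa"):
    print(guess, leaky_compare(secret, guess))
# Iteration counts 1, 3, 5: longer matching prefixes take more work, a timing channel.

# The standard mitigation is a comparison whose work is independent of the data:
print(hmac.compare_digest(secret, b"huntaaaa"))   # constant-time, leaks nothing
```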
Nontechnical Description: Artificial intelligence, especially deep learning, has enabled many breakthroughs in both academia and industry. This project aims to create a generative and versatile design approach based on novel deep learning techniques to realize integrated, multi-functional photonic systems, and to provide proof-of-principle demonstrations in experiments. Compared with traditional approaches using extensive numerical simulations or inverse design algorithms, deep learning can uncover the highly complicated relationship between a photonic structure and its properties from the dataset, and hence substantially accelerate the design of novel photonic devices that simultaneously encode distinct functionalities in response to the designated wavelength, polarization, angle of incidence and other parameters. Such multi-functional photonic systems have important applications in many areas, including optical imaging, holographic display, biomedical sensing, and consumer photonics with high efficiency and fidelity, to benefit the public and the nation. The integrated education plan will considerably enhance outreach activities and educate students in grades 7-12, empowered by the successful experience and partnership previously established by the PIs. Graduate and undergraduate students participating in the project will learn the latest developments in the multidisciplinary fields of photonics, deep learning and advanced manufacturing, and gain real-world knowledge by engaging industrial collaborators in tandem with Northeastern University’s renowned cooperative education program.
Technical Description: Metasurfaces, which are two-dimensional metamaterials consisting of a planar array of subwavelength designer structures, have created a new paradigm to tailor optical properties in a prescribed manner, promising superior integrability, flexibility, performance and reliability to advance photonics technologies. However, so far almost all metasurface designs rely on time-consuming numerical simulations or stochastic searching approaches that are limited in a small parameter space. To fully exploit the versatility of metasurfaces, it is highly desired to establish a general, functionality-driven methodology to efficiently design metasurfaces that encompass distinctly different optical properties and performances within a single system. The objective of the project is to create and demonstrate a high-efficiency, two-level design approach enabled by deep learning, in order to realize integrated, multi-functional meta-systems. Proper deep learning methods, such as Conditional Variational Auto-Encoder and Deep Bidirectional-Convolutional Network, will be investigated, innovatively reformulated and tailored to apply at the single-element level and the large-scale system level in combination with topology optimization and genetic algorithm. Such a generative design approach can directly and automatically identify the optimal structures and configurations out of the full parameter space. The designed multi-functional optical meta-systems will be fabricated and characterized to experimentally confirm their performances. The success of the project will produce transformative photonic architectures to manipulate light on demand.
Critical infrastructure systems are increasingly reliant on one another for their efficient operation. This research will develop a quantitative, predictive theory of network resilience that takes into account the interactions between built infrastructure networks, and the humans and neighborhoods that use them. This framework has the potential to guide city officials, utility operators, and public agencies in developing new strategies for infrastructure management and urban planning. More generally, these efforts will untangle the roles of network structure and network dynamics that enable interdependent systems to withstand, recover from, and adapt to perturbations. This research will be of interest to a variety of other fields, from ecology to cellular biology.
The project will begin by cataloging three built infrastructures and known interdependencies (both physical and functional) into a “network of networks” representation suitable for modeling. A key part of this research lies in also quantifying the interplay between built infrastructure and social systems. As such, the models will incorporate community-level behavioral effects through urban “ecometrics” — survey-based empirical data that capture how citizens and neighborhoods utilize city services and respond during emergencies. This realistic accounting of infrastructure and its interdependencies will be complemented by realistic estimates of future hazards that it may face. The core of the research will use network-based analytical and computational approaches to identify reduced-dimensional representations of the (high-dimensional) dynamical state of interdependent infrastructure. Examining how these resilience metrics change under stress to networks at the component level (e.g. as induced by inundation following a hurricane) will allow identification of weak points in existing interdependent infrastructure. The converse scenario–in which deliberate alterations to a network might improve resilience or hasten recovery of already-failed systems–will also be explored.
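The toy model below (in the spirit of classic interdependent-network cascade models, with all parameters invented for the example) shows the kind of dynamics involved: two networks are coupled one-to-one, an initial failure is applied, and nodes keep failing if their partner fails or they fall out of their own network's giant component.

```python
import random
import networkx as nx

def cascade(net_a, net_b, frac_removed=0.4, seed=0):
    """Toy cascade on two interdependent networks with one-to-one coupling:
    node i in A depends on node i in B and vice versa. A node survives only if
    its partner survives and it still belongs to the giant component of its
    own network. Iterate until no more nodes fail; return the surviving fraction."""
    rng = random.Random(seed)
    n = net_a.number_of_nodes()
    alive = set(net_a) - set(rng.sample(sorted(net_a), int(frac_removed * n)))
    while True:
        def giant(net):
            sub = net.subgraph(alive)
            comps = list(nx.connected_components(sub))
            return max(comps, key=len) if comps else set()
        survivors = giant(net_a) & giant(net_b)   # both copies must survive
        if survivors == alive:
            return len(survivors) / n
        alive = survivors

a = nx.erdos_renyi_graph(1000, 0.006, seed=1)
b = nx.erdos_renyi_graph(1000, 0.006, seed=2)
print(cascade(a, b))   # surviving fraction after the cascade settles
```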
Students will be working on building a library of cache-oblivious data structures and measuring the performance under different workloads. We will first implement serial versions of the algorithms, and then implement the parallel version of several known cache oblivious data structures and algorithms. Read more.
The training plan is to bring in students (ideally in pairs) who are currently sophomores or juniors and have taken a Computer Systems course using C/C++. Students need not have any previous research experience, but generally will have experience using threads (e.g., pthreads) and will have taken an algorithms course.
[1] (2 weeks) Students will first work through understanding the basics of Cache-Oblivious Algorithms and Data Structures from: http://erikdemaine.org/papers/BRICS2002/paper.pdf
[2] (2 weeks) Students will then work through select lectures and exercises on caches from here: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/
[3] (1 week) Students will then learn the basics of profiling (see the small benchmark sketch after this list).
[4] (2 weeks) Next, students will implement a few of the data structures and algorithms.
[5] (4 weeks) Students will then work to find good real-world benchmarks, mining GitHub repositories for benchmarks that suffer from false-sharing-related performance problems.
[6] The remaining time will be spent writing up and polishing the collected results.
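The following numpy sketch is a small warm-up for the profiling in step [3]: it times the same reduction over the same number of values with a contiguous versus a strided memory layout, so the difference is attributable to cache behavior. The project's real measurements will be done on the C/C++ implementations students write, where the effects are easier to control and attribute.

```python
import timeit
import numpy as np

# Two sums over the same number of double-precision values: one over a
# contiguous block, one over a strided view whose elements are 128 bytes apart,
# so every access touches a different cache line. The arithmetic is identical;
# only the memory-access pattern differs.
big = np.random.rand(16_000_000)     # ~128 MB of doubles
contiguous = big[:1_000_000].copy()  # 1,000,000 adjacent doubles
strided = big[::16]                  # 1,000,000 doubles, 16 elements apart

t_contig = timeit.timeit(contiguous.sum, number=50)
t_strided = timeit.timeit(strided.sum, number=50)
print(f"contiguous: {t_contig:.3f}s   strided: {t_strided:.3f}s")
```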
The key research questions we are investigating in the Mon(IoT)r research group are:
Our methodology entails recording and analyzing all network traffic generated by a variety of IoT devices that we have acquired. We not only inspect traffic for PII in plaintext, but attempt to man-in-the-middle SSL connections to understand the contents of encrypted flows. Our analysis allows us to uncover how IoT devices are currently protecting users’ PII, and determine how easy or difficult it is to mount attacks against user privacy.
Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify—effectively making it look as if you are using those apps. As a result, if an Internet service provider (ISP) tries to slow down YouTube, Wehe would see the same behavior. We then send the same app’s Internet traffic, but with the content replaced by randomized bytes, which prevents the ISP from classifying the traffic as belonging to the app. Our hypothesis is that the randomized traffic will not cause an ISP to conduct application-specific differentiation (e.g., throttling or blocking), but the original traffic will. We repeat these tests several times to rule out noise from bad network conditions, and tell you at the end whether your ISP is giving different performance to an app’s network traffic.
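The toy function below illustrates only the final comparison step (the deployed system uses repeated trials and more careful statistics): given per-replay throughput samples for the original and the randomized traffic, it flags a possible case of differentiation. The sample numbers are made up.

```python
from statistics import mean, stdev

def differentiation_suspected(original_kbps, randomized_kbps, threshold=0.2):
    """Very rough check, for illustration only: flag possible traffic
    differentiation if the original replay's average throughput is more than
    `threshold` (20%) below the randomized replay's, by more than the noise."""
    gap = mean(randomized_kbps) - mean(original_kbps)
    noise = stdev(original_kbps) + stdev(randomized_kbps)
    return gap > threshold * mean(randomized_kbps) and gap > noise

# Hypothetical throughput samples (kbps) from repeated replays on one network.
original = [1450, 1500, 1480, 1460, 1510]      # app-identifiable traffic
randomized = [3900, 4100, 3950, 4050, 4000]    # same bytes, randomized payload
print(differentiation_suspected(original, randomized))   # True: looks throttled
```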
Type-safe programming languages report errors when a program applies operations to data of the wrong type—e.g., a list-length operation expects a list, not a number—and they come in two flavors: dynamically typed (or untyped) languages, which catch such type errors at run time, and statically typed languages, which catch type errors at compile time before the program is ever run. Dynamically typed languages are well suited for rapid prototyping of software, while static typing becomes important as software systems grow since it offers improved maintainability, code documentation, early error detection, and support for compilation to faster code. Gradually typed languages bring together these benefits, allowing dynamically typed and statically typed code—and more generally, less precisely and more precisely typed code—to coexist and interoperate, thus allowing programmers to slowly evolve parts of their code base from less precisely typed to more precisely typed. To ensure safe interoperability, gradual languages insert runtime checks when data with a less precise type is cast to a more precise type. Gradual typing has seen high adoption in industry, in languages like TypeScript, Hack, Flow, and C#. Unfortunately, current gradually typed languages fall short in three ways. First, while normal static typing provides reasoning principles that enable safe program transformations and optimizations, naive gradual systems often do not. Second, gradual languages rarely guarantee graduality, a reasoning principle helpful to programmers, which says that making types more precise in a program merely adds in checks and the program otherwise behaves as before. Third, time and space efficiency of the runtime casts inserted by gradual languages remains a concern. This project addresses all three of these issues. The project’s novelties include: (1) a new approach to the design of gradual languages by first codifying the desired reasoning principles for the language using a program logic called Gradual Type Theory (GTT), and from that deriving the behavior of runtime casts; (2) compiling to a non-gradual compiler intermediate representation (IR) in a way that preserves these principles; and (3) the ability to use GTT to reason about the correctness of optimizations and efficient implementation of casts. The project has the potential for significant impact on industrial software development since gradually typed languages provide a migration path from existing dynamically typed codebases to more maintainable statically typed code, and from traditional static types to more precise types, providing a mechanism for increased adoption of advanced type features. The project will also have impact by providing infrastructure for future language designs and investigations into improving the performance of gradual typing.
The project team will apply the GTT approach to investigate gradual typing for polymorphism with data abstraction (parametricity), algebraic effects and handlers, and refinement/dependent types. For each, the team will develop cast calculi and program logics expressing better equational reasoning principles than previous proposals, with certified elaboration to a compiler intermediate language based on Call-By-Push-Value (CBPV) while preserving these properties, and design convenient surface languages that elaborate into them. The GTT program logics will be used for program verification, proving the correctness of program optimizations and refactorings.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
When building large software systems, programmers should be able to use the best language for each part of the system. But when a component written in one language becomes part of a multi-language system, it may interoperate with components that have features that don’t exist in the original language. This affects programmers when they refactor code (i.e., make changes that should result in equivalent behavior). Since programs interact after compilation to a common target, programmers have to understand details of linking and target-level interaction when reasoning about correctly refactoring source components. Unfortunately, there are no software toolchains available today that support single-language reasoning when components are used in a multi-language system. This project will develop principled software toolchains for building multi-language software. The project’s novelties include (1) designing language extensions that allow programmers to specify how they wish to interoperate (or link) with conceptual features absent from their language through a mechanism called linking types, and (2) developing compilers that formally guarantee that any reasoning the programmer does at source level is justified after compilation to the target. The project has the potential for tremendous impact on the software development landscape as it will allow programmers to use a language close to their problem domain and provide them with software toolchains that make it easy to compose components written in different languages into a multi-language software system.
The project will evaluate the idea of linking types by extending ML with linking types for interaction with Rust, a language with first-class control, and a normalizing language, and developing type preserving compilers to a common typed LLVM-like target language. The project will design a rich dependently typed LLVM-like target language that can encapsulate effects from different source languages to support fully abstract compilation from these languages. The project will also investigate reporting of cross-language type errors to aid programmers when composing components written in different languages.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Modern programming languages ranging from Java to Matlab rely on just-in-time compilation techniques to achieve performance competitive with languages such as C or C++. What sets just-in-time compilers apart from batch compilers is that they can observe a program's actions as it executes and inspect its state. Knowledge of the program's state and past behavior allows the compiler to perform speculative optimizations that improve performance. The intellectual merits of this research are to devise techniques for reasoning about the correctness of the transformations performed by just-in-time compilers. The project's broader significance and importance are its implications for industrial practice. The results of this research will be applicable to commercial just-in-time compilers for languages such as JavaScript and R.
This project develops a general model of just-in-time compilation that subsumes deployed systems and allows systematic exploration of the design space of dynamic compilation techniques. The research questions that will be tackled in this work lie along two dimensions: Experimental—explore the design space of dynamic compilation techniques and gain an understanding of trade-offs; Foundational—formalize key ingredients of a dynamic compiler and develop techniques for reasoning about correctness in a modular fashion.
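To make the notion of speculative optimization concrete, here is a toy sketch (purely illustrative, not an actual just-in-time compiler): a specialized fast path is guarded by a check on the types observed so far, and execution falls back to the generic path, i.e., deoptimizes, when the guard fails:

# Toy illustration of speculation with a guard: after observing that the
# arguments have always been ints, a compiler may emit a fast integer path
# guarded by a type check, falling back ("deoptimizing") to the generic path
# whenever the speculation fails.
def add_generic(x, y):
    return x + y          # fully general, slower semantics

def add_speculated(x, y):
    if type(x) is int and type(y) is int:   # guard on the observed types
        return int.__add__(x, y)            # specialized fast path
    return add_generic(x, y)                # deoptimize on guard failure

print(add_speculated(2, 3))        # takes the specialized path
print(add_speculated("a", "b"))    # guard fails; generic path is still correct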
The goal of this project is to provide open-source, interoperable, and extensible statistical software for quantitative mass spectrometry, enabling experimentalists and developers of statistical methods to respond rapidly to changes in the evolving biotechnological landscape.
Led By: Yunsi Fei
Led By: Yongmin Liu
Led By: Ed Boyden
Led By: Scott Weiss
Led By: Amar Dhand
Led By: Alex A. Ahmed
This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity.
As the cost of computing and communication resources has plummeted, applications have become data-centric, with data products growing explosively in both number and size. Although accessing such data with the compute power necessary for its analysis and processing is cheap and readily available via cloud computing (intuitive, utility-style access to vast resource pools), doing so currently requires significant expertise, experience, and time (for customization, configuration, deployment, etc.). This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity. To enable this, this research explores new programming language, runtime, and distributed systems techniques and technologies that integrate the R programming language environment with open source cloud platform-as-a-service (PaaS) in ways that simplify processing massive datasets, sharing datasets across applications and users, and tracking and enforcing data provenance. The PIs’ plans for research, outreach, integrated curricula, and open source release of research artifacts have the potential to make cloud computing more accessible to a much wider range of users, in particular the data analytics community who use the R statistical analysis environment to apply their techniques and algorithms to important problems in areas such as biology, chemistry, physics, political science, and finance, by enabling them to use cloud resources transparently for their analyses and to share their scientific data and results in a way that enables others to reproduce and verify them.
The Applied Machine Learning Group is working with researchers from Harvard Medical School to predict outcomes for multiple sclerosis patients. A focus of the research is how best to interact with physicians to use both human expertise and machine learning methods.
Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate trust-family-friendship-professional-association based networks apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market”, reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks and regulatory agencies.
These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.
The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience.
Although journal and conference articles are recognized as the most formal and enduring forms of scientific communication, oral presentations are central to science because they are the means by which researchers, practitioners, the media, and the public hear about the latest findings, thereby becoming engaged and inspired, and where scientific reputations are made. Yet despite decades of technological advances in computing and communication media, the fundamentals of oral scientific presentations have not advanced since software such as Microsoft’s PowerPoint was introduced in the 1980s. The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience. The PI’s pilot studies have demonstrated that audiences are receptive to this concept, and that the technology is especially effective for individuals who are non-native speakers of English (which may be up to 21% of the population of the United States). Project outcomes will be initially deployed and evaluated in higher education, both as a teaching tool for delivering STEM lectures and as a training tool for students in the sciences to learn how to give more effective oral presentations (which may inspire future generations to engage in careers in the sciences).
This research will be based on a theory of human-agent collaboration, in which the human presenter is monitored using real-time speech and gesture recognition, audience feedback is also monitored, and the agent, presentation media, and human presenter (cued via an intelligent wearable teleprompter) are all dynamically choreographed to maximize audience engagement, communication, and persuasion. The project will make fundamental, theoretical contributions to models of real-time human-agent collaboration and communication. It will explore how humans and agents can work together to communicate effectively with a heterogeneous audience using speech, gesture, and a variety of presentation media, amplifying the abilities of scientist-orators who would otherwise be “flying solo.” The work will advance both artificial intelligence and computational linguistics, by extending dialogue systems to encompass mixed-initiative, multi-party conversations among co-presenters and their audience. It will impact the state of the art in virtual agents, by advancing the dynamic generation of hand gestures, prosody, and proxemics for effective public speaking and turn-taking. And it will also contribute to the field of human-computer interaction, by developing new methods for human presenters to interact with autonomous co-presenter agents and their presentation media, including approaches to cueing human presenters effectively using wearable user interfaces.
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program.
ABSTRACT
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program. Of the 20 students, 5 are in Computer Science, 6 are in Electrical and Computer Engineering, and 9 are in Information Assurance. These students come with different backgrounds that vary from political science and criminal justice to computer science and engineering. The University, with its nationally-recognized Cooperative Education, is well-positioned to attract and educate strong students in cybersecurity.
The SFS program at Northeastern succeeds in recruiting a diverse group of under-represented students to the program, and is committed to sustaining this level of diversity in future recruiting. Northeastern University is also reaching out to the broader community by leading Capture-the-Flag and Collegiate Cyber Defense competitions, and by actively participating in the New England Advanced Cyber Security Center, an organization composed of academia, industry, and government entities.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Sun E. and Kaeli D. “Aggressive Value Prediction on a GPU,” Journal of Parallel Processing, 2012, p. 1-19.
Azmandian F., Dy J. G., Aslam J. A., Kaeli D. “Local Kernel Density Ratio-Based Feature Selection for Outlier Detection,” Journal of Machine Learning Research, v.25, 2012, p. 49-64.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes. The findings will inform decisions of leaders interested in optimizing all forms of collaboration in fields such as open-source software development, academic projects, and business. System designers will be able to identify interpersonal dynamics and develop new features for opinion aggregation and effective collaboration. In addition, the research will inform managers on how best to use crowdsourcing solutions to support innovation and marketing strategies including peer-to-peer marketing to translate activity within online communities into sales.
This research will analyze digital trace data that enable studies of population-level human interaction on an unprecedented scale. Understanding such interaction is crucial for anticipating impacts in our social, economic, and political lives as well as for system design. One site of such interaction is crowdsourcing systems – socio-technical systems through which online communities comprised of diverse and distributed individuals dynamically coordinate work and relationships. Many crowdsourcing systems not only generate creative content but also contain a rich community of collaboration and evaluation in which creators and adopters of creative content interact among themselves and with artifacts through overlapping relationships such as affiliation, communication, affinity, and purchasing. These relationships constitute multidimensional networks and create structures at multiple levels. Empirical studies have yet to examine how multidimensional networks in crowdsourcing enable effective large-scale collaboration. The data derive from two distinctly different sources, thus providing opportunities for comparison across a range of online creation-oriented communities. One is a crowdsourcing platform and ecommerce website for creative garment design, and the other is a platform for participants to create innovative designs based on scrap materials. This project will analyze both online community activity and offline purchasing behavior. The data provide a unique opportunity to understand overlapping structures of social interaction driving peer influence and opinion formation as well as the offline economic consequences of this online activity. This study contributes to the literature by (1) analyzing multidimensional network structures of interpersonal and socio-technical interactions within these socio-technical systems, (2) modeling how success feeds back into value-creation processes and facilitates learning, and (3) developing methods to predict the economic success of creative products generated in these contexts. The application and integration of various computational and statistical approaches will provide significant dividends to the broader scientific research community by contributing to the development of technical resources that can be extended to other forms of data-intensive inquiry. This includes documentation about best practices for integrating methods for classification and prediction; courses to train students to perform large-scale data analysis; and developing new theoretical approaches for understanding the multidimensional foundations of cyber-human systems.
Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images.
Our understanding of brain functions is hindered by the lack of detailed knowledge of synaptic connectivity in the underlying neural network. While synaptic connectivity of small neural circuits can be determined with electron microscopy, studies of connectivity on a larger scale, e.g. whole mouse brain, must be based on light microscopy imaging. It is now possible to fluorescently label subsets of neurons in vivo and image their axonal and dendritic arbors in 3D from multiple brain tissue sections. The overwhelming remaining challenge is neurite tracing, which must be done automatically due to the high-throughput nature of the problem. Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images. Our goal is to revolutionize the existing functionalities of the software, making it possible to: (i) automatically reconstruct axonal and dendritic arbors of sparsely labeled populations of neurons from multiple stacks of images and (ii) automatically track and quantify changes in the structures of presynaptic boutons and dendritic spines imaged over time. We propose to utilize the latest machine learning and image processing techniques to develop multi-stack tracing, feature detection, and computer-guided trace editing capabilities of the software. All tools and datasets created as part of this proposal will be made available to the research community.
Public Health Relevance
At present, accurate methods of analysis of neuron morphology and synaptic connectivity rely on manual or semi-automated tracing tools. Such methods are time consuming, can be prone to errors, and do not scale up to the level of large brain-mapping projects. Thus, it is proposed to develop open-source software for accurate, automated reconstruction of structure and dynamics of large neural circuits.
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user.
ABSTRACT
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user. This “personalization” may often be of value to the user, but it also has the potential to distort search results and manipulate the perceptions and behavior of the user. Given the popularity of personalization across a variety of Web-based services, this research has the potential for extremely broad impact. Being able to quantify the extent to which Web-based services are personalized will lead to greater transparency for users, and the development of tools to identify personalized content will allow users to access information that may be hard to access today.
Personalization is now a ubiquitous feature on many Web-based services. In many cases, personalization provides advantages for users because personalization algorithms are likely to return results that are relevant to the user. At the same time, the increasing levels of personalization in Web search and other systems are leading to growing concerns over the Filter Bubble effect, where users are only given results that the personalization algorithm thinks they want, while other important information remains inaccessible. From a computer science perspective, personalization is simply a tool that is applied to information retrieval and ranking problems. However, sociologists, philosophers, and political scientists argue that personalization can result in inadvertent censorship and “echo chambers.” Similarly, economists warn that unscrupulous companies can leverage personalization to steer users towards higher-priced products, or even implement price discrimination, charging different users different prices for the same item. As the pervasiveness of personalization on the Web grows, it is clear that techniques must be developed to understand and quantify personalization across a variety of Web services.
This research has four primary thrusts: (1) To develop methodologies to measure personalization of mobile content. The increasing popularity of browsing the Web from mobile devices presents new challenges, as these devices have access to sensitive content like the user’s geolocation and contacts. (2) To develop systems and techniques for accurately measuring the prevalence of several personalization trends on a large number of e-commerce sites. Recent anecdotal evidence has shown instances of problematic sales tactics, including price steering and price discrimination. (3) To develop techniques to identify and quantify personalized political content. (4) To measure the extent to which financial and health information is personalized based on location and socio-economic status. All four of these thrusts will develop new research methodologies that may prove effective in other areas of research as well.
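A minimal sketch of the kind of measurement underlying these thrusts (illustrative only; the result lists below are hypothetical stand-ins for real query results, and the actual methodology also controls for noise, e.g., by comparing identical control accounts): issue the same query from a test account and a control account and quantify how much the returned results differ:

# Illustrative sketch: compare the results returned to a test account against
# those returned to a control account for the same query, and quantify the
# difference. Lower overlap suggests more personalization.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

control_results = ["r1", "r2", "r3", "r4", "r5"]   # hypothetical result ids
test_results    = ["r1", "r3", "r6", "r2", "r7"]

overlap = jaccard(control_results, test_results)
print(f"result overlap: {overlap:.2f}")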
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis,” Science, v.343, 2014, p. 1203.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers.
Users today have access to a broad range of free, web-based social services. All of these services operate under a similar model: Users entrust the service provider with their personal information and content, and in return, the service provider makes their service available for free by monetizing the user-provided information and selling the results to third parties (e.g., advertisers). In essence, users pay for these services by providing their data (i.e., giving up their privacy) to the provider.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers. All user data is encrypted and not exposed to any third-parties, users retain control over their information, and users access the service via a web browser as normal.
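A minimal sketch of the underlying principle, assuming the cryptography package is installed (illustrative only, not this project's actual system): user content is encrypted with a key that never leaves the user, so the storage provider only ever holds ciphertext:

# Sketch of the "encrypt before any third party sees it" principle.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # stays with the user, never with the provider
cipher = Fernet(key)

plaintext = b"status update: hello, friends"
stored_blob = cipher.encrypt(plaintext)      # what the cloud provider stores
# The provider sees only ciphertext; the user's browser decrypts locally.
assert cipher.decrypt(stored_blob) == plaintext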
The incredible popularity of today’s web-based services has led to significant concerns over privacy and user control over data. Addressing these concerns requires a re-thinking of the current popular web-based business models, and, unfortunately, existing providers are disincentivized from doing so. The impact of this project will potentially be felt by the millions of users who use today’s popular services, who will be provided with an alternative to the business models of today.
The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time.
Evaluating the performance of information retrieval systems such as search engines is critical to their effective development. Current “gold standard” performance evaluation methodologies generally rely on the use of expert assessors to judge the quality of documents or web pages retrieved by search engines, at great cost in time and expense. The advent of “crowdsourcing,” such as that available through Amazon’s Mechanical Turk service, holds out the promise that these performance evaluations can be performed more rapidly and at far less cost through the use of many (though generally less skilled) “crowd workers”; however, the quality of the resulting performance evaluations generally suffers greatly. The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time. For example, one might ask inexpensive crowd workers what are likely to be “easy” questions while reserving what are likely to be “hard” questions for the expensive experts. While the project focuses on the performance evaluation of search engines as its use case, the techniques developed will be more broadly applicable to many domains where one wishes to efficiently and effectively harness experts and crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will be developed within which one can quantify the uncertainty about a performance evaluation as well as the cost and expected utility of asking any assessor (expert or crowd worker) any question (e.g. a nominal judgment for a document or a preference judgment between two documents) at any time. The goal is then to ask the “right” question of the “right” assessor at any time in order to maximize the expected utility gained per unit cost incurred and then to optimally aggregate such responses in order to efficiently and effectively evaluate performance.
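A toy sketch of the selection rule described above (the utilities and costs are hypothetical placeholders; a real system would estimate them from the probabilistic framework): greedily choose the assessor-question pair with the highest expected utility gained per unit cost:

# Sketch of the "right question to the right assessor" idea.
def pick_next(candidates):
    """candidates: iterable of (assessor, question, expected_utility, cost)."""
    return max(candidates, key=lambda c: c[2] / c[3])

candidates = [
    ("crowd_worker", "easy_doc_judgment", 0.3, 0.05),
    ("crowd_worker", "hard_doc_judgment", 0.2, 0.05),
    ("expert",       "hard_doc_judgment", 0.9, 1.00),
    ("expert",       "easy_doc_judgment", 0.4, 1.00),
]
assessor, question, utility, cost = pick_next(candidates)
print(f"ask {assessor} about {question} (utility/cost = {utility/cost:.1f})")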
This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
Advanced programming languages, based on dependent types, enable program verification alongside program development, thus making them an ideal tool for building fully verified, high assurance software. Recent dependently typed languages that permit reasoning about state and effects—such as Hoare Type Theory (HTT) and Microsoft’s F*—are particularly promising and have been used to verify a range of rich security policies, from state-dependent information flow and access control to conditional declassification and information erasure. But while these languages provide the means to verify security and correctness of high-level source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level target code. Unfortunately, even when compilers for such advanced languages exist, they come with no formal guarantee of correct compilation, let alone any guarantee of secure compilation—i.e., that compiled components will remain as secure as their high-level counterparts when executed within arbitrary low-level contexts. This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
The intellectual merit of this project is the development of a powerful new proof architecture for realistic yet secure compilation of dependently typed languages that relies on contracts to ensure that target-level contexts respect source-level security guarantees and leverages these contracts in a formal model of how source and target code may interoperate. The broader impact is that this research will make it possible to compose high-assurance software components into high-assurance software systems, regardless of whether the components are developed in a high-level programming language or directly in assembly. Compositionality has been a long-standing open problem for certifying systems for high-assurance. Hence, this research has potential for enormous impact on how high-assurance systems are built and certified. The specific goal of the project is to develop a verified multi-pass compiler from Hoare Type Theory to assembly that is type preserving, correct, and secure. The compiler will include passes that perform closure conversion, heap allocation, and code generation. To prove correct compilation of components, not just whole programs, this work will use an approach based on defining a formal semantics of interoperability between source components and target code. To guarantee secure compilation, the project will use (static) contract checking to ensure that compiled code is only run in target contexts that respect source-level security guarantees. To carry out proofs of compiler correctness, the project will develop a logical relations proof method for Hoare Type Theory.
The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source.
Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project’s broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language software, (ii) a target language that extends LLVM—increasingly the target of choice for modern compilers—with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.
The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t’) should behave the same as s interoperating with t’. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.
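One way to state the component-level correctness property sketched above more precisely (the notation here is illustrative and not necessarily the project's exact formulation):

\[
  \mathrm{compile}(s) = t \;\Longrightarrow\; \forall t'.\; t \triangleright t' \;\approx_{\mathrm{ctx}}\; s \triangleright t'
\]

where $\mathrm{compile}(s) = t$ says that source component $s$ compiles to target component $t$, $\triangleright$ denotes linking (with $s \triangleright t'$ interpreted via a combined source/target interoperability semantics), and $\approx_{\mathrm{ctx}}$ is contextual (observational) equivalence.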
This project will support a plugin architecture for transparent checkpoint-restart.
Society’s increasingly complex cyberinfrastructure creates a concern for software robustness and reliability. Yet, this same complex infrastructure is threatening the continued use of fault tolerance. Consider when a single application or hardware device crashes. Today, in order to resume that application from the point where it crashed, one must also consider the complex subsystem to which it belongs. While in the past, many developers would write application-specific code to support fault tolerance for a single application, this strategy is no longer feasible when restarting the many inter-connected applications of a complex subsystem. This project will support a plugin architecture for transparent checkpoint-restart. Transparency implies that the software developer does not need to write any application-specific code. The plugin architecture implies that each software developer writes the necessary plugins only once. Each plugin takes responsibility for resuming any interrupted sessions for just one particular component. At a higher level, the checkpoint-restart system employs an ensemble of autonomous plugins operating on all of the applications of a complex subsystem, without any need for application-specific code.
The plugin architecture is part of a more general approach called process virtualization, in which all subsystems external to a process are virtualized. It will be built on top of the DMTCP checkpoint-restart system. One simple example of process virtualization is virtualization of ids. A plugin maintains a virtualization table and arranges for the application code of the process to see only virtual ids, while the outside world sees the real id. Any system calls and library calls using this real id are extended to translate between real and virtual id. On restart, the real ids are updated with the latest value, and the process memory remains unmodified, since it contains only virtual ids. Other techniques employing process virtualization include shadow device drivers, record-replay logs, and protocol virtualization. Some targets of the research include transparent checkpoint-restart support for the InfiniBand network, for programmable GPUs (including shaders), for networks of virtual machines, for big data systems such as Hadoop, and for mobile computing platforms such as Android.
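A small sketch of the id-virtualization idea (illustrative only; this is not DMTCP's actual plugin code): the application stores only virtual ids, a translation table maps them to the current real ids whenever a system call is issued, and after restart only the table is updated while process memory is left untouched:

# Sketch of id virtualization: the application only ever sees virtual ids,
# while a table maps them to the real ids currently in use.
import os

class PidVirtualizer:
    def __init__(self):
        self._virt_to_real = {}
        self._next_virt = 1000

    def register(self, real_pid):
        virt = self._next_virt
        self._next_virt += 1
        self._virt_to_real[virt] = real_pid
        return virt                              # application stores only this

    def real(self, virt_pid):
        return self._virt_to_real[virt_pid]      # used when issuing system calls

    def update_on_restart(self, virt_pid, new_real_pid):
        self._virt_to_real[virt_pid] = new_real_pid   # memory stays unmodified

v = PidVirtualizer()
vpid = v.register(os.getpid())
print(vpid, "->", v.real(vpid))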
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Kapil Arya and Gene Cooperman. “DMTCP: Bringing Interactive Checkpoint-Restart to Python,” Computational Science & Discovery, v.8, 2015, 16 pages. doi:10.1088/issn.1749-4699
Jiajun Cao, Matthieu Simoni, Gene Cooperman, and Christine Morin. “Checkpointing as a Service in Heterogeneous Cloud Environments,” Proc. of 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15), 2015, p. 61–70. doi:10.1109/CCGrid.2015.160
This project will focus on the development of the REDEX tool, a lightweight domain-specific tool for modeling programming languages useful for software development. Originally developed as an in-house tool for a small group of collaborating researchers, REDEX escaped the laboratory several years ago and acquired a dedicated user community; new users now wish to use it for larger and more complicated programming languages than originally envisioned. Using this framework, a programmer articulates a programming language model directly as a software artifact with just a little more effort than paper-and-pencil models. Next, the user invokes diagnostic tools to test a model’s consistency, explore its properties, and check general claims about it.
This award funds several significant improvements to REDEX: (1) a modular system that allows its users to divide up the work, (2) scalable performance so that researchers can deal with large models, and (3) improvements to its testing and error-detection system. The award also includes support for the education of REDEX’s quickly growing user community, e.g., support for organizing tutorials and workshops.
This project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem.
The “Gradual Typing Across the Spectrum” project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. They routinely use JavaScript for interactive web pages, Ruby on Rails for server-side software, Python for data science, and so on. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem. In turn, the project’s broader significance and importance are the deployment of new technologies to assist the programmer who maintains code in scripting languages, the creation of novel technologies that preserve the advantages of these scripting frameworks, and the development of curricular materials that prepares the next generation of students for working within these frameworks.
A few years ago, the PIs launched programming language research efforts to address this problem. They diagnosed the lack of sound types in scripting languages as one of the major factors. With types in conventional programming languages, programmers concisely communicate design information to future maintenance workers; soundness ensures the types are consistent with the rest of the program. In response, the PIs explored the idea of gradual typing, that is, the creation of a typed sister language (one per scripting language) so that (maintenance) programmers can incrementally equip systems with type annotations. Unfortunately, these efforts have diverged over the years and would benefit from systematic cross-pollination.
With support from this grant, the PIs will systematically explore the spectrum of their gradual typing system with a three-pronged effort. First, they will investigate how to replicate results from one project in another. Second, they will jointly develop an evaluation framework for gradual typing projects with the goal of diagnosing gaps in the efforts and needs for additional research. Third, they will explore the creation of new scripting languages that benefit from the insights of gradual typing research.
This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data.
Speech is a complex and intricately timed task that requires the coordination of numerous muscle groups and physiological systems. While most children acquire speech with relative ease, it is one of the most complex patterned movements accomplished by humans and thus susceptible to impairment. Approximately 2% of Americans have imprecise speech either due to mislearning during development (articulation disorder) or as a result of neuromotor conditions such as stroke, brain injury, Parkinson’s disease, cerebral palsy, etc. An equally sizeable group of Americans have difficulty with English pronunciation because it is their second language. Both of these user groups would benefit from tools that provide explicit feedback on speech production clarity. Traditional speech remediation relies on viewing a trained clinician’s accurate articulation and repeated practice with visual feedback via a mirror. While these interventions are effective for readily viewable speech sounds (visemes such as /b/p/m/), they are largely unsuccessful for sounds produced inside the mouth. The tongue is the primary articulator for these obstructed sounds and its movements are difficult to capture. Thus, clinicians use diagrams and other low-tech means (such as placing edible substances on the palate or physically manipulating the oral articulators) to show clients where to place their tongue. While sophisticated research tools exist for measuring and tracking tongue movements during speech, they are prohibitively expensive, obtrusive, and impractical for clinical and/or home use. The PIs’ goal in this exploratory project, which represents a collaboration across two institutions, is to lay the groundwork for a Lingual-Kinematic and Acoustic sensor technology (LinKa) that is lightweight, low-cost, wireless and easy to deploy both clinically and at home for speech remediation.
PI Ghovanloo’s lab has developed a low-cost, wireless, and wearable magnetic sensing system, known as the Tongue Drive System (TDS). An array of electromagnetic sensors embedded within a headset detects the position of a small magnet that is adhered to the tongue. Clinical trials have demonstrated the feasibility of using the TDS for computer access and wheelchair control by sensing tongue movements in up to 6 discrete locations within the oral cavity. This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data. To this end, the team will extend the TDS to track tongue movements during running speech, which are quick, compacted within a small area of the oral cavity, and often overlap for several phonemes, so the challenge will be to accurately classify movements for different sound classes. To complement this effort, pattern recognition of sensor spatiotemporal dynamics will be embedded into an interactive game to offer a motivating, personalized context for speech motor (re)learning by enabling audiovisual biofeedback, which is critical for speech modification. To benchmark the feasibility of the approach, the system will be evaluated on six individuals with neuromotor speech impairment and six healthy age-matched controls.
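Purely as an illustration of the classification step (the feature vectors, labels, and model below are hypothetical and chosen only for concreteness; the project's actual features and models may differ substantially), a nearest-neighbor classifier mapping sensor-derived feature vectors to broad sound classes might look like this, assuming scikit-learn is available:

# Illustrative sketch: classify toy spatiotemporal feature vectors from the
# magnetic sensor array into broad sound classes.
from sklearn.neighbors import KNeighborsClassifier

X_train = [
    [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],   # tongue-tip sounds (e.g., /t/, /d/)
    [0.8, 0.1, 0.7], [0.9, 0.2, 0.8],   # back-of-tongue sounds (e.g., /k/, /g/)
]
y_train = ["tip", "tip", "back", "back"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(clf.predict([[0.15, 0.85, 0.15]]))   # -> ['tip']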
The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The “Compiler Coaching” (Dialog) project represents an investment in programming language tools and technology. Software engineers use high-level programming languages on a daily basis to produce the apps and applications that everyone uses and that control everybody’s lives. Once a programming language translator accepts a program as grammatically correct, it creates impenetrable computer codes without informing the programmer how well (fast or slow, small or large, energy hogging or efficient) these codes will work. Indeed, modern programming languages employ increasingly sophisticated translation techniques and have become obscure black boxes to the working engineer. The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The PIs will explore the Dialog idea in two optimizing compiler settings, one on the conventional side and one on the modern one: for the Racket language, a teaching and research vehicle that they can modify as needed to create the desired channel, and the JavaScript programming language, the standardized tool for existing Web applications. The intellectual merits concern the fundamental principles of creating such communication channels and frameworks for gathering empirical evidence on how these channels benefit the working software engineer. These results should enable the developers of any programming language to implement similar channels of communication to help their clients. The broader impacts are twofold. On one hand, the project is likely to positively impact the lives of working software engineers as industrial programming language creators adapt the Dialog idea. On the other hand, the project will contribute to a two-decades old, open-source programming language project with a large and longstanding history of educational outreach at multiple levels. The project has influenced hundreds of thousands of high school students in the past and is likely to do so in the future.
While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive. This project aims to provide algorithms and tools for answering these questions.
Software, including common examples such as commercial applications or embedded device firmware, is often delivered as closed-source binaries. While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive.
This project aims to provide algorithms and tools for answering these questions. Leveraging prior work on emulator-based dynamic analyses, we propose techniques for scaling this high-fidelity analysis to capture and extract whole-system behavior in the context of embedded device firmware and closed-source applications. Using a combination of dynamic execution traces collected from this analysis platform and binary code analysis techniques, we propose techniques for automated structural analysis of binary program artifacts, decomposing system and user-level programs into logical modules through inference of high-level semantic behavior. This decomposition provides as output an automatically learned description of the interfaces and information flows between each module at a sub-program granularity. Specific activities include: (a) developing software-guided whole-system emulator for supporting sophisticated dynamic analyses for real embedded systems; (b) developing advanced, automated techniques for structurally decomposing closed-source software into its constituent modules; (c) developing automated techniques for producing high-level summaries of whole system executions and software components; and (d) developing techniques for automating the reverse engineering and fuzz testing of encrypted network protocols. The research proposed herein will have a significant impact outside of the security research community. We will incorporate the research findings of our program into our undergraduate and graduate teaching curricula, as well as in extracurricular educational efforts such as Capture-the-Flag that have broad outreach in the greater Boston and Atlanta metropolitan areas.
The close ties to industry that the collective PIs possess will facilitate transitioning the research into practical defensive tools that can be deployed into real-world systems and networks.
This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning.
The control of epidemics, broadly defined to range from human diseases such as influenza and smallpox to malware in communication networks, relies crucially on interventions such as vaccinations and anti-virals (in human diseases) or software patches (for malware). These interventions are almost always voluntary directives from public agencies; however, people do not always adhere to such recommendations, and make individual decisions based on their specific “self interest”. Additionally, people alter their contacts dynamically, and these behavioral changes have a huge impact on the dynamics and the effectiveness of these interventions, so that “good” intervention strategies might, in fact, be ineffective, depending upon the individual response.
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning. Models will be proposed to capture the complexity of static and temporal interactions and patterns of information exchange, including the possibility of failed interventions and the potential for moral hazard. The project will also study specific policies posed by public agencies and network security providers for controlling the spread of epidemics and malware, and will develop resource constrained mechanisms to implement them in this framework.
This project will integrate approaches from Computer Science, Economics, Mathematics, and Epidemiology to give intellectual unity to the study and design of public health policies, and has the potential for strong dissertation work in all these areas. Education and outreach are an important aspect of the project and include curriculum development at both the graduate and undergraduate levels. A multi-disciplinary workshop is also planned as part of the project.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography.
Computational inefficiency is a common experience: the computer cannot complete a certain task due to lack of resources such as time, memory, or bandwidth. Computational complexity theory classifies — or aims to classify — computational tasks according to their inherent inefficiency. Since tasks requiring excessive resources must be avoided, complexity theory is often indispensable in the design of a computer system. Inefficiency can also be harnessed to our advantage. Indeed, most modern cryptography and electronic commerce rely on the (presumed) inefficiency of certain computational tasks.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography. Building on the principal investigator’s (PI’s) previous works, the main proposed directions are:
This research is closely integrated with a plan to achieve broad impact through education. The PI is reshaping the theory curriculum at Northeastern on multiple levels. At the undergraduate level, the PI is writing, and using in his classes, a set of lecture notes aimed at students lacking mathematical maturity. At the Ph.D. level, the PI is incorporating current research topics, including some of the above, into core classes. Finally, the PI will continue to do research working closely with students at all levels.
This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases.
The significant advances realized in recent years in the study of complex networks are severely limited by an almost exclusive focus on the behavior of single networks. However, most networks in the real world are not isolated but are coupled and hence depend upon other networks, which in turn depend upon other networks. Real networks communicate with each other and may exchange information, or, more importantly, may rely upon one another for their proper functioning. A simple but real example is a power station network that depends on a computer network, and the computer network depends on the power network. Our social networks depend on technical networks, which, in turn, are supported by organizational networks. Surprisingly, analyzing complex systems as coupled interdependent networks alters the most basic assumptions that network theory has relied on for single networks. A multidisciplinary, data driven research project will: 1) Study the microscopic processes that rule the dynamics of interdependent networks, with a particular focus on the social component; 2) Define new mathematical models/foundational theories for the analysis of the robustness/resilience and contagion/diffusive dynamics of interdependent networks. This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases. These databases will be made publicly available, except for special cases where it is illegal to do so.
This research has important implications for understanding the social and technical systems that make up a modern society. A recent US Congressional scientific report concludes that “No currently available modeling and simulation tools exist that can adequately address the consequences of disruptions and failures occurring simultaneously in different critical infrastructures that are dynamically inter-dependent.” Understanding the interdependence of networks and its effect on system robustness and on structural and functional behavior is crucial for properly modeling many real-world systems and applications, from disaster preparedness, to building effective organizations, to comprehending the complexity of the macro economy. In addition to these intellectual objectives, the research project includes the development of an extensive outreach program to the public, especially K-12 students.
This research targets the design and evaluation of protocols for secure, privacy-preserving data analysis in an untrusted cloud.
With these protocols, users can store and query data in the cloud while preserving the privacy and integrity of outsourced data and queries. The PIs specifically address a real-world cloud framework: Google’s prominent MapReduce paradigm.
Traditional solutions for single-server setups and related work on, e.g., fully homomorphic encryption are computationally too heavy and uneconomical, offsetting the advantages of the cloud. The PIs’ rationale is to design new protocols tailored to the specifics of the MapReduce computing paradigm. The PIs’ methodology is twofold. First, the PIs design new protocols that allow the cloud user to specify data analysis queries for typical operations such as searching, pattern matching, or counting. For this, the PIs extend privacy-preserving techniques such as private information retrieval and order-preserving encryption. Second, the PIs design protocols guaranteeing the genuineness of data retrieved from the cloud. Using cryptographic accumulators, users can verify that data has not been tampered with. Besides the designs, the PIs also implement a prototype that is usable in a realistic setting with MapReduce.
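The PIs’ actual constructions are not reproduced here, but the general idea behind accumulator-based integrity checking can be illustrated with a small, self-contained Python sketch: the client keeps only a single digest (here, the root of a Merkle-style hash tree over its records) and can later verify that any record returned by the untrusted store is genuine. All names, record contents, and parameters below are illustrative and are not part of the PIs’ protocols.

import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a Merkle-style hash tree bottom-up; returns the list of levels (leaves first)."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Collect the sibling hashes needed to re-derive the root for one leaf."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path

def verify(leaf, path, root):
    """Client-side check: recompute the root from the returned record and its proof."""
    acc = _h(leaf)
    for sibling, leaf_is_left in path:
        acc = _h(acc + sibling) if leaf_is_left else _h(sibling + acc)
    return acc == root

records = [b"row-%d" % i for i in range(6)]    # outsourced records
levels = build_tree(records)
root = levels[-1][0]                           # the only thing the client must store
assert verify(records[3], proof(levels, 3), root)
assert not verify(b"tampered", proof(levels, 3), root)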
The outcome of this project enables privacy-preserving operations and secure data storage in a widely used cloud computing framework, thus removing one major adoption obstacle and making cloud computing available to a larger community.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries.
Wi-Fi has emerged as the technology of choice for Internet access. Thus, virtually every smartphone or tablet is now equipped with a Wi-Fi card. Concurrently, and as a means to maximize spectral efficiency, Wi-Fi radios are becoming increasingly complex and sensitive to wireless channel conditions. The prevalence of Wi-Fi networks, along with their adaptive behaviors, makes them an ideal target for denial of service attacks at a large, infrastructure level.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries. The research blends theory with experimentation and prototyping, and spans a range of disciplines including protocol design and analysis, coding and modulation, on-line algorithms, queuing theory, and emergent behaviors.
The anticipated benefits of the project include: (1) a deep understanding of threats facing Wi-Fi along several dimensions, via experiments and analysis; (2) a set of mitigation techniques and algorithms to strengthen existing Wi-Fi networks and emerging standards; (3) implementation into open-source software that can be deployed on wireless network cards and access points; (4) security training of the next-generation of scientists and engineers involved in radio design and deployment.
The objective of this research is to develop a comprehensive theoretical and experimental cyber-physical framework to enable intelligent human-environment interaction capabilities by a synergistic combination of computer vision and robotics.
Specifically, the approach is applied to examine individualized remote rehabilitation with an intelligent, articulated, and adjustable lower limb orthotic brace to manage Knee Osteoarthritis, where a visual-sensing/dynamical-systems perspective is adopted to: (1) track and record patient/device interactions with internet-enabled commercial-off-the-shelf computer-vision-devices; (2) abstract the interactions into parametric and composable low-dimensional manifold representations; (3) link to quantitative biomechanical assessment of the individual patients; (4) facilitate development of individualized user models and exercise regimen; and (5) aid the progressive parametric refinement of exercises and adjustment of bracing devices. This research and its results will enable us to understand underlying human neuro-musculo-skeletal and locomotion principles by merging notions of quantitative data acquisition, and lower-order modeling coupled with individualized feedback. Beyond efficient representation, the quantitative visual models offer the potential to capture fundamental underlying physical, physiological, and behavioral mechanisms grounded on biomechanical assessments, and thereby afford insights into the generative hypotheses of human actions.
Knee osteoarthritis is an important public health issue, because of high costs associated with treatments. The ability to leverage a quantitative paradigm, both in terms of diagnosis and prescription, to improve mobility and reduce pain in patients would be a significant benefit. Moreover, the home-based rehabilitation setting offers not only immense flexibility, but also access to a significantly greater portion of the patient population. The project is also integrated with extensive educational and outreach activities to serve a variety of communities.
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases.
This is a proposal for a multi-institutional MIDAS Center of Excellence called the Center for Statistics and Quantitative Infectious Diseases (CSQUID). The mission of the Center is to provide national and international leadership. The lead institution is the Fred Hutchinson Cancer Research Center (FHCRC). Other participating institutions are the University of Florida, Northeastern University, University of Michigan, Emory University, University of Washington (UW), University of Georgia, and Duke University. The proposal includes four synergistic research projects (RPs) that will develop cutting-edge methodologies applied to solving epidemiologic, immunologic, and evolutionary problems important for public health policy in influenza, dengue, polio, TB, and other infectious agents: RP1: Modeling, Spatial, Statistics (Lead: I. Longini, U. Florida); RP2: Dynamic Inference (Lead: P. Rohani, U. Michigan); RP3: Understanding transmission with integrated genetic and epidemiologic inference (Co-Leads: E. Kenah, U. Florida, and T. Bedford, FHCRC); RP4: Dynamics and Evolution of Influenza Strain Variation (Lead: R. Antia, Emory U.). The Software Development and Core Facilities (Lead: A. Vespignani, Northeastern U.) will provide leadership in software development, access, and communication. The Policy Studies group (Lead: J. Koopman, U. Michigan) will provide leadership in communicating our research results to policy makers, as well as in conducting novel research into policy making. The Training, Outreach, and Diversity Plans include ongoing training of 9 postdoctoral fellows and 5.25 predoctoral research assistants each year, support for participants in the Summer Institute for Statistics and Modeling in Infectious Diseases (UW), and ongoing Research Experience for Undergraduates programs at two institutions, among others. All participating institutions and the Center are committed to increasing diversity at all levels. Center-wide activities include Career Development Awards for junior faculty, annual workshops and symposia, outside speakers, and participation in the MIDAS Network meetings. Scientific leadership will be provided by the Center Director, a Leadership Committee, and an external Scientific Advisory Board, as well as the MIDAS Steering Committee.
Public Health Relevance
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases. The research is motivated by multiscale problems such as immunologic, epidemiologic, and environmental drivers of the spread of infectious diseases with the goal of understanding and communicating the implications for public health policy.
Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.
The over-arching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, a better understanding of the molecular underpinnings and the role of environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists of three Specific Aims, each consisting of three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.
In Specific Aim 1, we will use the Asthma BRIDGE datasets to define the interactome sub-module perturbed in poor asthma control and the regulatory variants that modulate this asthma-control module, and to develop a predictive model of asthma control.
In Specific Aim 2, we will study the effects of exposure to air pollution and environmental tobacco smoke on modulating the asthma-control networks, testing for environment-dependent alterations in network dynamics.
In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS and those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies in bronchial epithelial and Jurkat T-cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies with co-treatment using dexamethasone will be performed to validate findings from Aim 3. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control, and define targets for biomarker development and therapeutic targeting.
Public Health Relevance
Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using this data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.
Crowdsourcing measurement of mobile Internet performance, now the engine for Mobiperf.
Mobilyzer is a collaboration between Morley Mao’s group at the University of Michigan and David Choffnes’ group at Northeastern University.
Mobilyzer provides the following components:
Measurements, analysis, and system designs to reveal how the Internet’s most commonly used trust systems operate (and malfunction) in practice, and how we can make them more secure.
Research on the SSL/TLS Ecosystem
Every day, we use Secure Sockets Layer (SSL) and Transport Layer Security (TLS) to secure our Internet transactions such as banking, e-mail, and e-commerce. Along with a public key infrastructure (PKI), they allow our computers to automatically verify that our sensitive information (e.g., credit card numbers and passwords) is hidden from eavesdroppers and sent to trustworthy servers.
In mid-April 2014, a software vulnerability called Heartbleed was announced. It allows malicious users to capture information that would let them masquerade as trusted servers and potentially steal sensitive information from unsuspecting users. The PKI provides multiple ways to prevent such an attack from occurring, and we should expect Web site operators to use these countermeasures.
In this study, we found that the overwhelming majority of sites (more than 73%) did not do so, meaning visitors to their sites are vulnerable to attacks such as identity theft. Further, the majority of sites that attempted to address the problem (60%) did so in a way that leaves customers vulnerable.
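As a rough illustration of the kind of measurement behind these findings, the short Python sketch below checks one simple signal for a single site using only the standard library: whether the certificate currently served was issued after the Heartbleed disclosure. This is a toy check, not the study’s methodology, and the hostname is a placeholder.

import socket
import ssl
from datetime import datetime, timezone

HEARTBLEED_DISCLOSURE = datetime(2014, 4, 7, tzinfo=timezone.utc)   # public disclosure date

def cert_issued_after_heartbleed(host: str, port: int = 443) -> bool:
    """Fetch the server certificate and report whether its validity period
    starts after the Heartbleed disclosure (a rough proxy for reissuance)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    issued = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notBefore"]),
                                    tz=timezone.utc)
    return issued > HEARTBLEED_DISCLOSURE

print(cert_issued_after_heartbleed("example.com"))   # placeholder hostname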
Practical and powerful privacy for network communication (led by Stevens Le Blond at MPI).
This project entails several threads that cover Internet measurement, modeling, and experimentation.
Understanding the geographic nature of Internet paths and their implications for performance, privacy and security.
This study sheds light on this issue by measuring how and when Internet traffic traverses national boundaries. To do this, we ask you to run our browser applet, which visits various popular websites, measures the paths taken, and identifies their locations. By running our tool, you will help us understand if and how Internet paths traverse national boundaries, even when two endpoints are in the same country. We will also show you these paths, helping you to understand where your Internet traffic goes.
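A stripped-down version of this kind of path measurement, assuming a system traceroute binary and some IP geolocation backend, might look like the Python sketch below; the geolocate helper is a placeholder for whatever database the actual applet uses.

import re
import subprocess

IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def hop_ips(destination: str):
    """Run the system traceroute (numeric output) and return the hop IPs in order."""
    out = subprocess.run(["traceroute", "-n", destination],
                         capture_output=True, text=True, check=True).stdout
    ips = []
    for line in out.splitlines()[1:]:      # skip the header line
        match = IPV4.search(line)
        if match:
            ips.append(match.group(1))
    return ips

def geolocate(ip: str) -> str:
    """Placeholder: map an IP address to a country code using whatever
    geolocation database is available; not implemented here."""
    raise NotImplementedError

def countries_on_path(destination: str):
    return [geolocate(ip) for ip in hop_ips(destination)]

# countries_on_path("example.org")   # requires a geolocation backend to run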
This project will develop methodologies and tools for conducting algorithm audits. An algorithm audit uses controlled experiments to examine an algorithmic system, such as an online service or big data information archive, and ascertain (1) how it functions, and (2) whether it may cause harm.
Examples of documented harms by algorithms include discrimination, racism, and unfair trade practices. Although there is rising awareness of the potential for algorithmic systems to cause harm, actually detecting this harm in practice remains a key challenge. Given that most algorithms of concern are proprietary and non-transparent, there is a clear need for methods to conduct black-box analyses of these systems. Numerous regulators and governments have expressed concerns about algorithms, as well as a desire to increase transparency and accountability in this area.
This research will develop methodologies to audit algorithms in three domains that impact many people: online markets, hiring websites, and financial services. Auditing algorithms in these three domains will require solving fundamental methodological challenges, such as how to analyze systems with large, unknown feature sets, and how to estimate feature values without ground-truth data. To address these broad challenges, the research will draw on insights from prior experience auditing personalization algorithms. Additionally, each domain also brings unique challenges that will be addressed individually. For example, novel auditing tools will be constructed that leverage extensive online and offline histories. These new tools will allow examination of systems that were previously inaccessible to researchers, including financial services companies. Methodologies, open-source code, and datasets will be made available to other academic researchers and regulators. This project includes two integrated educational objectives: (1) to create a new computer science course on big data ethics, teaching how to identify and mitigate harmful side-effects of big data technologies, and (2) to produce web-based versions of the auditing tools that are accessible and informative to the general public, increasing transparency around specific, prominent algorithmic systems and promoting general education about the proliferation and impact of algorithmic systems.
This project aims to investigate the development of procedural narrative systems using crowd-sourcing methods.
This project will create a framework for simulation-based training that supports a learner’s exploration and replay and exercises theory of mind skills, in order to deliver the full promise of social skills training. The term Theory of Mind (ToM) refers to the human capacity to use beliefs about the mental processes and states of others. To train social skills, the field has seen rapid growth in narrative-based simulations that allow learners to role-play social interactions. However, the design of these systems often constrains the learner’s ability to explore different behaviors and their consequences. Attempts to support more generative experiences face a combinatorial explosion of alternative paths through the interaction, presenting an overwhelming challenge for developers to create content for all the alternatives. Instead, training systems are often designed around exercising specific behaviors in specific situations, hampering the learning of more general skills in using ToM. This research seeks to solve this problem through three contributions: (1) a new model for conceptualizing narrative and role-play experiences that addresses generativity, (2) new methods that facilitate content creation for those generative experiences, and (3) an approach that embeds theory of mind training in the experience to allow for better learning outcomes. This research is applicable to complex social skill training across a range of situations: in schools, communities, the military, police, homeland security, and ethnic conflict.
The research begins with a paradigm shift that re-conceptualizes social skills simulation as a learner rehearsing a role instead of performing a role. This shift will exploit Stanislavsky’s Active Analysis (AA), a performance rehearsal technique that explicitly exercises Theory of Mind skills. Further, AA’s decomposition into short rehearsal scenes can break the combinatorial explosion over long narrative arcs that exacerbates content creation for social training systems. The research will then explore using behavior fitting and machine learning techniques on crowd-sourced data as a way to semi-automate the development of multi-agent simulations for social training. The research will assess quantitatively and qualitatively the ability of this approach to (a) provide experiences that support exploration and foster ToM use and (b) support acquiring crowd-sourced data that can be used to craft those experiences using automatic methods.
This project is unique in combining cutting-edge work in modeling theory of mind, interactive environments, performance rehearsal, and crowd sourcing. The multidisciplinary collaboration will enable development of a methodology for creating interactive experiences that pushes the boundaries of the current state of the art in social skill training. Reliance on crowd sourcing provides an additional benefit of being able to elicit culturally specific behavior patterns by selecting the relevant crowd, allowing for both culture-specific and cross-cultural training content.
Evidence Based Medicine (EBM) aims to systematically use the best available evidence to inform medical decision making. This paradigm has revolutionized clinical practice over the past 30 years. The most important tool for EBM is the systematic review, which provides a rigorous, comprehensive and transparent synthesis of all current evidence concerning a specific clinical question. These syntheses enable decision makers to consider the entirety of the relevant published evidence.
Systematic reviews now inform everything from national health policy to bedside care. But producing these reviews requires researchers to identify the entirety of the relevant literature and then extract from this the information to be synthesized; a hugely laborious and expensive exercise. Moreover, the unprecedented growth of the biomedical literature has increased the burden on those trying to make sense of the published evidence base. Concurrently, more systematic reviews are being conducted every year to synthesize the expanding evidence base; tens of millions of dollars are spent annually conducting these reviews.
RobotReviewer aims to mitigate this issue by (semi-) automating evidence synthesis using machine learning and natural language processing.
View the RobotReviewer page to read more.
Software development is facing a paradigm shift towards ubiquitous concurrent programming, giving rise to software that is among the most complex technical artifacts ever created by humans. Concurrent programming presents several risks and dangers for programmers who are overwhelmed by puzzling and irreproducible concurrent program behavior, and by new types of bugs that elude traditional quality assurance techniques. If this situation is not addressed, we are drifting into an era of widespread unreliable software, with consequences ranging from collapsed programmer productivity, to catastrophic failures in mission-critical systems.
This project will take steps against a concurrent software crisis, by producing verification technology that assists non-specialist programmers in detecting concurrency errors, or demonstrating their absence. The proposed technology will confront the concurrency explosion problem that verification methods often suffer from. The project’s goal is a framework under which the analysis of programs with unbounded concurrency resources (such as threads of execution) can be soundly reduced to an analysis under a small constant resource bound, making the use of state space explorers practical. As a result, the project will largely eliminate the impact of unspecified computational resources as the major cause of complexity in analyzing concurrent programs. By developing tools for detecting otherwise undetectable misbehavior and vulnerabilities in concurrent programs, the project will contribute its part to averting a looming software quality crisis.
The research will enable the auditing and control of personally identifiable information (PII) leaks, addressing the key challenges of how to identify and control PII leaks when users’ PII is not known a priori, nor is the set of apps or devices that leak this information. First, to enable auditing through improved transparency, we are investigating how to use machine learning to reliably identify PII from network flows, and to identify algorithms that incorporate user feedback to adapt to the changing landscape of privacy leaks. Second, we are building tools that allow users to control how their information is (or is not) shared with other parties. Third, we are investigating the extent to which our approach extends to privacy leaks from IoT devices. Besides adapting our system to the unique formats of leaks across a variety of IoT devices, our work investigates PII exposed indirectly through time-series data produced by IoT-generated monitoring.
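To give a flavor of the first step, the toy sketch below trains a character-level classifier (with scikit-learn) to flag key/value fragments from outgoing flows that look like PII. The training examples, features, and model choice are illustrative and far simpler than the system described above.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: (key=value fragment seen in an outgoing flow, is_pii label).
samples = [
    ("email=alice@example.com", 1),
    ("deviceid=a1b2c3d4e5f6", 1),
    ("lat=42.34&lon=-71.09", 1),
    ("screen=1080x1920", 0),
    ("locale=en_US", 0),
    ("api_version=7", 0),
]
texts, labels = zip(*samples)

# Character n-grams capture structure (an '@' sign, long hex runs, coordinate
# patterns) without requiring field names to be known ahead of time.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

print(model.predict(["uid=9f8e7d6c5b4a", "timezone=UTC"]))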
The purpose of this project is to develop a conversational agent system that counsels terminally ill patients in order to alleviate their suffering and improve their quality of life.
Although many interventions have now been developed to address palliative care for specific chronic diseases, little has been done to address the overall quality of life for older adults with serious illness, spanning not only the functional aspects of symptom and medication management, but the affective aspects of suffering. In this project, we are developing a relational agent to counsel patients at home about medication adherence, stress management, advanced care planning, and spiritual support, and to provide referrals to palliative care services when needed.
When deployed on smartphones, virtual agents have the potential to deliver life-saving advice regarding emergency medical conditions, as well as provide a convenient channel for health education to improve the safety and efficacy of pharmacotherapy.
We are developing a smartphone-based virtual agent that provides counseling to patients with Atrial Fibrillation. Atrial Fibrillation is a highly prevalent heart rhythm disorder and is known to significantly increase the risk of stroke, heart failure and death. In this project, a virtual agent is deployed in conjunction with a smartphone-based heart rhythm monitor that lets patients obtain real-time diagnostic information on the status of their atrial fibrillation and determine whether immediate action may be needed.
This project is a collaboration with University of Pittsburgh Medical Center.
The last decade has seen an enormous increase in our ability to gather and manage large amounts of data; business, healthcare, education, economy, science, and almost every aspect of society are accumulating data at unprecedented levels. The basic premise is that by having more data, even if uncertain and of lower quality, we are also able to make better-informed decisions. To make any decisions, we need to perform “inference” over the data, i.e. to either draw new conclusions, or to find support for existing hypotheses, thus allowing us to favor one course of action over another. However, general reasoning under uncertainty is highly intractable, and many state-of-the-art systems today perform approximate inference by reverting to sampling. Thus for many modern applications (such as information extraction, knowledge aggregation, question-answering systems, computer vision, and machine intelligence), inference is a key bottleneck, and new methods for tractable approximate inference are needed.
This project addresses the challenge of scaling inference by generalizing two highly scalable approximate inference methods and complementing them with scalable methods for parameter learning that are “approximation-aware.” Thus, instead of treating the (i) learning and the (ii) inference steps separately, this project uses the approximation methods developed for inference also for learning the model. The research hypothesis is that this approach increases the overall end-to-end prediction accuracy while simultaneously increasing scalability. Concretely, the project develops the theory and a set of scalable algorithms and optimization methods for at least the following four sub-problems: (1) approximating general probabilistic conjunctive queries with standard relational databases; (2) learning the probabilities in uncertain databases based on feedback on rankings of output tuples from general queries; (3) approximating the exact probabilistic inference in undirected graphical models with linearized update equations; and (4) complementing the latter with a robust framework for learning linearized potentials from partially labeled data.
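For sub-problem (1), the quantity being approximated can be made concrete with a tiny example. In a tuple-independent probabilistic database, each tuple is present independently with a given probability, and the probability of a Boolean conjunctive query is defined over all possible worlds. The Python sketch below computes that probability exactly by brute-force enumeration, which is exactly the computation that becomes intractable at scale and motivates approximate inference; the relations and probabilities are made up.

from itertools import product

# Tuple-independent probabilistic database: each tuple exists independently.
R = {("alice", "p1"): 0.9, ("bob", "p1"): 0.5}   # R(user, product)
S = {("p1",): 0.7, ("p2",): 0.4}                 # S(product)

def query_holds(r_world, s_world):
    """Boolean conjunctive query  q() :- R(u, p), S(p)."""
    return any((p,) in s_world for (_, p) in r_world)

def query_probability():
    """Exact P(q) by enumerating all possible worlds (exponential in the number of tuples)."""
    r_tuples, s_tuples = list(R), list(S)
    total = 0.0
    for r_bits in product([0, 1], repeat=len(r_tuples)):
        for s_bits in product([0, 1], repeat=len(s_tuples)):
            prob = 1.0
            for t, b in zip(r_tuples, r_bits):
                prob *= R[t] if b else 1 - R[t]
            for t, b in zip(s_tuples, s_bits):
                prob *= S[t] if b else 1 - S[t]
            r_world = {t for t, b in zip(r_tuples, r_bits) if b}
            s_world = {t for t, b in zip(s_tuples, s_bits) if b}
            if query_holds(r_world, s_world):
                total += prob
    return total

print(round(query_probability(), 6))   # 0.665 = 0.7 * (1 - 0.1 * 0.5)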
Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules, and metabolites in complex biological mixtures. The technology rapidly evolves and generates datasets of increasingly large complexity and size. This rapid evolution must be matched by an equally fast evolution of statistical methods and tools developed for analysis of these data. Ideally, new statistical methods should leverage the rich resources available from over 12,000 packages implemented in the R programming language and its Bioconductor project. However, technological limitations now hinder their adoption for mass spectrometric research. In response, the project ROCKET builds an enabling technology for working with large mass spectrometric datasets in R, and for rapidly developing new algorithms, while benefiting from advancements in other areas of science. It also offers an opportunity for the recruitment and retention of Native American students to work with R-based technology and research, and helps prepare them for careers in STEM.
Instead of implementing yet another data processing pipeline, ROCKET builds an enabling technology for extending the scalability of R, and streamlining manipulations of large files in complex formats. First, to address the diversity of the mass spectrometric community, ROCKET supports scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory), as well as scaling up (i.e., executing a workflow on a cloud or on a multiprocessor). Second, ROCKET generates an efficient mixture of R and target code which is compiled in the background for the particular deployment platform. By ensuring compatibility with mass spectrometry-specific open data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general analytical methods. Therefore, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists, and create a blueprint for a new paradigm for R-based computing with large datasets. The outcomes of the project will be documented and made publicly available at https://olga-vitek-lab.khoury.northeastern.edu/.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Northeastern University proposes to organize a Summer School ‘Big Data and Statistics for Bench Scientists.’ The Summer School will train life scientists and computational scientists in designing and analyzing large-scale experiments relying on proteomics, metabolomics, and other high-throughput biomolecular assays. The training will enhance the effectiveness and reproducibility of biomedical research, such as discovery of diagnostic biomarkers for early diagnosis of disease, or prognostic biomarkers for predicting therapy response.
Northeastern University requests funds for a Summer School, entitled Big Data and Statistics for Bench Scientists. The target audience for the School is graduate and post-graduate life scientists, who work primarily in wet lab, and who generate large datasets. Unlike other educational efforts that emphasize genomic applications, this School targets scientists working with other experimental technologies. Mass spectrometry-based proteomics and metabolomics are our main focus; however, the School is also appropriate for scientists working with other assays, e.g., nuclear magnetic resonance spectroscopy (NMR), protein arrays, etc. This large community has been traditionally under-served by educational efforts in computation and statistics. This proposal aims to fill this void. The Summer School is motivated by the feedback from smaller short courses previously co-organized or co-instructed by the PI, and will cover theoretical and practical aspects of design and analysis of large-scale experimental datasets. The Summer School will have a modular format, with eight 20-hour modules scheduled in 2 parallel tracks during 2 consecutive weeks. Each module can be taken independently. The planned modules are (1) Processing raw mass spectrometric data from proteomic experiments using Skyline, (2) Beginner’s R, (3) Processing raw mass spectrometric data from metabolomic experiments using OpenMS, (4) Intermediate R, (5) Beginner’s guide to statistical experimental design and group comparison, (6) Specialized statistical methods for detecting differentially abundant proteins and metabolites, (7) Statistical methods for discovery of biomarkers of disease, and (8) Introduction to systems biology and data integration. Each module will introduce the necessary statistical and computational methodology, and contain extensive practical hands-on sessions. Each module will be organized by instructors with extensive interdisciplinary teaching experience, and supported by several teaching assistants. We anticipate the participation of 104 scientists, each taking on average 2 modules. Funding is requested for three yearly offerings of the School, and includes funds to provide US participants with 62 travel fellowships per year, and 156 registration fee waivers per module. All the course materials, including videos of the lectures and of the practical sessions, will be publicly available free of charge.
Different individuals experience the same events in vastly different ways, owing to their unique histories and psychological dispositions. For someone with social fears and anxieties, the mere thought of leaving the home can induce a feeling of panic. Conversely, an experienced mountaineer may feel quite comfortable balancing on the edge of a cliff. This variation of perspectives is captured by the term subjective experience. Despite its centrality and ubiquity in human cognition, it remains unclear how to model the neural bases of subjective experience. The proposed work will develop new techniques for statistical modeling of individual variation, and apply these techniques to a neuroimaging study of the subjective experience of fear. Together, these two lines of research will yield fundamental insights into the neural bases of fear experience. More generally, the developed computational framework will provide a means of comparing different mathematical hypotheses about the relationship between neural activity and individual differences. This will enable investigation of a broad range of phenomena in psychology and cognitive neuroscience.
The proposed work will develop a new computational framework for modeling individual variation in neuroimaging data, and use this framework to investigate the neural bases of one powerful and societally meaningful subjective experience, namely, of fear. Fear is a particularly useful assay because it involves variation across situational contexts (spiders, heights, and social situations), and dispositions (arachnophobia, acrophobia, and agoraphobia) that combine to create subjective experience. In the proposed neuroimaging study, participants will be scanned while watching videos that induce varying levels of arousal. To characterize individual variation in this neuroimaging data, the investigators will leverage advances in deep probabilistic programming to develop probabilistic variants of factor analysis models. These models infer a low-dimensional feature vector, also known as an embedding, for each participant and stimulus. A simple neural network models the relationship between embeddings and the neural response. This network can be trained in a data-driven manner and can be parameterized in a variety of ways, depending on the experimental design, or the neurocognitive hypotheses that are to be incorporated into the model. This provides the necessary infrastructure to test different neural models of fear. Concretely, the investigators will compare a model in which fear has its own unique circuit (i.e. neural signature or biomarker) to subject- or situation-specific neural architectures. More generally, the developed framework can be adapted to model individual variation in neuroimaging studies in other experimental settings.
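A heavily simplified, synthetic-data version of the embedding idea can be sketched in a few lines of numpy: each participant and each stimulus receives a low-dimensional vector, and the per-pair response is modeled as their inner product, fit by gradient descent. The actual models are probabilistic, operate on full neuroimaging data, and use a neural network rather than a plain inner product; everything below is illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "responses": one scalar per participant x stimulus pair,
# generated from hidden 2-D embeddings plus noise.
P, S, K = 10, 15, 2
true_u = rng.normal(size=(P, K))          # participant embeddings
true_v = rng.normal(size=(S, K))          # stimulus embeddings
Y = true_u @ true_v.T + 0.1 * rng.normal(size=(P, S))

# Infer embeddings by gradient descent on squared reconstruction error.
u = 0.1 * rng.normal(size=(P, K))
v = 0.1 * rng.normal(size=(S, K))
lr = 0.02
for _ in range(3000):
    err = u @ v.T - Y                     # (P, S) residual
    grad_u = err @ v                      # gradient of 0.5 * ||err||^2 w.r.t. u
    grad_v = err.T @ u
    u -= lr * grad_u
    v -= lr * grad_v

print("reconstruction MSE:", float(np.mean((u @ v.T - Y) ** 2)))   # approaches the noise level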
Easy Alliance, a nonprofit initiative, has been instituted to solve complex, long-term challenges in making the digital world a more accessible place for everyone.
Computer networking and the internet have revolutionized our societies, but are plagued with security problems which are difficult to tame. Serious vulnerabilities are constantly being discovered in network protocols that affect the work and lives of millions. Even some protocols that have been carefully scrutinized by their designers and by the computer engineering community have been shown to be vulnerable afterwards. Why is developing secure protocols so hard? This project seeks to address this question by developing novel design and implementation methods for network protocols that make it possible to identify and fix security vulnerabilities semi-automatically. The project serves the national interest as cyber-security costs the United States many billions of dollars annually. Besides making technical advances to the field, this project will also have broader impacts in education and curriculum development, as well as in helping to bridge the gap between several somewhat fragmented scientific communities working on the problem.
Technically, the project will follow a formal approach building upon a novel combination of techniques from security modeling, automated software synthesis, and program analysis to bridge the gap between an abstract protocol design and a low-level implementation. In particular, the methodology of the project will be based on a new formal behavioral model of software that explicitly captures how the choice of a mapping from a protocol design onto an implementation platform may result in different security vulnerabilities. Building on this model, this project will provide (1) a modeling approach that cleanly separates the descriptions of an abstract design from a concrete platform, and allows the platform to be modeled just once and reused, (2) a synthesis tool that will automatically construct a secure mapping from the abstract protocol to the appropriate choice of platform features, and (3) a program analysis tool that leverages platform-specific information to check that an implementation satisfies a desired property of the protocol. In addition, the project will develop a library of reusable platform models, and demonstrate the effectiveness of the methodology in a series of case studies.
Most computer programs process vast amounts of numerical data. Unfortunately, due to space and performance demands, computer arithmetic comes with its own rules. Making matters worse, different computers have different rules: while there are standardization efforts, efficiency considerations give hardware and compiler designers much freedom to bend the rules to their taste. As a result, the outcome of a computer calculation depends not only on the input, but also on the particular machine and environment in which the calculation takes place. This makes programs brittle and un-portable, and causes them to produce untrusted results. This project addresses these problems, by designing methods to detect inputs to computer programs that exhibit too much platform dependence, and to repair such programs, by making their behavior more robust.
Technical goals of this project include: (i) automatically warning users of disproportionately platform-dependent results of their numeric algorithms; (ii) repairing programs with platform instabilities; and (iii) proving programs stable against platform variations. Platform-independence of numeric computations is a form of robustness whose lack undermines the portability of program semantics. This project is one of the few to tackle the question of non-determinism in the specification (IEEE 754) of the theory (floating-point arithmetic) that machines are using today. This work requires new abstractions that soundly approximate the set of values of a program variable against a variety of compiler and hardware behaviors and features that may not even be known at analysis time. The project involves graduate and undergraduate students.
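The underlying portability problem can be seen in a few lines of Python: floating-point addition is not associative, so any platform, compiler, or hardware decision that reorders a reduction (vectorization, parallelism, fused operations) can change the result. The snippet below only reorders sums by hand on a single machine; the project targets the harder cross-platform version of the same effect.

import math
import random

print(0.1 + 0.2 + 0.3 == 0.3 + 0.2 + 0.1)   # False: same terms, different order

random.seed(0)
values = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8) for _ in range(10000)]

forward = sum(values)                # left-to-right accumulation
backward = sum(reversed(values))     # same numbers, opposite order
reference = math.fsum(values)        # correctly rounded sum for comparison

print("orders agree:", forward == backward)
print("error vs fsum:", abs(forward - reference), abs(backward - reference))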
Side-channel attacks (SCA) have been a realistic threat to various cryptographic implementations that do not feature dedicated protection. While many effective countermeasures have been found and applied manually, they are application-specific and labor intensive. In addition, security evaluation tends to be incomplete, with no guarantee that all the vulnerabilities in the target system have been identified and addressed by such manual countermeasures. This SaTC project aims to shift the paradigm of side-channel attack research, and proposes to build an automation framework for information leakage analysis, multi-level countermeasure application, and formal security evaluation against software side-channel attacks.
The proposed framework provides common sound metrics for information leakage, methodologies for automatic countermeasures, and formal and thorough evaluation methods. The approach unifies power analysis and cache-based timing attacks into one framework. It defines new metrics of information leakage and uses them to automatically identify possible leakage of a given cryptosystem at an early stage with no implementation details. The conventional compilation process is extended along the new dimension of optimizing for security, to generate side-channel resilient code and ensure its secure execution at run-time. Side-channel security is guaranteed to be at a certain confidence level with formal methods. The three investigators on the team bring complementary expertise to this challenging interdisciplinary research, to develop the advanced automation framework and the associated software tools, metrics, and methodologies. The outcome significantly benefits security system architects and software developers alike, in their quest to build verifiable SCA security into a broad range of applications they design. The project also builds new synergy among fundamental statistics, formal methods, and practical system security. The automation tools, when introduced in new courses developed by the PIs, will greatly improve students’ hands-on experience. The project also leverages the experiential education model of Northeastern University to engage undergraduates, women, and minority students in independent research projects.
Nontechnical Description: Artificial intelligence especially deep learning has enabled many breakthroughs in both academia and industry. This project aims to create a generative and versatile design approach based on novel deep learning techniques to realize integrated, multi-functional photonic systems, and provide proof-of-principle demonstrations in experiments. Compared with traditional approaches using extensive numerical simulations or inverse design algorithms, deep learning can uncover the highly complicated relationship between a photonic structure and its properties from the dataset, and hence substantially accelerate the design of novel photonic devices that simultaneously encode distinct functionalities in response to the designated wavelength, polarization, angle of incidence and other parameters. Such multi-functional photonic systems have important applications in many areas, including optical imaging, holographic display, biomedical sensing, and consumer photonics with high efficiency and fidelity, to benefit the public and the nation. The integrated education plan will considerably enhance outreach activities and educate students in grades 7-12, empowered by the successful experience and partnership previously established by the PIs. Graduate and undergraduate students participating in the project will learn the latest developments in the multidisciplinary fields of photonics, deep learning and advanced manufacturing, and gain real-world knowledge by engaging industrial collaborators in tandem with Northeastern University’s renowned cooperative education program.
Technical Description: Metasurfaces, which are two-dimensional metamaterials consisting of a planar array of subwavelength designer structures, have created a new paradigm to tailor optical properties in a prescribed manner, promising superior integrability, flexibility, performance and reliability to advance photonics technologies. However, so far almost all metasurface designs rely on time-consuming numerical simulations or stochastic searching approaches that are limited in a small parameter space. To fully exploit the versatility of metasurfaces, it is highly desired to establish a general, functionality-driven methodology to efficiently design metasurfaces that encompass distinctly different optical properties and performances within a single system. The objective of the project is to create and demonstrate a high-efficiency, two-level design approach enabled by deep learning, in order to realize integrated, multi-functional meta-systems. Proper deep learning methods, such as Conditional Variational Auto-Encoder and Deep Bidirectional-Convolutional Network, will be investigated, innovatively reformulated and tailored to apply at the single-element level and the large-scale system level in combination with topology optimization and genetic algorithm. Such a generative design approach can directly and automatically identify the optimal structures and configurations out of the full parameter space. The designed multi-functional optical meta-systems will be fabricated and characterized to experimentally confirm their performances. The success of the project will produce transformative photonic architectures to manipulate light on demand.
Critical infrastructure systems are increasingly reliant on one another for their efficient operation. This research will develop a quantitative, predictive theory of network resilience that takes into account the interactions between built infrastructure networks, and the humans and neighborhoods that use them. This framework has the potential to guide city officials, utility operators, and public agencies in developing new strategies for infrastructure management and urban planning. More generally, these efforts will untangle the roles of network structure and network dynamics that enable interdependent systems to withstand, recover from, and adapt to perturbations. This research will be of interest to a variety of other fields, from ecology to cellular biology.
The project will begin by cataloging three built infrastructures and known interdependencies (both physical and functional) into a “network of networks” representation suitable for modeling. A key part of this research lies in also quantifying the interplay between built infrastructure and social systems. As such, the models will incorporate community-level behavioral effects through urban “ecometrics” — survey-based empirical data that capture how citizens and neighborhoods utilize city services and respond during emergencies. This realistic accounting of infrastructure and its interdependencies will be complemented by realistic estimates of future hazards that it may face. The core of the research will use network-based analytical and computational approaches to identify reduced-dimensional representations of the (high-dimensional) dynamical state of interdependent infrastructure. Examining how these resilience metrics change under stress to networks at the component level (e.g. as induced by inundation following a hurricane) will allow identification of weak points in existing interdependent infrastructure. The converse scenario–in which deliberate alterations to a network might improve resilience or hasten recovery of already-failed systems–will also be explored.
Students will be working on building a library of cache-oblivious data structures and measuring their performance under different workloads. We will first implement serial versions of the algorithms, and then implement parallel versions of several known cache-oblivious data structures and algorithms. Read more.
The training plan is to bring in students (ideally in pairs) who are currently sophomores or juniors and have taken a Computer Systems course using C/C++. Students need not have any previous research experience, but generally will have experience using threads (e.g., pthreads) and have taken an algorithms course.
[1] (2 weeks) Students will first work through understanding the basics of Cache-Oblivious Algorithms and Data Structures from: http://erikdemaine.org/papers/BRICS2002/paper.pdf
[2] (2 weeks) Students will then work through select lectures and exercises on caches from here: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/
[3] (1 week) Students will then learn the basics of profiling.
[4] (2 weeks) Next, students will implement a few data structures and algorithms (a small illustrative sketch follows this list).
[5] (4 weeks) Students will work to find good real-world benchmarks, mining GitHub repositories for benchmarks that suffer from performance problems related to false sharing.
[6] The remaining time will be spent writing up and polishing the collected results.
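For reference, the sketch below (in Python rather than the C/C++ the students will use) shows the recursive divide-and-conquer structure that makes an algorithm cache-oblivious, using matrix transposition as a standard warm-up example; no cache size or block size appears anywhere in the code.

def co_transpose(a, b, r0, r1, c0, c1, cutoff=16):
    """Cache-obliviously write the transpose of a[r0:r1][c0:c1] into b.

    The recursion halves the larger dimension until the submatrix is small;
    at some level of the recursion every submatrix fits in cache, so the
    algorithm is cache-efficient without knowing the cache parameters."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= cutoff and cols <= cutoff:
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        co_transpose(a, b, r0, mid, c0, c1, cutoff)
        co_transpose(a, b, mid, r1, c0, c1, cutoff)
    else:
        mid = c0 + cols // 2
        co_transpose(a, b, r0, r1, c0, mid, cutoff)
        co_transpose(a, b, r0, r1, mid, c1, cutoff)

n, m = 100, 37
a = [[i * m + j for j in range(m)] for i in range(n)]
b = [[0] * n for _ in range(m)]
co_transpose(a, b, 0, n, 0, m)
assert all(b[j][i] == a[i][j] for i in range(n) for j in range(m))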
The key research questions we are investigating in the Mon(IoT)r research group are:
Our methodology entails recording and analyzing all network traffic generated by a variety of IoT devices that we have acquired. We not only inspect traffic for PII in plaintext, but also attempt to man-in-the-middle SSL connections to understand the contents of encrypted flows. Our analysis allows us to uncover how IoT devices currently protect users’ PII, and to determine how easy or difficult it is to mount attacks against user privacy.
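Once payloads are available in plaintext (directly or after the man-in-the-middle step), the simplest form of PII inspection is pattern matching. The Python sketch below shows only that step, with a few illustrative regular expressions and a made-up payload; it is far less thorough than the group's actual analysis.

import re

# Illustrative patterns for a few PII types commonly seen in device traffic.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "mac_address": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "gps_coordinate": re.compile(r"\b-?\d{1,3}\.\d{4,},\s*-?\d{1,3}\.\d{4,}\b"),
}

def find_pii(payload: str):
    """Return {pii_type: [matches]} for one decrypted request or response body."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(payload)
        if found:
            hits[name] = found
    return hits

sample = 'POST /telemetry {"user": "alice@example.com", "loc": "42.3398, -71.0892"}'
print(find_pii(sample))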
Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify, effectively making it look as if you are using those apps. As a result, if an Internet service provider (ISP) tries to slow down YouTube, Wehe’s replay of YouTube traffic should receive the same treatment. We then send the same app’s traffic again, but with the content replaced by randomized bytes, which prevents the ISP from classifying the traffic as belonging to the app. Our hypothesis is that the randomized traffic will not trigger application-specific differentiation (e.g., throttling or blocking), but the original traffic will. We repeat these tests several times to rule out noise from bad network conditions, and tell you at the end whether your ISP is giving different performance to an app’s network traffic.
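A minimal sketch of the final comparison step, assuming we already have per-replay throughput samples for the original and randomized traffic; the numbers and significance threshold below are placeholders, not Wehe’s production statistics.

```python
# Sketch of the differentiation check: compare throughput (Mbps) of the
# app-identifiable replay against the randomized-bytes replay using a
# simple permutation test on the difference of means.
import random

original   = [1.9, 2.1, 2.0, 1.8, 2.2, 1.7, 2.0, 1.9]   # app-identifiable replay
randomized = [8.1, 7.8, 8.4, 7.9, 8.2, 8.0, 7.7, 8.3]   # randomized-bytes replay

def perm_test(x, y, trials=10000, seed=0):
    """Permutation test on the absolute difference of mean throughputs."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:len(x)], pooled[len(x):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / trials

p = perm_test(original, randomized)
print("throttling suspected" if p < 0.01 else "no differentiation detected", p)
```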
Type-safe programming languages report errors when a program applies operations to data of the wrong type—e.g., a list-length operation expects a list, not a number—and they come in two flavors: dynamically typed (or untyped) languages, which catch such type errors at run time, and statically typed languages, which catch type errors at compile time before the program is ever run. Dynamically typed languages are well suited for rapid prototyping of software, while static typing becomes important as software systems grow since it offers improved maintainability, code documentation, early error detection, and support for compilation to faster code. Gradually typed languages bring together these benefits, allowing dynamically typed and statically typed code—and more generally, less precisely and more precisely typed code—to coexist and interoperate, thus allowing programmers to slowly evolve parts of their code base from less precisely typed to more precisely typed. To ensure safe interoperability, gradual languages insert runtime checks when data with a less precise type is cast to a more precise type. Gradual typing has seen high adoption in industry, in languages like TypeScript, Hack, Flow, and C#. Unfortunately, current gradually typed languages fall short in three ways. First, while normal static typing provides reasoning principles that enable safe program transformations and optimizations, naive gradual systems often do not. Second, gradual languages rarely guarantee graduality, a reasoning principle helpful to programmers, which says that making types more precise in a program merely adds in checks and the program otherwise behaves as before. Third, time and space efficiency of the runtime casts inserted by gradual languages remains a concern. This project addresses all three of these issues. The project’s novelties include: (1) a new approach to the design of gradual languages by first codifying the desired reasoning principles for the language using a program logic called Gradual Type Theory (GTT), and from that deriving the behavior of runtime casts; (2) compiling to a non-gradual compiler intermediate representation (IR) in a way that preserves these principles; and (3) the ability to use GTT to reason about the correctness of optimizations and efficient implementation of casts. The project has the potential for significant impact on industrial software development since gradually typed languages provide a migration path from existing dynamically typed codebases to more maintainable statically typed code, and from traditional static types to more precise types, providing a mechanism for increased adoption of advanced type features. The project will also have impact by providing infrastructure for future language designs and investigations into improving the performance of gradual typing.
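As a conceptual illustration only (not the project’s Gradual Type Theory), the sketch below shows the kind of runtime check a gradual language inserts when a dynamically typed value is cast to a more precise type: first-order casts check immediately, while casts on functions wrap the function so its arguments and results are checked at each call.

```python
# Conceptual sketch of gradual-typing casts: checks are inserted only at
# the boundary where less precisely typed code meets more precisely typed
# code, and a failed check blames that boundary.
def cast(value, ty, blame="cast"):
    if isinstance(ty, tuple) and ty[0] == "->":           # function type ("->", arg, res)
        _, arg_ty, res_ty = ty
        if not callable(value):
            raise TypeError(f"{blame}: expected a function")
        def wrapped(x):
            # Check the argument on the way in and the result on the way out.
            return cast(value(cast(x, arg_ty, blame)), res_ty, blame)
        return wrapped
    if not isinstance(value, ty):                          # first-order check
        raise TypeError(f"{blame}: expected {ty.__name__}, got {value!r}")
    return value

# Untyped code hands over a function; typed code uses it at int -> int.
untyped_inc = lambda x: x + 1
typed_inc = cast(untyped_inc, ("->", int, int), blame="boundary #1")
print(typed_inc(41))            # 42: checks pass, behavior is unchanged
try:
    typed_inc("oops")           # ill-typed use is caught at the boundary
except TypeError as err:
    print(err)
```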
The project team will apply the GTT approach to investigate gradual typing for polymorphism with data abstraction (parametricity), algebraic effects and handlers, and refinement/dependent types. For each, the team will develop cast calculi and program logics expressing better equational reasoning principles than previous proposals, with certified elaboration to a compiler intermediate language based on Call-By-Push-Value (CBPV) while preserving these properties, and design convenient surface languages that elaborate into them. The GTT program logics will be used for program verification, proving the correctness of program optimizations and refactorings.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
When building large software systems, programmers should be able to use the best language for each part of the system. But when a component written in one language becomes part of a multi-language system, it may interoperate with components that have features that don’t exist in the original language. This affects programmers when they refactor code (i.e., make changes that should result in equivalent behavior). Since programs interact after compilation to a common target, programmers have to understand details of linking and target-level interaction when reasoning about correctly refactoring source components. Unfortunately, there are no software toolchains available today that support single-language reasoning when components are used in a multi-language system. This project will develop principled software toolchains for building multi-language software. The project’s novelties include (1) designing language extensions that allow programmers to specify how they wish to interoperate (or link) with conceptual features absent from their language through a mechanism called linking types, and (2) developing compilers that formally guarantee that any reasoning the programmer does at source level is justified after compilation to the target. The project has the potential for tremendous impact on the software development landscape as it will allow programmers to use a language close to their problem domain and provide them with software toolchains that make it easy to compose components written in different languages into a multi-language software system.
The project will evaluate the idea of linking types by extending ML with linking types for interaction with Rust, with a language with first-class control, and with a normalizing language, and by developing type-preserving compilers to a common typed LLVM-like target language. The project will design a rich dependently typed LLVM-like target language that can encapsulate effects from different source languages to support fully abstract compilation from these languages. The project will also investigate reporting of cross-language type errors to aid programmers when composing components written in different languages.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Modern programming languages ranging from Java to Matlab rely on just-in-time compilation techniques to achieve performance competitive with languages such as C or C++. What sets just-in-time compilers apart from batch compilers is that they can observe a program’s actions as it executes and inspect its state. Knowledge of the program’s state and past behavior allows the compiler to perform speculative optimizations that improve performance. The intellectual merits of this research are to devise techniques for reasoning about the correctness of the transformations performed by just-in-time compilers. The project’s broader significance and importance are its implications for industrial practice. The results of this research will be applicable to commercial just-in-time compilers for languages such as JavaScript and R.
This project develops a general model of just-in-time compilation that subsumes deployed systems and allows systematic exploration of the design space of dynamic compilation techniques. The research questions that will be tackled in this work lie along two dimensions: Experimental—explore the design space of dynamic compilation techniques and gain an understanding of trade-offs; Foundational—formalize key ingredients of a dynamic compiler and develop techniques for reasoning about correctness in a modular fashion.
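As a toy illustration of the speculative style described above (not any production JIT), the sketch below profiles the observed argument type of a function, installs a guarded “fast path” once one type becomes hot, and deoptimizes back to the generic path when the guard fails.

```python
# Toy illustration: profile the observed argument type, install a
# specialized fast path guarded by a type check, and fall back
# ("deoptimize") to the generic path when the speculation fails.
def make_speculative(generic, specialize, threshold=100):
    counts = {}
    state = {"fast": None, "guard": None}

    def call(x):
        fast, guard = state["fast"], state["guard"]
        if fast is not None:
            if guard(x):                 # speculation holds: take the fast path
                return fast(x)
            state["fast"] = None         # guard failed: deoptimize
            return generic(x)
        t = type(x)
        counts[t] = counts.get(t, 0) + 1
        if counts[t] >= threshold:       # hot enough: "compile" a specialization
            state["fast"] = specialize(t)
            state["guard"] = lambda v, t=t: type(v) is t
        return generic(x)

    return call

generic_add1 = lambda x: x + 1                   # works for ints, floats, ...
specialize = lambda t: (lambda x: x + 1)         # pretend this is a faster version for t
add1 = make_speculative(generic_add1, specialize)

for i in range(200):
    add1(i)                  # profiles ints, then installs the int fast path
print(add1(5), add1(2.5))    # 6 via the fast path; 3.5 after deoptimizing
```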
The goal of this project is to provide open-source, interoperable, and extensible statistical software for quantitative mass spectrometry that enables experimentalists and developers of statistical methods to respond rapidly to changes in the evolving biotechnological landscape.
This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity.
As the cost of computing and communication resources has plummeted, applications have become data-centric, with data products growing explosively in both number and size. Although accessing such data using the compute power necessary for its analysis and processing is cheap and readily available via cloud computing (intuitive, utility-style access to vast resource pools), doing so currently requires significant expertise, experience, and time (for customization, configuration, deployment, etc.). This work investigates new models of cloud computing that combine domain-targeted languages with scalable data processing, sharing, and management abstractions within a distributed service platform that “scales” programmer productivity. To enable this, this research explores new programming language, runtime, and distributed systems techniques and technologies that integrate the R programming language environment with an open source cloud platform-as-a-service (PaaS) in ways that simplify processing massive datasets, sharing datasets across applications and users, and tracking and enforcing data provenance. The PIs’ plans for research, outreach, integrated curricula, and open source release of research artifacts have the potential to make cloud computing more accessible to a much wider range of users, in particular the data analytics community, who use the R statistical analysis environment to apply their techniques and algorithms to important problems in areas such as biology, chemistry, physics, political science, and finance; the platform will enable them to use cloud resources transparently for their analyses and to share their scientific data and results in a way that enables others to reproduce and verify them.
The Applied Machine Learning Group is working with researchers from Harvard Medical School to predict outcomes for multiple sclerosis patients. A focus of the research is how best to interact with physicians to use both human expertise and machine learning methods.
Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate trust-family-friendship-professional-association based networks apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market”, reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks and regulatory agencies.
These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.
The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience.
Although journal and conference articles are recognized as the most formal and enduring forms of scientific communication, oral presentations are central to science because they are the means by which researchers, practitioners, the media, and the public hear about the latest findings, thereby becoming engaged and inspired, and where scientific reputations are made. Yet despite decades of technological advances in computing and communication media, the fundamentals of oral scientific presentations have not advanced since software such as Microsoft’s PowerPoint was introduced in the 1980s. The PI’s goal in this project is to revolutionize media-assisted oral presentations in general, and STEM presentations in particular, through the use of an intelligent, autonomous, life-sized, animated co-presenter agent that collaborates with a human presenter in preparing and delivering his or her talk in front of a live audience. The PI’s pilot studies have demonstrated that audiences are receptive to this concept, and that the technology is especially effective for individuals who are non-native speakers of English (which may be up to 21% of the population of the United States). Project outcomes will be initially deployed and evaluated in higher education, both as a teaching tool for delivering STEM lectures and as a training tool for students in the sciences to learn how to give more effective oral presentations (which may inspire future generations to engage in careers in the sciences).
This research will be based on a theory of human-agent collaboration, in which the human presenter is monitored using real-time speech and gesture recognition, audience feedback is also monitored, and the agent, presentation media, and human presenter (cued via an intelligent wearable teleprompter) are all dynamically choreographed to maximize audience engagement, communication, and persuasion. The project will make fundamental, theoretical contributions to models of real-time human-agent collaboration and communication. It will explore how humans and agents can work together to communicate effectively with a heterogeneous audience using speech, gesture, and a variety of presentation media, amplifying the abilities of scientist-orators who would otherwise be “flying solo.” The work will advance both artificial intelligence and computational linguistics, by extending dialogue systems to encompass mixed-initiative, multi-party conversations among co-presenters and their audience. It will impact the state of the art in virtual agents, by advancing the dynamic generation of hand gestures, prosody, and proxemics for effective public speaking and turn-taking. And it will also contribute to the field of human-computer interaction, by developing new methods for human presenters to interact with autonomous co-presenter agents and their presentation media, including approaches to cueing human presenters effectively using wearable user interfaces.
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program.
ABSTRACT
Northeastern University is a Center of Academic Excellence in Information Assurance Education and Research. It is also one of the four schools recently designated by the National Security Agency as a Center of Academic Excellence in Cyber Operations. Northeastern has produced 20 SFS students over the past 3 years. All of the graduates are placed in positions within the Federal Government and Federally Funded Research and Development Centers. One of the unique elements of the program is the diversity of students in the program. Of the 20 students, 5 are in Computer Science, 6 are in Electrical and Computer Engineering, and 9 are in Information Assurance. These students come with different backgrounds that vary from political science and criminal justice to computer science and engineering. The University, with its nationally-recognized Cooperative Education, is well-positioned to attract and educate strong students in cybersecurity.
The SFS program at Northeastern succeeds in recruiting a diverse group of under-represented students to the program, and is committed to sustaining this level of diversity in future recruiting. Northeastern University is also reaching out to the broader community by leading Capture-the-Flag and Collegiate Cyber Defense competitions, and by actively participating in the New England Advanced Cyber Security Center, an organization composed of academia, industry, and government entities.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Sun, E. and Kaeli, D. “Aggressive Value Prediction on a GPU,” Journal of Parallel Processing, 2012, pp. 1-19.
Azmandian, F., Dy, J. G., Aslam, J. A., and Kaeli, D. “Local Kernel Density Ratio-Based Feature Selection for Outlier Detection,” Journal of Machine Learning Research, v.25, 2012, pp. 49-64.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes.
This is a study of the structure and dynamics of Internet-based collaboration. The project seeks groundbreaking insights into how multidimensional network configurations shape the success of value-creation processes within crowdsourcing systems and online communities. The research also offers new computational social science approaches to theorizing and researching the roles of social structure and influence within technology-mediated communication and cooperation processes. The findings will inform decisions of leaders interested in optimizing all forms of collaboration in fields such as open-source software development, academic projects, and business. System designers will be able to identify interpersonal dynamics and develop new features for opinion aggregation and effective collaboration. In addition, the research will inform managers on how best to use crowdsourcing solutions to support innovation and marketing strategies including peer-to-peer marketing to translate activity within online communities into sales.
This research will analyze digital trace data that enable studies of population-level human interaction on an unprecedented scale. Understanding such interaction is crucial for anticipating impacts in our social, economic, and political lives as well as for system design. One site of such interaction is crowdsourcing systems – socio-technical systems through which online communities comprised of diverse and distributed individuals dynamically coordinate work and relationships. Many crowdsourcing systems not only generate creative content but also contain a rich community of collaboration and evaluation in which creators and adopters of creative content interact among themselves and with artifacts through overlapping relationships such as affiliation, communication, affinity, and purchasing. These relationships constitute multidimensional networks and create structures at multiple levels. Empirical studies have yet to examine how multidimensional networks in crowdsourcing enable effective large-scale collaboration. The data derive from two distinctly different sources, thus providing opportunities for comparison across a range of online creation-oriented communities. One is a crowdsourcing platform and ecommerce website for creative garment design, and the other is a platform for participants to create innovative designs based on scrap materials. This project will analyze both online community activity and offline purchasing behavior. The data provide a unique opportunity to understand overlapping structures of social interaction driving peer influence and opinion formation as well as the offline economic consequences of this online activity. This study contributes to the literature by (1) analyzing multidimensional network structures of interpersonal and socio-technical interactions within these socio-technical systems, (2) modeling how success feeds back into value-creation processes and facilitates learning, and (3) developing methods to predict the economic success of creative products generated in these contexts. The application and integration of various computational and statistical approaches will provide significant dividends to the broader scientific research community by contributing to the development of technical resources that can be extended to other forms of data-intensive inquiry. This includes documentation about best practices for integrating methods for classification and prediction; courses to train students to perform large-scale data analysis; and developing new theoretical approaches for understanding the multidimensional foundations of cyber-human systems.
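For concreteness, here is a small sketch of how a multidimensional network might be represented for this kind of analysis: one relationship-labeled layer per interaction type over the same set of users, plus a simple cross-layer overlap measure. The edges are invented; the project analyzes real platform trace data.

```python
# Toy sketch of a multidimensional network: one layer per relationship
# type (communication, affiliation, purchasing) over the same users.
import networkx as nx

users = ["u1", "u2", "u3", "u4", "u5"]
layers = {
    "communication": nx.Graph([("u1", "u2"), ("u2", "u3"), ("u4", "u5")]),
    "affiliation":   nx.Graph([("u1", "u2"), ("u3", "u4")]),
    "purchasing":    nx.Graph([("u2", "u3"), ("u4", "u5")]),
}
for g in layers.values():
    g.add_nodes_from(users)          # all layers share the same node set

def layer_overlap(a, b):
    """Jaccard overlap between the edge sets of two layers."""
    ea = {frozenset(e) for e in layers[a].edges}
    eb = {frozenset(e) for e in layers[b].edges}
    return len(ea & eb) / len(ea | eb)

print(layer_overlap("communication", "purchasing"))   # ≈0.67
print(layer_overlap("communication", "affiliation"))  # 0.25
```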
Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images.
Our understanding of brain functions is hindered by the lack of detailed knowledge of synaptic connectivity in the underlying neural network. While synaptic connectivity of small neural circuits can be determined with electron microscopy, studies of connectivity on a larger scale, e.g. whole mouse brain, must be based on light microscopy imaging. It is now possible to fluorescently label subsets of neurons in vivo and image their axonal and dendritic arbors in 3D from multiple brain tissue sections. The overwhelming remaining challenge is neurite tracing, which must be done automatically due to the high-throughput nature of the problem. Currently, there are no automated tools that have the capacity to perform tracing tasks on the scale of mammalian neural circuits. Needless to say, the existence of such a tool is critical both for basic mapping of synaptic connectivity in normal brains, as well as for describing the changes in the nervous system which underlie neurological disorders. With this proposal we plan to continue the development of Neural Circuit Tracer – software for accurate, automated reconstruction of the structure and dynamics of neurites from 3D light microscopy stacks of images. Our goal is to revolutionize the existing functionalities of the software, making it possible to: (i) automatically reconstruct axonal and dendritic arbors of sparsely labeled populations of neurons from multiple stacks of images and (ii) automatically track and quantify changes in the structures of presynaptic boutons and dendritic spines imaged over time. We propose to utilize the latest machine learning and image processing techniques to develop multi-stack tracing, feature detection, and computer-guided trace editing capabilities of the software. All tools and datasets created as part of this proposal will be made available to the research community.
Public Health Relevance
At present, accurate methods of analysis of neuron morphology and synaptic connectivity rely on manual or semi-automated tracing tools. Such methods are time consuming, can be prone to errors, and do not scale up to the level of large brain-mapping projects. Thus, it is proposed to develop open-source software for accurate, automated reconstruction of structure and dynamics of large neural circuits.
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user.
ABSTRACT
This project will develop new research methods to map and quantify the ways in which online search engines, social networks, and e-commerce sites use sophisticated algorithms to tailor content to each individual user. This “personalization” may often be of value to the user, but it also has the potential to distort search results and manipulate the perceptions and behavior of the user. Given the popularity of personalization across a variety of Web-based services, this research has the potential for extremely broad impact. Being able to quantify the extent to which Web-based services are personalized will lead to greater transparency for users, and the development of tools to identify personalized content will allow users to access information that may be hard to access today.
Personalization is now a ubiquitous feature on many Web-based services. In many cases, personalization provides advantages for users because personalization algorithms are likely to return results that are relevant to the user. At the same time, the increasing levels of personalization in Web search and other systems are leading to growing concerns over the Filter Bubble effect, where users are only given results that the personalization algorithm thinks they want, while other important information remains inaccessible. From a computer science perspective, personalization is simply a tool that is applied to information retrieval and ranking problems. However, sociologists, philosophers, and political scientists argue that personalization can result in inadvertent censorship and “echo chambers.” Similarly, economists warn that unscrupulous companies can leverage personalization to steer users towards higher-priced products, or even implement price discrimination, charging different users different prices for the same item. As the pervasiveness of personalization on the Web grows, it is clear that techniques must be developed to understand and quantify personalization across a variety of Web services.
This research has four primary thrusts: (1) To develop methodologies to measure personalization of mobile content. The increasing popularity of browsing the Web from mobile devices presents new challenges, as these devices have access to sensitive content like the user’s geolocation and contacts. (2) To develop systems and techniques for accurately measuring the prevalence of several personalization trends on a large number of e-commerce sites. Recent anecdotal evidence has shown instances of problematic sales tactics, including price steering and price discrimination. (3) To develop techniques to identify and quantify personalized political content. (4) To measure the extent to which financial and health information is personalized based on location and socio-economic status. All four of these thrusts will develop new research methodologies that may prove effective in other areas of research as well.
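One measurement primitive underlying these thrusts can be sketched as follows, assuming we already hold result lists from a treated account and a simultaneous control query; the lists below are invented, and real experiments must first control for noise (e.g., A/A differences and load-balancing effects) before attributing divergence to personalization.

```python
# Sketch: quantify how much a "treated" account's results differ from a
# control query issued at the same time. Result identifiers are invented.
def jaccard(a, b):
    """Content overlap between two result lists, ignoring order."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def changed_positions(a, b):
    """Number of rank positions where the two result lists disagree."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))

control = ["r1", "r2", "r3", "r4", "r5"]
treated = ["r1", "r7", "r2", "r4", "r8"]

print("overlap:", round(jaccard(control, treated), 2))        # 0.43 -> content differs
print("changed positions:", changed_positions(control, treated))
```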
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis,” Science, v.343, 2014, p. 1203.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers.
Users today have access to a broad range of free, web-based social services. All of these services operate under a similar model: Users entrust the service provider with their personal information and content, and in return, the service provider makes their service available for free by monetizing the user-provided information and selling the results to third parties (e.g., advertisers). In essence, users pay for these services by providing their data (i.e., giving up their privacy) to the provider.
This project is using cloud computing to re-architect web-based services in order to enable end users to regain privacy and control over their data. In this approach—a confederated architecture—each user provides the computing resources necessary to support her use of the service via cloud providers. All user data is encrypted and not exposed to any third-parties, users retain control over their information, and users access the service via a web browser as normal.
The incredible popularity of today’s web-based services has led to significant concerns over privacy and user control over data. Addressing these concerns requires a re-thinking of the current popular web-based business models, and, unfortunately, existing providers are disincentivized from doing so. The impact of this project will potentially be felt by the millions of users of today’s popular services, who will be provided with an alternative to the current business models.
The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time.
Evaluating the performance of information retrieval systems such as search engines is critical to their effective development. Current “gold standard” performance evaluation methodologies generally rely on the use of expert assessors to judge the quality of documents or web pages retrieved by search engines, at great cost in time and expense. The advent of “crowd sourcing,” such as available through Amazon’s Mechanical Turk service, holds out the promise that these performance evaluations can be performed more rapidly and at far less cost through the use of many (though generally less skilled) “crowd workers”; however, the quality of the resulting performance evaluations generally suffer greatly. The thesis of this project is that one can obtain the best of both worlds — performance evaluations with the quality of experts but at the cost of crowd workers — by optimally leveraging both experts and crowd workers in asking the “right” assessor the “right” question at the “right” time. For example, one might ask inexpensive crowd workers what are likely to be “easy” questions while reserving what are likely to be “hard” questions for the expensive experts. While the project focuses on the performance evaluation of search engines as its use case, the techniques developed will be more broadly applicable to many domains where one wishes to efficiently and effectively harness experts and crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will be developed within which one can quantify the uncertainty about a performance evaluation as well as the cost and expected utility of asking any assessor (expert or crowd worker) any question (e.g. a nominal judgment for a document or a preference judgment between two documents) at any time. The goal is then to ask the “right” question of the “right” assessor at any time in order to maximize the expected utility gained per unit cost incurred and then to optimally aggregate such responses in order to efficiently and effectively evaluate performance.
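A minimal sketch of that selection rule, with invented utility and cost numbers standing in for what the probabilistic framework would actually compute:

```python
# Sketch of the selection rule: among candidate (assessor, question)
# pairs, pick the one with the highest expected reduction in evaluation
# uncertainty per unit cost. The numbers below are invented stand-ins.
candidates = [
    # (assessor, question, expected_utility_gain, cost_in_dollars)
    ("crowd worker", "Is doc-17 relevant?",          0.10, 0.05),
    ("crowd worker", "Prefer doc-03 or doc-09?",     0.04, 0.05),
    ("expert",       "Is doc-42 relevant?",          0.60, 2.00),
    ("expert",       "Is doc-17 relevant?",          0.15, 2.00),
]

def best_question(cands):
    """Choose the question/assessor pair maximizing gain per unit cost."""
    return max(cands, key=lambda c: c[2] / c[3])

assessor, question, gain, cost = best_question(candidates)
print(f"ask the {assessor}: '{question}' ({gain / cost:.1f} utility per dollar)")
# With these numbers the cheap "easy" question goes to the crowd worker,
# matching the intuition described in the paragraph above.
```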
This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
Advanced programming languages, based on dependent types, enable program verification alongside program development, thus making them an ideal tool for building fully verified, high assurance software. Recent dependently typed languages that permit reasoning about state and effects—such as Hoare Type Theory (HTT) and Microsoft’s F*—are particularly promising and have been used to verify a range of rich security policies, from state-dependent information flow and access control to conditional declassification and information erasure. But while these languages provide the means to verify security and correctness of high-level source programs, what is ultimately needed is a guarantee that the same properties hold of compiled low-level target code. Unfortunately, even when compilers for such advanced languages exist, they come with no formal guarantee of correct compilation, let alone any guarantee of secure compilation—i.e., that compiled components will remain as secure as their high-level counterparts when executed within arbitrary low-level contexts. This project seeks to demonstrate how to build realistic yet secure compilers. This is a notoriously difficult problem. On one hand, a secure compiler must ensure that low-level contexts cannot launch any “attacks” on the compiled component that would have been impossible to launch in the high-level language. On the other hand, a realistic compiler cannot simply limit the expressiveness of the low-level target language to achieve the security goal.
The intellectual merit of this project is the development of a powerful new proof architecture for realistic yet secure compilation of dependently typed languages that relies on contracts to ensure that target-level contexts respect source-level security guarantees and leverages these contracts in a formal model of how source and target code may interoperate. The broader impact is that this research will make it possible to compose high-assurance software components into high-assurance software systems, regardless of whether the components are developed in a high-level programming language or directly in assembly. Compositionality has been a long-standing open problem for certifying systems for high-assurance. Hence, this research has potential for enormous impact on how high-assurance systems are built and certified. The specific goal of the project is to develop a verified multi-pass compiler from Hoare Type Theory to assembly that is type preserving, correct, and secure. The compiler will include passes that perform closure conversion, heap allocation, and code generation. To prove correct compilation of components, not just whole programs, this work will use an approach based on defining a formal semantics of interoperability between source components and target code. To guarantee secure compilation, the project will use (static) contract checking to ensure that compiled code is only run in target contexts that respect source-level security guarantees. To carry out proofs of compiler correctness, the project will develop a logical relations proof method for Hoare Type Theory.
The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source.
Compilers play a critical role in the production of software. As such, they should be correct. That is, they should preserve the behavior of all programs they compile. Despite remarkable progress on formally verified compilers in recent years, these compilers suffer from a serious limitation: they are proved correct under the assumption that they will only be used to compile whole programs. This is an entirely unrealistic assumption since most software systems today are comprised of components written in different languages compiled by different compilers to a common low-level target language. The intellectual merit of this project is the development of a proof architecture for building verified compilers for today’s world of multi-language software: such verified compilers guarantee correct compilation of components and support linking with arbitrary target code, no matter its source. The project’s broader significance and importance are that verified compilation of components stands to benefit practically every software system, from safety-critical software to web browsers, because such systems use libraries or components that are written in a variety of languages. The project will achieve broad impact through the development of (i) a proof methodology that scales to realistic multi-pass compilers and multi-language software, (ii) a target language that extends LLVM—increasingly the target of choice for modern compilers—with support for compilation from type-safe source languages, and (iii) educational materials related to the proof techniques employed in the course of this project.
The project has two central themes, both of which stem from a view of compiler correctness as a language interoperability problem. First, specification of correctness of component compilation demands a formal semantics of interoperability between the source and target languages. More precisely: if a source component (say s) compiles to target component (say t), then t linked with some arbitrary target code (say t’) should behave the same as s interoperating with t’. Second, enabling safe interoperability between components compiled from languages as different as Java, Rust, Python, and C, requires the design of a gradually type-safe target language based on LLVM that supports safe interoperability between more precisely typed, less precisely typed, and type-unsafe components.
This project will support a plugin architecture for transparent checkpoint-restart.
Society’s increasingly complex cyberinfrastructure creates a concern for software robustness and reliability. Yet, this same complex infrastructure is threatening the continued use of fault tolerance. Consider when a single application or hardware device crashes. Today, in order to resume that application from the point where it crashed, one must also consider the complex subsystem to which it belongs. While in the past, many developers would write application-specific code to support fault tolerance for a single application, this strategy is no longer feasible when restarting the many inter-connected applications of a complex subsystem. This project will support a plugin architecture for transparent checkpoint-restart. Transparency implies that the software developer does not need to write any application-specific code. The plugin architecture implies that each software developer writes the necessary plugins only once. Each plugin takes responsibility for resuming any interrupted sessions for just one particular component. At a higher level, the checkpoint-restart system employs an ensemble of autonomous plugins operating on all of the applications of a complex subsystem, without any need for application-specific code.
The plugin architecture is part of a more general approach called process virtualization, in which all subsystems external to a process are virtualized. It will be built on top of the DMTCP checkpoint-restart system. One simple example of process virtualization is virtualization of ids. A plugin maintains a virtualization table and arranges for the application code of the process to see only virtual ids, while the outside world sees the real id. Any system calls and library calls using this real id are extended to translate between real and virtual id. On restart, the real ids are updated with the latest value, and the process memory remains unmodified, since it contains only virtual ids. Other techniques employing process virtualization include shadow device drivers, record-replay logs, and protocol virtualization. Some targets of the research include transparent checkpoint-restart support for the InfiniBand network, for programmable GPUs (including shaders), for networks of virtual machines, for big data systems such as Hadoop, and for mobile computing platforms such as Android.
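A conceptual sketch of the id-virtualization table described above follows; the real mechanism is implemented in C inside DMTCP plugins that interpose on system and library calls, so this Python class is only an illustration of the bookkeeping involved.

```python
# Conceptual sketch of id virtualization: the process only ever sees
# stable virtual ids, and the table is refreshed with new real ids on
# restart, so process memory needs no modification.
class IdTable:
    def __init__(self):
        self._virt_to_real = {}
        self._next_virtual = 1000

    def register(self, real_id):
        """Called when a real resource (pid, socket, shm id, ...) is created."""
        virt = self._next_virtual
        self._next_virtual += 1
        self._virt_to_real[virt] = real_id
        return virt                        # application code stores only this

    def real(self, virt):
        """Translate at the system-call boundary: virtual id -> current real id."""
        return self._virt_to_real[virt]

    def update_on_restart(self, virt, new_real_id):
        """After restart, the resource is recreated with a different real id."""
        self._virt_to_real[virt] = new_real_id

table = IdTable()
vpid = table.register(4242)          # before checkpoint, the real pid is 4242
assert table.real(vpid) == 4242
table.update_on_restart(vpid, 5711)  # after restart the real pid changed
assert table.real(vpid) == 5711      # but the virtual id held in memory still works
```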
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Kapil Arya and Gene Cooperman. “DMTCP: Bringing Interactive Checkpoint-Restart to Python,” Computational Science & Discovery, v.8, 2015, 16 pages. doi:10.1088/issn.1749-4699
Jiajun Cao, Matthieu Simoni, Gene Cooperman, and Christine Morin. “Checkpointing as a Service in Heterogeneous Cloud Environments,” Proc. of 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’15), 2015, pp. 61-70. doi:10.1109/CCGrid.2015.160
This project will focus on the development of the REDEX tool, a lightweight domain-specific tool for modeling programming languages useful for software development. Originally developed as an in-house tool for a small group of collaborating researchers, REDEX escaped the laboratory several years ago and acquired a dedicated user community; new users now wish to use it for larger and more complicated programming languages than originally envisioned. Using this framework, a programmer articulates a programming language model directly as a software artifact with just a little more effort than paper-and-pencil models. Next, the user invokes diagnostic tools to test a model’s consistency, explore its properties, and check general claims about it.
This award funds several significant improvements to REDEX: (1) a modular system that allows its users to divide up the work, (2) scalable performance so that researchers can deal with large models, and (3) improvements to its testing and error-detection system. The award also includes support for the education of REDEX’s quickly growing user community, e.g., support for organizing tutorials and workshops.
This project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem.
The “Gradual Typing Across the Spectrum” project addresses an urgent, emergent need at the intersection of software maintenance and programming language research. Over the past 20 years, working software engineers have embraced so-called scripting languages for a variety of tasks. They routinely use JavaScript for interactive web pages, Ruby on Rails for server-side software, Python for data science, and so on. Software engineers choose these languages because they make prototyping easy, and before the engineers realize it, these prototypes evolve into large, working systems and escape into the real world. Like all software, these systems need to be maintained—mistakes must be fixed, their performance requires improvement, security gaps call for fixes, their functionality needs to be enhanced—but scripting languages render maintenance difficult. The intellectual merits of this project are to address all aspects of this real-world software engineering problem. In turn, the project’s broader significance and importance are the deployment of new technologies to assist the programmer who maintains code in scripting languages, the creation of novel technologies that preserve the advantages of these scripting frameworks, and the development of curricular materials that prepares the next generation of students for working within these frameworks.
A few years ago, the PIs launched programming language research efforts to address this problem. They diagnosed the lack of sound types in scripting languages as one of the major factors. With types in conventional programming languages, programmers concisely communicate design information to future maintenance workers; soundness ensures the types are consistent with the rest of the program. In response, the PIs explored the idea of gradual typing, that is, the creation of a typed sister language (one per scripting language) so that (maintenance) programmers can incrementally equip systems with type annotations. Unfortunately, these efforts have diverged over the years and would benefit from systematic cross-pollination.
With support from this grant, the PIs will systematically explore the spectrum of their gradual typing system with a three-pronged effort. First, they will investigate how to replicate results from one project in another. Second, they will jointly develop an evaluation framework for gradual typing projects with the goal of diagnosing gaps in the efforts and needs for additional research. Third, they will explore the creation of new scripting languages that benefit from the insights of gradual typing research.
This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data.
Speech is a complex and intricately timed task that requires the coordination of numerous muscle groups and physiological systems. While most children acquire speech with relative ease, it is one of the most complex patterned movements accomplished by humans and thus susceptible to impairment. Approximately 2% of Americans have imprecise speech either due to mislearning during development (articulation disorder) or as a result of neuromotor conditions such as stroke, brain injury, Parkinson’s disease, cerebral palsy, etc. An equally sizeable group of Americans have difficulty with English pronunciation because it is their second language. Both of these user groups would benefit from tools that provide explicit feedback on speech production clarity. Traditional speech remediation relies on viewing a trained clinician’s accurate articulation and repeated practice with visual feedback via a mirror. While these interventions are effective for readily viewable speech sounds (visemes such as /b/p/m/), they are largely unsuccessful for sounds produced inside the mouth. The tongue is the primary articulator for these obstructed sounds and its movements are difficult to capture. Thus, clinicians use diagrams and other low-tech means (such as placing edible substances on the palate or physically manipulating the oral articulators) to show clients where to place their tongue. While sophisticated research tools exist for measuring and tracking tongue movements during speech, they are prohibitively expensive, obtrusive, and impractical for clinical and/or home use. The PIs’ goal in this exploratory project, which represents a collaboration across two institutions, is to lay the groundwork for a Lingual-Kinematic and Acoustic sensor technology (LinKa) that is lightweight, low-cost, wireless and easy to deploy both clinically and at home for speech remediation.
PI Ghovanloo’s lab has developed a low-cost, wireless, and wearable magnetic sensing system, known as the Tongue Drive System (TDS). An array of electromagnetic sensors embedded within a headset detects the position of a small magnet that is adhered to the tongue. Clinical trials have demonstrated the feasibility of using the TDS for computer access and wheelchair control by sensing tongue movements in up to 6 discrete locations within the oral cavity. This research will leverage the sensing capabilities of the TDS system and PI Patel’s expertise in spoken interaction technologies for individuals with speech impairment, as well as Co-PI Fu’s work on machine learning and multimodal data fusion, to develop a prototype clinically viable tool for enhancing speech clarity by coupling lingual-kinematic and acoustic data. To this end, the team will extend the TDS to track tongue movements during running speech, which are quick, compacted within a small area of the oral cavity, and often overlap for several phonemes, so the challenge will be to accurately classify movements for different sound classes. To complement this effort, pattern recognition of sensor spatiotemporal dynamics will be embedded into an interactive game to offer a motivating, personalized context for speech motor (re)learning by enabling audiovisual biofeedback, which is critical for speech modification. To benchmark the feasibility of the approach, the system will be evaluated on six individuals with neuromotor speech impairment and six healthy age-matched controls.
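As a hypothetical sketch of the pattern-recognition step, the code below classifies short windows of sensor-derived features into sound classes using scikit-learn; the features and labels are synthetic stand-ins for the multi-channel TDS signals and acoustic data the project will actually fuse.

```python
# Hypothetical sketch: classify feature windows into sound classes.
# Features and labels are synthetic; the real system fuses magnetic
# sensor channels with acoustics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, n_features = 60, 12          # e.g. per-channel window statistics
classes = ["alveolar /t,d/", "velar /k,g/", "retroflex /r/"]

# One Gaussian blob of feature windows per (made-up) sound class.
X = np.vstack([rng.normal(loc=i, scale=0.7, size=(n_per_class, n_features))
               for i in range(len(classes))])
y = np.repeat(classes, n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 2))
```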
The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The “Compiler Coaching” (Dialog) project represents an investment in programming language tools and technology. Software engineers use high-level programming languages on a daily basis to produce the apps and applications that everyone uses and that control everybody’s lives. Once a programming language translator accepts a program as grammatically correct, it creates impenetrable computer codes without informing the programmer how well (fast or slow, small or large, energy hogging or efficient) these codes will work. Indeed, modern programming languages employ increasingly sophisticated translation techniques and have become obscure black boxes to the working engineer. The goal of the Dialog project is to create channels of communication between these translation processes and software engineers, with the expectation that the latter can use this new source of information to improve the speed, size, or energy consumption of their software.
The PIs will explore the Dialog idea in two optimizing compiler settings, one on the conventional side and one on the modern one: for the Racket language, a teaching and research vehicle that they can modify as needed to create the desired channel, and the JavaScript programming language, the standardized tool for existing Web applications. The intellectual merits concern the fundamental principles of creating such communication channels and frameworks for gathering empirical evidence on how these channels benefit the working software engineer. These results should enable the developers of any programming language to implement similar channels of communication to help their clients. The broader impacts are twofold. On one hand, the project is likely to positively impact the lives of working software engineers as industrial programming language creators adapt the Dialog idea. On the other hand, the project will contribute to a two-decades old, open-source programming language project with a large and longstanding history of educational outreach at multiple levels. The project has influenced hundreds of thousands of high school students in the past and is likely to do so in the future.
While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive. This project aims to provide algorithms and tools for answering these questions.
Software, including common examples such as commercial applications or embedded device firmware, is often delivered as closed-source binaries. While prior academic work has examined how to automatically discover vulnerabilities in binary software, and even how to automatically craft exploits for these vulnerabilities, the ability to answer basic security-relevant questions about closed-source software remains elusive.
This project aims to provide algorithms and tools for answering these questions. Leveraging prior work on emulator-based dynamic analyses, we propose techniques for scaling this high-fidelity analysis to capture and extract whole-system behavior in the context of embedded device firmware and closed-source applications. Using a combination of dynamic execution traces collected from this analysis platform and binary code analysis techniques, we propose techniques for automated structural analysis of binary program artifacts, decomposing system and user-level programs into logical modules through inference of high-level semantic behavior. This decomposition provides as output an automatically learned description of the interfaces and information flows between each module at a sub-program granularity. Specific activities include: (a) developing a software-guided whole-system emulator for supporting sophisticated dynamic analyses of real embedded systems; (b) developing advanced, automated techniques for structurally decomposing closed-source software into its constituent modules; (c) developing automated techniques for producing high-level summaries of whole-system executions and software components; and (d) developing techniques for automating the reverse engineering and fuzz testing of encrypted network protocols. The research proposed herein will have a significant impact outside of the security research community. We will incorporate the research findings of our program into our undergraduate and graduate teaching curricula, as well as into extracurricular educational efforts, such as Capture-the-Flag competitions, that have broad outreach in the greater Boston and Atlanta metropolitan areas.
The PIs’ close ties to industry will facilitate transitioning the research into practical defensive tools that can be deployed in real-world systems and networks.
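To make the idea of structural decomposition concrete, the following hedged sketch clusters a toy call graph into candidate modules with off-the-shelf community detection; the function names and edges are synthetic stand-ins for what a real binary analysis would recover, not output of the project's tools.

```python
# Toy illustration: treat recovered functions as nodes of a call graph and group them
# into candidate "logical modules" with community detection. Real decomposition would
# also use data flows, string references, and dynamic traces.
import networkx as nx

calls = [("f_main", "f_parse"), ("f_parse", "f_read"), ("f_parse", "f_tokenize"),
         ("f_read", "f_tokenize"),
         ("f_main", "f_crypto_init"), ("f_crypto_init", "f_aes"), ("f_aes", "f_xor")]
G = nx.Graph()
G.add_edges_from(calls)

# Greedy modularity maximization approximates module boundaries in this tiny example.
modules = nx.algorithms.community.greedy_modularity_communities(G)
for i, module in enumerate(modules):
    print(f"candidate module {i}: {sorted(module)}")
```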
This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.
The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.
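A minimal sketch of the first component, assuming a toy model in which every informed node contacts a few freshly sampled neighbors each round, shows how push-style gossip spreads information even as the topology changes; the node count and contact degree below are arbitrary illustrative choices.

```python
# Toy push-gossip simulation on a "dynamic network": the contacts of each node are
# re-sampled every round, standing in for mobility and churn.
import random

def push_gossip_rounds(n=200, contacts_per_round=4, seed=1):
    random.seed(seed)
    informed = {0}                      # a single source starts with the rumor
    rounds = 0
    while len(informed) < n:
        rounds += 1
        newly_informed = set(informed)
        for node in informed:
            # Fresh random contacts each round model the changing topology.
            neighbors = random.sample(range(n), contacts_per_round)
            newly_informed.update(neighbors)
        informed = newly_informed
    return rounds

print("rounds until all nodes are informed:", push_gossip_rounds())
```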
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning.
The control of epidemics, broadly defined to range from human diseases such as influenza and smallpox to malware in communication networks, relies crucially on interventions such as vaccinations and anti-virals (in human diseases) or software patches (for malware). These interventions are almost always voluntary directives from public agencies; however, people do not always adhere to such recommendations, and make individual decisions based on their specific “self interest”. Additionally, people alter their contacts dynamically, and these behavioral changes have a huge impact on the dynamics and the effectiveness of these interventions, so that “good” intervention strategies might, in fact, be ineffective, depending upon the individual response.
The goal of this project is to study the foundations of policy design for controlling epidemics, using a broad class of epidemic games on complex networks involving uncertainty in network information, temporal evolution and learning. Models will be proposed to capture the complexity of static and temporal interactions and patterns of information exchange, including the possibility of failed interventions and the potential for moral hazard. The project will also study specific policies posed by public agencies and network security providers for controlling the spread of epidemics and malware, and will develop resource constrained mechanisms to implement them in this framework.
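The flavor of such epidemic games can be sketched with a toy simulation, shown below, in which nodes on a synthetic contact network vaccinate voluntarily according to a naive self-interest rule before an SIR outbreak; the network model, parameters, and decision rule are illustrative assumptions, not the project's models.

```python
# Toy "epidemic game": voluntary vaccination driven by perceived risk (node degree),
# followed by a simple discrete-time SIR outbreak on the same network.
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(500, 3, seed=0)   # assumed contact network
beta, gamma = 0.05, 0.1                        # per-step infection / recovery probabilities

# Naive self-interest rule: higher-degree nodes vaccinate with higher probability.
max_deg = max(d for _, d in G.degree())
vaccinated = {v for v, d in G.degree() if random.random() < d / (2 * max_deg)}

state = {v: "S" for v in G}
for v in vaccinated:
    state[v] = "R"                             # vaccinated nodes are treated as removed
patient_zero = next(v for v in G if state[v] == "S")
state[patient_zero] = "I"

for _ in range(200):
    infected = [v for v in G if state[v] == "I"]
    if not infected:
        break
    for v in infected:
        for u in G.neighbors(v):
            if state[u] == "S" and random.random() < beta:
                state[u] = "I"
        if random.random() < gamma:
            state[v] = "R"

# Recovered nodes minus the vaccinated ones = nodes that were actually infected.
print("epidemic size:", sum(1 for v in G if state[v] == "R") - len(vaccinated))
```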
This project will integrate approaches from Computer Science, Economics, Mathematics, and Epidemiology to give intellectual unity to the study and design of public health policies, and has the potential for strong dissertation work in all these areas. Education and outreach are an important aspect of the project and include curriculum development at both the graduate and undergraduate levels. A multi-disciplinary workshop is also planned as part of the project.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography.
Computational inefficiency is a common experience: the computer cannot complete a certain task due to lack of resources such as time, memory, or bandwidth. Computational complexity theory classifies — or aims to classify — computational tasks according to their inherent inefficiency. Since tasks requiring excessive resources must be avoided, complexity theory is often indispensable in the design of a computer system. Inefficiency can also be harnessed to our advantage. Indeed, most modern cryptography and electronic commerce rely on the (presumed) inefficiency of certain computational tasks.
The objective of the proposed research is to make progress on several mutually enriching directions in computational complexity theory, including problems at the intersections with algorithms and cryptography. Building on the principal investigator’s (PI’s) previous works, the main proposed directions are:
This research is closely integrated with a plan to achieve broad impact through education. The PI is reshaping the theory curriculum at Northeastern on multiple levels. At the undergraduate level, the PI is working on, and using in his classes, a set of lecture notes aimed at students lacking mathematical maturity. At the Ph.D. level, the PI is incorporating current research topics, including some of the above, into core classes. Finally, the PI will continue to do research working closely with students at all levels.
This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases.
The significant advances realized in recent years in the study of complex networks are severely limited by an almost exclusive focus on the behavior of single networks. However, most networks in the real world are not isolated but are coupled and hence depend upon other networks, which in turn depend upon other networks. Real networks communicate with each other and may exchange information, or, more importantly, may rely upon one another for their proper functioning. A simple but real example is a power station network that depends on a computer network, and the computer network depends on the power network. Our social networks depend on technical networks, which, in turn, are supported by organizational networks. Surprisingly, analyzing complex systems as coupled interdependent networks alters the most basic assumptions that network theory has relied on for single networks. A multidisciplinary, data driven research project will: 1) Study the microscopic processes that rule the dynamics of interdependent networks, with a particular focus on the social component; 2) Define new mathematical models/foundational theories for the analysis of the robustness/resilience and contagion/diffusive dynamics of interdependent networks. This project will afford the opportunity of greatly expanding the understanding of realistic complex networks by joining theoretical analysis of coupled networks with extensive analysis of appropriately chosen large-scale databases. These databases will be made publicly available, except for special cases where it is illegal to do so.
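The shift from single to coupled networks can be illustrated with a small cascading-failure experiment in the spirit of interdependent-network percolation models; the two random graphs, the one-to-one dependency mapping, and the 30% initial failure rate below are illustrative assumptions rather than results of this project.

```python
# Toy cascading-failure experiment on two interdependent networks: a node survives
# only while it belongs to the giant component of BOTH networks.
import random
import networkx as nx

random.seed(2)
n = 1000
A = nx.erdos_renyi_graph(n, 4 / n, seed=2)   # e.g., a power network
B = nx.erdos_renyi_graph(n, 4 / n, seed=3)   # e.g., a communication network
# One-to-one interdependence: node i in A depends on node i in B and vice versa.

alive = set(random.sample(range(n), int(0.7 * n)))   # remove 30% of nodes at random

def giant_component(G, nodes):
    sub = G.subgraph(nodes)
    if sub.number_of_nodes() == 0:
        return set()
    return max(nx.connected_components(sub), key=len)

# Iterate the mutual-connectivity condition until the cascade stops.
while True:
    survivors = giant_component(A, alive) & giant_component(B, alive)
    if survivors == alive:
        break
    alive = survivors

print("fraction in the mutually connected component:", len(alive) / n)
```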
This research has important implications for understanding the social and technical systems that make up a modern society. A recent US Scientific Congressional Report concludes: “No currently available modeling and simulation tools exist that can adequately address the consequences of disruptions and failures occurring simultaneously in different critical infrastructures that are dynamically inter-dependent.” Understanding the interdependence of networks and its effect on system robustness and on structural and functional behavior is crucial for properly modeling many real-world systems and applications, from disaster preparedness, to building effective organizations, to comprehending the complexity of the macro economy. In addition to these intellectual objectives, the research project includes the development of an extensive outreach program to the public, especially K-12 students.
This research targets the design and evaluation of protocols for secure, privacy-preserving data analysis in an untrusted cloud.
With these protocols, the user can store and query data in the cloud while preserving the privacy and integrity of outsourced data and queries. The PIs specifically address a real-world cloud framework: Google’s prominent MapReduce paradigm.
Traditional solutions for single-server setups and related work on, e.g., fully homomorphic encryption are computationally too heavy and uneconomical, and they offset the advantages of the cloud. The PIs’ rationale is to design new protocols tailored to the specifics of the MapReduce computing paradigm. The PIs’ methodology is twofold. First, the PIs design new protocols that allow the cloud user to specify data analysis queries for typical operations such as searching, pattern matching, or counting. For this, the PIs extend privacy-preserving techniques, e.g., private information retrieval or order-preserving encryption. Second, the PIs design protocols guaranteeing the genuineness of data retrieved from the cloud. Using cryptographic accumulators, users can verify that data has not been tampered with. Besides the design work, the PIs also implement a prototype that is usable in a realistic setting with MapReduce.
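As a simplified stand-in for the genuineness guarantee (not the PIs' accumulator-based or MapReduce-specific protocols), the sketch below shows only the client-side verification idea: the user tags each record with a keyed MAC before outsourcing, and rejects any retrieved record whose tag fails to verify.

```python
# Simplified integrity-verification sketch. Real constructions use cryptographic
# accumulators and support verification of query results, not just raw records.
import hmac
import hashlib

KEY = b"user-secret-key"                       # held by the client, never by the cloud

def tag(record: bytes) -> bytes:
    return hmac.new(KEY, record, hashlib.sha256).digest()

# Before upload: the client stores (record, tag) pairs in the cloud.
outsourced = [(rec, tag(rec)) for rec in [b"row1,alice,42", b"row2,bob,17"]]

# After retrieval: reject any record whose tag does not verify.
def verify(record: bytes, t: bytes) -> bool:
    return hmac.compare_digest(tag(record), t)

tampered = (b"row2,bob,9999", outsourced[1][1])
print(verify(*outsourced[0]), verify(*tampered))   # True False
```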
The outcome of this project enables privacy-preserving operations and secure data storage in a widely used cloud computing framework, thus removing a major adoption obstacle and making cloud computing available to a larger community.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries.
Wi-Fi has emerged as the technology of choice for Internet access. Thus, virtually every smartphone or tablet is now equipped with a Wi-Fi card. Concurrently, and as a means to maximize spectral efficiency, Wi-Fi radios are becoming increasingly complex and sensitive to wireless channel conditions. The prevalence of Wi-Fi networks, along with their adaptive behaviors, makes them an ideal target for denial of service attacks at a large, infrastructure level.
This project aims to comprehensively investigate the resiliency of Wi-Fi networks to smart attacks, and to design and implement robust solutions capable of resisting or countering them. The project additionally focuses on harnessing new capabilities of Wi-Fi radios, such as multiple-input and multiple-output (MIMO) antennas, to protect against powerful adversaries. The research blends theory with experimentation and prototyping, and spans a range of disciplines including protocol design and analysis, coding and modulation, on-line algorithms, queuing theory, and emergent behaviors.
The anticipated benefits of the project include: (1) a deep understanding of threats facing Wi-Fi along several dimensions, via experiments and analysis; (2) a set of mitigation techniques and algorithms to strengthen existing Wi-Fi networks and emerging standards; (3) implementation into open-source software that can be deployed on wireless network cards and access points; (4) security training of the next-generation of scientists and engineers involved in radio design and deployment.
The objective of this research is to develop a comprehensive theoretical and experimental cyber-physical framework to enable intelligent human-environment interaction capabilities by a synergistic combination of computer vision and robotics.
Specifically, the approach is applied to examine individualized remote rehabilitation with an intelligent, articulated, and adjustable lower-limb orthotic brace to manage knee osteoarthritis, where a visual-sensing/dynamical-systems perspective is adopted to: (1) track and record patient/device interactions with internet-enabled, commercial off-the-shelf computer-vision devices; (2) abstract the interactions into parametric and composable low-dimensional manifold representations; (3) link to quantitative biomechanical assessment of the individual patients; (4) facilitate development of individualized user models and exercise regimens; and (5) aid the progressive parametric refinement of exercises and adjustment of bracing devices. This research and its results will enable us to understand underlying human neuro-musculo-skeletal and locomotion principles by merging quantitative data acquisition and lower-order modeling coupled with individualized feedback. Beyond efficient representation, the quantitative visual models offer the potential to capture fundamental underlying physical, physiological, and behavioral mechanisms grounded in biomechanical assessments, and thereby afford insights into the generative hypotheses of human actions.
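A hedged sketch of the low-dimensional representation idea, using plain PCA on simulated joint-angle trajectories rather than the project's manifold models, is shown below; the exercise signal, noise levels, and three-component embedding are assumptions for illustration only.

```python
# Illustrative sketch: compress repeated exercise trajectories (here, synthetic
# knee-flexion curves) into a low-dimensional embedding, one point per repetition.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_reps, n_frames = 40, 100
t = np.linspace(0, 2 * np.pi, n_frames)
# Each repetition: a noisy flexion curve (degrees) with a per-repetition offset.
trajectories = (np.sin(t) * 45
                + rng.normal(0, 2, (n_reps, n_frames))
                + rng.normal(0, 5, (n_reps, 1)))

pca = PCA(n_components=3).fit(trajectories)
embedding = pca.transform(trajectories)        # one 3-D point per repetition
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
```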
Knee osteoarthritis is an important public health issue, because of high costs associated with treatments. The ability to leverage a quantitative paradigm, both in terms of diagnosis and prescription, to improve mobility and reduce pain in patients would be a significant benefit. Moreover, the home-based rehabilitation setting offers not only immense flexibility, but also access to a significantly greater portion of the patient population. The project is also integrated with extensive educational and outreach activities to serve a variety of communities.
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases.
This is a proposal for a multi-institutional MIDAS Center of Excellence called the Center for Statistics and Quantitative Infectious Diseases (CSQUID). The mission of the Center is to provide national and international leadership. The lead institution is the Fred Hutchinson Cancer Research Center (FHCRC). Other participating institutions are the University of Florida, Northeastern University, University of Michigan, Emory University, University of Washington (UW), University of Georgia, and Duke University. The proposal includes four synergistic research projects (RPs) that will develop cutting-edge methodologies applied to solving epidemiologic, immunologic and evolutionary problems important for public health policy in influenza, dengue, polio, TB, and other infectious agents: RP1: Modeling, Spatial, Statistics (Lead: I. Longini, U. Florida); RP2: Dynamic Inference (Lead: P. Rohani, U. Michigan); RP3: Understanding transmission with integrated genetic and epidemiologic inference (Co-Leads: E. Kenah, U. Florida, and T. Bedford, FHCRC); RP4: Dynamics and Evolution of Influenza Strain Variation (Lead: R. Antia, Emory U). The Software Development and Core Facilities (Lead: A. Vespignani, Northeastern U) will provide leadership in software development, access, and communication. The Policy Studies (Lead: J. Koopman, U. Michigan) will provide leadership in communicating our research results to policy makers, as well as in conducting novel research into policy making. The Training, Outreach, and Diversity Plans include ongoing training of 9 postdoctoral fellows and 5.25 predoctoral research assistants each year, support for participants in the Summer Institute for Statistics and Modeling in Infectious Diseases (UW), and ongoing Research Experience for Undergraduates programs at two institutions, among others. All participating institutions and the Center are committed to increasing diversity at all levels. Center-wide activities include Career Development Awards for junior faculty, annual workshops and symposia, outside speakers, and participation in the MIDAS Network meetings. Scientific leadership will be provided by the Center Director, a Leadership Committee, and an external Scientific Advisory Board, as well as the MIDAS Steering Committee.
Public Health Relevance
This multi-institutional MIDAS Center of Excellence provides a multi-disciplinary approach to computational, statistical, and mathematical modeling of important infectious diseases. The research is motivated by multiscale problems such as immunologic, epidemiologic, and environmental drivers of the spread of infectious diseases with the goal of understanding and communicating the implications for public health policy.
Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.
The overarching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, a better understanding of the molecular underpinnings and the role of environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists of three Specific Aims, each consisting of three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.
In Specific Aim 1, we will use the Asthma BRIDGE datasets to define the interactome sub-module perturbed in poor asthma control and the regulatory variants that modulate this asthma-control module, and to develop a predictive model of asthma control.
In Specific Aim 2, we will study the effects of exposure to air pollution and environmental tobacco smoke on modulating the asthma-control networks, testing for environment-dependent alterations in network dynamics.
In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS and those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies in bronchial epithelial and Jurkat T-cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies will be performed with co-treatment using dexamethasone to validate findings from Aim 3. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control, and define targets for biomarker development and therapeutic targeting.
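For illustration only, the following sketch mimics the style of comparison in Aim 3 on simulated data: it builds simple gene co-expression networks for two groups and counts edges present in one condition but not the other. Gene names, sample sizes, and the correlation threshold are assumptions, and real analyses would use dedicated network-inference methods rather than a plain correlation cutoff.

```python
# Toy differential co-expression comparison between two simulated groups
# (e.g., subjects on vs. off inhaled corticosteroids).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
genes = [f"G{i}" for i in range(30)]
expr_ics = pd.DataFrame(rng.normal(size=(80, 30)), columns=genes)
expr_no_ics = pd.DataFrame(rng.normal(size=(80, 30)), columns=genes)

def edges(expr, threshold=0.3):
    corr = expr.corr().abs()
    # Keep each unordered gene pair once; the a < b condition also skips the diagonal.
    return {(a, b) for a in genes for b in genes if a < b and corr.loc[a, b] > threshold}

e_ics, e_no = edges(expr_ics), edges(expr_no_ics)
print("edges only under ICS:", len(e_ics - e_no))
print("edges only without ICS:", len(e_no - e_ics))
```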
Public Health Relevance
Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using these data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.
Crowdsourcing measurement of mobile Internet performance, now the engine for Mobiperf.
Mobilyzer is a collaboration between Morley Mao’s group at the University of Michigan and David Choffnes’ group at Northeastern University.
Mobilyzer provides the following components:
Measurements, analysis, and system designs to reveal how the Internet’s most commonly used trust systems operate (and malfunction) in practice, and how we can make them more secure.
Research on the SSL/TLS Ecosystem
Every day, we use Secure Sockets Layer (SSL) and Transport Layer Security (TLS) to secure our Internet transactions such as banking, e-mail, and e-commerce. Along with a public key infrastructure (PKI), they allow our computers to automatically verify that our sensitive information (e.g., credit card numbers and passwords) is hidden from eavesdroppers and sent only to trustworthy servers.
In mid-April, 2014, a software vulnerability called Heartbleed was announced. It allows malicious users to capture information that would allow them to masquerade as trusted servers and potentially steal sensitive information from unsuspecting users. The PKI provides multiple ways to prevent such an attack from occurring, and we should expect Web site operators to use these countermeasures.
In this study, we found that the overwhelming majority of sites (more than 73%) did not do so, meaning visitors to their sites are vulnerable to attacks such as identity theft. Further, the majority of sites that attempted to address the problem (60%) did so in a way that leaves customers vulnerable.
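One simple check in the spirit of this measurement study is to ask whether a site's current certificate was issued after the Heartbleed disclosure date; the sketch below does only that, and omits the revocation, reissue-versus-rekey, and key-reuse analyses that a real study requires. The host name is a placeholder.

```python
# Hedged sketch: flag certificates whose validity period began before the Heartbleed
# disclosure (a warning sign that the certificate was never reissued afterwards).
import ssl
import datetime
from cryptography import x509

HEARTBLEED_DISCLOSURE = datetime.datetime(2014, 4, 7)

def issued_after_heartbleed(host: str, port: int = 443) -> bool:
    pem = ssl.get_server_certificate((host, port))       # fetch the leaf certificate
    cert = x509.load_pem_x509_certificate(pem.encode())
    return cert.not_valid_before > HEARTBLEED_DISCLOSURE

print(issued_after_heartbleed("example.com"))            # placeholder host
```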
Practical and powerful privacy for network communication (led by Stevens Le Blond at MPI).
This project entails several threads that cover Internet measurement, modeling, and experimentation.
Understanding the geographic nature of Internet paths and their implications for performance, privacy and security.
This study sheds light on this issue by measuring how and when Internet traffic traverses national boundaries. To do this, we ask you to run our browser applet, which visits various popular websites, measures the paths taken, and identifies their locations. By running our tool, you will help us understand if and how Internet paths traverse national boundaries, even when two endpoints are in the same country. And we’ll show you these paths, helping you to understand where your Internet traffic goes.
This project will develop methodologies and tools for conducting algorithm audits. An algorithm audit uses controlled experiments to examine an algorithmic system, such as an online service or big data information archive, and ascertain (1) how it functions, and (2) whether it may cause harm.
Examples of documented harms by algorithms include discrimination, racism, and unfair trade practices. Although there is rising awareness of the potential for algorithmic systems to cause harm, actually detecting this harm in practice remains a key challenge. Given that most algorithms of concern are proprietary and non-transparent, there is a clear need for methods to conduct black-box analyses of these systems. Numerous regulators and governments have expressed concerns about algorithms, as well as a desire to increase transparency and accountability in this area.
This research will develop methodologies to audit algorithms in three domains that impact many people: online markets, hiring websites, and financial services. Auditing algorithms in these three domains will require solving fundamental methodological challenges, such as how to analyze systems with large, unknown feature sets, and how to estimate feature values without ground-truth data. To address these broad challenges, the research will draw on insights from prior experience auditing personalization algorithms. Additionally, each domain also brings unique challenges that will be addressed individually. For example, novel auditing tools will be constructed that leverage extensive online and offline histories. These new tools will allow examination of systems that were previously inaccessible to researchers, including financial services companies. Methodologies, open-source code, and datasets will be made available to other academic researchers and regulators. This project includes two integrated educational objectives: (1) to create a new computer science course on big data ethics, teaching how to identify and mitigate harmful side-effects of big data technologies, and (2) production of web-based versions of the auditing tools that are designed to be accessible and informative to the general public, that will increase transparency around specific, prominent algorithmic systems, as well as promote general education about the proliferation and impact of algorithmic systems.
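As a minimal sketch of the black-box audit methodology, assuming price quotes have already been collected for the same queries under two controlled personas (here simulated rather than crawled), a paired statistical test can flag systematic differences; the prices, the injected skew, and the test choice are illustrative assumptions.

```python
# Toy paired audit: compare quotes for identical queries issued by two personas.
# In a real audit these arrays would come from instrumented, rate-limited crawls.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
prices_profile_a = rng.normal(100, 10, 50)                  # baseline persona
prices_profile_b = prices_profile_a + rng.normal(2, 1, 50)  # simulated ~2% markup

t_stat, p_value = stats.ttest_rel(prices_profile_a, prices_profile_b)
mean_diff = np.mean(prices_profile_b - prices_profile_a)
print(f"mean price difference: {mean_diff:.2f}, paired t-test p = {p_value:.4f}")
```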
This project aims to investigate the development of procedural narrative systems using crowd-sourcing methods.
This project will create a framework for simulation-based training that supports a learner’s exploration and replay and exercises theory of mind skills, in order to deliver the full promise of social skills training. The term Theory of Mind (ToM) refers to the human capacity to use beliefs about the mental processes and states of others. In order to train social skills, there has been a rapid growth in narrative-based simulations that allow learners to role-play social interactions. However, the design of these systems often constrains the learner’s ability to explore different behaviors and their consequences. Attempts to support more generative experiences face a combinatorial explosion of alternative paths through the interaction, presenting an overwhelming challenge for developers to create content for all the alternatives. Instead, training systems are often designed around exercising specific behaviors in specific situations, hampering the learning of more general skills in using ToM. This research seeks to solve this problem through three contributions: (1) a new model for conceptualizing narrative and role-play experiences that addresses generativity, (2) new methods that facilitate content creation for those generative experiences, and (3) an approach that embeds theory of mind training in the experience to allow for better learning outcomes. This research is applicable to complex social skill training across a range of situations: in schools, communities, the military, police, homeland security, and ethnic conflict.
The research begins with a paradigm shift that re-conceptualizes social skills simulation as a learner rehearsing a role instead of performing a role. This shift will exploit Stanislavsky’s Active Analysis (AA), a performance rehearsal technique that explicitly exercises Theory of Mind skills. Further, AA’s decomposition into short rehearsal scenes can break the combinatorial explosion over long narrative arcs that exacerbates content creation for social training systems. The research will then explore using behavior fitting and machine learning techniques on crowd-sourced data as a way to semi-automate the development of multi-agent simulations for social training. The research will assess quantitatively and qualitatively the ability of this approach to (a) provide experiences that support exploration and foster ToM use and (b) support acquiring crowd-sourced data that can be used to craft those experiences using automatic methods.
This project is unique in combining cutting-edge work in modeling theory of mind, interactive environments, performance rehearsal, and crowd sourcing. The multidisciplinary collaboration will enable development of a methodology for creating interactive experiences that pushes the boundaries of the current state of the art in social skill training. Reliance on crowd sourcing provides an additional benefit of being able to elicit culturally specific behavior patterns by selecting the relevant crowd, allowing for both culture-specific and cross-cultural training content.
Evidence Based Medicine (EBM) aims to systematically use the best available evidence to inform medical decision making. This paradigm has revolutionized clinical practice over the past 30 years. The most important tool for EBM is the systematic review, which provides a rigorous, comprehensive and transparent synthesis of all current evidence concerning a specific clinical question. These syntheses enable decision makers to consider the entirety of the relevant published evidence.
Systematic reviews now inform everything from national health policy to bedside care. But producing these reviews requires researchers to identify the entirety of the relevant literature and then extract from this the information to be synthesized; a hugely laborious and expensive exercise. Moreover, the unprecedented growth of the biomedical literature has increased the burden on those trying to make sense of the published evidence base. Concurrently, more systematic reviews are being conducted every year to synthesize the expanding evidence base; tens of millions of dollars are spent annually conducting these reviews.
RobotReviewer aims to mitigate this issue by (semi-) automating evidence synthesis using machine learning and natural language processing.
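A minimal sketch of the kind of model such systems rely on, assuming a toy labeled corpus, is a text classifier that flags abstracts likely to describe randomized controlled trials; the example sentences, labels, and model choice below are placeholders, not RobotReviewer's actual models or training data.

```python
# Toy abstract-screening classifier: TF-IDF features plus logistic regression.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "Patients were randomly assigned to drug or placebo in this double blind trial",
    "We randomized 200 participants to intervention and control arms",
    "A retrospective chart review of outcomes in a single center cohort",
    "Case report of an unusual presentation of influenza",
]
labels = [1, 1, 0, 0]   # 1 = randomized controlled trial, 0 = not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(abstracts, labels)
print(model.predict(["Participants were randomly allocated to two groups"]))
```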
View the RobotReviewer page to read more.
Software development is facing a paradigm shift towards ubiquitous concurrent programming, giving rise to software that is among the most complex technical artifacts ever created by humans. Concurrent programming presents several risks and dangers for programmers who are overwhelmed by puzzling and irreproducible concurrent program behavior, and by new types of bugs that elude traditional quality assurance techniques. If this situation is not addressed, we are drifting into an era of widespread unreliable software, with consequences ranging from collapsed programmer productivity, to catastrophic failures in mission-critical systems.
This project will take steps against a concurrent software crisis, by producing verification technology that assists non-specialist programmers in detecting concurrency errors, or demonstrating their absence. The proposed technology will confront the concurrency explosion problem that verification methods often suffer from. The project’s goal is a framework under which the analysis of programs with unbounded concurrency resources (such as threads of execution) can be soundly reduced to an analysis under a small constant resource bound, making the use of state space explorers practical. As a result, the project will largely eliminate the impact of unspecified computational resources as the major cause of complexity in analyzing concurrent programs. By developing tools for detecting otherwise undetectable misbehavior and vulnerabilities in concurrent programs, the project will contribute its part to averting a looming software quality crisis.
The research will enable the auditing and control of personally identifiable information leaks, addressing the key challenges of how to identify and control PII leaks when users’ PII is not known a priori, nor is the set of apps or devices that leak this information. First, to enable auditing through improved transparency, we are investigating how to use machine learning to reliably identify PII from network flows, and identify algorithms that incorporate user feedback to adapt to the changing landscape of privacy leaks. Second, we are building tools that allow users to control how their information is (or not) shared with other parties. Third, we are investigating the extent to which our approach extends to privacy leaks from IoT devices. Besides adapting our system to the unique format for leaks across a variety of IoT devices, our work investigates PII exposed indirectly through time-series data produced by IoT-generated monitoring.
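A much-simplified sketch of the auditing idea, using regular expressions and a list of known identifiers rather than the learned classifiers described above, might look like this; the identifiers, patterns, and example flow are hypothetical.

```python
# Toy PII scan over a reconstructed flow payload: match known user identifiers and a
# couple of structural patterns. A deployed auditor would use trained classifiers and
# handle encrypted and encoded traffic.
import re

known_pii = {"email": "alice@example.com", "device_id": "a1b2c3d4"}   # hypothetical values
patterns = {"email_like": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
            "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}

def flag_pii(payload: str):
    hits = [name for name, value in known_pii.items() if value in payload]
    hits += [name for name, rx in patterns.items() if rx.search(payload)]
    return hits

flow = "GET /track?uid=a1b2c3d4&contact=alice@example.com HTTP/1.1"
print(flag_pii(flow))   # -> ['email', 'device_id', 'email_like']
```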
The purpose of this project is to develop a conversational agent system that counsels terminally ill patients in order to alleviate their suffering and improve their quality of life.
Although many interventions have now been developed to address palliative care for specific chronic diseases, little has been done to address the overall quality of life for older adults with serious illness, spanning not only the functional aspects of symptom and medication management, but the affective aspects of suffering. In this project, we are developing a relational agent to counsel patients at home about medication adherence, stress management, advanced care planning, and spiritual support, and to provide referrals to palliative care services when needed.
When deployed on smartphones, virtual agents have the potential to deliver life-saving advice regarding emergency medical conditions, as well as provide a convenient channel for health education to improve the safety and efficacy of pharmacotherapy.
We are developing a smartphone-based virtual agent that provides counseling to patients with Atrial Fibrillation. Atrial Fibrillation is a highly prevalent heart rhythm disorder and is known to significantly increase the risk of stroke, heart failure and death. In this project, a virtual agent is deployed in conjunction with a smartphone-based heart rhythm monitor that lets patients obtain real-time diagnostic information on the status of their atrial fibrillation and determine whether immediate action may be needed.
This project is a collaboration with University of Pittsburgh Medical Center.
The last decade has seen an enormous increase in our ability to gather and manage large amounts of data; business, healthcare, education, economy, science, and almost every aspect of society are accumulating data at unprecedented levels. The basic premise is that by having more data, even if uncertain and of lower quality, we are also able to make better-informed decisions. To make any decisions, we need to perform “inference” over the data, i.e. to either draw new conclusions, or to find support for existing hypotheses, thus allowing us to favor one course of action over another. However, general reasoning under uncertainty is highly intractable, and many state-of-the-art systems today perform approximate inference by reverting to sampling. Thus for many modern applications (such as information extraction, knowledge aggregation, question-answering systems, computer vision, and machine intelligence), inference is a key bottleneck, and new methods for tractable approximate inference are needed.
This project addresses the challenge of scaling inference by generalizing two highly scalable approximate inference methods and complementing them with scalable methods for parameter learning that are “approximation-aware.” Thus, instead of treating the (i) learning and the (ii) inference steps separately, this project uses the approximation methods developed for inference also for learning the model. The research hypothesis is that this approach increases the overall end-to-end prediction accuracy while simultaneously increasing scalability. Concretely, the project develops the theory and a set of scalable algorithms and optimization methods for at least the following four sub-problems: (1) approximating general probabilistic conjunctive queries with standard relational databases; (2) learning the probabilities in uncertain databases based on feedback on rankings of output tuples from general queries; (3) approximating the exact probabilistic inference in undirected graphical models with linearized update equations; and (4) complementing the latter with a robust framework for learning linearized potentials from partially labeled data.
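To make the inference problem concrete, the toy sketch below estimates, by plain Monte Carlo sampling, the probability that a small conjunctive query holds over a probabilistic database with independent tuples. The relations and probabilities are invented, and the project's contributions lie in far more scalable, approximation-aware methods; this only illustrates the problem setting.

```python
# Estimate P(exists x: R(x) and S(x)) when each tuple is independently present
# with the listed marginal probability, and compare against the exact answer.
import random

R = {"a": 0.9, "b": 0.5}            # tuple -> probability of being present
S = {"a": 0.4, "b": 0.7, "c": 0.2}

def query_holds(sample_r, sample_s):
    return any(x in sample_s for x in sample_r)

def estimate(n_samples=100_000, seed=0):
    random.seed(seed)
    hits = 0
    for _ in range(n_samples):
        sr = {x for x, p in R.items() if random.random() < p}
        ss = {x for x, p in S.items() if random.random() < p}
        hits += query_holds(sr, ss)
    return hits / n_samples

# Exact answer: 1 - (1 - 0.9*0.4) * (1 - 0.5*0.7) = 0.584
print(estimate(), 1 - (1 - 0.36) * (1 - 0.35))
```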
Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules and metabolites in complex biological mixtures. The technology rapidly evolves and generates datasets of increasingly large complexity and size. This rapid evolution must be matched by an equally fast evolution of statistical methods and tools developed for analysis of these data. Ideally, new statistical methods should leverage the rich resources available from over 12,000 packages implemented in the R programming language and its Bioconductor project. However, technological limitations now hinder their adoption for mass spectrometric research. In response, the project ROCKET builds an enabling technology for working with large mass spectrometric datasets in R and rapidly developing new algorithms, while benefiting from advancements in other areas of science. It also offers an opportunity for the recruitment and retention of Native American students to work with R-based technology and research, and helps prepare them for careers in STEM.
Instead of implementing yet another data processing pipeline, ROCKET builds an enabling technology for extending the scalability of R, and streamlining manipulations of large files in complex formats. First, to address the diversity of the mass spectrometric community, ROCKET supports scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory), as well as scaling up (i.e., executing a workflow on a cloud or on a multiprocessor). Second, ROCKET generates an efficient mixture of R and target code which is compiled in the background for the particular deployment platform. By ensuring compatibility with mass spectrometry-specific open data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general analytical methods. Therefore, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists, and create a blueprint for a new paradigm for R-based computing with large datasets. The outcomes of the project will be documented and made publicly available at https://olga-vitek-lab.khoury.northeastern.edu/.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Northeastern University proposes to organize a Summer School ‘Big Data and Statistics for Bench Scientists.’ The Summer School will train life scientists and computational scientists in designing and analyzing large-scale experiments relying on proteomics, metabolomics, and other high-throughput biomolecular assays. The training will enhance the effectiveness and reproducibility of biomedical research, such as discovery of diagnostic biomarkers for early diagnosis of disease, or prognostic biomarkers for predicting therapy response.
Northeastern University requests funds for a Summer School entitled Big Data and Statistics for Bench Scientists. The target audience for the School is graduate and post-graduate life scientists who work primarily in wet labs and who generate large datasets. Unlike other educational efforts that emphasize genomic applications, this School targets scientists working with other experimental technologies. Mass spectrometry-based proteomics and metabolomics are our main focus; however, the School is also appropriate for scientists working with other assays, e.g., nuclear magnetic resonance spectroscopy (NMR), protein arrays, etc. This large community has been traditionally under-served by educational efforts in computation and statistics. This proposal aims to fill this void. The Summer School is motivated by feedback from smaller short courses previously co-organized or co-instructed by the PI, and will cover theoretical and practical aspects of the design and analysis of large-scale experimental datasets. The Summer School will have a modular format, with eight 20-hour modules scheduled in two parallel tracks during two consecutive weeks. Each module can be taken independently. The planned modules are (1) Processing raw mass spectrometric data from proteomic experiments using Skyline, (2) Beginner’s R, (3) Processing raw mass spectrometric data from metabolomic experiments using OpenMS, (4) Intermediate R, (5) Beginner’s guide to statistical experimental design and group comparison, (6) Specialized statistical methods for detecting differentially abundant proteins and metabolites, (7) Statistical methods for discovery of biomarkers of disease, and (8) Introduction to systems biology and data integration. Each module will introduce the necessary statistical and computational methodology, and contain extensive practical hands-on sessions. Each module will be organized by instructors with extensive interdisciplinary teaching experience, and supported by several teaching assistants. We anticipate the participation of 104 scientists, each taking on average 2 modules. Funding is requested for three yearly offerings of the School, and includes funds to provide US participants with 62 travel fellowships per year and 156 registration fee waivers per module. All the course materials, including videos of the lectures and of the practical sessions, will be publicly available free of charge.
Different individuals experience the same events in vastly different ways, owing to their unique histories and psychological dispositions. For someone with social fears and anxieties, the mere thought of leaving the home can induce a feeling of panic. Conversely, an experienced mountaineer may feel quite comfortable balancing on the edge of a cliff. This variation of perspectives is captured by the term subjective experience. Despite its centrality and ubiquity in human cognition, it remains unclear how to model the neural bases of subjective experience. The proposed work will develop new techniques for statistical modeling of individual variation, and apply these techniques to a neuroimaging study of the subjective experience of fear. Together, these two lines of research will yield fundamental insights into the neural bases of fear experience. More generally, the developed computational framework will provide a means of comparing different mathematical hypotheses about the relationship between neural activity and individual differences. This will enable investigation of a broad range of phenomena in psychology and cognitive neuroscience.
The proposed work will develop a new computational framework for modeling individual variation in neuroimaging data, and use this framework to investigate the neural bases of one powerful and societally meaningful subjective experience, namely, of fear. Fear is a particularly useful assay because it involves variation across situational contexts (spiders, heights, and social situations), and dispositions (arachnophobia, acrophobia, and agoraphobia) that combine to create subjective experience. In the proposed neuroimaging study, participants will be scanned while watching videos that induce varying levels of arousal. To characterize individual variation in this neuroimaging data, the investigators will leverage advances in deep probabilistic programming to develop probabilistic variants of factor analysis models. These models infer a low-dimensional feature vector, also known as an embedding, for each participant and stimulus. A simple neural network models the relationship between embeddings and the neural response. This network can be trained in a data-driven manner and can be parameterized in a variety of ways, depending on the experimental design, or the neurocognitive hypotheses that are to be incorporated into the model. This provides the necessary infrastructure to test different neural models of fear. Concretely, the investigators will compare a model in which fear has its own unique circuit (i.e. neural signature or biomarker) to subject- or situation-specific neural architectures. More generally, the developed framework can be adapted to model individual variation in neuroimaging studies in other experimental settings.
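A minimal sketch of the embedding idea, using classical factor analysis on simulated voxel responses instead of the deep probabilistic-programming models described above, is given below; the dimensions, the generative simulation, and the per-participant (rather than per-stimulus) embeddings are simplifying assumptions.

```python
# Toy version of the modeling idea: infer a low-dimensional embedding per participant
# from simulated voxel data, then inspect its shape. Real models would also embed
# stimuli and use a (possibly nonlinear) neural-network mapping.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_participants, n_voxels, k = 20, 500, 3
true_embed = rng.normal(size=(n_participants, k))      # latent participant factors
loadings = rng.normal(size=(k, n_voxels))
data = true_embed @ loadings + rng.normal(0, 0.5, size=(n_participants, n_voxels))

fa = FactorAnalysis(n_components=k, random_state=0)
embeddings = fa.fit_transform(data)                    # one k-dimensional vector per participant
print(embeddings.shape)                                # (20, 3)
```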
Easy Alliance, a nonprofit initiative, has been instituted to solve complex, long term challenges in making the digital world a more accessible place for everyone.
Computer networking and the internet have revolutionized our societies, but are plagued with security problems that are difficult to tame. Serious vulnerabilities are constantly being discovered in network protocols that affect the work and lives of millions. Even some protocols that have been carefully scrutinized by their designers and by the computer engineering community have been shown to be vulnerable afterwards. Why is developing secure protocols so hard? This project seeks to address this question by developing novel design and implementation methods for network protocols that make it possible to identify and fix security vulnerabilities semi-automatically. The project serves the national interest, as cyber-security costs the United States many billions of dollars annually. Besides making technical advances to the field, this project will also have broader impacts in education and curriculum development, as well as in helping to bridge the gap between several somewhat fragmented scientific communities working on the problem.
Technically, the project will follow a formal approach building upon a novel combination of techniques from security modeling, automated software synthesis, and program analysis to bridge the gap between an abstract protocol design and a low-level implementation. In particular, the methodology of the project will be based on a new formal behavioral model of software that explicitly captures how the choice of a mapping from a protocol design onto an implementation platform may result in different security vulnerabilities. Building on this model, this project will provide (1) a modeling approach that cleanly separates the descriptions of an abstract design from a concrete platform, and allows the platform to be modeled just once and reused, (2) a synthesis tool that will automatically construct a secure mapping from the abstract protocol to the appropriate choice of platform features, and (3) a program analysis tool that leverages platform-specific information to check that an implementation satisfies a desired property of the protocol. In addition, the project will develop a library of reusable platform models, and demonstrate the effectiveness of the methodology in a series of case studies.
Most computer programs process vast amounts of numerical data. Unfortunately, due to space and performance demands, computer arithmetic comes with its own rules. Making matters worse, different computers have different rules: while there are standardization efforts, efficiency considerations give hardware and compiler designers much freedom to bend the rules to their taste. As a result, the outcome of a computer calculation depends not only on the input, but also on the particular machine and environment in which the calculation takes place. This makes programs brittle and un-portable, and causes them to produce untrusted results. This project addresses these problems, by designing methods to detect inputs to computer programs that exhibit too much platform dependence, and to repair such programs, by making their behavior more robust.
Technical goals of this project include: (i) automatically warning users of disproportionately platform-dependent results of their numeric algorithms; (ii) repairing programs with platform instabilities; and (iii) proving programs stable against platform variations. Platform-independence of numeric computations is a form of robustness whose lack undermines the portability of program semantics. This project is one of the few to tackle the question of non-determinism in the specification (IEEE 754) of the theory (floating-point arithmetic) that machines are using today. This work requires new abstractions that soundly approximate the set of values of a program variable against a variety of compiler and hardware behaviors and features that may not even be known at analysis time. The project involves graduate and undergraduate students.
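A concrete instance of this brittleness, reproduced in the sketch below, is that the same mathematical sum evaluated in two different orders (as a compiler or parallel runtime might choose) yields different floating-point results; the specific values are chosen only to make the effect obvious.

```python
# The same set of numbers, summed in different orders, gives different results in
# IEEE double precision: left-to-right summation loses the 1.0 terms entirely.
import math

xs = [1e16, 1.0, -1e16] * 1000

left_to_right = sum(xs)                         # the 1e16 terms absorb every 1.0 -> 0.0
small_first = sum(sorted(xs, key=abs))          # adding small values first -> 1000.0
reference = math.fsum(xs)                       # correctly rounded sum -> 1000.0

print(left_to_right, small_first, reference)
```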
Side-channel attacks (SCA) have been a realistic threat to various cryptographic implementations that do not feature dedicated protection. While many effective countermeasures have been found and applied manually, they are application-specific and labor intensive. In addition, security evaluation tends to be incomplete, with no guarantee that all the vulnerabilities in the target system have been identified and addressed by such manual countermeasures. This SaTC project aims to shift the paradigm of side-channel attack research, and proposes to build an automation framework for information leakage analysis, multi-level countermeasure application, and formal security evaluation against software side-channel attacks.
The proposed framework provides common sound metrics for information leakage, methodologies for automatic countermeasures, and formal and thorough evaluation methods. The approach unifies power analysis and cache-based timing attacks into one framework. It defines new metrics of information leakage and uses them to automatically identify possible leakage of a given cryptosystem at an early stage with no implementation details. The conventional compilation process is extended along the new dimension of optimizing for security, to generate side-channel-resilient code and ensure its secure execution at run-time. Side-channel security is guaranteed to be at a certain confidence level with formal methods. The three investigators on the team bring complementary expertise to this challenging interdisciplinary research, to develop the advanced automation framework and the associated software tools, metrics, and methodologies. The outcome significantly benefits security system architects and software developers alike, in their quest to build verifiable SCA security into a broad range of applications they design. The project also builds new synergy among fundamental statistics, formal methods, and practical system security. The automation tools, when introduced in new courses developed by the PIs, will greatly improve students’ hands-on experience. The project also leverages the experiential education model of Northeastern University to engage undergraduates, women, and minority students in independent research projects.
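The class of leak being targeted can be illustrated with a deliberately insecure, early-exit comparison whose running time depends on the secret; this is a toy timing example in Python, not the power or cache channels the framework analyzes, and the secret and timing constants are placeholders.

```python
# Toy timing side channel: an early-exit string comparison leaks how many leading
# characters of a guess are correct. The sleep call exaggerates per-character work
# so the effect is visible; real attacks measure much subtler differences.
import time

SECRET = "hunter2pass"   # placeholder secret

def insecure_check(guess: str) -> bool:
    for a, b in zip(SECRET, guess):
        if a != b:
            return False          # early exit: running time depends on the matching prefix
        time.sleep(1e-4)
    return len(guess) == len(SECRET)

def measure(guess: str, reps: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(reps):
        insecure_check(guess)
    return time.perf_counter() - start

for guess in ["aaaaaaaaaaa", "huaaaaaaaaa", "huntaaaaaaa"]:
    print(guess, f"{measure(guess):.4f}s")      # times grow with the correct prefix length
```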
Nontechnical Description: Artificial intelligence, especially deep learning, has enabled many breakthroughs in both academia and industry. This project aims to create a generative and versatile design approach based on novel deep learning techniques to realize integrated, multi-functional photonic systems, and provide proof-of-principle demonstrations in experiments. Compared with traditional approaches using extensive numerical simulations or inverse design algorithms, deep learning can uncover the highly complicated relationship between a photonic structure and its properties from the dataset, and hence substantially accelerate the design of novel photonic devices that simultaneously encode distinct functionalities in response to the designated wavelength, polarization, angle of incidence and other parameters. Such multi-functional photonic systems have important applications in many areas, including optical imaging, holographic display, biomedical sensing, and consumer photonics with high efficiency and fidelity, to benefit the public and the nation. The integrated education plan will considerably enhance outreach activities and educate students in grades 7-12, empowered by the successful experience and partnership previously established by the PIs. Graduate and undergraduate students participating in the project will learn the latest developments in the multidisciplinary fields of photonics, deep learning and advanced manufacturing, and gain real-world knowledge by engaging industrial collaborators in tandem with Northeastern University’s renowned cooperative education program.
Technical Description: Metasurfaces, which are two-dimensional metamaterials consisting of a planar array of subwavelength designer structures, have created a new paradigm to tailor optical properties in a prescribed manner, promising superior integrability, flexibility, performance and reliability to advance photonics technologies. However, so far almost all metasurface designs rely on time-consuming numerical simulations or stochastic searching approaches that are limited in a small parameter space. To fully exploit the versatility of metasurfaces, it is highly desired to establish a general, functionality-driven methodology to efficiently design metasurfaces that encompass distinctly different optical properties and performances within a single system. The objective of the project is to create and demonstrate a high-efficiency, two-level design approach enabled by deep learning, in order to realize integrated, multi-functional meta-systems. Proper deep learning methods, such as Conditional Variational Auto-Encoder and Deep Bidirectional-Convolutional Network, will be investigated, innovatively reformulated and tailored to apply at the single-element level and the large-scale system level in combination with topology optimization and genetic algorithm. Such a generative design approach can directly and automatically identify the optimal structures and configurations out of the full parameter space. The designed multi-functional optical meta-systems will be fabricated and characterized to experimentally confirm their performances. The success of the project will produce transformative photonic architectures to manipulate light on demand.
Critical infrastructure systems are increasingly reliant on one another for their efficient operation. This research will develop a quantitative, predictive theory of network resilience that takes into account the interactions between built infrastructure networks, and the humans and neighborhoods that use them. This framework has the potential to guide city officials, utility operators, and public agencies in developing new strategies for infrastructure management and urban planning. More generally, these efforts will untangle the roles of network structure and network dynamics that enable interdependent systems to withstand, recover from, and adapt to perturbations. This research will be of interest to a variety of other fields, from ecology to cellular biology.
The project will begin by cataloging three built infrastructures and known interdependencies (both physical and functional) into a “network of networks” representation suitable for modeling. A key part of this research lies in also quantifying the interplay between built infrastructure and social systems. As such, the models will incorporate community-level behavioral effects through urban “ecometrics” — survey-based empirical data that capture how citizens and neighborhoods utilize city services and respond during emergencies. This realistic accounting of infrastructure and its interdependencies will be complemented by realistic estimates of future hazards that it may face. The core of the research will use network-based analytical and computational approaches to identify reduced-dimensional representations of the (high-dimensional) dynamical state of interdependent infrastructure. Examining how these resilience metrics change under stress to networks at the component level (e.g. as induced by inundation following a hurricane) will allow identification of weak points in existing interdependent infrastructure. The converse scenario–in which deliberate alterations to a network might improve resilience or hasten recovery of already-failed systems–will also be explored.
Students will work on building a library of cache-oblivious data structures and measuring their performance under different workloads. We will first implement serial versions of the algorithms, and then implement parallel versions of several known cache-oblivious data structures and algorithms.
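The specific contents of the library are not listed here; as a sketch of the kind of algorithm involved, the following is the classic cache-oblivious matrix transpose, which recursively splits the larger dimension so that, at some recursion depth, blocks fit in every level of the cache without the code ever knowing the cache size. The course implementations would be in C/C++ as described in the training plan below; this NumPy version only illustrates the recursive structure, and the base-case cutoff is an arbitrary constant, not a tuned cache parameter.

```python
# Sketch of the classic cache-oblivious matrix transpose: recursively split the
# matrix until blocks are small, so every level of the memory hierarchy is used
# well without knowing its size.
import numpy as np

BASE = 16  # base-case block size (a recursion cutoff, not a cache parameter)

def transpose(src, dst, r0, c0, rows, cols):
    """Write the transpose of src[r0:r0+rows, c0:c0+cols] into dst."""
    if rows <= BASE and cols <= BASE:
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                dst[j, i] = src[i, j]
    elif rows >= cols:
        half = rows // 2
        transpose(src, dst, r0, c0, half, cols)
        transpose(src, dst, r0 + half, c0, rows - half, cols)
    else:
        half = cols // 2
        transpose(src, dst, r0, c0, rows, half)
        transpose(src, dst, r0, c0 + half, rows, cols - half)

a = np.arange(100 * 120).reshape(100, 120)
b = np.empty((120, 100), dtype=a.dtype)
transpose(a, b, 0, 0, 100, 120)
assert (b == a.T).all()
```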
The training plan is to bring in students (ideally in pairs) who are currently sophomores or juniors and have taken a Computer Systems course using C/C++. Students need not have any previous research experience, but will generally have experience using threads (e.g., pthreads) and have taken an algorithms course.
[1] (2 weeks) Students will first work through the basics of cache-oblivious algorithms and data structures from: http://erikdemaine.org/papers/BRICS2002/paper.pdf
[2] (2 weeks) Students will then work through select lectures and exercises on caches from here: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/
[3] (1 week) Students will then learn the basics of profiling.
[4] (2 weeks) Next, students will implement a few data structures and algorithms.
[5] (4 weeks) Students will then work to find good real-world benchmarks, mining GitHub repositories for code that suffers from performance problems related to false sharing.
[6] The remaining time will be spent writing up and polishing the collected results.
The key research questions we are investigating in the Mon(IoT)r research group concern what personally identifiable information (PII) IoT devices expose on the network, how well they protect it, and how easily user privacy can be attacked.
Our methodology entails recording and analyzing all network traffic generated by a variety of IoT devices that we have acquired. We not only inspect traffic for plaintext PII, but also attempt to man-in-the-middle SSL connections to understand the contents of encrypted flows. Our analysis allows us to uncover how IoT devices currently protect users’ PII and to determine how easy or difficult it is to mount attacks against user privacy.
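The group's actual tooling is not described here; purely as an illustration of the plaintext-inspection step, the sketch below scans payloads in a packet capture for hypothetical identifier strings using scapy. The capture file name and the PII strings are invented, and encrypted flows would first require the man-in-the-middle step mentioned above.

```python
# Rough illustration of scanning captured IoT traffic for plaintext PII.
# The capture file and the PII strings below are hypothetical examples.
from scapy.all import rdpcap, Raw, IP

PII_STRINGS = [b"user@example.com", b"AA:BB:CC:DD:EE:FF", b"serial=12345"]

def find_plaintext_pii(pcap_path):
    hits = []
    for pkt in rdpcap(pcap_path):
        if pkt.haslayer(Raw) and pkt.haslayer(IP):
            payload = bytes(pkt[Raw].load)
            for needle in PII_STRINGS:
                if needle in payload:
                    hits.append((pkt[IP].src, pkt[IP].dst, needle.decode()))
    return hits

for src, dst, what in find_plaintext_pii("iot_device_capture.pcap"):
    print(f"{what} sent in cleartext from {src} to {dst}")
```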
Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify, effectively making it look as if you are using those apps. As a result, if an Internet service provider (ISP) tries to slow down YouTube, Wehe’s replayed traffic should receive the same treatment. We then send the same app’s Internet traffic, but with the content replaced by randomized bytes, which prevents the ISP from classifying the traffic as belonging to the app. Our hypothesis is that the randomized traffic will not cause an ISP to conduct application-specific differentiation (e.g., throttling or blocking), but the original traffic will. We repeat these tests several times to rule out noise from bad network conditions, and tell you at the end whether your ISP is giving different performance to an app’s network traffic.
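Wehe's actual detection statistics are not spelled out here; as a simplified illustration of the comparison, the sketch below converts per-interval byte counts from the two replays into throughput samples and flags likely differentiation when the original replay is consistently much slower. The sample data, the 0.8 ratio, and the median-based test are all invented for illustration.

```python
# Simplified illustration of comparing an "original" app replay against a
# "randomized bytes" replay. Data, threshold, and test choice are illustrative.
from statistics import median

def throughputs(byte_counts, interval_s=1.0):
    # Convert per-interval byte counts into Mbps samples.
    return [8 * b / interval_s / 1e6 for b in byte_counts]

def likely_differentiation(original_mbps, randomized_mbps, ratio=0.8):
    """Flag throttling if the original replay's median throughput is well below
    the randomized replay's across repeated runs."""
    return median(original_mbps) < ratio * median(randomized_mbps)

original = throughputs([1.2e6, 1.1e6, 1.3e6, 1.2e6])      # hypothetical samples
randomized = throughputs([4.8e6, 5.1e6, 4.9e6, 5.0e6])

if likely_differentiation(original, randomized):
    print("ISP appears to treat this app's traffic differently")
```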
Type-safe programming languages report errors when a program applies operations to data of the wrong type—e.g., a list-length operation expects a list, not a number—and they come in two flavors: dynamically typed (or untyped) languages, which catch such type errors at run time, and statically typed languages, which catch type errors at compile time before the program is ever run. Dynamically typed languages are well suited for rapid prototyping of software, while static typing becomes important as software systems grow since it offers improved maintainability, code documentation, early error detection, and support for compilation to faster code. Gradually typed languages bring together these benefits, allowing dynamically typed and statically typed code—and more generally, less precisely and more precisely typed code—to coexist and interoperate, thus allowing programmers to slowly evolve parts of their code base from less precisely typed to more precisely typed. To ensure safe interoperability, gradual languages insert runtime checks when data with a less precise type is cast to a more precise type. Gradual typing has seen high adoption in industry, in languages like TypeScript, Hack, Flow, and C#. Unfortunately, current gradually typed languages fall short in three ways. First, while normal static typing provides reasoning principles that enable safe program transformations and optimizations, naive gradual systems often do not. Second, gradual languages rarely guarantee graduality, a reasoning principle helpful to programmers, which says that making types more precise in a program merely adds in checks and the program otherwise behaves as before. Third, time and space efficiency of the runtime casts inserted by gradual languages remains a concern. This project addresses all three of these issues. The project’s novelties include: (1) a new approach to the design of gradual languages by first codifying the desired reasoning principles for the language using a program logic called Gradual Type Theory (GTT), and from that deriving the behavior of runtime casts; (2) compiling to a non-gradual compiler intermediate representation (IR) in a way that preserves these principles; and (3) the ability to use GTT to reason about the correctness of optimizations and efficient implementation of casts. The project has the potential for significant impact on industrial software development since gradually typed languages provide a migration path from existing dynamically typed codebases to more maintainable statically typed code, and from traditional static types to more precise types, providing a mechanism for increased adoption of advanced type features. The project will also have impact by providing infrastructure for future language designs and investigations into improving the performance of gradual typing.
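The runtime checks described above can be pictured with a small toy example: the sketch below (plain Python, not the project's Gradual Type Theory) shows the kind of cast a gradual language inserts when a dynamically typed value flows into code that assumes the more precise type List[int]. The function and variable names are invented for illustration.

```python
# Toy illustration of the runtime cast a gradually typed language inserts when a
# less precisely typed value flows into more precisely typed code.

class CastError(TypeError):
    pass

def cast_to_int_list(value, blame="boundary"):
    """Check a dynamically typed value against the more precise type List[int]."""
    if not isinstance(value, list) or not all(isinstance(x, int) for x in value):
        raise CastError(f"value {value!r} does not have type List[int] ({blame})")
    return value

def typed_sum(xs):              # "statically typed" code: assumes List[int]
    return sum(xs)

untyped_data = [1, 2, 3]        # produced by "dynamically typed" code
print(typed_sum(cast_to_int_list(untyped_data)))     # passes the check
# cast_to_int_list([1, "two", 3])  # would raise CastError, blaming the boundary
```

Under the graduality principle, making the type on typed_sum more precise only adds checks like this one; otherwise the program behaves as before.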
The project team will apply the GTT approach to investigate gradual typing for polymorphism with data abstraction (parametricity), algebraic effects and handlers, and refinement/dependent types. For each, the team will develop cast calculi and program logics expressing better equational reasoning principles than previous proposals, with certified elaboration to a compiler intermediate language based on Call-By-Push-Value (CBPV) while preserving these properties, and design convenient surface languages that elaborate into them. The GTT program logics will be used for program verification, proving the correctness of program optimizations and refactorings.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
When building large software systems, programmers should be able to use the best language for each part of the system. But when a component written in one language becomes part of a multi-language system, it may interoperate with components that have features that don’t exist in the original language. This affects programmers when they refactor code (i.e., make changes that should result in equivalent behavior). Since programs interact after compilation to a common target, programmers have to understand details of linking and target-level interaction when reasoning about correctly refactoring source components. Unfortunately, there are no software toolchains available today that support single-language reasoning when components are used in a multi-language system. This project will develop principled software toolchains for building multi-language software. The project’s novelties include (1) designing language extensions that allow programmers to specify how they wish to interoperate (or link) with conceptual features absent from their language through a mechanism called linking types, and (2) developing compilers that formally guarantee that any reasoning the programmer does at source level is justified after compilation to the target. The project has the potential for tremendous impact on the software development landscape as it will allow programmers to use a language close to their problem domain and provide them with software toolchains that make it easy to compose components written in different languages into a multi-language software system.
The project will evaluate the idea of linking types by extending ML with linking types for interaction with Rust, with a language that has first-class control, and with a normalizing language, and by developing type-preserving compilers to a common typed LLVM-like target language. The project will design a rich dependently typed LLVM-like target language that can encapsulate effects from different source languages to support fully abstract compilation from these languages. The project will also investigate reporting of cross-language type errors to aid programmers when composing components written in different languages.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Modern programming languages ranging from Java to Matlab rely on just-in-time compilation techniques to achieve performance competitive with languages such as C or C++. What sets just-in-time compilers apart from batch compilers is that they can observe a program’s actions as it executes and inspect its state. Knowledge of the program’s state and past behavior allows the compiler to perform speculative optimizations that improve performance. The intellectual merits of this research are to devise techniques for reasoning about the correctness of the transformations performed by just-in-time compilers. The project’s broader significance and importance are its implications for industrial practice. The results of this research will be applicable to commercial just-in-time compilers for languages such as JavaScript and R.
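To make the speculation-plus-guard pattern concrete, the sketch below shows a toy version of it: a "compiled" fast path specialized to integer arguments, protected by a guard, with a fallback ("deoptimization") to a generic path when the guard fails. This is only an illustration of the idea, not the formal model this project develops; all names are invented.

```python
# Toy illustration of speculation in a just-in-time compiler: the fast path is
# specialized to the types observed so far and guarded; a failed guard
# "deoptimizes" back to the generic path.

class GuardFailure(Exception):
    pass

def generic_add(a, b):
    # Generic, interpreter-like semantics (numbers, string concatenation, ...).
    return a + b

def make_specialized_add():
    def specialized(a, b):
        if not (isinstance(a, int) and isinstance(b, int)):
            raise GuardFailure()       # speculation failed
        return a + b                   # fast integer-only path
    return specialized

fast_add = make_specialized_add()

def run_add(a, b):
    try:
        return fast_add(a, b)          # speculative, optimized code
    except GuardFailure:
        return generic_add(a, b)       # deoptimize to the general case

print(run_add(2, 3))                   # uses the specialized path
print(run_add("2", "3"))               # guard fails, falls back to generic path
```

Reasoning about correctness then amounts to showing that the guarded fast path and the generic path agree whenever the guard holds, and that deoptimization restores exactly the state the generic path expects.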
This project develops a general model of just-in-time compilation that subsumes deployed systems and allows systematic exploration of the design space of dynamic compilation techniques. The research questions that will be tackled in this work lie along two dimensions: Experimental—explore the design space of dynamic compilation techniques and gain an understanding of trade-offs; Foundational—formalize key ingredients of a dynamic compiler and develop techniques for reasoning about correctness in a modular fashion.
The goal is to provide open-source, interoperable, and extensible statistical software for quantitative mass spectrometry, enabling experimentalists and developers of statistical methods to respond rapidly to changes in the evolving biotechnological landscape.
Led By:
Yunsi Fei
Led By:
Yongmin Liu
Led By:
Ed Boyden
Led By:
Scott Weiss
Led By:
Amar Dhand
Led By:
Alex A. Ahmed
Conducts research on the design, implementation, and analysis of programming languages, and more.
Focuses on core issues and real-world applications of machine learning.
Creates models and tools to understand and anticipate large-scale complex networks and systems.
Explores complex research problems around social influence, social networks, and network science.
Solves complex computational problems in algorithmic game theory, cryptography, and learning theory.
Leverages behavioral informatics to develop affordable, technology-based healthcare solutions.
Invents and validates systems, methodologies, and algorithms for new mobile health applications.
Supports Digital Humanities and Computational Social Science research, coursework, and more.
Designs conversational agents for healthcare, such as simulating face-to-face counseling.
Works on flash algorithms and devices and other emerging storage technologies.
Develops methods for high-throughput large-scale molecular investigations of biological organisms.
Researches applied machine learning, social media analytics, human-computer interaction, and more.
Examines how novel interactive computing systems can help people to achieve wellness.
Investigates how networks emerge, what they look like, how they evolve, and more.
Employs computational, cognitive, and behavioral sciences technologies to study human development.
Offers a collaborative public cloud for research, based on the Open Cloud eXchange model.
Develops nursing research expertise and effective technology interventions for at-risk adults.
Develops perception, planning, and control algorithms for robots in built-for-human environments.
Conducts interdisciplinary work on speech communication and human-computer interaction.
Works on distributed systems and network protocols to boost security, availability, and performance.
Supports cross-university visualization research and connects faculty, researchers, and students.
Uses AI to design machines that adapt to people—and creates a seamless human-robot experience.
Works on computational modeling of human behavior for study and education and analysis applications.
Researches computational social science, information retrieval, machine learning, and more.
Explores visualization for human perception and vision, visual encodings, design thinking, and more.
Studies research problems and practical applications in scalable data management and analysis.
Investigates personal information, data, and privacy risks from Internet of things (IoT) devices.
Focuses on design, user experience, and AI algorithms within games and interactive arts/media.
Explores collaboration in distributed environments using modeling, experiments, and data analysis.
Develops data algorithms, techniques, and methodologies to analyze and solve complex problems.
Focuses on advancing machine learning through research on large-scale spatiotemporal data.
Conducts work on automated theorems, concurrency, formal verification, model checking, and more.
Focuses on applied computer security in collaboration with several other security research labs.