Yolanda Gil's Awards and Grants
Yolanda Gil (PI), Awards and Grants
-
Artificial Intelligence and Community Driven Wildland Fire Innovation
via a WIFIRE Commons Infrastructure for Data and Model Sharing (Phase II).
National Science Foundation (NSF).
Award number OIA-2134904.
September 2021 - August 2023.
Ilkay Altintas (PI), Yolanda Gil (Co-PI), John Hiers (Co-PI), and Rodman Linn (Co-PI).
This project is creating an artificial intelligence (AI) enabled approach to
support the design and management of controlled fires that aid in
wildfire preparedness and the prevention of megafires.
-
Towards Reflection Competencies for AI Scientists: Developing a Conceptual
Framework and Open Research Platform.
Office of Naval Research (ONR).
Award ID N00014-21-1-2437.
June 2021 - May 2023.
Yolanda Gil (PI).
As scientific questions become more complex, the capabilities of scientists to do
research will need to be augmented with AI systems.
This project will develop an open architecture for cognitive AI scientists
that can formulate scientific questions, devise strategies to answer them,
and place new findings in the context of the original question.
A core aspect of this research is capturing scientific knowledge to reason
about open questions, construct plausible hypotheses, formulate appropriate
methods to test them, and interpret the results obtained. The proposed project
will be carried out in two phases. In the first phase, we will develop the
conceptual framework and prototype its core competencies and demonstrate it
in two scientific disciplines. In the second phase, we will exercise the
conceptual framework in new domains, and extend it with capabilities to
automatically write sections of scientific articles describing approach
and method in selected domains. This research will advance many areas of AI,
including cognitive architectures, knowledge representation, reasoning,
planning, learning, explanation, and metareasoning.
-
Artificial Intelligence and Community Driven Wildland Fire Innovation
via a WIFIRE Commons Infrastructure for Data and Model Sharing.
National Science Foundation (NSF).
Award number OIA-2040676.
September 2020 - May 2022.
Ilkay Altintas (PI), Yolanda Gil (Co-PI), John Hiers (Co-PI), and Rodman Linn (Co-PI).
This project is creating a data-driven, artificial intelligence (AI) enabled and
model-based scientific approach that ultimately aims to limit and even prevent the
devastating effects of wildfires by using advanced technologies to support
fire mitigation, preparedness, response, and recovery.
-
Automating Machine Learning for Time Series Analysis.
JP Morgan Chase.
March 2019 - February 2022.
Yolanda Gil (PI), Deborah Khider (Co-PI).
The goal of this project is to automate time series analysis. This would enable
non-experts to analyze time series data with high-quality, proven methods, and
would also allow efficient analysis of the vast amounts of time series data available.
We are developing an automated system for time series analysis.
-
High Resolution Mapping of the Genetic Risk for Disease in the Aging Brain.
National Institutes of Health (NIH).
Award number 1R01AG059874-01.
August 2018 - November 2023.
Neda Jahanshad (PI), Yolanda Gil (Co-PI).
This project is developing a discovery engine, powered with intelligent workflows,
that will continually process neuroscience data. The engine will autonomously trigger
the execution of relevant families of workflows, customize them to the data at hand,
and alert users of interesting findings.
-
MINT: Model INTegration Through Knowledge-Rich Data and Process Composition.
Defense Advanced Research Projects Agency (DARPA).
Award number W911NF-18-1-0027.
December 2017 - November 2021.
Yolanda Gil (PI), Ewa Deelman (Co-PI), Craig Knoblock (Co-PI), Rafael Ferreira (Co-PI),
Kelly Cobourn (Co-PI), Christopher Duffy (Co-PI), Vipin Kumar (Co-PI),
Scott D. Peckham (Co-PI).
Major societal and environmental challenges require forecasting how natural processes
and human activities affect one another. There are many areas of the globe where climate
affects water resources and therefore food availability, with major economic and social
implications. Today, such analyses require significant effort to integrate highly
heterogeneous models from separate disciplines, including geosciences, agriculture,
economics, and social sciences. Model integration requires resolving semantic,
spatio-temporal, and execution mismatches, which are largely done by hand today
and may take more than two years. This project will develop a new approach to use a wide
range of semantics in modeling environments in order to significantly reduce the time
needed to develop new integrated models while ensuring their utility and accuracy.
-
EarthCube Integration: ASSET: Accelerating Scientific Workflows using EarthCube Technologies.
National Science Foundation (NSF).
Award number ICER-1740683.
September 2017 - August 2019.
Scott D. Peckham (PI), Yolanda Gil (co-PI), Cindy Bruyere (co-PI), Michael D.
Daniels (co-PI), James Done (co-PI).
This project is developing a framework to capture scientific workflows in different
domains of geosciences, characterize the type of work and the amount of effort involved
in each activity, and map workflow activities to available tools and infrastructure.
-
EarthCube Data Infrastructure: A unified experimental-natural digital data system for
analysis of rock microstructures.
National Science Foundation (NSF).
Award number ICER-1639716.
September 2017 - August 2020.
Julie Neuman (PI), Yolanda Gil (co-PI), J. Douglas Walker (co-PI), Philip Skemer (co-PI),
Matty Mookerjee (co-PI), Gurman Gill (co-PI), Chris J. Marone (co-PI), and Basil
Tikoff (co-PI).
This project is developing semantic workflows to analyze geology data about
rock features and microstructures using image analysis and machine learning.
-
Crowdsourcing metadata for the ENIGMA neuroscience collaboration.
The Kavli Foundation.
April 2017 - April 2019.
Paul Thompson (PI), Neda Jahanshad (co-PI), Yolanda Gil (Co-PI).
This project is extending the Organic Data Science framework to use semantics
and metadata to manage data and other information about the ENIGMA neuroscience
collaboration.
-
DSBox: Data Scientist in a Box.
Defense Advanced Research Projects Agency (DARPA).
Award number FA8750-17-C-0106.
May 2017 - May 2021.
Pedro Szekely (PI), Yolanda Gil (Co-PI), Aram Galstyan (Co-PI), Andrew McCallum (co-PI),
Steve Minton (co-PI).
This project is developing an intelligent system that incorporates significant expertise
in data science and machine learning in order to: 1) automatically generate data analysis
workflows that include feature generation, feature selection, data cleaning, and machine
learning steps for any kind of input data including tabular, text, image, audio data and
their combinations; 2) extensible libraries of data processing and machine learning
components with semantic metadata that can be used to compose valid workflows; and
3) interactive generation of multi-step data science solutions that incorporate user
constraints and expertise in the domain.
-
Intelligent Systems Research to Support Geosciences: A Research Coordination Network.
National Science Foundation (NSF).
Award number ICER-1632211.
September 2016 - August 2018.
Suzanne Pierce (PI), Imme Ebert-Uphoff (Co-PI), Yolanda Gil (Co-PI), Basil Tikoff (Co-PI).
This project will support an emerging community of interdisciplinary researchers
to enable advances in our understanding of Earth systems through
innovative applications of intelligent and information systems to fundamental
geosciences problems.
-
Model Integration for Big Mechanism.
Defense Advanced Research Projects Agency (DARPA).
Award number W911NF-14-1-0364.
June 2016 - October 2016.
Yolanda Gil (PI).
This project investigates the integration of models across disciplines through semantic
techniques to support flexible model integration and simulation at large scale.
-
A Discovery Engine for Reproducible Comparable Multi-Omics Analysis.
National Institutes of Health (NIH).
Award number 1R01GM117097-01.
February 2016 - January 2019.
Parag Mallick (PI), Yolanda Gil (Co-PI).
This project is developing an open-source workflow platform
to enable the generation and effective use of multi-omic workflows.
We are using the WINGS semantic workflow reasoner to significantly automate development
and validation of workflows. A discovery engine will autonomously trigger the execution of
families of workflows and alert users of interesting findings.
-
Towards Automating Discovery with DISK: Systematic Data Analysis of Science Repositories.
Defense Advanced Research Projects Agency (DARPA).
Award number W911NF-15-1-0555.
September 2015 - September 2017.
Yolanda Gil (PI), Parag Mallick (Co-PI).
This project is developing a scientific discovery system (DISK) that captures data
analytics expertise, applies it automatically and routinely to scientific data repositories,
and highlights interesting findings and potential discoveries. We are investigating three
fundamental questions: 1) Can we identify domain-independent computational processes and
use them to make autonomous hypothesis-driven discoveries from existing datasets?
2) Can we effectively capture domain-specific knowledge needed to apply those processes
in a new science domain? 3) Can these processes be automatically combined in novel ways
and enable new kinds of discoveries? Our approach is to automate the hypothesize-test-evaluate
discovery cycle with an intelligent system that a scientist can task with lines of inquiry
over existing data repositories. The system will then autonomously test hypotheses by
running analytic workflows on the data and examining the results to report new findings
back to the scientist. DISK extends the existing WINGS semantic workflow system.
We are initially applying DISK in multi-omics and subsurface water modeling.
-
EarthCube Integrated Activities: LinkedEarth: Crowdsourcing Data Curation and Standards
Development in Paleoclimatology.
National Science Foundation (NSF).
Grant number ICER-1541029.
September 2015 - August 2018.
Julien Emile-Geay (PI), Yolanda Gil (Co-PI), Nicholas McKay (Co-PI).
Paleoclimatology datasets are key to understanding low-frequency, natural climate
variability that significantly modulates anthropogenic global warming. However,
there is currently no universal way to share paleoclimate data between users or
machines, hindering integration and synthesis. The majority of observations are
gathered by independent scientists with no formal language for describing their
data and metadata to each other, or to machines, in a standardized fashion. This is
further aggravated by the diversity of data (e.g. trees, ice cores, lake or marine
sediments, corals, mollusks, speleothems), each having very different characteristics,
and the diversity of measured quantities (e.g. trace metal concentrations, isotope ratios,
layer thickness, etc.). This data diversity is typical in other sciences, particularly
in ecology. Managing and integrating this kind of scientific data is challenging because:
(1) metadata creation and data curation require expert knowledge; (2) top-down data
management approaches do not tend to be effective; (3) existing infrastructure does
not foster standardization. Therefore, there is a critical need for a flexible platform
enabling crowdsourced data curation and standards development through community
participation. In this project we are investigating a socio-technical system that has
the potential to engage a broad user base in geoscientific data curation. Our approach
is based on semantic wikis that incorporate editorial and community-driven processes
that follow principles from social sciences research on successful on-line communities.
-
EarthCube Integrated Activities: InGeO: Integrated Geosciences Observatory.
National Science Foundation (NSF).
Grant number ICER-1540937.
September 2015 - August 2017.
Asti Bhatt (PI), Yolanda Gil (Co-PI), Russell Cosgrove (Co-PI).
The Integrated Geoscience Observatory (InGeO) is a pilot project for geospace research
that facilitates the integration of resources from different disciplines that study
the Sun-Earth system. The observatory creates an integrated package of software
tools contributed by researchers with specific capabilities, and designed to enable
integration of diverse observational data. Features of the toolkit include:
(1) linking diverse datasets from multiple data repositories and automatically
mapping them to a common user-specified coordinate grid; (2) implementing the
well-known Assimilative Mapping of Ionospheric Electrodynamics (AMIE) procedure
for assimilation of this data to yield a global picture; and (3) using the OntoSoft
registry for sharing analytic software and GeoDataspace for credit attribution of
processed data.
-
SciSpark: Provenance Recording in the Spark Framework for the Regional Climate Model Evaluation System.
National Aeronautics and Space Administration (NASA).
Grant number 14-AIST-14-0034.
September 2015 - February 2017.
Chris Mattmann (PI), Yolanda Gil (Co-PI), Jinwon Kim (Co-PI).
SciSpark is a framework for scaling scientific computations that extends Apache Spark
with new capabilities for processing science data standards for large-scale data,
with a particular focus on climate data. Remote sensing data and climate model
output are multi-dimensional arrays of massive sizes locked away in heterogeneous
file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it
difficult to perform multi-stage, iterative science processing since each stage
requires writing and reading data to and from disk. Apache Spark implements the
MapReduce paradigm for parallel computing while emphasizing in-memory computation,
and so outperforms the disk-based MapReduce implementation in Apache Hadoop by 100x
in memory and by 10x on disk. SciSpark will enable scalable model evaluation by
executing large-scale comparisons of A-Train satellite observations to model grids
on a cluster of 100 to 1000 compute nodes. This second-generation capability for NASA's
Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics
at interactive speeds, and extend to quite sophisticated iterative algorithms such as
machine-learning (ML) based clustering of temperature PDFs, and even graph-based
algorithms for searching for Mesoscale Convective Complexes. The goals of SciSpark are to:
(1) Decrease the time to compute comparison statistics and plots from minutes to seconds;
(2) Allow for interactive exploration of time-series properties over seasons and years;
(3) Decrease the time for satellite data ingestion into RCMES to hours; (4) Allow for
Level-2 comparisons with higher-order statistics or PDFs in minutes to hours; and
(5) Move RCMES into a near real time decision-making platform.
-
Intelligent Systems for Geosciences.
National Science Foundation (NSF).
Grant number IIS-1533930.
September 2014 - August 2016.
Yolanda Gil (PI), Suzanne Pierce (Co-PI).
In recent years, intelligent systems have demonstrated significant transformative impact
in the commercial sector. These techniques have been applied to geosciences with some
success, but they are inadequate to meet the challenges presented by geosciences research.
First, using data alone is insufficient to create models of the complex phenomena under
study. Second, geoscientists need to reach across disciplines to synthesize disparate
data and models, which requires extensive qualification and context. Third, scientists
need powerful partnerships with computers in order to explore complex hypotheses and
understand how new findings relate to the existing body of knowledge. Therefore, in order
to tackle complex geosciences phenomena new approaches are needed. The goal of this
project is to produce a report outlining a research agenda for intelligent systems in
geosciences, and their potential to overcome the challenges that geoscientists face.
-
EarthCube Research Coordination Networks: iSamplES: the Internet of Samples in the Earth Sciences.
National Science Foundation (NSF).
Grant number ICER-1440351.
September 2014 - August 2017.
ISI PI: Yolanda Gil.
The Internet of SamplES in the Earth Sciences (iSamplES) is a research coordination
network that seeks to advance the use of innovative cyberinfrastructure to connect
physical samples and sample collections across the Earth Sciences with digital
data infrastructures to revolutionize their utility for science. The ultimate goal
of iSamplES is to dramatically improve the discovery, access, sharing, analysis, and
curation of physical samples and the data generated by their study for the benefit of
science and society. As part of this project, we are developing a registry of sample
repositories, together with metadata to describe their holdings, operations, and curation
procedures.
-
Accelerating Map of the World.
National Geospatial-Intelligence Agency (NGA).
January 2015 - September 2015.
ISI PI: Yolanda Gil.
Geographic data is increasingly being shared, interchanged and used for purposes other
than the producers' intended purpose. In addition to traditional institutional sources
of information, recent crowdsourcing approaches have proven to be useful additions to
traditional geospatial data sources. Yet the timeliness and currency advantages of
crowdsourcing bring additional burdens regarding quality and fitness-for-use.
Dynamically integrated datasets created through Web mashups are constantly appearing,
resulting in a variety of new geospatial resources that deliver customized information.
Producers expect that their data will have unanticipated uses, while consumers have
limited insight into the integrity of producers and data and whether they can be trusted.
Therefore, information about the quality of available geographic data is vital to the
process of selecting information in that the value of data to a consumer is directly
related to its quality. This project investigated a data quality ontology for
geospatial data, and its mapping to the Content Maturity Model proposed by NGA.
-
EarthCube Building Blocks: Collaborative Open Source Software Sharing for the Geosciences.
National Science Foundation (NSF).
Grant number ICER-1440323.
September 2014 - August 2018.
Yolanda Gil (PI), Christopher Duffy (co-PI), Chris Mattmann (co-PI), Scott Peckham (co-PI), Erin Robinson (co-PI).
Geosciences software embodies crucial scientific knowledge, and as such it should be explicitly
captured, curated, managed, and disseminated. The goal of this project is to create a system
for software stewardship in geosciences that will empower scientists to manage their software
as valuable scientific assets. Scientific software stewardship requires a combination of
cyberinfrastructure, social infrastructure, and professional development infrastructure.
The framework will result in open, transparent, and broader access to scientific software
for other scientists, software professionals, students, and decision makers. It will
significantly improve the adoption of open data and open software initiatives, improve
reproducibility, and advance scientific scholarship.
-
The Age of Water and Carbon in Hydroecological Systems: A New Paradigm for Science Innovation and Collaboration through Organic Team Science.
National Science Foundation (NSF).
Grant number IIS-1344272.
October 2013 - September 2017.
Yolanda Gil (PI), Christopher Duffy (co-PI), Paul Hanson (co-PI).
The project will develop a new socio-technical framework
for "organic team science" in which
scientists are motivated to collaborate across diverse scientific communities
and to share and normalize data to solve scientific problems through an open
framework.
This project will develop new scientific work practices and associated
cyberinfrastructure to advance the fields of hydrology and limnology (lake ecology).
The project will advance hydrology by making already-collected
geospatial data more usable for analysis and simulations. It will advance limnology
by developing an integrated hydrodynamic model of lakes as connected to the
broader hydrologic network to quantify water, material, nutrient and energy fluxes,
which is potentially transformative for limnology. It will also advance
socio-technical research in the context of distributed scientific collaboration.
-
Learning Big Data Analytic Skills through Scientific Workflows.
National Science Foundation (NSF).
Grant number ACI-1355475.
September 2013 - August 2017.
Yolanda Gil (PI).
Big data analytics has emerged as a widely desirable skill in many areas. Although
courses are now available on a variety of aspects of big data, there is a lack of a
broad and accessible course that covers the variety of topics that concern big data analytics.
As a result, acquiring practical data analytics skills is out of reach
for many students and professionals, posing severe limitations to our ability as a
society to take advantage of our vast digital data resources. The goal of this
work is to develop curriculum materials for big data analytics to provide broad
and practical training in data analytics in the context of real-world and
science-grade datasets and data analytics methods. A key technical basis of the
approach is the use of workflows that capture expert analytic methods that will
be presented to users for practice with real-world datasets within pre-defined
lesson units. The results of this work include lesson units for learning
expert-level skills in big data analytics, a framework for non-programmers to
understand basic concepts in big data analytics, and a hands-on workflow
framework to learn by direct experimentation and exploration with scientific data.
-
Provenance Tracking for Integrated Geospatial Data.
Open Geospatial Consortium (OGC).
September 2013 - August 2014.
Yolanda Gil (PI).
Tracking the provenance of geospatial information is important in order to
understand how to trust and use the information based on what sources generated it
and the processes used to integrate it. This project will analyze the use of the
recent W3C PROV standard in the context of geospatial information integration,
particularly to study scalability, granularity, and presentation of provenance.
-
EarthCube Building Blocks: Software Stewardship for the Geosciences.
National Science Foundation (NSF).
Grant number ICER-1343800.
September 2013 - February 2016.
Yolanda Gil (PI), Christopher Duffy (co-PI), Chris Mattmann (co-PI), Scott Peckham (co-PI), Erin Robinson (co-PI).
Geoscience and environmental science software is crucial for data analysis to
generate new knowledge and understanding about the Earth. Because reproducibility
of operations, calculations, and predictions done with this software is important
for science, commercial, and regulatory applications, it is important that the
software generated by geoscientists and their colleagues be captured, curated,
managed, and shared. The GeoSoft project brings together computer scientists,
geoscientists, and social scientists to help scientists describe basic
characteristics of their code and share it. GeoSoft will be a social site
where scientists can discover alternative approaches to release free software,
use intelligent interfaces to explain how their software works, and form
productive communities around software projects. This research has the
potential to fundamentally transform geosciences by making scientific software
readily available to researchers and citizen scientists for efficient data
analysis.
-
A Scalable Open Source Platform for Data Processing, Archiving and Dissemination.
Defense Advanced Research Projects Agency (DARPA).
Grant number FA8750-13-C-0016, subcontract to MDA.
June 2011 - March 2015.
Yolanda Gil (co-PI), Chris Mattmann (co-PI), Sam Park (PI).
End users have a lot of data, but do not have the expertise needed to analyze it.
The goal of this project is to empower end users to analyze big data by demonstrating
that: 1) data analytics experts can use open source software to quickly assemble
workflows, 2) end users can easily run these expert-grade workflows and get useful
views on their data. Our work combines semantic workflow capabilities of the WINGS
workflow system with scalable data systems and workflow execution infrastructure
available in the OODT framework. Our work includes a release of the integrated
system as open source software within the Apache OODT project.
-
Workflows for the Regional Climate Model Evaluation System (RCMES).
National Aeronautics and Space Administration (NASA).
Subcontract to JPL.
January 2013 - June 2013.
ISI PI: Yolanda Gil. PI: Chris Mattmann.
The Regional Climate Model Evaluation System (RCMES) enables climate scientists to compare and evaluate regional climate data. The system supports easy access to shared databases such as NASA data sources, and enables scientists to compare climate model predictions with those observations. The goal of this project is to extend RCMES to include workflows for climate model comparison and evaluation, tracking provenance and enabling reproducibility.
-
An Analytical Framework for Provenance-Rich Social Knowledge Collection.
National Science Foundation (NSF).
Grant number IIS-1117281.
September 2011 - August 2014.
Yolanda Gil (PI).
This project will investigate a new generation of provenance-rich social knowledge collection
systems that will greatly improve the ability of people to create online communities of interest
and share information. The research will transform the state of the art in social content
collection in several important ways. First, social knowledge collection systems will be
augmented to help contributors structure factual content, so that information can be
aggregated to answer reasonably interesting albeit simple factual queries. We will build on
a semantic wiki framework to allow users to create structured factual content as
object-property-value triples. It will not assume pre-defined ontologies, but rather
develop algorithms that analyze current content and suggest opportunities for structuring
contributions so they can be aggregated to answer simple queries. Second, they will include
detailed provenance records that reflect how the content was created, allowing contributors
to enter alternative viewpoints and enabling consumers to make quality and trust judgments.
The research will include developing algorithms that derive trust metrics from the provenance
records and that allow users to define views on the content based on provenance criteria.
It will create novel approaches to propagate trust across content topics and categories
and complement existing algorithms that propagate trust in social networks. Third, the
systems will proactively guide contributors to invest effort where it is most needed,
by developing novel algorithms to detect knowledge gaps and by allowing users to define
queries that will be used to drive further contributions.
-
An EarthCube Community Group and Roadmap for Workflows for Geosciences.
National Science Foundation (NSF).
Grant number EAR-1238216.
April 2012 - March 2013.
Yolanda Gil (PI), Aaron Braekel (co-PI), Ewa Deelman (co-PI), Ibrahim Demir (co-PI),
Christopher J. Duffy (co-PI), Suresh Marru (co-PI), Marlon Pierce (co-PI).
The goal of this project is to elicit requirements for workflows in geosciences, assess the state of the art and current practices, identify current gaps in both the use of and capabilities of current workflow systems in the earth sciences through use case studies, and identify grand challenges for the next decade along with the possible paths to addressing those challenges. This effort is part of the NSF EarthCube initiative.
-
Discovery Informatics.
National Science Foundation (NSF).
Grant number IIS-1151951.
September 2011 - August 2012.
Yolanda Gil (PI).
In order to address the ambitious research agenda put forward by many science disciplines,
many challenges must be addressed in the areas of information sciences, intelligent systems,
and human-computer interaction. Data modeling and integration still require large investments
of scientist time and effort. The scientific literature grows so quickly in many areas that
it becomes unmanageable for scientists. Many aspects of the scientific discovery process are
often largely manual and could be automated, improved, or made more efficient. Better
interfaces for collaboration, visualization, and understanding would significantly improve
scientific practice. The goal of this project is to produce a report outlining the
opportunities that scientific discoveries present to information sciences and intelligent
systems as a new area of research called discovery informatics.
-
Workflow-Net: Cybersecurity through Nimble Task Allocation:
Workflow Reasoning for Mission-Centered Network Models.
Air Force Office of Scientific Research (AFOSR).
Grant number FA9550-11-1-0104.
June 2011 - March 2015.
Yolanda Gil (PI).
Traditional cybersecurity has focused on techniques to analyze and eliminate vulnerabilities
in a network, often in response to actual security breaches of previously unknown weaknesses.
Recognizing that in practice network operations can never be fully secure, a major focus of
recent research is on intrusions that are assumed to be on-going in the network by one or
more malicious parties. In this new view on cybersecurity, a key desired capability is to
be able to accomplish a mission even while the network is compromised and subject to
deception. However, traditional network models lack a representation of the mission and
of how network resources are utilized to accomplish various aspects of the mission.
In this project, we will investigate a new approach to develop a general framework for
representing models of mission goals and tasks, and to exploit those models to make a
mission more robust to deception operations co-occurring in the network. These
mission-centered network models (MCNMs) will build on and extend current two-layered
(logical/physical) network models by integrating a new layer of task-level representations
of the mission into those models. In this new task-oriented layer, a mission can be
characterized as a set of goals, each accomplished by a set of interdependent tasks
that place requirements on the network resources. The system can then dynamically
control the mappings of those tasks onto network resources using a variety of algorithms
that take into account which resources are currently compromised. As a result, a mission
can be protected from ongoing intrusion and deception activities by dynamically reallocating
resources as they become compromised and by examining provenance records of task outcomes to
determine their reliance on compromised resources. MCNMs can be used to determine which
resources are critical for any given mission, to prioritize the use of uncompromised resources,
to accomplish mission tasks and estimate the trust in them when resources are compromised, and to
determine the practical impact on the mission of deception activities. MCNMs will enable a
new approach to cybersecurity in network-based operations.
-
W-SHARING: Towards Shared Repositories of Computational Workflows.
National Science Foundation (NSF).
Grant number IIS-0948429.
September 2009 - August 2011.
Yolanda Gil (PI).
Scientific computing has entered a new era of scale and sharing with
the arrival of cyberinfrastructure for computational experimentation.
A key emerging concept is scientific workflows, which provide a
declarative representation of scientific applications as complex
compositions of software components and the dataflow among them.
Workflow systems manage their execution in distributed resources,
track provenance of analysis products, and enable rapid
reproducibility of results. In current cyberinfrastructure, there are
well understood mechanisms for sharing data, instruments, and
computing resources. This is not the case for sharing workflows,
though there is an emerging movement for sharing analysis processes in
the scientific community. In this grant, we are investigating
computational mechanisms for sharing workflows as a key missing
element of cyberinfrastructure for scientific research. We are
exploring three major research topics. First, we are eliciting new
requirements that workflow sharing poses over current techniques to
share software tools and libraries. Second, we want to understand how
shared workflow catalogs should be designed. Existing shared data
catalogs are a successful model, but software artifacts require
different representations and access functions. Finally, we are
studying what sharing paradigms might be appropriate for scientific
communities, exploring environments ranging from traditional
server-based architectures to wikis to Web 2.0 social sites.
-
PedWorkflow: Workflows for Assessing Student Learning
National Science Foundation (NSF).
Grant number IIS-0917328.
September 2009 - August 2011.
Jihie Kim (PI), Gisele Ragusa (co-PI), Erin Shaw (co-PI), and Yolanda Gil (co-PI).
As on-line learning becomes more popular and is increasingly
integrated in engineering courses, instructors
become overwhelmed with the amount of information
that they have to process.
For example,
discussion boards support collaborative interaction and reflective
problem solving, but instructors need to monitor the student
discussions
in order to address questions and
corrections as well as for grading student participation.
The goal of this project is to create a novel workflow environment
to support efficient assessment of student learning through the design
and composition of assessment workflows. The workflows will support
data analysis and will be re-usable across curricula and instructors.
-
Designing Scientific Software One Workflow at a Time
National Science Foundation (NSF).
Grant number CCF-0725332.
October 2007 - September 2011.
Ewa Deelman (PI) and Yolanda Gil (co-PI).
Much of science today relies on software to make new discoveries.
This software embodies scientific analyses that are frequently
composed of several application components and created collaboratively
by different researchers. Computational workflows have recently emerged as
a paradigm to manage these large-scale and large-scope scientific analyses.
Workflows represent computations that are often executed in geographically
distributed settings, their interdependencies, their requirements and their
data products. The design of these workflows is at the core of today's
scientific discovery processes and must be treated as scientific products
in their own right. The focus of this research is to develop the foundations
for a science of design of scientific processes embodied in the new artifact
that is the computational workflow. The work will integrate best practices
and lessons learned in existing workflow applications, and extend them in
order to define and formalize design principles of computational workflows.
This work will result in a fundamentally new approach to designing workflows
that will greatly improve the scientific software design methodology by
defining and formalizing design principles, and by familiarizing the
scientific community with these effective workflow design processes.
-
Plato: Phased-Learning through Analyzing Teaching and Observation
DARPA Bootstrapped Learning (BL) program.
Grant number HR0011-07-C-0060, subcontract to SRI International.
August 2007 - July 2011.
ISI co-PIs: Paul Cohen and Yolanda Gil.
The goal of this project is to develop an electronic student
that can learn from a teacher using different methods of natural instruction.
We will contribute the strategies to learn from being told by the teacher
a broad range of generalities about
process knowledge. These general descriptions will be tested by the learner with examples
and practice of those processes. We will use Interdependency Models to relate the individual
teacher instructions, check the consistency with the student's prior knowledge,
and detect gaps in the stated instruction that could be filled through practice.
-
Windward: Scalable Knowledge Discovery Through Grid Workflows
Air Force Research Laboratory (AFRL).
Grant number FA8750-06-C-0210.
September 2006 - December 2008.
Yolanda Gil (PI). ISI co-PIs: Paul Cohen and Ewa Deelman.
Distributed workflows are emerging as a key technology to conduct large-scale
and large-scope scientific applications in earthquake science, physics,
astronomy, and many other sciences. In this new project, we will investigate
the use of workflow technologies for Artificial Intelligence applications with
a particular focus on data analysis and knowledge discovery tasks. Based on
the data to be analyzed, an initial workflow template is formed by selecting
from a library of known-to-work compositions of general-purpose machine
learning algorithms. The workflow template is specialized through knowledge-based
selection and configuration of algorithms. Finally, the workflow is mapped to
available resources and restructured to improve execution time. Data analysis
and knowledge discovery applications will benefit from the automation, scale,
and distributed data and resource integration supported by distributed workflow
systems. We will also conduct new research in important aspects of workflow
systems. To what extent can we represent complex algorithms and their subtle
differences so that they can be automatically selected and configured to
satisfy the stated application requirements? Can we develop learning
techniques that improve the performance of the workflow system by exploiting
an episodic memory of prior workflow executions? What mechanisms will be
needed to support autonomous and robust execution of concurrent workflows
over continuously changing data?
-
Challenges of Scientific Workflows
National Science Foundation (NSF).
Grant number IIS-0629361.
May 2006 - October 2007.
Yolanda Gil (PI) and Ewa Deelman (co-PI).
In recent years, workflows have emerged as a paradigm for conducting large-scale
scientific analyses. The structure of a workflow specifies what analysis
routines need to be executed, the data flow amongst them, and relevant
execution details. Workflows provide a systematic way to capture scientific
methodology and provide provenance information for their results. Robust and
flexible workflow creation, mapping, and execution are largely open research
problems. The aim of this project was to bring
together IT researchers and practitioners working on a variety of aspects
of workflow management as well as domain scientists that use workflows for
day-to-day data analysis and simulation. The project will produce
a final report with recommendations to the community regarding the
challenges of scientific workflows and their role in cyberinfrastructure
planning for 21st century science and engineering research and education.
-
C4ML: Metareasoning for Integrated Learning
DARPA Integrated Learning (IL) program.
Grant number FA8650-06-C-7606, subcontract to BBN Technologies.
May 2006 - July 2008.
ISI co-PIs: Paul Cohen and Yolanda Gil.
In this project we will develop a learning metareasoner to coordinate
the activities of many learners in an integrated system that learns
procedural knowledge from user demonstrations and past knowledge.
A learning metareasoner is a problem solver that has explicit representations
of its current learning state and learning goals, and has metareasoning methods to
accomplish those goals. The learning metareasoner will assess its progress
based on four criteria: capability, confidence, coverage, and competence (C4).
-
Intelligent Optimization of Parallel and Distributed Applications
National Science Foundation (NSF).
Grant number CSR-0615412.
August 2006 - September 2009.
Principal Investigators: Mary Hall (PI), Kristina Lerman (co-PI),
Ewa Deelman (co-PI), Aiichiro Nakano (co-PI), Joel Saltz (co-PI).
ISI co-PI: Yolanda Gil.
This project will develop a domain-specific programming system supporting
Petascale application optimization of molecular dynamics simulation, in which
applications will be viewed as workflows consisting of composable components to
be mapped to a diversity of machine resources. The application components will
be viewed as dynamically adaptive algorithms for which there exists a set of
variants and parameters that can be chosen to develop an optimized implementation.
A variant describes a distinct implementation of a code segment, perhaps even a
different algorithm. A parameter is an unbound variable that affects application
performance. By encoding an application in this way, we can capture a large set
of possible application mappings with a very compact representation. Because
the space of mappings is prohibitively large, the system captures and utilizes
domain knowledge from the domain scientists and designers of the compiler,
run-time and performance models to prune most of the possible implementations.
Knowledge representation and machine learning techniques utilize this domain
knowledge and past experience to navigate the search space efficiently.
Incorporating cognitive search techniques and taking advantage of parallel
resources, these alternative implementations are searched automatically by
tools to find a high-quality implementation.
-
MathTrust: Mathematical Analysis of Trust and Deception.
Air Force Office of Scientific Research (AFOSR).
Grant number FA9550-06-1-0031.
December 2005 - May 2009.
Yolanda Gil (PI).
Information systems such as the Web often include open information sources that vary widely in quality and may be subject to deception. The information is often of unknown origin, and there is often no prior history with many of the sources that could be used to assess their reputation. This project will investigate how to represent, learn, and characterize the reputation and reliability of sources in an information system that collects individual trust ratings from its users and derives their collective consensus trust over time. This project will analyze the factors that affect trust in sources, study how to capture user feedback, and develop algorithms to derive source reputation. Based on this model of trust, a mathematical analysis of source trust and deception will formally relate a number of salient factors that influence trust in information systems.
-
Intelligent Design and Optimization of Parallel and Distributed Applications
National Science Foundation (NSF) Computer Science Research program.
Grant number CNS-0509517.
July 2005 - December 2006.
Mary Hall (PI), Kristina Lerman (co-PI), Ewa Deelman (co-PI), Aiichiro Nakano (co-PI), Joel Saltz (co-PI).
This project will explore automatic mapping of applications to parallel systems consisting of tens of thousands of processors. This project includes expert domain scientists in molecular dynamics simulation and phylogenetics, who have been developing scalable algorithms that can handle large irregular data sets using high-end computing platforms. Many of these algorithms are hand-tuned and optimized for particular target architectures. The goal of the project is to develop expressive representations of optimization parameters, appropriate learning techniques for exploring the combinatorial optimization space, and automated mapping techniques for performance optimization.
-
Towards Cognitive Grids: Knowledge-Rich Grid Services for Autonomous
Workflow Refinement and Robust Execution.
National Science Foundation (NSF) Shared Cyberinfrastructure program.
Grant number SCI-0455361.
December 2004 - November 2006.
Ewa Deelman (PI), Yolanda Gil (co-PI).
This research combines Artificial Intelligence and Distributed Computing
techniques to create knowledge-rich workflow services that can support the
execution of large-scale scientific workflows. The main foundation will be
provided by expressive formal representations of the application workflow
and of the execution environment. These representations will support
resource selection that will enhance application performance, resource
reservation based on anticipated workflow needs, and workflow repair
capabilities in case of failures or when new resources come online.
-
CALO-KA: Interactive Acquisition of User Advice in a Cognitive Assistant that Learns and Organizes (CALO)
DARPA Personalized Assistant that Learns (PAL) program.
Grant number NBCHD030010, subcontract to SRI International.
May 2003 - May 2008.
USC/ISI co-PIs: Yolanda Gil, Jerry Hobbs, Craig Knoblock.
This work is part of a very large integrated effort involving more than twenty universities and other research institutions throughout the US in order to develop personalized assistants that learn.
The goal of our research is to develop novel
techniques to assist users in specifying new knowledge for an automated assistant
that learns to improve its performance over time.
This requires new
research on acquiring advice through natural language interaction,
operationalizing user advice into procedural and task knowledge,
dialogue management to ask follow up questions about the
implications and potential side effects of the user advice,
generalizing user advice based on past experience,
and expanding the set of terms known to the system when
confronted with an unexpected situation.
This work also involves designing a meta-reasoning architecture
that includes a memory structure to index past experiences,
reasoning about the implications of changes to the system's
current knowledge, anticipating user requests and
potential failures and opportunities, and self-motivated
learning goals to prompt focused knowledge acquisition.
-
Just-In-caSe just-in-Time Information Analysis
Grant number N66001-03-C-8006.
December 2002 - December 2005.
PI: Yolanda Gil.
The goal of this research is to develop a Web-based environment
for information analysis that provides an emerging self-organization
of knowledge
through the use of natural language and machine learning techniques, including
topic detection and similarity-based
clustering.
The system will exploit this emerging
organization to help users work on new topics, to debate
alternative hypotheses, and to locate trustworthy open web sources.
-
SCEC/IT: An Information Infrastructure for System-Level Earthquake Research.
National Science Foundation (NSF), ITR Large Grant.
Grant number EAR-0122464.
September 2001 - September 2006.
PI: Thomas Jordan.
USC/ISI co-PIs: Carl Kesselman, Yolanda Gil, Hans Chalupsky.
UCSD co-PI: Jean Bernard Minster.
SDSC co-PI: Reagan Moore.
This NSF Large ITR project is a collaboration with members of the Southern California Earthquake Center (SCEC) and funds a variety of information technologies to support earthquake research,
including computational grids, digital libraries, ontologies and knowledge representation, planning, and interactive acquisition.
-
TRELLIS: Capturing and Exploiting Semantic Relationships
for Information and Knowledge Management.
Air Force Office of Scientific Research (AFOSR).
Award Number F49620-00-1-0337.
August 2000 - November 2003.
Yolanda Gil (PI).
TRELLIS is an interactive environment that will allow
users to add their observations,
opinions, and conclusions as they analyze information by making semantic
annotations to documents and other on-line resources. This is in essence
a knowledge acquisition problem, where the
user is adding new knowledge to the system based on their expertise as
they analyze information.
-
TEMPLE: Template Enhancement through Knowledge Acquisition.
DARPA
Active Templates (AcT) program.
Award Number F30602-00-2-0513.
April 2000 - April 2003.
Yolanda Gil (PI).
The proposed work will develop an acquisition interface
for planning knowledge that relies on script-based wizards
to guide users in adding planning constraints and preferences.
-
PHOSPHORUS: A Knowledge and Experience-Based Agent
Capabilities Matcher.
DARPA
Control of Agent-Based Systems (CoABS) program.
Award Number F30602-97-C-0068.
June 1999 - December 2002.
Yolanda Gil (co-PI) and Robert MacGregor (co-PI).
We are developing Phosphorus, a knowledge-based matcher that accepts
a user's description of a needed service as input and responds with a
ranked list of agents that have the capability to provide that service.
The Phosphorus matcher will exploit subsumption,
goal reformulation, and partial match. The
matcher will also be experience-based, using learning techniques to
improve the utility of its matches over time.
-
KASPER: Knowledge Acquisition for Solving Problems.
DARPA Rapid Knowledge Formation (RKF) program.
Subcontract to SRI, Award Number N66001-00-C-8018.
April 2000 - May 2003.
Yolanda Gil (PI).
This project will develop, in an integration
effort with other
research groups, tools to enable domain experts
to extend knowledge bases by using natural language interfaces,
commonsense reasoning, and analogy-based reasoning.
Our group's contributions
focus on tools to formulate follow-up
questions when users have not provided
sufficient knowledge, and on the acquisition
of problem solving and process knowledge.
The integrated system will be tested by two challenge
problems designed by DARPA. One problem requires
the system to acquire graduate-level knowledge
of biology from a textbook and then answer the questions
at the end of the chapter. Another challenge
problem will require developing
expert-level knowledge-based
techniques for advanced genome annotation and
exploitation for pathogen countermeasures.
-
KAMM: Knowledge Acquisition for Objective Grammars in MasterMind Editor.
Air Force Research Laboratory's Joint Defense Planner (JDP) program.
Award Number F30602-97-C-0118.
April 2000 - September 2000.
Yolanda Gil (co-PI) and Pedro Szekely (co-PI).
This project is developing an editor that allows users to
change and extend an initial grammar of objectives.
This editor is integrated with the MasterMind objectives editor and
includes acquisition wizards developed with EXPECT knowledge acquisition
techniques.
The editor was delivered in October 2000, and is now being integrated
into the Global Command and Control System (GCCS) and is
expected to be delivered to Air Operation Centers around the world
by August 2001.
-
SHERPA: Knowledge Acquisition for Large Knowledge Bases -
Integrating Problem-Solving Methods and Ontologies into Applications.
DARPA
High Performance Knowledge Bases (HPKB) program.
Award Number F30602-97-1-0195.
April 1997 - December 2000.
Bill Swartout (co-PI) and Yolanda Gil (co-PI).
This work extends the ISI EXPECT architecture to include
several novel approaches including the derivation and use of knowledge
Interdependency Models, script-based knowledge acquisition, the
integration of natural language techniques in knowledge acquisition
tools, and the use of background knowledge to guide users in adding new
knowledge to a system. We participated in the HPKB annual Challenge
Problems, as well as in the Knowledge Acquisition Critical Component
Experiment held at the Army Battle Command Battle Lab in Ft Leavenworth,
KS, in August 1999. Several Army officers successfully
used EXPECT's knowledge
acquisition tools to extend the knowledge base.
-
INSPECT-II: An Air Campaign Planning Evaluation Aid.
DARPA Joint Forces Air Component Commander (JFACC) program.
Award Number F30602-97-C-0118.
April 1997 - September 2000.
Yolanda Gil (co-PI) and Bill Swartout (co-PI).
We extended the INSPECT air campaign
plan critiquing tool that we had previously developed.
Current funding is supporting technology transition
to the Joint Defense
Planner, which is on the path to becoming an integral part of TBMCS.
INSPECT was originally developed under the
DARPA Rome Laboratory Planning Initiative,
and was demonstrated at the first
US Air Force Expeditionary Force Experiment (EFX-98).
-
ROSETTA: Ontology-Based Agent Communication.
DARPA
Information Systems Office (ISO) Technology Integration
Experiment program.
Award Number F30602-97-1-0195.
July 1999 - July 2000.
Yolanda Gil (co-PI) and Robert MacGregor (co-PI).
The purpose of DARPA ISO Technology Integration Experiments is to
investigate high-payoff links across DARPA ISO programs.
Rosetta is a prototype message translation system that
supports communication between heterogeneous agents using ontology
merging technology and exploiting ontologies
developed under the HPKB program
to address inter-agent
communication issues of central relevance to CoABS.
-
EXPECT-II: A User-Centered Environment for the Development and
Adaptation of Knowledge-Based Planning Aids.
DARPA / Rome Planning Initiative (ARPI).
Award Number DABT 63-95-C-0059.
May 1995 - May 1999.
Yolanda Gil (co-PI) and Bill Swartout (co-PI).
EXPECT is a user-centered environment to
develop and maintain knowledge bases.
It includes knowledge acquisition tools,
problem solving and reasoning modules, and
a facility to generate natural language paraphrases
of its knowledge. EXPECT was used to develop
plan evaluation and critiquing systems
for logistics planning and for air campaign planning.
We participated in the Fourth Integrated
Feasibility Demonstration (IFD-4) of the Planning
Initiative, held at US Air Force Air Combat Command in June 1996
and in the Multi-Agent Planning, Visualization, and Simulation (MAPViS)
integrated demonstration in 1998.