Knoesis hosts many foreign students and faculty, usually during the summer. Details of some of our past visitors follow.
SemRep is a symbolic natural language processing system that identifies semantic predications in biomedical text. For example, "Acetylcholine STIMULATES Nitric Oxide" is extracted from the sentence "In humans, ACh evoked a dose-dependent increase of NO levels in exhaled air." The system is linguistically based and depends on domain knowledge in the Unified Medical Language System. Underspecified interpretation for a range of syntactic structures is provided, rather than detailed representation for a limited number of phenomena. Thirty core predications in clinical medicine, genetic etiology of disease, pharmacogenomics, and molecular biology are retrieved. Several evaluations report precision around 75% and recall near 65% (both lower for molecular biology). SemRep predications have been exploited for text mining applications in genetic etiology of disease, automatic summarization, literature-based discovery, and enhanced information retrieval.
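As a rough worked figure, the approximate precision and recall quoted above can be combined into an F1 score, the harmonic mean of the two. The SemRep evaluations do not report this number. The function name `f1_score` is our own, and this is a minimal sketch of the arithmetic only:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate SemRep figures quoted above; the resulting F1 is only illustrative.
print(round(f1_score(0.75, 0.65), 3))  # 0.696
```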
Thomas C. Rindflesch has a Ph.D. in linguistics from the University of Minnesota and leads the Semantic Knowledge Representation project at
Ying Ding (Assistant Professor of Information Science) and colleagues from the School of Library & Information Science, Indiana University
When: November 6, 2009
Venue: 292 Joshi (Brandeberry Conference Room)
Title: chem2bio2rdf: Semantic Systems Chemical Biology
In this talk, we describe the use of Semantic Web technologies, including RDF and OWL, for the integration of chemical, biological and genomic information within the context of Systems Chemical Biology. We describe how two existing resources, Bio2RDF and Linking Open Drug Data (LODD), can be integrated with chemistry-oriented networks to create large-scale systems chemical biology networks that establish links between compounds, protein targets, genes and diseases. We also describe the generation of this Chem2Bio2RDF network and how it can be analyzed in a variety of ways, including the use of Semantic Lenses.
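To make the idea of cross-source links concrete, here is a toy sketch in plain Python rather than RDF and SPARQL. The identifiers and the predicate names `binds`, `encodedBy` and `associatedWith` are hypothetical and are not the Chem2Bio2RDF vocabulary. The point is only that once sources share identifiers, a compound can be connected to a disease by following target and gene links:

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; names are illustrative only.
TRIPLES = [
    ("compound:C1", "binds", "target:T1"),        # e.g. from a chemistry source
    ("target:T1", "encodedBy", "gene:G1"),        # e.g. from a protein/gene source
    ("gene:G1", "associatedWith", "disease:D1"),  # e.g. from a disease source
    ("compound:C2", "binds", "target:T2"),        # T2 has no gene link here
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick link following."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def compound_to_disease(index, compound):
    """Follow compound -> target -> gene -> disease links across sources."""
    paths = []
    for target in index[(compound, "binds")]:
        for gene in index[(target, "encodedBy")]:
            for disease in index[(gene, "associatedWith")]:
                paths.append((compound, target, gene, disease))
    return paths


idx = build_index(TRIPLES)
print(compound_to_disease(idx, "compound:C1"))
# [('compound:C1', 'target:T1', 'gene:G1', 'disease:D1')]
print(compound_to_disease(idx, "compound:C2"))  # []
```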
Title: Social tagging networks: Cohesiveness and Dynamics
This talk proposes an approach to studying the structure and dynamics of large cohesive groups of tags in online social networks. Given a tag co-occurrence graph defined over a particular time span, the cohesive subgroups of tags are modeled using the graph-theoretic concept of a k-plex, which was originally introduced in the social network analysis literature. This model can be thought of as a relaxed, more practical version of the popular clique model, obtained by allowing each vertex within the group a predetermined (and typically small) number k of non-neighbors. Intuitively, a maximum k-plex in the graph should be related to one of the most popular topics discussed in the network, and the size of a maximum k-plex can serve as a reasonable global measure of the network's cohesiveness. Moreover, studying how the maximum k-plex of the same network's tag co-occurrence graph changes over time can reveal interesting information about the underlying social network. We illustrate the proposed method on a large set of dynamic data extracted from the Delicious social bookmarking community.
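A minimal sketch of the k-plex test and a brute-force maximum k-plex search on a toy tag graph. We assume the common Seidman-Foster convention, in which a vertex counts itself among its non-neighbors, so a 1-plex is exactly a clique; the talk's exact convention is not stated. The exhaustive search is exponential and only illustrates the definition. A graph on the scale of the Delicious data would need a specialized algorithm. The tags and edges are hypothetical:

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (tag, tag) co-occurrence pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_k_plex(adj, group, k):
    """True if each member is adjacent to at least len(group) - k members.

    Seidman-Foster convention: a vertex counts as one of its own
    non-neighbors, so k = 1 is exactly a clique.
    """
    members = set(group)
    need = len(members) - k
    return all(len(adj[v] & members) >= need for v in members)


def maximum_k_plex(adj, k):
    """Exhaustive search from the largest size down (exponential; toy graphs only)."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for group in combinations(vertices, size):
            if is_k_plex(adj, group, k):
                return set(group)
    return set()


# Hypothetical tag co-occurrence edges (tags used together on a bookmark).
edges = [
    ("python", "programming"), ("python", "tutorial"), ("programming", "tutorial"),
    ("python", "code"), ("programming", "code"), ("recipes", "food"),
]
adj = build_graph(edges)
print(sorted(maximum_k_plex(adj, 1)))  # ['code', 'programming', 'python']
print(sorted(maximum_k_plex(adj, 2)))  # ['code', 'programming', 'python', 'tutorial']
```

Relaxing k from 1 to 2 lets "tutorial" and "code" join the group even though they never co-occur with each other. This is the kind of tolerance that makes k-plexes more practical than cliques for noisy tag data.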
Title: Weighted PageRank for heterogeneous scholarly networks
Large-scale weighted PageRank can be calculated for heterogeneous scholarly networks such as paper citation networks, author citation networks and journal citation networks. Citation time, self-citation and journal impact factors are used as weights. Weighted PageRank rankings have been compared with conventional citation rankings. The method can easily be extended to other heterogeneous networks. Potentially interesting issues include the handling of dangling nodes and heuristic parameter settings.
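A minimal sketch of weighted PageRank by power iteration. It assumes each node splits its score among its out-links in proportion to edge weight, and that dangling nodes (nodes with no out-links) redistribute their score uniformly, which is one common fix. The abstract does not say how citation time, self-citation or impact factor become weights, so the weights below are hypothetical. The damping factor 0.85 is the conventional default, not a value from the talk:

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted directed graph.

    edges: dict mapping (source, target) -> positive weight.
    A node's score is split among its out-links in proportion to edge weight.
    Dangling nodes (no out-links) spread their score uniformly over all nodes.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        # Teleport share plus the uniformly redistributed dangling mass.
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            new[v] += damping * rank[u] * w / out_weight[u]
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical citation graph: paper A cites B and C; the A -> C link is
# treated as a self-citation and down-weighted to 0.5 purely for illustration.
citations = {("A", "B"): 1.0, ("A", "C"): 0.5, ("B", "C"): 1.0}
ranks = weighted_pagerank(citations)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'B', 'A']
```

Here C is a dangling node because it cites nothing. Without the uniform redistribution step, its score would leak out of the system on every iteration.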
Dr. Ying Ding is an Assistant Professor in the School of Library and Information Science, Indiana University. Before that, she worked as a senior researcher at the University of Innsbruck, Austria, and as a researcher at the Division of Mathematics and Computer Science at the Free University of Amsterdam, the Netherlands. She completed her Ph.D. at the School of Applied Science, Nanyang Technological University, Singapore. She has been involved in various European Union-funded projects: research-oriented EU projects (EASAIER, OntoKnowledge, IBROW, SWWS, COG, Htechsight, Esperonto, SEKT, DIP, Triple Space Computing), thematic networks (OntoWeb, KnowledgeWeb), and Accompanying Measures (Multiple). She is very active in consultancy projects between universities and companies. She has published more than 70 papers in journals, conferences and workshops, and has served on the program committees of more than 80 international conferences and workshops. She is co-author of the book "Intelligent Information Integration in B2B Electronic Commerce", published by Kluwer Academic Publishers. She is also co-author of book chapters in "Spinning the Semantic Web", published by MIT Press, and "Towards the Semantic Web: Ontology-driven Knowledge Management", published by Wiley. Her current interests include Webometrics, the Semantic Web, citation analysis, information retrieval, knowledge management and applications of Web technology.
Prof. Bhavani Thuraisingham and Prof. Latifur Khan from The University of Texas at Dallas
When: November 13, 2009
Venue: 292 Joshi (Brandeberry Conference Room)
Title: Semantic Web Research at the University of Texas at Dallas
We are conducting research on several topics related to the semantic web at the University of Texas at Dallas. These include ontology alignment, managing large RDF (Resource Description Framework) graphs, RDF query management, cloud computing and social networks with the semantic web, policy management, and formal methods for the semantic web. In this presentation we focus on two topics: ontology alignment, and managing and querying large RDF graphs.
Ontology alignment addresses the semantic heterogeneity between two or more domain specifications by identifying correspondences between their associated concepts. Our approach uses name, structural and content matching techniques for aligning ontologies. Together with the University of Minnesota (UMN), we justify the conceptual validity of our ontology alignment technique with a series of experimental results that demonstrate the efficacy and utility of our algorithms on a wide variety of authentic GIS data, including data from multiple jurisdictions.
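The name-matching component of such an approach can be illustrated with Python's `difflib.SequenceMatcher`. This is a minimal sketch that swaps in a generic string-similarity measure; it is not the UTD algorithm. The structural and content matching steps are omitted, and the concept names and the 0.8 threshold are hypothetical:

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and split CamelCase/underscores so 'RoadSegment' ~ 'road_segment'."""
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # split camelCase boundaries
    name = re.sub(r"[_\-]+", " ", name)                # underscores/hyphens -> spaces
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized concept names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align_by_name(source, target, threshold=0.8):
    """Pair each source concept with its most similar target concept above threshold."""
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches


# Hypothetical concept names from two GIS ontologies.
source = ["RoadSegment", "WaterBody", "Parcel"]
target = ["road_segment", "water_body", "land_parcel", "Building"]
print(align_by_name(source, target))
# [('RoadSegment', 'road_segment', 1.0), ('WaterBody', 'water_body', 1.0)]
```

"Parcel" is left unmatched because its best candidate, "land parcel", scores about 0.71. Cases like this are where structural or content evidence would be needed to decide.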
The second part of the presentation deals with scalable storage and retrieval of large RDF graphs. Currently available semantic web frameworks do not work well for this retrieval task. In this talk, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples. We describe a scheme to store RDF data in the Hadoop Distributed File System (HDFS). We also describe our algorithms for generating the best possible query plan to answer a SPARQL (SPARQL Protocol and RDF Query Language) query based on a cost model. We use Hadoop's MapReduce framework to actually answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. We conclude that our framework is scalable and efficient and can handle large amounts of RDF data.
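The way a MapReduce job can answer a SPARQL query whose triple patterns share a variable can be sketched as a reduce-side join. The sketch below simulates this in plain Python rather than running it on Hadoop. The query is a hypothetical two-pattern one, `?paper author ?a . ?a name ?name`, and the predicate names and data are invented. This is not the UTD storage scheme or cost-based planner, only the generic join idea:

```python
from collections import defaultdict
from itertools import product

# Hypothetical triples; predicate names are illustrative only.
TRIPLES = [
    ("paper1", "author", "alice"),
    ("paper2", "author", "bob"),
    ("paper3", "author", "alice"),
    ("alice", "name", "Alice A."),
    ("bob", "name", "Bob B."),
    ("carol", "name", "Carol C."),
]


def map_phase(triple):
    """Emit (join key ?a, tagged binding) for triples matching either pattern."""
    s, p, o = triple
    if p == "author":
        yield o, ("paper", s)  # in '?paper author ?a', ?a is the object
    elif p == "name":
        yield s, ("name", o)   # in '?a name ?name', ?a is the subject


def shuffle(mapped):
    """Group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Join the bindings that share the same ?a."""
    papers = [v for tag, v in values if tag == "paper"]
    names = [v for tag, v in values if tag == "name"]
    for paper, name in product(papers, names):
        yield paper, name


mapped = (kv for t in TRIPLES for kv in map_phase(t))
groups = shuffle(mapped)
results = sorted(r for key, vals in groups.items() for r in reduce_phase(key, vals))
print(results)
# [('paper1', 'Alice A.'), ('paper2', 'Bob B.'), ('paper3', 'Alice A.')]
```

Queries with more patterns need several such joins. Choosing which joins to run, and in what order, is the job of a query planner like the cost-model-based one described above.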
Acknowledgements: Our research on semantic web is supported by the National Science Foundation, the Intelligence Advanced Research Projects Activity, the Air Force Office of Scientific Research and the National Geospatial Intelligence Agency.
Biographies: Bhavani Thuraisingham has been a Professor of Computer Science and Director of the Cyber Security Research Center in the Erik Jonsson School of Engineering and Computer Science at the University of Texas at Dallas (UTD) since October 2004. Dr. Thuraisingham teaches courses in Data Security and Semantic Web, and her research is sponsored by NSF, AFOSR, IARPA, NGA, NASA and Raytheon, among others. Prior to joining UTD, Dr. Thuraisingham worked for the MITRE Corporation for 16 years, a period that included an IPA (Intergovernmental Personnel Act) assignment at the National Science Foundation as Program Director for Data and Applications Security. At MITRE she was a Department Head in Data and Information Management and established research programs with AFRL, CECOM, SPAWAR, NSA and CIA. Prior to joining MITRE in January 1983, she worked in commercial industry for six years, first at the Control Data Corporation and later at Honeywell Inc. She has also worked as an adjunct professor of computer science, first at the University of Minnesota and later at Boston University. She has been an instructor at AFCEA since 1998. Dr. Thuraisingham was educated in the United Kingdom at the University of Bristol and the University of Wales.
Professor Thuraisingham is an elected Fellow of three professional organizations for her work in data security: the IEEE (Institute of Electrical and Electronics Engineers), the AAAS (American Association for the Advancement of Science) and the BCS (British Computer Society). She received the IEEE Computer Society’s prestigious 1997 Technical Achievement Award for “Outstanding and Innovative contributions to secure data management.” In 2002, Silicon India Magazine named her one of the top seven technology innovators of South Asian origin in the USA.
Prior to joining UTD, Dr. Thuraisingham served at the National Science Foundation (NSF) in Arlington, VA, on an IPA (Intergovernmental Personnel Act) assignment from the MITRE Corporation. At NSF, she established the Data and Applications Security Program, co-founded the Cyber Trust theme and was involved in inter-agency activities in data mining for counter-terrorism. She worked at MITRE in Bedford, MA, between January 1989 and September 2001, first in the Information Security Center and later as a department head in Data and Information Management as well as Chief Scientist in Data Management in the Intelligence and Air Force centers. She has served as an expert consultant in information security and data management to the Department of Defense, the Department of the Treasury and the Intelligence Community for over 10 years. Thuraisingham’s industry experience includes six years of research and development at Control Data Corp. and Honeywell Inc. in Minneapolis, MN. While in industry and at MITRE, she was an adjunct professor of computer science and a member of the graduate faculty, first at the University of Minnesota and later at Boston University, between 1984 and 2001. Soon after her PhD, she also worked as a visiting professor, first at the New Mexico Institute of Technology and later at the University of Minnesota, between 1980 and 1983.
Dr. Thuraisingham’s work in information security and data mining has resulted in over 90 journal articles, over 200 refereed conference papers, over 70 keynote addresses and three US patents. She is the author of nine books on data management, data mining and data security, including one on data mining for counter-terrorism and another on Database and Applications Security. She is completing her tenth book, on Secure Service Oriented Information Systems. Dr. Thuraisingham has been invited to speak on data mining for security applications at the United Nations and at the White House Office of Science and Technology Policy. She has also participated in panels at the National Academy of Sciences and the Air Force Scientific Advisory Board. She is the President of Bhavani Security Consulting and supports the Department of the Treasury on Software Research Credit. She serves (or has served) on the editorial boards of leading research and industry journals, including several IEEE and ACM Transactions, and served as the Editor in Chief of the Computer Standards and Interfaces Journal. She has also been an instructor at AFCEA’s (Armed Forces Communications and Electronics Association) Professional Development Center since 1998.
During her nearly five years at UTD, Dr. Thuraisingham has established and led a strong research program in Intelligence and Security Informatics, which now includes four core professors. The team has generated over $9 million in research funding from agencies such as NSF, AFOSR, IARPA, NGA, NASA and ONR, as well as from corporations such as Raytheon Inc. The research projects include an NSF CAREER Grant, an AFOSR Young Investigator Program Award and a DoD MURI Award. Her current focus includes two activities: i) studying how terrorists and hackers function so that effective and improved solutions can be provided, and ii) transferring the technologies developed at the university to commercial development efforts.
Dr. Thuraisingham promotes math and science to high school students as well as to women and underrepresented minorities, and has given featured addresses at conferences sponsored by WITI and SWE. Articles on her efforts and her vision have appeared in multiple publications, including the Dallas Morning News, D Magazine, The MITRE Matters and the DFW Metropolis Technology Magazine. She has also appeared on DFW television speaking on cyber security topics.
Latifur Khan is an Associate Professor of Computer Science at the University of Texas at Dallas. He joined the university in 2000 after completing his PhD on Ontology Management at the University of Southern California under Prof. Dennis McLeod. His research interests are in Data Mining for Cyber Security, Semantic Web and Geospatial Information Management. His research is funded by NSF, AFOSR, IARPA, NGA, NASA, Raytheon, Nokia and Cisco. He has published papers in the VLDB Journal and several IEEE Transactions, as well as in conferences such as ICDM, ACM Multimedia and ECML/PKDD. He is a co-author of the book Design and Implementation of Data Mining Tools, published by CRC Press. He is a senior member of the IEEE.
Pankaj Mehra, Chief Scientist, HP Labs Russia, Technical lead of Taxonom.com
When: October 21, 2009
Venue: 292 Joshi (Brandeberry Conference Room)
Title: Beyond Search: 5 steps to insight
Evidence points to two reasons why blindly applying Web search to the enterprise produces undesirable outcomes. First, enterprises lag behind the Web in achieving richly connected information. Second, enterprise users expect greater specificity of meaning and deeper modeling than consumers do. In this talk, I will show the first houses we have built on the semantic foundations for enterprise search that we laid in our previous work.
The talk will begin with an analysis of how people, processes and infrastructure are currently deployed in information-rich businesses in order to make sense of tens of petabytes of open and proprietary information. I will discuss why existing architectures—especially, their implied cost and delay structures—do not scale to the demands and opportunities thrown open by new economic and business models around information. I will then argue that in order to architect for exabytes and beyond, businesses need to make a switch to architecting their information services around an economy of plenty (away from architecting around the economy of dearth that gave us search).
In the process, we will turn on its head the very problem that motivates search technology, for instance by asking and answering the fundamental question: Is it not already too late if you have to look for something? I will present architectures for delivery and sense-making. At the heart of these systems lies an engine that shortens the path from “Got it.” to “Got it!”™, i.e. the all-important path from content to insight. We will show how to couple this engine with context-mapping technologies in order to move beyond searching for documents to having the right insights delivered into the right heads at the right time. The talk concludes with blueprints from information-heavy industries, such as financial and legal services, which I expect will be widely adopted in the near future.
Pankaj Mehra is an HP Distinguished Technologist, Chief Scientist and founder of HP Labs Russia, and technical leader of HP’s Taxonom.com ontology generation service. He was core architect of HP’s NonStop Advanced Architecture, lead architect of HP’s Integrated Archive Platform, and chairman of InfiniBand Trade Association’s Management Working Group. He holds an Industry Visitor position at Stanford University. Please see http://pages.sbcglobal.net/pankaj.mehra for more information.
Prof. Srinivasan Parthasarathy from Ohio State University
When: September 9, 2009
Venue: 292 Joshi (Brandeberry Conference Room)
Title: Toward Visual Knowledge Discovery and Analytics
Knowledge discovery and data mining is a process whose goal is to extract interpretable and actionable information from complex (potentially large) data. Visualization can play an important role in this exploratory process. Indeed, a number of important scientific discoveries have ultimately relied on visual confirmation, from Galileo seeing the moons of Jupiter to Gerd Binnig and Heinrich Rohrer seeing atoms on a surface. Visualization can also play an important role in understanding the nature of a problem domain and, subsequently, the patterns governing the underlying solution space. In this talk I will present our vision of the roles visualization can play in the knowledge discovery process. Specifically, we will examine the use of visualization:
1. As a mechanism to facilitate exploration of complex datasets.
2. As a means to validate and confirm results obtained from the discovery process.
3. As an approach to understand and lend transparency to the discovery process.
In each case I will attempt to illustrate these roles in the context of specific end applications drawn from materials physics, bioinformatics, social network analysis and the clinical diagnosis of eye disease. No prior knowledge of data mining, knowledge discovery, visualization or any of these application domains will be assumed.
Dr. Srinivasan Parthasarathy (PhD, University of Rochester) is currently an Associate Professor in the Computer Science and Engineering Department at the Ohio State University (OSU). His research interests are broadly in the areas of Data Mining, Databases, Bioinformatics and High Performance Computing. He received an NSF CAREER Award in 2003, a DOE Early Career Award in 2004, an Ameritech Faculty Fellowship in 2001 and an IBM Faculty Award in 2007. His papers have received five best paper awards from leading conferences in the field, including the SIAM International Conference on Data Mining (SDM), the IEEE International Conference on Data Mining (ICDM), the Very Large Databases Conference (VLDB) and, most recently, ACM Knowledge Discovery and Data Mining (SIGKDD). He is a member of the ACM and the IEEE and has served on the program committees of leading conferences in the fields of data mining, databases and high performance computing. He currently serves on the editorial boards of several journals, including the Data Mining and Knowledge Discovery Journal (DMKDJ), the IEEE Transactions on Knowledge and Data Engineering, the Distributed and Parallel Databases Journal (DAPDJ) and the IEEE Intelligent Systems (IEEE-IS) journal. He served as one of the program chairs of SIAM Data Mining in 2007 and is currently serving as one of the general chairs for the 2009-2010 editions.
Dr. Olivier Bodenreider, MD, PhD, National Institutes of Health
When: May 27, 2009
Venue: 292 Joshi (Brandeberry Conference Room)
Title: Ontologies and Data Integration in Biomedicine Seminar
This seminar reviews examples of successful biomedical data integration projects in which ontologies play an important role, including the integration of genomic data based on Gene Ontology annotations, the cancer Biomedical Informatics Grid (caBIG) project, and semantic mashups created by the Semantic Web for Health Care and Life Sciences community. Challenges to data integration in biomedicine will also be discussed.
Dr. Bodenreider is a Research Scientist at the Lister Hill National Center for Biomedical Communications, US National Library of Medicine, NIH. His research interests include terminology, knowledge representation and ontology in the biomedical domain, both from a theoretical perspective and in their application to natural language understanding, reasoning, information visualization and integration. Dr. Bodenreider is a Fellow of the American College of Medical Informatics. He received an M.D. degree from the University of Strasbourg, France, in 1990 and a Ph.D. in Medical Informatics from the University of Nancy, France, in 1993. Before joining NLM in 1996, he was an assistant professor of Biostatistics and Medical Informatics at the University of Nancy Medical School, France.