Special seminar to be given on Sept. 8 Location: Bldg. 31A, 8th floor conference room Time: 2 pm. -----------------part one---------------------- Discovering Conceptual Object Models from Large Relational Databases Weixiong (Wayne) Zhang Research Assistant Professor USC/Information Sciences Institute A conceptual model of a database, such as an Entity-Relationship (ER) model or a frame-based model, is a specification of objects, attributes, and their relationships. Although such a model is critical for developing successful database applications, it may not be always available in practice for legacy databases, and discovering and constructing such a model from the data, and from the data only, is a challenging problem. In this talk, we present an approach to address the problems of object identification and model construction, and show a number of favorable features of this approach, including its robustness in dealing with noise data and scalability to large databases and data sets. This approach has been implemented in a system called McKey (Model Construction with Key identification) for discovering and building ER models from instances of large legacy databases. We have applied McKey to three very large legacy databases, and obtained comprehensive models within hours. This performance represents orders of magnitude in savings of manpower. This is a joint work with Wei-Min Shen, Xuejun Wang and Yigal Arens -----------------part two----------------------- Bidirectional, Divide-and-Conquer State-Space Search for Sequence Alignment State-space search is a general problem-solving paradigm for solving difficult combinatorial optimization problems in many areas. In this talk, we will describe a new approach that applies state-space search to finding an optimal sequence alignment. The main idea of the new method is to perform a bidirectional search, which gives a node on a solution path. This middle point is then used to divided the problem into two problems of smaller size, each of which can be solved recursively using the same method. We will describe how the space requirement can be reduced from O(n^d) to O(n^(d-1)), a significant improvement. The most important feature of this new algorithm is its time complexity. We will show that it explores much less states than the well-known dynamic programming approach to the sequence alignment problem, namely the Hirschberg algorithm, and is possibly the choice of algorithm for multiple sequence alignment, which is guaranteed to give an optimal solution. This is a joint work with Rich Korf. ----------------------------------------------