From: walker@sky (D Roland Walker)
Date: Mon, 23 Feb 1998 22:22:18 -0500
To: ncbi@sky
Subject: Seminar Tue 24 Feb

SEALS, A System for Easy Analysis of Lots of Sequences

Tue, 24 Feb, 11:00 am. Bldg. 38A, 8th floor

Fully automated sequence analysis is not possible or desirable. SEALS (A System for Easy Analysis of Lots of Sequences) is designed to facilitate semi-automatic sequence analysis at the genome scale, leveraging human intelligence by simplifying laborious tasks, without taking human judgement out of the loop.

To this end, SEALS includes a large number of user-level commands, which provide

    common file manipulations
    glue for other popular analysis programs
    Entrez retrieval
    GenBank flatfile manipulation
    fasta library filtering/sorting
    BLAST filtering/sorting
    access to the NCBI taxonomy
    etc.

Graphical interfaces are avoided, as they tend to be slow and do not scale well. Human-readable file formats are used exclusively in order to allow human interaction at every stage of a process.

In addition to providing user-level sequence analysis tools, SEALS also aims to provide a rapid application development environment, implementing a set of primitives at the appropriate level of abstraction for current research projects in sequence analysis. New applications can be rapidly prototyped and non-programmers can easily create and modify novel functions using shell scripts.

SEALS was written in Perl, for portability and ease of development.

I will discuss the design and implementation of SEALS itself, and the design and implementation of several research projects using SEALS, including

    subtractive comparison of genomes
    similarity clustering of sequences at the domain level
    searching for analogous isozymes
    3-D fold recognition by sequence similarity

This is not a tutorial for users, though I will be happy to give one at a later date.

Roland