Beginning Perl for Bioinformatics

Name: Beginning Perl for Bioinformatics
Author: James D. Tisdall
ISBN: 9780596000806

Overview

With its highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis. But if you're a biologist with little or no programming experience, starting out in Perl can be a challenge. Many biologists have a difficult time learning how to apply the language to bioinformatics. The most popular Perl programming books are often too theoretical and too focused on computer science for a non-programming biologist who needs to solve very specific problems.

Beginning Perl for Bioinformatics is designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill, revealing Perl programs and techniques that are immediately useful in the lab. Each chapter focuses on solving a particular bioinformatics problem or class of problems, starting with the simplest and increasing in complexity as the book progresses. Each chapter includes programming exercises and teaches bioinformatics by showing and modifying programs that deal with various kinds of practical biological problems. By the end of the book you'll have a solid understanding of Perl basics, a collection of programs for such tasks as parsing BLAST output and GenBank records, and the skills to take on more advanced bioinformatics programming. Some of the later chapters focus in greater detail on specific bioinformatics topics. This book is suitable for use as a classroom textbook, for self-study, and as a reference.

The book covers:

Programming basics and working with DNA sequences and strings
Debugging your code
Simulating gene mutations using random number generators
Regular expressions and finding motifs in data
Arrays, hashes, and relational databases
Regular expressions and restriction maps
Using Perl to parse PDB records, annotations in GenBank, and BLAST output

Synopsis

A practical introduction to Perl designed for biologists with little or no programming experience. The book approaches programming as an important new laboratory skill, and shows many Perl programs and Perl programming techniques that can be immediately useful in the lab. Each chapter focuses on a problem or class of problems in bioinformatics, and shows how to use Perl to solve them.

About the Author, James D. Tisdall

James Tisdall has worked as a musician, a programmer at Bell Labs (where he programmed for speech research and discovered a formal language for musical rhythm), and as a bioinformaticist at Mercator Genetics in Menlo Park, California, and at Fox Chase Cancer Center in Philadelphia. He has a B.A. in mathematics from the City College of New York and an M.S. in computer science from Columbia University; he is working towards a Ph.D. in computer science at the University of Pennsylvania. In his spare time Jim teaches computer music at the Settlement Music School in Philadelphia.

Editorials

From Barnes & Noble

The Barnes & Noble Review
It's hard to believe nowadays, but it wasn't long ago that biology and computing were about as far apart as two sciences could be. That was before the Human Genome Project, before the Protein Data Bank, before the explosion in biological data that has taken place in the past few years -- and the explosion in computational analysis tools for making sense of it all. Even more strikingly, biological experimentation is increasingly taking place "in silico" -- in simulations running in a computer, not in a test tube.

Put simply, programming is becoming a critical skill for more and more biologists. James Tisdall's timely Beginning Perl for Bioinformatics will help them gain the specific programming skills they'll need in their day-to-day work.

This book's examples and exercises -- and there are many -- almost all focus on real biological problems. Many of them use biological data sources working biologists will recognize. And the choice of Perl as a language for bioinformatics is apt: It has a shallow learning curve, it's portable and fast, and, if written properly, it requires relatively little maintenance.

Tisdall starts at the highest level. You've been handed a problem -- a simple one, to help you get started. You need to count the regulatory elements in DNA. Where would you start? He walks you through identifying the inputs you'll need, establishing your overall program design, planning for output, refining your design using pseudocode -- an informal program that doesn't bother with correct syntax -- and, finally, writing a real, runnable program.

By Chapter 4, you're writing programs that represent DNA and protein sequence data, transcribe DNA to RNA, concatenate sequences, make the reverse complement of sequences, and read sequence data from files. (These are not examples you'd find in the "Camel" book, O'Reilly's classic introduction to Perl -- or, for that matter, in any other Perl book we've seen!) Of course, as you're writing these programs, you're also learning how to work with scalar and array variables, handling string operations, reading from files -- techniques you'll use constantly.

Perl is just super-duper at finding patterns, and if you're a biologist working with DNA or proteins, it won't take you long to find good applications for it. Chapter 5 teaches you how to search for motifs -- for example, regulatory elements of DNA or short stretches of protein that exist in multiple species -- and examine sequence data in detail. Along the way, you're learning how to use conditional tests, regular expressions, and string operations -- more meat-and-potatoes Perl stuff.

Mutation is a random process, and Tisdall spends a full chapter on randomization: modeling mutations with random numbers, using random numbers to generate DNA sequence data sets, repeatedly mutating DNA to understand how mutations accumulate, and more. Using hash datatypes, you'll learn how to write Perl programs that simulate how the genetic code translates DNA into proteins.

Next, Tisdall focuses on computing restriction maps, which help biologists determine where best to cut a DNA molecule in order to insert a new gene; and on restriction digests, one of the first methods for "fingerprinting" DNA. In so doing, he helps you deepen your skills with regular expressions, and offers practical advice on representing Restriction Enzyme Database data with them.

If you're working with the Genetic Sequence Data Bank (GenBank), Tisdall shows you how to extract information from it, search for patterns, parse its flat-file format to extract what you need, and create a Perl DBM database for rapid lookups on the data you work with most. There are chapters on working with the increasingly important Protein Data Bank, which stores knowledge about the 3D structure of a growing collection of proteins; and finally, a brief introduction to the open source Bioperl modules, which can streamline sequence manipulation, access to biology databases, and other common bioinformatics tasks.

If you're a working biologist, or working on becoming one, Beginning Perl for Bioinformatics will be an invaluable resource -- and we've seen nothing like it. (Bill Camarda)

Bill Camarda is a consultant, writer, and web/multimedia content developer with nearly 20 years' experience in helping technology companies deploy and market advanced software, computing, and networking products and services. He served for nearly ten years as vice president of a New Jersey-based marketing company, where he supervised a wide range of graphics and web design projects. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks For Dummies®, Second Edition.