
An Overview Of The PTO Sequence Rules
DNA and protein research is critical to the discovery of new drugs and vaccines, to genetic therapy and to transgenic agriculture, to name a few applications. In the field of biotechnology, patent applications commonly disclose nucleic acid and amino acid sequences. While some sequences are short, others can span several pages.
In May of 1990, the U.S. Patent and Trademark Office ("PTO") began requiring that practically every unbranched sequence disclosed in a patent application be submitted in a standard format on a computer readable diskette (branched sequences are specifically excluded by definition). The diskette allows the sequences to be searched against previously submitted or published sequences in a database such as GenBank. The PTO sequence rules (37 CFR §§1.821-1.825) apply to essentially any peptide sequence of four (4) or more amino acids and any nucleic acid sequence of ten (10) or more nucleotides. Every sequence disclosed in the specification, claims and figures is covered by the sequence rules and must appear in what is referred to as a "Sequence Listing." This is the case even if the sequence is presented only as cited art, is used in a comparison figure or table, or is not claimed at all.
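The coverage thresholds described above can be expressed as a simple screening check. The sketch below is illustrative only: the function name `needs_sequence_listing` is hypothetical, it assumes sequences are typed in one-letter codes with no modifications, and it is no substitute for reading 37 CFR §1.821 itself.

```python
# Minimum lengths stated in the text for coverage by 37 CFR 1.821-1.825.
MIN_NUCLEOTIDES = 10
MIN_AMINO_ACIDS = 4


def needs_sequence_listing(seq: str, kind: str, branched: bool = False) -> bool:
    """Return True if an unbranched sequence meets the length threshold.

    Hypothetical screening helper. `kind` is "nucleic" (one letter per
    nucleotide) or "protein" (assumed one-letter amino acid codes).
    """
    if branched:
        # Branched sequences are excluded by definition.
        return False
    # Count only letters, so spaces, digits and line breaks are ignored.
    residues = [c for c in seq if c.isalpha()]
    if kind == "nucleic":
        return len(residues) >= MIN_NUCLEOTIDES
    if kind == "protein":
        return len(residues) >= MIN_AMINO_ACIDS
    raise ValueError(f"unknown sequence kind: {kind!r}")
```

For example, a ten-base oligonucleotide such as `"ATGCATGCAT"` is covered, while a three-residue peptide such as `"MKT"` is not.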
The PTO also requires that a paper copy of the Sequence Listing be submitted along with the computer readable diskette. See 37 CFR §1.821(f) & §1.821(c), respectively. The paper copy, however, must be formatted with several word-processing codes, which the computer readable diskette must not contain. The PTO rules are therefore not only very specific as to the content of the Sequence Listing but also very strict as to its format, and they must be followed precisely (Sequence Listings have been rejected for incorrect punctuation, erroneous spacing and miscounted sequences, to name a few defects).
The sequence rules also require certain amendments to the specification and claims. Each sequence is given an ID number in the Sequence Listing and must be cited (e.g., "SEQ ID NO:1") at each place in the specification, claims or abstract where that sequence is shown or discussed. If the Sequence Listing was not filed with the application, the specification must also be amended to insert the printed Sequence Listing immediately preceding the claims, and the claims and abstract must be repaginated. See 37 CFR §1.821(c) & §1.823(a), respectively.
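The numbering and citation convention can be sketched as follows. The helpers `assign_seq_ids` and `cite` are hypothetical, and the sketch assumes that distinct sequences are numbered in order of first appearance and that a sequence repeated elsewhere in the application reuses its original number; the rules themselves should be consulted on both points.

```python
def assign_seq_ids(sequences: list[str]) -> dict[str, int]:
    """Number distinct sequences 1, 2, 3, ... in order of first appearance.

    Assumption: a repeated sequence keeps its first SEQ ID NO, and
    letter case is not significant (sequences are compared uppercased).
    """
    ids: dict[str, int] = {}
    for seq in sequences:
        key = seq.upper()
        if key not in ids:
            ids[key] = len(ids) + 1
    return ids


def cite(seq: str, ids: dict[str, int]) -> str:
    """Return the in-text citation for a sequence, e.g. 'SEQ ID NO:1'."""
    return f"SEQ ID NO:{ids[seq.upper()]}"
```

With sequences encountered in the order `"ATGCATGCAT"`, `"MKTAYIAK"`, `"atgcatgcat"`, the peptide would be cited as `SEQ ID NO:2`, and the third entry would reuse `SEQ ID NO:1`.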
The deadline for submitting a Sequence Listing is not always clear. A common misconception is that an application will be denied a filing date if it does not contain a Sequence Listing. In fact, if an application is filed without a Sequence Listing, the Applications Branch will issue a "Notice to Comply with Requirements for Patent Applications Containing... Sequence Disclosures," roughly within a month of filing. The period for responding with a proper Sequence Listing is one month from the mail date of the notice. See 37 CFR §1.821(g). This deadline may, however, be extended by up to four months by filing extensions of time and paying the appropriate government fees. Occasionally, an application containing a sequence disclosure will slip past the Applications Branch, in which case the examiner will be responsible for requesting the Sequence Listing.
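The response period is simple calendar arithmetic. The sketch below assumes a one-month base period plus zero to four months of extensions, and that a period ending on a day missing from the due month (e.g. one month from January 31) falls on that month's last day. It does not model the roll-over of deadlines that fall on a weekend or federal holiday, and the helper names `add_months` and `response_due` are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_due(mail_date: date, extension_months: int = 0) -> date:
    """Due date for replying to a Notice to Comply (hypothetical helper).

    One-month base period plus 0-4 months of purchased extensions.
    Weekend and holiday roll-over is NOT modeled.
    """
    if not 0 <= extension_months <= 4:
        raise ValueError("extensions range from 0 to 4 months")
    return add_months(mail_date, 1 + extension_months)
```

For a notice mailed on a hypothetical date of November 14, 1995, `response_due(date(1995, 11, 14))` gives December 14, 1995, and with the full four-month extension it gives April 14, 1996.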
A few common errors associated with the preparation of the Sequence Listing are:
- Failing to present nucleotide sequences as single-stranded sequences written 5' to 3', left to right. See 37 CFR §1.822(j). By convention, the complementary strand of a double-stranded DNA is written 3' to 5', left to right, and scientists unfamiliar with the sequence rules will often present a sequence that way in an invention disclosure. Such sequences must therefore be "reversed" before they can be properly incorporated in the Sequence Listing.
- Failing to submit the computer readable diskette in ASCII text. See 37 CFR §1.824(b). Preparing the Sequence Listing with a word processor and then attempting to save it to diskette as ASCII text almost always leaves word-processing "codes" in the file, which corrupt the computer readable copy. Such a diskette will not pass the PTO's initial checking software, and the Sequence Listing will be rejected. Since the computer readable diskette must be in ASCII text, it is best to prepare the Sequence Listing in ASCII text first, save it to a diskette, and only then prepare the paper copy containing the necessary word-processing codes.
- Failing to present amino acid sequences correctly. The amino acid sequence encoded by a nucleic acid must be typed directly below the corresponding codons and must also appear as a separate sequence. See 37 CFR §1.822(d) & §1.821(c), respectively. Where an application discloses a cDNA molecule encoding a protein, the Sequence Listing very often fails to meet both of these requirements for the amino acid sequence.
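The three errors above lend themselves to simple checks before filing. The sketch below is hypothetical: the function names are illustrative, `is_plain_ascii` is only a rough pre-check that assumes word-processing codes appear as bytes outside printable ASCII and CR/LF (it does not reproduce the PTO's own checking software), and `align_translation` assumes the amino acids are supplied as three-letter codes, one per codon.

```python
def to_five_prime_first(strand_3_to_5: str) -> str:
    """Rewrite a strand typed 3'->5' (left to right) as 5'->3'.

    Only the order is reversed; no bases are complemented, because the
    strand itself is unchanged, merely read from the other end.
    """
    return strand_3_to_5[::-1]


def is_plain_ascii(data: bytes) -> bool:
    """Rough pre-check: only printable ASCII plus CR/LF line endings.

    Assumption: embedded word-processing codes show up as bytes outside
    this set. This is not the PTO's checking software.
    """
    allowed = set(range(0x20, 0x7F)) | {0x0A, 0x0D}
    return all(b in allowed for b in data)


def align_translation(nucleotides: str, amino_acids: list[str]) -> str:
    """Place each three-letter amino acid code directly below its codon."""
    if len(nucleotides) != 3 * len(amino_acids):
        raise ValueError("need exactly one amino acid per codon")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    # Codons and three-letter codes are both 3 characters wide, so joining
    # each row with single spaces keeps the columns aligned.
    return " ".join(codons) + "\n" + " ".join(amino_acids)
```

For example, a strand typed as 3'-TACGGT-5' becomes `to_five_prime_first("TACGGT") == "TGGCAT"`, i.e. 5'-TGGCAT-3'. This only reverses the order; if the sequence intended for the listing is actually the opposite strand, its bases must also be complemented.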
Finally, the PTO recently announced that it would hold public hearings regarding nucleic acid sequences. The hearings were to be held in San Diego, California on November 29, 1995, and in Washington, D.C. on December 7, 1995. See 60 Federal Register 57223, November 14, 1995. Both meetings, however, were postponed. The main issues center on the PTO's cost of searching very large sequences, and on whether the patenting of complete organism genomes or human genome fragments will inhibit rather than promote the advancement of biotechnology. It remains to be seen whether the PTO's attempt to curb searching costs will affect Sequence Listing practice.