Wednesday, June 13, 2012

Using substr in Perl

A lot of string manipulation can be done with regular expressions, but in many cases the built-in string functions are more straightforward and take less time to execute.

To begin with, let's see the syntax forms available for the Perl substr function:

    substr EXPR, OFFSET, LENGTH, REPLACEMENT
    substr EXPR, OFFSET, LENGTH
    substr EXPR, OFFSET

where:

  • EXPR is a string expression from which the substring will be extracted
  • OFFSET is an index from where the substring to be extracted starts
  • LENGTH is the length of the substring to extract
  • REPLACEMENT is a string that will replace the substring

As with other functions, the parentheses are optional. As the forms above show, some arguments are mandatory and others are optional: you must supply at least the string expression (EXPR) and the position (OFFSET) where the substring starts.

Before reviewing the substr parameters, remember that in Perl the first character of a string has index 0, the second has index 1, and so on. Historically you could change this base through the special variable $[, which defaults to 0, but doing so has been deprecated for a long time, and assigning a non-zero value is a fatal error in recent Perl releases (5.30 and later), so it is best left alone.

Take a moment to look at the following example of how to use the Perl substr function:

    my $names = "John Peterson Anne Mike";
    my $oneName = substr($names, 5, 8);
    print "$names\n";      # prints John Peterson Anne Mike
    print "$oneName\n";    # prints Peterson

Please note that the value of $names didn't change after using the Perl substr function.

We can use substr either as an ordinary expression, for example in comparisons, or as an lvalue, that is, as the target of an assignment. In the latter case, if EXPR is a string variable, the value of that variable is modified. The next block of code shows this:

    my $names = "Alin Fred John";
    substr($names, 5, 4) = "Mary";
    print "$names\n";    # prints Alin Mary John



And now let's get back to our parameters.

OFFSET could be:

  • positive – the substring starts that far from the beginning of the string
  • negative – the substring starts that far from the end of the string
  • 0 – the substring starts at the first character of the string

If the OFFSET is outside the string (for instance, the string has 10 characters and OFFSET is greater than 10), substr returns the undef value and generates a warning when warnings are enabled. An OFFSET equal to the string length is still legal: it points just past the last character, so substr returns an empty string.

If substr is used as an lvalue and the OFFSET is entirely outside the string, a fatal error is raised. The following snippet illustrates the cases discussed above:

    my $names = "Alin Fred Peter";
    my $oneName = substr($names, -10, 4);
    print "Name: $oneName\n";
    $oneName = substr $names, -100, 4;
    print "Name: $oneName\n";
    substr($names, 20, 5) = "Alice";

This code produces the following result:

    Name: Fred
    Name:
    substr outside of string at 1.pl line 6.

Here 1.pl is the name of my script. If you run it from a command prompt (on a Windows machine in my case) with the -w switch (perl -w 1.pl), you'll get the warnings too.
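The boundary cases are easy to mix up, so here is a minimal sketch that checks them in one place. It assumes a 10-character sample string, captures warnings through a local $SIG{__WARN__} handler, and traps the fatal lvalue error with eval; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "0123456789";    # 10 characters, valid indexes 0..9

# An OFFSET equal to the string length points just past the last
# character: substr returns an empty string and does not warn.
my $at_end = substr($text, 10);

# An OFFSET beyond that is outside the string: substr returns undef
# and emits a warning, which is collected here instead of printed.
my @warnings;
my $past_end;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $past_end = substr($text, 11);
}

# As an lvalue, the same out-of-range OFFSET is fatal; eval traps it.
my $ok = eval { substr($text, 11, 1) = "x"; 1 };
my $error = $ok ? "" : $@;

print "at end:   ", (defined $at_end   ? "'$at_end'"   : "undef"), "\n";
print "past end: ", (defined $past_end ? "'$past_end'" : "undef"), "\n";
print "warnings: ", scalar(@warnings), "\n";
print "lvalue error: $error";
```

Wrapping the assignment in eval is one way to keep a script running when the offset comes from untrusted input; checking the offset against length($text) beforehand is another.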

LENGTH could be:

  • omitted – the function returns all the characters from the OFFSET position up to the end of the string
  • positive – the function returns at most LENGTH characters from the string, beginning at the OFFSET position
  • negative – the function returns the substring starting at the OFFSET position but leaves that many characters off the end of the string
  • 0 – the returned substring is empty, and no error or warning is raised

See the example below, which covers some edge cases too:

    my $names = "Alex James Abby Shannon Monica";
    my $strNames = substr $names, 11;      # length omitted
    print "$strNames\n";                   # prints Abby Shannon Monica
    $strNames = substr $names, 24, 100;    # length = 100
    print "$strNames\n";                   # prints Monica
    $strNames = substr $names, 24, -2;     # length = -2
    print "$strNames\n";                   # prints Moni
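Two LENGTH cases from the list are not exercised by that example: a negative LENGTH combined with a negative OFFSET, and a LENGTH of 0. A short sketch, reusing the same sample string (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $names = "Alex James Abby Shannon Monica";

# Negative OFFSET and negative LENGTH together: start 14 characters
# from the end and leave the last 7 characters off.
my $middle = substr($names, -14, -7);

# LENGTH 0 yields an empty string, with no warning.
my $empty = substr($names, 11, 0);

print "[$middle]\n";    # prints [Shannon]
print "[$empty]\n";     # prints []
```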

And now some examples using substr as an lvalue:

    my $names = "Alex James Abby Shannon Monica";
    substr($names, 11, 4) = "Alexandra";
    print "$names\n";    # prints Alex James Alexandra Shannon Monica

In the example above, the substring "Abby" (found at offset 11) is replaced by "Alexandra", even though the new string is longer than 4 (the LENGTH supplied to substr). As a result, $names now has more characters than it had initially.

Next, look at an example where the assigned string is shorter than the LENGTH supplied to substr:

    my $names = "Alex James Abby Shannon Monica";
    substr($names, 11, 4) = "Tom";
    print "$names\n";    # prints Alex James Tom Shannon Monica

After the assignment, $names is shorter than the initial string.

You can play around with these examples to see how the Perl substr function works in other similar situations.
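Two similar situations worth trying are pure insertion and pure deletion. The sketch below assumes the same sample string: a LENGTH of 0 selects an empty substring, so assigning to it inserts text, while assigning an empty string removes the selected characters.

```perl
use strict;
use warnings;

my $names = "Alex James Abby Shannon Monica";

# LENGTH 0 selects nothing at offset 11, so the assignment inserts.
substr($names, 11, 0) = "Tom ";
print "$names\n";    # prints Alex James Tom Abby Shannon Monica

# Assigning the empty string deletes "James " (offset 5, 6 characters).
substr($names, 5, 6) = "";
print "$names\n";    # prints Alex Tom Abby Shannon Monica
```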

REPLACEMENT

The examples above used substr as an lvalue to replace one substring with another. Another way to do this is to pass the REPLACEMENT parameter to substr, as in the following example:

    my $names = "Alex James Abby Shannon Monica";
    substr $names, 16, 7, "Alexandra";    # replaces "Shannon" with "Alexandra"
    print "$names\n";    # prints Alex James Abby Alexandra Monica

As the example shows, the value of $names changes after the replacement, just as it does when substr is used as an lvalue.
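One detail that example leaves out is the return value: with a REPLACEMENT argument, substr returns the text that was in the string before the replacement. A minimal sketch ($removed is just an illustrative name):

```perl
use strict;
use warnings;

my $names = "Alex James Abby Shannon Monica";

# The four-argument form modifies $names in place and returns
# the substring that was replaced.
my $removed = substr($names, 16, 7, "Alexandra");

print "$removed\n";    # prints Shannon
print "$names\n";      # prints Alex James Abby Alexandra Monica
```

This is handy when you need both the old value and the updated string in a single step.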


Finally, I'll show you a small script that uses the Perl substr function together with other string functions.

How to use substr to get the column fields from a flat file database


A flat file database consists of a number of records delimited by a separator, which in most cases is the newline ("\n") character, so each record occupies a single line. Each record consists of one or more fields, either of fixed width or delimited by a special character such as whitespace or a comma.

For instance, let's suppose that each record of the file customers.txt includes the fields Name, Phone and ZipCode, and that the entire file has only three records, as in the following table:

    Name             Phone         ZipCode
    John Abbot       872-321-1212  55416
    Clark Eliot      205-321-1200  20037
    Johnny Randolph  345-767-3476  33702

Fixed-width columns

We'll first examine the case where the fields have fixed widths: Name – 20, Phone – 12 and ZipCode – 5. If we print the file, we get something like this:

    John Abbot          872-321-121255416
    Clark Eliot         205-321-120020037
    Johnny Randolph     345-767-347633702

The following block of code reads the file and prints each record on a single line, with the fields separated by commas:

    open(my $fh, '<', 'customers.txt') or die "Cannot open customers.txt: $!";
    while (<$fh>) {
        # chomp off the trailing newline, if any, from $_
        chomp;
        my $name = substr($_, 0, 20);
        # trim the trailing spaces
        $name =~ s/ +$//;
        my $phone = substr($_, 20, 12);
        # delete all '-' characters
        $phone =~ s/-//g;
        my $zipCode = substr($_, 20 + 12, 5);
        print $name, ",", $phone, ",", $zipCode, "\n";
    }
    close $fh;

Running this snippet produces the following output:

    John Abbot,8723211212,55416
    Clark Eliot,2053211200,20037
    Johnny Randolph,3457673476,33702
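To try the fixed-width extraction without a customers.txt file on disk, the records can be built in memory. The sketch below is only one possible arrangement: the @layout table of [name, offset, width] entries and the sprintf padding are assumptions made for the illustration, not part of the original script.

```perl
use strict;
use warnings;

# Build the three sample records with the stated widths (20, 12, 5).
my @records = map { sprintf("%-20s%-12s%-5s", @$_) } (
    ["John Abbot",      "872-321-1212", "55416"],
    ["Clark Eliot",     "205-321-1200", "20037"],
    ["Johnny Randolph", "345-767-3476", "33702"],
);

# Hypothetical layout table: field name, OFFSET, LENGTH.
my @layout = ([name => 0, 20], [phone => 20, 12], [zip => 32, 5]);

my @csv;
for my $record (@records) {
    my %field;
    for my $column (@layout) {
        my ($name, $offset, $width) = @$column;
        $field{$name} = substr($record, $offset, $width);
    }
    $field{name}  =~ s/\s+$//;    # trim the padding after the name
    $field{phone} =~ tr/-//d;     # drop the dashes
    push @csv, join(",", @field{qw(name phone zip)});
}

print "$_\n" for @csv;
```

Keeping the offsets and widths in one table means a change to the record format touches a single place instead of every substr call.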

Columns delimited by a separator

The next example illustrates the case where the fields are delimited by a separator character such as a comma. In this case the content of our file is:

    John Abbot,872-321-1212,55416
    Clark Eliot,205-321-1200,20037
    Johnny Randolph,345-767-3476,33702

Because we want to show how the Perl substr function can access the fields of a record, we won't use the split function here (although it would be easier). The next sample shows one way to implement it:

    open(my $fh, '<', 'customers.txt') or die "Cannot open customers.txt: $!";
    while (<$fh>) {
        # chomp off the trailing newline, if any
        chomp;
        # position of the first comma: the name ends just before it
        my $pos1 = index($_, ",");
        my $name = substr($_, 0, $pos1);
        # position of the second comma: the phone lies between the two
        my $pos2 = index($_, ",", $pos1 + 1);
        my $phone = substr($_, $pos1 + 1, $pos2 - $pos1 - 1);
        # delete all '-' characters
        $phone =~ s/-//g;
        # LENGTH omitted: take everything after the second comma
        my $zipCode = substr($_, $pos2 + 1);
        print $name, ",", $phone, ",", $zipCode, "\n";
    }
    close $fh;

The output is the same as in the previous example.
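The index and substr pairing generalizes to any number of fields. Here is a rough sketch of that idea; split_fields is a hypothetical helper name, and in real code split would normally be the better choice.

```perl
use strict;
use warnings;

# Break a line into fields using only index and substr.
# split_fields is a hypothetical helper, not a built-in.
sub split_fields {
    my ($line, $sep) = @_;
    my @fields;
    my $start = 0;
    # Walk from separator to separator, extracting the text between them.
    while ((my $pos = index($line, $sep, $start)) != -1) {
        push @fields, substr($line, $start, $pos - $start);
        $start = $pos + length($sep);
    }
    # Last field: LENGTH omitted, so substr runs to the end of the line.
    push @fields, substr($line, $start);
    return @fields;
}

my @fields = split_fields("John Abbot,872-321-1212,55416", ",");
print join("|", @fields), "\n";    # prints John Abbot|872-321-1212|55416
```

If the line ends with a separator, the final substr call starts exactly at the end of the string and yields an empty last field rather than a warning.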
