There are many ways to handle any task on a Unix platform, but some techniques used to process a file waste a lot of CPU time. Most of the wasted time goes into unnecessary variable assignment and into opening and closing the same file over and over. Using a pipe also has a negative impact on the timing.
In this article I will explain various techniques for parsing a file line by line. Some techniques are very fast and some make you wait half a day. Each technique is measurable: I timed every one with the time command so that you can see which suits your needs.
I don't explain everything in depth, but if you know basic shell scripting, I hope you can follow easily.
I extracted the last five lines from my /etc/passwd file and stored them in a file named "file_passwd".
[root@www blog]# tail -5 /etc/passwd > file_passwd
[root@www blog]# cat file_passwd
venu:x:500:500:venu madhav:/home/venu:/bin/bash
padmin:x:501:501:Project Admin:/home/project:/bin/bash
king:x:502:503:king:/home/project:/bin/bash
user1:x:503:501::/home/project/:/bin/bash
user2:x:504:501::/home/project/:/bin/bash
I use this file whenever a sample file is required.
Method 1: Piped "while-read" loop
#!/bin/bash
# SCRIPT: method1.sh
# PURPOSE: Process a file line by line with PIPED while-read loop.
FILENAME=$1
count=0
cat "$FILENAME" | while read LINE
do
let count++
echo "$count $LINE"
done
echo -e "\nTotal $count Lines read"
By catting the file and piping its output into a while-read loop, a single line of text is read into a variable named LINE on each loop iteration. The loop runs until all of the lines in the file have been processed, one at a time.
Because the loop is part of a pipeline, Bash runs it in a subshell. Variables set within the loop are therefore lost (unset) outside of the loop, which is why $count prints 0, its initialized value, after the loop.
Output:
[root@www blog]# sh method1.sh file_passwd
1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
3 king:x:502:503:king:/home/project:/bin/bash
4 user1:x:503:501::/home/project/:/bin/bash
5 user2:x:504:501::/home/project/:/bin/bash
Total 0 Lines read
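If you do need the count to survive a piped loop, newer versions of bash (4.2 and later) offer a way out: the lastpipe shell option runs the last element of a pipeline in the current shell. This is a minimal sketch under that assumption; the option only takes effect when job control is off, which is the default when running a script.

```shell
#!/bin/bash
# Sketch: with shopt -s lastpipe (bash 4.2+), the while loop below
# runs in the current shell, so count survives after the pipeline.
shopt -s lastpipe
count=0
printf 'a\nb\nc\n' | while read LINE
do
    let count++
done
echo "Total $count Lines read"
```

Run it with bash itself (not sh); the script then reports 3 lines instead of 0.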
Method 2: Redirected "while-read" loop
#!/bin/bash
#SCRIPT: method2.sh
#PURPOSE: Process a file line by line with redirected while-read loop.
FILENAME=$1
count=0
while read LINE
do
let count++
echo "$count $LINE"
done < "$FILENAME"
echo -e "\nTotal $count Lines read"
We still use the while read LINE syntax, but this time we feed the loop from the bottom, using file redirection, instead of using a pipe. You will find that this is one of the fastest ways to process each line of a file. It looks a little unusual the first time you see it, but it works very well.
Unlike method 1, method 2 runs the loop in the current shell, so the total number of lines is still available outside the loop.
Output:
[root@www blog]# sh method2.sh file_passwd
1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
3 king:x:502:503:king:/home/project:/bin/bash
4 user1:x:503:501::/home/project/:/bin/bash
5 user2:x:504:501::/home/project/:/bin/bash
Total 5 Lines read
Note: In some older shells, the redirected loop also runs in a subshell.
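The redirected loop also combines nicely with read's field splitting: set IFS just for the read command and each field of a passwd-style line lands in its own variable. A small sketch (the here-document stands in for file_passwd):

```shell
#!/bin/bash
# Sketch: IFS=: applies only to the read command, splitting each
# line on colons into separate variables.
while IFS=: read user pass uid gid gecos home shell
do
    echo "$user uses $shell"
done <<'EOF'
venu:x:500:500:venu madhav:/home/venu:/bin/bash
king:x:502:503:king:/home/project:/bin/bash
EOF
```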
Method 3: while-read loop using file descriptors
A file descriptor is simply a number that the operating system assigns
to an open file to keep track of it. Consider it a simplified version
of a file pointer. It is analogous to a file handle in C.
There are always three default "files" open, stdin (the keyboard),
stdout (the screen), and stderr (error messages output to the screen).
These, and any other open files, can be redirected. Redirection simply
means capturing output from a file, command, program, script, or even
code block within a script and sending it as input to another file,
command, program, or script.
Each open file gets assigned a file descriptor. The file descriptors for stdin, stdout, and stderr are 0, 1, and 2, respectively. For opening additional files, there remain descriptors 3 to 9 (this may vary depending on the OS). It is sometimes useful to assign one of these additional file descriptors to stdin, stdout, or stderr as a temporary duplicate link. This simplifies restoration to normal after complex redirection and reshuffling.
There are two steps in the method we are going to use. The first step
is to close file descriptor 0 by redirecting everything to our new file
descriptor 3. We use the following syntax for this step:
exec 3<&0
Now stdin (the keyboard) is duplicated onto our new file descriptor 3. The second step is to send our input file, specified by the variable $FILENAME, into file descriptor 0 (zero), which is standard input. This second step is done using the following syntax:
exec 0<$FILENAME
At this point any command requiring input will receive the input from
the $FILENAME file. Now is a good time for an example.
#!/bin/bash
#SCRIPT: method3.sh
#PURPOSE: Process a file line by line with while read LINE Using
#File Descriptors
FILENAME=$1
count=0
exec 3<&0
exec 0< "$FILENAME"
while read LINE
do
let count++
echo "$count $LINE"
done
exec 0<&3
echo -e "\nTotal $count Lines read"
The while loop reads one line of text at a time, but the beginning of this script does a little file descriptor redirection. The first exec command saves a duplicate of stdin on file descriptor 3. The second exec command redirects the $FILENAME file into stdin, which is file descriptor 0. Now the while loop can just execute without our having to worry about how a line of text is assigned to the LINE variable. When the while loop exits, we restore stdin from its duplicate on file descriptor 3 back to its original file descriptor 0:
exec 0<&3
In other words we set it back to the system’s default value.
Output:
[root@www tempdir]# sh method3.sh file_passwd
1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
3 king:x:502:503:king:/home/project:/bin/bash
4 user1:x:503:501::/home/project/:/bin/bash
5 user2:x:504:501::/home/project/:/bin/bash
Total 5 Lines read
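A variation that avoids touching stdin at all is to open the input file on descriptor 3 directly and tell read which descriptor to use with bash's -u option. A sketch, using a made-up temporary file as the input:

```shell
#!/bin/bash
# Sketch: open the input on descriptor 3 and read from it with
# read -u 3, leaving stdin (descriptor 0) untouched.
printf 'first\nsecond\n' > /tmp/fd_demo.$$    # hypothetical sample file
exec 3< "/tmp/fd_demo.$$"
count=0
while read -u 3 LINE
do
    let count++
    echo "$count $LINE"
done
exec 3<&-      # close descriptor 3 when finished
rm -f "/tmp/fd_demo.$$"
echo -e "\nTotal $count Lines read"
```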
Method 4: Process file line by line using awk
awk is pattern scanning and text processing language. It is useful
for manipulation of data files, text retrieval and processing. Good
for manipulating and/or extracting fields (columns) in structured
text files.
Its name comes from the surnames of its authors: Alfred Aho, Peter
Weinberger, and Brian Kernighan.
I am not going to explain everything here.To know more about awk just
Google it.
At the command line, enter the following command:
$ awk '{ print }' /etc/passwd
You should see the contents of your /etc/passwd file appear before your eyes. Now, an explanation of what awk did. When we called awk, we specified /etc/passwd as our input file. When we executed awk, it evaluated the print command for each line in /etc/passwd, in order. All output is sent to stdout, and we get a result identical to catting /etc/passwd. Now, an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C. Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed.
Here is another awk example that does exactly the same thing:
$ awk '{ print $0 }' /etc/passwd
In awk, the $0 variable represents the entire current line, so print
and print $0 do exactly the same thing. Now is a good time for an
example.
#!/bin/bash
#SCRIPT: method4.sh
#PURPOSE: Process a file line by line with awk
FILENAME=$1
awk '{kount++;print kount, $0}
END{print "\nTotal " kount " lines read"}' $FILENAME
Output:
[root@www blog]# sh method4.sh file_passwd
1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
3 king:x:502:503:king:/home/project:/bin/bash
4 user1:x:503:501::/home/project/:/bin/bash
5 user2:x:504:501::/home/project/:/bin/bash
Total 5 lines read
Awk is really good at handling text that has been broken into multiple logical fields, and it allows you to effortlessly reference each individual field from inside your awk script. The following script will print out a list of all user accounts on your system:
awk -F":" '{ print $1 "\t " $3 }' /etc/passwd
Above, when we called awk, we used the -F option to specify ":" as the field separator. By default, whitespace (spaces and tabs) acts as the field separator; you can set a new field separator with the -F option. When awk processes the print $1 "\t " $3 command, it prints the first and third fields of each line in the input file, with "\t" separating the fields by a tab.
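Fields also make quick summaries easy. For example, the following sketch of the same -F idea counts how many accounts use each login shell (field 7 of /etc/passwd); an array indexed by shell name accumulates the counts:

```shell
# Sketch: tally accounts per login shell using the 7th passwd field.
awk -F":" '{ shells[$7]++ }
END { for (s in shells) print s, shells[s] }' /etc/passwd
```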
Method 5: A little trick with the head and tail commands
#!/bin/bash
#SCRIPT: method5.sh
#PURPOSE: Process a file line by line with head and tail commands
FILENAME=$1
Lines=$(wc -l < "$FILENAME")
count=0
while [ $count -lt $Lines ]
do
let count++
LINE=$(head -n $count "$FILENAME" | tail -1)
echo "$count $LINE"
done
echo -e "\nTotal $count lines read"
On each iteration the head command extracts the top $count lines, then the tail command extracts the bottom line from those lines. The file is reread from the beginning on every iteration, which makes this a remarkably wasteful method, but some people still use it.
Output:
[root@www blog]# sh method5.sh file_passwd
1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
3 king:x:502:503:king:/home/project:/bin/bash
4 user1:x:503:501::/home/project/:/bin/bash
5 user2:x:504:501::/home/project/:/bin/bash
Total 5 lines read
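The same quadratic idea sometimes appears with sed, which can print just the Nth line of a file; a loop over it rereads the file from the top on every iteration, so it is just as wasteful as the head/tail trick:

```shell
# Sketch: sed -n "Np" prints only line N, but each invocation
# rescans the input from the beginning.
printf 'one\ntwo\nthree\nfour\n' | sed -n "3p"
```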
Time Comparison for the Five Methods
Now take a deep breath: we are going to test each technique. Before testing each method of parsing a file line by line, create a large file that has the exact number of lines you want to process.
I used a bigfile.sh script to create a large file:
$ sh bigfile.sh 900000
Running bigfile.sh with 900000 lines as an argument took more than two hours to generate bigfile.4227; I don't know exactly how long. This file is far too large to parse line by line, but I needed a large file to get timing data greater than zero.
[root@www blog]# du -h bigfile.4227
70M bigfile.4227
[root@www blog]# wc -l bigfile.4227
900000 bigfile.4227
[root@www blog]# time ./method1.sh bigfile.4227 >/dev/null
real 6m2.911s
user 2m58.207s
sys 2m58.811s
[root@www blog]# time ./method2.sh bigfile.4227 > /dev/null
real 2m48.394s
user 2m39.714s
sys 0m8.089s
[root@www blog]# time ./method3.sh bigfile.4227 > /dev/null
real 2m48.218s
user 2m39.322s
sys 0m8.161s
[root@www blog]# time ./method4.sh bigfile.4227 > /dev/null
real 0m2.054s
user 0m1.924s
sys 0m0.120s
[root@www blog]# time ./method5.sh bigfile.4227 > /dev/null
I waited more than half a day and still didn't get a result, so I created a 10000-line file to test this method.
[root@www tempdir]# time ./method5.sh file.10000 > /dev/null
real 2m25.739s
user 0m21.857s
sys 1m12.705s
Method 4 came in first place, taking only 2.05 seconds, but we can't really compare it with the other methods, because awk is not just a command but a programming language in its own right.
Methods 2 and 3 are tied for second place, producing nearly identical real execution times of about 2 minutes 48 seconds. Method 1 came in third at 6 minutes 2.9 seconds.
Method 5 would have taken more than half a day: 2 minutes 25 seconds to process just a 10000-line file shows how wasteful it is.
Note: If the file contains backslash characters, use read -r instead of read; then the backslash does not act as an escape character and is treated as part of the line. In particular, a backslash-newline pair can no longer be used as a line continuation.
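One more option worth knowing, although it is not benchmarked above: bash 4 and later provide the mapfile builtin (also called readarray), which reads the whole file into an array in a single call. A sketch with a made-up sample file:

```shell
#!/bin/bash
# Sketch: mapfile -t loads every line into the LINES array
# (-t strips the trailing newlines); ${#LINES[@]} is the count.
printf 'one\ntwo\nthree\n' > /tmp/map_demo.$$   # hypothetical sample file
mapfile -t LINES < "/tmp/map_demo.$$"
for i in "${!LINES[@]}"
do
    echo "$((i + 1)) ${LINES[$i]}"
done
rm -f "/tmp/map_demo.$$"
echo -e "\nTotal ${#LINES[@]} lines read"
```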
Friday, May 14, 2010
Posted by venu k
If I made any mistake, even a typo, please inform me.
Comment: Really good study. Just one hint about the comparison you made: the first method may take longer simply because the file is not yet in the caches (RAM, disk, processor, etc.), so do all five tests and then time the first one again; that way the times should be more accurate.
Reply: Bash runs the piped "while-read" loop in a subshell; that's why method 1 took more time. See the output below: I changed the order of execution and still get almost the same result.
[root@localhost procesfile]# time ./method2.sh bigfile.4227 > /dev/null
real 2m46.832s
user 2m37.890s
sys 0m8.429s
[root@localhost procesfile]# time ./method3.sh bigfile.4227 > /dev/null
real 2m47.224s
user 2m38.370s
sys 0m8.369s
[root@localhost procesfile]# time ./method1.sh bigfile.4227 > /dev/null
real 6m10.158s
user 3m2.095s
sys 2m59.299s
Comment: When the input to the loop comes from a process rather than from a file, things become even trickier. What I do in that case is use a temporary file, but to speed things up I use a FIFO instead of a normal file. This uses the GNU coreutils package, which is pretty widespread and should work on any POSIX system (but I don't know about filesystem limitations). A non-complete example:
# make up a temporary file name in the system temp directory:
TempFile="$(mktemp -t "$(basename "${0%.sh}")-${UID}-$$.XXXXXX")"
# recreate that file as a FIFO instead:
rm -f "$TempFile"
mkfifo "$TempFile"
# fill it with the output from the MyCommand process (including the error stream), in the background:
MyCommand >& "$TempFile" &
# now process the input as above:
while read -r LINE ; do
# my process
done < "$TempFile"
# clean up
rm -f "$TempFile"
(Note that I am writing this in the blog and it could have typos and mistakes.) It has always worked for me, but pay attention to both portability (try asking FAT for a FIFO...) and synchronization (the effectiveness might depend on MyCommand's output speed).
ReplyDeleteHi!
ReplyDeleteI am doing a programm which will read line by line command from a file text. I am doing so:
cat $p1 | ( while read line;
do
----execute----
done )
But i want that the programm doing a pause for press any key. I am doing so:
cat $p1 | ( while read line;
do
----execute----
read -p "Press any key"
done )
But it dont work...
Thanks
Reply: Not sure what UNIX variant you're using, but on HP-UX the read command has only one flag (-r, to not treat backslash characters as special) and one parameter, a variable name. For example, you would write the above like this:
echo "Press any key"
read mydata
where mydata is the variable that will contain the character you enter from the keyboard. Give that a try. (Note that inside a piped loop any plain read still takes its input from the pipe; redirecting it from the terminal, as in read mydata < /dev/tty, reads from the keyboard instead.)
ReplyDeleteMethod #5 is the only one that worked in my script. What I do is read file line-by-line, extract hostname, and do SSH to that host. For some reason, "while ; do ; done < $filename" only works till the first SSH, then the while loop exits.
Comment: Hi all, I'm working on something where the requirement is as follows:
1. One shell script - abc.sh
2. One text file - xyz.txt
===========================================
xyz.txt contains values as below:
[1st Section]
a
b
c
d
[2nd Section]
e
f
g
h
i
[3rd Section]
j
k
l
m
n
o
p
========================================
Now I want to print the values section-wise using a menu-driven script. Whenever a person executes abc.sh, it will ask which section's values they want to see:
======
Example:
1. 1st Section
2. 2nd Section
3. 3rd Section
Choose an option:
Say the person has chosen option 2; then it should print only
e
f
g
h
i
======
So I'm stuck at how to read a file from a particular line up to a particular point.
Reply:
n=$1
case $n in
1) n=1st ;;
2) n=2nd ;;
3) n=3rd ;;
esac
awk -v n="$n" '$1 ~ n,/^$/ {
if ( $0 == "" || $1 ~ n ) next
print
}' "$file"