Thursday, April 26, 2007

Regular expressions for fun and profit!

So the year is 2007, the 21st century, and there are still people who refuse to use regular expressions to get the job done. Why? I don't know. Maybe they just don't know how, maybe they think regexes are slow, or maybe they have simply never heard of them. But it makes me sick to look at lame pseudo-parsers, their awkward logic and the tons of stupid, useless code around them, all for something as simple as digging through a file for a pattern. I've seen guys spend a day writing code that could be replaced by a single line with a regular expression. And why? Do they gain anything from it? Performance? Speed? No. They usually do this for a throwaway script that automates something, and they don't care much whether it takes 10 ms or a second. The truth is they really don't know how much easier it is, just because they have never done it before.

So for all of them, here's an example of a quick and dirty script I used to generate a report. It walks through every file in a directory tree and looks for the pattern "/*#.*?#*/", i.e. anything between "/*#" and "#*/". To the C compiler such a block is just an ordinary comment. It's simple, and I use it to write notes about something that might later end up in a report. Sometimes I write something like "See line 1923" or "Line 192" in there, which of course is a reference to that line of the same file. That's why I made the script look for such patterns too: if one exists, it dumps the code around the referenced line.
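Two details make that pattern work: the lazy ".*?", which stops at the first "#*/" instead of running on to the last one, and re.DOTALL, which lets "." match newlines so a note can span several lines. Here is a small self-contained demo of both, plus the "see line" lookup. The C snippet is made up, and the "UT:" prefix inside the comments is only an assumption based on the report's column names.

```python
import re

# Made-up C snippet with two marker comments; the first spans two lines.
sample = """int f(void)
{
    /*# UT: not covered,
        see line 12 #*/
    return 0;
}
/*# UT: trivial getter #*/
"""

lazy = re.compile(r"/\*#.*?#\*/", re.DOTALL)   # stops at the first "#*/"
greedy = re.compile(r"/\*#.*#\*/", re.DOTALL)  # runs on to the last "#*/"
no_dotall = re.compile(r"/\*#.*?#\*/")         # '.' cannot cross a newline

# "see" or "line" followed (lazily) by a number, case-insensitive.
see_also = re.compile(r"(see|line)+.*?\d+", re.DOTALL | re.IGNORECASE)

print(len(lazy.findall(sample)))       # 2: one match per comment
print(len(greedy.findall(sample)))     # 1: both comments swallowed together
print(len(no_dotall.findall(sample)))  # 1: only the single-line comment

# Pull the referenced line numbers out of each comment.
for comment in lazy.findall(sample):
    for m in see_also.finditer(comment):
        print(re.findall(r"\d+", m.group(0)))  # ['12']
```

Without the "?" a file with two notes would produce one giant match, and without DOTALL every multi-line note would silently disappear from the report.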

So, enough goofing around. Here is the script:

#!/usr/bin/env python3
# Walks a directory tree, finds /*# ... #*/ marker comments in every file
# and prints an HTML report to stdout.
import html, os, re, sys

# A marker comment: "/*#", anything (lazily, across lines), then "#*/".
Cover = re.compile(r"/\*#.*?#\*/", re.DOTALL)
# "see" or "line" followed (lazily) by a number, e.g. "See line 1923".
SeeAlso = re.compile(r"(see|line)+.*?\d+", re.DOTALL | re.IGNORECASE)
Number = re.compile(r"\d+")


class Result:
    pass


def GetLinesAround(Data='', Line=1, LinesAround=5):
    """Return the lines around the 1-based line number Line (as shown in an editor)."""
    d = Data.split('\n')
    center = Line - 1  # 0-based list index
    upper = max(center - LinesAround // 2, 0)
    lower = min(center + LinesAround // 2 + 1, len(d))
    # Slicing never raises: a line past the end of the file just yields ''.
    return ''.join(s + '\n' for s in d[upper:lower])


def GenUtReport(File='', RelativePath=''):
    # errors='replace' keeps binary or oddly encoded files from aborting the walk.
    with open(File, 'r', errors='replace') as f:
        data = f.read()

    ret = []
    for m in Cover.finditer(data):
        res = Result()
        comment = m.group(0)

        # 1-based line number of the middle of the comment.
        avLine = data.count('\n', 0, m.start()) + 1 + comment.count('\n') // 2

        res.fInfo = "File: " + RelativePath + " Line: " + str(avLine)
        # Strip the "/*#" and "#*/" delimiters and keep the text after the
        # first ':' (or the whole note if it has no ':').
        res.utInfo = comment[3:-3].split(':', 1)[-1]
        res.codeAroundUt = GetLinesAround(data, avLine)

        # For every "see line N" in the note, grab 10 lines around line N.
        res.seeAlsoInfo = []
        for si in SeeAlso.finditer(res.utInfo):
            nums = Number.findall(si.group(0))
            if nums:
                seeLine = nums[0]
                res.seeAlsoInfo.append((seeLine, 10, GetLinesAround(data, int(seeLine), 10)))

        ret.append(res)

    return ret


def PrettyPrint(ret):
    tmpStr = ''
    for res in ret:
        # Escape everything taken from the sources: C code is full of '<' and '&'.
        tmpStr += "<tr>"
        tmpStr += "<td>" + html.escape(res.fInfo) + "</td>"
        tmpStr += "<td>" + html.escape(res.utInfo) + "</td>"
        tmpStr += "<td>" + html.escape(res.codeAroundUt) + "</td>"
        tmpStr += "<td>"
        for seeLine, around, code in res.seeAlsoInfo:
            tmpStr += ("See Also. Code around LINE: " + seeLine + " (+/-)" + str(around // 2)
                       + "\n\n" + html.escape(code))
        tmpStr += "</td>"
        tmpStr += "</tr>"

    return tmpStr.replace('\n', '<br>')


tmpStr = """<html><head></head><body><table border="1">
<tr>
<td><b>Filename, Line number</b></td>
<td><b>UT not covered reason</b></td>
<td><b>Related code</b></td>
<td><b>See also</b></td>
</tr>"""

root = sys.argv[1]
for r, d, f in os.walk(root):
    for name in f:
        path = os.path.join(r, name)
        tmpStr += PrettyPrint(GenUtReport(path, os.path.relpath(path, root)))

tmpStr += "</table></body></html>"
print(tmpStr)
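The only slightly tricky step is turning a match position into a line number and cutting out the code around it. Here is that step on its own as a minimal sketch, assuming 1-based line numbers as shown in an editor; line_of and lines_around are hypothetical helper names, and the C snippet is made up.

```python
import re


def line_of(data, offset):
    """Return the 1-based line number of the character at `offset`."""
    # str.count takes start/end arguments, so no substring copy is needed.
    return data.count("\n", 0, offset) + 1


def lines_around(data, line, width=5):
    """Return up to `width` lines centered on the 1-based `line`, clamped at the start."""
    lines = data.split("\n")
    start = max(line - 1 - width // 2, 0)
    # Slicing past the end of the list simply stops at the last line.
    return lines[start:line + width // 2]


# Made-up C snippet with one marker comment on line 2.
src = "int a;\n/*# UT: see line 1 #*/\nint b;\nint c;\n"
m = re.search(r"/\*#.*?#\*/", src, re.DOTALL)
n = line_of(src, m.start())
print(n)                        # 2
print(lines_around(src, n, 3))  # ['int a;', '/*# UT: see line 1 #*/', 'int b;']
```

Counting newlines up to the match offset is all the "parsing" a line number needs; no line-by-line loop is required.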



So you see, in less than 100 lines I did it all: parsing, cross-referencing and generating an HTML report. It took me less than two hours to finish.
