"Python 2.6 Text Processing: Beginner's Guide"
by Jeff McNeil
Link: http://www.amazon.co.uk/Python-Text-Processing-Beginner-27s-Guide/dp/1849512124/
Reading history and reviews
Finished on 30th March 2011
This is a practical introduction to a wide range of methods for reading, processing and writing textual data in a variety of structured and unstructured formats. Aimed primarily at novice Python programmers who know the basics of the language but are new to text processing, the book offers hands-on examples of various processing techniques. These range from the low-level (e.g. Python’s built-in libraries for handling strings, regular expressions, and formats such as JSON, XML and HTML) through to the more advanced (using third-party libraries to parse custom grammars, and to index and search large text archives). In addition to chapters on Unicode, internationalisation, working with templates, and writing formats like PDF and Excel, there is also a great deal of general supporting material on working with Python (including installing packages and using virtualenv), and on the differences introduced by Python 3.
The book covers a lot of ground and moves quickly. I think it’s fair to say that the range of techniques is quite ambitious, with the inevitable consequence that many chapters are more introductory than definitive. However, the hands-on approach is largely successful at providing working examples at each stage to illustrate the key points. (I also felt that the example code was of better-than-average quality.) Aside from a surprisingly unsatisfying chapter on structured markup (reluctantly, I would recommend looking elsewhere for an introduction to XML processing with Python) and a few niggling typos, there’s a lot of excellent material in this book, and the author has a knack for presenting some tricky concepts in a deceptively easy-to-understand manner: the chapter on regular expressions is possibly one of the best introductions to the subject that I’ve ever seen. The chapters on encodings and internationalisation, advanced parsing, and indexing and searching were also highlights (as was the section on Python 3 in the appendix).
Overall, I really enjoyed working through the book and felt I learned a lot. I think this is a great introduction to a wide range of text processing techniques in Python, both for novice Pythonistas (who will undoubtedly also benefit from the more general Python tips and tricks presented in the book) and for more experienced programmers who are looking for a place to start learning about text processing. Finally, I should disclose that I got a free (e)copy of the book from the publisher in return for reviewing it, and that this is an edited version of my review (posted on Amazon and on my blog). Also, a sample chapter can be freely downloaded from the book’s website.