WARNING: This post contains geeky coding fun with a very mild regular expression. Read and try to understand at your own risk.
Fun project today: I had to replace each & with its HTML equivalent (&amp;), but not any ampersands where the substitution had already taken place. Basically, the data entry had been sloppy and mixed HTML-encoded text with unencoded text.
For example, B&W should become B&amp;W, but B&amp;W should not become B&amp;amp;W.
Using http://regexpal.com to check my work, I came up with the following regular expression:
Interpreted: match & except when it is immediately followed by either (one or more letters, A-Z upper or lower case) or (a # followed by one or more digits), and then a ;.
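Here is a minimal sketch of that interpretation in Python, using a negative lookahead. The exact pattern is my reconstruction from the description above, not necessarily the original expression, and the names BARE_AMPERSAND and escape_bare_ampersands are made up for illustration.

```python
import re

# Hypothetical reconstruction of the pattern described above:
# match "&" unless it is immediately followed by letters or "#digits" and then ";".
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z]+|#[0-9]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace only the ampersands that do not already start an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("B&W"))      # B&amp;W
print(escape_bare_ampersands("B&amp;W"))  # B&amp;W (left alone)
```

The lookahead doesn't consume any characters, so only the & itself gets replaced and the entity name after it stays put.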
My test cases:
American Journal of Baskets & Societies
British Journal of Periodicals & Highlighters
Society for — & – & other dashes
Society for &s
Society for &#
Society for &#;
Society for Air "s
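To check the test cases in one go, here's a small Python harness around the same hypothetical pattern. I've assumed the dash and quote lines were written with named entities (&mdash;, &ndash;, &quot;) before the page rendered them. Per the description, &s, &# and &#; aren't complete entities (no ; after the letters, no digits after the #), so those ampersands should get escaped too.

```python
import re

# Same hypothetical pattern: leave &name; and &#digits; alone, escape every other "&".
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z]+|#[0-9]+);)")

# Test input -> expected output. The entity spellings on the dash and
# quote lines are assumptions about the original text.
CASES = {
    "American Journal of Baskets & Societies":
        "American Journal of Baskets &amp; Societies",
    "British Journal of Periodicals & Highlighters":
        "British Journal of Periodicals &amp; Highlighters",
    "Society for &mdash; & &ndash; & other dashes":
        "Society for &mdash; &amp; &ndash; &amp; other dashes",
    "Society for &s": "Society for &amp;s",
    "Society for &#": "Society for &amp;#",
    "Society for &#;": "Society for &amp;#;",
    "Society for Air &quot;s": "Society for Air &quot;s",
}

for source, expected in CASES.items():
    result = BARE_AMPERSAND.sub("&amp;", source)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{status}: {result}")
```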
I also found a handy Regular Expressions Cheat Sheet by DaveChild at cheatography.com.