What does this mean to you?

((EC)|(NH))\d+[ABCDEFGHJKLMNPQRSTUVWXYZ]\d*

The EC and NH are prefixes for Sound Effects Centre catalogue numbers, so that’s a clue. The capital letters are a complete alphabet minus the letters I and O, which is another.

Expressed Purpose

This is a regular expression. You don’t come across them in everyday programmes like MS Office applications, but they make searching for text with a regular pattern, like catalogue numbers, much easier. Try it for yourself here: https://regexr.com/.

The pattern above will match any text which follows the pattern of a BBC Sound Effects Centre catalogue number. Although the BBC may have called them disc numbers, I’m calling them catalogue or cat. numbers. I’m going to work through this pattern match in detail below, for those that are interested. For now just remember that I can find all the catalogue numbers in a text file if they follow that regular pattern.
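For anyone who would rather see that in code than on regexr, here is a minimal sketch in Python. The function name find_cat_numbers and the sample text are mine, made up for illustration; only the pattern itself is the one from the top of this post.

```python
import re

# The pattern from the top of this post, unchanged.
CAT_NUMBER = re.compile(r"((EC)|(NH))\d+[ABCDEFGHJKLMNPQRSTUVWXYZ]\d*")


def find_cat_numbers(text):
    """Return every piece of text that matches the pattern, in order."""
    # finditer + group(0) gives the whole match. re.findall would hand back
    # tuples of the bracketed groups instead, which is not what we want here.
    return [match.group(0) for match in CAT_NUMBER.finditer(text)]


# Hypothetical sample text, loosely based on the catalogue layout.
sample = "General chatter. EC40A b01\nCheerful chatter. EC40A b02"
print(find_cat_numbers(sample))
```

The same catalogue number can turn up many times, so wrapping the result in set() is one way to get a list of unique numbers.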

I was on about searching. Search what? Why, the November 1985 BBC Sound Effects Catalogue, of course! The cherry on the top of the mega donation and a gift for anyone who wants to attempt a discography. Here’s page 1 of 358.

One day, I hope, I will be able to scan this whole thing line by line and have the entire thing searchable. Let’s just say that is a non-trivial problem for now. What I can do is get a quick OCR scan of the text. Here’s an extract from another page.

Chatter: Exterior.
General chatter. - Nov 66 - 3'0" EC40A b01
Cheerful chatter. - Nov 66 - 3'O" EC40A b02
Chatter and footsteps on gravel. - Nov 66 - 3'2" EC40A f01
Chatter and footsteps on gravel, animated. - Nov 66 - 2'27" EC40A f02
Crowd leaving mosque, busy atmosphere with some children. (Kano,
Northern Nigeria) - 1967 - 3'30" EC51C b02
Native village chatter, medium-sized crowd. - 1970 - 1'170 EC51K b02
Excited chatter from large crowd. (Wide perspective) ¿„Jun 70 - 6'35" EC40F f
Mixed cheerful chatter. (Thirty people) - Aug 70 - 40" EC40G b04
General atmosphere of crowd at protest meeting, 6000 people. (Recorded
outdoors, England) - Sep 71 6'30" ЕСДОН f
Chatter at close of protest meeting. (6000 people, recorded outdoors in
England) - Sep 71 - 1'35" ECAOJ f04
Large cosmopolitan crowd with footsteps and speech. - Nov 71 6/2" ЕС4OK f

It’s not too bad as these things go, but I had to do some manual fiddling to get it that good! The problem really is that instead of the neat lines of entries in the catalogue it all gets messed up and jumbled about. The OCR needs to be told the format of the page to reassemble it in the same format, otherwise it assumes it’s paragraph text like you’re reading now, and it gets very confused.

Anyway, at least there are catalogue numbers in there. EC40K, for instance. But also garbled numbers like ECAOJ, which looks like it should be EC40J.
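To show what garbling does to the search, here is a small sketch that runs the pattern over the tail ends of three lines from the extract above. The helper name split_hits_and_misses is mine. The garbled entries fail because the pattern insists on at least one digit straight after the prefix.

```python
import re

CAT_NUMBER = re.compile(r"((EC)|(NH))\d+[ABCDEFGHJKLMNPQRSTUVWXYZ]\d*")

# Tail ends of three lines from the OCR extract: one clean, two garbled.
ocr_tails = [
    "EC40A f01",
    "ECAOJ f04",  # the 4 and 0 were read as the letters A and O
    "ЕСДОН f",    # read as look-alike letters, with no digits at all
]


def split_hits_and_misses(lines):
    """Sort lines into matched catalogue numbers and lines with no match."""
    hits, misses = [], []
    for line in lines:
        match = CAT_NUMBER.search(line)
        if match:
            hits.append(match.group(0))
        else:
            misses.append(line)
    return hits, misses


hits, misses = split_hits_and_misses(ocr_tails)
print(hits)         # ['EC40A']
print(len(misses))  # 2
```

The misses are the lines that would need checking by eye against the photographed page.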

My theory was that even without reassembling the original pages’ format I could do a quick and dirty scan of all the pages and search for cat. numbers. It’s worthwhile searching for them because knowing all the catalogue numbers in the catalogue in 1985 should provide a very good, maybe even complete, list of all the 7″ discs the BBC ended up with before switching entirely to CD. I’ll save you the anticipation by stating now that there were more released after this catalogue. Still, I’ve only had a vague idea till now, based on my collection and Mike’s Collection at https://www.6868.me.uk/, of how many records were released. This snapshot from November 1985 will be useful.

Luckily I had spent time in early 2022 photographing every page in the catalogue. That had shown me how difficult this scanning was going to be and I set them aside for another day. Sometime later I realised that the catalogue numbers alone would be quite useful and that they are quite easy to pluck from the mess of text.

Once I’ve got all the cat. numbers searched out I will find new cat. numbers, not currently in my list. I’m sure of that. Then there’ll be ones that aren’t in there but I know for a fact exist and can pull out of my collection to prove it.

Next time the scanning…

Or read on for more regex…

Regular Expression

((EC)|(NH))\d+[ABCDEFGHJKLMNPQRSTUVWXYZ]\d*

The EC and NH are inside brackets, which is to say they are a group of characters we want to find together. We want to find EC or NH at the start of the thing we’re looking for, so we have both separated by the pipe symbol |, which is typically used to make a logical OR in programming.

All catalogue numbers start with an EC or an NH, so that whole group is saying, ‘find text with EC or NH’.

((EC)|(NH))

And that might be enough to find most of the catalogue numbers. It depends on the text though. The word EFFECTS is on every page and that matches (EC), so let’s keep going.
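Here is a quick sketch of that false match in Python. The page text is a stand-in I made up; the only thing taken from the catalogue is that the word EFFECTS appears in it.

```python
import re

PREFIX_ONLY = re.compile(r"((EC)|(NH))")
FULL = re.compile(r"((EC)|(NH))\d+[ABCDEFGHJKLMNPQRSTUVWXYZ]\d*")

# Hypothetical page heading containing the word EFFECTS.
page_text = "SOUND EFFECTS"

prefix_hit = PREFIX_ONLY.search(page_text)
full_hit = FULL.search(page_text)

print(prefix_hit.group(0))  # EC  (the E and C in the middle of EFFECTS)
print(full_hit)             # None, because no digits follow that EC
```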

In all the SEC catalogue numbers the catalogue prefix is followed by a number. In Regular Expressions (regex, for short) we find digits with the ‘\d’ character class. That ‘\d’ will find a single digit 0-9, but if there is more than one digit, say 57, it would only return 5 in the search result. That’s why in my pattern I have to have ‘\d+’. That ‘+’ is saying find one or more of something, in this case digits.

\d+
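The difference is easy to see with the 57 example, sketched here in Python:

```python
import re

text = "57"

single = re.search(r"\d", text).group(0)        # first digit only
one_or_more = re.search(r"\d+", text).group(0)  # the whole run of digits

print(single)       # 5
print(one_or_more)  # 57
```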

Strictly speaking I only need to find 1-4 digits because there are no 5-digit or higher numbers. For that kind of search there’s a range quantifier in curly brackets. I could have used \d{1,4}. I gambled that no scan would incorrectly match 5 or more digits and if I was wrong about the 4-digit maximum I’d find out. Also I only thought of that when I was writing this. Moving on!
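For the curious, here is a sketch of how the two versions would differ. The five-digit number EC12345A is made up purely to show the effect; as far as I know nothing like it exists in the catalogue.

```python
import re

LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"
UNBOUNDED = re.compile(r"((EC)|(NH))\d+[" + LETTERS + r"]\d*")
BOUNDED = re.compile(r"((EC)|(NH))\d{1,4}[" + LETTERS + r"]\d*")


def first_match(pattern, text):
    """Return the first whole match as a string, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


made_up = "EC12345A"  # hypothetical five-digit number
print(first_match(UNBOUNDED, made_up))  # EC12345A
print(first_match(BOUNDED, made_up))    # None
print(first_match(BOUNDED, "EC40A"))    # EC40A
```

So the stricter version throws away anything with too many digits, while the looser one lets it through to be spotted later.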

The next bit is in square brackets ‘[ ]’. The idea here is to find any one of these characters. In the case of SEC catalogue numbers the next element is a single capital letter, so I want ‘any one of these capital letter characters’. The BBC had a rule though, never to use ‘I’ or ‘O’. Presumably because they can too easily be mistaken for a 1 or 0 (one or zero). EC 11I1 would be terrible. EC10O5 would be similarly troublesome. Hence it’s the whole alphabet minus those two letters.

[ABCDEFGHJKLMNPQRSTUVWXYZ]
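If typing out 24 letters feels error-prone, the same set can be built or written more compactly. This is only a sketch of two alternatives, not what I actually used: one builds the letters from Python’s string.ascii_uppercase, the other uses ranges inside the square brackets.

```python
import re
import string

# The alphabet minus I and O, built rather than typed.
LETTERS = "".join(c for c in string.ascii_uppercase if c not in "IO")
print(LETTERS)  # ABCDEFGHJKLMNPQRSTUVWXYZ

SPELLED_OUT = re.compile("[" + LETTERS + "]")
WITH_RANGES = re.compile("[A-HJ-NP-Z]")  # A to H, J to N, P to Z

# Both classes accept and reject exactly the same capital letters.
same = all(
    bool(SPELLED_OUT.fullmatch(c)) == bool(WITH_RANGES.fullmatch(c))
    for c in string.ascii_uppercase
)
print(same)                                # True
print(SPELLED_OUT.fullmatch("I") is None)  # True
```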

Now, for the original mono SEC records that’s all you need.

EC + digit(s)* + a capital letter.

*See what I did there? There’s a convention built into common written English which allows for one or more, a bit like regex.

When the stereo ECS records came in a final digit, or digits, was added on the end, so instead that pattern is:

EC + digit(s)* + a capital letter + digit(s)

We’ll need another \d to find all catalogue numbers of both types and a zero or more quantifier so we can find either pattern, with and without the final digit(s). That quantifier is ‘*’

\d*
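A short sketch of what that final piece buys us, using one number of each type. EC40A comes from the extract above and EC1T3 is the stereo-style example I use further down.

```python
import re

LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"
WITHOUT_TAIL = re.compile(r"((EC)|(NH))\d+[" + LETTERS + "]")
WITH_TAIL = re.compile(r"((EC)|(NH))\d+[" + LETTERS + r"]\d*")

results = {}
for number in ("EC40A", "EC1T3"):
    results[number] = (
        WITHOUT_TAIL.search(number).group(0),
        WITH_TAIL.search(number).group(0),
    )
    print(number, results[number])

# EC40A ('EC40A', 'EC40A')  -> no final digits, so both agree
# EC1T3 ('EC1T', 'EC1T3')   -> without \d* the final 3 is left behind
```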

Hold on! I hear someone muttering at the back, you missed out the ‘S’ on the end of ECS. Yes, that’s right. I did that on purpose because in the search I’m doing the ‘S’ for stereo isn’t used. Nor is there a space between the prefix and number. Instead of, say, ‘ECS 1T3’ I only have to search for EC1T3. That’s the convention in the printed catalogue. When I come to doing this on the record labels it will be a different regular expression I need.

Okay, regex lesson over!
