Alright, let’s dive into this “tom east” thing. I was messing around with some data stuff at work last week, trying to wrangle a bunch of messy information into something usable. Sounds boring, right? Well, stick with me.

First off, the setup: I needed to extract specific info from a pile of text files. These files were named all sorts of crazy things, but a bunch of them had “tom” and “east” somewhere in the filename or inside the file content. So that’s where “tom east” came in. I figured, “Hey, let’s see if I can pull all the files and data related to ‘tom east’.”
The initial plan: I thought, "Simple enough, I'll just use some basic command-line tools." So, I fired up my terminal and started with a simple `find` command to look for files with "tom east" in their names. Something like this:
find . -name "*tomeast*"
That worked, kind of. It found some files, but not all of them. Turns out, some files only had “tom” OR “east” in the name, or maybe “Tom East” with a capital. Case sensitivity, ugh!
The first hiccup: I realized I needed to be smarter about this. A case-insensitive search was the key, so I tweaked the `find` command to ignore case:
find . -iname "*tomeast*"
Better, but still not perfect. Some files had “tom” and “east” in completely different parts of the filename. Time for a different approach.
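To catch "tom" and "east" anywhere in the name, in either order, `find` can chain two `-iname` tests, because consecutive tests are ANDed. A minimal sketch, assuming bash and a `find` that supports `-iname` (GNU and BSD both do); the function name `find_tom_east_names` is hypothetical:

```shell
# Hypothetical helper: list files under DIR whose NAME contains both
# "tom" and "east", in any order and any letter case.
# find ANDs the two -iname tests, so a file must pass both.
find_tom_east_names() {
  find "$1" -type f -iname '*tom*' -iname '*east*'
}
```

Usage: `find_tom_east_names .`. It will also match names like `tomorrow_feast.txt`, so expect a few false positives to weed out by hand.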

Digging deeper: Okay, filename searching alone wasn't cutting it. I needed to search the contents of the files too, so I combined `find` with `grep` to search within the files themselves:
find . -type f -print0 | xargs -0 grep -i "tom east"
That command basically says: "Find all files (`find . -type f`), print their names separated by null characters (`-print0`), and pipe them to `xargs`, which passes those filenames to `grep`. The `-0` tells `xargs` to expect null-separated filenames, so names containing spaces don't get split apart, and `-i` makes `grep` case-insensitive."
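A quick way to see why the null separators matter is a filename with a space in it. This sketch wraps the command above in a function (the name `grep_tom_east` is hypothetical) so it can be pointed at a throwaway directory; a plain `find | xargs` without `-print0`/`-0` would split `meeting notes.txt` into two bogus names.

```shell
# Hypothetical wrapper around the content search above.
# -print0 / -0: names travel NUL-separated, so spaces survive intact.
# -i: case-insensitive, so "TOM EAST" and "Tom East" both match.
grep_tom_east() {
  find "$1" -type f -print0 | xargs -0 grep -i "tom east"
}
```

Usage: `grep_tom_east .`. When `grep` receives more than one file, each hit is printed as `path:matching line`.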
Now we were getting somewhere! This found a bunch more files that mentioned “tom east” inside. But then another problem popped up.
Problem #2, encoding issues: Some of the files used different character encodings. `grep` was choking on them, spitting out weird characters and missing some matches. Argh!
The encoding fix: This was a pain. I had to figure out the encoding of each file and get it into a form `grep` could read. I ended up using the `file` command to try and detect the encoding, and then `iconv` to convert everything to UTF-8 before searching:

- First, detect the encoding (`-b` leaves the filename out of the output, `-i` prints the MIME type and charset):
file -bi filename
- Then, convert to UTF-8:
iconv -f original_encoding -t UTF-8 filename > filename_utf8
It's a real headache, so I wrote a small shell script to automate the detection and conversion. I don't have that script handy right now, but the gist is: loop through the files, detect the encoding, convert to UTF-8, then run the `grep` command.
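Since the original script isn't available, here is a hypothetical reconstruction of the loop described above, assuming bash, `file`, and `iconv` are installed. It pulls the charset out of the `file -bi` output, converts each file into a scratch copy with `iconv`, and greps the copy while reporting the original path. The function name `find_in_any_encoding` is made up, and skipping files reported as `binary` or `unknown-8bit` is my own assumption about how to handle them.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction: print every file under DIR whose text,
# once converted to UTF-8, contains PATTERN (case-insensitive).
find_in_any_encoding() {
  local dir=$1 pattern=$2 tmp
  tmp=$(mktemp) || return 1          # scratch file, reused for every input
  find "$dir" -type f -print0 |
  while IFS= read -r -d '' f; do
    mime=$(file -bi "$f")            # e.g. "text/plain; charset=iso-8859-1"
    charset=${mime##*charset=}       # keep only what follows "charset="
    case $charset in
      binary|unknown-8bit) continue ;;   # not text, or undetectable: skip
    esac
    # Convert into the scratch file; skip anything iconv can't handle.
    iconv -f "$charset" -t UTF-8 "$f" > "$tmp" 2>/dev/null || continue
    # Search the converted copy, but report the ORIGINAL path.
    grep -qi -- "$pattern" "$tmp" && printf '%s\n' "$f"
  done
  rm -f "$tmp"
}
```

Usage: `find_in_any_encoding . "tom east"`. Converting into a scratch file rather than writing `filename_utf8` next to each original keeps the source directory untouched.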
Cleaning up the mess: After all that, I had every match for "tom east". But the output was still messy: each match came out as the filename followed by the actual line that matched. I only wanted the filenames.
The final touch: So, I piped the output of the `grep` command through `awk` to extract just the filenames:
find . -type f -print0 | xargs -0 grep -i "tom east" | awk -F ":" '{print $1}' | sort -u
That `awk -F ":" '{print $1}'` part splits each line at the colon (":", which `grep` uses to separate the filename from the matched line) and then prints the first part, which is the filename. The `sort -u` sorts the filenames and removes duplicates.
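One caveat with the `awk` step: it trusts the colon, so a filename that itself contains ":" gets cut short, and if `xargs` ever hands `grep` a single file, `grep` leaves off the filename prefix altogether (`-H` forces it back). The standard `grep -l` flag, which prints only the names of matching files, sidesteps both. A sketch comparing the two approaches; both function names are hypothetical:

```shell
# The awk-based pipeline from above, wrapped as a function.
files_via_awk() {
  find "$1" -type f -print0 | xargs -0 grep -i "tom east" \
    | awk -F ":" '{print $1}' | sort -u
}

# Alternative: -l makes grep print each matching filename once,
# so no awk or de-duplication is needed.
files_via_grep_l() {
  find "$1" -type f -print0 | xargs -0 grep -il "tom east" | sort
}
```

On ordinary filenames the two should agree; `grep -l` just has fewer ways to go wrong.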
The result: Together with the earlier filename search, I finally had a clean list of files that contained "tom east" either in the filename or in the content. I could then feed this list into another script to process the files further. It was a pain, but I learned a lot about file encodings, command-line tools, and the importance of being specific with your searches.
Lessons Learned:
- Case-insensitive search is your friend.
- File encodings are a nightmare; be prepared to deal with them.
- Combining command-line tools can be incredibly powerful.
- Always clean up your output to get exactly what you need.
So, yeah, that’s the story of my “tom east” adventure. It might sound like a small thing, but it taught me a lot about data wrangling and the joys of command-line Kung Fu. Hope it helps you out too!