Okay, so lemme tell you about this little project I tackled – Puerto Rican Baseball League stats. Sounds kinda random, right? But stick with me, it was a fun deep dive.

It all started ’cause I was bored, plain and simple. Scrolling through Reddit one night, saw some folks chatting about the league and how hard it was to find good, consistent data. Challenge accepted!
First things first: sourcing the data. Man, that was a pain. No clean API or anything. I ended up scraping a bunch of different websites. Some were official league sites, others were fan blogs. The formatting was all over the place – different date formats, inconsistent team names, you name it. It was a mess.
Next step? Cleaning that mess up. I used Python with Pandas, naturally. Wrote a bunch of scripts to standardize the data. Had to deal with a lot of manual cleaning, too. Spent hours renaming teams, fixing typos, and making sure the dates were all in the same format. It was tedious, but crucial.
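To give a feel for what those cleanup scripts looked like, here's a minimal sketch, not my exact code. The alias table, the column names (`team`, `date`), the list of date formats, and the sample rows are all made up for illustration; the real sites had their own quirks.

```python
from datetime import datetime

import pandas as pd

# Hypothetical alias table: lowercase spelling seen in the wild -> canonical name.
TEAM_ALIASES = {
    "caguas": "Criollos de Caguas",
    "criollos": "Criollos de Caguas",
    "criollos de caguas": "Criollos de Caguas",
    "indios": "Indios de Mayagüez",
    "mayaguez": "Indios de Mayagüez",
}

# Assumed date layouts; extend this tuple as new formats turn up.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known format in turn; return NaT if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return pd.Timestamp(datetime.strptime(text.strip(), fmt))
        except ValueError:
            continue
    return pd.NaT


def clean_games(raw):
    """Return a copy with canonical team names and parsed dates."""
    df = raw.copy()
    stripped = df["team"].str.strip()
    # Look up the lowercase name; keep the stripped original if it's unknown.
    df["team"] = stripped.str.lower().map(TEAM_ALIASES).fillna(stripped)
    df["date"] = df["date"].map(parse_date)
    return df


# Made-up sample rows, just to exercise the functions.
raw = pd.DataFrame({
    "team": [" Caguas", "criollos de caguas", "INDIOS"],
    "date": ["2019-11-15", "11/16/2019", "17.11.2019"],
})
cleaned = clean_games(raw)
print(cleaned)
```

Anything that comes back as `NaT` or keeps an unrecognized team name is a row that still needs the manual treatment.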
Then came the fun part: actually analyzing the data. I was mostly interested in things like batting averages, home run leaders, win-loss records, all that classic baseball stuff. Again, Pandas to the rescue. I calculated all the basic stats and started looking for trends. Did home-field advantage really exist? Were there any breakout players no one was talking about? That kind of thing.
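Here's roughly how those basic stats fall out of Pandas. Treat it as a sketch under assumptions: the table layouts, column names (`at_bats`, `hits`, `home_score`, and so on), and every number below are invented for the example, not real league data.

```python
import pandas as pd

# Made-up per-game batting lines (one row per player per game).
batting = pd.DataFrame({
    "player": ["Player A", "Player A", "Player B", "Player B"],
    "at_bats": [4, 5, 3, 4],
    "hits": [2, 1, 1, 0],
    "home_runs": [1, 0, 0, 0],
})

# Season totals per player, then batting average = hits / at-bats.
totals = batting.groupby("player")[["at_bats", "hits", "home_runs"]].sum()
totals["avg"] = (totals["hits"] / totals["at_bats"]).round(3)

# Home run leaders: sort the totals, biggest first.
hr_leaders = totals.sort_values("home_runs", ascending=False)

# Made-up game results (one row per game).
games = pd.DataFrame({
    "home_team": ["Team X", "Team Y", "Team X", "Team Y"],
    "away_team": ["Team Y", "Team X", "Team Y", "Team X"],
    "home_score": [5, 2, 6, 3],
    "away_score": [3, 4, 1, 2],
})

# Home-field advantage, crudely: the share of games won by the home team.
home_win = games["home_score"] > games["away_score"]
home_win_pct = home_win.mean()

# Win counts per team: take the home team where it won, else the away team.
winner = games["home_team"].where(home_win, games["away_team"])
wins = winner.value_counts()

print(totals)
print(f"Home team won {home_win_pct:.0%} of games")
print(wins)
```

A home win share well above 50% over a whole season hints at home-field advantage, though with a short season you'd want to be careful about reading too much into it.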
I also messed around with some data visualization. Used Matplotlib and Seaborn to create some charts and graphs. Nothing fancy, just basic stuff like bar charts and scatter plots. But it helped me see the data in a different way and pick out some interesting insights.

Biggest hurdle? Definitely the data inconsistencies. Some websites would report different stats for the same game. Sometimes I had to cross-reference multiple sources to figure out what was actually correct. It was a real detective job.
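The detective work can be partly automated by lining two sources up and flagging where they disagree. This is only a sketch: `game_id` is a hypothetical key (in practice you'd probably need date plus teams), the two "sites" and their scores are made up, and the final call on which source is right still has to be made by hand.

```python
import pandas as pd

# Made-up scores for the same games as reported by two different sites.
site_a = pd.DataFrame({"game_id": [1, 2, 3], "home_score": [5, 2, 6]})
site_b = pd.DataFrame({"game_id": [1, 2], "home_score": [5, 3]})

# Outer merge keeps games that only one site reported;
# indicator=True adds a "_merge" column saying where each row came from.
merged = site_a.merge(
    site_b, on="game_id", how="outer", suffixes=("_a", "_b"), indicator=True
)

# Games both sites list but with different scores.
in_both = merged["_merge"] == "both"
disagree = merged[in_both & (merged["home_score_a"] != merged["home_score_b"])]

# Games that are missing from one of the sites.
missing = merged[~in_both]

print(disagree[["game_id", "home_score_a", "home_score_b"]])
print(missing[["game_id", "_merge"]])
```

Everything in `disagree` and `missing` goes on the list to check against a third source.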
So, what did I learn? Besides the fact that data cleaning is a necessary evil, I also learned a lot about the Puerto Rican Baseball League. Turns out they've got some serious talent down there. And I got a chance to brush up on my Python skills, which is always a good thing.
Would I do it again? Probably. It was a good way to kill some time and learn something new. Plus, now I've got some bragging rights when it comes to baseball trivia.
Quick recap of what I did:
- Scraped data from multiple websites
- Cleaned and standardized the data using Python and Pandas
- Calculated batting averages, home run leaders, win-loss records
- Created charts and graphs using Matplotlib and Seaborn
Final Thoughts
Honestly, the project was more about the process than the final product. It was a fun little challenge that kept me busy for a few weeks. And who knows, maybe someone else will find the data useful. If not, at least I had a good time doing it.