acedev003@rand_bytes-earth1:~$

Tweet IDs are More than Enough

TL;DR: You can access a tweet using only its tweet ID

While scouring the internet for suitable NLP datasets (tweets, to be exact), one thing that caught my eye was that most datasets did not include the actual text content. Instead, each entry was just a seemingly random number called the tweet_id. I suppose this is most likely due to some policy restriction by Twitter, and also a way to reduce the dataset file size (leaving the hard job of scraping the entire thing to us poor souls).

One thing that can waste a lot of time at first is the fact that Twitter posts have URLs of the form https://twitter.com/<username>/status/<tweetid>, which leaves us with an unknown variable, i.e. the username.

However, the fun fact is that Twitter doesn’t even care about the username. You can put any existing username in that URL template, and Twitter will automatically redirect to the actual username if the tweet ID does not belong to the specified user.

For example, this is a post at /tunguz/status/1611051712479121408, but as we can see, the username does not have any effect, because /acedev003/status/1611051712479121408 renders the same result ¯\_(ツ)_/¯.
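
To make the trick concrete, here is a minimal Python sketch that turns the tweet IDs from a dataset into URLs using a placeholder username. The function name tweet_url and the default username are my own choices for illustration, not anything Twitter provides. It only builds the URL; actually fetching the page and following the redirect is left out, since that depends on how Twitter serves the page at the time.

```python
def tweet_url(tweet_id, username="acedev003"):
    """Build a status URL from a tweet ID and any existing username.

    The username is only a placeholder (an assumption for this sketch);
    per the observation above, Twitter redirects to the real author.
    """
    tweet_id = str(tweet_id).strip()
    # Tweet IDs are plain decimal numbers, so reject anything else early.
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError(f"not a numeric tweet ID: {tweet_id!r}")
    return f"https://twitter.com/{username}/status/{tweet_id}"


# Tweet IDs as they might appear in a dataset column.
tweet_ids = [1611051712479121408]
urls = [tweet_url(t) for t in tweet_ids]
print(urls[0])
# prints https://twitter.com/acedev003/status/1611051712479121408
```

Reading IDs as strings (or Python ints) is the safer choice here: they are larger than 2^53, so tools that parse them as floating-point numbers can silently change the last digits.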