Haystack Release 0.1 Devblog
To find a needle in a haystack.
Do you have a file full of links and want to know which ones are valid and which are broken? Haystack will find those links and report their status back to you! For my first Open Source assignment, I set out to develop a program that does exactly that.
When I first read the assignment requirements, I was delighted to learn that we had a lot of freedom to approach the task however we saw fit. Once I actually started planning, though, all that freedom was a lot to handle. Which language should I use: C++, Java, or something else I already had experience in? Which libraries? How do I even check whether a link has gone bad? In the end I went with Python, both to challenge myself and because it had interested me for a long time.
I broke the task down into three main parts: reading the file, parsing it to find URLs, and validating those links. To read a file, the program first needs to know which file to open. The user supplies it as a command-line argument, which the program reads from sys.argv, provided by Python's sys module.
From there, you simply pass that file name to codecs.open, from the codecs module, to open the file.
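A minimal sketch of this first step, combining sys.argv and codecs.open as described above. The function name read_input_file and the choice of UTF-8 as the file encoding are assumptions for illustration, not details taken from Haystack itself.

```python
import codecs
import sys


def read_input_file(argv):
    """Return the contents of the file named by the first command-line argument."""
    # argv[0] is the script name; argv[1] is the file the user asked to scan.
    file_name = argv[1]
    # codecs.open lets us state the encoding explicitly; UTF-8 is an assumption here.
    with codecs.open(file_name, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    print(read_input_file(sys.argv))
```

Taking argv as a parameter instead of reading sys.argv inside the function makes the step easy to test with a hand-built argument list.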
This part was smooth sailing, since it is very similar to the other object-oriented programming languages I know. Python's syntax is also intuitive and easy to understand, so carrying my OOP skills over was not an issue. Next up: parsing the file and extracting URLs. This turned out to be easier than I expected; I used regular expressions from Python's re module to find the URLs.
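The regex step might look something like the sketch below. The post does not show Haystack's actual pattern, so URL_PATTERN and extract_urls are hypothetical: the pattern simply matches "http://" or "https://" followed by characters that are not whitespace, quotes, or angle brackets.

```python
import re

# Hypothetical pattern: http:// or https:// followed by any run of characters
# that are not whitespace, quotes, or angle brackets. Real URL grammar is more
# involved, so treat this as an approximation.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return every http(s) URL found in text, in order of appearance."""
    urls = URL_PATTERN.findall(text)
    # Drop trailing punctuation that usually ends a sentence rather than a URL.
    return [url.rstrip(".,;:!?)") for url in urls]


if __name__ == "__main__":
    print(extract_urls('See https://example.com/a, and "http://test.org/x?y=1".'))
```

Stripping a trailing ")" is a trade-off: it handles URLs written inside parentheses, but it would also truncate the rare URL that genuinely ends in a parenthesis.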
Once I had my list of URLs, I passed it through my link validator. The method that checks each link and prints its status code uses the requests library. There are three possible results: the link is valid (status code 200), broken (status code 400 or 404), or unknown.
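Here is one way the validator could be structured, assuming a HEAD request with redirects followed. The post does not say whether Haystack sends HEAD or GET, and the function names classify_status and check_link are hypothetical. Treating connection errors and timeouts as "unknown" is also an assumption.

```python
def classify_status(status_code):
    """Map an HTTP status code to the three categories described in the post."""
    if status_code == 200:
        return "valid"
    if status_code in (400, 404):
        return "broken"
    return "unknown"


def check_link(url, timeout=5.0):
    """Request url and return "valid", "broken", or "unknown"."""
    # Imported lazily so classify_status works even without requests installed
    # (pip install requests).
    import requests

    try:
        # HEAD avoids downloading the page body; following redirects means a
        # 301 that ends at a live page is reported as valid.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.exceptions.RequestException:
        # Connection failures, timeouts, malformed URLs, and so on.
        return "unknown"
    print(url, response.status_code)
    return classify_status(response.status_code)
```

Keeping the status mapping in its own function means the three categories can be adjusted (for example, counting 410 Gone as broken) without touching the networking code. Note that some servers reject HEAD requests, which this sketch would report as unknown; falling back to requests.get in that case is one possible refinement.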
Lastly, I added optional (but, in my opinion, very important) features such as error handling and help/usage prompts. When all was said and done, the one thing I can say with 100% certainty is that the internet is your friend. Whenever I was unsure what to do or how to implement something, a quick Google search showed me what options and libraries were available. I've come to learn that there is no shame in asking for help, especially from others in the Open Source community.
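The error handling and help/usage prompts mentioned above could be sketched as follows. The command name haystack, the USAGE wording, the -h/--help flags, and the exit codes are all hypothetical choices for illustration. The sketch keeps using sys.argv directly rather than a parser library.

```python
import codecs
import sys

# Hypothetical usage text; the real tool's command name and wording may differ.
USAGE = "usage: haystack <file>\n  Scan <file> for http(s) URLs and report each link's status."


def main(argv):
    """Handle help requests and bad input gracefully; return a process exit code."""
    # Show usage for -h/--help, or when the file argument is missing or extra.
    if len(argv) != 2 or argv[1] in ("-h", "--help"):
        print(USAGE)
        # Explicitly requested help is a success; a wrong argument count is not.
        return 0 if len(argv) == 2 else 1
    try:
        with codecs.open(argv[1], "r", encoding="utf-8") as handle:
            text = handle.read()
    except OSError as err:
        # Missing files, directory paths, and permission problems all land here.
        print(f"haystack: cannot read {argv[1]}: {err.strerror}", file=sys.stderr)
        return 1
    except UnicodeDecodeError:
        print(f"haystack: {argv[1]} is not valid UTF-8", file=sys.stderr)
        return 1
    # URL extraction and link checking would continue from here.
    print(f"Read {len(text)} characters from {argv[1]}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Returning an exit code from main and passing it to sys.exit lets shell scripts tell a successful run apart from a usage or file error, and printing a one-line message instead of a traceback keeps the tool friendly for users who mistype a file name.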
I am happy with the results so far. I still have to share the finished product with the Open Source community so it can be checked for any bugs I may have missed, but other than that I think it is a refined tool that will be useful to me in the future.