The Aadhaar (UIDAI) dataset is a wonderful dataset provided by the Indian government.
Things I like about this dataset:
- It fits into relational schemas pretty well.
- It is a great way for beginners like me to explore Data Science basics using the latest tools like IPython, Pandas, Anaconda, etc.
- This dataset is used in the Udacity course "Introduction to Data Science" (see the references for videos).
- It is live data: it is updated every other day.
- You can use REST API calls to get the data for a particular day, a particular month, or just the latest data.
- It’s probably going to be a huge dataset, considering India’s population.
- This data can be combined with other interesting datasets provided by http://data.gov.in/
- Its total size is around 30 GB, but you can download it in bits and pieces and play with whatever size you want.
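To give a feel for the "bits and pieces" idea, here is a minimal sketch of how one downloaded CSV file could be read with Pandas in chunks, so the whole file never has to sit in memory. The column names (`State`, `Aadhaar generated`, and so on) and the function name are my assumptions for illustration; check them against the header of the file you actually download.

```python
import io

import pandas as pd


def aadhaar_generated_by_state(csv_source, chunksize=100_000):
    """Sum an assumed 'Aadhaar generated' column per state, chunk by chunk."""
    totals = pd.Series(dtype="int64")
    # read_csv with chunksize yields DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        partial = chunk.groupby("State")["Aadhaar generated"].sum()
        # add() aligns on state name; fill_value=0 handles states not seen yet
        totals = totals.add(partial, fill_value=0)
    return totals.astype("int64").sort_values(ascending=False)


# Tiny made-up sample standing in for a real downloaded file
sample = io.StringIO(
    "State,District,Gender,Age,Aadhaar generated,Enrolment Rejected\n"
    "Delhi,New Delhi,M,30,2,0\n"
    "Kerala,Kollam,F,25,1,1\n"
    "Delhi,South Delhi,F,41,3,0\n"
)
result = aadhaar_generated_by_state(sample, chunksize=2)
print(result)
```

With a real file you would pass its path instead of the `io.StringIO` sample and pick a chunk size that suits your machine.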
You can view this notebook in the following ways:
1. If you don’t have IPython but want a quick look, follow these steps:
- Go to http://nbviewer.ipython.org/
- Enter the following: https://gist.github.com/GauravBhardwaj/cc05a5dc255d03c1d86a
2. If you have IPython installed and want to run this notebook quickly to get the dataset, follow these steps:
- Download the file: http://1drv.ms/1oaBXZd
- Run this notebook on your machine: ipython notebook ParseUIDAIdataset.ipynb
3. If you don’t want to do anything, give me your address and I will personally come and run it on your machine (haha). Just kidding... both of the ways mentioned above should work.
References:
- Udacity lecture 1: https://www.youtube.com/watch?v=PxS2Z7g2Xdc
- Udacity lecture 2: https://www.youtube.com/watch?v=c5sxdOU7Xuo
- To understand the dataset: https://data.uidai.gov.in/uiddatacatalog/dataCatalogHome.do
- To install IPython: https://blog.safaribooksonline.com/2013/12/12/start-ipython-notebook