A Digital India Initiative

Publishing & Management of Resources (Datasets/Apps)

Datasets and apps are contributed by logging in to a simple web-based Dataset Management System.

Resources contributed by departments are processed through a predefined workflow that ensures compliance with government policies. Data Contributors nominated by PMC or by a department are authorized to publish datasets in open formats on the Pune DataStore Platform.

Once the Chief Data Officer creates a Contributor, an email is sent to the contributor's email address. The Contributor can then log in and contribute datasets, along with their metadata, for approval by the Controller. However, responsibility for the relevance and quality of the datasets published on the portal rests with the Chief Data Officer.

  • 1.   View & Respond to Queries on Published Datasets

    Citizens can browse, search, filter, sort and access datasets on the OGD Platform. They can also send queries and feedback about published datasets. This feedback is shown on the Chief Data Officer's dashboard for further action.

  • 2.   Respond to Suggestions for new Datasets

    The portal has a strong Citizen Engagement feature built in. While browsing the catalogue of datasets, users who cannot find a dataset of interest can request it through the Suggestions module. Suggestions already made for particular datasets are displayed, and users can endorse them. The list of suggestions, i.e. the requirements for new datasets, is sent to the data contributor of the respective department. This helps the data contributor prioritize the release of datasets on the platform, and contributors are expected to respond to these suggestions.

  • 3.   Review Analytics & Plan

    A dashboard shows the datasets contributed by all contributors of the Department, together with feedback on those datasets and citizens' suggestions for new datasets. This lets the Chief Data Officer review the analytics and plan further action accordingly.

  • 4.   DOs for Data Contribution and Approval

    • Identify and prioritize datasets for release, categorize the type of access granted for each, and publish as many high-value datasets as possible.

    • Contribute datasets which are in the Open List and do not fall under the Negative List.

    • Ensure that quality standards are met: the data must be accurate, be free from legal issues, protect individual privacy, and not compromise national security.

    • Ensure that datasets published through the workflow comply with the policy. The metadata should state the original source of the dataset and the methodology used to collect the data.

    • Prepare and contribute metadata in the predefined format for Catalogs and Resources (Datasets/Apps). The key metadata elements are Title, Description, Category, Sector/Sub-Sector, Dataset Jurisdiction, Keywords, Access Method, Reference URLs, Data Group Name, Frequency, Granularity of Data and Policy Compliance. All metadata elements must be filled in carefully and accurately so that the data is easy to find and use.

    • Pricing of data, if any, will be decided by the data owners in accordance with government policies.

    • Ensure that data contributed to the OGD Platform is only in machine-readable or specified open data formats. The recommended formats are:

      • CSV (Comma-Separated Values)
      • XLS (Excel spreadsheet)
      • ODS (OpenDocument Spreadsheet)
      • XML (Extensible Markup Language)
      • KML (Keyhole Markup Language, used for maps)
      • GML (Geography Markup Language)
    • Ensure that the data uploaded to the portal is as complete as possible, reflects everything recorded about a particular subject, and is denormalized. Before upload, datasets should also be optimized by adding redundant data or by grouping data.

    • Give priority to time-sensitive data. Real-time updates maximize the value the public can obtain from such information.

    • Replace any Not Available, Not Reported or missing values in the data with ‘NA’.

    • Metadata that defines and explains the raw data should be included as well, along with formulas and explanations for how derived data was calculated.

    • Define keywords in the data catalog to make it search-friendly.

    • Provide links to reference documents (if any) or websites with detailed information on the method of calculation or the source of the data.

    • Prioritize the release of datasets and take appropriate action based on the feedback and suggestions received on the portal from citizens regarding the Department.

  • 5.   DON’Ts for Data Contribution and Approval

    • Don’t contribute datasets that fall under the Negative List, e.g. datasets that are confidential in nature or must be withheld in the interest of the country’s security.

    • Don’t impose ‘Terms of Service’, attribution requirements, restrictions on dissemination or similar conditions that act as barriers to public use of the data.

    • Don’t charge the public for access to datasets; fees skew who is willing (or able) to access the information.

    • Don’t publish handwritten notes, which are very difficult for machines to process. Scanning text with Optical Character Recognition (OCR) introduces many matching and formatting errors, and information shared in the widely used PDF format is very difficult for machines to parse. Avoid data in these formats.

    • Don’t contribute data in non-Unicode formats.

    • Don’t contribute datasets containing special characters (e.g. @, %, $, &) or missing values.

    • Don’t include explanations, such as the method of calculation or the source of the data, in the data file attached through the web form; provide them in the metadata or reference links instead.
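The key metadata elements listed in the DOs above (Title, Description, Category and so on) lend themselves to a simple completeness check before a dataset is submitted for approval. The Python sketch below is illustrative only: the field names and the dictionary layout are assumptions, not the Dataset Management System's actual schema, and the portal's own web form remains the authoritative check.

```python
# Hypothetical field names: they follow the key metadata elements named in the
# guidelines, but the real Dataset Management System form may differ.
REQUIRED_METADATA = [
    "title",
    "description",
    "category",
    "sector",  # Sector/Sub-Sector
    "jurisdiction",
    "keywords",
    "access_method",
    "reference_urls",
    "data_group_name",
    "frequency",
    "granularity",
    "policy_compliance",
]


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings and empty lists as not filled in."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_metadata(record: dict) -> list[str]:
    """Return the required metadata fields that are absent or blank, in order."""
    return [field for field in REQUIRED_METADATA if is_blank(record.get(field))]
```

A record whose keywords list is empty is reported as missing "keywords", in line with the guideline that keywords must be defined to make the catalog searchable.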
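The DOs ask for data that is denormalized, with redundant data added so that each file stands on its own. A common case is replacing a bare code with the descriptive value it refers to. The sketch below assumes a hypothetical department lookup table (for example, ward code to ward name); the function and parameter names are illustrative, not part of the portal.

```python
def denormalize(rows: list[dict], lookup: dict, key: str, new_field: str) -> list[dict]:
    """Return copies of rows with lookup[row[key]] stored under new_field."""
    result = []
    for row in rows:
        enriched = dict(row)  # copy so the caller's rows are left unchanged
        # Codes missing from the lookup become 'NA', per the missing-value rule.
        enriched[new_field] = lookup.get(row[key], "NA")
        result.append(enriched)
    return result


# Illustrative usage with placeholder values, not real Pune data.
WARD_NAMES = {"W01": "Ward A", "W02": "Ward B"}
sample = [{"ward_code": "W01", "connections": "120"}]
published = denormalize(sample, WARD_NAMES, "ward_code", "ward_name")
```

Keeping the original code column alongside the added name preserves the link to the source system while making the published file readable without the lookup table.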
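The guideline to replace Not Available, Not Reported or missing values with ‘NA’ can be applied mechanically to a CSV file before upload. This is a minimal sketch using only the Python standard library; the set of placeholder spellings treated as missing (MISSING_MARKERS) is an assumption and should be adjusted to whatever markers a department's source data actually uses.

```python
import csv
import io

# Assumed spellings of "no value" found in source data; extend as needed.
MISSING_MARKERS = {"", "-", "n/a", "not available", "not reported", "null", "none"}


def normalize_missing(csv_text: str) -> str:
    """Rewrite a CSV string so that every missing-value cell reads 'NA'."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_index, row in enumerate(reader):
        if row_index == 0:
            # Leave the header row untouched.
            writer.writerow(row)
            continue
        writer.writerow(
            "NA" if cell.strip().lower() in MISSING_MARKERS else cell
            for cell in row
        )
    return out.getvalue()
```

Comparisons are case-insensitive, so "Not Reported", "NOT REPORTED" and an empty cell all become NA, which also satisfies the DON'T against contributing datasets with missing values.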
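The DON'Ts on non-Unicode data and special characters can likewise be screened before upload. The sketch below assumes UTF-8 as the Unicode encoding and takes the characters named in the guideline (@, %, $, &) as the forbidden set; the function name and the choice to report problems rather than silently strip characters are illustrative, not portal behaviour.

```python
import csv
import io

# Characters the guideline gives as examples of special characters to avoid.
FORBIDDEN_CHARS = set("@%$&")


def find_problems(raw: bytes) -> list[str]:
    """Return human-readable problems found in a raw CSV file's bytes."""
    try:
        # Bytes that are not valid UTF-8 are rejected outright.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte {exc.start}"]

    problems = []
    reader = csv.reader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            bad = sorted(FORBIDDEN_CHARS.intersection(cell))
            if bad:
                problems.append(f"row {row_no}, column {col_no}: {''.join(bad)}")
    return problems
```

Because only the listed ASCII symbols are flagged, legitimate non-ASCII text such as Marathi names encoded in UTF-8 passes the check.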
