If you want to work with a lot of Kiva's data, making hundreds or thousands of requests to the Kiva API can be burdensome, both for you and for your network. For that reason, we make much of our data available as snapshots, each packaged as a single compressed download. Snapshots are generated nightly, so they are most useful for applications that don't need live data, such as data analysis and visualization. Some applications may also find them handy for seeding a local data store, then supplementing the snapshot with calls to the Kiva API for the most recent data.
A data snapshot is composed of multiple files delivered in a single compressed ZIP archive. Data snapshots are available in XML and JSON, just like responses from the Kiva API. For the most part, the format of the documents in a snapshot is the same as that of an API response, with a few exceptions.
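Because a snapshot arrives as one ZIP archive, the first step is simply to unpack it. The sketch below uses Python's standard zipfile module. The function name extract_snapshot and both path arguments are hypothetical placeholders, and it assumes the collection directories (such as lenders) sit at the root of the archive.

```python
import zipfile
from pathlib import Path


def extract_snapshot(zip_path, dest_dir):
    """Extract a snapshot ZIP and return the names of its top-level directories.

    zip_path and dest_dir are placeholders: point them at wherever you saved
    the downloaded archive and wherever you want the files unpacked.
    Assumes the collection directories sit at the root of the archive.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall creates dest and any subdirectories it needs.
        archive.extractall(dest)
    # Each top-level directory is one data collection (e.g. "lenders").
    return sorted(p.name for p in dest.iterdir() if p.is_dir())
```

For an archive containing lenders/1.json and loans/1.json, this returns ["lenders", "loans"].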
The latest data snapshots are available in the format of your choice at the following URLs:
When you extract a data snapshot, you'll have a collection of documents with the following structure:
At the root level of the snapshot is a collection of directories, one for each type of data archived. Currently, each collection captures detailed data about a single type of object, such as a loan or a lender. In the future, there might be collections for relational data, such as lenders_to_loan. Each data collection is a series of numbered files in the chosen format, JSON or XML. Together, these files make up the complete data collection, just as if the data had been captured in a single file (e.g., lenders.json). The complete data set is split across multiple files so that it is easy to process with conventional parsers.
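When reading a collection back, the numbered files have to be processed in page order, which a plain alphabetical sort gets wrong once there are ten or more pages. The following is a minimal sketch, assuming files are named by page number with the format as the extension (e.g. lenders/2.json); the function name page_files is hypothetical.

```python
from pathlib import Path


def page_files(collection_dir, ext="json"):
    """Return a collection's numbered page files in page order.

    Sorting by the integer file stem matters: a plain string sort would
    put "10.json" before "2.json".
    """
    files = Path(collection_dir).glob(f"*.{ext}")
    # Keep only files whose stem is a page number, e.g. "2" in "2.json".
    pages = [f for f in files if f.stem.isdigit()]
    return sorted(pages, key=lambda f: int(f.stem))
```

Pass ext="xml" when working with an XML snapshot.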
The number of each file in the collection corresponds to the page of data in the series, much like pages of data accessed through the Kiva API. So, lenders/2.json would be similar to http://api.kivaws.org/v1/lenders/search.json?page=2. The only difference is that the page size used in data snapshots is much larger than that in the Kiva API.
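Since each file is one fixed-size page, the file holding a given element can be computed directly rather than searched for. This sketch assumes pages are numbered from 1 (as with API pages), that every page except the last is full, and that you take page_size from a document's header; the function name locate is hypothetical.

```python
def locate(index, page_size):
    """Map a 0-based position in the full collection to (page, offset).

    Pages are numbered from 1, as in lenders/1.json, lenders/2.json, ...;
    offset is the 0-based position of the element inside that page.
    """
    if index < 0 or page_size <= 0:
        raise ValueError("index must be >= 0 and page_size > 0")
    page, offset = divmod(index, page_size)
    return page + 1, offset
```

With a page size of 1000, element 1500 (0-based) is at offset 500 of page 2, so locate(1500, 1000) returns (2, 500).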
As mentioned, the format of data in each snapshot document is similar to that of a Kiva API response. The main differences are how the data segment is wrapped and how metadata is expressed for each segment of the snapshot. Here is the beginning of a sample document:
The structure begins with a header element that contains metadata for this snapshot segment. It gives the total number of elements in the data collection, 444938, as well as the date of the snapshot (which should be identical for all documents in the snapshot). The header also shows that each data page holds 1000 elements (in this case, lender elements), so there are 445 pages in total: the ceiling of total/page_size, i.e. 444938 / 1000 = 444.938 rounded up. The header also records the page number of the document.
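The page-count arithmetic above, plus a consistency check between a header and its file name, can be sketched as follows. The key names total, page_size, and page are assumptions taken from the formula in the text; verify them against a real snapshot document. Values are passed through int() in case they arrive as strings (as they would from XML). The function names are hypothetical.

```python
def page_count(header):
    """Number of pages in a collection: ceiling of total / page_size.

    Integer ceiling division avoids floating-point rounding.
    Assumes the header has "total" and "page_size" keys.
    """
    total, size = int(header["total"]), int(header["page_size"])
    return (total + size - 1) // size


def check_header(header, file_page):
    """Raise if a header's page number disagrees with its file name."""
    page = int(header["page"])
    if page != file_page or not 1 <= page <= page_count(header):
        raise ValueError(f"header page {page} does not match file page {file_page}")
```

For the sample values, page_count({"total": 444938, "page_size": 1000}) returns 445.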
After the header comes an ordered array of lender elements, sorted from oldest to newest. Each of these elements is identical to a detailed lender record in the Kiva API.
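Reading one JSON page then amounts to splitting it into its header and its record array. This is a minimal sketch, assuming the top-level object holds a "header" object and an array named after the collection (e.g. "lenders"); both key names and the function name load_page are assumptions, not confirmed by this page.

```python
import json


def load_page(path, collection="lenders"):
    """Load one snapshot page and return (header, records).

    Assumes the top-level JSON object holds a "header" object plus an
    array named after the collection (e.g. "lenders"); both key names
    are assumptions to check against a real document.
    """
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return doc["header"], doc[collection]
```

Because records within a page run from oldest to newest, loading pages in ascending page order and concatenating their records yields the whole collection in chronological order.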
The XML version of snapshots is a direct translation of the JSON format shown above. The root node used for XML documents is