SQL Interface - HOWTO
Ok, this isn't the most user-friendly interface. Other than building lots of pre-defined queries there's not much I can do to make it usable for the average person. But if you've got some programming experience, especially with databases, hopefully the information below will help you get the most out of the data.First, take a look at the entity relationship diagram to understand the table structure. Although most of the data is directly from Kiva, some of it is derived from the data or from external sources.
The database uses Oracle 10g. For a good SQL tutorial check out SQL Zoo, and for a thorough description of the syntax you might want to use the SQL Reference. The database runs on an old PC, so it may not always be available or very responsive.
Most of the data in the tables map directly to data available through the Kiva API. Many of the fields and terms are explained on build.kiva.org. Below are some explainations of the data, as well as areas where my database differs from the Kiva API. This is not a comprehensive list of the differences, just some information to make the database more useful.
- All dates are in GMT.
- The relationship between Country and Region comes from the Partner API.
- Not every loan has a longitude and latitude at the city level. Some coordinate are at the country level, and these values are used to populate most of the Country latitude and longitude values. For countries without this information, I manually chose some coordinates that seemed to be in the center of the country.
- Currency names are from http://en.wikipedia.org/wiki/ISO_4217
- loansAvailableWhenPosted was generated from the LOAN data since calculating this data takes many hours. Each loan is compared with all other loans where the first loans' posted_date falls between the other loans' posted_date and funded_date. Finding the amount of time to fund a loan is very complicated, and it helps to know what other loans it was competing with.
- MV_Months is a materialized view that is helpful for grouping the data by month or quarter. It is refreshed everyday, and starts with the earliest data and ends with the current month and quarter. It is much faster to use these results than to re-query for them when needed. For some queries it is only useful to look at continuous data, and the column goodDate is used to mark when the data becomes useful for comparison. Without it many of the charts would look jagged because there are some months without any data.
- Many of the functions need to aggregate strings, which unfortunately is a task that SQL does not handle well. There are dozens of different approaches to this problem, and my preferred solution is the fastest but also requires some complex features. For example, this SQL will generate regions and a pipe-separated list of countries in that region:
select region, to_string2(cast(collect(name order by name) as varchar2_ntt), '|') countries from country where region is not null group by region
The to_string2 function can be used like any other grouping function. - The visualizations are created by views. You can view the code behind the views here.