Non-English climate science
Posted on 25 January 2013 by Ari Jokimäki
Today we are used to receiving new climate research written in English. That has not always been the case. There was even a time when English was a very minor language in science. Some time ago I started thinking that by concentrating on research written in English we might be missing a lot of climate science, especially historically. I decided to take a look at the situation.

I used Google Scholar and Google Translate to search for papers containing the word "climate" in all languages supported by Google Translate, and recorded the number of hits for each language. Note that this analysis is very rough, so I suggest that the numbers be taken only as indicative; the big picture presented in the table is more meaningful than any single figure. The numbers have a lot of uncertainties, some of which I explain below. Here's the result table:
| Country/language | Word | Results |
| English/Latin | climate | 2550000 |
| Spanish/Italian/Portuguese | clima | 954000 |
| China-simple | 气候 | 614000 |
| Germany/Norway/Denmark | klima | 350000 |
| France/Romania | climat | 318000 |
| Russia/Serbia | климат | 93800 |
| Japan | 気候 | 49400 |
| Turkey | iklim | 43600 |
| Sweden/Poland | klimat | 34100 |
| Korea | 기후 | 33900 |
| China-traditional | 氣候 | 31100 |
| Netherlands/Afrikaans | klimaat | 24100 |
| Ukraine/Belarus | клімат | 23500 |
| Albania | klimë | 7600 |
| Arabic | مناخ | 6610 |
| Lithuania | klimatas | 6270 |
| Finland | ilmasto | 3980 |
| Persia | اقلیم | 3850 |
| Greece | κλίμα | 3500 |
| Esperanto | klimato | 3480 |
| Czech | podnebí | 2830 |
| Vietnam | khí hậu | 1390 |
| Azerbaijan | iqlim | 883 |
| Hindi | जलवायु | 821 |
| Estonia | kliima | 584 |
| Slovenia | podnebne | 575 |
| Slovakia | podnebie | 346 |
| Thailand | ภูมิอากาศ | 468 |
| Latvia | klimats | 255 |
| Hebrew | האקלים | 244 |
| Iceland | loftslag | 179 |
| Swahili | hali ya hewa | 113 |
| Yiddish | קלימאַט | 83 |
| Welsh | yn yr hinsawdd | 28 |
| Armenia | կլիմա | 18 |
| Irish | aeráide | 12 |
| Urdu | آب و ہوا | 3 |
| Gujarati | આબોહવા | 1 |
There are 2,550,000 hits in the English/Latin languages. Non-English languages (excluding Latin, of course) have 2,235,364 hits. So it seems that nearly as many climate papers exist in non-English languages as in English. Some languages are missing from the table because they didn't produce any hits (and, of course, a lot of others are missing because they are not supported by Google Scholar).
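As a quick check of how close the two totals are, the comparison can be sketched in a few lines of Python. The two numbers are the totals quoted above; the variable names are only illustrative.

```python
# Totals as quoted in the post (variable names are illustrative).
english_hits = 2_550_000
non_english_hits = 2_235_364

# Ratio of non-English to English hits.
ratio = non_english_hits / english_hits

# Share of non-English hits among all hits.
non_english_share = non_english_hits / (english_hits + non_english_hits)

print(f"non-English / English ratio: {ratio:.3f}")
print(f"non-English share of all hits: {non_english_share:.1%}")
```

Given the uncertainties discussed in the post, only the rough size of these figures is meaningful.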
As I mentioned above, the numbers have a lot of uncertainties. Google Scholar returns a lot more than just peer-reviewed papers: there are books, reports, and even some blog posts. This distorts the number of hits. It seems to be a substantial problem, for example, in the search results for my native language, Finnish.
Another source of error is that Google Scholar also returns matches on author names and journal names. This is a big issue, for example, in the German results: there seem to be a lot of papers by authors with the last name "Klima". The 350,000 hits for German therefore seem to be off by quite a lot. A search for "Klimawandel" (climate change) resulted in 21,900 hits. English "climate change" gives 1,570,000 hits, so the ratio of climate to climate change is 2,550,000 / 1,570,000 = 1.62 for English. Assuming the same ratio for German gives 21,900 * 1.62 ≈ 35,600 hits for "klima" (climate), using the unrounded ratio. However, this feels somewhat too low considering that German is a common language in science, and that other comparable languages have many more hits (for example, French has over 318,000 hits, but see below for the need to correct the French results). Also, most of Hungary's results seem to come from authors' names.
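The German estimate above is a simple proportional scaling, which can be sketched as follows. The hit counts are the ones quoted in the post; the variable names are illustrative, and the assumption that German has the same climate-to-climate-change ratio as English is the post's own rough guess, not an established fact.

```python
# Hit counts quoted in the post (variable names are illustrative).
climate_en = 2_550_000         # English "climate"
climate_change_en = 1_570_000  # English "climate change"
klimawandel_de = 21_900        # German "Klimawandel"

# Ratio of "climate" hits to "climate change" hits in English.
ratio_en = climate_en / climate_change_en

# Assumption: the same ratio holds for German.
klima_de_estimate = klimawandel_de * ratio_en

print(f"English ratio: {ratio_en:.2f}")
print(f"Estimated German 'klima' hits: {klima_de_estimate:,.0f}")
```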
Yet another problem is that not all of the search results are in the intended language. This is partly due to the issue mentioned above, where Google Scholar returns matches on author and journal names. There are also cases where another language has the same word (or one close enough for Google Scholar) with a different meaning, or where an author's name matches the search word. French search results, for example, include papers in other languages. According to the first result page (yes, I know it's not a very big sample...), French results are 20% non-French. This would reduce the number of French-language hits to 254,400.
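The French correction is a straightforward scaling of the raw hit count by the share of results that appear to be in the intended language. Here is a minimal sketch; the function name `corrected_hits` is hypothetical, and the 20% figure is the post's estimate from a single result page.

```python
def corrected_hits(raw_hits, off_language_fraction):
    """Scale a raw hit count by the estimated share of on-language results."""
    if not 0.0 <= off_language_fraction <= 1.0:
        raise ValueError("off_language_fraction must be between 0 and 1")
    return raw_hits * (1.0 - off_language_fraction)

# Values quoted in the post: 318,000 raw French hits, about 20% non-French.
french_corrected = corrected_hits(318_000, 0.20)
print(f"Corrected French hits: {french_corrected:,.0f}")
```

Because the off-language fraction comes from such a small sample, the corrected number should be read as a rough indication only.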
The Albanian word for climate is "klimë", but almost all search results are for "klime", so Google Scholar sometimes gives additional results for words that are close to the actual search term.
Search results might also not be climate related. The word "climate" has other, non-meteorological meanings, such as the political climate or a climate of fear. This source of error might be even worse for some other languages.
There are also duplicate entries for some papers, and these are probably not all of the error sources. Some non-English papers have also been published in English (or vice versa), so the ratio of non-English to English papers (about 0.88) might not be accurate. Additionally, some non-English papers have English abstracts.
So, it seems that despite all of my search results, there are not 5 million climate papers out there. But there are a lot of them - and quite a few of them might be in a language other than the English and Finnish that I understand. It sure would be nice to be able to read all those papers when needed.

To estimate the probability of a "lost" foreign work, I need to have an idea of what "climate science" is and what is required to make a significant contribution. First, I think "climate science" is not a fundamental science: it can involve all the basic and second-level sciences (physics, chemistry, biology, geology, ...; it's hard to think of a discipline that would never be relevant to climate science) and also many quasi- or pseudo-sciences (economics, history, futurology). Further, work in climate science can be of different types: theoretical, theoretical-phenomenological, observational (both contemporary and historical; I class ice core data as historical-observational, for example), and experimental.
For reasons relating to the need for resources and for awareness of, and communication with, other research and researchers, I conclude that the probability of a "lost" foreign work is generally low, being highest in the area of contemporary observational work.
PS. On the Google Scholar main page, click "Settings" (upper right), then select "Languages" (on the left), and you can choose to "Search only for pages written in these language(s)".
New Zealand seems to have few, and Ireland (where I live) has no political deniers at all, and the few in the media just echo their overseas counterparts. Objectors to wind farms are more vocal here, though as yet have no political muscle.
As a whole, clima now gives 968,000 hits. Narrowing the search to Italian only gives 90,300 hits, Portuguese 277,000, and Spanish 421,000. Together these make 788,300 hits, so there are about 180,000 hits missing compared with the whole clima search. I assume that most of these are false hits from other languages due to author names or other issues mentioned in my post above.
I tried this with the English search, but it gave a peculiar result: the whole search gives 2,540,000 hits, but when narrowing to English pages only, the search gives 2,550,000 hits.
As for your comment that the probability of lost foreign work is generally low, that might be the case, but I have looked into this further and found that there is a substantial body of scientific literature in other languages that is not available in English. For example, in my searches I have seen hits from Russian journals that don't even seem to be available online. These journals have been publishing since the 1940s. One of them was called Hydrology and Meteorology, if I recall correctly.
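The breakdown above can be checked with a short calculation. The hit counts are the ones quoted in this comment; the dictionary and variable names are illustrative.

```python
# Hit counts quoted in the comment (names are illustrative).
clima_total = 968_000
clima_by_language = {
    "Italian": 90_300,
    "Portuguese": 277_000,
    "Spanish": 421_000,
}

# Hits accounted for by the three language-restricted searches.
accounted = sum(clima_by_language.values())

# Hits in the unrestricted search that none of the three searches explain.
unaccounted = clima_total - accounted

print(f"Accounted for: {accounted:,}")
print(f"Unaccounted for: {unaccounted:,} ({unaccounted / clima_total:.1%} of total)")
```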
1. Select "Advanced Search" (to the right of the main search box).
2. Put the desired search text (e.g., "Klima") in the box labeled "all of the words"
3. Put the non-desired author name, preceded by a - (e.g. "-Klima") in the box labeled "Return articles written by".
And now you have articles with the keyword "Klima" but not with author "Klima".
To get articles about Klima AND also written by Klima, put "Klima" in both boxes.
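The same filter can also be typed directly into the main search box using Google Scholar's `author:` operator. The sketch below only builds the query text and a search URL; the helper names `scholar_query` and `scholar_url` are hypothetical, and the use of the `q` URL parameter is an assumption about how the search page accepts queries, not something taken from the steps above.

```python
from urllib.parse import urlencode

def scholar_query(keyword, exclude_author=None, require_author=None):
    """Build a query string using the author: operator (hypothetical helper)."""
    parts = [keyword]
    if require_author:
        parts.append(f'author:"{require_author}"')
    if exclude_author:
        # A leading minus excludes results matching the term.
        parts.append(f'-author:"{exclude_author}"')
    return " ".join(parts)

def scholar_url(query):
    """Assumed URL form: the query is passed in the q parameter."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})

# Keyword "Klima", but not papers written by an author named Klima.
query = scholar_query("Klima", exclude_author="Klima")
print(query)
print(scholar_url(query))
```

This only constructs the text of the search; the hit counts still have to be read from the result page by hand.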
Bill, I know that there are some journals in Finnish, but I don't know to what extent the same results are also available in English. The English version of the Russian journal is a good find. I hope that some day they will also offer the past several decades of that journal, at least at the abstract level.
It is quite fortunate that researchers write in different languages, so that at least some of their results are also available in English. For example, the Russian climate science pioneer Mikhail Budyko published some work in the above-mentioned Russian journal but also published in English.
Another interesting language in this sense might be Chinese, with many times more search results than Russian, and perhaps behind an even stronger language barrier.
Many non-English areas publish local reports that are not peer reviewed. The IPCC uses this grey literature in locations where peer-reviewed material is not available. They should also pick up non-English material. Can you suggest literature you think is relevant that was not included in the last IPCC report?
Also, if we continue with Budyko, he published both in English and in Russian. For me it would be nice if his Russian output were available too in some format. One example (or several) where results were picked up by the English-speaking world doesn't prove that there are no important results hidden in non-English journals. Fortunately, scientists have largely both published in many languages and picked up results from other languages, so the situation might not be that bad either. This needs further study.