Sunday, 7 October 2012

The Internal Classification & Migration of Turkic Languages

Version 7.32

v.1 (04/2009) (first online, phonological studies) > v.4.3 (12/2009) (major update, lexicostatistics added) > v.5.0 (11/2010) (major changes, grammar added) > v.6.0 (11-12/2011) (major corrections to the text; maps, illustrations, references added) > v.7.0 (02-04/2012) (corrections to Yakutic, Kimak, lexicostatistics; the chapter on the Turkic Urheimat made into a separate article; grammatical and logical corrections)


The internal classification of the Turkic languages was rebuilt from scratch on the basis of phonological, grammatical, lexical, geographical and historical evidence. The resulting phylogeny mostly coincides with previous common taxonomic systems but contains several unusual points. Separate articles with meticulous lexicostatistical research and geomigrational analysis are considered part of the same work.

1. Introduction
2. Collecting factual material
3. Making Taxonomic Conclusions

    4. The Resulting Internal Classification of Bulgaro-Turkic Languages
    5. References and sources

    1. Introduction

    The present study of the Turkic languages (2009-2012) started as brief online notes but has gradually grown into a series of online publications. It is mostly original research with relatively few references to previous theoretical work. Most conclusions are based on factual evidence collected from dictionaries, grammars, language textbooks, native speakers on the web, audio and video recordings, and books and articles with detailed descriptions of specific languages. The conclusions rarely refer directly to historiographic opinions or draw on assumptions made by other researchers. Rather, they attempt to build a logically consistent view of the spread of the Turkic languages and their internal classification through a nearly independent, relatively comprehensive, step-by-step analysis. Nevertheless, the author deeply appreciates the extensive input of the people who produced the vast body of turkological literature on numerous Turkic languages. The author is equally grateful to those who helped directly or indirectly by providing corrections and valuable notes by email or through web forums; without their interest and collaboration, this work would never have come to life.
    The current article provides all the linguistic argumentation and other theoretical studies concerning the internal classification of Bulgaro-Turkic languages. The rest of the work consists of the following articles:
    The Lexicostatistics and Glottochronology of the Turkic Languages (2009-2012), a meticulous lexicostatistical study of Swadesh-210 lists dating the Turkic Proper split to about 300-400 BC and the Bulgaro-Turkic split to about 900 BC. [The split of Proto-Yakutic, however, has been understudied due to difficulties in identifying loanwords from an unknown adstrate.]

    The Proto-Turkic Urheimat & The Early Migrations of the Turkic Peoples, a detailed geomigrational analysis of the early Turkic branches. It places the Urheimat of Proto-Turkic Proper near the Altai Mountains and that of Proto-Bulgaro-Turkic in northern Kazakhstan, in the Irtysh drainage basin. The work explores associations with major archaeological cultures of the Bronze and Iron Age in West Siberia, such as Pazyryk for Proto-Turkic Proper and the Andronovo archaeological horizon for Proto-Bulgaro-Turkic.
    The Turkic Languages in a Nutshell (2009-2012), a brief outline of the final classification and of some of the stable results obtained in the current research. It includes notes on history, ethnography and typical linguistic features, which essentially makes it an introduction to turkology for beginners.

    1.1 Preliminary notes on the reconstruction of Proto-Turkic

    Proto-language reconstructions are often based entirely on the supposed readings of the oldest attested members of a family. However, such an approach can lead to erroneous results. When reconstructing Bulgaro-Turkic proto-forms, we should instead use special lineage-based weighting coefficients. These would be drastically different from the old-fashioned "Old-Turkic-for-all" model.
    To make a proper reconstruction of a proto-form, we should assign roughly 50% of the weight to Bulgaric and 50% to Turkic Proper. We should then divide the second half more or less evenly among the most archaic representatives of its main branches, e.g. (1) Proto-Sakha, (2) Proto-Khakas + Proto-Kyrgyz, and (3) Proto-Orkhon-Karakhanid + Proto-Turkmen. Each of the main Turkic Proper branches therefore receives about 17% (50% / 3) (see the classification dendrogram above). These coefficients and divisions are not particularly precise and may be modified in a number of ways. Even so, the example is offered as a first approximation that addresses the potential Old-Turkic-centric attitude, which supposedly holds that "nothing that's not in Old Turkic could exist in Proto-Turkic". On the contrary, the current revised model states that Gökturk Old Turkic was just one of several early Turkic branches, and its weight for reconstruction purposes is hardly more than 17%.
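    The weighting scheme just described can be made concrete with a few lines of arithmetic. This is only an illustration under stated assumptions. The branch labels come from the paragraph above. The helper `weighted_vote`, which sums the branch weights behind each candidate proto-value, is a hypothetical way of applying the coefficients rather than a procedure given in the text. The *S- reflexes are collapsed into two assumed classes, following the table in the next section.

```python
from fractions import Fraction


def branch_weights(turkic_proper_branches):
    """Give 1/2 to Bulgaric and split the other 1/2 evenly among Turkic Proper branches."""
    weights = {"Bulgaric": Fraction(1, 2)}
    share = Fraction(1, 2) / len(turkic_proper_branches)
    for branch in turkic_proper_branches:
        weights[branch] = share
    return weights


def weighted_vote(reflexes, weights):
    """Hypothetical helper: total weight of the branches supporting each candidate value."""
    scores = {}
    for branch, value in reflexes.items():
        scores[value] = scores.get(value, Fraction(0)) + weights[branch]
    return scores


weights = branch_weights([
    "Proto-Sakha",
    "Proto-Khakas + Proto-Kyrgyz",
    "Proto-Orkhon-Karakhanid + Proto-Turkmen",
])

# Simplified, assumed coding of the *S- reflexes into two classes.
reflexes = {
    "Bulgaric": "sibilant",
    "Proto-Sakha": "sibilant",
    "Proto-Khakas + Proto-Kyrgyz": "sibilant",
    "Proto-Orkhon-Karakhanid + Proto-Turkmen": "y",
}

for branch, w in weights.items():
    print(f"{branch}: {float(w):.1%}")
scores = weighted_vote(reflexes, weights)
print({value: f"{float(score):.1%}" for value, score in scores.items()})
```

    Using `Fraction` keeps each Turkic Proper share at exactly 1/6; the "about 17%" in the text is the rounded value.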
    The common objection to this suggestion is that Old Turkic is an ancient language and is therefore more suitable for historical reconstruction. However, generally speaking, an ancient language can only be considered more suitable for reconstruction or classification purposes if several conditions are met:
    (1) it is a unique language close to the proto-state, and no alternative branches exist;
    (2) it is so well attested that its data are completely reliable, and no significant misinterpretations could arise from occasional mistakes in ancient writing, reading (e.g., of abraded petroglyphs), translation, verification of the material, etc.;
    (3) the script closely and adequately reflects the pronunciation, and we know full well how to reconstruct the pronunciation from that script.
    Obviously, Orkhon Old Turkic fails to meet the first criterion, barely meets the second, and may raise quite a few doubts with regard to the third. In other words, Orkhon Old Turkic may simply be insufficiently old, or too far off-center, to be close enough to the proto-state. Moreover, too little material may be available for the solid attestation and interpretation of some of the finer linguistic points. To put it another way, Old Turkic is not as well understood as, say, Latin and Greek in Indo-European studies. One should therefore not transfer methodological patterns established for Indo-European reconstruction to other families: an old language is not necessarily good enough.

    An example from the Revised Model: the reconstruction of *S-

    The above reasoning can be illustrated by the following analysis and reconstruction of the Proto-Turkic *S-. This symbol should be read here as just an arbitrary way to designate the phoneme under consideration. A very common error resulting from the Turkish-for-all or Karakhanid-for-all model is the conclusion that such modern words as Turkish yer "place, earth", yol "way", yedi "seven" were pronounced exactly the same way in Proto-Bulgaro-Turkic. This idea is very common even among turkologists outside Turkey, and seems to go as far back as Mahmud al-Kashgari's classical work.
    Before proceeding with any further argumentation, we should confine ourselves to material internal to the Turkic languages. The Altaic and Nostratic languages are a completely separate issue, which cannot be discussed here at any length. This method can be called internally based reconstruction, as opposed to full reconstruction.
    However, the non-Seljuk and non-Karakhanid languages alone make it evident that the existence of /y-/ in Proto-Turkic is statistically rather improbable. Consider the data in the following table:

    The Reconstruction of Proto-Bulgaro-Turkic *S
    Subgroup: phoneme. Notes.
    NB: /j/ or /J/ stands for an affricate, as in English.

    Bulgaric
    Danube (Dunai) Bulgar, Kuban Bulgar: d'-; zh-/ch-. Danube Bulgar texts were written in Cyrillic, though their originals may have been written in Greek. Bulgaric words in Hungarian are written with gy-, which should be read /J-/ (as in Italian, which provided the basis for the orthography) (see Róna-Tas and A. Dybo). Also, some Hungarian words have an initial sh-, such as shel (shelet) "wind" (cf. Chuvash s'il). Also, cf. the borrowing zhenchugê "pearls" into Old Russian (attested 1161) and gyongy into Hungarian.
    Chuvash: s'- (palatalized, soft).

    Turkic Proper
    Yakut, Dolgan: s- > h-. Aspirated between vowels, hence /h/ in Dolgan due to the Evenk substratum.
    Tuvan, Tofa: ch'- (slightly palatalized).
    Khakas, Shor, Chulym: ch'-, n'- (slightly palatalized); sometimes an irregular /n-/ before /-i, -ï/.
    Kumandy (North Altai): ch'-, n'-, as in Khakas.
    Standard South Altai: d'-/j-. A palatalized soft /dj'/, though pronounced much like English /j-/, perhaps shorter and with more palatalization.
    Karakalpak, Kazakh, Kyrgyz: zh- < j- (west to east). The English-type /J-/ in the eastern dialect of Kazakh is probably due to contact with the Altai-type /d'-/, whereas /zh-/ in the western dialects is due to contact with y-type languages. However, at least one speaker has suggested that /J-/ (voiced /ch-/) is in fact original even in central Kazakhstan. On this view, /zh-/ developed in the course of the 20th century through Russified spelling and pronunciation, which may be true in some cases because of mass bilingualism in Kazakhstan. This suggestion is partly supported by Melioransky's textbook (1894). Melioransky wrote that this sound was similar in pronunciation to the Russian /dzh/ with "a weak beginning", whereas "the pre-sound ("d") entirely disappears in the western part of the steppe". Consequently, */j-/ rather than /y-/ is reconstructed for early Kazakh. Also note word-initial /J-/ but /-VzhV-/ between vowels. English-type /J/ in Kyrgyz.
    Kazan Tatar and most other Kimak-Kypchak: j'- before -e, -i; y- before -a, -o, -u. Other Kimak-Kypchak languages may have been influenced by Kazan Tatar in the course of the 20th century. Al-Kashgari (1072) reports /j-/ for Kypchak. A speaker of Kazan Tatar insists that in his dialect (South-Eastern Tatarstan) a soft /j-/ and /y-/ are in allophonic distribution.
    North Crimean Tatar: j-, sometimes y-. Almost always /j-/ in the northern (steppe) dialect, though /y-/ appears in numerals and a few other common words (yaxshi), probably due to borrowing at marketplaces and the like (?). /j-/ is also reported in Yevpatorian Crimean Tatar.
    Karachay-Balkar: (1) j- and ch-; (2) z- and ts-. There are two different dialects in Karachay-Balkar. No signs of /y-/ are reported, even in marginal dialects.
    Early Kypchak: y-. Attested as /y-/ in the Armenian and Mamluk sources.
    Yugur: y-, sometimes tsh'-. There are a few reports by Tenishev of /tsh'-/, as in Mandarin, but mostly /y-/. This could reflect either an allophonic distribution or an unknown dialect of Yugur.
    Salar: y-, sometimes dzh'-. Just as in Yugur, Poppe mentions a few words from Potanin's materials where /y-/ is irregularly rendered as /dzh'-/ (= /j-/), e.g. dzhigirme : igermi "twenty".
    Transoxanian Oghuz (c. 11th century): j- and y-. Confusingly attested as both /j-/ and /y-/ by al-Kashgari, but /j-/ is more certain.
    Turkmen: y- < *j- (?). Transoxanian Oghuz, the accepted source of the Seljuk languages, is attested with /j-/. We may therefore infer that /y-/ is possibly a later development, for instance due to Karakhanid, Chagatai and Uzbek influence.
    Azeri: 0- < y-. A regular loss of /y/, as in üræk < yürek.
    Turkish: y-. In some instances, /y/ may be weakened further or disappear, as in Azeri, e.g. /biliyor/ "knows" > /bilior/ in actual pronunciation.
    Orkhon Old Turkic (c. 8th century): ?. Commonly interpreted as /y-/, but there is no exact evidence.
    Karakhanid (11th century): y-. Clearly attested as /y-/ in al-Kashgari's work.
    Uzbek, Uyghur: y- < *zh-; j- (Kypchak Uzbek); j-, y- (Uyghur). Presently written /y-/, probably due to Karakhanid influence. Originally probably /zh-/ or /j-/, because of the close relatedness to early Kazakh-Kyrgyz-Kypchak (see below). The /j-/ phoneme is found in the Kypchak dialect of Uzbek (as in jaxshï : yaxshï "good"). Interestingly, Uyghur mostly uses /j-/ and /y-/ interchangeably, in an allophonic distribution.

    The table shows that the pure /y-/ pronunciation is attested only within the following subtaxa:

    (1) in the languages historically connected with the Orkhon-Karakhanid and Seljuk subtaxa, though there seems to be a /y-/-to-/J-/ allophonic distribution in Uyghur, some Uzbek dialects and some Oghuz dialects;

    (2) partly, in Yugur and Salar. These also belong to the southern Orkhon-Karakhanid habitat and may have been influenced by it, since they are located along the outposts of the Silk Road, where migrations between the Tian Shan and China were very common (though an allophonic /y/-to-/J/ distribution has also been suggested there for some cases);
    (3) partly, in /ya-/, /yu-/, /yo-/ syllables in the languages descending from the late expansion of the Golden Horde, such as Kazan Tatar (but not in the early separated Kimak languages, such as Karachay-Balkar). Even in Tatar, many speakers still report an allophonic distribution. A clear-cut /y-/ therefore exists mostly in writing and in recently Russified speech, rather than in older dialects or in geographically marginal languages such as North Crimean Tatar, Eastern Bashkir, etc. Moreover, even standard Kazan Tatar still has "Jil", not "yil" (wind).

    (1) Only Orkhon-Karakhanid and its neighboring languages seem to have a clear-cut historical attestation of /y-/. Most of the early separated and well-isolated branches, by contrast, either show mixed data or clearly seem to go back to something like a strongly palatalized sibilant /s'-/, /J-/ or similar.
    This provides a statistical argument for our conclusion: more separate language branches originally had an /s'-/- or /j-/-type phoneme than eventually developed a /y/ phoneme. In other words, it is statistically implausible that the supposed /y-/ > /j-/ mutation occurred simultaneously and independently in several separate historical branches.
    (2) As can be seen in the figure below, the y-type phoneme seems to be distributed outside the main historical diversification area of the Turkic languages. It therefore appears to be a recent phonological mutation, apparently linked to the migration of the Orkhon-Karakhanid and Oghuz languages. This again implies that the development of /y/ may have been a unique phonological innovation of Orkhon-Karakhanid Old Turkic. Hence the phono-geographical argument: only the J-type phoneme seems to be distributed near the putative homeland area.

    The distribution of the /J/ and /y/ phonemes in the Turkic languages
    As for the allophonic /y-/-to-/J-/ distribution in the Kimak-Kypchak-Tatar languages of the Golden Horde, such as Kazan Tatar, it may be explained by early Oghuz influence. As we will show below, the Golden Horde languages and Oghuz share many features at several levels, so this type of borrowing is well corroborated by other similar matches.
    (3) Moreover, if /y-/ were present in the proto-form, we would expect phonological variants of the semivowel /y-/ (not /J-/), e.g. /y-/, /i-/, /0-/, /ê-/, /l'-/, /J-/, /zh-/. These should appear in the most archaic and highly diversified Siberian branches in the east, near the historical homeland of the Turkic languages. What we actually observe there, however, are variants of the palatalized consonant /s'-/: /s'-/, /s-/, /h-/, /ch'-/, /J-/, /zh-/, /d'-/, /ni-/, /y-/. On the other hand, a zero reflex from the loss of y- would be a natural outcome of such a diversification. Yet it is present only in the westernmost languages, such as Azeri (ulduz < yulduz, il < yil), and partly in Turkish (cf. ïlïk, but Turkmen yïlï "warm"). This marks the /y-/ phoneme as a relatively recent, westernmost phenomenon connected with the spread of the Seljuk-Oghuz languages. It provides a phonological diversification argument: if /y/ were original, there would be predictable secondary sound changes in the eastern, early diversified branches, and these are in fact absent.
    Therefore, from the evidence internal to the Turkic languages alone, we may conclude that the *S- proto-phoneme in question lies somewhere within the set of sibilants {/s'-/, /s-/, /h-/, /ch'-/, /J-/, /zh-/, /d'-/}, but could not have been similar to the /y-/ semivowel of the modern Oghuz-Seljuk languages.
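    The statistical argument (1) above can be illustrated with a naive count of how many independent changes each hypothesis requires. This is only an editorial sketch. The grouping of the table rows into branches and their two-way coding are assumptions, and the modern Oghuz languages are coded as y-. That choice, if anything, favors the *y- hypothesis. The count also ignores shared innovations, so related branches that differ from the proto-value are each counted separately.

```python
# Editorial coding of the table rows: one entry per branch.
# "sibilant" covers s'-, s-, h-, ch'-, J-, zh-, d'-; mixed rows are coded by
# the reflex the table treats as older (an assumption).
REFLEXES = {
    "Bulgaric (Chuvash, Danube and Kuban Bulgar)": "sibilant",
    "Yakut, Dolgan": "sibilant",
    "Tuvan, Tofa": "sibilant",
    "Khakas, Shor, Chulym": "sibilant",
    "Altai": "sibilant",
    "Karakalpak, Kazakh, Kyrgyz": "sibilant",
    "Kimak-Kypchak (Tatar, Karachay-Balkar)": "sibilant",
    "Yugur, Salar": "y",
    "Oghuz (Turkish, Turkmen, Azeri)": "y",
    "Orkhon-Karakhanid": "y",
}


def changes_required(reflexes, proto_value):
    """Count branches whose reflex differs from the hypothesized proto-value.

    Naive assumption: every differing branch changed independently, so shared
    innovations between related branches are not merged.
    """
    return sum(1 for value in reflexes.values() if value != proto_value)


for hypothesis in ("y", "sibilant"):
    print(hypothesis, changes_required(REFLEXES, hypothesis))
```

    Under this coding, an original *y- requires seven independent shifts to a sibilant or affricate, whereas an original sibilant requires at most three shifts to y-. That is the asymmetry the statistical argument relies on.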
    Moreover, the additional evidence from the Altaic and Nostratic languages, which has not been discussed here, points to a highly palatalized /s'/, similar to the one in Chuvash. This allows us to reconstruct *s'er "place, earth", *s'ol "way", *s'ettê "seven" for middle and late Proto-Bulgaro-Turkic. The pronunciation of this palatalized /s'/ was probably similar to /sh/ in modern Japanese.
    Actually, this view of the reconstruction of the Proto-Turkic *S- is hardly novel and has been put forward several times by different authors, such as A.N. Bernshtam (1938), S.E. Malov (1952), N.A. Baskakov (1955), A.M. Scherbak (1970). It is also found in the authoritative Russian edition usually abbreviated as SIGTY [Sravnitelno-istoricheskaya grammatika tyurkskikh yazykov (SIGTY). Pratyurkskiy yazyk-osnova. Kartina mira pratyurkskogo etnosa po dannym yazyka. ("The Comparative-Historical Grammar of the Turkic Languages. The Proto-Turkic Language. The Worldview of the Proto-Turkic Ethnic Group Based on Linguistic Data."), Moscow (2006)].
    The authors of SIGTY add a further argument: since other sonants such as *r- and *l- were atypical in word-initial position, there is no reason to believe that the semivowel *y- occurred there either.

    The opposite view, which mostly goes back to Radlov's work at the end of the 19th century, is usually based on the following incorrect presumptions:
    (1) that the Karakhanid Old Turkic of Mahmud al-Kashgari is representative of all Turkic languages (in other words, that Old Turkic = late Proto-Turkic);
    (2) that Orkhon Old Turkic has been correctly and uncontroversially reconstructed from the script, and that it reflects /y-/;
    (3) that the high level of differentiation among Turkic subgroups can be ignored, including the evidence for strong differences in the Siberian languages and Chuvash.
    In this approach, the evidence from the Kimak-Kypchak-Tatar languages, for instance, may play the same role as the evidence from Chuvash. Indeed, this was the situation in Russian and European turkology until the beginning of the 20th century, when most Turkic languages were officially viewed as merely dialects of each other.

    2. Collecting factual material
    Most 19th-century turkological classifications were originally built on phonological criteria alone, and grammatical features were then slowly added. Detailed lexicostatistical analysis appeared only at the beginning of the 21st century. In this chapter, we briefly summarize the phonological, grammatical and lexical material analyzed in detail in this study, as well as other recent turkological classifications.


    2.1 The comparison of lexical features (lexicostatistical research)

    At the beginning of the 21st century, several authors attempted purely lexicostatistical studies of the Turkic languages.

    Starostin (1991)
    Sergey Starostin included very detailed 110-word lists for 21 Turkic languages in his book [Altajskaja problema i proiskhozhdenije japonskogo jazyka (The Altaic Problem and the Origins of the Japanese Language), Moscow (1991)]. These lists were apparently later reintegrated into the Starling database and used by other researchers.

    Dyachok (2001)
    The work was published online as brief preliminary notes. At the beginning of his concise article, M. Dyachok [pronounced: D-yah-chOk] reminds the reader of the old geography-based Samoylovich classification (1922), which has similar results. He then performs a lexicostatistical and glottochronological analysis of 13 major languages. As a result, the Turkic languages are subdivided roughly into just four basic groups: (1) Bulgaric, (2) Yakut, (3) Tuvan, (4) Western (= all others). This agrees with the idea that their area of maximum diversification was located somewhere in the east.

    Dybo (2002, 2007)

    The study of Anna Dybo [pronounced: AHN-nah de-BAW] was first published in 2001 as part of the articles collected in SIGTY (= Sravnitelnaja grammatika tyurkskikh jazykov (The Comparative Grammar of the Turkic languages), which is a large multivolume Moscow encyclopedic edition with detailed cross-comparative descriptions of morphology, syntax, vocabulary, semiotics and other aspects of Turkic languages). Then, it was republished in 2007 as part of a separate book [Anna Dybo, Lingvisticheskije kontakty rannikh tyurkov. Leksicheskij fond. (The Linguistic Contacts of the Early Turks: the Lexical Fund), Moscow (2007)].
    The study cites Dyachok as a recent lexicostatistical paper and then briefly describes the methodology: "All the languages, for which 100-Swadesh lists could be collected through written sources, were included into our lists. The 100-word Yakhontov-Starostin lists were used, because they allow better accuracy [= than the classical Swadesh-100]; they were processed according to Starostin's methodology by excluding the recognizable borrowings and then applying the STARLING program..." As a result, the following dendrogram was obtained:

    The lexicostatistical phylogeny of Turkic languages by Anna Dybo 

    Dybo, Anna, The Chronology of the Turkic Languages and the Linguistic Contacts of the Early Turks (2006)

    There is also a second version of this table that differs drastically from the first one because of some procedure applied to synonyms. This is slightly confusing and may lead to the table's significance being underestimated. However, the former dendrogram partly matches the results obtained in other investigations. It contains some unconventional points:
    (1) the splitting of Turkmen and Turkish between two different taxa;
    (2) the positions of Yugur and Salar;
    (3) the slightly misplaced Kazakh (which cannot be directly related to Uzbek) and Uzbek (which is historically known to be related to Uyghur).
    Apart from these, it corresponds relatively well with other studies. The glottochronological part, based on Starostin's formulas, should nevertheless be taken with a grain of salt.
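    For orientation, consider the classical Swadesh-Lees formula, which later methods such as Starostin's modify. It converts a share of common cognates C into a time depth t = ln C / (2 ln r), where r is an assumed retention rate per millennium. The sketch implements only this classical version; Starostin's corrections are not reproduced. The default r = 0.86, the value commonly quoted for the 100-word list, is an assumption rather than a figure from the studies discussed here.

```python
import math


def swadesh_lees_years(shared, retention=0.86):
    """Classical Swadesh-Lees time depth in years: t = ln(C) / (2 ln r) millennia.

    shared    -- fraction C of shared cognates, 0 < C <= 1
    retention -- assumed retention rate r per millennium, 0 < r < 1
    """
    if not 0 < shared <= 1:
        raise ValueError("shared must be in (0, 1]")
    if not 0 < retention < 1:
        raise ValueError("retention must be in (0, 1)")
    # Both logarithms are negative (or zero for C = 1), so the result is >= 0.
    return 1000 * math.log(shared) / (2 * math.log(retention))


print(round(swadesh_lees_years(0.70)))  # about 1182 years for 70% shared cognates
```

    Because the result depends directly on the assumed r and on a constant rate of replacement, small changes in either shift the dates noticeably. This is one reason such dates call for caution.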
    It should also be noted that the shorter 110-word lists give lower statistical robustness than the larger 215-word lists used in the current series of publications. However, Dybo's work has the advantage of covering a greater set of languages, especially those of the Altay-Sayan habitat, which are usually underrepresented in other studies.
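    The robustness point can be quantified roughly. Suppose each list item is treated as an independent yes/no trial. This is a simplification, since list items are neither independent nor equally stable. Under that assumption, the standard error of a cognate share p measured on n items is sqrt(p(1 - p) / n). The sketch compares a 110-item list with a 215-item list at an illustrative p = 0.70.

```python
import math


def cognate_share_se(p, n):
    """Binomial standard error of a cognate share p measured on an n-item list."""
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1 - p) / n)


for n in (110, 215):
    print(n, f"{cognate_share_se(0.70, n):.3f}")
# 110 -> 0.044, 215 -> 0.031
```

    Roughly doubling the list length shrinks the error by a factor of about the square root of two. This is the sense in which the longer lists are more robust.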

    ASJP (2009)

    One example of phonostatistical research that merits mention is a preliminary (simplified), first-approximation phonostatistical dendrogram of the Turkic languages. It was produced by the Automated Similarity Judgment Program (04.2009) as part of its coverage of most languages of the world, and was based on a simple 40-word list. Many branches seem to be misplaced, apparently due to certain limitations of ASJP's early approach. Nevertheless, one can see the early separation of Proto-Chuvash, then Proto-Oghuz, and then the other languages, which agrees with the conclusions of the present and other studies.

    Darkstar (2009, 2012)

    In a similar fashion, the author of the present publication decided to use the readily available 200-word Swadesh lists from Wiktionary.org for an independent lexicostatistical analysis. The work proceeded in several stages:
    (1) a great deal of checking and correcting of the available material, and building his own lists for certain languages, such as Khakas, Tuvan and Altai (2009);
    (2) writing a PHP program to perform the routine calculations;
    (3) further verification and the addition of new lexical material, which expanded the lists to 215 entries (2012).
    Another lexicostatistical study, The Lexicostatistics and Glottochronology of the Turkic Languages, was finally produced. It should be noted that the figures obtained in 2009 and 2012 sometimes differed significantly because of the different approaches used to account for synonyms. The 2009 approach was much too basic and was significantly updated in 2011-12, by reworking the original lists and changing the program. The later version should therefore be considered more accurate.
    Most borrowings (Persian, Arabic, Mongolian, Russian, etc.) were excluded wherever possible, so only verified cognates were counted in the final glottochronological section of the study. In doubtful cases, cognation was determined according to [Etymologicheskij slovar chuvashskogo jazyka (The Etymological Dictionary of Chuvash), by M. Fedotov; vols. 1-2, Cheboksary (1996)] and [Etymologicheskij slovar tyurkskikh jazykov (The Etymological Dictionary of the Turkic Languages), E. V. Sevortyan, vols. 1-7, Moscow (1974-2003)]. The lexical lists now differ from the Wiktionary.org materials and are available online as a Word document.
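    The core counting step of such a study can be sketched as follows. This is a hypothetical illustration, not the author's PHP program. Each list maps a meaning to an editor-assigned cognate-class label, and meanings flagged as borrowings are dropped from both lists. The percentage is then taken over the remaining comparable meanings. The toy data use a few Turkish and Uzbek forms from the table in section 2.2, with Uzbek ovoz "voice" treated as a Persian loan.

```python
def cognate_percentage(list_a, list_b, borrowed=frozenset()):
    """Share (in %) of comparable meanings where both lists have the same cognate class.

    list_a, list_b -- dict: meaning -> cognate-class label (None if no entry)
    borrowed       -- meanings excluded because a form is a recognized loanword
    """
    compared = [
        meaning for meaning in list_a
        if meaning in list_b
        and meaning not in borrowed
        and list_a[meaning] is not None
        and list_b[meaning] is not None
    ]
    if not compared:
        raise ValueError("no comparable meanings")
    shared = sum(1 for meaning in compared if list_a[meaning] == list_b[meaning])
    return 100 * shared / len(compared)


# Toy lists; the labels are editorial cognate classes, not raw word forms.
turkish = {"night": "geche", "nose": "burun", "many": "chok", "fat": "yag", "voice": "ses"}
uzbek = {"night": "tün", "nose": "burun", "many": "köp", "fat": "yag", "voice": "avaz"}

print(cognate_percentage(turkish, uzbek, borrowed={"voice"}))  # 50.0
```

    Excluding the loan changes the denominator as well as the numerator. Without the exclusion the same toy pair scores 40.0, which shows how much borrowing decisions can move the figures.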

    The Lexicostatistical Matrix of Turkic Languages, Swadesh-215 (02.2012), borrowings excluded
    (languages: Chuvash, Sakha, Tuvan, Khakas, Standard Altay, Kyrgyz, Kazakh, Uzbek, Uyghur, Karachay, Bashkir, Tatar, Turkmen, Azeri)

    A purely lexicostatistical dendrogram of the Turkic languages is not built at this point, since an accurate analysis should include phonological, grammatical, historical and other non-lexical evidence. However, the values from the lexicostatistical study can be used to build a wave model of the Turkic languages that reflects the calculated relationships directly. The wave model is based on the borrowings-included matrix, because it is meant to represent mutual intelligibility as it is, without any additional exclusions. Hence there is some discrepancy with the table above.

    The lexicostatistical wave model of the Turkic languages (2012)

    The wave model of the Turkic Languages with borrowings included, [The Lexicostatistics and Glottochronology of the Turkic Languages, Darkstar (2009-2012)]
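    One simple way to turn a pairwise matrix into a network-style (wave-like) picture is to keep only the links above a chosen threshold and draw them with a strength proportional to the value. The sketch below assumes that approach purely for illustration; it is not necessarily how the figure above was produced. The languages A, B, C and their percentages are made-up placeholders.

```python
def wave_links(similarity, threshold):
    """Return (lang1, lang2, value) links at or above threshold, strongest first.

    similarity -- dict mapping a (lang1, lang2) pair to a percentage
    """
    links = [(a, b, value) for (a, b), value in similarity.items() if value >= threshold]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Placeholder values, not data from the study.
toy = {("A", "B"): 80.0, ("B", "C"): 65.0, ("A", "C"): 50.0}
for a, b, value in wave_links(toy, threshold=60.0):
    print(f"{a} - {b}: {value}%")
```

    Unlike a tree, such a network can show a language linked strongly to two neighbors at once, which is what a wave model is meant to capture.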

    2.2 Dissimilar Basic Lexemes in the Turkic Languages

    Another study in this article dates back to 2009. It includes a visual overview of certain basic lexemes that are known to differ within the core Turkic languages. These lexemes help to detect dissimilarities between otherwise closely related groups and to identify large supertaxa.

    Dissimilar Basic Words in the Turkic Languages
    Red is a more ancient layer connected with the eastern Siberian languages, brown marks the Oghuz-Turkmen innovations; blue is a more recent layer probably connected with the spread of the Gökturks; green marks probable "Central" innovations; orange marks the Yenisei Kyrgyz (Tuvan + Khakas + Altai) innovations; purple marks the Yakutic innovations or otherwise differentiated Yakutic words; gray and black are "other" or unclassified. Borrowings are included.

    Turkmen Uzbek
    Kazan Tatar

    not (adj, nouns)Tk. deGil;
    Az. deyil
    dälUz. emas;
    Uy. emes
    Kh. ermes
    KT. tügel;
    KB. tüyse
    emes emesKh. nimes; chox
    Al. emes; d'ok
    eves; chok suox
    hereTk. burada;
    Az. burada < *bu ara-da
    shu tayda;
    Uz. buyerda;
    Uy. buyerde; mana
    K. munda
    KT. monda, bireda;
    KB. mïnda, blaida
    mûndamïndaKh. mïnda
    Al. mïnda
    thereTk. orada;
    Az. orada
    < *o ara-da
    o tayda;
    ol yerde
    Uz. uyerda;
    Uy. uyerde;
    KT. anda, shul zherde;
    KB. anda, alaida
    ondaandaKh. anda
    Al. anda
    howTk. nasïl;
    Az. nechê
    nähiliUz. qanday
    Uy. qandaq
    KT. nichek;
    KB. qalay
    qalayqandayKh. xaidi
    Al. kandïy
    manyTk. chok;
    Az. chox
    köpUz. kûp
    Uy. köp
    Kh. talim; kûp
    KT. küp
    köp köpKh. köp
    Al. köp
    xöyelbex, ügüs
    wideTk. genish;
    Az. genish
    giNish; giNUz. keN
    Uy. keN
    Kh. keN
    KT. kiN
    keN keNKh. chalbax
    Al. d'albak
    kalbak, chalbak kieN
    forestTk. orman;
    Az. orman
    tokayUz. ûrmon
    Uy. ormanliq
    KT. urman
    Kh. agas;
    Al. arka
    arga, arïgtïa
    rootTk. kök;
    Az. kök
    kökUz. ildiz
    Uy. iltiz
    Kh. yildiz
    KT. tamïr
    tamïr tamïrKh. tazïl; chilige
    Al. tazïl
    bark (n) Tk. kabuk;
    Az. qabïq
    gabïkUz. qobuq
    Uy. qovzaq
    KT. kabïk
    qabïq qabïqKh. xabïx
    Al. chobra
    flowerTk. gül "rose"; chichek
    Az. gül; chichêk
    gül Uz. gül; chichak
    Uy. gül; chichek
    Kh. chichek
    KT. göl; chêchêk
    gül; gokka
    gülKh. chaxayax
    Al. chechek
    fat (n) Tk. yaG;
    Az. yaG;
    yaGUz. yoG; may
    Uy. yaG; may
    KT. may;
    maymayKh. üs, zhaG
    Al. üs
    üs, chaGsïa
    nose Tk. burun;
    Az. burun
    burunUz. burun;
    Uy. burun
    KT. borïn;
    mûrïnmurunKh. purun, tumzux;
    Al. tumchuq
    handTk. el;
    Az. êl
    el Uz. qûl
    Uy. qol
    Kh. elig
    KT. kul;
    qolqolKh. xol
    xol ili:
    liverTk. (kara) chiGer
    Az. chiyer
    bagïrUz. zhigar; baGir;
    Uy. jiger; beGir
    Kh. baGir
    KT. bawïr;
    bawïr boorKh. paar
    Al. buur
    thinkTk. düshün-
    Az. düshün-
    öyt-Uz. ûyla-;
    Uy. oyli-
    KT. uyla-;
    oyl-oyl-Kh. sagïn-
    Al. sanan
    liveTk. yasha-
    Az. yasha-
    yasha-Uz. yasha-;
    Uy. yashi-
    KT. yashê-;
    zhas-zhash-Kh. churt-
    Al. d'ür-
    churtt-olor; sïrït
    sayTk. de-
    Az. de-
    diy Uz. ayt-; de-
    Uy. eyt-; de-
    Kh. ay-; de-
    KT. êyt-
    ait-; de- ait-; desh Kh.cho:xt-
    Al. ayt-
    chug-; t.e:- die, et
    skyTk. gök
    Az. göy
    gökUz. kûk; asman
    Uy. kök; asman
    KT. kük
    kök (rare); aspankök (rare); asman Kh. tigir
    Al. teNeri
    t.e:r xalla:n
    burn (intr.)Tk. yan-
    Az. yan-
    öt-; yan- Uz. yon-
    Uy. yan-; köy-
    KT. yan-
    zhan-köy-; zhan-Kh. köy-
    Al. küy-
    nightTk. geche
    Az. geche
    gije Uz. tün
    Uy. tün
    Kh. tün; kecha
    KT. tün
    tüntünKh. tün
    Al. tün
    yesterdayTk. dün
    Az. dünên
    düynUz. kech
    Uy. tünügün
    KT. kichê
    keshekecheKh. kiche
    Al. keche
    eveningTk. aksham
    Az. axsham
    agsham Uz. okshom; kecha
    Uy. axsham; keche;
    Kh. axsham
    KT. kich
    Al. engir
    bigTk. büyük
    Az. böyük
    ulï; chishik Uz. büyük; katta
    Uy. büyük; yoGan,zor;chong
    Kh. uluG
    KT. zur
    chongKh. ulug;
    Al. d'a:n
    childTk. choJuk
    Az. ushaq, chaga
    chaga Uz. bola;
    Uy. bala
    KT. bala; sabii
    KB. sabii
    balabalaKh. pala;
    Al. bala
    faceTk. yüz;
    Az. üz
    yüzUz. yuz
    Uy. yüz
    KT. bit; yöz;
    KB. bet

    betKh. sïray;
    Al. d'üs; chïray
    islandTk. ada;
    Az. ada
    ada Uz. orol;
    Uy. aral;
    Kh. utruG
    KT. utrau;
    KB. ayrïmkan
    aralaralKh. oltïrïx;
    Al. ortolïk
    owlTk. baykush
    Az. baykush
    baygushUz. boygoli;
    Uy. baykux
    KT. yabalaq; ökö (dial.)
    KB. uku
    üki üküKh. tasxa;
    Al. mechirtke
    tomorrow
    Tk. yarïn
    Az. sabah
    ertir
    Uz. ertaga
    Uy. ete
    KT. irtêgê;
    KB. tambla
    erteN erteN
    Kh. taNda;
    Al. erten
    erten; t.a:rta sarsïn
    voice
    Tk. ses
    Az. sês
    ses
    Uz. ovoz
    Uy. awaz
    KT. tavïsh, avaz
    KB. auaz
    dawïs ün
    Kh. ün
    Al. ün
    ün kuolas, saNa
    wet
    Tk. yash
    Az. yash
    öl
    Uz. ho'l
    Uy. höl
    KT. yuesh, dïmlï
    KB. m
    ïlï, Jibigen
    nïm, nïmdu:
    Kh. öl
    Al. ülüsh, chïqtu
    öl, mö:n, shal incheGey, u:la:x, si:kte:x

    2.3 The comparison of phonological and grammatical features

    Mudrak (2002, 2009)
    The multivolume Moscow edition Sravnitel'no-istoricheskaya grammatika tyurkskikh yazykov. Regional'nye rekonstruktsii ("The Comparative-Historical Grammar of Turkic Languages. Regional Reconstructions") (2002) included an abridged article by Oleg Mudrak, Ob utochnenii klassifikatsii tyurkskikh yazykov s pomoshch'yu morfologicheskoy lingvostatistiki (On the clarification of the Turkic languages classification by means of morphological linguostatistics). It was subsequently republished in full detail as a separate book (2009; only 100 copies printed), and then briefly reviewed in a public lecture on the history of the Turkic languages. The study applies a distinctive morphostatistical analysis to salient grammatical and phonological features counted across as many as 42 Turkic languages and major dialects, and builds trees with glottochronological dates (based, again, on Starostin's formulas) that are checked for historical consistency. This purely morphostatistical approach is extremely interesting and apparently entirely novel in historical linguistics. The resulting dendrograms agree with those of the present study by roughly 80%, though they differ in certain respects.

    Darkstar (2009)
    Mudrak's purely grammatical approach prompted the author of this publication to take a closer look at grammatical features, which are known to be more resistant to borrowing than common words. As a result, a study of phono-morphological differences within the Turkic languages was conducted. The following table lists certain phonological and grammatical features known to vary across the Turkic languages; studying them should help establish the order of their taxonomic diversification.
    It should be acknowledged that the earlier analysis [Mudrak (2009)] is apparently more detailed (particularly in the number of languages covered); however, many additional grammatical and phonological characteristics not listed in the table below are described in the paragraphs dealing with specific Turkic languages.
    The morphological and phonological evidence has mostly been collected from the encyclopedic edition [Jazyki mira: Tyurkskije jazyki (The Languages of the World: The Turkic Languages); editorial board: E. Tenishev, E. Potselujevskij, I. Kormushin, A. Kibrik, et al.; The Russian Academy of Sciences (1996)], a detailed, authoritative publication consisting of articles by individual authors, each giving a brief phonetic and grammatical description of one language, as well as from grammar books on specific languages.
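    As a rough illustration of how a feature table like the one below can be turned into pairwise distances, here is a minimal Python sketch. It assumes a simple mismatch proportion (the share of compared features whose values differ); this is neither Mudrak's morphostatistical method nor the exact procedure of the present study, and the three features and their values are a toy selection taken from this article (reflex of initial *y-, r/z correspondence, plural marker).

```python
from itertools import combinations

# Toy feature table (illustrative only): three contrasts mentioned in this
# article. A real comparison would use many more features and languages.
FEATURES = {
    "Chuvash": {"initial_y": "s'", "r_or_z": "r", "plural": "-sem"},
    "Turkish": {"initial_y": "y", "r_or_z": "z", "plural": "-lar type"},
    "Kazakh": {"initial_y": "J", "r_or_z": "z", "plural": "-lar type"},
}


def feature_distance(a: dict, b: dict) -> float:
    """Share of features (recorded for both languages) whose values differ."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no features in common")
    mismatches = sum(1 for f in shared if a[f] != b[f])
    return mismatches / len(shared)


def distance_table(table: dict) -> dict:
    """Pairwise distances for every unordered pair of languages."""
    return {
        (x, y): feature_distance(table[x], table[y])
        for x, y in combinations(sorted(table), 2)
    }


if __name__ == "__main__":
    for pair, d in distance_table(FEATURES).items():
        print(pair, round(d, 2))
```

    With only three features the numbers carry no taxonomic weight; the point is merely that once each language is coded feature by feature, the resulting distance matrix can be fed into any standard clustering procedure.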

    Some of the phonological and morphological differences within the Turkic languages
    The table may contain simplifications in transcribing vowel harmony

    Negation of adjectives, nouns
    "We did"
    "We do"
    "I do"
    Use of
    tur- or
    any other copula
    no one,
    Chuvash
    s'--v--r-p-, t-,
    -, x-
    -pa, -pe
    Goal-directed
    -sem-a, -e mar-r-âmâr,
    -at-, -et-
    ta-kam; tashta;
    nikam ta; nishta ta
    Sakha
    s--0:--t-b-, t-,
    k-, k-

    -lar, -ler, -lor, -lör, -nar, -ner,
    -dar, -der,
    -tar, etc
    -ga-bit, -bït suox;
    -bït/bit, -pït/pit -bïn/bin, -pïn/pin
    verb-an + tur + pronoun = past tense -ïah-;
    -a:ya- /
    -eye-i = optative
    kim ere,
    xanna ere,
    kim da + negative,
    xanna da + negat
    Tuvan
    ch--0:--d-weak semivoiced
    : strong unvoiced:
    *q > x
    -düve,-tïva, etc
    -lar, -ler, -nar, -ner, -tar, -ter, -dar, -der
    -gan, etc eves; chok-dï-vïs-vïs, -vis -vüs, -vus
    men
    verb + p + tur (chïdïr, olur) + pronoun = Present -ïr-;
    qai/kei = optative
    bir-(le) kizhi;
    bir-(le) cherde;

    kïm-da: + negative;
    kaida-da: + negative

    Tofalar
    ch--0:--d-weak semivoiced
    : strong unvoiced
    -da, -de,
    -ta, -te

    -lar, -ler, -nar, -ner, -tar, -ter-Ga/Ge,
    -qan/ qen
    emes-dï-vïs-bis
    men
    verb + p + turu (chïêtïrï, oluru) + pronoun = Present tense -ar/er/ïr/ir-;
    qai/kei = optative
    qum-ta: + negat.
    -0:--z-p-, t-,
    k-, x-
    -za, -zer,
    -sar, -ser,
    -nzar, -nzer

    -lar, -ler,
    -nar, -ner,
    -tar, -ter
    -xa/ke, -na/ne, -a/e
    nimes; chox-dï-bïs-bïs/bis
    -ïm, -am
    verb + (p) + tur + pronoun = Audative or Archaic past;
    qai/kei = optative
    kem-de + negat.
    xayda-da + negat.
    -0:--y-b/p-, t-,
    -za, -ze,
    -sa, -se

    -lar, -ler,
    -nar, -ner,
    -dar, -der,
    -tar, -ter
    -ga, -ge, -ka, -ke
    -a, -e,
    -gan, -gen, -kan, -ken
    eves, emes;
    chok, chox
    -di-bis, -dï-vïs
    -bïs, -bis,
    -pïs, -pis
    -ïm, -am
    verb + ïp + tur + pronoun = Audative past;
    verb + a/e + tur + -ar + pers ending = Present Future;
    -ad, -ed
    qai/kei = Optative
    d'--0:--y-b-, t-,
    k-, q-
    -lar, -ler, -lor, -lör,
    -dar, -der,
    -dor, -dör,
    -tar, -ter
    -tor, -tör
    -ga, -ge,
    -go, -gö, etc
    -gan/gên, -kan/kên
    emes; d'ok
    -(ï)bïs/(i)bis,
    ïs/is, -ïk/ik
    verb + dïr + pers ending = audative past;
    verb + a/e + dïr + pers ending = Present Continuous;
    verb + ïp/ip + tur + d + pers ending = Past Continuous;
    qai/kei = Optative
    Kyrgyz
    J--0:--y-b-, t-,
    k-, q-
    -lar, -ler, -lor, -lör,
    -dar, -der,
    -dor, -dör,
    -tar, -ter
    -tor, -tör
    -ga, -ge, -go, -gö, -ka, etc
    -gan-emes-dik, etc -(ï)bïz-mïn
    verb + ïptïr = audative past;
    verb + ïp + tur (otur, Jat, Jur) + pronoun = Present Continuous;
    qai/kei = Optative
    (kimdir) birö:,
    kayda-dïr (bir Jerde);

    ech kim;
    ech kaida, ech Jerde

    siz (polite)
    Kazakh
    J-, zh--w--y-b-, t-,
    k-, q-
    -men, -pen-lar, -ler,
    -dar, -der,
    -tar, -ter
    -Ga, -ge,
    -qa, -qe
    -Gan, -Gen
    -qan, -qen
    emes-dïq, -dik-mïz, -miz-bïn/bin
    verb + ïp + tûr (otur, Jatïrt, Jür) + pronoun = Present Continuous; -ar/er/r;
    êlde-bireu, êldekim
    bir Jerde
    esh kim;
    esh kaida, esh Jerde
    Uzbek
    y--G--y-b-, t-,
    k-, q-
    -lar-ga-gan, -qan,
    emas-dik; -dimiz
    -(i)miz-man
    verb + ïp + tûr (ûtir, yot, yür) + pronoun = Present Continuous; -a-, -y-;
    allakim, kimdir
    hech kim;
    hech qayerda;
    Uyghur
    y--G--y-b-, t-,
    k-, q-
    -lar, -lêr-gê, -qa, -ka, -kê, -qê-Gan êmês-duk, -tuq -(i)miz-mên
    verb + ïp + tur (oltur, yat, yür) + pronoun = Present Continuous; -i--;
    kimdu, biri
    qaysi, hech kim;
    hech yerde;
    siz (polite)
    Chagatai
    y--G--y-b-, t-,
    k-, q-
    -lAr-Ga, -gä,
    -qa, -kä
    -Gan, -Gän
    -mïsh- (rare)
    e(r)mäs, yoq -dïq (or similar) -(i)bïz-men
    noun + dur(ur);
    verb + -A + dur-pronoun;
    verb + Yp + -dur;

    Baraba
    y- -y-b-, t-,
    k-, q-
    -lar, -nar, -tar-qa-Gan tügil-dïq, etc -bïs, -mïn,
    verb + ïp + tur (otïr, yat) + pronoun = Present Continuous (rare); -ïr;
    siz (polite)
    Karachay
    J-, ch--w--y-b-, t-,
    k-, q-
    -la, -lê-ga/-xa/ -ge, -na/ -ne, -a/e-Gan/gen tüyül-diq, -duk, -dük, etc-bïz, -biz, etc-ma, -me
    verb + a/e + tur + pronouns = Present Continuous; -ïr;

    kim ese da,
    qaida ese da,
    Tatar
    y-, Ji-, Je--w--y-b-, t-,
    k-, q-
    -day, -tay,
    -dêy, -dïy,
    -dagï, -tagï,
    -lar, -lêr, -nar, -nêr-ga, -gê, -ka, -kê; -na/nê, -a/ê-gan, -kên tügel;
    participle + pers. ending + yuk
    -dïk, etc -bïz, etc -m(ïn)
    noun (3rd pers) + -dYr, -tYr-ïr;
    kemder; kaidadïr;
    ber kem (dê), hichkem;
    (ber) kaida da
    hich ber Jirdê;
    Cuman-Polovtsian
    -y-b-, t-,
    k-, q-
    -lar, -ler-Ga, -ge, -qa, -ke; -a, -ê-mYsh- -bïz-man,
    noun (3rd pers) + -dYr, -tYr-Gai/-gei,
    Turkmen
    y--G--y-b-, d-,
    g-, G-

    -lar, -ler-a, -ä, -e;
    -na, -ne
    Used only as audative particle
    participle + pers. ending + -ok
    verb + ïp + dur (otïr, yat) + pronoun = Present Continuous;
    verb + ïp + tïr + pronoun = Past Audative;
    verb, noun (3rd pers) + -dYr, -tYr
    -ar, -ïr;
    -Jak, -Jek (no endings)
    Azeri
    y--G--y-b-, d-,
    g-, G-
    -lar, -ler-a, -ê-mYsh-
    Used as audative particle and perfect tense
    verb, noun (3rd pers) + -dYr, -tYr-(y)acak(G-,
    hech kim
    siz
    Turkish
    y--G--y-b-, d-,
    g-, G-
    -lar, -ler-(y)a, -(y)e -mYsh-
    Used as audative particle and perfect tense
    deil, de(G)il-dYk-Yz-ïm,
    verb, noun (3rd pers) + -dYr, -tYr-ar, -ïr;
    bir shey;
    hich kimse,
    hich bir shey
    Khalaj
    y--G--d-b-, t-,
    k-, q-
    -lar-ka, -qa, -yä-mYsh-daG-dimiz,
    -dYk < Azeri
    -uq < Azeri
    -Vm är (conjugated
    -(ï)Ga siz
    Karakhanid
    y--G--ð-b-, t-,
    k-, q-
    -ïn, -in, -un, -ün, -nïn,-nin
    -lar, -lär-qa, -kê,
    -Ga, -gê
    -a, -ê,
    -Garu, -gerü
    -mïsh-, -mish;
    -gen-, -qan,
    -biz, -miz ol (3rd pers. copula)-Gay, -gey, -qay, -kêy siz
    Khorezmian
    y- b-, t-,
    k-, q-
    -n, -ïn, -in, -un, -ün, -an, -än -lar -qa, -kä, -a, -ä
    ärmäz, ärmäs;
    däGül, dügül (rare);
    -duq, -dïq-biz-mäner-;
    -b turur = perfect past;
    -a turur = repetitive present

    -Gay, -gäy, -qay, -käy, -Ga, -gä, -qa, -kä (siz)
    Old Uyghur (Kojo)
    y---ð-,
    b-, t-,
    k-, q-
    -ïn, -in, -un, -ün, -nïn,-nin
    -lar, -lär-qa, -kä,
    -Ga, -gê, -Na, Nä;
    -Garu, -gärü
    -biz, -miz, -bïz -mïz
    -män ärür (copula) -Gay, -gäy
    -tachï, -dachï
    Old Turkic
    -ð-b-, t-,
    k-, q-
    -ïn, -inEquative
    -lar, -lär
    -qa, -gä,
    -ya, -yä;
    -Garu, -gärü
    -mïsh-, -mish;
    –; jok -timiz,
    -biz-mäner--tachï, -dachï siz
    weak semivoiced
    : strong unvoiced
    -lar, -lär, -ner-Ga, -ge,
    -qa, -ke,
    -a, -e
    -Gan, -gen;
    emes-tïr, emes-ar,
    noun + dïr (idïr-, oN; irar); adj + dïr (idïr + oN; irar);
    verb + p + o(r) + (tur) = Present I;
    verb + qu(r) +
    ( tur) = Future I;

    verb + q/Gan + dïr = Past II;


    Yugur y-
    -G--d-weak semivoiced
    : strong unvoiced
    -daG, -deg,
    -lar, -ler, -nar, -ner,
    -dar, -der,
    -tar, -ter
    -Ga, -ge,
    -qa, -qe
    i:re = copula;
    verb + Gan + tïr = Present Tense;
    verb + qïsh + tro = Future;
    verb + Gan + tro = Past II;
    verb + ïp/ip + tro = Past III;


    -Gu, -gu, -Go, -go; -Gï, -ge, -kï, -ke

    3. Making Taxonomic Conclusions

    With all the lexical and grammatical material collected in the previous chapter, we can finally get down to the analysis of each Turkic branch and attempt to make taxonomic conclusions concerning the position of each language on the genealogical dendrogram.


    Chuvash, the only modern-day representative of Volga Bulgaric within the Bulgaric taxon, was definitively shown to be related to Turkic by Nicholas Poppe [Chuvashskij jazyk i jego otnoshenije k mongolskomu i tyurkskim jazykam (Chuvash and its relatedness to Mongolian and the Turkic languages), Nicholas Poppe (1924)]. Poppe established regular phonological correspondences between Chuvash and other Turkic languages. He also lists many influential turkologists (Witsen (1692), Adelung (1820), Rask (1834), Ramstedt (1922-23)) who had recognized and accepted the Turkic origins of Chuvash before him. Moreover, according to Alexander Samoylovich, Poppe had shown that "the Chuvash and Bulgaric languages do not stem from 'Proto-Turkish' (z-group), but rather from the common progenitor of both of these separate groupings" [Alexander Samoylovich, K voprosu o klassifikatsiji turetskikh jazykov (Towards the question of the classification of Turkish languages), the Bulletin of the 1st Turkological Congress of the Soviet Union (1926); reprinted in the collection of his works (2005)].

    This position in turkology has changed little since then. For this reason, Chuvash is not examined here in much detail: its early separation is evident and uncontroversial among scholars. The only innovation suggested in the present study is the term Bulgaro-Turkic, used instead of just Turkic for the two major groupings taken together. This terminological change arises from the practical need to avoid periphrastic expressions like "the Turkic languages outside Chuvash" or "the Proto-Turkic homeland excluding Proto-Bulgaric".

    Some of the unique Bulgaric features

    Bulgaric phonology

    (1) The famous Bulgaric rhotacism vs. the Turkic Proper zetacism, i.e. the persistent use of -r- where other Turkic languages normally have -z- (though -r- can also be found in certain positions in Turkic Proper);
    (2) Chuvash -l- vs. Turkic Proper -sh-.

    Put simply, the large phonological gap between Chuvash and any other Turkic language is easy to observe by comparing almost any word, such as the numerals 1-10, with its Turkic Proper equivalent.

    Bulgaric grammar

    (1) the peculiar plural marker -sem in Chuvash (of seemingly unknown origin), absent not only in Turkic but apparently also in other Altaic languages;
    (2) the peculiar goal-directed case in Chuvash expressed by -shan, -shen;
    (3) contracted grammatical forms and rather simplified grammar in Chuvash (typical of contact or "creolized" languages)

    Bulgaric lexis

    The lexical difference between Chuvash and any other Turkic language averages 54.5% (Swadesh-215, borrowings excluded). That is roughly equivalent to the difference between English and any other Germanic language. A similar conclusion was reached by Talat Tekin [Talat Tekin, Türk Dilleri Ailesi (The Turkic Language Family) // Genel Dilbilim Dergisi, Vol. 2, pp. 7-8, Ankara (1979)], who compared the difference between Chuvash and Turkish to that between English and German: although the latter two formally belong to the same Germanic group and share a number of common words, they are far from mutually intelligible.
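    The percentage itself is a simple proportion over the compared meanings. A minimal Python sketch, assuming every word has already been assigned to a cognate class (the four-meaning coding below is an editor's toy taken from the Turkish and Uyghur rows of the comparison table, not the author's Swadesh-215 data), with meanings flagged as borrowed dropped from the count:

```python
# Toy cognate coding (hypothetical, for illustration only):
# equal numbers for the same meaning mean the two words are cognate.
TURKISH = {"night": 1, "big": 1, "child": 1, "face": 1}  # geche, büyük, choJuk, yüz
UYGHUR = {"night": 2, "big": 1, "child": 2, "face": 1}   # tün, büyük, bala, yüz


def lexical_difference(list_a: dict, list_b: dict,
                       borrowed: frozenset = frozenset()) -> float:
    """Percentage of compared meanings whose words are NOT cognate.

    Meanings listed in `borrowed` are excluded from the comparison,
    mirroring the 'borrowings excluded' convention used in the text.
    """
    meanings = (list_a.keys() & list_b.keys()) - borrowed
    if not meanings:
        raise ValueError("no meanings left to compare")
    different = sum(1 for m in meanings if list_a[m] != list_b[m])
    return 100.0 * different / len(meanings)


print(lexical_difference(TURKISH, UYGHUR))
print(lexical_difference(TURKISH, UYGHUR, frozenset({"night"})))
```

    Excluding even one meaning from a short list shifts the result noticeably, which is one reason why the treatment of borrowings and synonyms matters for the figures quoted in this article.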

    A considerable number of Tatar lexemes are present in Chuvash basic vocabulary. They are normally recognizable by their typically non-Bulgaric phonological shape and/or by the existence of a parallel native word, cf. yapâx "bad", yeshêl "green (about grass)", tinês "sea", chechek "flower", vârlâx "seed", kashkâr "wolf", kuyan "hare", utrav "island", yêbe "wet" (cf. Tatar jeben-, Bashkir yeben- "to get wet"), têrês "right, correct". Such common words as kus' "eye" and pus' "head" may in fact also be Tatar borrowings, given that they lack the r-ending, with something like *xêl and *pul being the likely reconstructions for Proto-Volga-Bulgaric. The abbreviated grammar and the large number of Tatar borrowings should be taken into consideration when drawing conclusions about the origins of Chuvash. However, this number is not much greater than the number of loanwords from neighboring languages in most other Turkic languages.

    Bulgaric glottochronology
    Glottochronologically, a lexicostatistical difference of 55% should roughly correspond to a separation date somewhere between 1100 and 900 BC. Note that this figure has been calculated with the local temporal calibration (neither the standard textbook constant nor Starostin's method; see again The Glottochronology of the Turkic languages). However, there is some uncertainty about this value, owing to the logarithmic and statistical nature of the glottochronological law, which makes it prone to errors for isolated languages. Indeed, the lack of any present-day siblings of Chuvash, which would allow a statistical comparison with other similar Bulgaric languages and cancel out statistical fluctuations, raises doubts about the robustness of this figure. As a result, a relatively small error, possibly due to the infiltration of Tatar borrowings, may produce a large discrepancy when logarithmically extrapolated into the past.
    At any rate, despite these doubts, the figure of about 54-55% is relatively stable, and nearly all the previous estimates performed between 2009 and 2012 (with borrowings excluded or included, with different ways of treating synonyms, etc.) have pointed to an early separation of Chuvash, at least as early as 500 BC, with 1000 BC being a more likely period. Archaeologically, this period (c. 800 BC) coincides with the onset of the Early Iron Age in West Siberia, so we may further attempt to support this date by tentatively assuming the active use of iron swords and horse harness during that period, which might have contributed to the separation.
    The relatively late dates in other parallel works [Dybo (2006), Mudrak (2009)] are most likely due to the application of Starostin's glottochronological formulas.
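    To see why the calibration and small counting errors matter so much, it helps to look at the classic Swadesh-Lees formula t = ln(c) / (2 ln r), where c is the proportion of shared cognates and r the retention rate per millennium. The Python sketch below uses the commonly quoted textbook constants (about 0.805 for a 200-word list, 0.86 for the 100-word list); it is explicitly not the local calibration used in this article, and the function name is the editor's own.

```python
import math

# Commonly quoted textbook retention rates per millennium
# (NOT the local calibration used in this article).
R_200 = 0.805  # Lees' figure for the ~200-word list
R_100 = 0.86   # Swadesh's figure for the 100-word list


def separation_millennia(shared: float, r: float = R_200) -> float:
    """Time depth in millennia for a proportion `shared` of cognates (0 < shared <= 1)."""
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared must be in (0, 1]")
    if not 0.0 < r < 1.0:
        raise ValueError("r must be in (0, 1)")
    return math.log(shared) / (2.0 * math.log(r))


# A 54.5% difference means about 45.5% shared; vary it by +/- 2 points
# to see how strongly the logarithm amplifies small counting errors.
for shared in (0.435, 0.455, 0.475):
    print(f"shared={shared:.3f}  "
          f"t(r=0.805)={separation_millennia(shared, R_200):.2f}  "
          f"t(r=0.86)={separation_millennia(shared, R_100):.2f} millennia")
```

    With either textbook constant, a 54.5% difference maps to a later date than the 1100-900 BC obtained with the article's own calibration, and a shift of two percentage points in the shared proportion moves the estimate by roughly a century or more.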

    Bulgaric history and geography

    Geographically, the rather unique European position of Chuvash, west of the Ural Mountains and a long way from the supposed Turkic homeland near the Altai Mountains (let alone Mongolia, as assumed in certain alternative Urheimat theories), is evident at first glance. This, again, indirectly corroborates the idea of early separation, because longer distances presumably correlate with longer migration time.

    By the 13th century, Volga Bulgaria must have extended approximately within a 200-km (120-mile) radius of the confluence of the Volga and the Kama. It was probably almost entirely destroyed during the Mongol invasion, forcing the Volga Bulgarians to take refuge in the forested area of the Volga's right (western) bank, situated within the same 120-mile circle. There, near the forests of Chuvashia, the impact of Mongol and Tatar raids must have been less pronounced. The refugium-type Chuvash settlements in a small area along the Sura (a tributary of the Volga) are very similar to those of the Mari in the forests and hills of the Volga's left and right banks. Unsurprisingly, both ethnic groups seem to share certain ethnological and linguistic features. Consequently, the Chuvash may be those Volga Bulgarians who survived the 13th-century invasion and later military and cultural interventions by confining themselves to the woodland of Chuvashia and ceding their former territory to the ancestors of the Kazan Tatars. The latter were clearly first attested near the Volga-Kama confluence by Ibn Fadlan, as "al-Bashkird", as early as 922.
    However, the role of the Kazan Tatars in the migrational seclusion of the Chuvash is obscure. One may reasonably assume that the Kazan Tatars did not necessarily occupy the Volga Bulgarian region by force as part of the Mongol army in the 1230s-40s; rather, their settlement in the area of present-day Tatarstan, though catalyzed by the Mongols, could have been the outcome of a long, slow migration and a linguistic assimilation of Volga Bulgaria extending over several centuries.
    It should also be noted that the Chuvash were first attested in historical sources only in 1508, and then in 1551, during the reign of Ivan the Terrible, around the time of the siege of Kazan. Their association with the Volga Bulgarians has mostly been the outcome of the historical and linguistic analysis of 19th-century turkologists (Kunik, Radlov, Ashmarin, etc.) [see the Brockhaus and Efron Encyclopedic Dictionary (1906)], but this conjecture is now considered well demonstrated.


    The discrepancy between Chuvash and the other Turkic languages is so pronounced, and its geographical position so detached from the area of maximum diversification of the other Turkic languages, that it is appropriate to place Chuvash in a special Bulgaric taxon within the larger Bulgaro-Turkic supertaxon. For most practical purposes, we may take about 1100-800 BC as a plausible period for the separation of Proto-Bulgaric from the rest of the Turkic languages.

    The Yakutic subgroup

    Where does Sakha actually belong?

    Since 19th-century research, it has been widely accepted that Sakha, the language of the Yakuts, is almost as distant from the other Turkic languages as Chuvash. Nevertheless, the matter is far from simple. It has occurred to many researchers that Sakha may actually be directly related to other Turkic languages of Siberia, such as Tuvan, Khakas or Altay. The alternative hypothesis, then, is a possible "Siberian" taxon that would include all the Turkic languages east of the Irtysh line. However, proving the existence of this taxon turns out to be a complicated turkological problem. At first glance, Sakha differs drastically not only from any other Turkic language, but also from its closest Siberian neighbors. In other respects, however, it seems to share with them certain linguistic features that are hard to distinguish from common archaisms. Below we study some of these shared or unique features in detail.

    Yakutic phonology

    In phonology, the Yakutic subgroup is characterized by the following local innovations not shared by any other branches:
    (1) The loss of the Proto-Turkic (probably aspirated) *sH, as in Old Turkic sekiz > Sakha aGïs "eight"; sen > Sakha en "you"; Old Turkic suNok [N=ng] > Sakha uNuok "bone" (not to be confused with the Proto-Turkic *S as in Turkish yer "place", yol "way", which is a different sound);
    (2) The stabilization of the strongly palatalized Proto-Turkic *S into an "ordinary" s-, cf. Chuvash s'altar but Sakha sulus "star";
    (3a) The transition of the intervocalic -s-, -z- into -h- as in Old Turkic qïzïl > Sakha kïhïl "red";
    (3b) the transition of -ch- into -X- as in bïXax "knife", as opposed to bïchaq in many other Turkic languages [Baskakov, 1969]. This aspiration is even more pronounced in Dolgan, the northernmost offshoot of Sakha, where s- > h- even word-initially.
    (4) The late development of several diphthongs, as in uon < *on "ten". "Late" because vowel systems are normally much less historically stable than consonant systems, so these diphthongs most likely belong to a relatively recent period.
    (5) Various assimilations and dissimilations, which point to a Proto-Yakutic substrate with strong lenition that made many original sounds unpronounceable and created a "hot-potato" effect, as in pahï:ba, borrowed from Russian /spasiba/ "thanks".
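    Changes (1) and (3a) above can be thought of as ordered rewrite rules applied to an older form. The Python sketch below is only a toy under explicit assumptions: the input is taken to be an Old Turkic-like form whose initial s- continues *sH (the code cannot tell *sH from *S, cf. change (2)), the vowel inventory is an assumed set for this article's transcription, and nothing else is modeled (so it yields qïhïl rather than the actual Sakha kïhïl, and it does not derive aGïs from sekiz).

```python
import re

# Assumed vowel letters of the transcription used in this article.
VOWELS = "aeiouïöüêâ"

# Ordered toy rules: (description, compiled pattern, replacement).
RULES = [
    ("(1) loss of word-initial *s-", re.compile(r"^s"), ""),
    ("(3a) intervocalic -s-, -z- > -h-",
     re.compile(rf"(?<=[{VOWELS}])[sz](?=[{VOWELS}])"), "h"),
]


def apply_rules(form: str) -> str:
    """Apply the toy rules in order; later rules see the output of earlier ones."""
    for _description, pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


for old in ("sen", "qïzïl"):
    print(old, "->", apply_rules(old))
```

    Keeping the rules in an ordered list makes rule ordering explicit, which is exactly where such derivations usually go wrong; a fuller model would need separate symbols for *sH and *S and many more rules.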

    Among notable archaisms, the following features can be listed:
    (1) The full retention of the archaic intervocalic -t- as in atax "foot", xatïN "birch", probably with some fortition, which is similar only to Tuvan -d/t- (where this phoneme is semivoiced) but quite unlike the Khakas -z-.
    (2) The probable retention of the so-called "primary" long vowels, as in sa:s "springtime", xa:r "snow", ti:s "tooth", which, among the other branches, are mostly found in Turkmen and Khalaj, and are believed to be possible remnants of the Proto-Turkic period.

    Yakutic grammar

    In grammar, Sakha exhibits more differences than similarities with most other Turkic languages, except Tuvan, Khakas and Altay, with which certain local Siberian similarities have been found.
    The following grammatical features seem to be unique:
    (1) Sakha does not seem to use a negative form similar to e(r)mes or deGil, which is common in other Turkic languages; instead, suox (after verbs in the future tense and after adjectives) and buol-ba-tax (after nouns) are used. The latter seems to be unique among the Turkic languages. Cf. men uchuta:l buol-ba-tax-pïn "I teacher being-not-am." (Note that buol- is an obvious Nostratic parallel to the English "be", which is present in all of the Bulgaro-Turkic languages.)
    (2) The loss of the genitive marker;
    (3) The usage of kini "he, she" and kini-ler "they" (along with the common Turkic ol "that (one)"), which probably finds parallels only in the Bulgaric ku "this, that". A hypothesis relating it to Turkish kendi, Karakhanid kendü "self" (going back at least to Ubryatova (1960s-80s), the famous researcher of Dolgan) runs into semantic difficulties, though it cannot be completely excluded;
    (4) The plural pronoun ehigi "you" with its unique phonological shape, so different both from the conventional siz and from seler;
    (5) The unusual comparative case with -ta:Gar, -da:Gar, -la:Gar, -na:Gar.

    The following grammatical features in nouns and pronouns seem to be shared with the Altay-Sayan subgroup:
    (1) The typical and persistent usage of expressions like kim-da, kaida-da + a positive verbal construction denoting indefinite pronouns as in "something does", "somewhere is" and kim-da, kaida-da + negative denoting negative pronouns as in "no one did", "nowhere does", etc.
    Cf. Sakha kim-da, hanna-da, Tuvan kïm-da, kaida-da, Tofa qum-ta, Khakas kem-de, xayda-da, Kumandy kem-de, kaida-da, Standard Altay kem-de, *kaida-da;
    However, this syntactic model is by no means unique to "Siberian", since similar models exist in Karachay kim ese da "someone", qaida ese da "somewhere", Tatar ber-kem (de), (ber) kaida da and probably elsewhere. In other western Turkic languages, these constructions have mostly been displaced by phrases with Persian words; therefore this feature is most likely a Proto-Turkic archaism, not a Siberian innovation;
    (2) The peculiar instrumental case ending in -nan, shared with the Khakas instrumental ending -naN, -neN. Nevertheless, this feature is evidently a retention, given that Karakhanid, Old Uyghur, Orkhon Old Turkic and Khorezmian all had a very similar instrumental case with the (n)ïn, (n)un, (n)an, (n)ün marker.

    Below is a brief summary of Sakha verbal morphology:

    Notable features of Sakha verbal morphology
    and their Turkic parallels

    Tense / Sakha / Parallels in other Turkic languages
    Imperative 2 bar-ïy "please go";  
    Imperative 3 bar-ar "go later";  
    Tense with -dïr- bar-dar-mïn "if I go"; Cf. Tofa bar-dïr-men "going-am" (Present Continuous), however with a different meaning (?)
    Tuvan aytïr-a-dïr-men "I'm just asking it";
    Khakas paz-a-dïr-zïN "you're writing";
    Altay men bara-dïr-ïm "I'm going";
    However, Karachay-Balkar and Turkmen dialects are also said to have similar expressions, which makes this grammatical expression a probable archaism.
    bar-a:ya-mïn "I think I'd better go (get out)"; Cf. Tofa al-Gay-men "I'd better take it" (Optative), with a slightly different connotation. A similar marker is also present in Tuvan, Khakas, Altay, Kyrgyz, the languages of the Great Steppe, Cuman-Polovtsian, Karakhanid, Old Uyghur, Khalaj and Yugur, which makes it non-Siberian.
    Probability with -tax bar-daG-ïN "you probably go";
    as-taG-ïm "I seem to open";
    The (-dïk-) suffix is present at least in Oghuz-Seljuk and Old Turkic and therefore cannot be Siberian-specific. It seems to be an archaic retention.
    Past, Negative with -tax bar-ba-tax "I have not gone"; Old Turkic (-maduq), but not in Siberian Turkic, apparently a retention, as well
    Sporadic necessity with -tax bar-ar-da:x-pïn "Once, I had to go"; Probably, unique to Yakutic
    Future with -ïax bar-ïaG-ïm
    "I will go", lit. "my going";
    May be akin to Tuvan bar-gash "having gone", churu-ash "having drawn". Also, al-gash baar "He will take", kir-gesh kelir "He will come". Apparently, a different usage of the same marker, so it could be Yakutic-Tuvan specific
    Necessity in the future with -ïax bar-ïah-ta:x-xïn
    "you will have to go";
    Probably, unique to Yakutic
    Subjunctive 1 with -ïax bar-ïax et-iN "if you go";
    Subjunctive 2 with -ïax bar-ïax e-bi-kkiN "it turns out that you would go" 
    Optative-Subjunctive with -ïax ah-ïax-pït ete  "(if) we were opening";
    ah-ïa suox eti-bit "(if) we weren't opening";
    Usual action with -chï bar-a:chchï-gïn "you normally go"; Probably, akin to -chi in Turkish and other Turkic when denoting professions and occupations, so literally meaning "you are a goer", therefore an archaism with local extra development.
    Positivity bar-ï:hï-gïn
    "you will evidently go";
    An archaism, it is also found in Turkish al-asï-yïm, Bashkir al-ahï-yïm
    Probability 2 bar-a:ini-bin "I will probably go"; 
    Unfinished action with -ilik bar-a ilik-kiN
    "you haven't gone yet";
    This construction also exists in Khakas (par-galax-sïn "you haven't gone yet"), Tuvan (-galak, -qalaq), Tofa (-halaq), Kyrgyz (-a elek) and possibly Uyghur (?). Also, cf. Tofa alïr iik sen "even if I take it". It is the only nearly certain Siberian isogrammeme, though, according to Shirokobokova (2005), it now seems to be rarely used in Khakas and Tuvan, absent in Todzin and archaic in Tofa.
    Past unfinished
    action ("used to")
    bar-ar et-im "I used to go"; Present in Oghuz, cf. Turkish varïr-dïm, therefore cannot be Siberian-specific; a typical retention
    Past Tense with-bït- bar-bït-ïm ba:r  lit. "my going there is";
     "I have gone";
    bar-bït etim "I had gone";
    A similar suffix (-mïsh-) is present in Old Turkic, Old Uyghur, Khorezmian, Karakhanid, Khalaj, Oghuz-Seljuk and Tuvan, e.g. Tuvan al-bïsha:n-men "I'm still getting", but not in the other Altay-Sayan languages; an archaic retention. On the other hand, the Great Steppe and Altay-Sayan -Gan past tense is mostly absent in Yakutic.
    Past finished
    action ("once I had to")
    bar-bït-ta:x-pïn  "I had to go once"; 
    Past, Result bar-an tur-a-bïn  lit. "Going, I stand", "I have gone";
    bar-an tur-ar-da:x-pïn  lit. "Going, I stand", "I have gone";
    Apparently, similar to the usage of (-Gan-) suffix in the languages of the Great Steppe and Altay-Sayan, however the syntactic structure herein is entirely different. Looks like a rather unique Yakutic development.
    As is evident from the table above, most of the allegedly "Siberian" shared features in verbal morphology are in fact old archaisms found in other branches. Among the features shared with Orkhon-Oghuz-Karakhanid, and even going back to Proto-Turkic, we could mention the following:
    (1) The use of -bït- / -mït- tenses, which are akin to the Old Turkic and Oghuz -mïsh- tenses. The latter are used only in Oghuz, Salar, Old Turkic, Karakhanid, Khalaj, Cuman-Polovtsian and Uzbek, but not in the Altay-Sayan languages (Tuvan being the exception noted in the table) or in most Great Steppe languages.
    Based on its phonetic similarity to buol- < Proto-Turkic *bol "to be" (and the lack of any other specific Yakutic-[Oghuz-Orkhon-Karakhanid] innovations), we can infer that this suffix is most likely an archaism going back to Proto-Turkic. Semantically, -bït- and -Gan- are in complementary distribution, so -Gan- has apparently displaced -bït- in Altay-Sayan and most Great Steppe languages, most likely due to the semantic similarity of the two tenses.

    (2) The use of -dax- / -tax- / -daG- / -taG- tenses, which are akin to the Old Turkic and Oghuz -dïG- / -tïG- masdar suffixes.
    (3) Cf. the usage of -er- instead of et-, it- as an auxiliary verb, as in oGo utuyan erer "The child is falling asleep" (similar to Khalaj and Old Uyghur), but barar etim "I used to go" (similar to Turkish and other modern languages). However, this apparently Orkhon-Karakhanid-Oghuz-related archaic feature is defined less clearly.

    On the other hand, there are a few unstable Siberian-specific tenses, which can be regarded as suspected Siberian innovations, such as:
    (1) The tense with -dïr- + personal ending, as in *bar-dïr-men "maybe I go, if I go", is actually very typical of the Altay-Sayan languages; however, similar forms have also been found in Turkmen dialects and are said to be "understandable" to Turkmen speakers, which may indicate their existence in Proto-Oghuz.

    (2) The tense with the -a ilik- construction exists not only in Altay-Sayan but also in Kyrgyz. However, it seems to have become extinct in most Altay-Sayan languages [Shirokobokova, N.N. Otnoshenije jakutskogo jazyka k tyurkskim jazykam Yuzhnoj Sibiri (The relatedness of the Yakut language to the Turkic languages of South Siberia), Novosibirsk (2005)], so at present it is only a shadow of what it might originally have been.
    (3) The use of the -Gay participle to express the optative mood, as in Sakha bar-a:ya-mïn and Altay-Sayan *bar-Gay-mïn "I'd better go", whereas in most other TL's this form seems to express the direct future. However, such a purely semantic feature is too unstable and could have developed naturally and independently in both Proto-Yakutic and Proto-Altay-Sayan.

    Most other verbal Yakutic constructions cannot be found in other Turkic languages, making Sakha verbal morphology rather unique.

    Borrowings and odd words in the Sakha vocabulary
    Sakha contains lots of words which make one wonder where they could possibly have come from.
    In fact, Sakha was described as a mixed tongue as early as Radlov (1908), who counted that, out of 1750 words in a glossary, about 33% were Turkic, 26% Mongolic, and the rest of unknown origin.
    Presently, we believe that all these borrowings come from three main sources: (1) Middle Mongolian (or Middle Buryat, pronunciation: /boo-RAHT/), as in most "Siberian" languages, with Buryat being particularly probable due to its geographical proximity to the Kurykans; (2) Russian, again as in most "Siberian" languages, with an exceedingly high number of loanwords in abstract and cultural vocabulary; (3) an unknown early substrate, most likely of Yeniseian type.
    Among Mongolic borrowings in the basic lexis, one could easily name the following words:
    (1) Khakas sïray, Altay chïray, Tuvan shïray, Sakha sirey "face" probably from Mongolic, cf. Middle Mongolian chiray, Buryat sharay. The same word also means "beauty" in Kyrgyz and Kazakh;
    (2) Altay mechirtke, Tuvan merzhergen, Sakha mekchirge "owl" from Mongolic *begchergen, Buryat begserge "barred owl";
    (3) Sakha bey-em, Tuvan bod-um, Khakas poz-ïm, Altay boy-ïm "self", which is probably akin to the Mongolian bod and biye "body", though this is not necessarily a loanword;
    (4) Tuvan iye, Sakha iye "mother", cf. Khalkha Mongolian ex "mother";
    (5) Sakha kharba: "to swim", cf. Khalkha Mongolian khayiba, khaiva;
    (6) Sakha khallan "sky", cf. Middle Mongolian e'ülen "cloud(s)";
    (7) Sakha moGoy "snake", cf. Middle Mongolian moqai, Khalkha mogoi;
    (8) Sakha mas "tree", cf. Khalkha mod, Middle Mongolian mod-un, Daur mo:d, etc., as well as Evenk mo:, Nanai mo:, Written Manchu mo:;
    (9) Sakha ergilin "to turn", cf. Khalkha ergeG "turn around";
    (10) Sakha suruy "to write", suruk "letter, mail", cf. Written Mongolian zhiru-, Buryat zura- "to draw".
    Russian words are often hard to recognize because they have been adapted to Sakha phonology, cf. the following examples from Swadesh-215: Sakha chierbe, Russian cherv' "worm"; Sakha sieme, Russian semya "seed"; Sakha ba:lkï, Russian palka "a stick"; Sakha bï:l, Russian pïl' "dust"; Sakha muora, Russian mor'e "sea". This degree of phonological adaptation implies that other borrowings and archaisms may also have become unrecognizable. For instance, the following Sakha words of Turkic origin are rather hard to spot at first glance:
    Sakha tïmnï "cold", akin to Karakhanid tum, tumlïG "cold";
    Sakha xaya "mountain" akin to kaya "rock" in most other TL's;
    Sakha ürüN "white", akin to Orkhon, Old Uyghur, Karakhanid ürüN, Khalaj hirin "white" (a rare archaism);
    Sakha buruo "smoke" akin to Old Turkic bur- "to boil, evaporate";

    The presumable Yeniseian borrowings are particularly interesting.
    Sakha "to fly", cf. Ket
    Sakha kötör "bird", cf. Ket keNassel
    Sakha kini "he, she, it", cf. Ket ki, kide [Note that kini is normally explained (probably following Ubryatova, 1960s-1980s) as being akin to the Seljuk kendi "self"; here, however, we consider a different perspective.]
    Sakha kuttan "to fear", cf. Ket koran, qoren', qoranai
    Sakha söp, söptö:x "right, correct", cf. Ket sotdas'
    Sakha sü:r "to flow", cf. Ket sennei

    It should be noted that Proto-Sakha could not have borrowed directly from Ket, the only living representative of the Yeniseian family, but rather from an unknown extinct Yeniseian language. In any case, these presumed cognates are uncertain and are provided here only as a tentative conjecture.
    Curiously, no clear-cut borrowings from Evenk(i) (Tungusic) were found: the resemblances that do exist may be coincidental, or, in some cases, the presumed loanwords were in fact borrowed the other way around, from Sakha into Evenk:
    Sakha mas "tree", cf. Evenk mo: (an Altaic root) (probably, from Tungusic to Sakha)
    Sakha seri: "war", cf. Evenk kusi:n, buleme:chik, cherig, serI: (probably, from Sakha into Evenk)
    Sakha örüs "river", cf. Evenk birag, ene, olus (dialectal), orus (dialectal) (apparently, from Sakha into Evenk).

    We might conclude that Evenk played no significant role in the formation of Sakha. This is not surprising, considering that Sakha acted as a cultural superstratum to Evenk, whereas Evenk, scattered over an enormous territory, was apparently losing ground to Sakha in the course of the 15th-20th centuries.

    Few lexical similarities between Sakha and Altay-Sayan
    With only 57% shared vocabulary with Tuvan, 61% with Khakas and 56% with Altay in Swadesh-215 (borrowings excluded), Sakha is undoubtedly a deep-diverging branch, strikingly different from any other Turkic language. This is because Sakha has many lexical innovations whose etymology is often hard to explain and which may in fact turn out to be borrowings from an unknown substrate. However, there seem to be a number of words common only to the "Siberian" languages (= Sakha, Khakas, Tuvan, Altay). Consequently, we should study these suspected examples, attempting to distinguish archaisms from innovations.

    (1) Khakas ïzïr-, Tuvan ïzïr-, Sakha ïtïr- "bite"; however, ïsïr- is also found in Turkish, Tatar, Karakhanid and possibly elsewhere, therefore it is an archaism;
    (2) Khakas chïz-, Tuvan chod-, Sakha sot- "to wipe"; however, it is akin to Chuvash sâtâr-, therefore it is an archaism;
    (3) Khakas köni, Tuvan xönü, Sakha könö "straight (as a road)", also Turkmen göni (the word is found in many TL's, but this particular meaning occurs only in Siberian Turkic, Altay dialects and Turkmen [see Sevortyan's dictionary, V-G-D letters (1980)]; in other TL's it has other semantic variations). Therefore, apparently, an archaism;
    (4) Khakas xarax, Tuvan karak, Sakha xarax "eye". However, *qaraq is also found in Kyrgyz, Old Uyghur and Karakhanid, which makes it a notable but hardly unique Siberian isolexeme. In the meaning "pupil", it is also found in Turkmen and Kyrgyz; the original etymology of this word is evidently "the black part of the eyeball, the pupil". Therefore, apparently, an archaism;
    (5) Altay sogon, Tofa, Tuvan, Chulym sogun, Khakas sogan, Sakha onoGos "arrow" is usually explained as a cultural borrowing from Samoyedic [Dybo (2007)];

    On the other hand, the following isolexemes seem to be innovative formations not found outside the supposed "Siberian" subtaxon:
    (1) Khakas churt-, Altay d'ür- (jurtaar), Tuvan churtt-, Sakha sïrït "to live"; obviously, from *jurt "home", "place of pasture", probably innovative, or at least an independent simultaneous semantic formation; note that Sakha includes an additional (prothetic?) vowel into the root;
    (2) Khakas chïzïG, Tuvan chïdïg, Sakha sïtïy-bït "rotten" as opposed to *chiriq in most other TL's, including Chuvash; apparently, from *J'it- "to get lost, die, fade";
    (3) Khakas irgi, Tuvan ergi, Sakha erge "old" as opposed to *eski in most other TL's;
    (4) Altay tük, Tuvan tük, Sakha tü: "wool" instead of the usual *Jün. The original meaning of this word was probably "fluff, fur". It could also be a coincidental, independent development;
    (5) Altay mösh, Tuvan pösh, Tofa bösh, Sakha bes "pine" [Rassadin (1981)];

    In any case, as we can see, the number of shared phono-semantic and lexical "Siberian" innovations appears to be exceedingly small: we have found only 4-5 words that are difficult to discard outright. It is highly questionable whether this amount is sufficient to demonstrate the hypothetical Sakha-Altay-Sayan ("Siberian Turkic") common descent.
    On the other hand, there exist certain words shared not just with Altay-Sayan but with the languages of the Great Steppe as well, that is, everywhere except Orkhon-Oghuz-Karakhanid and Chuvash, e.g.:
    (1) *but "leg" as opposed to Oghuz-Seljuk *but "thigh";
    (2) tün "night" as opposed to Oghuz-Seljuk *dün "yesterday", Chuvash s'er "night", ener "yesterday";
    (3) Sakha aha:, Khakas azraan, Tatar asharga, Bashkir ashau, Karachay asharGa "to eat", whereas in most other TL's the word ash is used only to mean "food (noun)";
    (4) Sakha xatïr-ïq, Khakas xastïr-ïx, Tatar qayrï, Bashkir qayïr "(tree) bark", also Tuvan qazïr-ïq "scales, a layer of dirt". Chuvash xuyâr "bark" seems to be a borrowing from Tatar.
    These findings are more interesting: they may suggest that Yakutic-Altay-Sayan-Great-Steppe once constituted a single unity, as opposed to Orkhon-Oghuz-Karakhanid, which formed a different Turkic branch.

    Unexpected similarities between Sakha and Tofa
    The similarities with Tofa are already evident from the fact that the two languages share a unique partial case in -ta/-da. This and other similar features were first identified by Rassadin in Morfologiya tofalarskogo yazyka v sravnitelnom osveschenii (The comparative morphology of Tofa) (1978). Other common grammatical features cited include (1) -ïn in the accusative; (2) the adjective ending -sïN gï (Sakha -sïN ï); (3) a similar system of onomatopoetic verbs.
    However, Tofa is undoubtedly much more similar to the Tuvan subtaxon than to Yakutic, so a direct genetic unity of Sakha and Tofa is not viable. This leads us to suspect that Proto-Sakha may rather have acted as a substrate for Proto-Tofa, that is, Tofa may have formed when early Proto-Yakutic speakers switched to Tuvan. For the geographical explanation of how this might have been possible, see below.

    Lack of dialectal differentiation in Sakha
    Notably, despite its drastic linguistic differences from other Turkic languages and the gigantic territory it covers, Sakha is surprisingly uniform dialectally. It has only one closely related sibling language (Dolgan) and only a few mutually intelligible internal dialects, which, for the most part, are reported to differ only in phonology.
    This absence of siblings leads us to assume that the expansion of the Yakuts along the Lena was a relatively recent event. Otherwise, how could we explain such a uniform spread over an area extending for three thousand miles? Indeed, in a comparable case, the Khanty language (pronunciation: /HUN-tee, HAHN-tee/) (Finno-Ugric family), which must have expanded in a similar way over the lower Ob watershed during the last one or two thousand years, shows much stronger linguistic diversification. The dendrogram produced by Georgiy Starostin's group (2010) confirms the complexity of the Khanty-Mansi internal phylogeny, which comprises multiple language-dialects, so, for all practical purposes, Khanty can presently be viewed as a taxon rather than a single language.
    The diversification of Khanty-Mansi [StarLing database (2010)]
    The absence of a similar tree for Sakha, in contrast to the multiple highly diversified dialects and lesser-known sub-languages of Khakas, Tuvan, Altai and other "Siberian" Turkic languages of comparable age, together with the abundance of Mongolian borrowings in Sakha's basic vocabulary, raises questions about the peculiarities of Yakutic linguistic prehistory.

    A similar scenario is well known for Middle English, which had become completely unrecognizable compared with Anglo-Saxon, absorbed many Scandinavian, French and Latin borrowings, but developed very few natural siblings (though its dialectal differentiation is far stronger, and it has many creole relatives).
    It could be surmised that a similar kind of process affected Sakha as well. There may have been a dramatic turning point in Sakha's prehistory that resulted in an ethnological crisis, the inflow of Mongolian loanwords and the arrested development of the Old Sakha language. This event could have wiped out any siblings that had existed before that period.
    Judging by the lack of dialectal diversification, and the fact that in-group sibling languages (other than Dolgan) did not have enough time to develop, this event must have occurred in the recent historical past, probably less than 600-900 years ago. The widespread turmoil of the 13th century connected with the Mongol expansion is a likely candidate for this type of ethnolinguistic crisis.

    The lack of genetic differentiation in Sakha
    According to Brigitte Pakendorf [Brigitte Pakendorf, Contact in the Prehistory of the Sakha, Linguistic and Genetic Perspective (2007)], "the genetic results provide clear evidence for the strong founder effect in the Sakha paternal lineage — thus, it is clear that the group of Sakha ancestors who migrated to the north must have been very small". The expansion of the Sakha haplotypes (N1c1), found in 90% of the Yakut population, falls with 95% confidence within the interval between 700 and 1500 CE (idem).
    Similar conclusions can be found in another source [Eric Crubezy et al., Human evolution in Siberia: from frozen bodies to ancient DNA, BMC Evol Biol. (2010)], which states that the Yakut male lineages can be traced back to a small group of horse riders from the Cis-Baikal area (that is, west of Lake Baikal), who began to spread before the 15th century AD.

    Positioning Proto-Sakha near Lake Baikal
    According to legend, the progenitor of all Yakuts was Elley Bootur, who was of "Tatar" origin and fled to the middle course of the Lena, escaping "a great war or persecution". Elley Bootur married the daughter of Omogoy Bay, who was of Mongol (Buryat) descent, who had also fled north when the wars of Genghis Khan's rule (?) broke out, and who settled in the delta of the Chara River (a tributary of the Olyokma) near the Lena, about 300 miles from present-day Yakutsk.

    Before that time of great change, the Proto-Yakuts can probably be identified with the Kurykans (üch qurïqan), mentioned in one of the Orkhon inscriptions c. 730, who seemingly formed the Kurumchin archaeological culture near the western shores of Lake Baikal, dated to the 6th-9th century AD. The identification of Proto-Sakha with this culture is a well-known hypothesis, based on temporal and geographical considerations and on medieval Chinese records [A. P. Okladnikov, Origins of the Yakut people (1951)]. The Kurumchin culture (stone walls, sacrificial stones, petroglyphs, agriculture (wheat, rye, millet), iron-making forges, cattle, camel and horse breeding) is concentrated near Irkutsk and around the Murin River (the name itself is probably akin to Buryat müren "river"). It is also found on Olkhon Island in Lake Baikal, only a few miles from the many sources of the Lena basin (including its large upper tributary, the Kirenga). This readily explains the geographical connection between the northern Yakuts of the Lena basin and their possible southern ancestors, the Kurykans of Lake Baikal.
    Note: This may also explain why the word Baikal seems to be a Turkic hydronym (from bay "rich" and köl "lake").

    That Proto-Sakha tribes could have been persecuted by the Mongols during the early 13th century is corroborated by passages in the Secret History of the Mongols (1227) (which seems to be the story of Genghis Khan's life told by himself), where the extermination of the "Tatars" in the early 1200s is mentioned. The "Tatars" are said to have been old enemies of the Mongols who lived somewhere near the confluence of the Orkhon and Selenga, in other words, near the eastern shores of Lake Baikal. This leads to the hypothesis that the "Tatars" were either an unrelated Mongolic-speaking tribe or an eastern offshoot of Proto-Sakha.

    Geography predicts a raft migration from Baikal to Yakutsk
    It should be noted that the physical distance from the Altai and West Sayan Mountains to Yakutsk is enormous, exceeding 3500 km (2200 miles) in a straight line, approximately equal to the distance from the Altai Mountains to Chuvashia and Volga Bulgaria. This distance marks a noticeable curve on the globe and provides an interesting geographical perspective on the matter, making Sakha and Chuvash look like mirror images of each other. It also raises the question of how the Yakuts could have covered such an immense distance; specifically, how did they migrate from Lake Baikal to the present-day area of Yakutsk?
    However, there seems to be a simple solution to this seemingly complex problem: they could have migrated downstream along the Lena by raft or boat, so this gigantic journey could have been accomplished in a relatively short period of time.
    Note that a similar raft migration towards Baikal along the Angara from the west was much less likely, because the Angara flows out of Lake Baikal, so such migrants would have had to travel upstream.

    But how did Proto-Sakha even get to Lake Baikal?
    We have established that Sakha shares some common features with the Altay-Sayan and probably the Great Steppe languages, all of which are located either along the Yenisei river or further west. But how could Proto-Sakha move from the Yenisei area to the Kurykan settlements at Lake Baikal? Even if it moved to Baikal from an area other than the Yenisei, the migration evidently proceeded from the west, which brings us back to the same question.

    The early migration of Proto-Yakutic [Darkstar (2011)]
    Essentially, there exist three plausible routes from the Yenisei to the western Baikal area.

    (1) Across the taiga?
    The Proto-Yakuts may have moved along the East Sayan Mountains and across the taiga (including some of the land belonging to South Samoyedic speakers), that is, roughly along the route of the Trans-Siberian Railway, built by the beginning of the 20th century. In a straight line, this potential track would cover a huge distance of over 900 km (550 mi) (from present-day Krasnoyarsk to Irkutsk). It would mostly cut across rivers, so one would have to know precisely which direction to take to reach the destination, given that there is no natural orientation system when traveling across river courses; such migrations would most likely have proceeded in a rather random and unsystematic way before the migrants could reach Baikal. If this route had actually been taken, we would presently find many post-Proto-Sakha groups scattered all over the forests between the East Sayan Mountains and the Angara River. We should also take into consideration the perils of taiga travel, such as deep snow in winter, gnats in summer and the evident lack of water as soon as one turns away from the river course. These are obvious reasons why much of this area remains uninhabited to this day, except for regions with modern roads, railroad tracks and cities. The attestation of South Samoyedic (Kamassian, Karagas) in the western part of this track, which had supposedly arrived in the area before the Turks and could probably have offered some military opposition to them, also implies that this territory had most likely remained undisturbed until the beginning of the 17th century. Therefore, we should conclude that this route was probably never taken by Proto-Sakha.

    (2) Along the Angara?
    Another passable route goes up the Angara River, from its confluence with the Yenisei to its source near the southern edge of Lake Baikal. That route is even longer (its length is actually hard to calculate in a simple manner because of the many twists and turns of the meandering Angara), probably extending for a couple of thousand kilometers, and it would force the potential migrants to row hard upstream all the way, with dense woods and forests along the riverbanks, so neither easy downstream water transport nor convenient horseback travel along the shore would be available. Winter travel on the ice is possible but could be hindered by low January temperatures. As in the previous case, no remnants of Turkic tribes were ever found along the Angara or its tributaries. Also note that the many tributaries would tend to divert the migrants away from their initially undetermined destination into even remoter corners of the taiga. We should also keep in mind possible opposition from the Yeniseian hunting tribes supposedly inhabiting at least some parts of this region. The earliest Russian Cossack records (1620-1630) in the area of the Bratsk fortress mention clashes with "Buryats" and "Tunguses" but apparently not with Turks/Kyrgyzes/Tatars, whom the Cossacks were already familiar with and should have been able to recognize. It is theoretically possible, however, that this type of migration began at some point in the past but never progressed very far.

    (3) The Mongolian track?
    The third possibility is traveling all the way along the upper course of the Yenisei, which would finally bring any potential migrants either to the East Sayan Mountains (if they follow the Greater Yenisei), where the Tofa people presently live, or to the Darkhat Depression with its relatively small lake Dood-Tsagaan (if they follow the Lesser Yenisei), where the Tsaatans and Soyots still wander with their reindeer herds.
    The Darkhat Depression, the habitat of the Tsaatans, lies across the watershed from Lake Hövsgöl, the largest lake of Mongolia. Even though the entire area is mountainous, traveling along the Lesser Yenisei through relatively sparse Mongolian forests, far away from the colder northern tracts, makes this a viable option. For centuries, this route must have been extensively explored by many reindeer- and horse-breeding herders from Tuva and Mongolia, and it is evidently passable. At the northern edge of Lake Hövsgöl, there is another watershed, beyond which lie the habitat of the Soyots and the source of the Irkut River. Once the migrants reach the Irkut, it can carry them downstream to the upper Angara in a matter of weeks and land them where present-day Irkutsk is located, that is, near the area where the Kurykan settlements were attested. The overall track length from the Yenisei to Baikal is roughly the same as above, about 1000 km (600 mi), though the second half of the journey requires less effort.

    Curiously, the self-appellation of the Tsaatans is in fact "Tu'kha" (with an aspirated [t] and a glottal stop), which is immediately reminiscent of Sakha. However, this may be purely coincidental (or, if not, a borrowed clan name). Moreover, travel through Mongolia could help explain the Mongolian borrowings in Sakha, though these could also have been acquired later from Buryat, when the Kurykan people were already near Lake Baikal. Additionally, as noted above, Tofa shares a partial case in -ta/-da and a few other grammatical features with Sakha. This can easily be explained geographically, since one can reach the Tofa habitat by traveling along the Greater Yenisei and the Tsaatan area by following the Lesser Yenisei. Therefore, we may conclude that Proto-Sakha could have been a substrate for both Tofa and Tu'kha, whose speakers later switched to Tuvan; this is probably how these languages evolved.
    The presence in Mongolia of a reindeer economy, typical of the Sakha and other North Siberian peoples, is also surprising. It may even shed some light on how the Sakha and other North Siberians became reindeer herders.
    In any case, the Mongolian track sounds far more plausible than any other option, as far as the lack of geographical obstacles and the presence of ethnographic and linguistic evidence is concerned.

    The analysis of dialectal differentiation, genetic evidence and oral history all imply that Sakha could have become what it is only after (or, less likely, shortly before) the Mongol expansion of the 13th century, when the Kurykan Turkic tribes must have tried to escape the Mongolian invasion by moving north along the Lena and its southern tributaries, most likely using water transport such as rafts. This migration could have occurred rather swiftly on a historical scale. Consequently, we may infer that, before that period, Sakha had existed in a remote southeastern area, such as the forested regions adjacent to the western shores of Lake Baikal near the sources of the Lena, staying there for a prolonged period without much linguistic and genetic diversification (otherwise, the tribes closely related to it must have become extinct during the Mongolian invasion). Judging from the Chinese historical records and the local geography and archaeology, these Proto-Sakha tribes may be identifiable with the Kurykans.
    Moreover, the analysis of borrowings in the basic lexis may indicate that Sakha could have initially developed upon an unknown Yeniseian substratum.
    The number of possible grammatical and lexical shared innovations in the Yakutic-Altay-Sayan ("Siberian") super-grouping is rather small, and in many cases these innovations exist only as tiny traces. However, they cannot be discarded outright. In any case, if such an Altay-Sayan-Yakutic proto-stage ever existed, it must have been of very short duration, considering how few common linguistic elements there are. Moreover, the majority of Altay-Sayan isolexemes (see below) cannot be found in Sakha, so Sakha must have been the first to separate, at a very early stage, leaving enough time for these Altay-Sayan shared innovations to develop.

    Similar considerations apply to the few grammatical and lexical features that unite Sakha with Altay-Sayan AND the Great Steppe taxon (Yakutic-Altay-Sayan-Great-Steppe). The number of these isolexemes and isogrammemes is insufficient to draw any firm conclusions. However, this latter option of a deeper unification seems more plausible, especially considering the drastic lexical differences separating Yakutic from Altay-Sayan (hardly 58% of common words in Swadesh-215).
    Altogether, Sakha does not seem to fit into any other Turkic taxon, being largely independent. However, it also seems that it affected the grammar and lexis of Proto-Altay-Sayan in the distant past, leaving a few unexpected traces here and there. That is particularly true of Tofa, as found by Rassadin (1978). Therefore, we may conclude that the features shared between Yakutic and Altay-Sayan do not stem from genetic relatedness but rather from secondary contact: Proto-Yakutic could have served as a substrate for Altay-Sayan, which later moved along the same route in a secondary migration wave and thus interacted with Proto-Yakutic, leading to the acquisition or stabilization of these features in the Altay-Sayan languages.
    In any case, for most taxonomic purposes, the Yakutic subgroup can still be viewed as an early-diversified and quite independent subgroup of the Turkic languages, strongly affected by a Mongolic superstratum and an unknown substratum of probably Yeniseian origin, but still retaining many archaic features important in the reconstruction of Proto-Turkic.
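    As a rough illustration of how shared-vocabulary figures such as "58% of common words in Swadesh-215" are usually obtained, here is a minimal Python sketch. It assumes each language's list is coded as a mapping from a Swadesh meaning to a cognate-class label, with recognized borrowings excluded (set to None), in the spirit of the "borrowings excluded" counts quoted in this section. The meanings, labels and the function name shared_percentage are hypothetical toy choices loosely based on words discussed above, not the author's data or coding procedure.

```python
from typing import Dict, Optional

# A word list maps a Swadesh meaning to a cognate-class label.
# None marks a slot excluded from the comparison (e.g. a recognized borrowing).
WordList = Dict[str, Optional[str]]


def shared_percentage(a: WordList, b: WordList) -> float:
    """Return the percentage of comparable meanings that fall into the same cognate class."""
    # Only meanings present in both lists and not excluded in either are counted.
    comparable = [
        meaning
        for meaning in a.keys() & b.keys()
        if a[meaning] is not None and b[meaning] is not None
    ]
    if not comparable:
        raise ValueError("no comparable meanings")
    shared = sum(1 for meaning in comparable if a[meaning] == b[meaning])
    return 100.0 * shared / len(comparable)


# Hypothetical toy coding (labels are illustrative, not a real cognate database).
sakha = {"eye": "qaraq", "old": "ergi", "wool": "tuk", "bite": "isir",
         "white": "urung", "face": None}  # "face": Mongolic loan, excluded
tuvan = {"eye": "qaraq", "old": "ergi", "wool": "tuk", "bite": "isir",
         "white": "aq", "face": None}

# 4 of the 5 comparable meanings share a class, so this prints 80.0%
print(f"{shared_percentage(sakha, tuvan):.1f}%")
```

    With a real Swadesh-215 coding, the same function would simply be applied to each language pair; the toy result says nothing about the actual Sakha-Tuvan figure, and real lexicostatistical work also has to decide how to handle synonyms and doubtful cognates, which this sketch ignores.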

    On the origins of Turkic ethnonymy
    The present study suggests that nearly all Turkic ethnonyms must have originated in the names of clan progenitors. The earliest recorded Turkic oral histories, as exemplified by the Oghuz-Khan Narratives written down by Rashid al-Din (c. 1300), or Abu al-Ghazi Bahadur's Shajare-i Türk (The Genealogy of the Turks) (c. 1659), were essentially descriptions of a series of legendary events involving Turkic clans and their original male founders, with a very clear and unmistakable identification of most Turkic ethnonyms as nothing but patronymic surnames adopted by all members of the respective clan.
    For instance, in Abu al-Ghazi Bahadur's work, names such as Turk, Oghuz, Uyghur and Kypchak are clearly and unambiguously associated with single individuals and clan progenitors (with many presumably fictional or partly real details from their personal lives), which leaves little room for other etymological speculations.
    He [Japheth] had eight sons [...] Their names are as follows: Turk, Hazar, Saklab, Rus, Ming, Chin, Kemeri, Tarykh.
    But before the Begs gave the answer, the child said, "My name is Oghuz."
    She bore the child in an old (rotten) tree with a hollow. When they told the khan about this, the khan said, "His father died before my very eyes; he has no one to protect him," and adopted him. He gave him the name Kypchaq. These days a tree with a hollow is called "chypchaq". Humble people, due to slips of the tongue, pronounce "kaf" as "chim", thus "Kypchaq" is pronounced as "chypchaq".
    Similarly, according to the legend recorded by Ye. S. Filimonov in 1890 [cited in L.V. Dmitriyeva, Yazyk barabinskikh tatar (materialy i issledovanija) (The language of the Baraba Tatars (materials and studies)), Leningrad (1981)] the progenitor of all the Baraba Tatars was the old man named Baram who migrated from a southern land to the north, where between the Irtysh and Ob, he found plenty of fur animals, birds and fish; there, he had eleven sons (Kelem, Uguy, Uzun, Tukus, Lyubar, Kargal, Kirkach, Choy, Turas, Teren, Baram), who after Baram's death divided the land into eleven parts (aymaks). According to Dmitriyeva, these name still mostly correspond to the names of local auls (villages). This legend renders unfounded all the frequent interpretations of the Baraba name as barma "don't go", baraman "I'm going", etc. The existence of the Baraba clan among other Baraba Tatar clans was supported by statistical data collected and cited by Radlov in 1865 [Aus Sibirien. Lose Blätter aus meinem Tagebuche (From Siberia: Torn pages from my diary), Wilhelm Radloff, Leipzig, 1893]:
    The reason why this evidence is not widely accepted is probably because at some point the scientifically-oriented researchers began to doubt the correctness of mythical factoids described in such legends. However, even if we doubt specific facts, there is hardly any reason to doubt the semantic worldview itself as adopted by the early Turks, including the known and unknown recorders of these legends. The early Turkic oral history was documented in a society that reflected the typical male clan social structure, similar to that of described in the Tora and Quran, where all historical events were often seen as actions of strong and powerful clan forefathers. However, in the course of the 20th century, the original clan structure and the associated ethnographic tradition was almost entirely destroyed, and the number of folk etymologies concerning the origin of Turkic and Mongolic ethnonyms seem to have appeared.
    By the same token, for instance, the Khakas legends attribute the origins of the Khobyy seok (where "seok" means "bone", that is "clan" among Altay and Khakas people, and which is one of the largest among Sagai and Shor peoples) to the legendary progenitor named Kobïy Adas.
    On the other hand, we know full well from historical records that such modern names as Nogai, Uzbek and Seljuk were originally nothing but personal names, which later spread to the respective dynasty or ethnic group.
    The extension of a clan name to a whole ethnic group also seems to be common. For instance, it was noted as early as by Gerhard Miller (1733-1743):
    "...because the Barabas are, of course, Tatars, as their language shows, whereas 'Baraba' or 'Barama' is not the name of the people, but rather the title of a certain special generation, since others from the same people also title their generations [in a similar way], e.g. Luba, Terenya, Tunus, etc." [Gerhard Miller, Istorija Sibirskaja (The History of Siberia), Saint-Petersburgh (1750)]
    In the same way, European surnames seem to go back to the personal names or aliases of single male individuals, such as Johnson to John. In both cases, we witness the remnants of the patriarchal clan structure and the associated worldview. In the case of Nogai, even though the name originally meant "dog" in Mongolian, it has just as little association with dogs as Bush, Green, Taylor, etc. have with the concepts they denote. Therefore, we may conclude that ethnonymic hypotheses or folk etymologies that attempt to link the name of a Eurasian ethnic group directly to some real-world phenomenon are mostly unfounded, since nearly all such names originally referred to the personal name or alias of the clan's genetic progenitor or male leader.

    In fact, the very usage of the word adam for "man" (from Semitic *adam) in most western Turkic languages (Azeri, Turkish, Tatar, Bashkir, Uzbek, Uighur, Kazakh, Kyrgyz, etc.), as well as in Persian, Hindi, Fulani, Indonesian, etc., reflects the same tradition of ascribing the descent of the whole ethnic group, even the whole of humanity, to one single individual. In this worldview, the history of the whole ethnic group is often seen as the outcome of some action of a legendary ancestor, whose life is poorly understood, with just a few reminiscences surviving in legends, but who presumably passed on his blood to the whole clan, then to a confederacy of clans, and finally to a whole ethnic group. (In some cases, however, the name goes back not to the semi-legendary figure himself but rather to his father or grandfather, cf. Seljuk and Togrul Beg.)

    Herein, we propose to refer to this historiographical conception as the Adamic ethnonymic paradigm.
    It should be stressed that this historiographic worldview is not based on or borrowed from the Abrahamic religions, but is rather part of a much older, naturally occurring human tradition.
    By the same token, we should infer that the names of other ancient Turkic clans whose ethnonymic origins have been lost, such as Kyrgyz, Bashkir, Kimak, Sakha and so on, also go back to personal names rather than to any abstract or natural concepts, simply because there seems to have been hardly any other way of naming clans and ethnic groups in the old Turkic tradition.

    The Altay-Sayan subgroup

    The Altay-Sayan subgroup presumably includes at least the following languages:
    (1) Tuvan, Todzhin, Tofa(lar), Tsaatan, Soyot;
    (2) Sagai Khakas (whence Standard Khakas), Kacha Khakas, Kyzyl Khakas, Fuyu Kyrgyz, Mras-Su Shor, Kondoma Shor, Middle Chulym;
    (3) Altay-kizhi (whence Standard Altay), Telengit, Teleut, Tuba, Kumandy, Kuu, etc.

    Below, we will try to show why this taxonomy seems to be valid.

    Tofa and Soyot are related to Tuvan
    The fact that Tofa and Soyot are related to Tuvan follows at least from the following evidence:

    Tuvan, Tofa, Soyot vocabulary
    (1) Dybo's lexicostatistical research (see above)
    (2) The fact that most words which are unique to Tuvan (among other TL's) are also present in Tofa and Soyot:

    Tuvan chu:(l), Tofa chü, Soyot chü "what?" (from Mongolian);
    Tuvan bichi:, Tofa biche, Soyot biche "few, little; small", also cf. Chuvash pêchêk, akin to Mongolian *bici-qan "small";
    Tuvan ïndï:, Tofa ïndï: "the other one", apparently, from the Turkic *onda "over there, that one";
    Tuvan uruG, Tofa uruG, Soyot urïG "child" of Turkic origin, with the initial meaning "seed";
    Tuvan ashaq, Tofa ashïNaq, Soyot ashshyaq "husband" from Turkic;
    Tuvan iye, Tofa iGe, Soyot i'hê "mother" from Mongolian ekh, Buryat ehe;
    Tuvan but, Tofa but, Soyot but "foot" from Turkic instead of *azaq;
    Tuvan xat, Tofa qat "wind";
    Tuvan xadï:r, Tofa qadï:r "blow (as of wind)";
    Tuvan kesh, Tofa ke'sh, Soyot ke'sh "skin", cf. Karakhanid qas(uq);
    Tuvan dïNna:r, Tofa dïNna:r, Soyot dïNna:(r) "to hear" from Turkic;
    Tuvan mana:r, Tofa mana:r, Soyot mana:(r) "to wait" from Mongolian manax "to guard"(?);
    Tuvan eshti:r, Tofa e'sht:r "to swim", also cf. Chuvash ish-;
    Tuvan da:ra:r, Tofa da:ra:r, Soyot da:ra:(r) "to sew", apparently, a cognate of the normal *tik root as in Khakas tigerge but with some specific phonological changes;
    Tuvan xem, Tofa xöm "river";
    Tuvan oruq, Tofa oruq, Soyot orïq "road" of Turkic origin, from *or- "to dig" [see SIGTY, Lexis (2002)];
    Tuvan eqi, Tofa e'qqi, Soyot eqqi "good", apparently, an archaism, also exists as the Old Turkic eDgü, Turkish iyi and Karachay-Balkar igi, and probably as Sakha üchügey;
    Tuvan baq, baGay, Tofa ba'q, ba'xay "bad";

    Even though some of these words share parallels with Mongolian, many of them seem to be original Turkic words found mostly only in Tuvan and Tofa, which suggests their close relationship.

    Tuvan geography

    The geographical relationship between Tuvan and Tofa can be explained as follows. Initially, the Tuvan people were those Turkic tribes that followed the upper reaches of the Yenisei into the East Sayan Mountains. There are two main sources of the Yenisei, the Greater Yenisei (Biy-Xöm) and the Lesser Yenisei (Ka-Xöm). The Tuvan capital Kyzyl is located at their confluence. The many tributaries and sources of the Greater Yenisei lead to the northeast, towards the East Sayan Ridge. This border area between Tuva and Irkutsk Oblast at the East Sayan Ridge is historically known as Tofalaria, because the Tofa mostly inhabit the East Sayan Mountains separating the basins of the Greater Yenisei and the Angara.
    On the other hand, the Lesser Yenisei leads east towards Lake Khövsgöl in Mongolia, an area originally inhabited by the Tsaatans (in Mongolia) and the Soyots (in Russia) (nearly extinct), who, according to Rassadin, the main field researcher of these languages, are also thought to be closely related to the Tofa and the Tuvans [see V.I. Rassadin, O problemakh vozrozhdeniya i sokhraneniya nekotorykh tyurkskikh narodov Yuzhnoy Sibiri (na primere tofalarskogo i soyotskogo) (2006)]. The Soyots are said to have moved north into Russia from Lake Khövsgöl only 300-400 years ago, though this is mostly based on hearsay evidence from their legends.
    Consequently, Todzhin and Tofa must have formed when part of the Proto-Tuvan tribes moved along the Greater Yenisei (Biy-Khem) until they reached the forests of the Eastern Sayan mountains, while Tsaatan and Soyot must have formed when Proto-Tuvan tribes moved along the Lesser Yenisei (Ka-Xöm) towards Lake Khövsgöl in northern Mongolia.

    Tuvan hydronymy
    Curiously, the hydronyms of Tyva are clearly and specifically Tuvan, since they often involve words present only in the Tuvan-Tofa subgroup. Cf. Biche Bash "small head (river)", Ulugan Khöl "large lake", Bash Khöm "head river", Choygan Khöl "pine lake", Many Khöl "Marble Lake", Chazag "summer camp (river)", Kargy (river) (from kargaar "to damn"), Balyktyg Khem "fishy river", Ulug Orug "big way (river)", Tashty Khem "stony river", Ak Sug "white water (river)", Chadan (from chada "step" > river rapid), Uyuk "dumbfounding (because of the noise) (a river)", Chas-Adyr "springtime fork (spur) (a river)", Kara Khöl "black lake", Khadyn "birch (lake)", etc. However, beyond the border with Mongolia and Buryatia, the hydronyms quickly turn Mongolian.
    This kind of local hydronymic continuity is not as common as it may seem, and it probably indicates the lack of a stable pre-Tuvan substratum in Tyva and a relatively early occupation of this territory by Proto-Tuvan tribes (about 1500-2000 years ago), a date that is also supported glottochronologically.

    The Khakas languages

    On the origins and usage of the ethnonym Khakas

    The term Khakas was introduced only in 1918, during the turmoil of the Russian Revolution, and seems to be nothing but the then-accepted reading of the supposed word "Kyrgyz" in Chinese chronicles, which presumably referred to the Yenisei Kyrgyz people [see the discussion by S. Yakhontov, V. Butanajev, S. Klyashtornyj in the Etnograficheskoje obozrenije (1992)]. The ethnonym Khakas is rarely used among native speakers to this day, except perhaps in formal situations. In fact, the Altay and Khakas people traditionally referred to themselves simply as Tadar(lar), either because this was the usual name given by Russian Cossacks to nearly all the Turkic peoples in the course of the 17th-19th centuries, or because it had existed even earlier. The latter point is, however, uncertain.
    In any case, the Khakas taxon is in fact subdivided into a number of major dialect-languages, such as Sagai (first mentioned in 1311 in Persian and in 1620 in Russian sources), Kacha (first attested in 1608), Kyzyl (nearly extinct), Koybal, Beltir (extinct), etc.
    The Sagai people are mostly scattered in rural areas along the foothills of western Khakassia, so pure Sagai is now rarely spoken in cities and seems to be confined to the Abakan Range and the south of the Kuznetsk Alatau Range.
    Like Standard Altay and Standard Crimean Tatar, written Khakas is largely an artificial 20th-century creation; it is based on Sagai, so most features mentioned as typically Khakas in fact refer to Sagai. Since Kacha, spoken on the plains, has gradually become marginal since the beginning of the 20th century, Sagai can be considered a good sample of the native vernacular.

    Apart from the Khakas dialect-languages, the Khakas subgroup includes two other subtaxa, Shor and Chulym, which have long been formally recognized as separate languages but also turn out to be small subgroupings: Shor (including the Mras-Su Shor and Kondoma Shor dialect-languages), whose speakers mostly live to the west of the Minusinsk Depression, and Chulym (including Middle Chulym and Lower Chulym), whose speakers live to the north of the Minusinsk Depression along the Chulym River. Lower Chulym is presently extinct, while Middle Chulym is on the verge of extinction.

    Standard Khakas phonology
    The most striking phonological features of Standard Khakas, as recorded in a textbook, in fact come from the Sagai dialect and are not reflected in the other members of the Khakas group (Kacha and Kyzyl); therefore, they may be the result of a recent substratum effect (for instance, due to Samoyedic influence). The following mutations are typical of Sagai as compared to other Khakas dialects:
    (1) the -sh > -s mutation, as in Sagai Khakas tas "stone" (as in Sakha ta:s), pas "head", but Kachin Khakas tash, Shor tash "stone", pash "head", Tuva, Tofa t/dash "stone", p/ba'sh "head";
    (2) the -ch > -s mutation as in Sagai Khakas as- "open", sas "hair", but Kachin Khakas ach-, chach, Shor ash-, shash, Tuvan ash-, chash, Tofa ash-, chesh; Khakas aGïs "tree", but Shor aGash, Tuvan ïyash, Tofa n'esh;
    (3) the q- > x- mutation in Sagai Khakas as in xara "black", but Kachin Khakas qara, Tuva qara, Tofa qara;
    Therefore, we may conclude that the phonological changes in Standard Khakas and Sagai are relatively recent, whereas Proto-Khakas sounded in much the same way as Proto-Tuvan-Tofa, Proto-Altay or many other languages of the region, that is, without these peculiar local phonological mutations.
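    The claim that Sagai differs from Kacha by a handful of regular mutations can be checked mechanically against the forms cited above. The following Python sketch treats mutations (1)-(3) as ordered rewrite rules. The rule list, the function name to_sagai and the restriction of rule (1) to word-final -sh are simplifying assumptions for illustration, not a published description of Sagai phonology; in particular, the ch > s rule cannot be applied blindly, since Khakas cheti "seven" keeps its ch-.

```python
import re

# Hypothetical, simplified rule set: the three Sagai mutations listed above,
# written as ordered rewrite rules applied to Kacha-like input forms.
# Conditioning is NOT modelled: e.g. Khakas cheti "seven" keeps ch-,
# so the ch > s rule is only valid for the cognate sets cited here.
SAGAI_RULES = [
    (re.compile(r"^q"), "x"),   # (3) word-initial q- > x-
    (re.compile(r"ch"), "s"),   # (2) ch > s
    (re.compile(r"sh$"), "s"),  # (1) word-final -sh > -s
]


def to_sagai(form: str) -> str:
    """Apply the simplified Sagai rules, in order, to one word form."""
    for pattern, replacement in SAGAI_RULES:
        form = pattern.sub(replacement, form)
    return form


# Kacha forms and the attested Sagai forms cited in the list above.
KACHA_TO_SAGAI = {
    "tash": "tas",   # "stone"
    "ach-": "as-",   # "open"
    "chach": "sas",  # "hair"
    "qara": "xara",  # "black"
}

for kacha, attested in KACHA_TO_SAGAI.items():
    predicted = to_sagai(kacha)
    status = "ok" if predicted == attested else "mismatch"
    print(f"{kacha} -> {predicted} (attested {attested}): {status}")
```

    All four cited pairs are reproduced by the three rules, which is consistent with the view that the Sagai features are a thin, regular layer on top of a more conservative Khakas base.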

    Khakas and Tuvan share few or no unique innovations

    Below, we examine the degree of relatedness between Khakas and Tuvan and the possibility of a separate Khakas-Tuvan proto-state.

    Khakas and Tuvan phonology
    In phonology, Khakas and Tuvan share the following innovative features:
    (1) *S > ch-, as in Chuvash s'ichê, Sakha sette, but Tofa chedi, Tuvan chedi, Khakas cheti "seven", and Standard Altay d'eti (which is basically pronounced almost in the same way as Jeti). However, note that the *S- > n- transition is mostly confined to the Khakas subgroup:
    (1a) chi-, che- > ni, ne, as in Khakas nïmïrxa, Shor nïbïrtqa "egg", as opposed to Tuvan chuurGa, but Tofa n'umurxa; Khakas na:x, Shor na:q, but Tuvan cha:k "cheek", which sets Tuvan apart from Khakas.

    (2) Apparently, a secondary innovative transition -w > -G in the final syllable (a retention is much less likely, since we assume herein that the original proto-form of "water" was *suw), cf. Tofa suG, Tuvan suG, Khakas suG, Shor suG, also Kumandy (a North Altay language-dialect) su:G / su:, but Standard Altay su: "water". That is apparently the only phonological feature shared by Khakas-Shor and Tuvan-Tofa.
    Generally speaking, there are more phonological differences than similarities between Tuvan-Tofa and Khakas-Shor-Chulym. For instance, the -d- > -z- mutation separates them: Khakas, Shor azaq "foot", but Tuvan adaq "down"; Khakas xazïN, Shor qazïN "birch", as opposed to Tuvan xadïN, Tofa qadïN.
    Besides, Tuvan-Tofa uses the typical local "Mandarin" (or rather "Manchurian" or "Mongolic") system of weak semi-voiced vs. strong unvoiced plosives, which is probably derived from Mongolic languages, and is also very typical of some other languages in the region.

    Khakas and Tuvan grammar

    There are very few innovations in grammar shared by Tuvan-Tofa and Khakas-Shor only.

    The comparison of Khakas and Tuvan grammatical features (feature: Tuvan-Tofa form; Khakas-Shor form or other parallels; comment)

    Directive case 1: Tuvan -che / -zhe; Khakas -zar / -zer / -sar / -ser / -nzar / -nzer. Rather rare; also found in Kumandy -za, -ze, -sa, -se.
    Directive case 2: Tuvan -dive / -duva / -düve / -dïva / -tive / -tuva / -tüve / -tïva; cf. Shor taba, tebe, also Tatar taba, Kumyk taba, Kazakh taman, etc.
    Differences in the Present Tense: Tuvan oynap tur "he is playing"; men tur men "I'm standing"; men chor men "I'm walking"; sen chïdïr sen "you're lying (on the ground)"; Khakas, Shor oynapcha "he is playing", which is in fact a standard contraction of *oynap chor. The original expression has been preserved in Tuvan and Tofa, whereas the Khakas subgroup developed a strong contraction. There is some similarity with Tuvan-Tofa, but it is unlikely to be exclusive to Tuvan-Khakas.
    The use of pronoun endings: Tuvan men nomcha:n men "I read"; Khakas min khïGïrgam "I read". The Khakas construction uses a different ending, so the two do not match.
    Differences in the Past Tense: Tuvan men alGan men "I have taken"; Khakas min alGam, Shor men aglGam "I have taken", apparently with a contraction in the ending. There is some possible similarity, but the origin of the Khakas construction is uncertain.
    Differences in the Audative Tense: Tuvan aytïr-a-dïr-men "I'm just asking it", "it turns out I just ask it" (the usage of this construction is similar to that of -mïsh- in Turkish); Khakas paz-a-dïr-zïN "you're writing". This construction is shared with Sakha and therefore cannot be exclusive to Tuvan-Khakas.
    Differences in the Audative Tense: Tuvan Kazhan al-chïk? "When did he take it, anyway?"; Kazhan bar-zhïk? "When did he go, anyway?"; cf. Khakas kil-er-chïx-pïn "I would come", kil-chiq-ter "just came". Evidently similar, but also attested in Kyrgyz.
    Continuous gerund: Tuvan kas-pïsha:n "(still) digging"; al-bïsha:n "(still) taking"; al-bïsha:n men "I'm (still) taking".
    Negative gerund: Tuvan olur-bain "not sitting, without sitting".
    Unfinished action: Tuvan al-gïzhe-m-che "until (before) I take it"; Khakas, Shor, Altay, Kumyk, Bashkir, Tatar, Uyghur, Karakalpak -gancha- / -genche-, showing unfinished action. Evidently not exclusive to Khakas-Tuvan.
    You (plural): Tuvan siler, Tofa siler; Khakas sirer, Kumandy sner, snir, Standard Altai slerler, Uyghur silêr. Not exclusive to Khakas-Tuvan.

    So far, we have been unable to identify grammatical features shared exclusively by Khakas-Shor-Chulym and Tuvan-Tofa. Whatever features they do share are hardly exclusive to these two subtaxa and seem to point to a different genealogical level.

    Khakas and Tuvan lexis
    With about 72% for the Tuvan-Khakas pair in Swadesh-215 (as opposed to 76% for Altay-Khakas and 69% for Tuvan-Altay), the Tuvan and Khakas languages must be a little further apart than the members of the Oghuz subtaxon.
    There is hardly any lexicostatistical evidence for Tuvan being any closer to Khakas than to Altay.

    Most differences between Khakas and Tuvan are due to the large number of "odd" words in Tuvan and, to a lesser extent, in Tofa. Many of these words turn out to be Mongolic borrowings. Cf. Tuvan, Tofa chu: "what" (Khalkha chu:), Tuvan xöy "many" (Khalkha xu "all"), Tuvan, Tofa urug "child" (Khalkha ür), Tuvan, Tofa t.ük "hair" (Khalkha da:x "(entangled) hair"), Tuvan noGa:n "green", also in Khakas (Khalkha nogo:n "green"), Tuvan mugur "dull (of a knife)" (Khalkha molgor), Tuvan dayïn "war" (Khalkha dayin). However, some of the other Tuvan-Tofa etymologies are much harder to figure out.

    Khakas and Tuvan geography

    From the geographic perspective, Tuvan is essentially a branch of Proto-Yenisei-Kyrgyz that migrated further south along the upper reaches of the Yenisei. Proto-Khakas-Shor-Chulym appears to have originally inhabited the Minusinsk Depression, whereas Proto-Tuvan-Tofa-Tsaatan-Soyot moved further into the Western Sayan mountains, following the course of the Yenisei.
    In other words, from the geographic perspective, Khakas-Shor and Tuvan-Tofa (and the closely related language-dialects) are related in the same way as any two ethnicities living in the same river basin. Their mutual contacts, or even the separation from the same stem, should be easily predictable from their geographic position alone. However, one should also take into consideration that both of the subgroups inhabit different valleys and are well-separated from each other by the Western Sayan Ridge.


    After exploring phonological, grammatical and lexicostatistical evidence, we have found no specific innovations shared by Proto-Tuvan-Tofa and Proto-Khakas-Shor only. Furthermore, from the geographic perspective, the two subgroups are separated by the Western Sayan Mountain Ridge. For this reason, the Khakas-Tuvan (the Sayan) subgrouping alone (without the inclusion of the Altay subgroup and other members) seems to be poorly supported.

    Altay, Khakas and Tuvan form the Altay-Sayan subgroup
    Below, we will study the relatedness of Altay to Tuvan and Khakas and demonstrate that, when considered together, these languages form a separate genetically related subgroup roughly in the same way as Turkmen, Azeri and Turkish form Oghuz.

    Altay (Turkic) is not a single language
    First of all, as is well known today, Altay (Turkic) is not a single language, but rather a complex network of independent languages and dialects. According to Baskakov (1969), the Altay subtaxon should include the following groups of "dialects": (1) Southern [(1a) Altay-kizhi, (1b) Telengit, (1c) Teleut] and (2) Northern [(2a) Tuba, (2b) Kumandy, (2c) Kuu (lit. "swan", after the river name) (or Chelkan)], which are probably in fact separate languages.
    However, the name "Altay language" is still widely used, apparently out of tradition. It was retained even in the works of Baskakov (1952-88), who had conducted field studies and written individual books on Kuu (Chalkan) and Kumandy in the 1960-70's.
    The strong diversification within Altay (and its relatedness to Khakas) is corroborated by the lexicostatistical study by Anna Dybo (2006).

    Altay languages glottochronology
    [Dybo, Anna, The Chronology of the Turkic Languages and the Linguistic Contacts of the Early Turks (2006)]

    Similar results have been obtained in a phono-morphostatistical study by Oleg Mudrak (2007). Note that by Oirot the Starostin group's members apparently mean Standard Altay or Altay-kizhi (Proper), which was its official name until 1947.
    Moreover, some of the Altay "dialects", such as Kumandy and Kuu (Chelkan), have recently obtained the de jure status of separate ethnic groups. Curiously, there was even a small scandal in the press in 2011, when two authors writing in Kuu argued over which version of the language is more correct, which suggests that there may be some dialectal differentiation even among the speakers of nearby Kuu villages.

    The strong diversification within the Altay dialect/languages suggests that Altay (Turkic) peoples have inhabited the Altai mountains for a long time, probably at least about a thousand years.

    In any case, the Altay Turkic languages are much too peculiar, much too diverse, and were much too poorly studied in the 20th century. Both the Khakas-Shor-Chulym and North-South-Altay subtaxa constitute a rather complex superposition of dialect-languages that could not be explored herein in sufficient detail. However, we will attempt to provide a brief argument for Sayan-Altay relatedness below.

    Altay, Khakas and Tuvan phonology

    It is hard to identify specific phonological features shared exclusively by Altay and Khakas-Tuvan.
    Instead, however, we have at least one series of typical contractions shared by Khakas (and partly, Tuvan), Altay, and Kyrgyz. These contractions might have been either archaic or innovative.

    (a) as in "liver",
    cf. Khakas pa:r, Tuvan pa:r, Standard Altay bu:r / pu:r , Kyrgyz bo:r , as opposed to Sakha bïar, Proto-Kimak-Kypchak *bawur, Chuvash pôver <*poör (?) [the Chuvash intervocalic -v- seems to result from the late labialization of narrow vowels], as opposed to Old Turkic baGïr — probably, from Proto-Bulgaro-Turkic *Bawïr or *Baïr.

    (b) as in "bone",
    cf. Khakas sö:k, Tuvan sö:k, Standard Altay sö:k, Kyrgyz sö:k, as opposed to Sakha unuoh, Chuvash s'ômô, Old Turkic süNök [note that N denotes a nasal as the Engl. /ng/], Proto-Kimak-Kypchak *süyek probably, from Proto-Bulgaro-Turkic *süNök.

    (c) as in "horn",
    cf. Tuvan mïyïs, Tofa mi:s, Khakas mü:s, Standard Altay mü:s, as opposed to Chuvash mây, Sakha muos, Old Turkic müNüz, Proto-Kimak-Kypchak and Kazakh-Kyrgyz *müyüz — probably, from Proto-Bulgaro-Turkic *maNüR or *maiR.

    The details and the direction of these contractions are ambiguous. They seem to be innovative at first, since most contractions are innovative. However, judging by their partial presence in Sakha, and the partial absence in Tuvan, some of them might as well be quasi-independent mutations or even retentions, so the matter is not entirely clear.
    Also note that Kumandy (a North Altay language) exhibits more Khakas features than Standard Altay (Altay-kizhi, "Oirot") [Baskakov (1972)]:

    (1) Kumandy n'- as in nimirtka, cf. Khakas nimirxa "egg", but Jïmïrtka (d'ïmïrtka) in Standard Altay;
    (2) Kumandy sug / su "water, river" as in Khakas suG, Shor suG, and Tuvan suG, but suu in Standard Altay and southern dialects; Kumandy tag / tu "mountain" as in Khakas tag, Shor taG, Tuvan taG, Tofa taG, but tuu in Standard Altay and southern dialects;
    (3) The Khakas-style ch- instead of the Altay-style d'- pronunciation in northern vs. southern Altay dialects, as in chïl : d'ïl "year".

    This affinity has been noted by Baskakov (1969, 1988), who clearly maintained that Northern Altay is related to Khakas, whereas Southern Altay is related to Kyrgyz, which is inconsistent with his treatment of Altay as a single language. In any case, it is quite reasonable to focus below on the Southern Altay dialect-languages (Standard Altay, Altay-kizhi, Teleut, Telengit), because their relatedness to Khakas seems less obvious.

    Altay, Khakas and Tuvan grammatical features

    The shared (and mostly innovative) morphological features in Altay-Sayan seem to include at least the following:

    (1) The use of choq after nouns or adjectives to express negatives instead of or parallel to the standard Turkic emes. This feature is typical of many "Siberian" Turkic languages. It may also be found in Kyrgyz.
    (2) The use of a special contracted form for "you" (plural), apparently from *senler. Cf. Tuvan siler, Tofa siler, Khakas sirer, Kumandy sner, snir, Standard Altay slerler, Kyrgyz siler. Also found in Baraba as silär.
    (3) The use of a grammeme similar to bara-dïr-mïn "I'm going", which also exists in Sakha.
    (4) The retention of archaic forms for the past tense 1st person plural (as in "we did"): -dï-bïs, -di-bis in Standard Altay and -di-bis, -di-vis in Kumandy, though the non-"Siberian" innovative -d'ik, -d-uk , etc. are also reported (rather confusingly) in Standard Altay.
    (5) The retention of the apparently archaic Optative mood with the -Gai-/-gei- suffix, shared by Sakha, Tuvan, Tofa, Khakas, Standard Altay, Kumandy and Kyrgyz. Even though similar grammemes also exist in other languages, particularly in the Southern supertaxon (see below), they may have a different phonological shape and meaning there (usually that of the future tense).
    (6) The directive case in Kumandy (but not Standard Altay), expressed by -za, -ze, -sa, -se, cf. Khakas -za, -zer, -sar, -ser, -nzar, -nzer. Apparently, this feature is quite unique.

    Altay, Khakas and Tuvan lexis
    Proficient Kyrgyz speakers sometimes report good mutual intelligibility with Standard Altay.
    Indeed, in Swadesh-215 (borrowings excluded) we have 76% for the Khakas-Altay pair, as opposed to 75% for the Kyrgyz-Standard Altay pair. The similarity to other languages is even lower, averaging about 70%, including 69% for Tuvan.
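    The percentages cited in this section are shares of Swadesh-215 meanings expressed by cognate native words, with borrowings excluded. A minimal Python sketch of that calculation follows, assuming each word has already been assigned a cognate-class label and a borrowing flag by the analyst. The names shared_percentage and WordList and the toy lists are hypothetical, and actual lexicostatistical practice involves further conventions (synonym handling, etc.) not modelled here. The last lines simply rank the pairwise figures reported in the text.

```python
from typing import Dict, Optional, Tuple

# A word list maps a Swadesh meaning to (cognate_class_label, is_borrowing).
# Cognate-class labels are assigned by the analyst; the ones below are placeholders.
WordList = Dict[str, Tuple[str, bool]]


def shared_percentage(a: WordList, b: WordList) -> Optional[float]:
    """Percentage of compared meanings whose native words share a cognate class.

    Meanings absent from either list, or filled by a borrowing in either list,
    are left out of the denominator ("borrowings excluded").
    """
    compared = 0
    matches = 0
    for meaning in sorted(a.keys() & b.keys()):
        class_a, borrowed_a = a[meaning]
        class_b, borrowed_b = b[meaning]
        if borrowed_a or borrowed_b:
            continue
        compared += 1
        if class_a == class_b:
            matches += 1
    return 100.0 * matches / compared if compared else None


# Hypothetical four-item lists, for illustration only (not real data).
lang_x: WordList = {"m1": ("A", False), "m2": ("B", False), "m3": ("C", True), "m4": ("D", False)}
lang_y: WordList = {"m1": ("A", False), "m2": ("E", False), "m3": ("C", False), "m4": ("D", False)}
print(round(shared_percentage(lang_x, lang_y), 1))  # prints 66.7 (2 of 3 compared items match)

# Percentages reported in the text (Swadesh-215, borrowings excluded).
REPORTED = {
    ("Altay", "Khakas"): 76,
    ("Kyrgyz", "Altay"): 75,
    ("Tuvan", "Khakas"): 72,
    ("Tuvan", "Altay"): 69,
}
closest = max(REPORTED, key=REPORTED.get)
print(closest, REPORTED[closest])
```

    On the reported figures, the closest pair is Altay-Khakas, with Kyrgyz-Altay only one point behind, which is why the argument relies on lexical innovations rather than on raw percentages alone.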

    An attempt to find common Altay-Khakas-Tuvan innovative isoglosses produces a bunch of potential lexical innovations:

    Basic vocabulary words shared by Altay, Khakas and Tuvan languages

    Meaning: Standard Altay; Standard Khakas; Tuvan. Comment
    arrow: sogon; soGan; sogun. A cultural borrowing from Ket "soom", probably into Proto-Altay-Sayan (originally a special kind of blunt-ended arrow used to hunt squirrels) [see Dybo (2006)].
    bodynemenimeet-botA possible shared semantic innovation, probably akin to *neme "what".
    fleasegertkishsegirtkeskara-bytA possible shared innovation
    bazhyN (also ög (yurt)Tura is either a shared borrowing from Samoyedic
    or an innovative noun formed from the verb tur- "stand"
    hunger: ach-toro; asta:nï; ashta:nï. But ach, achliq, achtyk in other Turkic languages. Presumably a phonological innovation.
    young: d'it, d'ash; chi:t, chas; cha:lï (< Mong. tsalu:). Cf. the normal *chash in other Turkic languages, whereas *chiit is akin to the western Turkic *yigit, *Jigit "brave young man", according to the Starling database. A phono-semantic innovation with the typical Altay-Sayan contraction.
    A shared innovation in the basic vocabulary; the root also exists in other TLs, but is more common and persistent in this cluster in this particular meaning.
    smooth: tüs; tüs; tas. Also düz in Oghuz-Seljuk, but mostly *tegiz in the languages of the Great Steppe; therefore an archaism.
    chïnsïnshïnAlso, Chuvash chan, therefore probably an archaism, which disappeared in other branches of the TL's.
    bad: qomoy; xomay; bagay. Presumably innovative.
    Also, Tofa dazïl
    A shared innovation in the basic vocabulary
    bark (n): chobra; xabïx; chövüre:. A shared semantic innovation in the basic vocabulary, probably from *jaburgak "leaf", according to Starostin's database.
    face: chïray; sïray; shïray. From Mongolian tsaray, from the earlier charay; however, note that a word shared by three languages need not have been borrowed by each of them independently.
    As opposed to Kyrgyz Jalbirak, Sakha sebirdeq, etc, which is probably from Proto-Bulgaro-Turkic *SalbirGaq (or a similar proto-form). Either an archaism or innovation.
    to laugh: qatqïr; xatxïr; qatqï. Presumably innovative.
    to rubPresumably jïzhar, jïzhip sïyma:rchïzarGat.ürbürPresumably innovative.
    to split (such as wood) japo:darGao:ndakta:rApparently, absent in other TL's. Presumably innovative.
    to scratch (a surface) jap, cf. tïrmaq "fingernail"tïrbax-tïr-Gat.ïrbaq; also t.ïrbaq "fingernail" Other TL's have the verbal form based on tïrnaq "fingernail", but that's phonologically different. Presumably innovative.
    to sing: sarïnda-, sarna-; sarïn sarnirGa; ïrla:r. A similar word exists in Uyghur sayri-maq and Turkmen sayra-mak, but its phonetic shape is different there.
    to burn (intr.): küyer; köyerGe; kïvar. Also in Kyrgyz küyü:. Presumably innovative. Concerning the relatedness of Kyrgyz, see below.
    to search, look for: bedre:r; ti:lirge; t.ile:r. Presumably innovative.
    to understand: pilip alar; pilip alarGa; p.ilip alïr. Note that the double verbal construction with the -p participle is also very typical of Altay-Sayan and especially of the Altay languages.
    to be over: bozho:r; to:zïl pa:rarGa; t.ozar. Presumably innovative, at least in this particular meaning.
    sky: teNeri; tigir; t.er:r. This one seems to be semantically archaic, preserving the original meaning of Tengri "sky" instead of "God, Heaven". There is a hypothesis that this might be a Yeniseian borrowing, akin to Ket tïNgal "tall (about a person)", though that is controversial.
    smoke (n): ïsh; ïs, tüdün; ïsh. With the meaning "soot", it is also known in many other TL's [see Sevortyan's Dictionary, Vol. I (1974)], but it is semantically different in Altay-Sayan.
    fat (n): üs; üs; üs. Unlike the Proto-Bulgaro-Turkic *Su(g). Presumably innovative.
    beast, prey;
    to hunt
    aNdap turar
    Also noted in Kyrgyz, Kazakh, Karakalpak, Nogai and possibly in Sakha dialects (?), but much less frequent in these languages. Apparently from Mongolian aN [Sevortyan's Dictionary, Vol. I (1974)]. In any case, the verb with the aN- root is specific to Altay-Sayan.
    to lived'urt-churt-churtt-Also known in Sakha
    tomorrowjüntaNdat.a:rta Presumably innovative
    feather: jün; chüg; chüg. Presumably innovative.
    ice: tosh; puz; t.osh. From ton- "to freeze" > "what is frozen, the frost". Presumably innovative.
    forest Iarkaagasarga, arïgPresumably innovative.
    forest II: aGash; aGas; arga, arïg. Evidently from *aGach "tree". Also cf. Karachay-Balkar aGach "forest", where it could be an independent development.
    rain: d'a:sh; naNmïr; cha's. There are also Turkmen yaGish and Azeri yaGïsh, but these have a different phonological shape and are too far away geographically.
    mountain: kïr; taG; kïr. Not found in other TL's in this meaning.
    war: d'u:; cha:; dayïn (Mong.). Presumably innovative.
    island: ortolïk; oltïrïx; ortuluk. A peculiar shared innovative formation from *ortu "middle", cf. Kazan Tatar utrau, Kyrgyz aral, Karakhanid utruG, Turkish and Turkmen ada, etc.
    uchauchao:rgaPresumably innovative; apparently, not found elsewhere
    nose: tumchuq; tumzux; t.umchuq. A possible semantic innovation in the basic vocabulary, probably from a slangy word for "snout", also found in the other TL's, but standard in this meaning only in Altay-Sayan.

    One should also take into account the "Siberian" isoglosses described above, which also include Sakha, presumably as a source of early borrowings.
    As the table above shows, Altay, Khakas and Tuvan share a rather large number of apparently innovative lexemes, some of which are shared by only one pair of languages, while others are shared by all three. These isolexemes provide substantial support for the existence of the Altay-Sayan genetic unity.

    As to the reported partial mutual intelligibility of Altay and Kyrgyz, it should be noted that most of the lexemes listed above are not shared with Kyrgyz, which sets it apart from the Altay-Sayan languages. Moreover, a certain proximity between Altay and Kyrgyz can also be explained by the considerable linguistic archaism of these two languages (also see the Kyrgyz-Altay isoglosses below).

    Altay, Khakas and Tuvan history and geography
    The Altai and the Western Sayan Mountains belong to the same mountain system, whereas the Tian Shan is a different system separated from the Altai by the watershed of the upper Irtysh River. The distance from Lake Issyk-Kul, where the Kyrgyz people are presently located, to the Altai is over 800 km (500 miles).
    On the other hand, the habitat of the Altay (Turkic) people is very close to the traditional habitat of the Khakas, and especially the Shor. For instance, the map from The Atlas of the Peoples of the World (1964), which supposedly reflects the distribution of ethnic groups during the first half of the 20th century, clearly shows the Northern Altay peoples in the direct vicinity of the Shor and Khakas.

    Old Soviet ethnographic maps of the Altay-Sayan area (1940s-60s): the ethnic groups near the Altai Krai; the Altai Republic; Khakassia

    Note: The presence of many "unexpected" ethnic groups on the first map, such as Chuvash, Tatar, Mordvins, (Volga) Germans, etc., scattered all over the Altai Krai and Khakassia, is apparently connected with the famine of the 1920s, when there was a mass railroad migration from the Middle Volga to West Siberia, Uzbekistan and other areas unaffected by the famine. By now, most of these groups have presumably been assimilated and lost their original languages, though some of them may still exist.

    In any case, we have come to the conclusion that the geographical considerations generally favor a high probability of Altay-Khakas relatedness and argue against a readily available physical connection between the Altay and Kyrgyz languages.

    Little is known about the local history. Curiously, Radloff wrote of the Shor people in 1861 [Aus Sibirien. Lose Blätter aus meinem Tagebuche (From Siberia: Torn pages from my diary), Wilhelm Radloff, Leipzig, 1893]:
    In vain did I try to extract any historical legends from them [the Mrassu Shors]; they could not even name the five ancestors that any Altayan knows. The 102-year-old man could only say that, as he had heard from his father, they had always lived peacefully in this land, and nothing had changed about their way of life except for their faith; they had always been fishermen, and as far as he could remember, everything had stayed the same.

    We may hypothesize that the migration from the Altai to Khakassia or vice versa might actually have proceeded along the Abakan River, which has its source in the Altai Mountains, near the approximate boundary between the Northern and Southern Altay dialects, and flows through the lands of the Sagai Khakas and Beltir Khakas into the Yenisei River. The Abakan seems to provide an easily available geographic link between the Proto-Khakas and Proto-Altay areas.

    Note: The interpretation of the Abakan River's name as "bear's blood" is unlikely and is probably a folk etymology, since there exists a separate tributary of the Yenisei named Kan, as well as a number of other rivers in Siberia exhibiting the same root. Moreover, other hydronyms in the area do not seem to point towards a Turkic origin; therefore the hydronym Abakan is likely to be non-Turkic.
    The ethno-geographical distribution of the Altay Turkic, Khakas and Tuvan subgroups can be summarized on the map below. As in the other cases, this distribution mostly reflects the early 20th-century situation, when most textbook data were collected. In the early 21st century, these areas have narrowed, and some dialects (such as Lower Chulym) have even become extinct.

    The approximate distribution of the Altay Turkic, Khakas, Chulym and Tuvan languages and dialects by the beginning of the 20th century [Darkstar (2012)]

    Additionally, the complexity of this geographic distribution allows us to infer that the dialectal and linguistic diversification within the Altay, Khakas and Tuvan subtaxa is rather profound and implies at least 1000 years of internal differentiation. Altay, Khakas and Tuvan by no means constitute uniform languages at present; each is rather a cluster of related languages and dialects.


    Based upon (1) several probable phonological innovations; (2) several innovations in grammar; (3) the large number of mostly innovative shared isolexemes exclusive to the Altay-Sayan subgroup, including the close lexicostatistical relatedness of Altay, Khakas and Tuvan in Swadesh-215; and (4) the geographic proximity and evident geographic connection between the Altay, Khakas and, to a lesser extent, Tuvan languages and dialects, the existence of the Altay-Sayan proto-state seems a very plausible hypothesis.
    Moreover, as lexicostatistical calculations show, Standard Altay is closer to Standard Khakas than Standard Khakas is to Tuvan. On the other hand, as we have shown above, Tuvan and Khakas share no exclusive innovations. These considerations imply that Tuvan must have been the first to separate from the Altay-Sayan branch, whereas Khakas and Altay either split much later or interacted strongly with each other for several centuries. At least the particular relatedness of Kumandy (and reportedly other Northern Altay languages) to Khakas, noted by Baskakov (1969), can probably be attributed to this later interaction.

    During the 2nd millennium CE, a further diversification of Tuvan, Khakas and Altay into smaller languages produced considerable linguistic and dialectal variation in the Altay-Sayan area.


    The Languages of the Great Steppe

    Kimak-Kypchak-Tatar, Kyrgyz-Kazakh, and Chagatai-Uzbek-Uyghur seem to form a genetic unity
    According to the present publication, the Turkic languages of the Great Steppe include, among their most important representatives, (1) Karluk, Kyrgyz, Kazakh, Karakalpak; (2) Chagatai, Uzbek, Uyghur; (3) Baraba, Bashkir, Kazan Tatar, Nogai, Kumyk, North Crimean Tatar, Karachay-Balkar and other closely related languages and dialects. By the Great Steppe, we understand herein the western and largest part of the Eurasian Steppe, which stretches from the Altay Mountains to the Black Sea.
    The Great-Steppe languages seem to share many common features and are reported to retain good mutual intelligibility (subjectively up to 80% in actual speech). Their speakers often get the impression that all of the Turkic languages are very close to each other, even though this impression in fact stems from the mutual intelligibility of these neighboring languages scattered across the steppelands of the former Soviet Union. In any case, we may suppose that they are particularly closely related, and we will attempt to demonstrate this below.

    The history and geography of the Great-Steppe languages
    Apparently, until about 700 AD, all of the proto-members of this presumable supertaxon occupied the area near the Irtysh River in the Altay Krai region. During the rise and fall of the Göktürk-Uyghur Kaganate between the 720s and 840s, these tribes were affected by the strife with the Göktürks (described in the Orkhon inscriptions) and were probably compelled to move from the Irtysh towards present-day Kazakhstan and the northern Tian Shan, and then deeper into the Great Steppe, though the connection with the Göktürks and Uyghurs and the details of the migration are rather hypothetical.
    Specifically, the following migrations seem to have occurred:

    (1) The Karluks are reported to have migrated from the Altay Mountains to Suyab and established their confederacy in the Jeti-Su (Zhetisu) by about 760-766 AD. However, virtually nothing is known of the Karluk dialect, and its relatedness to the other languages under consideration is purely conjectural.

    (2) We know that the Tatars were first attested, among other Turkic tribes, in the Kul Tegin Orkhon inscription c. 732 in reference to the burial of Bumin Kagan in 552.
    Judging from their later location, the Proto-Kimak-Kypchak-Tatar tribes must have been situated along the middle course of the Irtysh River, where they formed their own Kimak Kaganate by 840 AD.

    (3) The Kyrgyz tribes of Kyrgyzstan could have migrated from the Irtysh towards the Jeti-Su, probably after the 840s, that is, after the fall of the Uyghur Kaganate (which, essentially, was the continuation of the Göktürk Empire), when the Yenisei Kyrgyz tribes allegedly sacked the Uyghur capital in Mongolia's Orkhon valley and drove the Uyghurs out, establishing their own Kyrgyz Kaganate soon afterwards. However, the exact details of these events are very unclear, and Russian and Kyrgyz historiography offers more interpretations of the origins of the Kyrgyz of Kyrgyzstan than solid facts.

    Despite the vagueness of the earliest records, the historical evidence seems to point to the existence of an early tribal unity located along the thin strip of land near the upper and middle course of the Irtysh river as it passes near the Altay mountains flowing from Lake Zaysan. Until about 600-800 AD, the Karluk, Kyrgyz, Tatar, Kimak tribes were apparently all situated in the close vicinity of the Kulunda Steppe, Altai Mountains and Lake Zaysan, being subject to intense ethnic interaction.

    The phonology of the Great-Steppe languages
    Most phonological similarities of Kimak-Kypchak-Tatar, Kyrgyz-Kazakh and Chagatai-Uzbek-Uyghur are not exclusive to them; they can also be found in Southern Altay and Oghuz (especially Turkmen), which can probably be attributed to the formation of a linguistic area. In other words, besides treating the Great-Steppe languages as a genetic unity, we may also speak of them as a Sprachbund, with some additional ethnicities included and with some features present in some of the languages but absent in others.
    In any case, most languages of the Great Steppe can be characterized by the following phonological characteristics:
    (1) A further lenition of the intervocalic -z- > -y-: cf. Khakas azaq, but Standard Altay and Kumandy ayak, Kyrgyz ayaq, Kazakh ayaq, Chagatai ayaq, Kimak-Kypchak-Tatar *ayaq, Oghuz *ayaq. Note that this feature was originally absent from the descendants of Proto-Orkhon-Karakhanid, which preserved a fortified -d- or -ð-, cf. Orkhon Old Turkic aDaq, adaq, Karakhanid aðak (=the exact pronunciation may be uncertain, possibly as interdental /ð/), Khalaj hadaq.
    (2) The absence of the final -G/-g, as in Standard Altay tu:, Kyrgyz to:, Kazakh to:, Karachay taw, Bashkir taw, Kazan Tatar taw "mountain", but Tuvan taG/daG, Khakas taG, Kumandy (a Northern Altay language-dialect) taG, Oghuz-Seljuk *dag.
    (3) Apparently, the i > e innovative mutation, as in Standard Altay eki, Kumandy eki, ekki, iki (depending on the dialect), Kyrgyz eki, Kazakh eki, Karachay eki, Nogai eki, Kumyk eki "two", but Tuvan ihi, Khakas iki, yet Oghuz *iki. Note again that vowel changes are often unreliable evidence: they lack sufficient historical stability, may emerge independently, or may be areal features.
    (4) A special voicing pattern as in Kazan Tatar sigez "eight", tugïz "nine", Karachay-Balkar segiz, toGuz, Kyrgyz segiz, toGuz. Here, the second and third consonants are voiced as opposed to Altay, Kumandy segis, togus, Khakas segis, toGis, Yugur saGïs, doGïs, Orkhon Old Turkic sekiz, toquz, Uzbek sakkiz, to'kkiz.
    The grammar of the Great-Steppe languages
    (1) The languages of the Great Steppe are characterized by a unique and very typical shared innovation: the Past Tense suffix -ik/-ïk/-ük/-uk, etc., in the 1st person plural. It can be found in some of the Southern Altay language-dialects, Kyrgyz, Kazakh, most Chagatai languages, and all of the Kimak-Kypchak-Tatar and Oghuz languages. The suffix is almost entirely absent from the Orkhon-Karakhanid branch [though occasionally present in late Karakhanid and Khalaj (where it was probably borrowed from Azeri)], "Siberian" Turkic, Yugur, Salar and Chuvash, where the historically archaic *-d-imiz or a similar form is used instead. Since the *-d-imiz suffix is recognizably Nostratic (-miz is one of the earliest Nostratic morphemes, mentioned by H. Pedersen in his 1903 article on Turkish phonology), we may conclude that -ik/-ïk/-ük/-uk is a later innovation.
    (2) At least such languages as Kyrgyz, Kazakh, Chagatai-Uzbek-Uyghur, Karachay-Balkar, Nogai and Karaim exhibit a very odd 3rd person singular ending in verbs: cf. Kyrgyz bara-t "s/he will go", Kazakh bara-dï "s/he is going", Nogai bara-dï "s/he goes", Sibir Tatar (Tyumen) para-tï "he goes", Uzbek borap-ti "s/he is going", bara-di "s/he will go", Uyghur yazi-du "s/he, they (will) write". This rather striking 3rd person verbal marker, so similar to that of Latin, may make one wonder whether these Turkic languages retained a Nostratic feature. However, it seems that this ending is a mere contraction of the common Turkic -dïr, -dir, -dur, -dür, -tïr, -tir, -tur, -tür, used with different connotations in nearly all Turkic grammars and mostly expressing certainty or the audative mood. The key to understanding how this contraction could have arisen is that the final -r in Turkic Proper is generally unstable and tends either to turn into -z (according to the law of zetacism) or simply to disappear, as happens in Turkish dialects, Uyghur and possibly elsewhere. Hence, apparently, the -tïr > -tï > -t transition in Kyrgyz.
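    The -tïr > -tï > -t chain can be pictured as two ordered sound changes applied to a verb form. The Python sketch below is only an illustration under simplifying assumptions: the rule list, the regular expressions and the function name `derive` are hypothetical and do not come from any established sound-change tool, and the second rule (loss of a final high vowel after -t-) is stated only as narrowly as needed to reach the Kyrgyz form.

```python
import re

# Hypothetical, simplified rules (an illustration, not a full phonological model).
# Each rule is (description, regex pattern, replacement); rules apply in order.
RULES = [
    ("loss of the unstable final -r", r"r$", ""),
    ("loss of a final high vowel after -t-", r"(?<=t)[ïiuü]$", ""),
]


def derive(form: str) -> list[str]:
    """Apply RULES in order and return every distinct stage, input first."""
    stages = [form]
    for _description, pattern, replacement in RULES:
        new_form = re.sub(pattern, replacement, stages[-1])
        if new_form != stages[-1]:  # record only rules that changed the form
            stages.append(new_form)
    return stages


if __name__ == "__main__":
    print(" > ".join(derive("bara-tïr")))  # bara-tïr > bara-tï > bara-t
```

    With these rules as written, a form in -dur such as yazi-dur stops at yazi-du, because the second rule only fires after -t-; this keeps the sketch close to the two steps described above rather than modeling every language.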

    The lexis of the Great-Steppe languages
    The lexicostatistical proximity of most Great Steppe languages (except for certain members on the geographic periphery) is quite undeniable and can easily be observed; see, for instance, the Wave Model of the Turkic Languages diagram above. Many of these similarities turn out to be archaisms shared with Standard Altay, and sometimes even with Khakas, Turkmen and other neighboring "fringe" languages, whereas true innovations are harder to detect.

    In any case, consider the following lexical and phono-semantical instances, mostly from Swadesh-215:

    (1) Kimak-Kypchak *üy, Kyrgyz üy, Kazakh üy, Uzbek öy, Uyghur uy, also St. Altay öy, Turkmen öy as opposed to Khakas ib, Tuvan ög, Kumandy ük, Turkish ev "home". This word may in fact be akin to Great-Steppe *uya, Seljuk *yuwa, Chuvash yâwa "nest", though this etymology does not seem to have been noted anywhere else;
    Kimak-Kypchak *tüye, Kyrgyz , Kazakh tüye, Uzbek tuya, Uyghur töga, also St. Altay , tebe Turkmen tüye as opposed to Khakas tibe, Tuvan teve, Sakha taba, Karakhanid teve, Old Uyghur teve, Azeri devä, Turkish deve "camel", Chuvash teve; Apparently, innovative in Great-Steppe;

    (2) Kimak-Kypchak *may, Kyrgyz, Kazakh may, Uzbek moy, Uyghur may, also St. Altay may, Turkmen may "fat" (noun), apparently innovative, absent elsewhere;
    (3) Kimak-Kypchak *ayt, Kyrgyz, Kazakh ayt-, Uzbek ayt, Uyghur eyt, also St. Altay ayt-, Turkmen ayt-"to say", though cf. Turkish ayït- "to concern"; apparently an archaism, since it is also found in Sakha as et
    (4) St. Altay bet, Kimak-Kypchak *bet, Kyrgyz, Kazakh bet, Uzbek bet, Uyghur bet "face";

    (5) Kyrgyz sürt-, Kazakh sürt-, Uzbek sürt-, Uyghur sürt-, Tatar sürt-, Bashkir hört-, Karachay-Balkar sürt- "to wipe" as opposed to Altay arla:r, archïnar, Khakas chïzrga, Turkmen süpür- "to wipe". Apparently, innovative;
    (6) Kyrgyz oylo:, Kazakh oylau, Uzbek oyla-, Uyghur oyli-, Tatar uyla-, Bashkir utla-, Karachay-Balkar , Turkmen üyt-, pikir et-, say-, as opposed to St. Altay sanan, Khakas saGïn-, "to think, ponder". Apparently, innovative;
    (7) Kyrgyz jïrlau, Kazakh zhïrlau, Tatar jïrla-, Bashkir yïrla-, Karachay-Balkar jïrla-, as opposed to St. Altay qozhoNdor, Khakas ïrl-, Turkmen sayra- "to sing". Apparently, innovative;
    (8) Kyrgyz qursaq, Kazakh qursaq, Uyghur qorsaq, Tatar qorsaq, Bashkir qorhaq "belly", as opposed to Oghuz-Seljuk *qarïn, St. Altay ich, Khakas xarïn, isti. Apparently, innovative;
    (9) Kyrgyz ïshku:, Kazakh ïskïlau, Uzbek ishqala-, Tatar ïshqïrga, Bashkir ïshqïu, Karachay-Balkar ïshïrGa "to rub", as opposed to Oghuz-Seljuk *sürt(en), St. Altay jïzhar, Khakas chïzarGa. Apparently, innovative;
    (10) Kyrgyz sürtu, Kazakh sürtü:, Uygur sürt, Tatar sörtörgê, Bashkir hörtöü, Karachay-Balkar sürterge "to wipe", as opposed to Turkmen süpür- Seljuk *sil-, St. Altay arla:r, archanïr. Apparently, innovative;
    (11) Kyrgyz ïrGïtu:, Kazakh ïrGïtu, Tatar ïrgïtu, Bashkir ïrGïtïu "to throw", as opposed to Uzbek, Uyghur at-, Oghuz-Seljuk *at-, St. Altay chachar, Khakas tastirGa, silerge. Apparently, innovative;
    (12) Kazakh dala, Kyrgyz tala:, Tatar dala, Bashkir dala, Uyghur dala "steppe, desert". Apparently, innovative but could be a borrowing (?);
    (13) Kazakh dawïs, Tatar tawïsh, Bashkir tawïsh, Karachay tawush, Uzbek towush, Uyghur tawush "voice". Apparently not found elsewhere, therefore probably innovative;

    The abundance of archaisms can also support the validity of the demonstration; it merely indicates that the Great-Steppe proto-state must have been rather short-lived, leaving the proto-language too little time to develop unique innovative lexemes. Below are a few archaic words:
    (1) Kyrgyz ötkür, Kazakh ötkir, Uzbek o'tkir, Uyghur ökür, Tatar ütken, Bashkir ütker, Turkmen ötgür "sharp" as opposed to Karachay-Balkar jiti, St. Altay kurch, Khakas chitig "sharp"; also found in Tuvan, therefore probably a retention;
    (2) Kyrgyz tishte, Kazakh tisteu, Uzbek tishla-, Uyghur chishli-, Tatar teshle-, Bashkir teshle-, St. Altay tishte, as opposed to Karachay-Balkar qab-, Khakas ïzïr- "bite"; a retention;
    (3) Kyrgyz keN, Kazakh keN, Uzbek keN, Uyghur keN, Tatar kiN, Bashkir kiN, Karachay-Balkar keN "wide", as opposed to Oghuz-Seljuk genish, St. Altay d'albaq, Khakas chalbaq, a retention;
    (4) Kyrgyz qatïn, Kazakh qatïn, Uzbek xotun, Uyghur xotin, Tatar xadïn, Bashkir qatïn, Karachay-Balkar qatïn "wife", as opposed to Oghuz-Seljuk kadïn "woman", St. Altay üy, Khakas ipchizi "wife", probably a retention;
    (5) Kyrgyz tayaq, Kazakh tayaq, Uzbek tayoq, Uyghur tayaq, Tatar tayaq, Bashkir tayaq, Karachay-Balkar tayaq "stick", as opposed to Oghuz-Seljuk chöp, chubuk, St. Altay agash, Khakas agas, tayax, a retention since it is known even in Chuvash tuya;
    (6) Kyrgyz soGush, Kazakh sogïs, Tatar suGïsh, Bashkir huGïsh "war", as opposed to Uzbek, Uyghur, Turkmen *urush, St. Altay d'u:, Khakas cha:, Turkish savash. Either archaic or innovative;
    (7) Kyrgyz burulu:, Kazakh bûru, Uzbek bur-, Uyghur buri-, Tatar borïrga, Bashkir borolou, Karachay-Balkar bururGa, St. Altay burïlar "to turn (right, left)", as opposed to Oghuz-Seljuk *dön-, Khakas aylanarGa. A retention;

    The linguistically related conglomeration of tribes along the upper course of the Irtysh River near Lake Zaysan and the Altai Mountains, which existed there before 600-700 AD, finally resulted in the formation of the Kimak-Kypchak-Tatar, Kyrgyz-Kazakh, and Chagatai-Uzbek-Uyghur subtaxa. The descendants of these subtaxa are hereinafter referred to as the languages of the Great Steppe, or the Great-Steppe (super)taxon. Most languages of the Great-Steppe supertaxon share relatively good mutual intelligibility and many common lexemes, both because of their innate linguistic relatedness and, in some cases, because of their subsequent interaction.
    Moreover, the languages of the Great Steppe may have also affected the development of Turkmen and Southern Altay, in which case we may additionally speak of a Great Steppe Sprachbund which includes some languages on the periphery.

    Great-Steppe and Altay-Sayan seem to be closer to each other than to Oghuz-Seljuk
    Below, we will briefly study the features that may relate the members of the Great-Steppe supertaxon to the representatives of the Altay-Sayan supertaxon. This hypothesis suggests that the Orkhon-Oghuz-Karakhanid branch and Proto-Yakutic branch were the first to separate from Proto-Turkic Proper, whereas Proto-Great-Steppe-Altay-Sayan split up only a few centuries later.
    Judging by the many unexpected similarities here and there, Proto-Great-Steppe-Altay-Sayan and Proto-Yakut may also be very close, though the latter point is far from obvious.

    The grammar of Great-Steppe and Altay-Sayan

    (1) The extensive usage of -Gan- / -ken- in the Perfect Tense instead of the Oghuz-Seljuk -mysh-/-mush- or Sakha -byt-/-myt- is rather typical of the Great Steppe and Altay-Sayan languages. Moreover, the -Gan suffix seems to be at least sporadically present in various functions in Orkhon Old Turkic, Karakhanid, Salar, Yugur, whereas -mysh- is also known in Cuman-Polovtsian, Uzbek and some others. The suffixes may also be used in participles, but herein we mostly address their usage in the Perfect Tense only. Despite some intermingling, the distinction between mysh-languages and gan-languages, which separates the Great Steppe and Altay-Sayan taxa from Sakha and Oghuz-Seljuk, altogether, seems to be rather sharp.

    The lexis of Great-Steppe and Altay-Sayan
    A few examples of the presumable innovations shared by the Great-Steppe and Altay-Sayan are listed below.

    (1) Khakas, Tuvan, Tatar, Bashkir, Karachay, Kyrgyz, Kazakh, Altay *qol "hand" as opposed to Oghuz-Seljuk *el, *elig, Sakha il:i, Chuvash alâ; probably an archaism;
    (2) Khakas omas, Altay ötpös, Tatar ütmês, Kazakh ötpês, Kyrgyz ötpögön, Uzbek ûtmas, Uyghur ötmes "dull (of a knife)";
    (3) Tuvan kïlïr, Bashkir kïlïu, Kyrgyz kïlu:, Uzbek qilmoq, Uyghur qilmak "to do", whereas in Seljuk-Oghuz it has been mostly displaced by etmek, and by tu in Chuvash;
    (4) Khakas kiche:, Altay keche, Tatar kichê, Bashkir kisê(ge), Kazakh keshe, Kyrgyz keche, Uzbek kecha "yesterday", as opposed to the probably more archaic Tuvan dün, Uzbek tünügün, Karachay tünene, Oghuz-Seljuk *dün;
    (5) Altay ölöN, Tatar ülên, Bashkir ülên "grass". Moreover, according to Sevortyan's dictionary, cf. Khakas, Kumyk ölöN (or similar) meaning "feather grass (=Stipa, one of the most typical kinds of grass in the steppe)"; "Elytrigia (type of grass)" in Sakha; "Carex (sedge)" in Kyrgyz, Kazakh; "grass" in Uyghur, Uzbek, though modern dictionaries do not confirm most of that;
    (6) Khakas köberge, Altay köbör, Karachay köberge, Kyrgyz köbü:, Uyghur qaparmak "to swell (as of a finger, foot)";
    (7) Khakas sörtirge, Altay sü:rte:r, Tatar söyrêu, Bashkir höyrêu, Kazakh süyrêu, Kyrgyz süyrö: "to pull (behind oneself)";
    (8) Tuvan t.ö:, Khakas tigi, Tatar tege, Bashkir tege, Kyrgyz tigi "that (furthest) (adj)", e.g. "that book"; probably a retained archaism, perhaps even of Nostratic type;

    It seems that Great-Steppe and Altay-Sayan are more closely related to each other than either of them is to Oghuz-Seljuk, Sakha or any other remaining Turkic taxon.


    The Kyrgyz-Chagatai subtaxon

    The languages that supposedly belong to this subtaxon are (1) Kyrgyz, Kazakh and Karakalpak, (2) medieval Chagatai, modern Uzbek and Uyghur.

    The history of Kyrgyz-Chagatai
    According to historical records, the Karluks left the Altay Mountains c. 665, reaching the Amu-Darya River c. 700. After the Battle of Talas in 751, when the Chinese were defeated by the Arabs and Arab supremacy in the region was established, the Karluks were able to form the Karluk Kaganate (in 766) by occupying Suyab, the former capital of the Western Turkic Kaganate. The fall of the Uyghur Kaganate, the successor of the Eastern Göktürk Kaganate, in 840 left the Karluks in full possession of the Jeti-Su region (the area between the northern Tian Shan and Lake Balkhash). These events must have led to the formation of the early Kyrgyz of Kyrgyzstan and, ultimately (after the 1450s), of the Kazakh and Karakalpak languages, though neither the exact details nor the historical relatedness between Karluk and Kyrgyz are clearly documented.

    Virtually nothing is left of the Karluk language; however, based on its geographical and temporal proximity to Kyrgyz-Kazakh, we may assume that the Karluks were essentially a tribe closely related to the early Kyrgyz of Kyrgyzstan and may therefore be included in the same subtaxon.
    As we will explain below, the Kyrgyz-Kazakh subtaxon is also closely related to the Chagatai subtaxon (often named "Karluk" in Baskakov's classification), which includes Uzbek and Uyghur. Baskakov's name "Karluk" for this subtaxon is unacceptable on the same grounds explained above: the ethnic affiliation of the Karluks and the exact Turkic dialect they spoke are rather obscure.

    Kazakh is closely related to Kyrgyz
    Before we proceed with the discussion of larger taxa, we will attempt to show the close linguistic relatedness between Kazakh and Kyrgyz, which is an important question for the historiography of Kazakhstan and Kyrgyzstan.

    The Kyrgyz and Kazakh ethnonymic confusion
    Before the 1920s, the Kazakh people were traditionally known as Kirgizy "Kyrgyzes" among Russians. As the often-cited anecdote goes [apparently first mentioned by Kurbangali Khalid (1843-1913)], when asked about his ethnic affiliation, a Kazakh would normally answer something like "Men Qazaq-pyn", only to be corrected by a 19th-century Russian officer: "What kind of Kazak are you? You're a Kirgiz!"
    The discrepancy is probably due to the frequent application of the ethnonym "Kazak" to the Cossacks of the Polovtsian Steppe (pronounced in Russian as /kazAk/, nearly the same as /kazAkh/ "Kazakh"), which inevitably resulted in conflation. As Max Vasmer's Russisches Etymologisches Woerterbuch (1950-58) suggests, based on Radlov, the original meaning of Kazak was "free-lancer, independent adventurer, soldier of fortune"; thus in the medieval period it could be applied to many unconnected groups of Turkic, Slavic or any other origin. Consequently, to avoid confusion, the Kazakhs were officially called Kazakh Kirgizes, whereas the Kyrgyzes of Kyrgyzstan were called Kara Kirgizes. Indeed, in many 19th-century publications, such as Radloff's Versuch eines Woerterbuches der Tuerk-Dialekte (1893), printed in German and Russian, Kazakh was formally named Kirgiz (Kirgisischer Dialekt), whereas Kirgiz was formally named Kara-Kirgiz (Kara-Kirgisischer Dialekt). The Kara-Kirgizskaya Autonomous Oblast, established in 1924, was actually the earliest official name of Kyrgyzstan.
    As to the origins of the ethnonym Qyrqyz, there are more wild guesses than well-argued explanations. The name is nearly 1300 years old at the very least, as it is first mentioned in the Orkhon inscriptions (720s), though it probably existed even earlier. It seems to be the original name applied not only to the Yenisei Kyrgyz tribes but also to the members of the Kyrgyz Kaganate and, in a general sense, to most Turkic tribes of the eastern part of the Great Steppe, at least until the Mongol invasion. As a result, it is actually very difficult to differentiate between the Yenisei Kyrgyz, the Kyrgyz of the Kyrgyz Kaganate, and the early Kyrgyz of Kyrgyzstan, though all of them seem to be different entities.
    Phonetically, the word Qyrqyz can be associated with qyr- "break, smash" or qorq- "fear". It seems to be a reduplication, typical of Turkic languages, where the root *qyr-qyr was repeated for emphasis, but the second word-ending -r mutated to -z according to the law of zetacism in Turkic Proper. The original meaning was therefore "breaker" (strong warrior). Additionally, and most likely, as explained above, it must have originally been a name or a war alias of a clan progenitor or chief, which later spread to his clan as a whole (as in the case of Seljuk, Noghai, Uzbek, etc). The event could probably be dated to as early as the beginning of the common era, judging by the action of the zetacism law, thus placing it among the oldest self-appellations used by the Turkic peoples.

    Specific phonological features in Kazakh-Karakalpak

    The similarities between Kyrgyz and Kazakh are so numerous that it is easier to start with their differences. Most of the following phonological differences between Kyrgyz and Kazakh seem to have emerged in Kazakh and Karakalpak because of their secondary contact with the Kimak-Kypchak-Tatar languages and some unknown Southern Uralic substratum. These are just a few phonological features that contradict the Kyrgyz-Kazakh subgrouping; in most other respects the two languages are similar.

    Phonological differences between Kyrgyz and Kazakh-Karakalpak (Kyrgyz : Kazakh, Karakalpak)
    ch > sh: Kyrgyz chach "hair" : Kazakh shash, which is similar to Nogai shash and Bashkir säs. The difference can probably be attributed to a local Southern Uralic substratum;
    sh > s: Kyrgyz bash "head", tish "tooth" : Kazakh bas, tis, which is similar to Nogai bas, tis; probably due to the action of an unknown Southern Uralic substratum, since similar transitions are also found in Bashkir;
    -0- : -w-: Kyrgyz buur "liver" : Kazakh bawïr; similar to Tatar bawïr, Bashkir bawïr, Nogai bawïr, Karachay bawur;
    -0- : -y-: Kyrgyz söök "bone" : Kazakh süyek; similar to Tatar söyek, Bashkir höyäk, Nogai süyek, Kumyk süyek, Karachay süyek; the -y- formation in this word is not found elsewhere and seems to be innovative;
    -u- : -ï- in suffixes: Kyrgyz kuyruk "tail" : Kazakh quyrïq; similar to Tatar qoyrïq and Nogai quyrïq. This is an innovative Tatar feature, as most TL's have -u- in the 2nd position, see the Starling database;
    Also cf. a similar table for Kypchak-Tatar (below).
    Consequently, we can see that the phonological differences between Kyrgyz and Kazakh-Karakalpak are also shared by some of the Kimak-Kypchak-Tatar languages that were part of the Golden Horde. Such phonetic evidence probably led Baskakov to believe that Kyrgyz and Kazakh are not even closely related and that Kazakh should be regrouped with Nogai. However, judging from the good lexical agreement between the two languages, this is clearly not the case. Rather, these must be just a few secondary changes in Kazakh-Karakalpak which resulted from the subsequent interaction of early Kazakh with the languages of the Golden Horde.

    The grammar of Kyrgyz and Kazakh
    Both Kyrgyz and Kazakh share a great number of archaic features, many of which are also known to exist in the Altay-Sayan Turkic languages. As far as innovative elements are concerned, Kyrgyz and Kazakh seem to share the following features:
    (1) Both Kyrgyz and Kazakh use the typical 2nd person plural pronoun, apparently absent in other branches, cf. Kyrgyz sizder, siler, Kazakh sizder, sender.
    (2) A rather unique type of instrumental case, cf. Kyrgyz menen, e.g. qol menen "with the hand", Kazakh -men, -pen, -ben, also menen. This feature, however, is probably archaic, as *menen is also known in certain dialects, such as Eastern Bashkir and Sagai Khakas.

    However, besides that, there are also some notable discrepancies in grammatical usage:

    pronouns in the ablative, e.g. "from me": Kyrgyz men-den; Kazakh men-en
    pronouns in the dative, e.g. "to me": Kyrgyz ma-Ga; Kazakh ma-Gan
    the possessive suffix for "sender" (you, plural, informal): Kyrgyz -Nar, -ner, -nör; Kazakh -Ndar, -Nder
    the formation of the future tense: -baq / -bek, -paq / -pek, -maq / -mek
    endings in the 3rd person plural, present tense: Kyrgyz -(she)t, as in barï-shat "they go"; Kazakh -di, -dï
    endings in the 3rd person plural, past tense: Kyrgyz -d-ïshtï, -d-ishti; Kazakh -di, -dï
    The rather odd Kyrgyz formation barï-shat apparently results from the superposition of the reciprocal (mutual) voice marker -sh- and a subsequent vowel metathesis: barï-sh-tïr > barï-sh-tï > barï-sha-t.

    The lexis of Kyrgyz and Kazakh
    Kyrgyz seems to be a rather archaic language with a minimum of lexical borrowings, clearly setting it apart from the Kimak-Kypchak-Tatar languages, which include a number of Oghuz innovations and Perso-Arabic loanwords (see below).
    Speakers of both Kazakh and Kyrgyz usually report good mutual intelligibility and sometimes state that they are bir tuGan, ethnic brothers of the same kin. The differences in Swadesh-215 seem to be very small, no more than 8%, and in some cases these are just minor inconsistencies in dictionaries. Only the following clear-cut mismatches were found in the original Swadesh-200:
    leg: Kyrgyz but (as in Altay), also ayaq "foot"; Kazakh ayaq
    big: Kyrgyz choN, apparently from Altay Ja:N; also ulu: "great"
    what: Kyrgyz usually emne, also frequently ne; Kazakh ne
    that: Kyrgyz tigi; Kazakh anau, sonau
    sniff, smell: Kyrgyz usually jïto:, but more literary or formal iisko:; Kazakh iiskeu
    sing: Kyrgyz ï:rdo:, also jïrlau (?); Kazakh zhïrlau
    wet: Kyrgyz nïm, nïmdu: (< Perso-Arabic nam "moisture"); Kazakh ïlgal
    to swell: Kyrgyz köbü:, shishü:; Kazakh isip-kebu, isinu
    sharp: Kyrgyz kurch, also ötkür; Kazakh ötkir
    thin: Kyrgyz ichke, jukê; Kazakh zhiñishke, zhûqê "fine, thin work"
    to burn: Kyrgyz küyü:, also janu:; Kazakh zhanu
    to hear: Kyrgyz ugu, eshïtu (probably outdated or dialectal); Kazakh estu
    correct: Kyrgyz tu:ra, sometimes durus "decent, right"; Kazakh dûrïs
    wipe: Kyrgyz aarchu, also sürtü:; Kazakh sürtü
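    Percentages such as "no more than 8%" in Swadesh-215 are shares of the meanings actually compared. A minimal Python sketch of that computation follows, assuming each meaning has already been judged cognate or not and flagged when either form is a known borrowing (such meanings are dropped, as with the "borrowings excluded" figures elsewhere in this study). The Entry type, the lexical_similarity function and the sample judgments are hypothetical illustrations, not the author's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One Swadesh meaning compared across two languages (hypothetical format)."""
    meaning: str
    cognate: bool   # True if the two forms are judged cognate
    borrowed: bool  # True if either form is a known loanword


def lexical_similarity(entries: list[Entry]) -> float:
    """Percentage of cognate pairs among the non-borrowed meanings."""
    compared = [e for e in entries if not e.borrowed]
    if not compared:
        raise ValueError("no meanings left to compare")
    matches = sum(1 for e in compared if e.cognate)
    return 100.0 * matches / len(compared)


# Illustrative judgments only (not the author's data):
sample = [
    Entry("leg", cognate=False, borrowed=False),
    Entry("wet", cognate=False, borrowed=True),   # excluded: Perso-Arabic loan
    Entry("egg", cognate=True, borrowed=False),
    Entry("grass", cognate=True, borrowed=False),
]
print(f"{lexical_similarity(sample):.1f}%")  # -> 66.7% (2 of 3 compared meanings)
```

    Note that excluding a borrowing shrinks the denominator rather than counting as a mismatch, which is why the exclusion policy noticeably affects the final percentage on short lists.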
    Among the local isolexemes, apparently absent in other languages, the following could be found:
    Kyrgyz küyö, Kazakh küyeu "husband";
    Kyrgyz chöp, Kazakh shöp, Uyghur chöp, "grass";
    Kyrgyz sogu:, Kazakh soGu "blow (of wind) (originally: strike)";
    Kyrgyz soru:, Kazakh soru "suck" also exist in Altay-Khakas and/or Uzbek-Uyghur but seem to be absent or not typical in Tatar-Bashkir;
    Kyrgyz özön, Kazakh özön "river", typical in this meaning only of Kyrgyz-Kazakh, though it is also known in Kumyk, Tatar, Salar, Altay, etc., as "brook" or "stream", and in Crimean Tatar as "river" (which may be an independent semantic shift);
    Kyrgyz qachïq, Kazakh qashïq "far away" (from kach- "run away");

    Also, cf. similar pronunciation in
    Kyrgyz jumurtqa, Kazakh zhûmurtqa "egg";
    Kyrgyz jalbïraq, Kazakh zhapïrak "leaf", which are rather unique among other Turkic (and presumably archaic).

    The history and geography of Kazakh
    The Kazakh Khanate was founded in 1456-1465 by Janybek (Zhany-bek) Khan and Kerey Khan in the Jety-Su area (that is in the southeastern part of the present-day Kazakhstan), following a successful rebellion against the Uzbek Ulus and its Khan Abu'l-Khayr Khan, a descendant of Genghis Khan [described by Mukhammed Khaydar in Tarih-i-Rashidi]. The early years of the Kazakh Khanate were marked by the struggle against the Uzbek leader Muhammad Shaybani, who was defeated in 1470.

    Consequently, the Jeti-Su (Zhetysu, "The Seven Waters") area to the north of Almaty, and especially the area of the Chu River, can be regarded as the Kazakh Urheimat, where the Kazakh Khanate was first founded and from where the Kazakhs began their expansion into the Great Steppe in the north. On the other hand, the Chu River, which now flows from the present-day territory of Kyrgyzstan along the Kazakh-Kyrgyz border, is also often seen as a traditional Kyrgyz habitat. In fact, this is where Bishkek, the capital of Kyrgyzstan, is located. Almaty, the largest city of Kazakhstan, is only 200 km (120 miles) away from Bishkek across the Zaili (=Trans-Ilian) Alatau Ridge, so both cities are situated at the foot of the Tian Shan Mountains, nearly in the same area. Consequently, the geographic and historical connection between the two ethnicities becomes quite evident.
    The dialectal differentiation in Kazakh
    There are at least two major dialectal groups within the Kyrgyz language: the Northern and Southern dialects. This dialectal differentiation marks Kyrgyz as a slightly "older" language than Kazakh, which is much more uniform. Indeed, Kazakh is often reported to have no dialects at all despite the large territory it occupies, especially in popular, non-scientific sources. However, this is not entirely true. The Western Kazakh dialect may differ (or may have differed in the past, before mass Russification and TV standardization) from the Eastern one in several ways, including such features as the /J/-/zh/ pronunciation, the usage of -zhaq / -zhek for the future tense, etc.
    Moreover, certain minority dialect-languages in Astrakhan (along the Volga) can presently be viewed as nothing but the westernmost dialects of Kazakh, since they are about 98% mutually intelligible with it, e.g. the so-called Karagash Nogai language (not to be confused with Nogai proper on the Caspian Sea).
    In any case, the weaker dialectal differentiation in Kazakh as compared to Kyrgyz marks it as a somewhat "younger" language that must have spread from an area of stronger dialectal differentiation, such as the foot of the Tian Shan Mountains near Kyrgyzstan.

    Alternative hypotheses
    The placement of Kyrgyz within the same subgroup as the Altay Turkic languages was popularized by Baskakov's famous classification, which became a generally accepted standard in Soviet and Russian Turkology [Baskakov, N.A. Klassifikatsiya tyurkskikh yazykov v svyazi s istoricheskoy periodizatsiyey ikh razvitiya i formirovaniya (The classification of Turkic languages as connected to the historical periodization of their development and formation), Moscow (1952)]. However, his later works (1960-88) show little or no specific argumentation for this taxonomic placement. Generally speaking, Baskakov's classification was based on phonological and grammatical features and personal intuition alone, without vocabulary comparison.

    The close relatedness between Kazakh and Kyrgyz is hardly deniable. In fact, they are so lexically close (92%, Swadesh-215) that, under certain simplifying assumptions, they could even be viewed as very distant dialects of each other; however, the notable discrepancies in phonology and grammar mark them as distinct languages.

    Based on (1) the weaker dialectal differentiation in Kazakh as compared to Kyrgyz, (2) the presence of notable Kimak-Kypchak-Tatar phonological features, (3) the geographical proximity of Kazakh to the languages of the Golden Horde, and (4) its original location along the Chu River, near the present-day Kyrgyzstan border, we can draw several conclusions concerning early Kazakh history. Kazakh can be viewed as a historically recent (14th-16th century) expansion of Kyrgyz-related tribes from the Tian Shan into the northern steppeland. While expanding over the large territory of the Kazakhstan steppe, the early Kazakh tribes must have come into contact with various languages and dialects of the Golden Horde, specifically the early Nogai and Tatar dialects along the Volga and Ural (Yaik / Jaik) rivers, and probably even Bashkir in the Southern Ural Mountains. This contact must have resulted in the formation of a "Tatarized" or "Kypchakized" form of medieval Kyrgyz, which finally led to the emergence of the present-day Kazakh and Karakalpak languages.


    Altay-Kyrgyz isolexemes
    However, this is not that simple. Besides the close proximity between Kazakh and Kyrgyz, there exist several Altay-Kyrgyz isoglosses, which make the Kyrgyz relationship much more complicated:

    Altay and Kyrgyz lexis and phonology

    In basic vocabulary, both Altay and Kyrgyz share a number of isolexemes:
    (1) Altay jaan, Kyrgyz choN, and Uyghur chong "big";
    (2) Altay kurch, Kyrgyz kurch "sharp (as of a knife)";
    (3) Altay moko, Kyrgyz mokok "dull (as of a knife)" (also cf. Tuvan mugur, probably from Mongolian);
    (4) Altay d'ün, Kyrgyz jün "feather" (cf. Kazakh qawïrsïn, Khakas chüg);
    (5) Altay sok, sogor, Kyrgyz sogu:, Kazakh soGu "to blow (as of wind)" (literally "strike");
    (6) Altay uk, Kyrgyz ugu: "to hear" (also in Khakas, Uyghur, Kazakh "understand", but most typical of the Altay dialects) The word may be related to the Mongolian uqa-/uxa- "to understand" [see Sevortyan's dictionary (1974)];
    (7) Altay bul, Kyrgyz bul, Kazakh bûl, and also Bashkir bïl "this", instead of the apparently more archaic *bu (despite the external etymologies proposed in the Starling database, where the Altaic words for "body" seem to be used);
    (8) Altay sler, Kyrgyz siler, sizder "you (plural)", and the similar but not identical Kazakh secondary formations sender, sizder. This isolexeme is obviously not exclusive to Kyrgyz-Altay, but is widely used in Altay-Sayan and Uyghur, as well as probably in some other "eastern" Turkic languages;
    (9) Altay küyer, Kyrgyz küyü: "to burn (intr.)" (also in Khakas, Tuvan);

    Moreover, note the following phonological similarities:
    (1) Altay üren, Khakas üren, Kyrgyz ürön "seed", as opposed to Kazakh ûrïq, Uzbek uruG, Uyghur uruq;
    (2) Altay sö:q, Khakas sö:q, Tuvan sö:q, Kyrgyz sö:q "bone", as opposed to Kazakh süyeq, Uzbek suyoq, Tatar söyaq;
    (3) Altay o:s, Khakas a:s, Tuvan a:s, Kyrgyz o:z "mouth", as opposed to Kazakh awïz, Tatar avïz;

    In other words, the typical Altay-Sayan phonological contraction that we have discussed earlier is also present in Kyrgyz, at least to some extent.

    Kyrgyz history
    One of the most dramatic periods in the history of the Kazakh nation was the long-lasting struggle (1723-1758) against the Dzungarian Khanate, which ruled over East Turkestan and West Mongolia. This severe and brutal conflict finally forced the Kazakhs to seek an alliance with the Russian Empire in 1731. It is assumed herein that this period may also have seen the supposed Altay-Kyrgyz migrations, which might have brought Altay Turkic to the Tian Shan, though this is just a hypothesis. In any case, similar Altay-Tian Shan migrations are mentioned in the Manas, the Kyrgyz epic, and may also be reflected in the conflation between the Altay-kizhi people (=Standard Altay speakers) and the Oirots (=Dzungarians of Mongolic origin): the Altay-kizhi retained the name Oirot well into the Soviet era. Indeed, the Kyrgyz people had been pushed into the Ferghana Valley by the Oirat invasion. Moreover, some of the Oirats, known as Sart-Kalmaks, survived the downfall of the Dzungarian Khanate (1755-58) and became part of the Kyrgyz tribes [The Great Russian Encyclopedia (2005)].

    Kyrgyz geography
    The present-day mountain habitat of the Kyrgyz people in the Tian Shan appears to be a typical isolated refugium formed after several military invasions from the Kazakhstan steppe and the Taklamakan Desert, such as the Mongolian invasion (c. 1220-1450) and the Dzungarian invasion (c. 1720s-1750s). This suggests a Kyrgyz presence in the Jeti-Su (Zhetisu) area and the Ili Valley during the early Middle Ages, stretching along the northern part of the Silk Road.

    Since (1a) Altay has been shown above to belong to the Altai-Sayan taxon, (1b) Kyrgyz has been shown above to be closely related to Kazakh, (2) many or most of the Altay-Kyrgyz isoglosses are also found in Khakas and sometimes even Tuvan, and (3) few of these words are found in the closely related Kazakh language, we may conclude that most of these unexpected Altay-Kyrgyz isoglosses are late borrowings from Altay Turkic into Kyrgyz, made sometime between 1500 and 1900, that is, after the separation of Kazakh. The most likely historical event in this region during that period was the Dzungarian invasion. Therefore, we may suppose that there was an 18th-century military migration from the Altai to the Tian Shan Mountains, which brought these originally Altay lexemes into Kyrgyz, making it look more similar to Altay Turkic today than it actually is.
    In any case, we can infer that Kyrgyz is still more closely related to Kazakh than to any other Turkic language, whereas the Altay-Kyrgyz shared features may result from the secondary interaction between Altay and Kyrgyz.

    The Chagatai subtaxon (Uzbek/Uyghur) looks like Kypchak affected by Karakhanid
    The Chagatai subtaxon includes medieval Chagatai, modern Uzbek, Uyghur and their dialectal variations.

    The Chagatai subtaxon
    First of all, note that with a lexical similarity of just 86% in Swadesh-215 (obvious borrowings excluded), the Uyghur and Uzbek languages (and their internal dialects) must be about as close to each other as Turkish and Azeri, a common example of closely related languages both within and outside the Turkic group. Both languages received their respective names only in the 1920s, and had been known as Chagatai, Sart or Türki for most of the time before that. Therefore, from the linguistic perspective, they must belong to a special Chagatai subtaxon, often called Karluk in Baskakov's classification and those of his followers. However, as we have explained above, the exact origins and linguistic affiliation of the Karluks are very obscure, and it is far from clear in what relationship the Chagatai people stood to the Karluk tribes. Moreover, this kind of misplaced ethnonymic emphasis seems to leave the Chagatai language and its well-known relatedness to Uzbek and Uyghur unjustly forgotten, which may make one wonder what kind of Turkic language Chagatai actually was.
    For these reasons, the name "Karluk" for this taxon seems to be unsuitable and should probably be replaced with Chagatai.

    Chagatai-Uzbek-Uyghur geography

    Just like the neighboring Kyrgyz, the Chagatai-Uzbek-Uyghur languages originally occupied mountain territories along the Tian Shan range as well as suitable oases in the nearby deserts. The Tian Shan is one of the longest mountain ranges in Central Asia, forming part of the natural barrier between the Great Steppe and the Dzungarian Basin in the north and the Taklamakan Desert in the south.

    A topographic map of the Tian Shan Mountains [topomapper.com (2011)]

    Chagatai-Uzbek-Uyghur history

    The Chagatai Ulus was a Turko-Mongol Khanate inherited by Chagatai Khan (1183-1241) — the second son of Genghis Khan (1162-1227) — and ruled by his successors. The true founder of the Chagatai Ulus was Alghu, the grandson of Chagatai, who in 1261 established control over most of its territory but died in 1266.

    Chagatai Khanate [en.wikipedia.org (2011)]

    Giovanni da Pian del Carpine, who was passing through the Chagatay Ulus to the north of Tian Shan in 1245, describes the evidence of great destruction in the nearby western territory, left by the war with the Mongols:

    Moreouer, out of the land of the Kangittæ [= probably, the land of Kangly, the Ustyurt Plateau or nearby area], we entered into the countrey of the Bisermini [= apparently, a vague alias for Turkic-speaking Muslims, cf. dialectal Russian basurmany from musulmany "Muslims"], who speake the language of Comania [= by Cumania the author meant the land between the Kievan Rus in the west and the Volga River in the east, where Cuman-Polovtsian, or (Old) Kypchak, was spoken], but obserue the law of the Saracens [= Islam, Sharia]. In this countrey we found innumerable cities with castles ruined, and many towns left desolate. The lord of this country was called Soldan Alti, who with al his progenie, was destroyed by the Tartars [= the Mongols, Tataro-Mongols, Turko-Mongols, the Tatar tribes directed by the Mongols]. This countrey hath most huge mountains [= apparently, the Tian Shan]. On the South side it hath Ierusalem and Baldach [= Baghdad], and all the whole countrey of the Saracens [=Arabs, Muslims]. In the next territories adioyning doe inhabite two carnall brothers dukes of the Tartars [= Mongols], namely, Burin and Cadan, the sonnes of Thyaday [= Chagatai], who was the sonne of Chingis Can.
    [Frier Iohn de Plano Carpini, The long and wonderful voyage of Frier Iohn de Plano Carpini, (1245-46)]
    Political strife in the Chagatai Ulus had never ceased since the days of its formation. In 1346, a tribal chief, Qazaghan, from the Mongolic tribe of Qaraunas in Afghanistan and eastern Persia (Babur noted that they still spoke Mongolian in the late 15th century), killed the Chagatai Khan Qazan during a revolt. Qazan's death marked the end of effective Chagatayid rule over Transoxiana. As a result, the administration of the region fell into the hands of the local tribes of Turkic and Mongolic origin. Taking advantage of the disintegration, Janibeg Khan, the ruler of the Golden Horde from 1342 to 1357, asserted Jochid dominance over the Chagatai Khanate. [Note: It is believed that Janibeg's army catapulted infected corpses into the Crimean port city of Kaffa (1343) in an attempt to use the plague to weaken the defenders. Infected Genoese sailors subsequently sailed from Kaffa to Genoa, introducing the Black Death into Europe.] However, the Chagataids expelled his administrators after his assassination in 1357. By 1363, control of Transoxiana was contested by two tribal leaders, Amir Husayn (the grandson of Qazaghan) and the famous Timur, or Tamerlane. Timur eventually defeated Amir Husayn and took control of the state.

    It is conjectured herein that the devastation caused by the Mongol invasion, the desolation of towns, the ensuing internal turmoil, the subsequent intervention of the Golden Horde, the spread of deadly diseases and the later military conquest of the Golden Horde territories by Timur (Tamerlane) resulted in the Karakhanid language being supplanted by northern Turkic languages of the Great Steppe, such as Kyrgyz, early Kazakh, Karluk and Kypchak. As a result of these events, the Karakhanid language of the Tarim Basin lost its political dominance and cultural significance and was replaced by an unknown language of the Great Steppe, similar to Kyrgyz, through the continual movement of armies and several supposed demographic migrations of the 14th century.

    Consequently, the early Chagatai language that emerged during that period was essentially a type of mixed language with mostly Kyrgyz-Kazakh vocabulary and Karakhanid phonology.

    Chagatai-Uzbek-Uyghur phonology

    By taking a closer look at the actual lexical and phonological differences (see the table below), we may conclude that Uzbek and Uyghur phonology is similar to Karakhanid, e.g.:

    (1) the innovative *S- into y- mutation as in Orkhon-Karakhanid, e.g. Uzbek, Uyghur, Karakhanid yol "way" as opposed to Kyrgyz Jol, Kazakh zhol; Uzbek yurak, Uyghur, Karakhanid yürek "heart" as opposed to Kyrgyz Jürek, Kazakh zhürek;
    (2) the retention of -N- as in Karakhanid, cf. Karakhanid müNüz (horn), Uzbek mugiz, Uyghur müNgüz; Karakhanid süNük "bone", Uyghur söNäk (but Uzbek suyak), as opposed to Kyrgyz sö:k, Kazakh süyek;
    (3) the intervocalic -G- and the final -G, cf. Karakhanid taG, Uzbek tôG (mountain), Uyghur taG; Karakhanid baGïr, Uyghur beGir "liver". By contrast, the languages of the Great Steppe all have -w- and -w in these positions;
    (4) the initial b- instead of m- as in Karakhanid, cf. Karakhanid boyun, boyïn, Uzbek bûyin, Uyghur boyin "neck", as opposed to Kyrgyz moyun, Kazakh moyïn;
    (5) the retention of -vq in certain words, such as in Karakhanid yuvqa, Uzbek yupka, Uyghur yupqa "thin", as opposed to Kyrgyz Juka;
    (6) the lenition of -d-, -t- into -l-, as in Uzbek -lar, Uyghur -lar, -lêr, as opposed to Kyrgyz -lar, -ler, -lor, -lör, -dar, -der, -dor, -dör, -tar, -ter, -tor, -tör with fortified consonants, and similar fortition in the other languages of the eastern part of the Great Steppe.
    On the other hand, Kyrgyz phonological influence in Uzbek and Uyghur, and probably also an additional Kimak-Kypchak-Tatar influence in Uzbek, is quite evident, e.g.:
    (1) the innovative metathesis in Uzbek yamGir, Uyghur yamGur as in Tatar yaNgïr, Bashkir yamgïr, Nogai yamGïr, Kyrgyz Jamgïr, instead of the Old Uyghur yaG-mur from *jaG- "to fall, to rain" and *mur, the typical Proto-Altaic word for "water";
    (2) Uzbek mûgiz, Uyghur müNgüz, similar to Tatar mögez, Bashkir mögöð, instead of Karakhanid müNüz, Old Uyghur müyüz;
    (3) Uzbek sovuk, similar to Tatar sïwïq, Bashkir hïwïq, Nogai suwïq, instead of Karakhanid suGïq, but partly retained in Uyghur soGaq;
    (4) Uzbek yaproq from Kimak-Kypchak-Tatar *yapraq instead of Karakhanid yapurgak, but partly retained in Uyghur yapurmaq.

    The table below shows phonologically dissimilar words in Turkic languages of Central Asia. Note that Uzbek, Uyghur and Karakhanid are mostly colored dark red, marking Uzbek-Uyghur lexical and phonological relatedness to Karakhanid, with a few Kimak-Kypchak-Tatar borrowings in Uzbek.

    A List of Phonologically Dissimilar Basic Words in Central Asian Turkic Languages

    (KT. = Kazan Tatar, B. = Bashkir, N. = Nogai; Kg. = Kyrgyz, Kz. = Kazakh; the header of the sixth language column is lost and is marked "?")

    gloss | Turkmen | Kimak-Kypchak-Tatar | Kyrgyz-Kazakh | Uzbek | Uyghur | ? | Karakhanid
    not (adj., nouns) | däl | KT. tügel | emes | emas | emes | emes | ärmäs; ämäs (rare); täkül (cited only as Oghuz)
    horn | buynuz; shax | KT. mögez; B. mögöð | Kg. müyüz; Kz. müyiz | (mugiz); shox | müNgüz | moNïz | müNüz, muNuz
    bone | süNk | KT. süyäk; B. höyêk; N. süyek | Kg. söök; Kz. süyek | | | |
    cold | sowuk | KT. sïwïk; B. hïwïq | Kg. suuk; Kz. suïq | | | |
    liver | baGïr | KT. bawïr; B. bauïr | Kg. boor | zhigar | beGir | paGïr | baGïr
    mouth | aGïz | KT. awïz; B. auïð | Kg. ooz | oGiz | eGiz | aGïz | aGïz
    mountain | daG | KT. tau; B. tau | Kg. too | toG | taG | taG | taG
    neck | boyun | KT. muyïn; B. muyïn; N. moyïn | Kg. moyun | bûyin | boyun | poynï, puynï | boyin
    round | öwre | KT. yomrï; B. yomoro | Kg. Jumuru | yumaloq | yumlaq | | yumGaq
    rain | | KT. yaNgïr; B. yamGïr; N. yamGïr | Kg. Jamgïr | yomGir | yamGur | yaGmur | yaGmur
    small | kichi | KT. keche; B. kese | Kg. kichinekey | kichkina, kichik | kichik | kichi, kiJi | kichik
    sleep | u:qla- | KT. yokla-; B. yoqla-; N. uyqla-; CT. yuxla- | Kg. uktoo, uyku: | | | |
    leaf | yapraG | KT. yapraq; B. yafraq | Kg. Jalbïraq | (yaproq); barg | yopurmaq | yärfïx, yaRfax | yapurGaq
    dry | Gurï | KT. korï; B. qoro | Kg. qurGak | quruq | quruq | quru, qurï | quruG
    home | öy | KT. öy; B. üy | Kg. üy | uy | öy | oy | ev, äw
    seed | toxum | KT. orlïq; B. orlok | Kg. ürön | uruG | uruq | ashlïx | uruG
    bite | dishle- | KT. teshlê-; B. teshlê- | Kg. tishte- | tishla- | chishlä- | chishlï- | tishla-
    earth | topraG | KT. tufrak; B. tupraq | Kg. topurak | tuproq | topa | torïx, torax | tubra:q
    tree | aGach | KT. aGach; B. aGas | Kg. Jïgach | yoGoch "wood"; daraxt | däräx | ta:l | yIGach
    grass | ot | KT. ülên; B. ülên | Kg. ot, chöp | ut | chöp | chöp | ot
    thin | inche | KT. nechkê; B. nêðek | Kg. ichke | iNichka | inchikä | läshgi | yinchkä
    thin (2) | yuGa, 'uka | KT. yuka | Kg. Juka | yupqa | yuqqa | yoxba | yuvqa
    eat | iy- | KT. asha-; B. asha-; N. asha- | Kg. Je- | ye- | yä- | yï- | yä-
    belly | Garïn | KT. korsak; B. qorhaq | Kg. qarïn | qorin | qo(r)saq | xusax | qarïn

    Chagatai-Uzbek-Uyghur grammar
    However, in grammar the most essential features of Orkhon-Karakhanid are usually absent, and may only be occasionally present in Chagatai.
    (1) the lack of the archaic copula er-/är- (see below), and its reduction to e- in emes, as in the languages of the Great Steppe, etc.;
    (2) the lack of the typically Karakhanid usage of the 3rd person singular pronoun ol as a copula (see below), as in ul mêniN oGlïm ol, literally "he (is) my son-he", and its replacement with zero as in most Turkic languages;
    (3) the absence of the future tense with -Gay, -gey (see below) in Uzbek-Uyghur, though it is retained in written Chagatai as -Ge, as in Kazakh and Kypchak-Kimak;
    (4) no persistent usage of -mïsh- (replaced by -gan- as in other languages of the Great Steppe), though -mïsh- is still sporadically present in Chagatai and Uzbek dialects;
    (5) the absence of the archaic instrumental case ending -(n)ïn, as expected in the languages of the Great Steppe.

    By contrast, the typical grammatical features of Kazakh-Kyrgyz are present:
    (1) the typically Great-Steppe verbal ending -di/-dï in the 3rd person singular in the present and future tense, e.g. Uzbek borap-ti "s/he is going", bara-di "s/he will go", Uyghur yazi-du "s/he, they (will) write", cf. Kyrgyz bara-t "s/he will go", Kazakh bara-dï "s/he is going";
    (2) the usual Great-Steppe verbal ending -d-ik in the 1st person plural past tense (bordik "we went", keldik "we came"), though it seems to be used interchangeably with the Karakhanid -dimiz > -divuz in the Toshkent dialect of Uzbek (barduvuz "we went", keldivuz "we came"). The -d-ik type of suffix also seems to be occasionally attested in Karakhanid sources, but it was never original to the Orkhon-Karakhanid subtaxon.

    Chagatai-Uzbek-Uyghur lexis
    To which subgroup within the Great Steppe taxon is Chagatai-Uzbek-Uyghur related most?
    According to the lexicostatistical research (2012), the average lexical similarity of Uzbek-Uyghur to Kyrgyz-Kazakh is about 83%, to Tatar-Bashkir about 78%, and to Turkmen about 74% (with borrowings excluded), which marks Kyrgyz-Kazakh as the most closely related subtaxon.
    Uzbek-Uyghur and Kyrgyz-Kazakh seem to share a couple of presumably innovative isolexemes in Swadesh-215, apparently missing or atypical in other subgroups, such as:
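    Subgroup-to-subgroup figures like these are presumably averages over language pairs. The Python sketch below shows one way to compute such a figure from pairwise percentages; it assumes a simple unweighted mean, and the function name, the language labels A1, A2, B1, B2 and the percentages are placeholders, not the study's measurements.

```python
from itertools import product
from statistics import mean


def subgroup_similarity(pairwise: dict[frozenset[str], float],
                        group_a: list[str], group_b: list[str]) -> float:
    """Unweighted mean of the pairwise similarities between two subgroups."""
    return mean(pairwise[frozenset((a, b))] for a, b in product(group_a, group_b))


# Placeholder percentages for two two-language subgroups (not real data):
pairwise = {
    frozenset(("A1", "B1")): 80.0,
    frozenset(("A1", "B2")): 86.0,
    frozenset(("A2", "B1")): 82.0,
    frozenset(("A2", "B2")): 84.0,
}
print(subgroup_similarity(pairwise, ["A1", "A2"], ["B1", "B2"]))  # 83.0
```

    Keying the table by frozenset makes each lookup independent of the order in which a language pair is written; weighting each pair by the number of meanings actually compared would be a reasonable alternative design.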
    (1) Uzbek yiqilmoq, Uyghur yiqilmaq, Kazakh zhïGïlu, Kyrgyz zhïGïlu "to fall";
    (2) Uzbek dumaloq, Uyghur domlaq, Kazakh domalaq "round (such as wheel, lake, table)";
    (3) Uyghur chöp, Kazakh shöp, Kyrgyz chöp "grass";
    (4) Uyghur ugulumaq, Kazakh uqalau, Kyrgyz ukalo: "to rub";
    (5) Uzbek bu yerda, Uyghur bu yerde, Kazakh bûl zherde, Kyrgyz bul zherde "here", also at least in Altay bu d'erde and Turkmen bu yerde "here". This phrase, of course, is not necessarily original, and may be a natural independent formation in several Turkic subgroups, later spread by contact (for instance, probably into Turkmen, which often borrowed from the Great Steppe languages);

    These 5 words constitute merely about 2.3% (5/215) of Swadesh-215, so it is difficult to make any claims concerning a particular relatedness of Uzbek-Uyghur to Kyrgyz-Kazakh. The 5-point difference (83% - 78%) between the Uzbek-Uyghur/Kyrgyz-Kazakh and Uzbek-Uyghur/Tatar-Bashkir similarities may also be explained by errors in establishing exclusive isolexemes, the retention of archaisms, and some subsequent contacts in the Tian Shan. However, the general trend in the lexical analysis is to exclude the Kimak subgroup from the direct predecessors of Chagatai.
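    As a rough check of the proportions in this argument, using only the figures given in the text: the five exclusive isolexemes amount to a little over 2 percentage points of Swadesh-215, i.e. somewhat less than half of the 5-point gap between the two average similarities, so they alone cannot account for that gap.

```latex
\frac{5}{215} \approx 0.0233 \approx 2.3\%,
\qquad
83\% - 78\% = 5 \text{ percentage points},
\qquad
\frac{2.33}{5} \approx 0.47
```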

    This becomes even more evident if we take into consideration the closer geographic proximity of Kyrgyz-Kazakh to Chagatai-Uzbek-Uyghur, as opposed to Kimak. Consequently, we should infer that a certain tribe related to Proto-Kyrgyz-Kazakh (such as the Karluks) came into contact with Karakhanid by the 13th-14th century, resulting in the formation of early Chagatai, whereas the Kimak tribes could not have played any significant role in this interaction.

    Apparently, the Chagatai-Uzbek-Uyghur subgroup turns out to be a sort of secondary dialectal "creolization seam" resulting from linguistic contact between two Turkic subtaxa: a certain language of the Great Steppe closely related to Proto-Kyrgyz-Kazakh, and Karakhanid of the 10th-12th centuries. It seems that Karakhanid, which had been a local substratum in the Tian Shan Mountains, was overrun by this Proto-Kyrgyz-Kazakh tribal migration during the turmoil of the Mongol invasion of the 13th-14th centuries. Consequently, the Uzbek and Uyghur languages inherited the Great Steppe grammar and lexis, but acquired some superficial phonological features from Karakhanid.
    Approximate glottochronological calculations suggest that the separation of Proto-Chagatai and Proto-Kyrgyz-Kazakh (naturally, known simply as Kyrgyz at the time) must have occurred at least a few centuries before the Mongol invasion, c. 1000 AD, so it is difficult to attribute Proto-Chagatai directly to Proto-Kyrgyz-Kazakh; rather, its source could have been a different tribe, such as the Karluks, though the linguistic affiliation of the latter remains unknown.
    The formation of such "mixed" languages is a typical adstratal phenomenon occurring at the boundary of two linguo-geographical areas, sometimes involving strong influence from a third or fourth superstratal component (in this case, Arabic and Persian). This phenomenon certainly deserves a separate detailed consideration elsewhere. In the case of Proto-Chagatai, however, the Proto-Kyrgyz-Kazakh origins seem more clearly visible than any other component.
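    The text does not specify which glottochronological procedure produced the c. 1000 AD estimate. For orientation only, here is the classic Swadesh-Lees formula t = ln C / (2 ln r), where C is the fraction of shared cognates and r is the assumed retention rate per millennium. The default r = 0.86 (the value often quoted for the 100-word list) and the function name are assumptions of this sketch. Other methods, such as Starostin's modified formula with borrowings excluded, give different dates, so this is not a reconstruction of the author's calculation.

```python
import math


def divergence_millennia(shared: float, retention: float = 0.86) -> float:
    """Classic Swadesh-Lees estimate of time since a split, in millennia.

    shared    -- fraction C of the list still cognate in both languages
    retention -- assumed fraction r of the list each language keeps per millennium
    """
    if not (0 < shared <= 1 and 0 < retention < 1):
        raise ValueError("expected 0 < shared <= 1 and 0 < retention < 1")
    # Each language independently retains r per millennium, so C = r ** (2 * t).
    return math.log(shared) / (2 * math.log(retention))


# Sanity check: two languages that each kept r of the list share r * r after 1 millennium.
print(round(divergence_millennia(0.86 * 0.86), 6))  # 1.0
```

    Because t depends on ln C, modest changes in the shared percentage, or in which borrowings are excluded, can shift the resulting date by a century or more.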

    Additionally, Uzbek, and to a much lesser extent Uyghur, seem to have picked up certain lexical and phonological elements from Kimak-Kypchak-Tatar languages, but that process was less significant and did not affect the basic vocabulary of Uzbek and Uyghur to the same extent.

    The term Karluk should not be directly conflated with Chagatai, Uyghur and Uzbek as in Baskakov's classification. The Karluks seem to be an early Turkic tribe, most likely closely related to the modern Kyrgyz and Kazakh, that lived near the Tian Shan between the 8th and 12th centuries.


    The Kimak subtaxon

    The Kimak subtaxon, also sometimes designated herein as the Kimak-Kypchak-Tatar subtaxon, includes at least the following languages and dialects: Baraba, Sibir Tatar, Bashkir, Kazan Tatar, Mishar Tatar, Nogai, Kumyk, Crimean Tatar, and Karachay-Balkar (as a separate subtaxon within Kimak). It does not include Kyrgyz or Kazakh.
    Below, we will try to demonstrate that the above-mentioned languages share common innovative features.

    Kimak history and geography

    We know that the Tatars were first attested among other Turkic tribes in the Kul Tegin Orkhon inscription c. 732 in reference to the burial of Bumin Kagan in 552 with the following passage, "...Böküli Chölüg (=Korea), TabGach (=Chinese), Avar, Rome, Kirgiz, Uc-Quriqan, Otuz-Tatar, QitaN and Tatabi, this many people came..." [Türük Bitig, a site dedicated to Orkhon-Yenisei inscriptions].
    Even though it is possible that Tatars were mentioned even earlier in Chinese records, with such attestation as da-da or the like, hereinafter, we exclude any evidence from Middle Chinese because of its phonological vagueness.

    According to the genealogical legend recorded in detail c. 1030 by Gardezi in his work Zayn-al-Akhbar, where he probably cites the writings of ibn Khordadbeh (820-912), a Tatar leader once died, leaving two sons. The younger son, named Shad, was envious of his elder brother, who was the heir to the kingdom, and attempted to kill him. Shad therefore had to flee with his slave concubine into the steppe near the Irtysh, where they settled in a yurt and lived well for some time, hunting squirrels and ermines. Eventually, some of his Tatar relatives came and joined them. These were seven men named Imi, Imak (Yamak, Kimak), Tatar, Bayandur, Kipchak, Lanikaz and Aj(a)lad, all of whom also settled at the Irtysh and finally formed the seven tribes named after these founders. See [Gosudarstvo kimakov IX-XI vv. po arabskim istochnikam (The Kimak State of the 9-11th century according to the Arab sources), Kumekov, B.E.; Alma-Ata (1972)]. Most authors writing on the subject [Kumekov (1972), Marquart (1920)] date this legendary period to about 700 AD, which is also supported glottochronologically herein.
    By the time of Gardezi (c. 1030) and Mahmud al-Kashgari (c. 1070), the Kimak tribes were well established and described as well-known ethnicities. Mahmud al-Kashgari aptly cites an idiom, "The snake has seven heads", referring to the seven original tribes of the Kimaks. The Arab geographer Al Idrisi (1099-1165), who created the famous (though very convoluted by modern standards) map of the world, also mentions the existence of 16 Kimak towns, apparently located in the upper Irtysh basin and around Lake Zaysan [see below].
    Therefore, the Proto-Kimak-Kypchak-Tatar tribes must have been situated somewhere along the upper course of the Irtysh River, where they finally formed their own Kimak Kaganate.
    The difference between the attested ethnonyms Kimak (Kimek) and Imak (Yemek) is poorly understood. It has been hypothesized, for instance, that the original name could have been preserved in the ethnonym Kumyk, in which case the original reading could be *Qïmeq, which was later misread or incorrectly recorded in the Arabic script with a different consonant.
    In any case, the Kimak (Kimek) Confederation/Kaganate/Khanate was a prominent medieval Turkic state in the area of the middle and upper Irtysh River. It existed as the Kimak Kaganate from approximately 743 to 1050 AD, and as the Kimak Khanate until the Mongol conquest in the early 13th century. Even though the Kimaks were essentially nomadic, they also had many cities, mostly in the Irtysh basin, such as Imakiya, the summer seat of the Kimak kagan, which is said to have had markets and temples.
    It can be inferred from certain evidence that during the 9th century CE the Kimak tribes began to spread far to the west. For instance, since they are clearly attested (1) as "Bashkirt" near the Southern Urals and the Volga River by Ibn Fadlan in 921, and later (2) as "Tatar", "Bashkirt", "Kifchak", etc. by Mahmud al-Kashgari in 1073, the Kimak-Kypchak-Tatar tribes must have expanded beyond the Ural Mountains sometime between the 750s and 900s, most likely after the fall of the Göktürk-Uyghur Kaganate, that is, after the 840s.
    The period of the Kimak spread to the northwest is supported archaeologically: sometime between 700 and 900 CE, a wave of migrants entered the Baraba Steppe and displaced the earlier Potchev culture. The new culture was characterized by burials in mounds accompanied by a horse, a custom typically associated with the Kimaks and other Turkic tribes in general. [Arkheologija Zapadno-Sibirskoj ravniny (The Archaeology of the West Siberian Plain), Troitskaja, T.N., Novikov, A.V., Novosibirsk (2004), pp. 93-95].
    Moreover, we may suppose that this migration proceeded along the present-day border between northern Kazakhstan and Russia, because the Irtysh flows to the northwest, thereby providing a natural route for migrating in that direction. The migration along the Irtysh towards its confluence with the Tobol is also corroborated by the presence of the Baraba Tatars along the middle course of the Irtysh and of the Sibir Tatars near the Tobol-Irtysh confluence. These ethnic groups share many common features both with each other and with Bashkir and Kazan Tatar. Otherwise, had the migrating Kimak tribes turned west or southwest, they would have run into the Karluk-Kyrgyz territory in the south, mentioned by al-Idrisi and other historical sources. Also note that direct migrations across central Kazakhstan are particularly unlikely due to geographic difficulties, such as the desert climate and lack of water.
    By following the Tobol and Yaik Rivers, and/or crossing the Southern Urals, the Kimak tribes must have entered Eastern Europe and formed the ancestors of the early Bashkirs and Tatars. Following the upper Kama, some of the Bashkirs (as they were probably called at the time, judging by the ethnonym "al-Bashqird" attested by Ibn Fadlan in 922) must soon have reached the confluence of the Kama and Volga, the territory of Volga Bulgaria. These Bashkir tribes apparently became what we presently know as the Kazan Tatar people.
    The exact migration routes of the Nogai, Crimean Tatar, Proto-Karachay-Balkar and Kumyk tribes are harder to establish. At the time of their arrival in the Urals, all of these dialects were evidently indistinguishable, although they may well have belonged to different clans. Apparently, some of the Kimak tribes split off from the rest of the Kimak, Tatar and Bashkir tribes near the Southern Urals. These tribes migrated southwest, following the Ural River first towards the Caspian Sea and the Caucasus, and finally as far as the Kievan Rus.
    Some of the Kimak ethnic groups under consideration (at least Kazan Tatar, North Crimean Tatar, Caspian Nogai, etc.) seem to have emerged only after the expansion of the Golden Horde (1235-1502) and the formation of the localized post-Golden-Horde Khanates of the 16th century.

    Kimak-Kipchak-Tatar dialects of the Golden Horde
    The spread of the Kimak and Tatar dialects [Darkstar (2012)]

    It should be explained that the Golden Horde (cf. ordu, orda "army") is a historiographic name for the Kypchak-Tatar Empire (1226-1502) established after the Mongol invasion of Rus and ruled by the descendants of Genghis Khan. In Russian sources it was mostly known simply as the Orda, while Turkic and Persian sources of that period called it the (Ulug) Ulus "the (Big) Country" or named it after its ruler, such as the Ulus of Jochi. It was officially Islamized only in 1313. The Golden Horde exacted taxes from Russians, Armenians, Georgians, Circassians, Alans, Crimean Greeks, Crimean Goths, and other subjugated peoples along its borders. The Golden Horde capitals were Sarai-Batu, meaning "the Palace built by Batu Khan", and simply Sarai "Palace", both of which were located on the Volga and had many thousands of inhabitants but were sacked, destroyed and dismantled after the fall of the empire.

    The Golden Horde elite descended from the Mongol clans and originally used Mongolian as the main means of communication, whereas most of the common population was apparently of Kimak-Kypchak-Tatar origin. After the collapse of this powerful state by the end of the 15th century, several Kypchak-Tatar dialects and ethnic groups must have formed, which were mostly referred to vaguely as "Tatars" in Russian sources from the 16th century until the end of the 19th century. The word "Tatar" may still retain a slightly negative connotation in Russian and other languages affected by the expansion of the Golden Horde.
    It is conjectured herein that nearly all the Turkic languages presently located on the territory of the former Golden Horde (Kazan Tatar, Mishar Tatar, Bashkir, Karachay-Balkar, Kumyk, Nogai, North Crimean Tatar, etc.) are particularly close to each other, to the extent of mutual intelligibility, and bear distinct common innovations in phonology, grammar and lexis. Some of these innovations are also shared with the Oghuz-Seljuk languages, an interesting fact that deserves separate consideration. On the other hand, these innovations are mostly absent in Kyrgyz-Kazakh-Karakalpak, which apparently did not originally belong to the Golden Horde: Kazakh and Karakalpak formed only after the middle of the 15th century, when the Golden Horde no longer formally existed, while Kyrgyz was isolated too far away in the Tian Shan.
    Kimaks on the map of al-Idrisi
    The location of the Kimak Confederation is shown on the 12th-century atlas prepared by the Arab geographer Mukhamed al-Idrisi, known in Europe as the Tabula Rogeriana. Unfortunately, the Asian part of the map is extremely difficult to decipher. It has been studied by several authors, including [Kumekov, B.E. in Strana kimakov po karte al-Idrisi (The land of the Kimaks according to al-Idrisi's map)// Strany i narody vostoka, vol.10, 1971, pp.194-198 (in Russian)]. Judging by the phonetically garbled toponyms with their typical contractions and reduplications, such as "Dardan", "Lalan", etc., the Asian part was probably based on Chinese sources, that is, essentially on hearsay evidence given by medieval Silk Road merchants. Consequently, the map is not based on astronomical measurements and has no consistent scale or orientation, so identifying particular features can turn into a formidable task. However, the map features can be expected to match real-world geography to the extent that they would in a verbal account, and the toponyms look as if they had been reinterpreted from heavy Tatar into medieval Mandarin and finally into al-Idrisi's Moroccan Arabic.
    Tabula Rogeriana, the Land of Kimaks
    The Land of Kimaks in the Tabula Rogeriana [Darkstar (2012)]
    The map ends abruptly near Mongolia, where travel in the Altai-Sayan mountains was obviously impossible. Apparently, B.E. Kumekov made an error by identifying Lake Gagan with Lake Alakol. Everything becomes clear once one takes into account that, much like the letter g in English or Italian, the Arabic letter jim (gimel) can be pronounced either as /g/ or as /J, zh/, depending on the dialect (e.g. al-Idrisi's Moroccan Arabic), so the name should in fact be read as Jajan or even Zhazhan, which immediately recalls Lake Zaysan on the course of the Irtysh river. This allows us to identify the multiple Kimak settlements as lying on the shores of Lake Zaysan and along the Kara-Irtysh (Gamash), where they are supposed to be according to the legend. This territory is designated on the map as Ard-al-Kimakiyya (The Land of the Kimaks). In reality, it most likely extended further to the northeast than the map shows, but Chinese merchants rarely visited that area, so we see only its southern part.

    Kimak phonology, grammar and lexis

    Consequently, a matter that should be discussed in detail herein is the difference between the Kyrgyz-Kazakh, Altay and Kimak-Kypchak-Tatar subtaxa, which are frequently mixed up and intermingled in other classifications. How do these subtaxa differ? The following table shows that Proto-Kimak-Kypchak has undergone certain crucial transformations that made it phonologically very different from Kyrgyz-Kazakh and Altay, therefore they cannot be grouped together.

    The Comparison of Differentiating Features
    in the Languages of the Great Steppe

    Innovations in
    Typical Kimak-Kypchak-Tatar languages;
    [Alishina (1992)], [Akhatov (1964)], [Sibir Tatar lexicon was collected from a speaker on the net]
    see [Dmitriyeva (1981)]


    Common Kypchak-Tatar innovative features not shared with Oghuz (blue, green)
    The presence of the intervocalic -w- (either archaic or innovative)

    Kazan Tatar bawïr; Bashkir bawïr; Sibir Tatar pawïr; Nogai bavïr; Kumyk bavur; Baraba pawïr; bawïr
    as in Kimak-Kypchak
    as in Kimak-Kypchak
    The presence of the intervocalic -y- (either archaic or innovative)

    Kazan Tatar söyäk; Bashkir höyêk; Sibir Tatar söyak; Nogai süyek; Kumyk süyek; Baraba süök; süyek
    as in Kimak-Kypchak
    as in Kimak-Kypchak
    A different suffix used innovatively in "seed"; Karachay
    Kazan Tatar orlïk; Bashkir orloq; Sibir Tatar orloq; Nogai urlïk; Kumyk urluq; urïq; ûrïq
    ürön < Mong?; cf. uruq "kin"; üren < Mong?
    Also, in Khakas
    The use of *bek "very" in Kimak and *ötö in Kyrgyz-Kazakh; Karachay

    Kazan Tatar bik; Bashkir bik; Nogai bek; Kumyk bek; Baraba bek, päk; zhüde; ötö; ötö; sürekey; very (before adj.)
    *oltur instead of *otur; Karachay

    Kazan Tatar utïr-ïrGa; Bashkir oltur-urGa; Sibir Tatar utïr-ïu; Nogai oltïr-; Kumyk oltur-mak; Baraba oltïr, otïr; otïrï-u; otïr-u; otur-u:; to sit
    *ölön instead of *ot and *chöpKarachayhans,
    Kazan Tatar ölön; Bashkir ülên; Sibir Tatar ülên; Nogai ölên; Kumyk ot; Baraba öylän, ülän; shöp, ot; ot, shöp; chöp; ölön; grass
    *qart instead of *keri; Karachay
    Kazan Tatar qart; Bashkir qart; Sibir Tatar qart; Nogai qart; Kumyk qart; Baraba qart; Garrï; kêri; qarï; qarGan; old (person)
    *yïlGa instead of *özên


    Kazan Tatar yelga;
    Bashkir yïlGa; Sibir Tatar yïlGa; Nogay yïlGa suw; Kumyk özen, qoysuw
    Baraba yïlGa; özek; özen; özön; su:; river
    *asha- instead of *Je-

    Kazan Tatar ashau;
    Bashkir ashau; Sibir Tatar ashau, yeyü; Nogay yew, ashaw; Kumyk asha-maq
    Baraba asha- zheu,
    zheu zheshd'i:rto eat

    Common Kimak features also shared with Oghuz (blue)
    An innovative contraction in "leaf" and "earth" (as in Oghuz)
    (the hypothesis that these might be different words seems implausible)

    Kazan Tatar yafrak; Bashkir yaprak; Sibir Tatar yaprak; Nogai yapïrak; Kumyk yaprak;
    Kazan Tatar tufrak; Bashkir tupraq; Sibir Tatar tuprak; Nogai topïraq, topraq; Kumyk topuraq;
    Baraba yapraq zhalbïraq
    The innovative partial *S > y transition before open vowels (as in Oghuz); Karachay julduz (/J/ as in Eng.)
    Kazan Tatar yoldïz; Bashkir yondoð; Sibir Tatar yoltos; Nogai yuldïz; Kumyk yulduz; Baraba
    The -t-/-d- : -l-/-n- full softening in the verb suffix (as in Oghuz)

    Kazan Tatar yoqla-; Bashkir yoqla-; Sibir Tatar yokla-; Nogai uykla-; Kumyk uykla-;
    Kazan Tatar eshlêü; Bashkir eshlêü; Nogai êshlä; Kumyk ishle-;
    Baraba yoqla- (looks like a Kazan Tatar borrowing)
    Baraba êshlä-
    ishte:r; sleep (v)
    work (v)
    The -t-/-d- : -l-/-n- softening after consonants in the plural and accusative suffixes (as in Oghuz)

    -la, -lê
    -nu, -nü, -ni

    Kazan Tatar -lar, -lêr, -nar, -nêr (plural); Sibir Tatar -lar; Nogai -lar, -lêr (plural)

    Kazan Tatar -nï, -n (accusative); Nogai -nï, -ni, -n, -dï, -di, -tï, -ti; Kumyk -nï, -ni, -nu, -nü
    Baraba -lar, -nar, -lär, -när;
    -tar, -tär (Radloff)
    -nï, -ni, -tï, -di, -ti;
    -ïnï, -ini (Radloff)

    -lar, -ler,

    -ni, -nï, -di, -dï, -ti, -tï

    -lar, -ler,
    -dar, -der,
    -tar, -ter,

    -ni, -nï, -di, -dï, -ti, -tï

    -lar, -ler, -lor, -lör,
    -dar, -der,
    -dor, -dör,
    -tar, -ter,
    -tor, -tör

    -nu, -nü, -ni, -nï, -du, -dü, -di, -dï,

    -lar, -ler, -lor, -lör,
    -dar, -der,
    -dor, -dör,
    -tar, -ter,
    -tor, -tör

    -ni, -nï, -di, -dï, -ti, -tï

    the plural marker

    the accusative marker
    The -b/p : m softening after consonants (as in Oghuz); Karachay

    kellik mise?
    Kazan Tatar ütmês; Bashkir ütmêß; Sibir Tatar ütmês; Nogai ötpes; Kumyk yaxshï ötmeygen;
    Kazan Tatar barasïn mï?; Sibir Tatar para-mïsïn?; Nogai qördiN be?; Kumyk geleJek mi?;
    pu yiGit-mi?
    kildi ba?

    (Radloff recorded -b-/-p- in -pïn, -bïn "I am", which later mostly disappeared)
    keldi me?
    barasïN ba?
    keldi bi?
    dull (not cutting)
    question marker
    The loss of -Gaq (as in Oghuz); Karachay
    Kazan Tatar korï; Bashkir qoro; Sibir Tatar koro; Nogai kurï; Kumyk quru; qûrGaq; qûrGaq; qurGaq; qurgaq; dry
    The innovative voicing t- > d- in some positions (as in Oghuz); Karachay

    an archaism or back-mutation
    Kazan Tatar dürt; Bashkir dürt; Sibir Tatar türt; Nogai dört; Kumyk dört; Baraba tört, dört; tört; tört; tört; tört; four
    The lack of word-initial m-; Karachay

    Kazan Tatar borïn; Bashkir; Sibir Tatar poron; Nogai burïn; Kumyk burun; Baraba
    The transition of menen into belen; Karachay

    Kazan Tatar belen; Bashkir menên; Sibir belen, men; Nogai -men; Kumyk bulan; Baraba
    bilän, birlän, pilä, pirlän, pïlan, pirlä, pïla;

    mïnan, mïna, ma:n;

    menen, penen, benen; menen, -men, -pen;
    South Kazakh pïpnan, -mïnan
    menen; with
    The use of the *achak Future Tense; Karachay
    -rïk, -nïk, -lïk
    Kazan Tatar -achak; Bashkir -asaq; Nogai -ayak, -eyek; Sibir Tatar (absent); Kumyk -azhak, -ezhek; Crimean Tatar -aJak, -eJek; Baraba
    -är, -ïr
    -a-zhaq; -ar, -er, etc.
    -baq, -bek
    (But also -ayak, -eyek only in western dialects)
    -ar, -er, etc.; -ar, -er, -r;
    -at, -et
    Future Tense
    The use of *tegül (as in Oghuz) after adjectives and nouns; Karachay

    Kazan Tatar tügel; Bashkir tügil; Sibir Tatar tügel; Nogai tuwïl; Kumyk tügül; Baraba
    The absence of the word-final -e in "tiz", and the use of *tobuq
    tobuq; tiz (Balkar?)
    Kazan Tatar tez;
    Bashkir tubïq; Sibir Tatar tes, tubïq; Nogay tiz; Kumyk tiz(-ler), tobuq;
     dize tize;
    cf. tobïq "ankle"
    The absence of sizder, seler, etc. (as in Oghuz); Karachay
    Kazan Tatar siz; Sibir Tatar ses; Nogay siz
    Cuman-Polovtsian siz; Kumyk siz
    sis, silär
     sizsender, sizder, sizsizder, siler, sizler, sizslerleryou (plural)
    The innovation *nechik instead of *qanday

    Kazan Tatar nichek;
    Bashkir nisek; Sibir Tatar nitsek; Nogay qalay, Kara Nogay neshik; Kumyk nechik
     qalay qalay,
    The innovation *quyash instead of *kün

    Kazan Tatar qoyash;
    Bashkir qoyash; Sibir Tatar qoyash; Nogay kün közi; Kumyk gün(esh),
    kün, kuyas; kün; kün; kün; sun
    *burada < *bu yerde (as in Seljuk), along with the common and archaic *munda in most Turkic languages; Karachay

    Kazan Tatar biredê;
    Bashkir —; Sibir Tatar piretê, pï yertê; Kumyk ;
    bul zherde; bûl zherde; bul zherde; bu d'erde
    The use of the verb *is- in reference to "wind" (as in Oghuz)
    Kazan Tatar isu; Bashkir iSeu; Kumyk esh-, üfür-; Baraba
    soGu; soGu:; soq; to blow

    Other features
    The retention of the word final -w;
    Kazan Tatar sïw;
    Bashkir hïw; Sibir Tatar sow, sïw; Nogay sïw; Kumyk suw;
     suw su
    The retention of the word final -m in "I'd rather do" instead of -n in Kazakh-Kyrgyz;

    Kazan Tatar bara-yïm; Bashkir bara-inem (?); Sibir bara-yïn; Nogay bara-yïm; Kumyk bara-yïm; Baraba
    bara-yïn; bara-yïm (rare)
    bara-yïn; bara-yïn; I'd rather go
    *ne(rse) de bulsa in Kimak, but *bir nerse in Kazakh-Kyrgyz

    Kazan Tatar berär närsä; närsä dä bulsa; ni de bulsa; Bashkir berêy nêmê, nêmê bulha la; Nogay bir zat, ne di; Kumyk bir zat, ne busa da; Baraba
    ällä nemä
     bir närse,
    ne bolsa da
    bir närse bir nerse
    ne de,
    neni de;

    bir neme
    Kimak *kim-de vs. Kyrgyz-Kazakh *birö

    Kazan Tatar kem dä; Bashkir kem der; Nogay kim de; Kumyk kim busa da; bireu; bireu; bireu; birö; kem de; someone
    The retention of word-final -sh, with -s apparently being a local innovation that spread from Sibir Tatar and Nogai (?) into Kazakh
    Kazan Tatar tash;
    Bashkir tash; Sibir Tatar tos; Nogay tas; Kumyk tash;
    Baraba tash tas tas
    tash tash; stone
    Evidently, this table demonstrates the differences between the Kyrgyz-Kazakh and Kimak-Kypchak-Tatar subtaxa, with Karakalpak being something of a secondary seam between the two.

    Notes on other classifications and their views concerning Kimak
    The table also demonstrates why Kazakh should be placed in the same subtaxon as Kyrgyz, whereas (Caspian) Nogai, on the contrary, has no direct bearing on either of them and should be placed in the same subtaxon as Kazan Tatar, contrary to Baskakov's older classification. It is true, however, that Kazakh may exhibit some Kypchak features, but these seem to result from secondary contact: the Kazakh nomads covered a large territory of the Great Kypchak Steppe, which must inevitably have led to some intermingling between the early "Kazakh Kyrgyz" speakers and Kypchak speakers. Naturally, even more Kypchak influence may be found in Karakalpak, which is essentially something of a northwestern variety of Kazakh and is reported to have good mutual intelligibility with it.
    Also, consider again the above-mentioned lexicostatistical research by Dybo (2006), which shows the proximity of some of the Kimak-Kypchak-Tatar languages that were omitted or understudied in the present publication.
    Kypchak languages, Anna Dybo (2006)

    [Dybo, Anna, The Chronology of the Turkic Languages and the Linguistic Contacts of the Early Turks (2006)]

    A similar classification had also been proposed at least as early as Bogoroditskiy (Kazan, 1934), but was later superseded by Baskakov's classification. Bogoroditskiy's classification was based purely on geographic principles and rather correctly differentiated (1) the many Khakas dialects; (2) the many Altai dialects; (3) the Siberian Tatars like Baraba (an interesting point which has not been considered herein in detail due to lack of materials); (4) Tatar, Bashkir; (5) Kazakh, Kyrgyz, Karakalpak, Uzbek, Uyghur; (6) Seljuks and Oghuzes.
    However, Baskakov (1960), apparently incorrectly, regrouped Kyrgyz with Altai and Kazakh with Nogai, ignoring the obvious similarity between Kazakh and Kyrgyz, a view that prevailed for about half a century. Still, Baskakov's classification was the most detailed for its time.
    For the above-mentioned reasons, it is essentially incorrect to call both the Kyrgyz-Kazakh subtaxon and the Kypchak-Tatar subtaxon "Kypchak" (or "Kipchak"), as Baskakov and his followers tend to do. Initially, the term "Kypchak" seems to have referred only to a relatively small subgroup within the original Kimak confederation. At a later stage, during the 11-13th centuries, it possibly referred to Cuman-Polovtsian or to the Turkic languages in contact with the Kievan Rus or situated nearby (see, for instance, [Gosudarstvo kimakov IX-XI vv. po arabskim istochnikam (The Kimak State of the 9-11th century according to the Arab sources), Kumekov, B.E.; Alma-Ata (1972)]); therefore, the term "Kipchak" originally had a much narrower usage than was artificially attributed to it in Turkic historiography during the second half of the 20th century.

    The Kimak languages originally constituted a genetic unity that formed near Lake Zaysan and the upper Irtysh River by c. 700 AD. By c. 900 AD they must have spread to the west across most of the Great Steppe territory and by 1050 AD reached the Kievan Rus.
    The term "Kimak" (sometimes called "Kimak-Kypchak-Tatar" in this publication to maintain some compatibility with the older terminology) will hereinafter be applied only to those languages that share the features described in the table above and are therefore particularly close to Kazan Tatar, the latter being a typical example of the modern Kimak languages. Other typical instances include Bashkir, (Caspian) Nogai, North Crimean Tatar, Lithuanian Karaim, Crimean Karaim, Kumyk, possibly the extinct Cuman-Polovtsian, and some other closely related dialect-languages. Karachay-Balkar occupies a special position (see below). These languages exhibit innovative features which, as we shall explain in detail below, result from interaction with the Oghuz adstratum. On the other hand, Kyrgyz, Kazakh and Karakalpak are linguistically more archaic and belong to a different subtaxon of the languages of the Great Steppe.


    The relationship between Oghuz and Kimak

    The Kimak and Oghuz secondary contact
    Finally, we come to an interesting point: the Oghuz-Seljuk subtaxon seems to share some unique innovations with Kimak-Kypchak-Tatar, such as:

    (1) the incomplete J- to y- mutation, cf. Proto-Oghuz *Jedi "seven" attested by Mahmud al-Kashgari (see below), North Crimean Tatar Jedi, Kazan Tatar Jide, the intermingled allophonic use of J / y- in East Bashkir dialects, etc., as opposed to the clear-cut Karakhanid yeti;
    (2) sporadic t > d voicing, cf. Gagauz, Turkish, Azeri, Turkmen dört, Kazan Tatar dürt, Nogai dört as opposed to the Karakhanid tört;
    (3) the loss of -G / -Gaq as in Turkish kuru, Azeri Guru, Turkmen Gurï, Kazan Tatar korï, Nogai kurï, as opposed to the Karakhanid quruG and Kazakh qûrGaq;
    (4) a typical contraction in "leaf" cf. Turkish yaprak, Azeri yapraG, Turkmen yapraG, Kazan Tatar yafrak, Nogai yapïrak, as opposed to the Karakhanid yapurGaq;
    (5) the t : l correspondence in other morphemes called herein "light western Turkic consonantism", e.g. a "light" -l- in the plural marker: -lar in Oghuz-Seljuk, Kimak-Kypchak-Tatar, Chagatai-Uzbek-Uyghur, Orkhon-Karakhanid, Khalaj, as opposed to the "heavy" eastern -dar-/-tar- (for instance, as in Kazakh-Kyrgyz, Siberian, Baraba, Yugur). Curiously, however, Kazan Tatar also preserves -nar, -ner which can be seen as an intermediate form between -dar and -lar as far as the degree of lenition is concerned. The stronger -dar / -tar and other fortified suffixes are also preserved in East Bashkir (which was least affected by Kazan Tatar) and in Baraba. This may imply that the Kimak-Kypchak-Tatar languages originally retained some phonological fortition, typical of the Yenisei Kyrgyz clusters, whereas their historically recent lenition is probably acquired from Oghuz;
    (6) the use of *tegül instead of e(r)mes, cf. Turkish deGil, Azeri deyil, Turkmen del, Kazan Tatar tügel, Kumyk tügül, as opposed to the Karakhanid ermes and Kazakh-Kyrgyz emes;
    (7) the use of the *aJak in Future Tense, cf. Turkish -aJak-/-eJek-, Turkmen -Jak/-Jek, Kazan Tatar -achak-, Bashkir -asaq-, Nogai -ayak-/-eyek-, Crimean Tatar -aJaq-/-eJeq-, Kumyk -azhaq/-ezhek; also in Karakalpak in the Aral-Caspian region probably because of the Oghuz presence there;
    (8) the frequent use of -dïr/-tïr in the 3rd person singular, cf. Turkmen, Azeri, Turkish; Cuman-Polovtsian, Kazan Tatar -dïr/-tïr, etc., as opposed to its absence in Kazakh and Kyrgyz, at least as far as the copula construction is concerned (e.g. Ol qazaq "He is a Kazakh").
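    Item (5) above contrasts the "heavy" eastern plural allomorphy (-lar/-dar/-tar) with the "light" western one (-lar, with the intermediate -nar in Kazan Tatar). The following Python sketch illustrates that contrast with two simplified rule sets. It is an illustration only: the function names, the Latin transcription (a, ï, o, u, û as back vowels; e, i, ö, ü, ä, ê as front vowels; N for the velar nasal; zh as a digraph) and the consonant classes are assumptions based on standard textbook descriptions of Kazakh and Kazan Tatar, not on the data set of this study, and real usage has exceptions that the sketch ignores.

    ```python
    # Hypothetical sketch of plural allomorph selection in a simplified
    # Latin transcription (an assumption, not the author's notation).
    BACK_VOWELS = set("aïouû")
    FRONT_VOWELS = set("eiöüäê")


    def is_front(stem: str) -> bool:
        """Vowel harmony: the suffix vowel follows the last vowel of the stem."""
        for ch in reversed(stem):
            if ch in FRONT_VOWELS:
                return True
            if ch in BACK_VOWELS:
                return False
        return False  # no vowel found: default to back (an assumption)


    def kazakh_plural(stem: str) -> str:
        """'Heavy' eastern pattern (Kazakh-like): -lar/-ler, -dar/-der, -tar/-ter."""
        last = stem[-1]
        if stem.endswith("zh") or last in "lmnNz":
            onset = "d"  # after l, m, n, N (velar nasal), z, zh
        elif last in BACK_VOWELS or last in FRONT_VOWELS or last in "ryw":
            onset = "l"  # after vowels and r, y, w
        else:
            onset = "t"  # after voiceless consonants (and b, v, g, d)
        vowel = "e" if is_front(stem) else "a"
        return f"{stem}-{onset}{vowel}r"


    def tatar_plural(stem: str) -> str:
        """'Light' western pattern (Kazan Tatar-like): -lar/-lêr, but -nar/-nêr after nasals."""
        onset = "n" if stem[-1] in "mnN" else "l"
        vowel = "ê" if is_front(stem) else "a"
        return f"{stem}-{onset}{vowel}r"


    if __name__ == "__main__":
        print([kazakh_plural(w) for w in ("taw", "qazaq", "adam", "kün")])
        # ['taw-lar', 'qazaq-tar', 'adam-dar', 'kün-der']
        print([tatar_plural(w) for w in ("tash", "urman", "kön")])
        # ['tash-lar', 'urman-nar', 'kön-nêr']
    ```

    In this simplified model, the Kazakh-like function needs three consonant classes to choose the suffix onset, whereas the Tatar-like function needs only one (nasals), which mirrors the degree of lenition discussed in item (5).
    
    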

    On the other hand, despite this presumable relatedness, there is presently only poor mutual intelligibility between the modern Oghuz-Seljuk and Kimak-Kypchak-Tatar languages, with many differences in syntax, morphology and semantics. With an average similarity of 70% between Turkmen and the languages of the Golden Horde, the present-day distance between even the most archaic and easternmost Oghuz languages and Kimak-Kypchak-Tatar seems to be very considerable.
    For instance, with 65% similarity between Turkish and Tatar in the Swadesh-215 list (borrowings excluded), the actual difference in speech looks roughly as follows:

    Kazan Tatar Sin kaya barasïn cong? cf. Turkish Sen nerede gidiyorsun?, literally "You where going-are-you?";
    Kazan Tatar Salkïn su biregezche cf. Turkish SoGuk su verin, "Cold water give-please";
    Kazan Tatar Gailêbizde öch bala — min, apam hêm enem, cf. Turkish Ailemizde üch choJuk (var)—ben, ablam ve kardeshim, "Family-my three child — me, sister-my and brother-my".

    This does not mean, of course, that these two Turkic subgroups have nothing in common with each other; it is just that the described differences seem to be roughly consistent with at least 1500-2000 years of glottochronological separation, which makes a recent Oghuz-Kimak genetic unity unlikely.

    And indeed, as we conclude below, the phonology, grammar and particularly the vocabulary of the Oghuz languages correspond well with Karakhanid, which implies that the Oghuz languages originally belonged to the same stock as Orkhon Old Turkic, Old Uyghur and Karakhanid, and seems to rule out a close genetic relationship between Oghuz and Kimak. So where, then, do these shared Oghuz-Kimak features come from?
    We cannot suppose that these mutations occurred independently in both subtaxa, since the probability of several simultaneous coincidental mutations is far too small; therefore, a much more likely and interesting hypothesis is that these changes resulted from secondary contact and mutual intermingling, when at some point in time the speakers of Proto-Kimak-Kypchak-Tatar crossed the area where the early Oghuzes were located.

    The conclusion of close secondary relatedness between Kimak and Oghuz is in accordance with historical records stating that Seljuk's clan separated from the Transoxanian (= Aral-Caspian) Oghuz tribes in the northwestern Kazakhstan steppe, which seems to have traditionally belonged to the Kimak-Kypchak-Tatar or Karluk-Kyrgyz-Kazakh habitat. In other words, it is simplest to assume (by Occam's razor) that, since Oghuz and Kimak-Kypchak were geographically close, they might also have formed a linguistic area. Curiously, al-Kashgari claims that "Kirkiz, Kifzhak, Uguz, Tuxsi, Yagma, Zhikil [the latter three tribes, apparently located near the Ili river in the Tian Shan], Ugrak, Zharuk all have one pure Turkic language. Close to them are the dialects of Yamak [a Kimak tribe] and Bashkirt...", which evidently places Uguz in the same row as Kyrgyz and Kypchak, along with several lesser medieval Kimak-Karluk tribes.
    We also find multiple historical records mentioning that in the 10th century the Kimak tribes were allied with the Oghuz. The Arab geographer al-Masudi wrote c. 930 that all of them roamed along the Emba and the Yaik. Ibn Haukal c. 950 drew a map showing Kipchak-Kimak tribes pasturing together with Oghuz tribes in the steppes north of the Aral Sea. Al-Biruni c. 1000 noted that Oghuz tribes quite often pastured in the country of the Kimaks [en.wikipedia.org].
    Moreover, below we will consider a special hypothesis that suggests a cultural and linguistic exchange near Lake Zaysan.

    The hypothesis of Proto-Kyrgyz and Proto-Oghuz interaction
    We know from historical records that, starting from 552 AD, some of the Great Steppe tribes were subdued by the Göktürks, who were essentially speakers of Orkhon-Oghuz-Karakhanid. Presumably, the Göktürk dialect-language must have acquired a high sociolinguistic status in many Turkic-speaking societies of the time. We also know that Oghuz, which belongs to the Orkhon-Oghuz-Karakhanid grouping, and Kimak, which belongs to the Great Steppe grouping, share multiple similar phonological, lexical and grammatical innovations. Finally, we know that the Kyrgyz-Kazakh subgrouping (or Karluk-Kyrgyz-Kazakh subgrouping, as long as we assume that the Karluk tribes were close to the Kyrgyz tribes) is particularly close to Kimak.
    Consequently, we can infer that around 500-800 AD there occurred a strong linguistic exchange between the early Oghuz and Kyrgyz dialects, which could have resulted in the formation of Proto-Kimak. Moreover, the simplest and most probable hypothesis explaining the relatedness between Proto-Oghuz, Proto-Kimak, and Proto-Kyrgyz-Kazakh would be that the area of Proto-Kimak was originally just a transitional geographic area between early Proto-Kyrgyz-(Karluk) and Proto-Oghuz, where these two languages overlapped and intermingled with each other.

    The early distribution of Oghuz, Kimak, Kyrgyz tribes
    The map of the Oghuz and Kyrgyz-Karluk hypothetical interaction between 500-800 CE

    The plausible hypothesis would be that, initially, Proto-Kyrgyz-Karluk (or Proto-Kyrgyz) was probably a conservative Turkic language located north of the Irtysh, between the Irtysh and Ob rivers, essentially in the area known as the Baraba and Kulunda Steppe, also possibly including some areas of the Altai Mountains.
    The overlapping of Kyrgyz with the Oghuz area soon resulted in the formation of a new transitional dialect, which became known in history as Kimak. This Kimak area shared archaic linguistic features both with Kyrgyz-Karluk, on one hand, and innovative features with the early Oghuz, on the other.
    Furthermore, Oghuz too was affected by Kimak and Kyrgyz dialect-languages; it absorbed some of their elements, becoming part of the Great Steppe Sprachbund, thus deviating from its Orkhon-Karakhanid parent stem.

    On the other hand, the speakers of Kyrgyz-Karluk were largely unaffected by the Göktürk dialect-languages because they were buffered by the Kimak area. Consequently, they may have formed a linguistic refugium near the Altai Mountains. Afterwards, according to scanty historical evidence, the early Kyrgyz and Karluk languages seem to have formed as a result of a later migration from the Altai Mountains towards the Tarbagatai Ridge and the Zhetti-Su (the Seven Waters) region located between Lake Balkhash and the Tian Shan Mountains. This migration most likely occurred between 630 and 750 AD, thus creating the basis for the early Karluk and, probably, the Kyrgyz (of Kyrgyzstan) languages. It was perhaps the political turmoil in the Western Turkic Kaganate that allowed the Karluks to seize power in the Zhetti-Su area by about 766. In 840, there was likely a second wave of Kyrgyz migration to the Zhetti-Su (sources?) that ended the political domination of the Karluks and apparently brought the name "Kyrgyz" to present-day Kyrgyzstan.

    As the Western Göktürk tribes speaking a language similar to the early Old Uyghur moved back from Mongolia into the upper reaches of the Irtysh river c. 550-700 AD, they came into contact with the local western Proto-Kyrgyz tribes. This intermingling must have resulted in the formation of three local dialectal areas:
    (1) the Proto-Kyrgyz (possibly including Proto-Karluk) area that was almost unaffected by the Göktürk language and which ultimately led to the emergence of Karluk, Tian-Shan Kyrgyz, and finally, much later, after the 15th century, Kazakh and Karakalpak people;
    (2) the northern Proto-Kimak area that was strongly affected by Oghuz or Western Göktürk, but retained many older Kyrgyz elements, such as -w- in bawïr "liver" and -w in taw "mountain" (as opposed to -G- and -G in the oncoming Orkhonic (Oghuz) language), to name just the most typical ones;
    (3) the southern Proto-Oghuz area which acquired certain features from Kimak, but otherwise remained relatively unaffected, retaining many Orkhon-Karakhanid archaisms from an older period.
    In other words, the formation of the three subtaxa (Proto-Karluk-Kyrgyz-Kazakh, Proto-Kimak-Kypchak-Tatar, and Proto-Oghuz-Seljuk) could have been the result of a back-migration of Western Göktürks or Orkhon Old Turkic or Old Uyghur or Oghuz speakers into the Kazakhstan Great Steppe from the Dzungarian Desert, the eastern Tarim Basin or nearby regions, and their linguistic exchange with the local Kyrgyz or Karluk tribes.

    On the origins of the ethnonym Tatar
    As mentioned above, the ethnonym Tatar is clearly traceable to a certain tribe forming part of the Kimak confederation along the Irtysh River by c. 700 AD. Moreover, the legend of the Kimak origins implies that the ethnonym Tatar had existed even earlier, before the formation of the Kimaks.
    Again, we try here to exclude evidence from Middle Chinese records because of their ambiguity and frequent misinterpretation. According to the Chinese version, however, the word ta-da could initially have been a Chinese exonym applied to all the foreign tribes beyond the Great Wall, similar to the barbaroi of the Greeks; in that case, though, it is hard to explain how it came to be accepted as the name of an originally small Kimak clan.
    Moreover, and quite confusingly, the Secret History of the Mongols (1227) places the Tatars somewhere near the modern-day border of Buryatia and Mongolia, along the Onon River, and describes them as the eternal enemies of Genghis Khan's clan, who had waged war against his father and were finally exterminated by Genghis Khan. The History does not say which language they spoke or whether they were Turkic or Mongolic, but apparently they could say at least some Mongolian phrases. Judging by their location near Lake Baikal, we may suppose that these Tatars could in fact have been a lost offshoot of the Proto-Yakutic tribes that had integrated into the local Mongolic society and possibly adopted the Mongolian language.
    A more plausible hypothesis for the etymology of Tatar is based on the theory of Turkic patrilineal ethnonymy explained above. The theory suggests that the word Tatar may originate in a personal name or alias of the Tatar clan's progenitor, born at an unknown time but definitely before 500 AD.
    In any case, the actual use of this word has varied greatly throughout history, rising from limited regional usage to an all-encompassing Turkic and Mongolic exonym and then falling into disuse again.
    In 922, Ibn-Fadlan's "al-Bashkird" were already attested near their present-day location to the east and southeast of the Southern Ural Mountains, but there is as yet no direct reference to Tatars. Presumably, in the course of the 9th-10th centuries, as the Kimaks spread over the Great Steppe, the Kimak Tatars became the ruling clan among the Kimaks. One may therefore suppose that at the time the word Tatar gained the socially prestigious connotation of a leading clan's title, and that many Kimaks might have attempted to trace their roots specifically to the Tatars. That common usage could have lasted well into the Mongol period in the 13th century, so that eventually the Mongols themselves were frequently conflated with the Tatars.
    The latter point can be explained from a military perspective: the aristocracy of Mongolic descent made up only a small part of the Golden Horde population, at least in its later stages, and the Mongolic tribes had initially been far too few to conquer the enormous territory they acquired. It is therefore implausible that the Mongol generals managed without help from the locals; it is much more likely that they recruited the regional Turkic population into their armies, most of whom were evidently of Kimak-Kypchak-Tatar origin. The actual conquest and control of the land was thus probably achieved with the help of the Kimak tribes. However, few historical documents support this view.
    According to a different version, the name Tatar was introduced only during the Mongol period.
    In any case, in the 13th-15th centuries, the population of the Golden Horde was known as "Tatars" in Rus, parts of Central Asia and Europe; in Latin-speaking Europe the name was changed to "Tartars", apparently by association with the Greek Tartarus, which, according to Greek mythology, was the abyss and underworld so deep beneath the earth that an anvil dropped into it would take nine days to fall.
    The ethnonym was particularly widespread among the Golden Horde aristocracy, military and local officials [see The Great Russian Encyclopedia (2004)].
    Because the Golden Horde languages were only slightly differentiated, the Russians called all the post-Golden Horde peoples of the early 16th century Tatars.
    After the dissolution of the Golden Horde, the term must have acquired negative connotations, and many emerging ethnicities adopted other, newly formed names, such as Noghai (from the Noghai Khanate, named after a Mongol general), Mishar, Kazanly (from the Kazan Khanate), etc. For instance, referring to the 18th-19th centuries, Carl Ritter, citing the research of the German ethnographer Julius Klaproth (1783–1835), notes the following [Die Erdkunde im Verhältniss zur Natur und zur Geschichte des Menschen (Geography in Relation to Nature and the History of Mankind), written 1816–1859]:
    "But if you ask the so called Kazan or Astrakhan Tatar, if he is a Tatar, he will answer negatively, for he names his dialect 'Turki' or 'Turuck', not 'Tatar'. Being aware that his ancestors were subdued by Tatars and Mongols, he takes the word 'Tatar' as pejorative and meaning nearly the same thing as a bandit."
    In the time of Ivan the Terrible (1530-84), who moved the imperial frontier beyond the Ural Mountains, the ethnonym Tatar was presumably carried further into Siberia by Russian Cossacks. This is supposedly how it came to be applied to the Siberian and Baraba Tatars, and to the Altay peoples and Yenisei Kyrgyz tribes of the 17th century, though the presumed Russian origin of the Tatar self-designation among the Khakas and Altay Turks is disputable. In any case, until the beginning of the 20th century, the Altay-Sayan peoples were known under such names as Abakan Tatars, Chulym Tatars, Kuznetsk Tatars, and so forth.
    Moreover, until the 19th century, Siberia was labeled Tartaria (Magna) in Latin or Grande Tartarie in French on most maps (see, for instance, Nicolaes Witsen, Noord en Oost Tartarye..., 1672). Hence also the name of the Strait of Tartary between mainland Russia and Sakhalin Island, coined by La Pérouse in 1787, even though no Turkic peoples had ever lived there. In other words, Tartaria (Magna) was used in the same way as Siberia is today.
    During the reign of Peter the Great (1682-1725), when Turkology began to emerge as a distinct branch of science in the Russian Empire and Western Europe [see Baskakov, N.A. Vvedeniye v izucheniye tyurkskikh yazykov (An introduction to the study of Turkic languages), (1969); chapter The history of study of Turkic languages in Russia before the 19th century, p. 18], nearly all the known Turkic languages and dialects (other than Ottoman Turkish) became generally known as tatarskiye narechiya "Tatar dialects". In some cases, the term indiscriminately included Mongolic, Tungusic, Tibetic, Samoyedic and other completely unrelated Siberian ethnic groups.
    The Brockhaus and Efron Encyclopedic Dictionary (1906), widely popular before and even after the Russian Revolution, openly protested against the then-accepted terminology:
    "Tatars do not exist as a single ethnicity; the word "Tatar" is nothing but a collective nickname for a number of peoples of [sometimes] Mongolic, but particularly Turkic descent, speaking Turkic languages, and of Quranic affiliation. [...] From scientific perspective, the name of Tatar has presently been rejected when applied to Mongols or Tunguses, and retained only in reference to those linguistically Turkic ethnicities that form part of the Russian Empire, but excluding other Turkic nations with independent historical appellations (Kirigizes, Turkmens, Sarts, Uzbeks, Yakuts, etc). Certain scientists (Yadrintsev, Kharuzin, Shantr) have suggested to modify the appellation terminology of some of the Turco-Tatar ethnicities [...], for instance, by renaming Azerbaijani Tatars to Azerbaijanis, Altay Tatars to Altayans, etc., but that has not gained much acceptance, as yet [...]"
    As a result, the indiscriminate term tatarskiye narechiya "Tatar dialects", generally accepted in the 19th century, was soon supplanted by the names of specific languages that appeared during the post-revolutionary renovation of the 1920s-30s, though in some cases names such as Uzbek, Uyghur and Khakas seem to have been chosen by consensus or even made up off the top of someone's head. For some time, "Turkish-Tatar languages", "Turkish languages" and "Turco-Tatars" were used interchangeably as generic terms by various authors between the 1800s and the 1930s. After the rise of the Republic of Turkey (1923) and its frequent generalization of Türk as a comprehensive, far-reaching concept, the newly formed term tyurkskiye yazyki "Turkic languages" must have finally become widespread and generally accepted, even in reference to ethnic groups that never called themselves Turks. Nevertheless, the older usage in such phrases as tataro-mongoly "Tatar-Mongols" or tataro-mongolskoye igo "Tatar-Mongol yoke" (referring to the rise of the Golden Horde and its punitive raids against Rus) still commonly persists in Russian historiography.
    Apparently, the extensive use of the term Kypchak, popularized by Baskakov's classification (1950s-1980s), followed the same avoidance strategy, trying to get rid of the word Tatar. As a result, in certain contexts the two names became nearly synonymous, the former serving as a kind of euphemism for the latter.
    At the beginning of the 21st century, the name Tatar is retained only by the Kazan Tatars of Tatarstan (who sometimes object to its usage), the Crimean Tatars (persecuted between 1944 and 1967), the Siberian Tatars (of Tyumen /too-MEN/ and Tobolsk, whose language is poorly described in the scientific literature) and the Baraba Tatars (on the verge of linguistic extinction, often called simply "Baraba"). It is also acceptable as a generic self-designation among various Khakas and Altay Turkic ethnicities, and can sometimes be applied to other smaller and lesser-known ethnic groups, such as the Astrakhan and Lithuanian Tatars.

    Bashkir is closely related to Kazan Tatar
    Judging solely by phonology, a casual observer might conjecture that Bashkir is as strongly differentiated from the other Turkic languages as Chuvash or Sakha.
    However, on closer inspection, Kazan Tatar and Bashkir show a remarkable similarity of more than 95% in Swadesh-215. Lexical errors in the list are unlikely, given that it was compiled by proficient speakers as part of other Swadesh lists at wiktionary.org and then rechecked here.
    The few clear-cut lexical and semantic discrepancies found in Swadesh-200 are:

    Bashkir | Kazan Tatar
    tubïq "knee" | tïz "knee"; tubïk "ankle"
    tanau "nose" | borïn "nose"; tanau "muzzle"
    êsê(y) "mother" | ana
    nimê "what" | nêrse
    saN "dust" (rarely or formally tuZan) | tuzan
    alïS "far" | yeraq
    usually bïsraq "dirty" | shaqshï, kerle, pïchraq
    bïnda "here" | mïnda; biredê "here" (the latter obviously from Oghuz, cf. Azeri, Turkish burada)

    However, there may be more lexical differences that are less distinct, such as the ones connected with different semantic connotations of the same word, synonyms, slightly different phonology, etc.
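    As a rough illustration of the arithmetic behind such percentages, the sketch below computes lexical similarity as the share of list meanings expressed by cognate words in both languages. It assumes, purely for illustration, that the eight rows of the table above are the only mismatches in the 215-item list (the real count may be somewhat higher, as just noted); the names CLEAR_MISMATCHES and similarity_percent are hypothetical.

```python
# Meanings for which the table above lists clearly different Bashkir and
# Kazan Tatar words (illustrative assumption: these are the only mismatches).
CLEAR_MISMATCHES = ["knee", "nose", "mother", "what", "dust", "far", "dirty", "here"]


def similarity_percent(list_size: int, mismatches: int) -> float:
    """Return the percentage of list meanings that are cognate in both languages."""
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    if not 0 <= mismatches <= list_size:
        raise ValueError("mismatches must be between 0 and list_size")
    return 100.0 * (list_size - mismatches) / list_size


if __name__ == "__main__":
    n = 215  # size of the Swadesh-215 list mentioned in the text
    k = len(CLEAR_MISMATCHES)
    print(f"{k} mismatches out of {n}: {similarity_percent(n, k):.1f}%")
    # Even twice as many mismatches would keep the pair above 92%.
    print(f"{2 * k} mismatches out of {n}: {similarity_percent(n, 2 * k):.1f}%")
```

    With eight mismatches this gives about 96.3%, and even doubling the count gives about 92.6%, so the "more than 95%" figure does not hinge on a few borderline items.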
    Moreover, speakers of both languages report good mutual intelligibility, even though Bashkir phonology has developed some remarkable innovations; these, in any case, are hardly more pronounced than the differences between, say, northern British and American English. Curiously, unlike the English varieties, the odd phonology of Bashkir is hardly noticeable in real speech, and Bashkir generally has the same "sound" as Tatar, Kazakh and other languages of the Great Steppe, which is an interesting example of how misleading phonological observations alone can be.
    Moreover, in phonology, note the typical Tatar-Bashkir vowel mutation i > e, as in tel "tongue", bel- "know", ber "one", whereas e > i, e.g. Tatar and Bashkir it "meat". By the same token, u > o, as in Tatar toz "salt" (cf. Turkish tuz), whereas o > u, e.g. Tatar-Bashkir urman "forest" (cf. Turkish orman), qul "arm, hand", ut "fire", etc. These phonological mutations are quite distinctive among Turkic languages. The fact that they affect the vowel system points to a recent separation, since vowels tend to change faster than consonants.

    On the hypothetical origins of the ethnonym "Bashkort" and the early Tatar-Bashkir migrations

    The autonym Bashkort is often explained as Turkic bash "head" + Oghuz kurt "wolf", where kurt is a euphemism for "wolf" and originally meant "worm, bug". However, in Bashkir, qort actually means "larva", so the immediate meaning "head-larva" does not sound very elegant and raises questions about the origins of the ethnonym. Moreover, kurt in the meaning "wolf" is a purely Oghuz word, evidently with the original implication "a parasite that kills sheep"; it is also sometimes thought to be influenced by Persian and West Iranian gorg "wolf". The use of an Oghuz word instead of the native Bashkir büre (common in many Turkic languages) raises doubts about this etymology and runs into several difficulties. For this reason, a different explanation is suggested here.
    The Bashkort people have been mentioned in several Arab sources since c. 840; at that time, they were said to occupy the territory to the south of the Ural Mountains, from the Volga and Kama to the Tobol Rivers. Ibn-Fadlan clearly mentions certain "al-Bashkird" located in present-day Tatarstan near the Kama River as early as 922 ["We arrived in the land of the Turks called al-Bashgird... these were the most foul of all the Turkic peoples... when one of them meets a man, he cuts his head..."], as well as near the Emba River (to the south of the Urals) ["...to protect them (the carts) from the Bashkir(d)s in case they capture them..."].

    Hence, we can infer that the name originally meant "headcutter (-splitter, -buster)" > gangster > caravan robber, and could have been applied loosely to various robbers and cutthroats from Kimak-Kypchak-Tatar groups, but happened to survive into the modern period only among the Ural Tatars (Bashkirs). Again, the practice of killing strangers was widespread in many early societies; it is mentioned, for instance, for the neighboring Mordvins of the 13th century [Friar Julian (1235)]. The name could also have referred, as with many other Turkic clans, to the name or alias of a hypothetical clan progenitor, though a colorful exonym also seems likely. Since it was originally meant to imply force and fury, the name must long ago have become unacceptable in that sense; its primary meaning was presumably forgotten and reinterpreted through folk etymology, such as "head wolf".
    Moreover, there is a geographical discrepancy of about a hundred miles between the location of Ibn-Fadlan's al-Bashkird (in present-day Tatarstan and along the Yaik River) and modern Bashkortostan (situated to the northeast, in the Southern Urals). This indicates that Ibn-Fadlan, as well as other Arab historians and travelers, apparently used this ethnonym for what we would now call "Tatars", including the "Kazan Tatars of Tatarstan", the "Kypchak tribes" and the "southern and western Bashkirs". Another point is that the Kazan Tatars rather called themselves Bulgars, Kazans or just Muslims before and during the 19th century, with Russian Tatary and Latin Tartari probably being exonyms of the Mongol period, though the latter point is questionable.
    We may therefore assume that, at least before the 13th century, Bashkird was in fact a popular early ethnonym for most Tatar-Kypchak ethnicities from the Volga to the Ural Mountains, but it has been retained to the present day only in the Urals, which served as a kind of ethnonymic refugium.
    Additionally, the glottochronological separation of Bashkir and Kazan Tatar points to a much more recent physical separation, only as late as the 18th century. Before that period, Bashkir and Kazan Tatar must have been a single language. Even if that date is an exaggeration or a glottochronological error, Ibn-Fadlan's al-Bashkird people can hardly be directly equated with the speakers of the Uralic Tatar dialect of Bashkortostan. Linguistically, the al-Bashkird language must rather have been a predecessor of both Kazan Tatar and modern Bashkir.
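    For readers unfamiliar with how a lexical percentage is turned into a date, the following is a minimal sketch of the classic Swadesh-Lees glottochronological formula, t = ln(c) / (2 ln r), where c is the fraction of shared cognates and r is the retention rate per millennium. The constant r = 0.805 is Lees's classic value for the 200-item list and is only an assumption here; the procedure and constants behind the dates in this study are not stated in this section and may well differ, so the output is indicative only. The function name separation_years is hypothetical.

```python
import math

# Lees's classic retention rate per 1000 years for the 200-item Swadesh list.
# An assumption for illustration, not necessarily the constant used in this study.
R_200 = 0.805


def separation_years(shared: float, r: float = R_200) -> float:
    """Estimate years since two languages split (classic Swadesh-Lees formula).

    shared: fraction of list meanings with cognate words in both languages, in (0, 1].
    r: fraction of the list one language is assumed to retain per millennium, in (0, 1).
    """
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared must be in (0, 1]")
    if not 0.0 < r < 1.0:
        raise ValueError("r must be in (0, 1)")
    # Both languages lose vocabulary independently, hence the factor of 2.
    millennia = math.log(shared) / (2.0 * math.log(r))
    return 1000.0 * millennia


if __name__ == "__main__":
    # 95% shared vocabulary, roughly the Bashkir / Kazan Tatar figure cited above.
    print(f"about {separation_years(0.95):.0f} years")
```

    Under these assumptions, 95% shared vocabulary corresponds to roughly 120 years of separation, i.e. a split within the last few centuries. Note also that later borrowing between the two languages raises c and therefore shortens t, which is why secondary Tatar-Bashkir contact could make the split look more recent than it actually was.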
    Moreover, the habitat of the present-day Bashkir people coincides with the area of a South Ugric substratum (the South Mansi languages) and probably, at least to some extent, with Magna Hungaria, the supposed Proto-Hungarian Urheimat. As late as 1235, shortly before the arrival of the Mongols, Friar Julian reported that people in the area still spoke a form of Proto-Hungarian.

    He found them near the large river named Etil [supposedly, Ak-Etil or Belaya, the main river of Bashkortostan]... And to everything he wanted to tell them, they listened carefully, for their language was entirely Hungarian, and they understood each other... The Tatar people live near them. But the Tatars, when waging a war on them, could not overcome them, on the contrary, they were defeated in the first battle... In that country, the aforementioned friar found the Tatars and the messenger of their lord, who spoke Hungarian, Russian, Cuman, Teutonian, Saracyn, and Tatar [and who said that behind the country of Tatars there were the "big-headed" people who wanted to start a war, apparently the oncoming Mongols]
    [Relatio fratris Ricardi, De facto Ungarie Magne a fratre Ricardo invento tempore domini Gregorii pape noni (On the existence of Magna Hungaria discovered by Friar Ricardus), quoted from the Russian translation by S.A. Anninskiy (1940)]
    This implies that the unusual phonological features of Bashkir could in fact have emerged from Tatar-Hungarian intermingling, when the remaining South Mansi / Majar tribes (usually spelled Magyar in Hungarian) switched to Kimak-Kypchak-Tatar. The interaction between Proto-Tatar-Bashkir and Proto-Hungarian had probably begun very early, as implied by the very fact that the Hungarians were driven out of their homeland as early as c. 830 AD, supposedly because of warfare with the arriving Kimak-Kypchak-Tatar tribes. The interaction must have continued during the Golden Horde period (13th-14th centuries), when the Turkic and Mongolian languages acquired paramount importance. Nevertheless, this process remains poorly understood.

    On the Tatar-Bashkir interaction
    There appears to be a long history of Kazan Tatar, Mishar, Russian, Mari and other immigration to the Urals and Bashkortostan. There were various reasons for these migrations, one of the most significant being the strictness of feudal laws in Tsarist Russia and certain freedoms that the Bashkirs had been granted ever since they voluntarily joined Muscovy in 1557. Consequently, Bashkir was probably continuously influenced by Kazan Tatar, Russian and, to a much lesser extent, Kazakh, especially as far as the western and southern dialects and standard (literary) Bashkir are concerned, with the eastern dialect being less affected by external influence.


    Accordingly, present-day (Standard) Bashkir and Kazan Tatar can be viewed almost as two varieties of the same language, with a high level of mutual intelligibility. Naturally, when two languages are that close, the glottochronological law implies that their separation should be very recent; it can by no means have occurred earlier than the Mongol invasion of the 13th century. Moreover, we may suppose that there was strong subsequent interaction, with Tatar immigration to the Southern Urals resulting in secondary language contact, which makes Tatar and Bashkir look and sound closer than they should be historically, at least judging by the long presence of Bashkir near the Southern Urals and of Kazan Tatar near the Volga. The odd Bashkir phonology can most likely be explained by an unknown substratum in the Southern Ural area, such as South Mansic, Proto-Hungarian, western Samoyedic, or even Bulgaric.

    Karachay-Balkar, an atypical Kimak language

    Most features listed in the table above indicate that Karachay-Balkar (self-designation: Qarachay-Malqar) in the North Caucasus also belongs to the Kimak-Kypchak-Tatar branch. However, a good deal of evidence sets it apart as a distinctive and peculiar Kimak language of the Caucasus.

    Karachay-Balkar phonology
    In most respects, Karachay-Balkar shares the same typical innovations as other Kimak-Kypchak-Tatar languages, such as (see the table above):
    (1) a mixed -Ga /-a ending in the dative case;
    (2) the traces of an intervocalic sound in baur < *bawur "liver", süyek "bone";
    (3) a typical Kimak suffix in urluk "seed";
    (4) the softened (lenitive) -d- > -l- transition as in -jukla- "sleep", -la "the plural suffix".

    However, certain other features set Karachay-Balkar apart from the typical representatives of the Kimak-Kypchak-Tatar subtaxon, such as:
    (1) the retention of /J-/, /ch-/ (as shown above, initial J- / ch- is supposed to have been present in Proto-Turkic);
    (2) the retention of /t-/ in tört;
    (3) the retention of the -Gaq suffix, as well as a few phonological innovations, probably from the Circassian-Kabardian substratum;
    (4) the loss of -r in -lar / -ler.

    Karachay-Balkar grammar
    Among the most typical Kimak-Kypchak-Tatar grammatical features are:
    (1) the use of the future tense with the -rïk, -nïk, -lïk suffix, apparently akin to the Oghuz and Tatar -aJak, -eJek;
    (2) the use of tüyül instead of emes.

    Among the peculiar features is the formation of the present tense in Karachay-Balkar with the -dïr- suffix, which is also found in Altay-Sayan and Sakha:
    root + -a/-e + tur + personal ending = Present Continuous

    Karachay-Balkar lexis
    Lexically, Karachay-Balkar is almost equidistant from the other languages of the Great Steppe: it shares 78% with Tatar-Bashkir and about 78% with Kyrgyz-Kazakh (most likely due to the high retention of archaisms in Kazakh-Kyrgyz), 75-76% with Uzbek-Uyghur, 69% with Turkmen, and 65% with Standard Altay and Khakas (Swadesh-215).
    The lexicostatistical research suggests an early separation of Karachay-Balkar from the Kimak-Kypchak-Tatar stem, at roughly the same time as Kyrgyz-Kazakh, which is approximately consistent with the period of the Kimak Kaganate unity near the Irtysh. The glottochronological date of separation is about 730 AD, but this date may be too early, since the Circassian-Kabardian influence was not taken into account.
    Circassian and Kabardian are two closely related neighboring languages of the Northwest Caucasian family. Their presence seems to have resulted in certain Caucasian borrowings into the basic Karachay-Balkar vocabulary. At least the following Circassian words were found in Swadesh-200 (1%):
    Karachay-Balkar gakkï, Circassian qanqa "egg";
    Karachay-Balkar gokka, Circassian qeGeG, Kabardian GaGe "flower";

    Karachay-Balkar history
    The early history of Karachay-Balkar is poorly understood. A likely date for the arrival of Proto-Karachay-Balkar in the Northern Caucasus is c. 1000-1050 AD, when the Kypchak-Cuman-Polovtsian tribes began to infiltrate the Pontic steppes and eventually appeared near Kievan Rus. Historically, however, the Karachay-Balkar people are attested only since the Mongol invasion or even centuries later.

    The lexical differences set Karachay-Balkar apart from other representatives of the Kimak-Kypchak-Tatar subtaxon; however, certain grammatical and some phonological innovations are quite consistent with a Kimak origin of Karachay-Balkar. Generally, we may assume an early diversification of Karachay-Balkar from the Kimak-Kypchak stem c. 800-900 AD. This separation was probably unconnected with the Mongol invasion and the later expansion of the Golden Horde dialects, and occurred a few centuries earlier, when Karachay-Balkar moved towards the Caucasus. After settling near the Caucasus, Karachay-Balkar was affected by its North Caucasian neighbors, whose influence is now evident at least in the basic vocabulary.

    On the origins of Nogai

    Deconstructing Kazakh-Nogai direct genetic unity
    Much discussion has gone into contesting and deconstructing the supposed direct Kazakh-Nogai genetic unity, which people of Kazakh and Nogai descent often debate or even take for granted. The theory was advanced by Baskakov from the 1950s through the 1980s and, indeed, where there is smoke there is usually fire: certain features do indicate a particular proximity of Nogai to Kazakh, and the two languages are largely mutually intelligible.
    However, the problem is not as simple as it seems. Most of the arguments against this hypothesis have already been set out in the table above; nevertheless, the issue deserves more emphasis. The main criticism of Baskakov's hypotheses is that he did not differentiate between shared retentions and shared innovations, so most of his taxonomic suggestions were based merely on a few superficial phonetic and grammatical shared features, not necessarily innovative ones.
    In most of his works [Baskakov, N.A., Sovremennyje kypchakskije yazyki (The modern Kypchak languages), Nukus (1987)], [Baskakov, N.A., Vvedenije v izuchenije tyurkskikh jazykov (An introduction to the study of Turkic languages), Moscow (1969)], [Baskakov, N.A., Ocherki istorii funktsionalnogo razvitija tyurkskikh jazykov (Historical essays on the functional development of Turkic languages), Ashgabad (1988)], which tend to repeat the same early content, Baskakov fairly explicitly cites the following features in favor of a Nogai-Kazakh subgrouping:
    (1) ch > sh, as in Turkic *kach- > Nogai, Kazakh kash- "run away", and Great-Steppe, Altay *chach > Nogai, Kazakh shash "hair";
    (2) sh > s, as in Turkic *qïsh > Nogai, Kazakh qïs "winter", *tash > Nogai, Kazakh tas "stone";
    However, similar changes are also present in Sibir Tatar [By "Sibir Tatar" we always understand "Tobol-Irtysh Tatar", whereas Baraba and Tomsk are seen as separate entities], e.g. Sibir Tatar tas "stone", tsats "hair", Bashkir säs "hair";
    (3) The occasional retention of the "heavy" consonants in consonant harmony, e.g.
    Nogai qördiN be? "did you see?" and Kazakh Sen kinoga barasïn ba? "Are you going to the movies?"
    However, this feature is also found in 19th-century Baraba Tatar, cf. Kildi ba? "Did he come?", and, of course, in Kyrgyz Keldi bi? "Did he come?"
    Likewise, Nogai accusative -nï, -dï, -tï, -ni, -di, -ti and Kazakh -nï, -dï, -tï, -ni, -di, -ti are matched by Baraba -nï, -dï, -tï, -ni, -di, -ti, Bashkir , Kyrgyz -nï, -ni, -nu, -nü, -tï, -ti, -tu, -tü.
    It should also be explained that, in any case, Kazakh is "heavier" than Nogai, which in other cases prefers light western consonantism, e.g. Nogai taslar, as opposed to Kazakh tastar "stones".
    (4) The use of the -e-taGan participle, cf. Nogai kel-etaGan "the coming one" and Kazakh -atïn / -etin, etc. Not only do these suffixes have different phonological shapes in Nogai and Kazakh, they are also widely distributed among Kimak languages, cf. Baraba yör-ätiGän "the usually walking one", Sibir Tatar par-atïGan keshe "a walking man".

    And that is about all Baskakov mentions concerning the Nogai-Kazakh affiliation. At this point, it seems that the sh > s and ch > sh mutations are the only typical Nogai-Kazakh features that are difficult to account for.
    We can also add a few possible features of our own and explain why they do not fit the picture very well either:
    (4) Nogai -men for the instrumental case, as in Kazakh at-pen "with the horse", as opposed to Kimak *belen. However, this feature is not exclusive: it is also present in Sibir Tatar, cf. at-man "with the horse". The use of *menen and harmonically similar words can also be found in the southern dialect of Kazakh and in Kyrgyz, e.g. siz menen "with you", as well as in Bashkir menän and Baraba Tatar mïnan, mïna, ma:n. Thus this feature is hardly unique and is part of the local Sprachbund interaction, while the contraction *menen > men is also present in Sibir Tatar and Baraba. Moreover, judging by other evidence, it must even go back to Proto-Bulgaro-Turkic, so its taxonomic value is arguable.
    (5) The use of the archaic question word qalay "how" instead of *nichek as in Kazan Tatar, Kumyk, Sibir Tatar, Baraba Tatar. However, in the Kara Nogai dialect we in fact do have neshik "how?", therefore qalay may be an old retention in Ak Nogai.
    (6) The use of the specific Perfect Tense: Nogai barïp-pan "I have gone there" (Perfect) and Kazakh barïp-pïn "it turns out I went". However, a similar tense seems to exist in several local languages, cf. Sibir Tatar parïp-mïn "I used to go", Baraba Tatar alïp-mïn "It turns out I took", therefore it may be an old retention.
    (7) The active use of the continuous-type verb-ïp (-a) + yat- construction, as in Nogai bara yatïr-man "I'm going" and Kazakh bara zhatïr-mïn "I'm going", kelip turat "He's coming", okup zhatat "He's studying", etc. But this feature was also widely distributed in Baraba (verb-ïp + yat-, tûr-, ôtïr-, yör-, kal-, bil-, al-) and, of course, in Kyrgyz, e.g. bara zhata-bïz "We're going", as well as in many other eastern Turkic languages. Therefore, it may be an old retention, which survived in Nogai in a single construction, -a + yat-, though it used to have a much wider range in other eastern Turkic languages.
    (8) The use of the characteristic I-want-to construction, cf. Nogai Men onï körGïm keledi "I want to see him", Kazakh bar-Gïm keledi "I want to go", lit. "desire-my came". However, it also exists at least in Kyrgyz ayt-kïm kelet "I want to say" and Sibir Tatar parGï keleu "to want to go", not to mention the Kazan Tatar parallels, so it is hardly unique.
    (9) The use of Nogai yew "to eat" alongside ashaw of Kimak origin, whereas Kazakh has only zhew. However, this is an obvious archaism, and the two also seem to be used in parallel in Sibir Tatar ashau, yeü "to eat".
    (10) The use of Nogai yapïraq "leaf" and top(ï)raq "earth", as opposed to Kimak *yapraq, *topraq. Note that an older Baskakov vocabulary [Baskakov, N.A., Nogayskij yazyk i yego dialekty (The Nogay language and its dialects), Moscow (1940)] in fact gives topraq, so we may assume that both variants, topïraq and topraq, could be used interchangeably in Nogai. Cf. Kazakh zhalbïraq, topïraq. However, this is evidently a retention, as it is also preserved in Kyrgyz zhalbïraq, topuraq, Altay d'albïraq, Khakas tobïrakh and Kumyk topuraq.

    On the other hand, the more or less unique Kazakh grammatical features that one would expect to find in Nogai if the two languages were directly related are not shared with it, cf.:
    (1) Kazakh maGan, but Nogai maGa "to me", as in the other Turkic languages;
    (2) Kazakh bar-mak-pïn, but Nogai bar-ayak-pan "I have to go, I will go", as in other Kimak languages. This feature was noted by Baskakov, but for some reason it did not lead him to the appropriate conclusion.
    (3) Kazakh sizder bardï-Nïzdar, Kyrgyz sizder bardï-Nïzdar "you (pl.) went", but no such construction in Nogai.

    By the same token, none of the typical shared Kyrgyz-Kazakh words and collocations are shared with Nogai, even though they would be expected to be if the two languages were directly related:
    (1) Kyrgyz chöp, Kazakh shöp "grass", but Nogai ölên (as in other Kimak languages);
    (2) Kyrgyz-Kazakh öte "very", but Nogai bek (as in other Kimak languages);
    (3) Kyrgyz-Kazakh özen "river", but Nogai yïlGa suw (as in other Kimak languages);
    (4) Kyrgyz birö, Kazakh bireu "someone", but Nogai kim de;
    (5) Kyrgyz-Kazakh bir närse "something", but Nogai bir zat, ne di;


    We have found no unique Nogai-Kazakh innovations, which demonstrates that, despite all the apparent proximity, these two languages are of slightly different historical descent. As Proto-Nogai advanced from northern Kazakhstan and the Ishim Steppe towards the Emba and Yaik Rivers between the 9th and 15th centuries, it must have retained archaic features which are also present in 19th-century Baraba and modern Kazakh, though later secondary influence from Karakalpak and western Kazakh dialects cannot be completely excluded. Cf. the following retentions: (1) the retention of -b- in the interrogative particle, cf. Nogai qördiN be?, Kazakh barasïN ba?, Baraba kildi ba?; (2) the retention of -tï, -di, -dï, -ti in the accusative in Nogai, Baraba (Radloff) and Kazakh; (3) the retention of the 1st person singular -mïn in Nogai bara-man, Baraba alamïn (Radloff), Kazakh, Kyrgyz bara-mïn, Tyumen Sibir Tatar pelê-men, as opposed to Kazan Tatar bara-m.
    The only really interesting feature is the shared phonetic mutation ch > sh, sh > s, which is also partly shared with Sibir Tatar ch > ts, sh > s, and to some extent with Bashkir ch > s and even Turkmen s > ß (interdental or alveolar). This shows that before the arrival of the Great-Steppe tribes, there existed a common substrate in the Ishim-Tobol-Emba-Yaik area with a very specific, lenited pronunciation of sibilants. Judging by superficially similar transitions in Chuvash, cf. Turkic *chach vs. Chuvash s'üs'e "hair" and Turkic *tash vs. Chuvash chol "stone", we may tentatively assume that this substrate might have been of Bulgaric origin, or at least this possibility cannot be excluded. In any case, the possible existence of this substrate has no direct bearing on the supposed Kazakh-Nogai unity, which was the point of the discussion above.
    As can be seen from such languages as Bashkir and Nogai, despite some unusual phonological changes, there is no reason to place them into specific subgroups. Bashkir, for instance, closely matches Kazan Tatar in most respects except phonology, whereas Nogai matches Kazakh in nothing but phonology. These instances show that in closely related languages, taxonomic conclusions cannot be based upon superficial phonetic similarity alone, since such features may result from secondary mutual interaction or from interaction with a third-party language. Instead, a full analysis of grammatical and unique lexical innovations is required.

    The Oghuz-Seljuk subtaxon

    Oghuz-Seljuk is still a valid subtaxon
    The Oghuz-Seljuk subtaxon (traditionally named just Oghuz) includes such westernmost Turkic languages as several Turkmen dialects, Azeri, Qashqai, Turkish and Gagauz, and is characterized by the following distinctive features.

    Oghuz-Seljuk phonology
    In phonology, the Oghuz-Seljuk subtaxon is marked by the famous Oghuz voicing of initial consonants (t- > d-, k- > g-). Note, however, that the Oghuz voicing has never been consistent or comprehensive: as has been shown (at least) by A. Scherbak (1970) [cited in detail by Starostin in The Altaic Problem and the Origins of the Japanese Language (1991)], many words in Turkmen, Turkish and Azeri preserve word-initial k- or t-, a feature that seems to go back to the Proto-Oghuz stage, e.g. Turkmen towuq, Azeri toyuG, Turkish tavuk "hen"; Turkmen kim, Azeri kim, Turkish kim "who", etc. Moreover, many other Turkic languages exhibit context-dependent intervocalic voicing, e.g. Kyrgyz /maGa gelseN/ "if you come to me" (written as maga kelseN).
    Also see the phonological comparison with Orkhon-Karakhanid below.

    Oghuz-Seljuk grammar
    Several shared Oghuz-Seljuk innovations can be found in grammar, such as:
    (1) The full transition of -ga/-ge, -ka/-ke into -a/-e in the dative case;
    (2) The active use of -mïsh- in the evidential (reportative) mood, where it can attach to nouns and adjectives as the contracted form of imish, rather than only serving as a perfect participle or a past tense suffix. The latter use is also found in Sakha, where -bït- may denote the perfect tense, but not in Turkmen, which uses -a:n in the perfect tense, as do the languages of the Great Steppe. In any case, the active use of -mïsh- in these contexts appears to be a typical Oghuz feature.
    (3) The loss of m-/b- in the 1st person plural verbal ending -bïs / -mïs, hence Turkmen -ïs, Turkish -ïz and Azeri -ïk (where the original -ïs has been further replaced by the past tense suffix);
    (4) The frequent use of the synthetic Present Continuous Tense with -yor-: Turkmen -yar-, Azeri -yur-, -ir-, Turkish -yor-.

    Oghuz-Seljuk lexis
    A few examples of Oghuz-Seljuk lexical innovations are listed below. Most of them come from the Swadesh-215 list and therefore belong to the basic vocabulary. Note that even though some of these cognates may also be known in other Turkic languages, especially in dialects along the borders, the point is that the words in this particular phonetic shape, with this particular meaning and this particular lexical frequency, exist only in this branch of the Turkic languages, which supposedly demonstrates both its validity and its considerable separation from other branches. Most of these phono-semantic formations seem to be innovative.
    (1) Turkish bura-(da), Azeri bura-(da), Turkmen bäri < bu ara "this span, place" (also cf. Kazan Tatar bire-dê "here", which suggests that this word was borrowed into Kimak-Kypchak-Tatar), as opposed to *munda and *bu yerde in other western Turkic languages;
    (2) Turkish nere-(de), Turkmen nire-(de) "where" < ne ara "which span, place". Curiously, the word is also known in modern Uyghur as nerde "where";
    (3) Turkish chok, Azeri chox "many, very", Turkmen choq "a crowd", as opposed to köp in other western Turkic languages;
    (4) Azeri chaga, Turkmen chaga "child", Turkish chaga "baby", as well as Turkish choJuk ("child" < "piglet"), as opposed to bala in most other Turkic languages;
    (5) Turkish kök, Azeri kök, Turkmen kök "root";
    (6) Turkish ada, Azeri ada, Turkmen ada "island";
    (7) Turkish geche, Azeri geche, Turkmen giye "night". An archaism, judging by the fact that it exists in Chuvash as kas', which suggests that this might have been the original way to say "night", probably later displaced by tün in most Turkic languages after their separation from Bulgaric. It is also sporadically found in Karachay, Crimean Tatar (most likely from Ottoman Turkish), Uzbek and Salar, which seems to confirm that this word is an archaic retention;
    (8) Turkish dösh (colloq.), Azeri dösh, Turkmen dösh "breast", as opposed to emchek in most other Turkic languages (though also cf. Kyrgyz tösh "breastbone, sternum", Kazakh tös "breast", etc., so probably an archaism);
    (9) Turkish chekmek, Azeri chêkmêk, Turkmen chekmek "to pull", as opposed to variations with the tart- root in most other Bulgaro-Turkic languages;
    (10) Turkmen kütek, Azeri küt and Turkish kör (literally "blind" in Turkish, apparently with an irregular change) "dull (as of a knife)", as opposed to *otmes, *maka, etc. in other TL's;
    (11) Turkish köpek, Azeri köpêk, Turkmen köpek "dog", as opposed to the more archaic it in other TL's, which is less frequent in Oghuz-Seljuk. Essentially, this is an Oghuz-Seljuk word, though it is also found, much less commonly, in other borderline TL's;
    (12) Turkish genish, Azeri genish, Turkmen ginish "wide", with the -sh suffix. Note that Sevortyan's dictionary apparently cites Kyrgyz and Karakalpak incorrectly here: in Kyrgyz keNish means "widening" [see Yudakhin's dictionary of Kyrgyz], whereas Karakalpak naturally has keN, as do most other TL's outside this branch, such as Tatar, Bashkir, Karachay, Kazakh, Kyrgyz, Uzbek, Uyghur;
    (13) Turkish üflemek, Azeri üflümek, Turkmen üflemek "to blow (at smt.)";
    (14) Turkish dönmek, Azeri dönmêk "to turn (right, left, back)"; Turkmen dönmek "to return, turn back". Also cf. Tatar tün- "to turn over (upside down)", probably also present in other closely related Kimak-Kypchak-Tatar languages, but with semantic differences; however, the word seems to be originally Oghuz;
    (15) Turkish saG, Azeri saG, Turkmen saG "right (side)", probably from the original meaning "healthy" acc. to Clauson;
    (16) Turkish günesh, Azeri günêsh, Turkmen günesh "sunny (side), sun", as opposed to gün in most other Turkic languages, though the latter is used in Oghuz-Seljuk just as well;
    (17) Turkish düz, Azeri düz, Turkmen düz "smooth", as opposed to *tegiz in most languages of the Great Steppe. Also found in the Altay-Sayan languages with the same meaning, though this is probably coincidental;
    (18) Turkish kurt, Azeri kurd, Turkmen gurt, möjek "wolf" (as opposed to *böre in most other TL's), apparently originally a pejorative use of "a bug", that is, "a parasite that kills sheep", probably a folk Turkic elaboration of the Persian gurg "wolf"; first mentioned by Mahmud al-Kashgari c. 1073 as an Oghuz word. (Note that this word is not included in Swadesh-200.)
    As can be seen, there are multiple lexical differences which clearly set Oghuz-Seljuk apart from other Turkic languages, making these languages rather poorly intelligible to speakers of other subgroups.

    Oghuz history and geography

    The Oghuz people first appear in history after 605 or 630 AD [see S.G. Klyashtornyi, Stepnyye imperii: rozhdeniye, triumf i gibel (The Steppe Empires: birth, triumph and disintegration), Saint Petersburg (2005)]. They are clearly mentioned in the Orkhon inscriptions c. 720, which makes them, along with the Kyrgyz and Türük, one of the oldest historically attested Turkic confederacies of clans. In the Orkhon inscriptions, they are described as the Toquz Oghuz confederacy, which waged war with the Tür(ü)ks but was finally conquered and subjugated by them. Therefore, a clear ethnological difference between Türük and Oghuz is evident from the earliest historical records, which implies that the Oghuz tribes had most likely formed a distinct linguistic and ethnographic entity at least a few centuries earlier, that is, before 600 AD.

    Generally speaking, the mere ethnonymic coincidence between the Transoxanian Oghuz of the Aral Sea, the Üch Oghuz of Kyrgyzstan and the Toquz Oghuz of Mongolia [as perhaps first noted by Lev Gumilev] is not enough to establish the common descent of the respective languages. Verifiable linguistic and historical data are needed to show that any two of these ethnic groups are related; ethnonymic evidence alone may not be sufficient. However, it seems quite plausible to assume that the Transoxanian (or Aral) Oghuz tribes described by Ibn Fadlan in 921 were direct descendants of the Üch Oghuz tribes (?) in the Kyrgyzstan region and ultimately of the Toquz Oghuz of the Kul Tekin and Bilge Kagan inscriptions in Mongolia.
    Besides the famous Toquz Oghuz, there existed other ethnonyms of the same structure, such as Seqiz Oghuz [mentioned in the El Etmish Bilge Kagan inscription (759)], Toquz Tatar [idem], Üch Karluk [idem], etc. Therefore, the number before the ethnonym could easily change depending on political circumstances and apparently just denoted the number of clan units forming a tribal confederacy. Consistently mentioning this number before the ethnonym must have been important from a military and diplomatic point of view, because it showed how many tribal units participated in a given conflict, or how potent and influential they could be.
    First historically attested in the 720s in Mongolia as enemies of the Tür(ü)ks and vaguely located somewhere in the vicinity of the Kyrgyz and Tatar tribes, the Oghuz people are of rather uncertain geographical origin. However, they were probably situated somewhere near Dzungaria or to the north of it.
    In any case, the typically southern location of the Oghuz tribes may imply their connection to the southern Orkhon-Karakhanid branch of the Turkic languages.
    A definitive geographical attestation, by Ibn Fadlan in 921, already places the Oghuz tribes in the Aral-Caspian region, by which time they were partly Islamized. Ibn al-Athir, an Arab historian, explained that the Oghuz tribes had moved to the Syr-Darya River in Transoxania during the reign of the caliph Al-Mahdi (775-785). Therefore, their westward movement along the northern Tian Shan must have been rather quick on a historical scale. The Aral-Caspian locus implies that, from the 8th century onward, the Transoxanian Oghuz language must have existed in close contact with the languages of the Great Steppe and thus in no direct interaction with the languages located to the south of the Tian Shan, such as Old Uyghur, Karakhanid or Old Turkic.
    It is plausible to assume that the Transoxanian Oghuz tribes concentrated along the Aral Sea coastline and probably engaged in fishing to supplement their diet. There are certain (albeit rather inconclusive) historical, archaeological and journalistic reports of tomb ruins and towns, such as Yangï Kent, found in the vicinity of the Aral Sea, dating to the last millennium and probably belonging to Oghuz tribes, though archaeological issues are outside the main scope of the present work and cannot be discussed here at length.

    The Oghuz-Seljuk subtaxon seems to constitute a linguistically valid unity that was originally located most likely somewhere near Dzungaria and Lake Zaysan in the 730s, but migrated by 780 to the Syr-Darya River and then to the Aral Sea, apparently moving along the Tian Shan, the shortest and most suitable route that avoids the waterless areas of central Kazakhstan. By the 920s, the Oghuz people were clearly described in the region between the Aral and Caspian Seas.

    Seljuk as a subtaxon of Oghuz

    Secondly, there are certain innovative features that separate the Seljuk languages, such as Turkish, Gagauz and Azeri, from Turkmen, which makes it necessary to distinguish a Seljuk subtaxon from the rest of the Oghuz subgrouping. For this reason, we normally use the term Oghuz-Seljuk instead of just Oghuz to stress the composite nature of this subgroup.

    Seljuk lexis
    The following lexemes in Swadesh-200, taking into consideration their exact meaning and phonological shape (not to be confused with "cognates", which is often understood as a vaguer term), are absent in Turkmen, making Turkish and Azeri particularly close to each other:
    (1) saymak "to count (numbers)", cf. Turkmen sanamak "to count" and saymak "to believe, think";
    (2) silmek "to wipe (dust)", cf. Turkmen süpürmek;
    (3) baGïrsak "intestine (gut)", an innovation, cf. ichege in most Turkic languages;
    (4) bura-da "here", as opposed to Turkmen bu yerde, mïnda, shu tayda, etc.;
    (5) ora-da "there", as opposed to Turkmen ol yerde, ol tayda;
    (6) her shey "everything" (from Persian), as opposed to Turkmen hersi "every", hemme, barï "everything";
    (7) Turkish chok, Azeri chox "much, many; very", an innovation, as opposed to köp in Turkmen and most languages of the Great Steppe;
    (8) düshünmek "to think", an innovation, as opposed to Turkmen düshünmek "to understand, know";
    (9) vurmak "hit", with the innovative /v-/, as opposed to *ur- in most Turkic languages;
    (10) Turkish kadïn, Azeri qadïn "woman", probably a retention, instead of heley, ayal (from Arabic) in Turkmen and many languages of the Great Steppe;
    (11) Turkish anne, ana, Azeri ana "mother", a retention (in Turkmen ezhe is more common, ene usually means "grandmother", sometimes "mother");
    (12) Turkish orman, Azeri orman (poetic), usually meshê "forest", a retention (cf. Turkmen tokay, zheNNel);
    (13) The absence of the verb aytmak "to speak, talk", which is common in most languages of the Great Steppe, but which acquired a different unrelated meaning in Seljuk;
    (14) Turkish uyumak "to sleep", Azeri uyumêk "to fall asleep", a retention, cf. Turkmen uklamak, Uzbek uxlamoq, Uyghur uxlimaq "to sleep";
    (15) Turkish onlar, Azeri onlar "they", but olar in most other languages from Turkmen to Tuvan;
    (16) Turkish kïsa, Azeri kïsa "short", but qïsqa in most other languages from Turkmen to Tuvan;

    In the same way, outside Swadesh-200:
    (17) Gagauz ev, Turkish ev, Azeri ev "home", a retention, as opposed to öy in most languages of the Great Steppe;
    (18) Turkish olmak, Azeri olmêq "to be", an irregular phonetic innovation, as opposed to bol- in most languages of the Great Steppe.

    Lexicostatistically, the lexical overlap between Turkish and Turkmen is a rather modest 74% (Swadesh-215, borrowings excluded), which more or less conforms to the early separation of the Seljuk clan from the Oghuz tribes c. 985. Accordingly, the Turkish-Azeri lexical overlap is considerably higher (86%).

    Seljuk history and geography
    The split of the Seljuk clan from the Oghuz confederacy in 985 and the formation of the Great Seljuk Empire by Tughril Bek in 1037, which must have resulted in the emergence of the Seljuk languages, are well known from history, so the early diversification of the Aral Oghuz tribes into the Turkmen and Seljuk predecessors should not raise much doubt.

    Based on lexical and historical evidence, we may conclude that Turkish and Azeri form a separate Seljuk subtaxon within the Oghuz languages.

    Oghuz-Seljuk is indirectly related to Orkhon-Karakhanid
    At first glance, the Oghuz-Seljuk languages seem to share a number of linguistic features with the Orkhon and Karakhanid languages, but most of these seem to be archaisms rather than innovations. Therefore, there is little evidence that would clearly demonstrate the direct descent of Oghuz-Seljuk from the Orkhon-Karakhanid subtaxon, although a certain proximity is immediately obvious. Naturally, some of these features are also found in Uyghur and Uzbek, which inherited certain features from Karakhanid, so examples from these languages are sometimes listed below as well, even though they belong to the Great-Steppe subtaxon.

    Oghuz, Karakhanid and Orkhon phonology
    (1) the presence of the intervocalic -G- and the word-final -G, as in Turkmen baGïr, aGïr, Uyghur beGir, eGir, Uzbek (none), oGir, Karakhanid baGïr, aGïr "liver", "heavy"; Turkish, Azeri, Turkmen daG, Uzbek, Uyghur, Karakhanid taG (either an archaism or an innovation);
    (2) a typical sonorization pattern as in *sekkiz, *doquz (rather an archaism), as opposed to the Kimak-Kypchak-Tatar *segiz, toGuz;
    (3) the retention of the nasal -N- as in Azeri sümük, Turkmen süNk, Uyghur söNek, Orkhon Old Turkic, Karakhanid söNük (probably an archaism);
    (4) the lenition -d-, -t-, -l- > -l- as in -lar, -ler; this feature could rather be called light Turkic consonantism. It is also shared by Kimak-Kypchak-Tatar (see below), especially by the languages to the west of East Bashkir, Baraba, etc. This feature is most likely an old innovation that spread from Oghuz to Kimak-Kypchak-Tatar when they were in contact;

    On the other hand, the Oghuz-Seljuk languages exhibit certain features which clearly differentiate them from Karakhanid and Old Turkic. Mahmud al-Kashgari (1073) cited over 200 Oghuz-specific words and a number of classical Oghuz phonological mutations. These mutations, present as early as the 11th century, allow us to distinguish the medieval and modern Oghuz languages from Karakhanid:
    (1) m- > b- (as in "I"); (2) t- > d- (as in "camel"); (3) b > v (as in <äv> "house"); (4) -G- > -0- (as in > "throat", > "going"); (5) -D- > -y- (as in <äyïg> "bear", "birch" with the loss of -ð- as opposed to the Karakhanid ).

    As a result, al-Kashgari (1072) described Oghuz (whatever he meant by this term) as a dialect quite different not only from Kypchak, but also from the "normal" and "pure" Turkic, which to him naturally was Karakhanid; this implies a rather early differentiation between the Oghuz and Karakhanid languages.

    Oghuz, Karakhanid and Orkhon grammar
    (1) the frequent use of the suffix -mïsh in the past tense in Oghuz and Orkhon-Karakhanid. The -mïsh marker as such, when used in the past tense, is definitely an archaic retention, as it is likewise known from Sakha in the form of -bït-, -mït-, -pït-. Besides, even though -mïsh- is no longer used in modern Kimak-Kypchak-Tatar, it was used as a past tense marker in Cuman-Polovtsian. It also seems to occur occasionally in Chagatai.
    (2) the retention of siz "you (pl.)"; that this is an archaism is evident from the Chuvash esir alone;

    Oghuz, Karakhanid and Orkhon lexis
    Most of the Oghuz-Seljuk-specific words can in fact be traced in Karakhanid sources [see Drevnetyurkskiy slovar (The Old Turkic dictionary), Editors: V.M. Nadelyayev, D.M. Nasilov, et al., Leningrad (1969)].
    (1) Oghuz *el (hand), Karakhanid, Old Uyghur eliG (also in Chuvash, Sakha, Yugur); this word is not shared by Uzbek, Uyghur, Kimak-Kypchak-Tatar;
    (2) Oghuz-Seljuk *burada, *orada, *nerede; also cf. Modern Uyghur nerde [ne:de];
    (3) Oghuz-Seljuk choq "much, very", Karakhanid choq "much, very" (attested circa 1070);
    (4) Oghuz-Seljuk kök "root", Karakhanid kök "root" (1070);
    (5) Oghuz-Seljuk geche, Karakhanid kechê "night" (1070);
    (6) Oghuz-Seljuk dösh "breast", Karakhanid tösh (1070);
    (7) Oghuz-Seljuk chek- "to pull", Karakhanid chek- "to pull; tie" (1070);
    (8) Oghuz-Seljuk köpek, Karakhanid köpêk "dog" (1070);
    (9) Oghuz-Seljuk günesh "sun", Old Uyghur (?) (attested in the Irq Bitig) künêsh;
    (10) Oghuz-Seljuk düz "smooth", Orkhon Old Turkic (735), Karakhanid (1070) tüz;
    (11) Seljuk ev "home", Karakhanid (1070) ev;
    (12) Seljuk uyu- "to sleep", Karakhanid (1070) uDï-;

    The retention of many Orkhon-Karakhanid archaisms in Oghuz-Seljuk evidently indicates the relatedness of Oghuz to the Orkhon-Karakhanid subtaxon at the lexical level.

    Oghuz, Karakhanid and Orkhon history and geography
    Curiously, on the basis of historical sources, S.G. Klyashtorniy describes the Toquz-Oghuz tribes as a group that split off from the Uyghur tribal confederacy.
    In 605, [...] the Uyghur leader took his tribes to the Khangai Mountains, where a separate group was created, known in the Chinese historiographical sources as "the nine tribes". In the Orkhon inscriptions, this group was named Toquz-Oghuz. [Stepnyye imperii: rozhdeniye, triumf i gibel (The Steppe Empires: birth, triumph and disintegration), Saint Petersburg (2005)].
    Therefore, we may even conclude that Oghuz is nothing but a different pronunciation of Uyghur, which can easily be explained by the widespread use of the liquid affricate in Mongolian (and most likely in the nearby early Turkic languages and dialects), where /r/, /l/, /s/ and /z/ are pronounced as mere allophones of the same phoneme. In other words, it is not even necessary to invoke evidence from the Bulgaric languages, where the /z/ to /r/ transition is commonplace; the local Mongolic data alone provide enough support for this suggestion, as the -z to -r mutation may have arisen from incorrect Mongolic-based translations or transcriptions, from Sprachbund phonology or, finally, from remnants of genetic Bulgaro-Turkic relatedness. In any case, the hypothesis that Oghuz and Uyghur may originally have been the same word seems quite plausible, albeit not clearly demonstrated.
    Therefore, historical records suggest that the earliest Oghuz tribes must have been located somewhere between the Tarim Basin, Khangai Mountains and Dzungaria, probably near the Mongolian Altai and Dzungarian Gobi.
    From a geographic perspective, we may assume that Proto-Oghuz must originally have been a Dzungarian variety of Orkhon-Karakhanid, which initially moved towards Mongolia but either stopped midway in Dzungaria or even turned back from Mongolia towards the Altai and/or Mongolian Altai Mountains. This Proto-Oghuz backwave probably occurred by the 6th century, during the initial stages of the rise of the Gökturk Kaganate. As a result, the Oghuz superstratum apparently traveled back through the Zaysan Passage towards the Irtysh River, where it must have run into the Kyrgyz tribes, or the speakers of various Kyrgyz-Karluk dialects.
    On the one hand, the Orkhonic features in Oghuz-Seljuk are remarkable, and Oghuz seems to be clearly related to Karakhanid and Old Uyghur, as it shares both archaic retentions and innovations with them and even bears nearly the same name. Moreover, historical sources seem to support the split of Oghuz from Uyghur circa 605. On the other hand, the phonological changes in Oghuz as compared to Karakhanid, mentioned as early as 1073, should have taken some glottochronological time, probably consistent with about 500 years of separation. Therefore, we should conclude that Oghuz was not a direct Karakhanid offshoot, but rather its sibling, which had separated from the Old Uyghur stem. Consequently, Oghuz was a different branch of the Orkhon-Karakhanid dialects that must have traveled a different geographic route from the Altai region, without intermingling with the Karakhanid and Kara-Khoja dialects of the Tarim Basin. As described above, the only alternative route available lies to the north of the Tian Shan mountains, and indeed, we know from historical records that this route was explored by the Gökturks as early as the 600s-700s AD. We also know that the Oghuz tribes must have migrated from the Irtysh to the Syr-Darya River along this path somewhere between 730 and 780 AD.
    Therefore, based on historical and linguistic evidence, we conclude that Oghuz diversified from early Old Uyghur and Karakhanid in Dzungaria by c. 600 AD, and then some of the Oghuz tribes traveled along the northern Tian Shan towards the Syr-Darya River c. 750 AD.

    Notes on the confusion about y-/J- in Oghuz and Kimak
    Before we proceed any further, we should consider the controversy related to the "flickering" pronunciation of the famous Turkic initial J-/y-, which becomes particularly unstable in the Kimak-Kypchak-Tatar subtaxon. [We should remind the reader again that /J-/ herein transcribes a consonant approximately similar to the English j in jam.] As we mentioned in the beginning, Proto-Kimak partly lost its original Proto-Kimak-Kyrgyz word-initial *J-, which began to mutate into y-, although this transition has never been consistent throughout the Kimak languages. For instance, *J- survives in Karachay-Balkar, whereas in Kazan Tatar it was preserved before i- (hence Kazan Tatar Jir "earth", Jil "wind"), but changed to y- before other vowels (hence Kazan Tatar yafraq "leaf", yul "road", yïlan "snake", yörek "heart"). On the other hand, *J- also survives in all positions in the dialects of North Crimean Tatar.
    This apparently explains Old Russian zhenchug' "pearl" (first attested c. 1160), Hungarian /JönJi/, etc., which are ultimately of Chinese origin but were most likely borrowed from Cuman-Polovtsian (a member of the Kimak subtaxon) [though an earlier borrowing from Bulgaric cannot be completely excluded].
    Besides that, Mahmud al-Kashgari claimed that there existed a y- > J- or ' [zero or an Arabic hamza] mutation both in Oghuz and Kypchak.
    For example, the Turks [=Karakhanid Turks] call a traveler yalkin, whereas they [Oghuz and Qifchaq] call him 'alkin. The Turks call warm water yilig suw, whereas they say ilig with the 'alif. Likewise, the Turks call a pearl yinchu, whereas they call it Jinchu. The Turks call the long hair of a camel yigdu, whereas they call it Jugdu. [Diwanu l-Lugat al-Turk (c. 1073)]
    The Uguz and Kifzhak say the words beginning with y- as J-: ul mani Jatti (he reached me) instead of yatti. At-turk say suvda yundum (I bathed in water), whereas they [Oghuz and Qifchaq] say Jundum. Amongst the Turks and the Turkman, there exists this constant rule. [Diwanu l-Lugat al-Turk (c. 1073)]
    Despite this quote, al-Kashgari also confusingly cites a good dozen Oghuz words beginning with y-, as if either what he had said earlier no longer applied to them, or the reader was supposed to make the y-to-J substitution for himself. Consequently, the reader is left to wonder whether this is a mistake or a dialectal or allophonic variation. Neither is it clear why /J-/ is mostly absent from the modern Oghuz languages, such as Standard Turkmen. However, on closer inspection, we find that /J-/ exists in many dialects of Turkmen, specifically Karakalpak Turkmen, and as the /J-/ > /d'-/, /t'-/ mutation in the Saryk, Yomud and Ersar dialects of Turkmen [see Sravnitelnaya grammatika tyurkskikh yazykov. Fonetika (1984), p. 261], which makes al-Kashgari's claims more plausible.
    The allophonic variations between J- and y- are also reported in East Bashkir [proficient speakers (2011)], and many other Kimak-Kypchak-Tatar languages.

    It seems that J- and y- were used interchangeably in both the early Oghuz and the Kimak-Kypchak-Tatar languages. Both groups still retain a wobbly allophonic usage, which varies across dialects. The real-life pronunciation, which may differ from what is fixed in textbooks and writing, as well as multiple allophonic variations, add more plausibility to Mahmud al-Kashgari's account.

    The Orkhon-Karakhanid subtaxon
    Orkhon-Karakhanid as a valid subtaxon
    The Orkhon-Karakhanid subtaxon is thought to include, among the most significant representatives, Orkhon Old Turkic, Old Uyghur (Kara-Khoja), and Karakhanid. The relatedness of the Oghuz-Seljuk and Khalaj to this group is less evident.
    Note that in some sources, such as Lars Johanson's Turkic Languages, the Starling database, etc., Orkhon-Yenisei Old Turkic, Old Uyghur (Kara Khoja) and Karakhanid are all confusingly treated as the same language. We should stress that, in theory, there might be no direct connection between these three languages (or even between the Orkhon and Yenisei Old Turkic inscriptions). It actually remains to be demonstrated that they all belong to the same subtaxon.

    The Orkhon-Karakhanid languages are rather geographically and linguistically opposed to the languages of the Great Steppe and Siberian Turkic languages, demonstrating a significant number of peculiar features.

    Orkhon-Karakhanid history and geography

    All the languages of this subtaxon were located to the south of a relatively narrow passage that separates the Tian Shan ridges from the Altai-Sayan mountain system. Therefore, these languages belong to the desert and semi-desert habitat of Dzungaria, Tarim Basin, Mongolian Gobi and southern Mongolia.
    As we mentioned above, the Kul Tigin, Bilge Kagan and other Orkhon inscriptions describe the Tür(ü)ks (the speakers of Orkhon Old Turkic) as enemies of the Oghuz, Kyrgyz, Tatars and many other Turkic ethnicities; therefore, we may expect a clear differentiation between Orkhon-Karakhanid and other Turkic groups by about 550 AD, when the events recounted in these inscriptions were taking place. This implies a relatively long history of ethnic and linguistic separation between Orkhon-Karakhanid and other Turkic languages, which had begun at least five centuries before that date, judging by the minimum reasonable amount of glottochronological time required for language formation (assuming that the Tür(ü)ks spoke a language or dialect different from that of their adversaries).

    Orkhon-Karakhanid phonology
    (1) A distinct and stable *S- > y- innovative transition:
    cf. Chuvash s'ichê, Sakha sette, but Orkhon Old Turkic yeti, Karakhanid yeti "seven";
    Sakha süreq, Tuvan chüreq, but Orkhon Old Turkic, Karakhanid yüreq "heart".
    This process left few traces of the original *S- in any of the Orkhon Turkic descendants and is clearly attested as /y-/ in Karakhanid by Mahmud al-Kashgari;

    (2) The presence of an intervocalic -G- and the final -G:
    cf. Chuvash pôver, Sakha bïar, Kypchak *bawur, bawïr, but Orkhon Old Turkic and Karakhanid baGïr "liver";
    Orkhon Old Turkic taG, Karakhanid taG "mountain"; also Uzbek, Uyghur taG (from Karakhanid) and Oghuz-Seljuk (Turkish, Azeri, Turkmen) daG; Khakas, Tuvan, Tofa taG (an independent formation); but Proto-Kimak *taw, Kyrgyz too.
    However, it is often rather hard to tell whether these are archaisms or innovations. Judging by the parallel use of -G in the Altay-Sayan subgroup, it may be an archaism.
    (Naturally, the loss of -G-, -G in Turkish and Gagauz, as in Turkish olaJaGïm > olïJa:m "I will be", is a historically recent and completely different phenomenon.)

    (3) The retention of the intervocalic sonants -n-, -ng-, -m-, where the Great Steppe and Altay-Sayan have -y- or zero.
    Cf. Karakhanid süNük, Orkhon Old Turkic süñök (and Turkmen süñk, Azeri sümük), but Proto-Kimak *süyek "bone", Tuvan, Khakas, Kyrgyz söök. That this is an archaic retention is evident at least from Sakha unuox and Chuvash shâmâ, where the sonants are also retained.

    (4) The retention of the intervocalic -D-, as in Orkhon Old Turkic and Karakhanid aDak "foot", uDï "sleep". It was possibly pronounced as a fricative /ð/, owing to the lenition that eventually led to its loss in other branches. The languages of the Great Steppe, by contrast, all have -y-. That this is an archaism is evident from Khakas azax and Chuvash ura;
    (5) Possibly, the lack of voicing of intervocalic -k-, as in Old Orkhon Turkic säkiz, toquz; Karakhanid säqiz, toqu:z; Proto-Oghuz-Seljuk *sekiz, *doqquz; but Proto-Kimak *segiz "eight", *toGuz "nine", and Kyrgyz segiz, toGuz;
    (6) Possibly, the retention of the word-final -b/-v, as in Orkhon Old Turkic sub, Old Uyghur suv, Karakhanid suv and Turkmen suv (also Kimak-Kypchak-Tatar *suw) "water". Compare Sakha u:, Tuvan and Tofa suG, Khakas suG, Altay su:, Kyrgyz-Kazakh su:, and Oghuz-Seljuk (Turkish, Azeri) su;
    (7) Possibly, the word-final *-S > -ch transition, in which the original palatalized *S was stabilized through fortition:
    cf. Chuvash vís's'ê, Sakha üs, kü:s, Tofa üish, küsh, Tuvan küsh, Khakas üs, küs, but Orkhon Old Turkic üch "three", küch "force";
    Chuvash ês'-, Sakha is-, Tuvan izh-, Khakas is/iz-, but Proto-Orkhon-Oghuz-Karakhanid (Turkish, Azeri, Turkmen, Uyghur, Uzbek) and Proto-Kimak-Kyrgyz ich- "drink".

    Orkhon-Karakhanid grammar
    The following features are notable in grammar:
    (1) The retention of the consonant in the verbal copula er-/är-, as opposed to e-/i- in Oghuz-Seljuk, Kimak-Kypchak-Tatar, Yakut, Altay-Sayan, etc. Cf. Old Uyghur ärür, Orkhon Old Turkic er-, and Karakhanid ol (a pronoun that may have replaced the original copula). The consonant is also retained in Yugur (see below);
    (2) The retention of the instrumental case ending -(n)in, -(n)ïn, although it has been replaced by -la in Khalaj. The ending is also present in Sakha (-nan) and Khakas (-naN, -neN), so it is probably an archaic retention;
    (3) The formation of the directive case ending in -Garu, -gärü, found in Orkhon Old Turkic, Old Uyghur and Karakhanid, although absent in Khalaj;
    (4) The use of -Gai, -gey, -qay, -kêy as a Future Tense marker, rather than as an Optative Mood marker. This usage occurs in Old Uyghur, Karakhanid, Khalaj and Chagatai (where it apparently comes from Karakhanid). It is also found, rather sporadically, in Yugur, Cuman-Polovtsian and Tofalar, where it may have developed from the Optative Mood independently.

    Orkhon-Karakhanid lexis
    Lexicostatistical research on Orkhon Old Turkic, Old Uyghur and Karakhanid is lacking, except for the Swadesh-110 results provided by Anna Dybo (2006), which place Old Turkic somewhere near the bottom of the Great Steppe subtree.
    However, judging by the strong lexical differentiation of the Oghuz-Seljuk branch, which is related to Orkhon-Karakhanid, we may infer that the latter must have been at least as differentiated.

    We may infer that Orkhon-(Oghuz)-Karakhanid was a separate branch in its own right, comparable to the Siberian Turkic languages and the languages of the Great Steppe. This inference rests on three grounds:
    (1) the clear-cut geographical separation by the Sayan-Altay-Tian-Shan mountain system;
    (2) exclusive features in phonology and grammar not shared by either the Siberian or the Great Steppe languages;
    (3) some arguable evidence from an unfinished lexicostatistical study.
    The inference is based mostly on the exclusion of other subgroups rather than on positive evidence. Direct documentation, such as full-fledged Swadesh lists or accurate pronunciation guides for Old Turkic, is absent because the representatives of the subtaxon are extinct.

    Khalaj is probably an offshoot of South Karakhanid
    Apparently, no other question in formal turkology has been subject to as many nonsensical overestimations as the position of Khalaj, which was grossly exaggerated in the studies of Gerhard Doerfer. Nevertheless, there is some truth to some of those claims. Khalaj seems to be the only present-day survivor of the extinct Orkhon-Karakhanid branch, which indeed makes it stand out conspicuously against its Seljuk-Iranian background. In the present research, Khalaj is viewed as an offshoot of the southern dialect of Karakhanid or Old Uyghur, with considerable and predictable later influence from Azeri and Persian.
    The first clear and concise account of Khalaj was composed by Minorsky [V. Minorsky, The Turkish dialect of Khalaj, Bulletin of the School of Oriental Studies, London (1940)] during his stay in central Iran in 1906. Minorsky's views on the classification of Khalaj were quite reasonable and rather restrained.
    Gerhard Doerfer, however, revisited the Khalaj speakers in 1968-73 and then published a series of articles in 1974-78. According to him, Khalaj is a kind of fundamental Turkic language, more or less like Chuvash or Sakha. This concocted idea has been spreading like a turkological virus. Apparently this is because Khalaj is so remote that hardly anyone knows anything about it, and no one was able to check or revise that view (at least until the 2000s). Most information on the language came only from Minorsky's and Doerfer's articles. [Note that Doerfer also denied the existence of the Altaic family.] As Oleg Mudrak noted in his morpho-statistical classification study (2009), Doerfer's position on the subject "rather reflected the joy of discovering a language retaining the archaic -d-" than the result of an objective and unbiased analysis.
    In any case, the early studies by Minorsky show that certain peculiarities of Khalaj do set it apart from other nearby languages.
    On the one hand, the following grammatical and phonological features mark Khalaj as an ordinary Oghuz-Seljuk language:

    (1) the -ïor- present tense marker;
    (2) the 1st person plural verbal marker with the -Yk ending, e.g. -d-Yk in the past tense, which is evidently from Azeri;
    (3) the typical Seljuk b- > v- > 0 mutation (as in "var", "uol"), evidently as in Azeri and Ottoman Turkish;
    (4) the use of da:l for negation instead of *e(r)mes, which is a typical Oghuz-Kimak feature (see above).

    On the other hand, Khalaj does seem to exhibit some archaic features that are not found in Oghuz-Seljuk but are typical of Orkhon-Karakhanid, such as:

    (1) the unvoiced word-initial t-, k-, as in ta:G "mountain", ki:echä "night", kez, kiz < *köz "eye";
    (2) the retention of -G in disyllabic words, as in ha:chuG "bitter", sa:ruG "orange";
    (3) the retention of the -YmYz verb marker, which is completely atypical of the Great Steppe languages, but typical outside of them, for instance in Orkhon-Karakhanid;
    (4) the striking retention of the copula är- "to be" as in ärti (as opposed to the Turkish and Azeri idi), apparently as in Karakhanid, Old Uyghur and Old Turkic, as well as in Yugur and Salar;
    (5) the full retention of -qa, -ga in the dative case, which is not typical of Seljuk-Oghuz;
    (6) the future tense with -(ï)Ga, which is normally found in Orkhon-Karakhanid (-Gai, -gei, -qai, -kei, etc), though it also developed, apparently independently, at least in Tofalar and Cuman-Polovtsian.

    As you can see, most of these features are grammatical, which provides significant support for this hypothesis.

    A lexicostatistical study performed by A. Dybo (2006) viewed Khalaj as being distantly related to Turkish and Azeri.
    Subjectively, Khalaj words are usually recognizable, and texts are more or less readable with a knowledge of Turkish and Azeri alone. This is evident from the very fact that Minorsky, the earliest researcher of the language, was able to pick up a great many words and expressions in his first study.

    However, Doerfer goes much further, insisting on a unique position of Khalaj among all Turkic languages.
    Based on his research, the following features are usually cited as evidence for the uniqueness of Khalaj:
    (1) the existence of long vowels, as in Turkmen;
    (2) the retention of the intervocalic -D-, as in hada:q "foot";
    (3) the use of the conjugated copula är-, mentioned above;
    (4) the occasional or almost persistent presence of the mysterious h- before vowels;
    (5) the frequent use of -cha in different meanings, including the locative case found in Old Turkic.
    (1) The long vowels may turn out to be a recent development, considering that vowel systems often change quickly over time and vary across dialects. Moreover, long vowels are also present in Turkmen, where they are a normal Oghuz-Seljuk feature, and that is not sufficient grounds to proclaim Turkmen an early diversified language. Nor do we have any significant evidence that long vowels must necessarily have been part of Proto-Turkic. They might, however, have been part of the Orkhon-Karakhanid subtaxon, whose vocalism is poorly studied owing to the deficiencies of the Arabic and Orkhon-Yenisei writing systems. This latter explanation seems very likely.
    (2) The retention of the intervocalic -d- may easily be explained by recalling that Karakhanid also preserved the intervocalic -D-, as in aDaq, until as late as about 1200 AD.

    (3) The retention of the archaic är- (to be, is) is a very interesting feature, which is by no means exclusive to Khalaj, as we do find it at least in Karakhanid, early Chagatai, Old Uyghur, Orkhon Old Turkic, Yugur, and Salar. Cf. Khalaj Konduruchä ärtim "I was in Kondurud", koy-är "it is black", yolï pis ärti "the road was bad", varmorum-är "I'm not going". As already noted above, this feature too seems to pinpoint Khalaj as part of this Orkhon-Karakhanid subtaxon.

    (4) As to the famous word-initial h- issue, a possible explanation can be found in Mahmud al-Kashgari's work (1073). This was already obvious to Minorsky, whose article (1940) cites the following passage:
    "People of Khutan [= the city of Khotan in the Tarim Basin] and Kanzhak [= another city further to the east] substitute the 'alifs [= the word-initial hamza] by an h. That is why we do not consider them as Turks [=pure Karakhanid Turks], they introduce something foreign into the Turkic speech. For instance, the Turks call the father 'ata, whereas they say hata, the mother — 'ana, whereas they say hana." [Diwanu l-Lugat al-Turk].
    Therefore, it appears quite possible that Khalaj is in fact an offshoot of the South Karakhanid dialect spoken near Khotan. This dialect may have travelled along the Silk Road until it finally settled in Persia. There it survived the Mongol invasions of the Karakhanid Khanate that led to the disappearance of this language in Khotan itself.
    The development of the h- may possibly be explained by an Arabic substratum in South Karakhanid: vowel-initial syllables in Arabic are preceded by a hamza, which may have developed into the "h". An Arabic substratum in Iran and the Tarim Basin should hardly be surprising. This was the Golden Age of Islam and the period of the middle Caliphate, when Arabic was ubiquitous and could have reached Khotan via the Silk Road.
    Moreover, the possibility that a different language was spoken in Khotan is supported by Marco Polo (1275), who mentions several languages spoken along the southern part of the Tarim Basin. We also know of such Iranian languages as Khotanese and Tumshuqese, which may have affected Proto-Khalaj as well.
    Moreover, the word-initial h- is also present in some of the Azeri dialects, where its origin is unclear.

    Besides, the claim that the h- in hadaq is so archaic that it goes back to the Proto-Altaic stage is rather absurd. The Mongolic and Tungusic-Manchu languages have extremely complex rules for the word-initial x-/h-/0- correspondences. Such an h- may be present in one language but disappear in another, or mutate into an f-. In fact, there is no conclusive proof that the Middle Mongolic h- can be traced back to a *p-. On the contrary, in many cases it seems to correspond to the Turkic k-/q-, e.g. Middle Mongolian hula'an, Khalkha uLa:n /ush'an/, Dongxiang xulan, Dagur xula:n, Bonan fulaN "red", cf. Chuvash xerle, Turkic qizil < *qiRil (see Mongolic/Tungusic Correspondences).
    The Tungusic word *xalgan "foot" (as in Evenk, Negidal) is apparently akin to Middle Mongolian kol "foot" and probably has nothing to do with *adaq. On the other hand, Orok palzhan "foot" might in fact be a secondary development (xalgan > falgan > palzhan), whereas Nanay begdi may be a different word altogether, akin to Turkic but.
    As one can see, it is all very complicated and far from obvious. It is very unlikely that anyone has ever shown the Khalaj h- to be in regular correspondence with the Altaic roots. That would require solving the Altaic problem first, which has not been achieved so far, not even by Starostin, a vehement supporter of the Altaic theory.

    Furthermore, the hypothesis that the Khalaj h- is a unique survival from Proto-Turkic is simply not statistically viable. If Khalaj were so archaic, other languages would also show similar traces of a Proto-Turkic *h-. Also note the relatively large number of h's before vowels in Khalaj. They can hardly all be traced to the same phoneme, especially since there is no corroborating evidence that such a phoneme even existed in Proto-Turkic. Therefore, the prothesis hypothesis, with the h- as merely a secondary formation in the Khotan dialect, seems a much more plausible option.

    (5) Additionally, both Minorsky and Doerfer noted the use of -cha in Khalaj in a locative meaning, as in ucha "in the sleep", yanïcha "on its side". On this basis, Doerfer (1971) assumed that this was the ending of an ancient (?) locative case. However, there seems to be no locative case with -cha in Old Turkic, only a comparative case with -cha in Orkhon Old Turkic and Old Uyghur.
    The Khalaj usage may be an independent development of the comparative -cha/-che used in answers to the how-question, e.g. "how? where? — in the sleep". It has the same common adverbial meaning as, say, modern Turkish günlerce "for days", türkçe "in Turkish", etc. This fifth point, however, remains somewhat inconclusive. We must admit that this usage might indeed be a unique feature, though there is no objective reason to believe that it is particularly old or goes back to Proto-Turkic.


    Apparently, Khalaj is yet another "semi-creolized" or "mixed" language. It must have formed on the basis of a Karakhanid or southern Old Uyghur h-type substratum, mentioned by al-Kashgari in 1073, with strong later influence from Seljuk (Azeri) and Persian. Khalaj could have arrived in Iran from the southern towns of the Tarim Basin by moving along the Silk Road. In Iran, it came into contact with the Seljuk languages and the Persian superstratum.
    Khalaj cannot constitute an early diversified branch of the Turkic languages, as Doerfer suggested, though it still has a few peculiarities lost in other modern Turkic branches. The Orkhon-Karakhanid hypothesis of Khalaj origin still makes it sufficiently archaic and stand-alone, owing to the early diversification of the Orkhon-Karakhanid branch itself.

    The Yugur-Salar subtaxon

    Yugur is most likely based on Old Uyghur

    In the present study, the Yugur and Salar languages are regarded as related to each other, and more distantly to the Orkhon-Karakhanid subtaxon, with strong later influence from neighbouring Chinese and Tibetan.

    Yugur history and geography
    Yugur and Salar were originally located on the outskirts of the ancient Chinese Empire, near the Silk Road, which was protected by the Great Wall in the north and the Qilian Mountains in the south. From a historical and geographical perspective, they appear to be the result of Silk Road merchant settlements on the border of China.
    Note that some of the Yugurs were eventually Mongolicized. They formed a small separate Mongolic ethnic group known as the East Yugurs or Shira Yugurs, who speak a Mongolic language of the same name. This alone shows that Mongolic influence in the region has historically been very strong.

    Yugur and Salar in western China

    An ethnographic map of Yugur and Salar [proel.org (2010), with a few features added]

    As for the origin of Yugur, a simple conjecture would be that the Yugurs were emigrants to Turfan and Ganzhou from the Orkhon Valley civilization (the Eastern Uyghur Kaganate), which is said to have been destroyed by the Yenisei Kyrgyz tribes in 840. In that case, Yugur might in theory be directly related to Orkhon Old Turkic.

    Furthermore, according to Tenishev [E. Tenishev, B. Todayeva, Yazyk zhyoltykh ujghurov (The language of the Yellow Uyghurs), Moscow (1966)], Yugur legends claim that some of their tribes moved from Turfan to Ganzhou after the introduction of Islam. This would correspond to a geographically natural migration along the Silk Road from the Kara-Khoja Khanate (where Old Uyghur is believed to have been spoken) to their present location. This second hypothesis explains the origin of the ethnonym Yugur/Uyghur and is also geographically viable.

    As a third option, the Yugurs may have emerged from intermingling with the Yenisei Kyrgyz population that presumably lived to the north of the area, near Lake Zaysan. Consequently, Yugur might be related to the Proto-Altai-Khakas or Proto-Great-Steppe languages, as some alternative hypotheses have proposed.

    Finally, a fourth suggestion would be that Yugur is a completely independent and poorly-classified branch of the Turkic languages.

    Yugur phonology

    The following mutations support the hypothesis of the Orkhon-Karakhanid origins:
    (1) The *S > y transition, typical of Orkhon Turkic and Karakhanid, as in Yugur yuldïs "star", as opposed to Khakas *chïltïs, Altai d'ïldïs, Kyrgyz Jïldïz (though the Kimak-Kypchak tribes also developed a partial *S > *y mutation, as described above). On the other hand, some examples from Tenishev's The Language of the Yellow Uighurs (1966) show that a Mandarin-type tsh'- may also be present in some Yugur dialects. This feature (if real) does not necessarily make Yugur related to the Great Steppe linguistic area; it may simply be an archaic retention with allophonic variation.
    (2) The presence of an intervocalic -N- (=nasal) as in Orkhon and Karakhanid, e.g.
    Yugur moNïs, Old Orkhon Turkic or Karakhanid müNüz, as opposed to Khakas müüs, Altai müüs, Kyrgyz müyüz "horn", etc;
    Yugur sïmïk, Old Orkhon Turkic or Karakhanid süNök, as opposed to Khakas sö:k, Altai sö:k, Kyrgyz sö:k "bone". On the other hand, we also have *muNuz, *süyek in Proto-Kimak-Kypchak-Tatar.

    (3) The presence of an intervocalic -G- as in Proto-Orkhon and Karakhanid and their descendants, e.g. Yugur paGïr, Old Orkhon Turkic baGïr, as opposed to Khakas pa:r, Altai buur, Kyrgyz boor, Proto-Kimak bawïr "liver";
    (4) The retention of a final -G, as in Yugur taG, quruG, Old Orkhon Turkic and Karakhanid taG "mountain", quruG "dry", but Altai tu:, gurgak, Kyrgyz to:, gurGak, Proto-Kimak *quru. Note, however, that this feature is also shared by Khakas taG, quruG;
    (5) The development of intervocalic *-D- > -z-, as in azaq "foot", Guzuruq "tail", cf. Karakhanid aðak, quðruk, Old Orkhon Turkic aDak, and Khakas azax, quzurux. This transition is not necessarily connected with Proto-Khakas, where a similar *-D- > -z- transition must have taken place long ago. It rather seems to be a natural lenition that could have occurred independently, and per se it cannot demonstrate a relationship between Yugur and the hypothetical "Yenisei Kyrgyz" language;
    (6) The retention of -lq-, -rq-, e.g. Yugur kurgak, Old Orkhon Turkic qulqaq, but Khakas xulax, Tuvan kulak, Kyrgyz kulak "ear", etc.;
    (7) The presence of an initial i-/y- where it is not expected, e.g. Yugur yiGash, cf. Old Turkic ïGach, but Khakas aGas, Kyrgyz Jïgach, Tatar agach "tree", etc.; though this point is dubious;

    Yugur grammar
    Yugur grammar is largely simplified, probably owing to contact with Mandarin and Dongxiang (=Santa). However, two features may indicate its relatedness to Old Orkhon and Old Uyghur: the striking retention of the er-type copula i:re, and the use of the -Gu marker for the Future Tense rather than the Optative Mood. Additionally, there is a peculiar Future Tense in -qïr (in Yugur and Salar) and in -qïsh (in Yugur). It is probably akin to the Old Turkic causative construction verb + qïl-/qïsh- "to do something". Consider the following table:

    Feature | Yugur | Old Orkhon | Old Uyghur | Karakhanid | Khakas
    Future Tense | -Gu, -gu, -Go, -go; -Gï, -ge, -kï, -ke; -tachï, -dachï; -Giy (rarely) | -Gay, -gey | -Gay, -gey, -qay, -kêy | -Gai/-gei, -qai/-kei (= Optative Mood) |
    Perfect Tense | -Gan | -mïsh-, -mish- | -mïsh-, -mish- | -mïsh-, -mish- | -Gan-, -gen-, -qan-
    plural | -lar, -nar, -dar, -tar | -lar | -lar | -lar | -lar, -nar, -tar
    you (pl.) | seler | siz | siz | siz | sirer
    copula | i:re | er- | ärür | ol (3rd pers. copula) |

    Yugur lexis
    Certain common isoglosses are shared with Orkhon Old Turkic, e.g.
    (1) Yugur bezïk, Orkhon Old Turkic beDük, but Khakas uluG, Altai d'aan, Kyrgyz ulu:, choñ "large, great";
    (2) Yugur ïlïG, Old Turkic elig, but Khakas xol, Kyrgyz qol "hand";
    (3) Yugur emïG, Old Turkic emig, but Khakas imJäk, Kyrgyz emchek "breast", etc.
    However, in the glottochronological study by Anna Dybo (2006), Yugur was placed in the Khakas-Altai subgroup, as if it were related to Yenisei Kyrgyz.

    Evidence relating Yugur to Yenisei Kyrgyz
    On the other hand, we could summarize the few pieces of evidence possibly relating Yugur to Yenisei Kyrgyz:
    (1) The fortified -d-, -t- in the plural marker -lar, -nar, -dar, -tar, a typical "Siberian" or eastern Great Steppe (e.g. Kyrgyz-Kazakh) archaism, widespread in the eastern Turkic languages;
    (2) A possible retention of the initial tsh- < ch- < *S in some dialects, as inconclusively noted by Tenishev;
    (3) The phonological opposition of weak semi-voiced vs. strong unvoiced consonants, typical of Tuvan and Tofalar, though in fact an areal feature. It most likely emerged independently owing to contact with the local Chinese, Mongolic and Tibetan populations;
    (4) The comparative case with -daG, -deg, -taG, -teg, which elsewhere seems to exist only in Sakha (-ta:Gar) (Mongolic?);

    (5) The 2nd person plural seler, which is typical of Altay-Sayan and other languages of the eastern region but not limited to Altay-Sayan, cf. Khakas sirer, Uyghur siler.

    Yugur may be regarded as a descendant of the Old Uyghur of Turfan, and probably of other eastern towns and oases near the Taklamakan Desert. This conclusion rests on (1) phonological and some grammatical and lexical evidence, and (2) the geographical position of Yugur at the eastern end of the Silk Road on the Chinese border. It is less likely the result of emigration from the Orkhon Valley, which would have been hindered by greater geographical obstacles. Yugur probably formed at the Silk Road outposts after 900 AD, as a contact language spoken by local merchants and traders.
    A great variety of Mongolic (including Baoan and Dongxiang) and Tibetan languages arose in the same area, apparently through the linguistic intermingling of many Silk Road travellers during the late Middle Ages. Judging by this, Middle Yugur can probably be regarded as a type of mixed or "semi-creolized" language. It emerged from the interaction of Old Uyghur (Kocho) with local Tibetan and Mongolic adstrata and a Mandarin superstratum.
    On the other hand, the influence of Yenisei Kyrgyz migrants into Ganzhou from the north cannot be completely excluded, though their role (if any) was probably minor.

    Salar has little to do with Oghuz, but quite a lot to do with Yugur and Uyghur Chagatai

    Salar history

    According to legend, the Salars were an eastward Chagatai migration from the Uyghur cities of the Taklamakan Desert (or even from Samarqand). The Salar people most likely arrived in China along the Silk Road after the dissolution of the Karakhanid Khanate during the Mongol invasion of the 13th century. Their historically attested date of arrival is c. 1370 (during the rise of Tamerlane).

    Salar cannot be related to Oghuz-Seljuk directly
    Being a remote and forlorn language deep in Central Eurasia, Salar, just like Khalaj and Yugur, has been surrounded by a number of traditional misconceptions. A widespread belief, unsupported by much reasonable evidence, is that Salar is an Oghuz language. However, not all scholars have accepted this view, and there has always been some controversy about the issue:
    Nicholas Poppe, in Remarks on the Salar language (1953), analyzed Salar lexis and phonology using Potanin's field materials. He concluded that it must be an "East Turki dialect", that is, part of the Chagatai-Uyghur language-dialect continuum. (He ignored, however, the striking differences in Salar, which should make it almost completely unintelligible to any other Turkic speaker.)
    Tenishev, who studied Salar in vivo in 1957, ambiguously supported its traditional classification as Oghuz despite the many facts to the contrary that he himself had provided [E. Tenishev, Stroj salarskogo jazyka (The structure of the Salar language), Moscow, (1976)].
    A plausible classification of Salar within the Chagatai subtaxon has been suggested (at least) by Karl Menges in The Turkic Languages and Peoples p. 60. (1962, published in 1968).
    On the other hand, Arienne Dwyer argues for the more traditional "Oghuz" positioning of Salar in [Arienne M. Dwyer, Salar: A Study in Inner Asian Language Contact Processes, Part I: Phonology; Turcologica Herausgegeben von Lars Johanson, Band 37,1 (2007)].

    The following features in Salar are often considered to be typically Oghuz:
    (1) Salar exhibits the Seljuk-type b > v transition (as in Turkish, Azeri bar > var). However, this cannot be viewed as an intrinsic and unique Oghuz feature, since it is not actually Oghuz as a whole (only Seljuk), and it could easily be a parallel phonological development.
    (2) The presence of the archaic -mïsh- audative past tense. However, this feature is not uniquely Oghuz. It is also found in Old Uyghur, Karakhanid and Chagatai, and it is essentially an archaic retention, as supported by the existence of -byt in Sakha.

    There are also a few features that could, in theory, demonstrate some similarity to Turkmen, the most typical representative of the Oghuz subtaxon.
    (1) The lack of personal conjugation in some tenses (in Turkmen, e.g. the -Jag future, the -makchi intentional and the -malï obligative forms, which are themselves all absent in Yugur-Salar). Nevertheless, the loss of grammatical markers cannot be viewed as a shared innovation. In Salar it is obviously a result of secondary contact with Mandarin and the Mongolic languages. A similar loss of personal conjugation, apparently under the influence of local languages, has also occurred in Khalkha Mongolian.
    (2) A peculiar use of -yok to negate verbs in some tenses, as in Salar root + yoxtur (Present) and Yugur root + qïsh + yoqtïr (Future II). This is distantly similar to the Turkmen construction root + a + personal marker + ok, as in yaz-a-m-ok "I do not write" (lit. "no my writing"). But evidently this feature has a local Yugur parallel, and its analogy in Turkmen may be purely coincidental.

    Furthermore, a comparison with the typical Oghuz shared innovations shows that they are absent in Salar, which indicates the lack of a direct connection between Salar and the Oghuz languages (see the Oghuz features above for reference):
    (1) No siz pronoun (you). Sele(r), sile(r) for plural and sen for polite reference are used instead, as in Yugur (siller, seller, sele);
    (2) No trace of deyil/deGil, which is the standard form of negation in Oghuz and Kimak-Kypchak-Tatar. The more archaic "emes(tir)" is used instead, similar to Yugur;
    (3) The dative with -ga/-a, which is not typical of Oghuz, where -a is used almost exclusively. Cf. -Ga, -ge, -qa, -ke in Yugur;
    (4) The forms of genitive case do not coincide with those in Oghuz, being similar to only those in Karachai-Balkar, the Lobnoor dialect of Uyghur, and some of the Uzbek dialects (see Tenishev (1975)), with the Uyghur and Uzbek dialects evidently being the original source of these mutations;
    (5) The system of verbal tenses is quite similar to Yugur, lacks any personal endings, and has nearly nothing to do with Turkmen, Azeri, or Turkish, except for the most basic forms recognizable in all the Turkic languages;
    (6) There is a notable lack of any typical Oghuz lexical innovations, such as Oghuz-Seljuk *kök "root" : Salar sachax; Oghuz-Seljuk *choq "many" : Salar köp; or phonological innovations, such as Oghuz-Seljuk *boynuz "horn" : Salar moNïz.
    (7) The audative past tense with -mïsh- does exist, but the -mïsh- marker does not seem to attach to adjectives or nouns, which seems to be the distinguishing feature of the Seljuk-Oghuz languages.
    (8) The Present Tense grammeme root + por/par/padïr bears absolutely no relation to the Oghuz Present Continuous with -yor-, as Tenishev claims. It is apparently akin to the Yugur present tense root-ïp-par, where par(dïr) is akin to Karakhanid *bar "be present". This evidently also explains root + yoxtur in the negation of Salar verbs.
    (9) There is no "Oghuz voicing" in Salar. Most word-initial consonants are either unvoiced or semi-voiced, and European researchers sometimes incorrectly transcribe them as fully voiced. A simple explanation is that Salar phonology tends to follow the Mandarin system of strong aspirated vs. weak semi-voiced consonants. The degree of voicing may vary, creating the impression of full voicing (Tenishev). This is a usual areal feature common to many languages of the Far East (Yugur, Tuvan, Mongolian, Korean, etc.). It arose not necessarily through the direct influence of Mandarin but rather through mutual interaction and the formation of a common linguistic area, especially in phonology. Furthermore, Tenishev himself writes:

    The system of Salar consonantism is so drastically different from the South Turkic (Oghuz) system, which was supposed to have existed in the Salar language in the past, that one involuntarily arrives at the conclusion of its secondary, later origin and its dependence upon the neighbouring languages, such as Chinese, Dongxiang and Tibetan. [E. Tenishev, Stroj salarskogo jazyka (The structure of the Salar language), Moscow, (1976)]

    Tenishev then explains how the phonological systems of Mandarin and Dongxiang (=Santa) could have affected Salar. He stops short of rejecting the "Oghuz hypothesis", probably unwilling to go against the mainstream view of his time, but many of the facts he explicitly mentions point in that direction.
    By the same token, as was shown in the table for the Chagatai subtaxon, Salar cannot be directly related to the other Great Steppe subtaxa, for at least the following reasons:
    (1) the presence of the medial and final velar -G-, -G: cf. Kypchak bawïr (liver), tau (mountain) : Kyrgyz boor, too : Salar paGïr, taG;
    (2) the absence of *tügel, cf. Kypchak tügel : Salar emes;
    (3) Kimak-Kypchak-Tatar *yukla "to sleep": Kyrgyz uktoo : Salar uxla;
    (4) Kimak-Kypchak-Tatar asha- "to eat" : Kyrgyz Je- : Salar yi-, etc.

    Additionally, to illustrate the lack of mutual intelligibility between Salar and most other Turkic languages, here is a link to a lovely (and well-performed) traditional Salar song with very simple lyrics: http://www.youtube.com/...

    usher ya, mA(nya) (maNa) ushEr-ya!
    salar (seler) mAnya ushEr
    yaNï pizgen zOrakh-ne tAkhïner pAshï-me
    akokO akokO akokO, pAshï-me
    usher ya, mA(nya) ushEr-ya!
    salar mAnya ushEr
    Ichim tikh-ken tonïmne gi:ir pONï-me

    akokO akokO akokO, poN(ï)-me
    usher ya, mA(nya) ushEr-ya!
    salar mAnya ushEr
    Apam AlGan Ishtan-nE ki:ir di:zï-me
    akokO akokO akokO, ti:ze-me
    usher ya, mA(nya) ushEr-ya!
    salar mAnya ushEr
    Izem Etken khAimne gi:ir ayaq-E
    akokO akokO akokO, ayaq-E
    A broken Turkish translation (making maximum use of cognates) would look something like this:
    Bak bana ya! Üşür ya! (Oh look at me! Gather around me!)
    Sizler (=siz) üşür! (You all gather (trans. = the crowd)!)
    Yeni bezeyen shapkayı "taşırım" (giyirim) başımda (The newly ornamented hat, I shall wear on my head)
    Anne-m(-in) dik-en palto(sunu) giyirim bedenimde (The by-my-mother sewn coat, I shall wear on my-self (my body))
    Babam(-ın) al-an pantalon(unu) giyirim dizimde (The by-my-father bought pants, I shall wear on my knees)
    Kendimin ed-en (yap-an) ayakkabı(yı) giyirim ayakta (The by-my-own-made shoes, I shall wear on my feet)
    No direct connection to the Oghuz-Seljuk languages is apparent; in fact, most Turkic words in the song lyrics are barely recognizable. Outside Chuvash and Yakutic, nowhere do we find such strong phonological, lexical and grammatical changes, that is, changes at every level of language structure, as in Salar and Yugur, which places them taxonomically far from most other Turkic subgroups.

    Salar grammar
    Essentially, the main influence in Salar apparently comes from Yugur, and, as Tenishev [idem] briefly asserts, "The very same order of tenses is observed in Yugur". Indeed, the similarities in the verbal systems are striking, though not complete; some of them are listed in the table below.

    Tense | Yugur | Salar | Comment
    Present Progressive | root+ïp+par | root+por | Rather innovative, probably from *par/var "there is", as follows from the Salar examples in the other "root+Gan var" tense, though Tenishev for some reason assumed that -par is from the Oghuz -yor-.
    Aorist | root+ar (Future) | root+ïr/er (Present-Future) | Common to all Turkic (no taxonomic value)
    "Yugur" Future | root+qïr | root+qur | Apparently a unique Yugur-Salar innovation
    Simple Past | root+te | root+Je | Common to all Turkic, but still phonologically innovative, including the striking absence of personal endings.
    The Gan- Past | root+Gan+tro | root+Gan+dïr | Common outside of Oghuz-Seljuk, but the addition of -dïr or -tro is rather innovative.

    The unusual absence of personal conjugation markers on verbs in Yugur and Salar can plausibly be ascribed to Sino-Tibetan or Mongolic influence. Concerning Mongolic, Tenishev [idem] notes, "most Mongolic languages, including Dongxiang, lack personal conjugation. It is only present in the Kalmyk and Buryat languages, and the Bargu-Buryat and Oyrot dialects of Mongolian." This may be further evidence of some sort of typological Sprachbund near Mongolia and North China.

    Both Salar and Yugur use the copula ira(r), akin to Old Uyghur ärür, after nouns and adjectives, much like English is/are. This is a rather peculiar feature, especially considering the shared phonological development ä > i in both Yugur and Salar. The presence of -r- in this construction can be regarded as a typical archaism shared with Orkhon-Karakhanid.

    er, ere, ire
    ira, irar;
    iter, itïr, ider; ideroN (except 1st pers);
    tïr, dïr, tir, dir;
    shi, shê < Mandarin
    Cf. Old Uyghur ärür, Khalaj är;
    According to Tenishev, the Salar itïr = ira + tïr (a double copula), same as in emes-tïr, emes-er (a negative copula)
    xo p'er k'i:se i:re
    "(we) all good people are"
    wu pirinige oy iter
    "this our house is";
    men xon iter
    "I'm the khan";
    inJi avu ira vu
    "a young(man) still he is";
    putaGï pir ideroN
    "their roots one are"
    Also, used in Salar much as "right, it is" in English.
    Man ka'cha yanshaGanï idero? — Ider!
    "What I said, is it right? — It is."
    Men pichtigeni ira mu? — Ira.
    "What I wrote, is it right? — It is."

    Salar phonology
    However, besides Yugur, there is a notable phonological similarity between Salar on the one hand and Chagatai-Karakhanid on the other.
    For instance, Salar likewise contains the Karakhanid-Chagatai y-, which has been shown herein (see introduction) to be a late innovation.
    Moreover, both Yugur and Salar share a number of peculiar developments, such as in "two": Yugur shigï, ishke, ïshqï : Salar ishki, ichki. Curiously, this development also frequently appears in spoken Uyghur, but never in writing.

    Nevertheless, we cannot place Salar in the same subtaxon as Karakhanid Proper, because Salar lacks certain typical Karakhanid archaisms:

    cf. Karakhanid ev "house" : Salar oy;
    Karakhanid uDa- "to sleep" : Salar uxla-;
    Karakhanid yapurGaq "leaf" : Salar yarfïx, etc.
    Moreover, historical sources indicate that Salar must have emerged in the 14th century, after the disappearance of Karakhanid.
    That leaves us with Uzbek-Uyghur-Chagatai as the only possible source of phonological influence, with the eastern Uyghur dialects being the likeliest candidates for Salar's closest linguistic neighbors.
    Cf. (1) Uyghur müNgüz "horn": Salar moNïz, as opposed to Uzbek mugiz, shoz (from Kypchak and Persian respectively);
    (2) Uyghur süNäk "bone" : Salar senix, as opposed to Uzbek suyak from Kypchak;
    (3) Uyghur beGir "liver" : Salar paGïr, as opposed to Uzbek zhigar;
    (4) Uyghur qo:saq "belly" : Salar xusax, as opposed to Uzbek qorin.

    Salar lexis
    There is no detailed lexicostatistical study of Salar except the one in Anna Dybo's work, which again places Salar near Turkmen, a highly dubious position. A superficial overview of the Salar Swadesh-110 list (collected by Starostin (1991)) suggests that the language contains many unusual innovations and would be only poorly intelligible to speakers of the Great Steppe Sprachbund, let alone to speakers of Siberian Turkic.

    Consequently, based on strong grammatical evidence, we may conclude that Proto-Salar was built on a Yugur substratum but retained much of the Chagatai phonology of the newcomers from the Tarim Basin. Salar is therefore essentially an ethno-linguistic seam formed at the boundary of interaction between the old Yugur settlers and the newly arrived merchants from the Chagatai Khanate. Demographic evidence suggests that these newcomers probably arrived in several waves of migration. In other words, the supplanting and "creolization" of the local Yugur substratum in Ganzhou could not have been an overnight event and probably took several centuries. Modern Salar appears to be a Chagatai-Yugur "creole" that emerged from the Yugur substratum, the Mandarin and Mongolic adstratum, and the Uyghur-Chagatai superstratum. As the Yugur speakers of the Ganzhou kingdom gradually acquired new Chagatai vocabulary and some new grammatical features, early Salar emerged as a distinct language with a Yugur grammatical basis but strongly modified Uyghur-Chagatai lexis and Mandarin-Mongolic phonology.

    4. The Resulting Internal Classification of Bulgaro-Turkic Languages

    4.1 The Genealogical Classification of Bulgaro-Turkic Languages
    As an outcome of the present research, we can now build a probable dendrogram of the Turkic languages, including their secondary internal connections. The resulting tree should look roughly as follows (only the languages included in the lexicostatistical study plus Khalaj, West Yugur, and Old Turkic are shown):

    The Tree of the Turkic Languages
    The dendrogram of the Turkic languages [Darkstar (2012)]

    4.2 The Taxonomic Classification of Bulgaro-Turkic Languages

    Taxonomic classifications are often regarded as being of secondary importance, since they cannot reflect all the complexities of real phylogenetic relationships; however, they are still necessary in many situations, for instance when classifying languages in a language list. In any case, based on the kinship shown in the above dendrogram, as well as the lexical, phonological, morphological and geographical evidence provided and discussed in this publication, the Turkic languages can be subdivided into the following taxa:




    (1.1) Chuvash (including Chuvash and its dialects)


    The sometimes-used term "Common Turkic", found only in English-language sources, is best avoided because of its inconsistent association with such meanings as "a language common to all Turks" or "commonplace, ordinary Turkic". Turkic in the strictest sense of the word may instead be called Turkic Proper or just Turkic, as opposed to Bulgaro-Turkic, which may seem slightly unusual at first but is generally self-explanatory.

    (1) EASTERN
    Even though Yakutic may sometimes be regarded as sharing a few features with the Central subtaxon, it should still be viewed separately because of numerous innovative differences. The features shared with Sayan-Altay (and occasionally with Great Steppe) should mostly be regarded as the result of an older Yakutic substrate in the Sayan-Altay Turkic languages.

    (1.1) Yakutic
    (1.1.1) Yakutic (including: the hypothetical Kurykan (or Proto-Sakha), Modern Sakha, Dolgan)
    All these languages belong to the Lena basin.

    (2) CENTRAL

    (2.1) Sayan-Altay (or possibly Yenisei-Kyrgyz)
    Geographically, most of the ethnic groups in this subgrouping belong to the upper Yenisei and Ob basins.

    (2.1.1) Tuvan (including Tuvan, Tofa (outdated: Tofalar), Todzhin, Soyot, Tsatan)
    (2.1.2) Khakas (including Sagai Khakas, Kacha Khakas, Fuyu Kyrgyz, Shor, Middle Chulym and other closely related dialect-languages) Note that Khakas seems to be an entirely artificial ethnonym created in the 1920s.
    (2.1.3) Altay (Turkic) (affected by Great-Steppe, especially in the south)
    Note that the name of the mountains is usually spelled irregularly as Altai, whereas the name of the languages is spelled more regularly as Altay.
    ( North Altay (Turkic) (including Kumandy, Kuu (Chelkan), Tuba) The sub-classification of local dialects is poorly elaborated.
    ( South Altay (Turkic)
    (including Standard Altay or just Altay (confusingly known as Oirot until the 1940's; the name Altay-kizhi is also applicable, albeit illogical), Teleut, Telengit). The sub-classification of dialects is poorly elaborated.

    (2.2) Great Steppe (Turkic)
    This supergrouping is supposed to include those languages whose speakers migrated north of the Great Eurasian Barrier into the vast Great Steppe of Eurasia, including such areas as Jeti-Su, the Southern Ural, the Aral-Caspian region, the Volga, the Crimea, and further west all the way to Ukraine and even Lithuania and Poland. All of these tribes most likely originated in the upper Irtysh basin.

    (2.2.1) Kyrgyz-Chagatai (or, alternatively, Kyrgyz-Karluk-Chagatai, according to the typical medieval names, or Kyrgyz-Kazakh-Uzbek-Uyghur, according to the typical modern representatives).
    The exact original homeland is unclear but was probably located somewhere near the Altai Mountains, with a later expansion to the Tian-Shan by the 7th-9th centuries CE.

    ( Kyrgyz-Kazakh (including Kyrgyz, Kazakh, Karakalpak)
    Kyrgyz was apparently affected by Altay Turkic ("Oirot") during the Dzungarian invasion of the 17th-18th centuries, hence its frequent misplacement in other classifications.

    ( Chagatai (including possibly the hypothetical Karluk, medieval Chagatai, modern Uzbek and Uyghur dialects, and other closely related dialect-languages)
    The subgroup is essentially an admixture of the old Uyghur-Karakhanid substratum with the language of Great Steppe newcomers. It formed after the Mongol invasion in the 13th century. The name "Karluk" from Baskakov's classification is best avoided because our knowledge of the Karluks is rather limited and their Turkic dialect was not preserved. By contrast, Chagatai was one of the most significant and commonly used medieval koines in Central Eurasia, and is a much more reasonable and transparent taxonomic name.
    (2.2.2) Kimak (or Kimak-Kypchak-Tatar, after the most famous representatives of the Kimaks). All of the ethnic groups therein are thought to be descended from the Kimak Confederation (Kaganate, Khanate) located near Lake Zaysan.
    Strongly affected by Oghuz, probably due to linguistic exchange near the Zaysan Passage in the 7th-9th centuries. Baskakov's older name "Kipchak" should be avoided because of its inaccurate and confusing inclusion of Kazakh, exclusion of Nogai, etc., and because the Kypchaks constituted only a small part of the whole subtaxon and apparently lived only near Kievan Rus.

    ( Karachay-Balkar (including Karachay-Balkar and its dialects)
    Shows strong linguistic deviations but is still evidently of Kimak-Kypchak-Tatar origin.

    ( Golden-Horde (including Bashkir, Kazan Tatar, Mishar Tatar, (Caspian) Nogai, Kumyk, North Crimean Tatar, Central Crimean Tatar, Karaim, and other closely related language-dialects plus [possibly Baraba (Tatar), Tomsk Tatar, Siberian Tatar])
    The formation of most Kimak languages is clearly connected with the rise and expansion of the Golden Horde during the 13th-15th centuries. Therefore, the earliest clearly differentiated languages of this subgrouping would have appeared only by about the 16th century, and in some cases even later. Due to the large number of languages, this subgroup has been studied rather superficially in this work.

    (3) SOUTHERN

    This major supergrouping is supposed to include those languages whose speakers migrated south of the Great Eurasian Barrier and thus inhabited the system of deserts, semi-deserts and steppes in the Tarim Basin, Dzungaria, Mongolia, the Gobi and northwestern China. Many of these ethnic groups formed part of (or were closely related to) the Gökturk-Uyghur Empire of the 6th-9th centuries CE.
    (3.1) Orkhon-Karakhanid
    This subtaxon includes various extinct descendants of the Gökturk-Uyghur Empire, such as Orkhon Old Turkic, Old Uyghur and Karakhanid, with Khalaj as the only living representative. The original self-appellation of the speakers of this subtaxon was likely Tür(ü)k.

    (3.1.1) Orkhon Old Turkic (including Orkhon Old Turkic of the Orkhon inscriptions and its dialects)
    Also known as just Türük, or Kök Türük, or
    (3.1.2) Uyghur-Karakhanid (including Old Uyghur, (North) Karakhanid, unattested South Karakhanid, and Khalaj)
    (3.2) Oghuz-Seljuk
    This subtaxon was slightly affected by Kimak-Kypchak-Tatar due to linguistic contact, probably near the Zaysan Passage, in the 7th-8th centuries CE and afterwards.

    (3.2.1) Oghuz (including modern Turkmen and its closely related language-dialects, and the hypothetical "Early Oghuz" associated with the Toquz-Oghuz and other Oghuz confederacies somewhere near Dzungaria). It should be noted that Turkmen has apparently been strongly affected by the languages of the Great Steppe.
    (3.2.2) Seljuk (including Qashqai, Khorosani, Azeri, early Anatolian Turkic, Ottoman Turkish, Modern Turkish, Gagauz and other closely related language-dialects of Turkey, Iran and Azerbaijan, which apparently go back to the Oghuz dialect(s) of the Great Seljuk Empire of the 11th-13th centuries)
    (3.3) Yugur-Salar
    This subtaxon apparently emerged as the result of the intermingling of Turkic, Mongolic, Tibetic and Chinese ethnic groups located at the Silk Road's entrance to China. Despite their frequent misplacement in other classifications, Yugur (Yughur) and Salar seem to form a separate subgroup, most likely within the Southern taxon.

    (3.3.1) Yugur (including (West) Yugur)
    (3.3.2) Salar (including Salar)

    4.3 The Geographical Tree of Bulgaro-Turkic Languages

    We should also note that any attempt to build an absolutely consistent genealogical classification of closely related languages is essentially doomed, since closely related taxa largely obey the principles of Schmidt's wave model. Therefore, we conclude by also considering a more reasonable and plausible geographical dendrogram.
    The geographical tree of the Turkic languages

    A geographical dendrogram of the Turkic languages [Darkstar (2012)]

    For further analysis of the location of the Proto-Bulgaric and Proto-Turkic Urheimat, see the separate article The Proto-Turkic Urheimat & The Early Migrations of Turkic Peoples (2012)


    5. References and sources
    Note that most of the documents, books, and articles in the list below should be available online.

    Turkic languages in general

    1a. Jazyki mira: Tyurkskije jazyki (The Languages of the World: The Turkic Languages); editorial board: E. Tenishev, E. Potselujevskij, I. Kormushin, A. Kibrik, et al, consists of articles by specific authors; The Russian Academy of Sciences (1996) (a detailed, authoritative edition with a brief phonological and grammatical description of each language)
    1b. Jazyki mira: Uralskije jazyki (The Languages of the World: The Uralic Languages); editorial board: V. Yartseva, Yu. Yelisejev et al, consists of articles by specific authors; The Russian Academy of Sciences (1993)
    2. Jazyki narodov SSSR. Tyurkskije jazyki (The languages of peoples of the USSR. Turkic languages.); Editor-in-Chief: Baskakov, N.A.; Moscow (1966) (A thoroughly written collection of grammars of all the major Turkic languages of the former USSR from the Khrushchev period, when many outstanding works were created. Many readers have praised the quality of this work.)

    3. Altajskaja problema i proiskhozhdenije japonskogo jazyka (The Altaic Problem and the Origins of the Japanese Language), by Sergey Starostin; Moscow (1991); (includes excellent, detailed 100-word Swadesh lists for all the Altaic languages, with just a few occasional errors)
    4. Starling Database, The Turkic etymology, starling.rinet.ru, composed by Anna Dybo [pronounced: AHN-nah de-BAW]
    5a. Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Morphologija. (The Comparative Historical Grammar of the Turkic Languages. Morphology.); editorial board: E. Tenishev et al, Moscow (1988) (Despite the word "grammar" in the title, this multivolume publication is essentially an attempt at comprehensive linguistic research on Proto-Turkic, with this particular volume dedicated to the analysis of grammar/morphology in the Turkic languages; frequently abbreviated in Russian to a Cyrillic equivalent of SIGTY; some articles, however, are too verbose and confusing for the important subjects they try to cover)
    5b. Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Regionalnyje rekonstruktsii. (The Comparative Historical Grammar of the Turkic Languages. Regional reconstructions.); editorial board: E. Tenishev, G.V. Blagova, E.A. Grunina, A. V. Dybo, I.V. Kormushin, L.S. Levitskaja, D.N. Nasilov, O.A. Mudrak, K.M. Musajev, A.A. Chechenov, et al; Moscow (2002)
    5c. Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Leksika. (The Comparative Historical Grammar of the Turkic Languages. Lexis.); editorial board: E. Tenishev et al; Moscow (2002) (Many good lexical examples concerning the life of Proto-Turks)
    5d. Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Pratyurkskij jazyk-osnova. Kartina mira pratyurkskogo etnosa po dannym jazyka. (The Comparative Grammar of the Turkic Languages. The Proto-Turkic Language. The Worldview of the Proto-Turkic Ethnic Group Based on the Linguistic Data.), editorial board: E. Tenishev et al., Moscow (2006) (Attempts at mythological and semiotic analysis of the Turkic lexis from the previous volume)
    6a. O.A. Mudrak, Ob utochnenii klassifikatsii tyurkskikh jazykov s pomosch'ju morphologicheskoj lingvostatistiki (On the clarification of the Turkic languages classification by means of morphological linguostatistics) // Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Regionalnyje rekonstruktsii. Moscow (2002) (an abbreviated article published within the SIGTY; basically an attempt to build a novel taxonomic approach using a grammatical and phonological version of linguistic statistics, counting phonemes and grammemes instead of lexemes)
    6b. O.A. Mudrak, Klassifikatsija tyurkskikh jazykov i dialektov s pomosch'ju metodov glottokhronologii na osnove voprosov po morfologii i istoricheskoj fonetike (The classification of the Turkic languages and dialects based on the glottochronological methodology with a morphological and phonological questionnaire); Moscow (2009) (same as above, full version in a separate book; only 100 paper copies)
    6c. O. A. Mudrak, Yazyk vo vremeni. Klassifikatsija tyurkskikh jazykov. Istorija jazykov (The language in time. The classification of the Turkic Languages. The History of languages.) (2009); published as pdf at www.turklib.ru and elsewhere as html, and a video (same as above, a lecture for the general public with a brief history of Turkic languages)
    7a. Anna Dybo, Khronologija tyurkskikh jazykov i lingvisticheskije kontakty rannikh tyurkov (The Chronology of the Turkic Languages and the Linguistic Contacts of the Early Turks) (2006?)
    7b. Anna Dybo, Lingvisticheskije kontakty rannikh tyurkov. Leksicheskij fond. (Linguistic Contacts of the Early Turks: the Lexical Fund), Moscow (2007) (includes a lexicostatistical analysis with trees, and an analysis of early borrowings into Proto-Turkic)

    8. M. Dyachok, Glottokhronologija tyurkskikh jazykov (The Glottochronology of the Turkic Languages), Materials of the 2nd Scientific Conference, Novosibirsk (2001) (preliminary materials, known mostly as a short online paper, but quite interesting)
    9. Lars Johanson, Eva A. Csato, The Turkic languages, London, New York (1998)
    10. Mahmud al-Kashgari, Compendium of the Turkic Dialects (c. 1073) (English translation (1982) by Robert Dankoff and James Kelly; a Russian edition also exists)
    11. Classifications of Turkic Languages by various authors (in Russian) etheo.org
    Classifications of Turkic Languages by Baskakov (1969) (in Russian), etheo.org

    12. Werner Froehlich, Turkic glossary, www.geonames.de, (2001-2011) (some valuable lexical materials for various language groups; the author states, "I created this site with the greatest possible care.")
    13. 200-word Swadesh lists for Turkic languages (composed by many people incl. the author of this publication)
    14. Talat Tekin, Türk Dilleri Ailesi (The Turkic Language Family) // Genel Dilbilim Dergisi, Vol. 2, pp. 7-8, Ankara (1979)
    15. A. Scherbak, Sravnitelnaja fonetika tyurkskikh jazykov (The Comparative Phonology of the Turkic Languages) (1970)
    16. Yu. V. Normanskaja, Rastitelnyj mir. Derevja i kustarniki. Geograficheskaja lokalizatsija prarodiny tyurkov po dannym floristicheskoj leksiki (The plant world. Trees and shrubs. The geographical localization of the Turkic homeland based on the floristic lexis data.) // Sravnitelno-istoricheskaja grammatika tyurkskikh jazykov. Pratyurkskij jazyk-osnova. Kartina mira pratyurkskogo etnosa po dannym jazyka. Moscow (2006) (controversial but interesting nonetheless)
    17. Atlas narodov mira (The Atlas of the Peoples of the World), Moscow (1964) (old, but older ethnographic maps tend to be more informative, since they predate much of the subsequent language loss)
    18. Alexander Samoylovich, Nekotoryje dopolnenija k klassifikatsiji turetskikh jazykov (Some additions to the classification of Turkish languages), Petrograd (1922); reprinted in the collection of his works (2005)
    19. Alexander Samoylovich, K voprosu o klassifikatsiji turetskikh jazykov (Towards the question of the classification of Turkish languages), the Bulletin of the 1st Turkological Congress of the Soviet Union (1926); reprinted in the collection of his works (2005)
    20. N. A. Baskakov, Vvedenije v izuchenije tyurkskikh jazykov (An introduction to the study of Turkic languages), Moscow (1969) (Note that the work itself, according to the author, dates back to 1952, and several reprints and remakes under different names were made from this book, e.g. Ocherki istorii funktsionalnogo razvitija tyurkskikh jazykov, Ashgabad (1988). It should be explained that Nicolay Baskakov (1905-1995) was not just a famous turkologist; his name served as a brand for many Soviet turkological studies, so many dictionaries of regional Turkic languages composed by different authors were printed with him as chief editor.)
    20a. Baskakov, N.A., Sovremennyje kypchakskije yazyki (The modern Kypchak languages), Nukus (1987) (Again, mostly a reiteration of Baskakov's previous classification with particular emphasis on Kypchak, including South Altai)
    21. Etymologicheskij slovar tyurkskikh jazykov (The Etymological Dictionary of the Turkic Languages), E. V. Sevortyan, Vol. 1-7, Moscow (1974-2003) (Mostly known as Sevortyan's dictionary, though he died in 1978. Pronounced /seh-vor-TAHN/ as an Armenian surname. It is in fact a multivolume publication prepared by a group of authors, with the earliest volume still photocopied from a typewriter, apparently due to difficulties in reprinting diacritics; the last volumes are still being prepared for publication; proto-forms are arranged in alphabetical order; despite some convoluted passages, perhaps still the most comprehensive work on Turkic lexicon)
    22. Stepnyje imperii drevnej Evrazii (The Steppe Empires of Old Eurasia), S. G. Klyashtornyj, D.G. Savinov, Saint Petersburg (2005)
    23. Gosudarstvo kimakov IX-XI vv. po arabskim istochnikam (The Kimak State of the 9-11th century according to the Arab sources), Kumekov, B.E.; Alma-Ata (1972)
    24. Brockhaus and Efron Encyclopedic Dictionary, Saint Petersburg (1906)
    25. Sevda Sulejmanova, Istorija tyurkskikh narodov (The history of the Turkic peoples), Baku (2009)
    26. Aus Sibirien. Lose Blätter aus meinem Tagebuche (From Siberia: Loose pages from my diary), Wilhelm Radloff, Leipzig, 1893 (An ethnographic description of the Altay, Khakas, Kazakh and Kyrgyz peoples, archaeological evidence, etc. An excellent first-hand account. An abbreviated Russian translation appeared as late as 1989.)

    Specific Turkic languages

    Russko-chuvashskij slovar, by M. Skvortsov, A. Skvortsova; Cheboksary (2002) (doc)
    Nutshell Chuvash, by Andras Rona-Tas, Szeged (Hungary) (2009?)
    Etymologicheskij slovar chuvashskogo jazyka (The etymological Dictionary of Chuvash), by M. Fedotov; volume 1-2, Cheboksary (1996) (quite helpful and enlightening)
    Chuvashskij jazyk i jego otnoshenije k mongolskomu i tyurkskim jazykam (Chuvash and its relatedness to Mongolian and the Turkic languages), Nicholas Poppe (1924) (downloadable)

    Russian-Yakut, Yakut-Russian online dictionary (22.000, 35.000 words), www.sakhatyla.ru
    Brigitte Pakendorf, Contact in the Prehistory of the Sakha, Linguistic and Genetic Perspective, (2007)
    Shirokobokova, N.N. Otnoshenije jakutskog jazyka k tyurkskim jazykam Yuzhnoj Sibiri (The relatedness of the Yakut language to the Turkic languages of South Siberia), Novosibirsk (2005) (essentially, a small monograph on the linguistic origin of Sakha)

    Grammatika tuvinskogo jazyka, F. Iskhakov, A. Pal'mbakh, Moscow (1961) (a very detailed grammar of Tuvan)
    Slovar tofalarsko-russkij, russko-tofalarskij, V.I. Rassadin, Saint-Petersburg (2005)
    Sojotsko-buryatsko-russkij slovar, V.I. Rassadin, Ulan-Ude (2003)
    V.I. Rassadin, O problemakh vozrozhdenija i sokhranenija nekotorykh tyurkskikh narodov Yuzhnoj Sibiri (na primere tofalarskogo i sojotskogo) (2006)

    Orys-Khakas Slovar; D. Chankov, Editor in Chief; Moscow (1961)
    Khakassko-russkij slovar, composed by N. Baskakov, A. Inkizhekova-Grekul (1953)
    Khakasskij jazyk, by N. Baskakov, A. Inkizhekova-Grekul, Moscow (1953)
    Dialekty khakasskogo jazyka, Editor in Chief: D. Patachakova, Abakan (1973)

    Russko-khakasskij slovar dla khakasskikh nachalnych shkol, Ts. Nominakhanov, Abakan (1948)

    Series of articles concerning the origins of the ethnonym "Khakas", by S. Yakhontov, V. Butanayev, S. Klyashtornyij // Ethnograficheskoje obozrenije (1992) (in Russian)

    Fu-yü Kırgızcası ve akrabaları, Mehmet Ölmez; Mersin (1998)
    Fu-yü Kırgızcası ve akrabaları, Mehmet Ölmez; Istanbul (2001)

    Russko-Oyrotskij Razgovornik, composed by V. Antonov-Saratovskiy, translated by I. Kalanakov, Leningrad (1931)
    Russko-Altajskij Elektronnyij Slovar, by U. Tekenova, S. Tekenov, E. Tatin, (TRANS.exe) (2006?)
    Russko-Altajskij Slovar, Editor-in-Chief: Baskakov, N.A.; Director: Kuchigasheva, N.A.; Moscow (1964)

    Dialekt Kumandintsev /Kumandy-Kizhi/, Grammaticheskij ocherk, teksty i slovar, by N. Baskakov, Moscow (1972)

    Кыргызча-орусча сöздöк, Орусча- кыргызча сöздöк, by K Yudakhin
    Grammatika kyrgyzskogo jazyka, kratkij spravochnik, Bishkek (2002)
    Grammatika kazakhskogo jazyka v tablitsakh i skhemakh, by L. Kulikovskaja, E. Musayeva; Almaty (2006)

    Kazakhskij jazyk, by K. Musayev; Moscow (2008)
    Kratkaja grammatika kazak-kirgizskogo jazyka, composed by P. Melioranskij, Sankt-Peterburg (1894) (quite interesting)

    Russko-karakalpakskij slovar, Editor-in-Chief: N. Baskakov, composed by Sh. Karimkhodzajev, K. Kdyrbajev, et al.; Moscow (1967)

    Къарачай-Малкъар Орус-Сёзлюк, edited by E. Tenishev, Kh. Suyunchev; Moscow (1989)

    Obshchije svedenija o karachajevo-balkarskom jazyke, by Ali Dzharashtiyev (2009?) (online only)
    Shkolnyj russko-kabardinskij slovar, by Kh. Dzhaurdzhij, Kh. Syk'un; Nalchik (1991)

    Russko-tatarskij razgovornik, composed by E. Lazareva, Moscow (2004)
    Russko-tatarskij slovar slovosochetanij (A Russian-Tatar dictionary of word combinations), composed by Khanif Agishev, Kazan (1996) (Simply put, a Tatar dictionary with examples, and a world of useful info)
    Tatarcha-Ruscha Uku-Ukïtu Süzlege, composed by F.A Ganiyev, I.A. Abdulin, R.G. Gataulina, F.Ye. Yusupov; Moscow (1992)
    http://www.xatasiz.com (A good online Russian-Tatar, Tatar-Russian dictionary with an audio database)

    Govory sibirskikh tatar yuga tymenskoj oblasti (The dialects of the Siberian Tatars of South Tyumen Oblast), Alishina, Kh. Ch.; avtoreferat dissertatsii (a thesis summary); Kazan (1992)
    Dialekty zapadnosibirskikh tatar (The dialects of West Siberian Tatars), Akhatov G. Kh.; avtoreferat dissertatsii (a thesis summary); Moscow (1964)

    Russko-kumykskij slovar, Editor: Z. Bammatov, Moscow (1960)

    Russko-nogajskij razgovornik, composed by I. Kapayev, K. Kumratova; Stavropol (2007)
    Grammatika nogayskogo yazyka. Fonetika i morfologija (The grammar of the Nogai language. Phonetics and morphology), Editor-in-Chief: Baskakov, N.A.; Authors: Kalmykova, S.A., Sartseva, M.F.; Cherkessk (1973)
    Nogayskij yazyk i yego dialekty (The Nogai language and its dialects), Baskakov, N.A., Moscow (1940)

    Forschungsreise durch Sibirien 1720-1727, by Daniel Messerschmidt (1721-1725) (some data on early Baraba)
    Yazyk barabinskikh tatar (materialy i issledovanija) (The language of the Baraba Tatars (materials and studies)), L.V. Dmitriyeva; Leningrad (1981) (One of the very few detailed field studies of the Baraba Tatars in the 20th century, conducted in the 1950s-60s. It includes legends and stories recorded from illiterate speakers, grammar notes and a brief lexicon.)

    Russko-bashkirskij slovar, composed by Z.G. Uraksin, Ufa (2005)
    Grammatika bashkirskogo jazyka dla izuchayuschikh jazyk kak gosudarstvennyj (A grammar of Bashkir for those learning it as a state language), Usmanova, M.G.; Ufa (2006)

    Elbrusoid Russian-Karachay-Balkar Dictionary (Version 2.0)

    Uzbekskij jazyk dlya vzroslykh (samouchitel), I. Kissen, Sh. Rakhmatulayev; Tashkent (1990)
    Russko-uzbekskij slovar, Editor-in-Chief M. Ch. Koshchanov; Vol 1-2; Tashkent (1983)
    Uighur - Russian Dictionary (an electronic dictionary for ABBYY Lingvo) (2008)
    Uygursko-russkij slovar, Editors-in-Chief: Sh. Kibirova, Yu. Tsunvazo; Alma-Ata (1961)

    The long and wonderful voyage of Frier Iohn de Plano Carpini, by Frier Iohn de Plano Carpini (1245-46)

    Turkmen-English Dictionary, by Garret, Lastowka, Muhammetmuradova, et al. (1996)
    Turkmenskij jazyk, by E. Grunina, Moscow (2005)
    Kratkij russko-turkmenskij slovar, Editors-in-Chief: M. Khazmayev, S. Altayev; Ashgabad (1968)
    Turkmence-Rusca sözlük, Editors-in-Chief: N.A. Baskakov, B.A. Karryyeva, M. Ya. Khamzayeva; Moscow (1968)

    Samouchitel azerbajdzhanskogo jazyka, by T. Khudazarov, Baku (2006)
    Azerbaycanca-Rusca lüğət, Editor-in-Chief: M.T. Tagiyev; Vol. 1-4, Baku (2006)

    Grammatika turetskogo jazyka dla nachinajuschikh, by Olga Sarygyoz (2007)
    Turetsko-russkij slovar, composed by R. R. Yusipova, Editor-in-Chief: T. Ye. Rybalchenko, Moscow (2005)
    Turetsko-russkij i russko-turetskij slovar, composed by T. Ye. Rybalchenko, Moscow (2007)
    Intensivnyj kurs turetskogo jazyka, by Yu. Scheka, Moscow (1996)

    Grammatika jazyka tyurkskikh runicheskikh pamyatnikov, VII-XII vv., by A. Kononov, Leningrad (1980)
    Ocherk grammatiki drevnetyurkskogo jazyka, by V. Kondratyev, Leningrad (1970)
    Drevnetyurkskij slovar (The Old Turkic dictionary), Editors: V.M. Nadelyayev, D. M. Nasilov, et al., Leningrad (1969)
    Türik Bitig, a site dedicated to Orkhon-Yenisei inscriptions

    The Turkish dialect of Khalaj, by V. Minorsky, Bulletin of the School of Oriental Studies, London (a field study, written c. 1906, published in 1940)

    Yazyk zhyoltykh ujghurov (The language of the Yellow Uyghurs), E. Tenishev, B. Todayeva, 1966 (a field study, but too concise)

    Remarks on the Salar Language, by Nicholas Poppe, University of Washington (1950's?)
    Stroj salarskogo jazyka (The structure of the Salar language), by E. Tenishev, Moscow, 1976 (a field study)
    The Turkic Languages and Peoples, by Karl Menges (1968)
    Salar: A Study in Inner Asian Language Contact Processes, Part I: Phonology, by Arienne M. Dwyer; Turcologica, edited by Lars Johanson, Band 37,1; Harrassowitz Verlag, Wiesbaden (2007)

    Arabic Etymological Dictionary, by Andras Rajki (2002)

