A Global Model of Predicted Peregrine Falcon (Falco peregrinus) Distribution with Open Source GIS Code and 104 Open Access Layers for Use by the Global Public

Peregrine falcons (Falco peregrinus) are among the fastest members of the animal kingdom, and they are probably the most widely distributed raptors in the world; their migrations and habitats range from the tundra, mountains and some deserts to the tropics, coastal zones and urban habitats. Habitat loss, conversion, contamination, pesticides and other anthropogenic pressures are all known factors that adversely affect this species. However, although peregrine falcons were removed from the list of endangered species owing to rebounding populations linked with the DDT ban in many nations of the world, no accurate global distribution model has ever been developed for good conservation practice within an open access data framework. Here we used the best available open access peregrine falcon data from the Global Biodiversity Information Facility (GBIF.org) to obtain the first publicly available global distribution model for peregrine falcons. For that purpose, we compiled over a hundred high-resolution global GIS layers (1 km pixel size) covering biological, climatic and socio-economic predictors, allowing us to analyse habitat relationships in a holistic fashion and to build a generalizable model. We have also made these value-added layers available to the global public, free of charge, for use in any modeling effort (https://scholarworks.alaska.edu/handle/11122/7151). We performed the spatially and temporally explicit data extraction both with an open source Python script and with ArcGIS (via the GUI) on a PC. The resulting data cube (global, 1 km pixel, 104 GIS layers) was 'mined' with the Salford Predictive Modeler (SPM) software suite, one of the best platforms for data mining, to build the prediction model for robust inference. We found that peregrine falcons are widely urbanized, occurring in coastal areas, and are also associated with riparian zones.
This is the first model obtained using 104 predictors at a 1 km scale to predict the potential ecological niche of peregrine falcons around the world. While our model may show uncertainty for parts of Siberia, Russia, it has an assessed global accuracy of over 95% and hence provides the best publicly available global prediction model for peregrine falcons to date, based on all available empirical data. Overlaying the model with the national parks of the world, we found that most peregrine hotspots are actually located outside of protected areas, warranting more protection efforts while global change unfolds. Finally, a nationwide assessment of the presence points taken from GBIF provides insight into the many signatory nations that are still in violation of the open access data sharing requirement set by the Convention on Biological Diversity (CBD) and the Budapest and Berlin Declaration.
Earth Syst. Sci. Data Discuss., doi:10.5194/essd-2016-65, 2017. Manuscript under review for journal Earth Syst. Sci. Data. Discussion started: 13 February 2017. © Author(s) 2017. CC-BY 3.0 License.


Introduction
Predictive modeling explicit in space and in time has been used in ecology to generate distribution patterns for thousands of species to help model their ecological niches (Guisan and Zimmermann, 2000; Barry and Elith, 2006; Drew et al., 2011).
Models of these niches depend on the data used and available. Niches are defined by a set of biotic and abiotic factors, and various methods have been explored for inference and for the identification of relevant predictors (Breiman, 2001; Hastie et al., 2001; Peterson, 2001). Understanding these defining elements and concepts is important for successful conservation and species management, especially now, with a rapidly changing climate (Walther et al., 2002), shifting biomes and a multitude of human impacts (Halpern et al., 2008).
Species distribution modeling, both temporal and spatial, plays an important role in monitoring, managing and conserving species effectively (Cushman and Huettmann, 2010; Drew and Perera, 2010). It is a convenient and effective method of data analysis, in which the known ecological niche of a species provides insight into its presence, or the suitability for its presence, in remote (un-sampled) areas. For example, the fundamental niche was modeled for three quintessential Arctic bird species, and distribution patterns spanning 200 years (1900-2100) were produced to predict habitat suitability and shifts in spatial concentration corresponding to changes in climate (Booms et al., 2011). Many such distribution models have since described the ecological niches of various species (Drew et al., 2011). Generally, these models estimate the fundamental ecological niche of the species in question. This can be achieved using either a mechanistic approach or a correlative approach (Soberon and Peterson, 2005).
The mechanistic approach involves physically modeling the direct response of individuals (e.g. their metabolism) to parameters such as temperature and humidity, and then using GIS to identify regions of positive fitness. The correlative approach, followed in this study, uses various predictor variables to build a predictive model with supervised machine learning algorithms. On a global level, this is rather powerful because 'global correlates' can be identified, pursued and tested further. These correlations are not proof of causation, but they are very powerful for inference. Using as much data as possible is therefore essential.
The ecological niche has been defined and classified in many ways (Soberon and Nakamura, 2009; Cushman and Huettmann, 2010): the Grinnellian niche of a species is determined by its habitat and its behavioral adaptations; the Eltonian niche is classified according to the foraging activity of the species; and the Hutchinsonian niche, the most generic form, takes into account the diverse environmental conditions and resources an individual requires to survive (Bruno et al., 2003). The range of biotic and abiotic conditions that defines the survival requirements of a particular species is its fundamental niche. The complete set of locations that fit the requirements of the fundamental niche is the potential niche, and the set of locations where the species is actually found is the realized niche. Most models predict potential niches. Such modeling, however, has been less prevalent for species as widespread as peregrine falcons. Global analysis platforms and computational solutions do not yet exist to handle such large data, worldwide and at a 1 km pixel size, all as open access and open source for fast analysis. While constraints have usually been placed on species data, here we promote the opportunities offered by the 104 GIS predictors, released as open access data together with open source code to operate such a data cube effectively, e.g. on a local PC.

Biology of peregrine falcons
Peregrine falcons, though known for their widespread habitat range and adaptability, also migrate long distances between their wintering and nesting areas. Some of these nesting areas have been in use for hundreds of years, and probably longer (Newton, 1979). Peregrine falcons can be classified into 19 subspecies depending on their geographic locations (Cade and Digby, 1982). Table 1 lists these known subspecies with their corresponding regions of occurrence.
Poaching and hunting of this species have been ongoing for millennia, e.g. for falconry; this may even have been somewhat sustainable. But in the mid-20th century, peregrine falcons became critically endangered globally, and were close to extinction in North America, due to the excessive use of DDT and other chemical pesticides, which caused death or reproductive failure through eggshell thinning (Newton et al., 2008). Following the eventual ban on organochlorine pesticides, widespread reintroduction, and protection under various national and international legislation, they have since made a strong recovery in many parts of the world (Tordoff and Redig, 2001; Jacobsen et al., 2007). Known for their speed and, owing to their adaptability, their broad geographical availability, they are probably the raptors most frequently used for falconry. Apart from the detrimental effect of pesticides, this human pursuit has made these falcons more vulnerable, as they are poached by egg collectors and falconry thieves alike to meet the ever-present demand in the Middle East. Some of the international policies established to protect this species are listed in Table 2. Apart from international regulations, individual countries have declared their own additional laws that protect this species from harm, a few of which are listed in Table 3.

Conservation efforts
The Convention on Biological Diversity (CBD) plays a central role in modern conservation, including through digital opportunities, and is therefore considered here in more detail. Specifically, it addresses the online data aspects of biodiversity conservation that affect conservation management worldwide. The CBD is an important multilateral treaty for sustainable development, currently signed by 196 parties from around the world. One of the key points discussed at the 10th Conference of the Parties (COP), held in Japan, was the sharing of biodiversity data (Balmford, 2005). Often, the areas richest in biodiversity are also those lacking the resources for conservation, and sufficient data to make good decisions are frequently unavailable. Hence it is important for scientists handling databases, for public offices and for funding agencies to make these data available to all users and researchers around the world, to help build robust and collaborative methods of conservation with holistic benefits (Huettmann, 2011; Resendiz-Infante and Huettmann, 2015). In this study we also compare the empirical data available for peregrine falcons against the predictive models obtained, in order to identify the countries that are in good compliance with this data sharing agreement.
By now, the peregrine falcon is one of the best-known examples of 'synurbization', the adaptation of wild animals to rapidly spreading urban conditions, since its reintroduction (Luniak, 2004). The increased number of urban pigeons, as prey in populated settings, plays a central role in this discussion. Modeling, predicting and studying the distribution pattern of such a global and adaptive species can give useful insights into ecological aspects such as the effects of globalization on biodiversity and wilderness habitats. Several distribution maps have been put forth by various organizations, showing generic but often conflicting habitat regions for these birds; none of them carries relevant and compliant metadata or scientific accuracy metrics, and none is repeatable or available in a usable GIS format for scientific assessment (Huettmann, 2004; Huettmann, 2015b; Zuckerberg et al., 2011).
The global distribution patterns of peregrine falcons have not been studied in detail for conservation purposes since the species' removal from the U.S. list of endangered species in 1999, following the rebound due to the DDT ban and breeding programs. Modern study methods have not yet been employed. In this study, we investigate steps to achieve the first global distribution model for peregrine falcons, using over a hundred compiled open access predictors that include climatic, biological and socio-economic factors, representing a more holistic set of factors that can affect the survival of the species and the suitability of a region for it. We consider the species Falco peregrinus (Taxonomic Serial No. 175604), encompassing all subspecies, to build a general global niche (year-round). We also present an open source analysis platform written in Python for such analysis cases, so that the global public can readily use this data cube for its own purposes. We believe this is substantial progress, because these 104 data layers did not previously exist in a readily available GIS format as provided here. They make it possible to demand the best possible holistic view in any habitat study.

Training data (presence and pseudo-absence)
We used the 'presence only' data for peregrine falcons from the Global Biodiversity Information Facility (GBIF.org). As per the Convention on Biological Diversity (CBD), GBIF is the one-stop open access international data warehouse for species occurrence, and many nations have confirmed their participation by signature and ministerial support. GBIF currently holds the largest known empirical dataset on peregrine falcons in the public realm, including the geo-coordinates of each location, the date, and the organization that reported the record (e.g. a sighting or a specimen). These raw data had to be filtered for inaccurate and duplicate records, for records with incorrect geo-referencing and for records with ambiguous data, finally yielding 60,261 unique presence points, less than half the size of the raw database. Once the presence points were established, we plotted 35,800 evenly distributed random points of pseudo-absence over the land masses (excluding Antarctica) using the "Create Random Points" tool in ArcGIS, to obtain a representative pseudo-absence data layer for the world.
Another major aspect of this study was the first-time compilation and public delivery of over a hundred global GIS layers at a 1 km x 1 km resolution from various open source projects, for use as predictor variables in such models. We followed the initial work by Ohse et al. (2010) and Herrick et al. (2013). The range of predictor layers used is presented in detail in Appendix Table 1. Variables such as the density of livestock populations (e.g. pigs and poultry) were also included, as they proved to be highly influential factors for species that live in close contact with humans. The layers are available as GeoTIFFs (LZW compressed), in the WGS-84 coordinate system, with a resolution of 1 km x 1 km, in our public repository (the UAF DSpace library) and on Google Drive (available upon request from the authors). This dataset has a size of 37.5 GB. The layers can also easily be converted to and from ESRI grid and ASC formats. Using such a wide range of predictors helps us begin to explore and recognize hidden, so far unknown driving factors that influence the species. In predictive models, a complete description of the ecological niche is essential and reduces uncertainty, whereas parsimony fails (Elith et al., 2006; Guthery et al., 2005).
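The filtering of raw GBIF records and the drawing of pseudo-absence points described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the field names `decimalLatitude` and `decimalLongitude` are the Darwin Core terms used in GBIF downloads, while the filtering rules, the ~1 km de-duplication cell (30 arc-seconds) and the `is_land` mask are hypothetical choices. ArcGIS's "Create Random Points" tool may sample differently; here latitude is drawn via arcsin so that points are spread evenly by area.

```python
import math
import random

# ~30 arc-seconds, roughly 1 km at the equator (assumed de-duplication cell)
CELL_DEG = 1 / 120


def clean_presence_records(records, cell_deg=CELL_DEG):
    """Filter raw GBIF-style records and keep one presence per grid cell.

    Each record is a dict with the Darwin Core keys 'decimalLatitude' and
    'decimalLongitude'. The rules below are illustrative assumptions.
    """
    seen_cells = set()
    cleaned = []
    for rec in records:
        try:
            lat = float(rec["decimalLatitude"])
            lon = float(rec["decimalLongitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # impossible coordinates
        if lat == 0.0 and lon == 0.0:
            continue  # a frequent geo-referencing error
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        if cell in seen_cells:
            continue  # duplicate within the same ~1 km cell
        seen_cells.add(cell)
        cleaned.append((lon, lat))
    return cleaned


def random_pseudo_absences(n, is_land, seed=0):
    """Draw n random points spread evenly over the globe, kept only on land.

    Latitude comes from arcsin of a uniform value, so points are uniform by
    area rather than crowded toward the poles. `is_land(lon, lat)` is a
    hypothetical land mask (e.g. one that excludes Antarctica).
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        lon = rng.uniform(-180.0, 180.0)
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        if is_land(lon, lat):
            points.append((lon, lat))
    return points
```

In practice `is_land` would be backed by a land/coastline raster; the rejection loop simply redraws points that fall in the sea or in excluded regions.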
Conventionally, the compiled layers are loaded into ArcMap and overlaid with the compiled presence and pseudo-absence points. Using the Extract Multi Values to Points tool in ArcMap, the values of all these layers at those points were extracted, and this compilation of data was used to create the distribution model. Here, however, we also developed a second, open source approach, which we make available to the global public so that they can use these data more effectively for their own purposes: the extraction of values from this data cube can also be done with Python and its supporting libraries. Python is rapidly becoming one of the most popular languages for machine learning and advanced analysis (Harrington, 2012). Its extensive libraries and packages do most of the 'heavy lifting' and provide efficient models and solutions, enabling users to concentrate on the problem at hand rather than on modeling specifics. It also gives users the power to decide which predictors are used to build predictive models (as per Breiman, 2001). The script, which is available to the global audience, can be used as a generic template for handling big data on small machines as well (IBM PCs here). Combined with multiprocessing libraries, it can be scaled up when run on the cloud or on clusters, as needed, for instance, for time-critical applications.
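The core of such a point extraction, i.e. what Extract Multi Values to Points does, can be sketched with NumPy on in-memory grids. This is an illustrative sketch, not the published script: it assumes north-up rasters described by a GDAL-style geotransform and uses nearest-cell lookup without interpolation; the function names are hypothetical. For the real GeoTIFFs, a library such as rasterio (e.g. its `DatasetReader.sample()` method or windowed reads) would replace the in-memory arrays, since a full global 30 arc-second layer (43,200 x 21,600 cells) stored as float32 would occupy about 3.7 GB.

```python
import numpy as np


def extract_values(grid, geotransform, points, nodata=None):
    """Sample a north-up raster at (lon, lat) points (nearest cell, no interpolation).

    `geotransform` uses the GDAL ordering
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height).
    Points outside the grid, or on nodata cells, get NaN.
    """
    x0, dx, _, y0, _, dy = geotransform
    xy = np.asarray(points, dtype=float)
    cols = np.floor((xy[:, 0] - x0) / dx).astype(int)
    rows = np.floor((xy[:, 1] - y0) / dy).astype(int)  # dy < 0 for north-up grids
    inside = (rows >= 0) & (rows < grid.shape[0]) & (cols >= 0) & (cols < grid.shape[1])
    values = np.full(len(xy), np.nan)
    values[inside] = grid[rows[inside], cols[inside]]
    if nodata is not None:
        values[values == nodata] = np.nan
    return values


def extract_cube(layers, geotransform, points):
    """Build the point-by-predictor table for a dict of aligned layers."""
    names = list(layers)
    table = np.column_stack(
        [extract_values(layers[name], geotransform, points) for name in names]
    )
    return names, table
```

Because all 104 layers share one grid, a single geotransform suffices, and the resulting table (one row per point, one column per predictor) is what the modeling software consumes.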

Modeling approach
We primarily used the TreeNet algorithm in SPM7, provided by Salford Systems Ltd (https://www.salford-systems.com/), to build the distribution model, and tested RandomForest for comparison. These algorithms have been widely used for data mining of ecological data for conservation management (Craig and Huettmann, 2009). Both are known to generate highly accurate models for regression and classification and to be fairly robust even with faulty data and outliers (Fernández-Delgado et al., 2014). SPM also gives the user flexibility in controlling the model parameters. The classification models were trained to predict the relative index of occurrence (RIO) of peregrine falcons in any given region of the world, using the presence and pseudo-absence points with all attributes from the data cube. We used the 'balance' class weight option to compensate for the unequal presence and pseudo-absence sample sizes, and kept all other settings at their defaults (which are known to perform very well).
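The idea behind balancing class weights can be illustrated with the sample sizes reported above (60,261 presences, 35,800 pseudo-absences). The sketch below uses the common heuristic w_c = n / (k * n_c), which gives every class the same total weight; this is the rule behind scikit-learn's 'balanced' option, and SPM's internal BALANCED setting may be implemented differently, so treat it as an assumption.

```python
from collections import Counter


def balanced_weights(labels):
    """Per-record weights that give every class the same total weight.

    Uses w_c = n / (k * n_c), where n is the number of records, k the number
    of classes and n_c the size of class c (an assumed heuristic).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    per_class = {c: n / (k * n_c) for c, n_c in counts.items()}
    return [per_class[y] for y in labels], per_class


# 1 = presence, 0 = pseudo-absence, using the sample sizes from the text
labels = [1] * 60261 + [0] * 35800
weights, per_class = balanced_weights(labels)
```

With these numbers each presence gets a weight of about 0.797 and each pseudo-absence about 1.342, so both classes contribute a total weight of 48,030.5 and the model is not pulled toward the larger presence class.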

Display of prediction surfaces
Next, using the Create Fishnet tool in ArcMap, we generated a global layer of equally spaced lattice points at a 1 km x 1 km resolution, bounded by the continental landmass. This layer was then overlaid with the more than one hundred predictor layers, and the values of these variables were extracted to the fishnet points using the same technique as for the presence points. This set of points was then 'scored' with the classification model in order to obtain the global distribution of the relative index of occurrence (RIO) of peregrine falcons. We then used the Inverse Distance Weighting (IDW) tool in ArcMap to create a raster surface of the predicted RIOs.
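The IDW step can be sketched as follows, to show how a continuous RIO surface is derived from the scored lattice points. This is a generic IDW implementation under simplifying assumptions, not ArcMap's tool: it uses planar distance in the coordinate units, a power of 2 (the usual default), and an optional `k` nearest-neighbour limit standing in for ArcMap's search-neighbourhood settings.

```python
import math


def idw(known, x, y, power=2.0, k=None):
    """Inverse Distance Weighting estimate at (x, y).

    `known` is a list of (x, y, value) tuples, e.g. scored fishnet points
    with their predicted RIO. If `k` is given, only the k nearest points
    are used.
    """
    dists = []
    for kx, ky, value in known:
        d = math.hypot(kx - x, ky - y)
        if d == 0.0:
            return value  # exact hit: IDW reproduces the sample value
        dists.append((d, value))
    dists.sort()
    if k is not None:
        dists = dists[:k]
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)
```

For example, with points (0, 0, 1.0) and (2, 0, 3.0), the estimate at (1, 0) is 2.0 (equal weights), while at (0.5, 0) it is 1.2, because the nearer point receives nine times the weight.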
These RIO values were then extracted for the known presence points used to build the model. A frequency distribution histogram of these RIOs shows the range of indices for which the model predicts presence of the species. The error percentage of the model is then used to determine the cut-off threshold of the indices, yielding a binary presence/absence prediction for peregrine falcons.
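One plausible reading of how the error percentage sets the cut-off is sketched below: the threshold is chosen so that roughly that share of the training presence points falls below it, and the RIO surface is then binarized. Both the quantile rule and the function names are assumptions for illustration; the paper does not specify the exact procedure.

```python
def threshold_from_error(presence_rio, error_rate):
    """Cut-off RIO such that roughly `error_rate` of presence points fall below it."""
    if not presence_rio:
        raise ValueError("need at least one presence value")
    values = sorted(presence_rio)
    index = min(int(error_rate * len(values)), len(values) - 1)
    return values[index]


def to_binary(rio_values, cutoff):
    """Binary presence (1) / absence (0) map from RIO values and a cut-off."""
    return [1 if v >= cutoff else 0 for v in rio_values]
```

With 100 presence values spread evenly from 0.00 to 0.99 and a 2% error rate, the cut-off is 0.02, and exactly two presences (0.00 and 0.01) would be mapped as absent.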

Accuracy assessment
The accuracy of the models obtained was assessed using the area under the curve (AUC) of the Relative Operating Characteristic (ROC), as is common practice (Pearce and Ferrier, 2009). The ROC curve of a binary classifier plots the true positive rate against the false positive rate. We considered AUC scores below 0.7 to indicate low accuracy, scores between 0.7 and 0.9 moderate accuracy, and scores above 0.9 high accuracy (Swets, 1988). We also obtained one of the few publicly available sets of open access presence points from Movebank (www.movebank.org) and used it to validate our models. Although over fifteen datasets were listed for peregrine falcons, only two of them were available for public access, and none of them were shared with GBIF. Extracting the predicted RIOs at these points and plotting their frequency distribution allowed us to examine the accuracy of our model for many regions.
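The ROC AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen presence receives a higher score than a randomly chosen (pseudo-)absence, with ties counting one half (the Mann-Whitney formulation). The sketch below implements that rank-based formula together with the Swets (1988) accuracy classes used above; it is an illustration, not the SPM implementation.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    scored = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    rank_sum = 0.0
    i = 0
    while i < len(scored):
        j = i
        while j < len(scored) and scored[j][0] == scored[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied block occupies ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for _, label in scored[i:j] if label == 1)
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u = rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def swets_class(auc):
    """Accuracy class following the Swets (1988) cut-offs used in the text."""
    if auc < 0.7:
        return "low"
    if auc <= 0.9:
        return "moderate"
    return "high"
```

Sorting makes this O(n log n), which matters at this study's scale: a naive pairwise comparison of 60,261 presences against 35,800 pseudo-absences would need over two billion comparisons.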
The workflow in its entirety is illustrated in the flowchart in Figure 2. This is the first investigation making 104 value-added predictor layers publicly available; as an example and case study, we applied them to peregrine falcons. This is also the first global distribution model for peregrine falcons with over a hundred predictors, based on empirical data and using data mining. We found that the species, being globally widespread, is adaptable to a varied range of environments. Using such a large number of predictors allowed us to obtain the best possible general model and inference. The model takes a global perspective, considering all subspecies of the peregrine falcon together and all year round, in order to establish a quantified, common ecological niche description for their presence around the world.

Deliveries of data and GIS layers
One of the important contributions of this paper is the compilation and global sharing of over a hundred global GIS layers.
We therefore present it here as a result, with details. These predictors, with a resolution of 1 km x 1 km, were compiled from various open source projects (details in Appendix Table 1). The data are available for free access in a ready-to-use format at https://scholarworks.alaska.edu/. All layers were re-projected and re-sampled so that the rasters align correctly and can be used directly in any other model without changes. We provide metadata with these layers.
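Since the layers can also be converted to the ESRI ASCII grid (ASC) format, a minimal writer and reader for that plain-text format are sketched below. The six header keys (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) are those of the standard format; the reader assumes all six are present (the format also allows an xllcenter/yllcenter variant and an optional NODATA line, which this sketch does not handle). The same header values can be compared across layers as a quick check that rasters are aligned.

```python
def to_esri_ascii(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialise a north-up 2-D grid (list of rows, top row first) as ESRI ASCII."""
    nrows, ncols = len(grid), len(grid[0])
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for row in grid:
        # None marks a missing cell and is written as the nodata value
        lines.append(" ".join(str(nodata if v is None else v) for v in row))
    return "\n".join(lines) + "\n"


def from_esri_ascii(text):
    """Parse the text back into a header dict and a grid with None for nodata."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    nodata = header.get("nodata_value")
    grid = [
        [None if float(v) == nodata else float(v) for v in line.split()]
        for line in lines[6:]
    ]
    return header, grid
```

For a global layer at 30 arc-seconds, `cellsize` would be 1/120 of a degree with the lower-left corner at the grid's western and southern edges.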

Model performance
Both models, built with the RandomForest and the TreeNet algorithms, resulted in an ROC AUC greater than 97%.
However, on comparison, we judged the TreeNet model to be more realistic, based on the ground-truth data we had at hand (see below; limited Movebank web portal data). The results shown in this paper are derived from this model.

Variable importance
The top predictors of the classification model, as shown in Table 1, make it apparent that the most relevant drivers of a region's suitability for peregrine falcons worldwide are actually socio-economic factors. These have never been described before (Kaufman, 2000). The ecological niche of the peregrine falcon is determined by a multivariate set of predictors; it is clearly not parsimonious. Population count and night light pollution indicate the intensity of urbanization in an area, while infant mortality rate and life expectancy indicate the development status of the region.
High October temperatures are also important. These are all new predictors that should be studied and interpreted further at a local scale. Taken together, peregrines seem to favor warm, coastal urban areas with low infant mortality and high light pollution. The species richness of birds and the density of poultry in a region are also secondary predictors, as falcons are birds of prey and this species sits at the apex of the food chain. The dependence of presence on the top four predictor variables is shown in Figure 4. Arguably, on a global scale, the peregrine falcon is linked with urban areas much more than with remote wilderness.
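Dependence plots of this kind are usually partial dependence curves: for each value of one predictor, that predictor is set to the value in every record, the model is applied, and the predictions are averaged. The sketch below shows the generic computation for any fitted model exposed as a `predict` function over lists of feature dicts; this interface is a hypothetical convenience, and TreeNet's own plots are computed internally by SPM and may differ in detail.

```python
def partial_dependence(predict, rows, feature, grid_values):
    """One-way partial dependence of a fitted model on a single predictor.

    `predict` maps a list of feature dicts to a list of predicted RIOs
    (a hypothetical interface). For each grid value, the chosen feature is
    overwritten in every row and the predictions are averaged.
    """
    curve = []
    for value in grid_values:
        modified = [{**row, feature: value} for row in rows]
        preds = predict(modified)
        curve.append(sum(preds) / len(preds))
    return curve
```

Applied, for example, to night light pollution over its observed range, the resulting curve shows how the average predicted RIO changes with that predictor while the others keep their observed values.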

Validation of the peregrine model
The classification model obtained from the TreeNet algorithm had an AUC of 98%, a highly accurate performance metric. Figure 5 shows the associated ROC curve. The predicted indices were extracted for the presence points, and their frequency distribution (Figure 6) was plotted as a histogram in R (https://www.r-project.org/). The normal curve fitted to this distribution shows that most of the points are correctly classified with a high value of predicted occurrence. The resulting binary presence/absence map is shown in Figure 7. Machine learning models do not have to be symmetrical, because no logistic function is used; as a matter of fact, nature is neither symmetrical nor linear or logistic, and our models are based on 'recursive partitioning' trees (Breiman, 1984). Although the use of pseudo-absence points can weaken the evidence obtained from the ROC curve of the model, we used a balanced weight in model building, and the prediction of presence is validated, as the 'data contamination' occurs only in the absence points. The 'balance weight' setting is a specific feature of SPM and makes it very powerful for 'real' data mining. The AUC hence gives a close estimate of what can be achieved with true absence points (Barbet-Massin et al., 2012). To assess this potential shortcoming, further validations were performed.
Unfortunately, we were not able to locate or obtain other public data for peregrine falcons; such data are usually not made freely available by peregrine falcon investigators. However, we located two datasets on Movebank (otherwise a closed website) that track peregrine falcons with geolocators in the western hemisphere and in Europe.
Confronting our model with real-world data for ground-truth validation, we used this set of 7,800 presence points, which were independent of the points used to train the model. These test points, overlaid on the proposed presence/absence threshold map, are shown in Figure 8. Extracting the predicted RIOs at these points gives the frequency distribution histogram shown in Figure 9. Using the assumed threshold of 0.01, only 141 of the 7,800 points are falsely classified, resulting once more in 98% accuracy, which is consistent with the 2% error of the predicted model. Considering that these are large-scale model predictions, we find such high global accuracies remarkable.
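The validation arithmetic can be made explicit. Because every independent Movebank point is a presence, the share predicted present at the 0.01 threshold is, strictly speaking, the sensitivity (true positive rate) of the thresholded map: 1 - 141/7,800 ≈ 0.982. The function name below is hypothetical.

```python
def validation_summary(test_rio, cutoff):
    """Share of independent presence points predicted present, and the misses."""
    missed = sum(1 for v in test_rio if v < cutoff)
    return 1 - missed / len(test_rio), missed


# Synthetic values reproducing the counts reported in the text
sensitivity, missed = validation_summary([0.005] * 141 + [0.5] * 7659, 0.01)
```

Here `missed` is 141 and `sensitivity` is about 0.9819, matching the 98% reported above.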

Efficiency of Peregrine Falcon Conservation
The predicted RIOs of this model were overlaid with a layer of global national parks (UNEP-WCMC, 2015) to assess the occurrence of these birds within protected lands. We observed that less than 3% of these points were actually situated within national parks. These protected areas include regions on the west coast of America, as well as in Alaska, England, South Africa and Madagascar, a large concentration in south-east Australia, and a few in south-east Asia. Conservation and land protection efforts for peregrine falcons have not been very active since the species' removal from the U.S. list of endangered species in 1999. Although many hand-raised and released birds have been rapidly adapting to increasingly urbanized space, more conservation is needed at their nesting sites to maintain a healthy population in the wild. By now, wilderness seems to play a small role for this species.
Next, the polygons representing the world's national parks were converted into sampling points at a 1 km x 1 km resolution. The predicted RIOs (relative indices of occurrence) were then extracted for these points to assess whether areas of high RIO fall within the plots demarcated for conservation purposes. The frequency distribution histogram (Figure 10) clearly shows that the global national parks currently in place mainly cover areas where the RIO of peregrine falcons is extremely low. The normal curve obtained is starkly different from the one desired for conservation that maximizes peregrine falcon coverage and protection. This analysis clearly shows that the current system of national parks provides little or no protection to peregrine falcons globally.
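The two overlay summaries described above, the share of predicted hotspot cells inside parks and the RIO values found within park cells, can be sketched on co-registered cell lists. The boolean `in_park` flags (one per 1 km cell, e.g. from rasterized UNEP-WCMC polygons) and the hotspot cut-off are assumptions for illustration; the paper does not state exactly which points entered the "less than 3%" figure.

```python
def share_of_hotspots_in_parks(rio, in_park, cutoff):
    """Fraction of cells with RIO >= cutoff that lie inside protected areas."""
    hotspots = [park for value, park in zip(rio, in_park) if value >= cutoff]
    return sum(hotspots) / len(hotspots) if hotspots else 0.0


def park_rio_values(rio, in_park):
    """RIO values of park cells, e.g. as input for a frequency histogram."""
    return [value for value, park in zip(rio, in_park) if park]
```

The first function gives a single coverage percentage; the second yields the values whose histogram corresponds to the park-based frequency distribution discussed above.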

Figure 10. Frequency distribution of the national park points plotted against the predicted indices of the model
All our model layers and data are publicly available with metadata. We invite everybody to use, assess and improve them further, e.g. with the open source code we also provide. Considering that this is the first global model for this species, we believe it presents a solid achievement, a baseline, conservation progress, and a template and role model for how public science can be delivered worldwide. We hope this approach catches on further, for the benefit of this species, its habitats, human well-being and the world (Huettmann, 2011).

Discussion
This study presents the first global distribution model prediction for peregrine falcons, based on open access data, open source code, and data mining with machine learning, using more than a hundred diverse predictors for global consumption. Choosing a few predictors a priori and restricting them to a small range of commonly used climatic and biological variables is common practice in species distribution modeling, albeit one that does not make good use of the full potential. Here we provide a true macro-ecological perspective, which also allows for subsequent study at smaller scales as needed.

Model Uncertainty
The binary distribution pattern obtained (Figure 7) agrees with the general assumption that the falcons are globally widespread and can be found in all kinds of ecosystems. Exceptions are particularly extreme ecosystems, such as the interior tropical rainforests of South America and the Sahara in Africa. On closer inspection, areas close to the coast and along rivers are consistently predicted as hot spots, specifically in the Siberian and Amazonian regions. This can be attributed to the layers that define proximity to the coast and to rivers, and it is consistent with existing knowledge (peregrine falcons are known to breed in and use those areas).
The pattern predicted for Russia, however, probably remains less certain. Although peregrine falcons are known to occur in the area, no publicly accessible data for Russia are available to verify or improve the model. When the presence/absence map is overlaid with the test presence points used for validation (Figure 8), the incorrectly classified points can be seen to occur in the Amazonian region as well as in the far western corner of Russia. Since no publicly available presence points for Russia existed to validate our model, we used the observations recorded by Rogacheva (1992) in her book "The Birds of Central Siberia" to compare and contrast our predicted potential niche in that region (Figure 11). Some of the observations recorded in the book match well with the model predictions; these are described in Table 5.
We conclude that Russia, a GBIF member nation, is still not sharing much data on peregrine falcons (specifically for Siberia and the Russian Far East) with the global community, but that our model so far agrees well with Siberian bird references.

Open access sharing of data
Analysis of the presence data available through GBIF and Movebank makes it apparent that a few countries, such as Russia, China, Brazil and some African nations, have not volunteered their data for public access. We find this to be in violation of the data sharing requirements set by the Convention on Biological Diversity (CBD) and its signatory nations (such as Russia, Brazil and many African nations). Unfortunately, this also includes bird banding records, e.g. as maintained by EURING and BirdLife International. The availability of such data is very important from the conservation perspective, as ecologists and organizations working on conservation management need all possible data at hand to make suitable decisions. In North America, for instance, such data are usually made readily available (Huettmann, pers. comm.; Beiring, 2013).
Conservation agencies are mandated to contribute to these efforts for the sound management of a public resource, but they fail to do so.
Other than GBIF, we have almost no data to use for this species. Table 6 lists the top ten contributors to the open access presence data of peregrine falcons obtained from GBIF. Although all these countries are CBD signatories, the list makes it apparent that such biodiversity data are predominantly available for developed countries that have the resources to make conservation a possibility. The countries with the most biodiversity are under-represented in these databases, and there is a dire need for data sharing and open access to build more robust models that help make conservation decisions (Huettmann, 2015a; Huettmann, 2015b). This clearly reflects the North-South gradient described by Rosales (2008).
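A ranking such as Table 6 can be produced by counting records per country. The sketch below assumes GBIF-style records as dicts; GBIF downloads carry both `countryCode` (where the occurrence was recorded) and `publishingCountry` (the country of the publishing organization), and which of the two underlies Table 6 is not stated, so the field is a parameter here.

```python
from collections import Counter


def top_contributors(records, field="countryCode", n=10):
    """Rank countries by number of occurrence records (empty values skipped)."""
    counts = Counter(rec.get(field) for rec in records if rec.get(field))
    return counts.most_common(n)
```

Running it once with `field="countryCode"` and once with `field="publishingCountry"` separates where falcons were observed from who shared the data, which bears directly on the data-sharing question discussed above.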
nations that are still in violation of the open-access data-sharing requirements set by the Convention on Biological Diversity (CBD) and the Budapest and Berlin Declarations.
Manuscript under review for journal Earth Syst. Sci. Data. Discussion started: 13 February 2017. © Author(s) 2017. CC-BY 3.0 License.

Figure 1. Global distribution of peregrine falcons (presence points).

… they include the following:
• Climatic data, such as mean monthly temperature, mean monthly precipitation, mean monthly solar radiation and the global aridity index
• Bioclimatic variables (bio1 to bio19) as defined by WorldClim
• Digital elevation models (DEMs) and variables derived from them, such as slope and aspect
• Variables pertaining to biodiversity, such as species richness of birds, mammals, amphibians and plants, and annual average potential evapotranspiration
• Quantitative indices of human impact on biodiversity, such as the Human Influence Index (HII), the Human Footprint level and the Last of the Wild
• Proximity to coast, rivers and roads, calculated with the Euclidean Distance tool in ArcMap
• Socio-economic factors, such as gross domestic product (GDP), human population density and count, infant mortality, literacy rate, life expectancy, trade, and night-light pollution
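The proximity layers listed above were produced with ArcMap's Euclidean Distance tool. The Python sketch below is not that tool; it only illustrates what such a layer contains, assuming a small north-up raster of square cells (1 km here) and a brute-force nearest-feature search that is practical only for toy grids. The function name `euclidean_distance_raster` and the example river grid are hypothetical.

```python
import math

def euclidean_distance_raster(features, cell_size=1.0):
    """Distance from every cell centre to the nearest feature cell centre.

    features: list of rows of bools (True = feature cell, e.g. a river).
    cell_size: width of a square cell in map units (assumed 1 km here).
    Brute force, O(cells * feature cells): a toy sketch, not ArcMap's algorithm.
    """
    sources = [(r, c) for r, row in enumerate(features)
               for c, is_feature in enumerate(row) if is_feature]
    if not sources:
        raise ValueError("raster contains no feature cells")
    out = []
    for r, row in enumerate(features):
        out_row = []
        for c in range(len(row)):
            # Straight-line distance in cells, scaled to map units
            d = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            out_row.append(d * cell_size)
        out.append(out_row)
    return out

# Hypothetical 3 x 3 raster with a river running across the middle row
river = [
    [False, False, False],
    [True,  True,  True],
    [False, False, False],
]
dist = euclidean_distance_raster(river, cell_size=1.0)
print(dist)  # → [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```

For real 1 km global layers, an exact distance transform (for example `scipy.ndimage.distance_transform_edt`) or the GIS tool itself would be the practical choice, since brute force scales with cells × feature cells.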
These variables were gathered from various open-source projects and were re-projected and re-sampled so that they align with each other (and, e.g., with the coastline), allowing us to deliver this value-added data product. The global layers are available for open access in GeoTIFF format.
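Aligning the layers means that every raster ends up on the same grid (same origin, cell size and extent). The text does not say which resampling method was used; the sketch below assumes nearest-neighbour lookup, which keeps categorical classes intact, and assumes the layers already share a coordinate reference system, so re-projection itself is not shown. Origins are the upper-left corners of north-up rasters, and the function name is hypothetical.

```python
import math

def nearest_neighbour_resample(src, src_origin, src_cell,
                               dst_shape, dst_origin, dst_cell, nodata=None):
    """Resample a north-up raster onto a target grid by nearest-neighbour lookup.

    src: list of rows; *_origin: (x, y) of the upper-left corner;
    *_cell: square cell size in map units; dst_shape: (rows, cols).
    Target cells whose centre falls outside src receive `nodata`.
    """
    src_rows, src_cols = len(src), len(src[0])
    sx0, sy0 = src_origin
    dx0, dy0 = dst_origin
    out = []
    for r in range(dst_shape[0]):
        y = dy0 - (r + 0.5) * dst_cell  # y of the target cell centre
        row = []
        for c in range(dst_shape[1]):
            x = dx0 + (c + 0.5) * dst_cell  # x of the target cell centre
            sr = math.floor((sy0 - y) / src_cell)  # source row containing the centre
            sc = math.floor((x - sx0) / src_cell)  # source column containing the centre
            inside = 0 <= sr < src_rows and 0 <= sc < src_cols
            row.append(src[sr][sc] if inside else nodata)
        out.append(row)
    return out

# A hypothetical 2 x 2 layer with 2 km cells, resampled to 1 km cells
coarse = [[1, 2],
          [3, 4]]
fine = nearest_neighbour_resample(coarse, (0.0, 4.0), 2.0, (4, 4), (0.0, 4.0), 1.0)
for row in fine:
    print(row)  # prints [1, 1, 2, 2] twice, then [3, 3, 4, 4] twice
```

For continuous layers such as temperature, bilinear resampling is a common alternative; nearest-neighbour is shown only because it is the simplest method that never invents new values.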

Figure 2. Flowchart illustrating the process of building and validating the distribution model

Figure 3. Global heat map of the predicted presence of peregrine falcons, showing the relative index of occurrence (RIO) from 0 to 1.

Figure 5. ROC curve for the model obtained using TreeNet algorithm.
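Figure 5 summarises the TreeNet model's discrimination as a ROC curve. The area under that curve (AUC) equals the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, with ties counted as one half. The sketch below computes AUC directly from that definition; it is an illustration, not the routine SPM uses, and the four scores are made-up toy values rather than model output.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of presence/absence pairs ranked correctly.

    scores: predicted values (e.g. RIO); labels: 1 = presence, 0 = absence.
    Tied pairs count 0.5. O(n_presence * n_absence): fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # → 0.75
```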

Figure 6. Frequency distribution of predicted vs. observed peregrine falcon occurrence for the TreeNet model (training data).
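Figure 6 contrasts predicted and observed occurrence for the training data. Such a comparison is commonly condensed into a confusion matrix and the rates derived from it. The section does not state which measure underlies the 97% accuracy quoted in the discussion, so the sketch only shows the standard definitions (overall accuracy, sensitivity, specificity) on hypothetical 1/0 vectors; `confusion_summary` is an illustrative name.

```python
def confusion_summary(predicted, observed):
    """Confusion matrix and rates for binary classes (1 = presence, 0 = absence)."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)
    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total if total else nan,
        # share of observed presences that were predicted as presence
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # share of observed absences that were predicted as absence
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

summary = confusion_summary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(summary["accuracy"], summary["specificity"])  # → 0.6 0.5
```

Reporting sensitivity and specificity alongside overall accuracy helps when absences greatly outnumber presences, because accuracy alone can then look high even if many presences are missed.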

Figure 7. Best-available binary presence/absence prediction map of peregrine falcons, interpreted with a cut-off of 0.01.
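The binary maps in Figures 7, 8 and 11 are obtained by applying a cut-off of 0.01 to the predicted RIO surface. A minimal sketch of that step follows, assuming that a cell whose RIO equals the cut-off counts as presence (the text does not say how ties are handled) and that missing cells are marked with `None`; the function name is hypothetical.

```python
def to_presence_absence(rio_grid, cutoff=0.01, nodata=None):
    """Classify a relative index of occurrence (RIO) grid into 1/0 classes.

    Cells with RIO >= cutoff become 1 (presence), all others 0 (absence);
    cells equal to `nodata` are passed through unchanged.
    """
    return [[v if v == nodata else int(v >= cutoff) for v in row]
            for row in rio_grid]

rio = [[0.0, 0.005, 0.01],
       [0.2, None, 0.009]]
print(to_presence_absence(rio))  # → [[0, 0, 1], [1, None, 0]]
```

Re-running the classification with other `cutoff` values is a quick way to see how sensitive the mapped extent is to this choice.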

Figure 8. Presence/absence map (with a cut-off of 0.01) overlaid with the test presence points used for validation.

… opportunities yet. It is nevertheless important to start the model unbiased and with as many predictors as possible, because the model may identify unexpected influencing factors that are not apparent to the modeler or the expert. The model thus builds quantitative maps and predicts the global distribution pattern of peregrine falcons with high accuracy (97% in most cases and assessments). Although the accuracy is high, we acknowledge that the predicted regions of presence represent 'just' the year-round potential niche of the species. This might vary slightly from the realized niche, because predation, prey availability, disturbance, intense urbanization, climate change and other anthropogenic factors have already disturbed the delicate balance. Further fine-scale assessments and model re-runs should be performed to obtain an accurate prediction of the realized niche where needed, and for local studies.

Figure 11. Presence/absence binary map with a cut-off of 0.01 for the Russian region.
… of UAE to the Convention on Migratory Species (CMS) held at Bonn in 2016, the migratory falcons along the coastline in UAE now have an elevated conservation status.
EC Birds Directive (2009): Apart from the protection of migratory species, this also specifies the conditions under which hunting and falconry can be undertaken.
Migratory Bird Treaty Act: It regulates the conservation of all migratory birds, including peregrine falcons, in the US, Canada and Mexico.
Table 5. Validation of the predicted model in the region around Russia, by comparing and contrasting it with the observations made by Rogacheva (1992) in the key reference "The Birds of Central Siberia".
1. Rogacheva (1992): "… densities of the Peregrine were always distributed in two wide belts, one in the north and the other in the south." Model: The model correctly predicts the two belts, one in the north and the other in the south, where higher populations of these falcons are commonly observed.
2. Rogacheva (1992): "The Peregrine requires open habitats richly supplied with prey for hunting; therefore, it does not breed in denser taiga and occurs in such areas only around large lakes, along river valleys, and near large, open marshes." Model: The model reproduces this characteristic, as seen in the predicted presence along rivers. This relies on the "Proximity to rivers" layer, which was one of the predictors.
3. Rogacheva (1992): "There are no peregrine falcons present in the Severnaya Zemlya Archipelago." Model: The model classifies this correctly, except along the coasts, as coastal areas generally offer suitable conditions for the species.

Table 6. Top 10 contributors of open-access presence data of peregrine falcons in GBIF and Movebank, ranked by the number of presence points contributed to the GBIF database. The surface area of each country and the average predicted RIO are also listed for a more detailed assessment. (Russia was added for information due to relevance
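Table 6 lists an average predicted relative index of occurrence (RIO) per country. Assuming a country-code raster aligned cell-for-cell with the RIO grid (both hypothetical here), that average is a zonal mean, sketched below. The mean is unweighted, which is only appropriate when all cells have equal area, as in an equal-area projection.

```python
from collections import defaultdict

def zonal_mean(values, zones, nodata=None):
    """Mean of `values` within each zone (e.g. country code) on an aligned grid.

    Cells where either the value or the zone equals `nodata` are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            if v == nodata or z == nodata:
                continue
            sums[z] += v
            counts[z] += 1
    return {z: sums[z] / counts[z] for z in sums}

rio = [[0.25, 0.75],
       [0.125, None]]
countries = [["A", "A"],
             ["B", "B"]]
print(zonal_mean(rio, countries))  # → {'A': 0.5, 'B': 0.125}
```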

Table 3. A selection of local protection policies for peregrine falcons around the world.
Columns: Conservation policies; Country; Comments.