I'm trying to understand the mathematics behind MaxEnt, but I'm missing some background knowledge.
"In this section, we describe our approach to modeling species distributions. As explained above, we are given a space X representing some geographic region of interest. Typically, X is a set of discrete grid cells; here we only assume that X is finite. We also are given a set of points x1, ..., xm in X, each representing a locality where the species has been observed and recorded. Finally, we are provided with a set of environmental variables defined on X, such as precipitation, elevation, etc. Given these ingredients, our goal is to estimate the range of the given species. In this paper, we formalize this rather vague goal within a probabilistic framework. Although this will inevitably involve simplifying assumptions, what we gain will be a language for defining the problem with mathematical precision as well as a sensible approach for applying machine learning.
Unlike others who have studied this problem, we adopt the view that the localities x1, ..., xm were selected independently from X according to some unknown probability distribution π, and that our goal is to estimate π. At the foundation of our approach is the premise that the distribution π (or a thresholded version of it) coincides with the biologists’ concept of the species’ potential distribution. Superficially, this is not unreasonable, although it does ignore the fact that some localities are more likely to have been visited than others. The distribution may therefore exhibit sampling bias, and will be weighted towards areas and environmental conditions that have been better sampled, for example because they are more accessible. That being said, the problem becomes one of density estimation: given x1, ..., xm chosen independently from some unknown distribution, we must construct a distribution π̂ that approximates π.
In constructing π̂, we also make use of a given set of features f1, ..., fn where fj : X → ℝ. These features might consist of the raw environmental variables, or they might be higher level features derived from them. Let f denote the vector of all n features."
My question is not so much about the content as about the mathematical notation: what does fj : X → ℝ mean? (I know what the individual symbols mean; what I mainly don't understand is the meaning of the division sign, i.e. the colon, in this case.)
=> I think I understand it now. Any feature belonging to the geographic region of interest can be expressed as a real number.
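To make the notation concrete: in fj : X → ℝ the colon is not a division. It reads "fj is a function from X to ℝ", meaning fj assigns one real number to every point x in X. Below is a minimal Python sketch of that reading. The grid cells, the precipitation and elevation values, and the names f1, f2, f3 are all made up for illustration and do not come from the paper.

```python
# Hypothetical toy region: X is a finite set of grid cells, labelled (row, col).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Made-up environmental variables, one value per grid cell.
precipitation = {(0, 0): 812.0, (0, 1): 790.5, (1, 0): 1020.0, (1, 1): 655.2}
elevation = {(0, 0): 120.0, (0, 1): 340.0, (1, 0): 95.0, (1, 1): 510.0}


def f1(x):
    """Raw feature: precipitation at cell x (a real number)."""
    return precipitation[x]


def f2(x):
    """Raw feature: elevation at cell x (a real number)."""
    return elevation[x]


def f3(x):
    """Derived, 'higher level' feature: product of the two raw variables."""
    return precipitation[x] * elevation[x]


features = [f1, f2, f3]  # f1, ..., fn with n = 3


def f(x):
    """The feature vector f(x) = (f1(x), ..., fn(x)) for a single cell x."""
    return [fj(x) for fj in features]


# Every cell of X gets a vector of n real numbers.
for x in X:
    print(x, f(x))
```

So each fj is defined on the whole region: it takes any cell x in X as input and returns one real number for it, and f collects those n numbers into a vector.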
Edited by Shadow, 17 June 2014 - 12:44