quantmod getFinancials() not pulling financials

I'm looking to download fundamental data for public companies. Using the quantmod package, I've been trying to pull data with getFinancials(). It works for some companies but with varying results (I've read and understood the disclaimer about free data), and I'd like to confirm I'm pulling it correctly.

JPMorgan Chase: on the Yahoo Finance site I do see financial statements, but the call below appears to use "google" as src rather than "yahoo", so very little financial information comes back.

Google - https://www.google.com/finance?q=NYSE%3AJPM&fstype=ii&ei=9kh-WejLE5e_etbzmpgP

Yahoo - https://finance.yahoo.com/quote/JPM/financials?p=JPM

library(quantmod)
JPM <- getFinancials("JPM", src = "yahoo", auto.assign = FALSE)
viewFin(JPM, type = "IS", period = "A")

Is there a proper way to specify src? And is there a way to use getFinancials() but switch sources (google vs. yahoo) if an indicative column (e.g. revenue) contains NA?

The top of the help page for getFinancials says (emphasis added),

Download Income Statement, Balance Sheet, and Cash Flow Statements from Google Finance.

There is currently no way to specify Yahoo Finance as the source. Supporting it would require someone to write a method to scrape and parse the HTML from Yahoo Finance, since the statements cannot be downloaded to a file the way price data can.
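As for switching sources on NA: since Google is the only supported source, the closest you can get is to pull from Google and check the result for gaps yourself. A minimal sketch (the "Revenue" row label is an assumption and may differ by listing):

library(quantmod)
# pull financials (src is effectively "google") and inspect the annual IS
fin <- getFinancials("JPM", auto.assign = FALSE)
isA <- viewFin(fin, type = "IS", period = "A")
# flag gaps in an indicative row; "Revenue" is an assumed row label
if (anyNA(isA["Revenue", ])) {
  warning("JPM: annual income statement has NA revenue values")
}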

I believe Yahoo changed their API recently. From the link below, download the file named "Get Excel Spreadsheet to Download Bulk Historical Stock Data from Google Finance":

http://investexcel.net/multiple-stock-quote-downloader-for-excel/

That is for Excel, but you can easily load the result into R.
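For instance, once the spreadsheet has pulled the data, something like this would bring it into R (a sketch using the readxl package; the workbook name and sheet are placeholders):

library(readxl)
# the workbook name is a placeholder for whatever you saved the file as
quotes <- read_excel("bulk-stock-quotes.xlsx", sheet = 1)
head(quotes)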

You can also try something like this:

# assumes codes are known beforehand
codes <- c("MSFT", "SBUX", "S", "AAPL", "ADT")
urls <- paste0("https://www.google.com/finance/historical?q=", codes, "&output=csv")
paths <- paste0(codes, ".csv")
missing <- !file.exists(paths)  # only download what isn't there yet
missing

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
  # remove file if it exists already
  if (file.exists(path)) file.remove(path)
  # download file
  tryCatch(
    download.file(url, path, ...),
    error = function(c) {
      # remove file if an error occurred
      if (file.exists(path)) file.remove(path)
      # create error message from the file name
      c$message <- paste(sub("\\.csv$", "", path), "failed")
      message(c$message)
    }
  )
}
# Map is a wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
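As a follow-up, you can read back whichever files actually arrived, for example:

# sketch: read back only the files that were downloaded successfully
downloaded <- paths[file.exists(paths)]
dataList <- lapply(downloaded, read.csv, stringsAsFactors = FALSE)
names(dataList) <- sub("\\.csv$", "", downloaded)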

Or this:

## downloads historic prices for all constituents of SP500
library(zoo)
library(tseries)                        

## read in list of constituents, with company name in first column and
## ticker symbol in second column

## CREATE A FILE TO READ DATA FROM!!!
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv" ) 

## specify time period
dateStart <- "2013-01-01"               
dateEnd <- "2015-05-08"

## extract symbols and number of iterations
symbols <- spComp[, 1]
nAss <- length(symbols)

## download data on first stock as zoo object
z <- get.hist.quote(instrument = symbols[1], start = dateStart,
                    end = dateEnd, quote = "AdjClose",
                    retclass = "zoo", quiet = TRUE)

## use ticker symbol as column name 
dimnames(z)[[2]] <- as.character(symbols[1])

## download remaining assets in for loop
for (i in 2:nAss) {
   ## display progress by showing the current iteration step
   cat("Downloading ", i, " out of ", nAss, "\n")

   result <- try(x <- get.hist.quote(instrument = symbols[i],
                                     start = dateStart,
                                     end = dateEnd, quote = "AdjClose",
                                     retclass = "zoo", quiet = TRUE))
   if (inherits(result, "try-error")) {
      next
   } else {
      dimnames(x)[[2]] <- as.character(symbols[i])

      ## merge with already downloaded data to get assets on same dates
      z <- merge(z, x)
   }
}

## save data
#  CREATE A FILE TO WRITE DATA TO!!!
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv", index.name = "time")
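If you later want the saved prices back as a zoo object, read.zoo can reverse write.zoo (a sketch assuming the same file path and a YYYY-MM-DD index):

## read the saved data back in as a zoo object
z2 <- read.zoo("C:/Users/Excel/Desktop/all_sp500_price_data.csv",
               header = TRUE, sep = ",", format = "%Y-%m-%d")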

Here is one more option for you to consider.

Method #1:

This article illustrates how to download stock price data files from Google, save them to a local drive, and merge them into a single data frame. The script is slightly modified from one which downloads RStudio package download log data; the original source can be found [here](https://github.com/hadley/cran-logs-dplyr/blob/master/1-download.r).

First of all, the following five packages are used.


{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}

The script begins by creating a folder to save the data files in.


{% highlight r %}
# create data folder
dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
if(file.exists(dataDir)) { 
      unlink(dataDir, recursive = TRUE)
      dir.create(dataDir)
} else {
      dir.create(dataDir)
}
{% endhighlight %}

After creating the URLs and file paths, the files are downloaded using the `Map` function, a wrapper of `mapply`. Note that, in case the function breaks on an error (e.g. when a file doesn't exist), `download.file` is wrapped in another function that includes an error handler (`tryCatch`).


{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # forward slashes also work on Windows

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
      # remove file if it exists already
      if(file.exists(path)) file.remove(path)
      # download file
      tryCatch(
            download.file(url, path, ...), error = function(c) {
                  # remove file if an error occurred
                  if(file.exists(path)) file.remove(path)
                  # create error message from the file name
                  c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
                  message(c$message)
            }
      )
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}


Finally, the files are read back using `llply` and combined using `rbind_all`. Note that, as the merged data has multiple stocks' records, a `Code` column is created.



{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.name = TRUE)
dataList <- llply(files, function(file){
      data <- read.csv(file, stringsAsFactors = FALSE)
      # get code from file path
      pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
      code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
      # first column's name is funny
      names(data) <- c("Date","Open","High","Low","Close","Volume")
      data$Date <- dmy(data$Date)
      data$Open <- as.numeric(data$Open)
      data$High <- as.numeric(data$High)
      data$Low <- as.numeric(data$Low)
      data$Close <- as.numeric(data$Close)
      data$Volume <- as.integer(data$Volume)
      data$Code <- code
      data
}, .progress = "text")

data <- rbind_all(dataList)
{% endhighlight %}

Some of the values are shown below.


|Date       |  Open|  High|   Low| Close|   Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |

This approach is less efficient than reading the files directly without saving them to a local drive first. It may be useful, however, if the files are large and the API server breaks connections abruptly.

I hope this article is useful; a follow-up article will show the second way.

Method #2:

In an [earlier article](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), a way to download stock price data files from Google, save them to a local drive, and merge them into a single data frame was illustrated. If the files are not large, however, that isn't very efficient, and in this article the files are downloaded and merged in memory instead.

The following packages are used.


{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}

Taking the URLs as file locations, the files are read directly using `llply` and combined using `rbind_all`. As the merged data has multiple stocks' records, a `Code` column is created. Note that, when an error occurs, the function returns a dummy data frame so as not to break the loop; the dummy rows are filtered out at the end.


{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
                codes,"&output=csv")

dataList <- llply(files, function(file, ...) {
      # get code from file url
      pattern <- "Q:[0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z]"
      code <- substr(str_extract(file, pattern), 3, nchar(str_extract(file, pattern)))

      # read data directly from a URL with only simple error handling
      # for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
      tryCatch({
            data <- read.csv(file, stringsAsFactors = FALSE)
            # first column's name is funny
            names(data) <- c("Date","Open","High","Low","Close","Volume")
            data$Date <- dmy(data$Date)
            data$Open <- as.numeric(data$Open)
            data$High <- as.numeric(data$High)
            data$Low <- as.numeric(data$Low)
            data$Close <- as.numeric(data$Close)
            data$Volume <- as.integer(data$Volume)
            data$Code <- code
            data               
      },
      error = function(c) {
            c$message <- paste(code,"failed")
            message(c$message)
            # return a dummy data frame
            data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Open=0, High=0,
                               Low=0, Close=0, Volume=0, Code="NA")
            data
      })
})

# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}

Some of the values are shown below.


|Date       |  Open|  High|   Low| Close|   Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |

It took a bit longer to complete this script as I had to teach myself how to handle errors in R, and that is why I started writing articles on this blog.

I hope this article is useful.


Summarise Stock Returns from Multiple Files:

This is a slight extension of the previous two articles ([2014-11-20-Download-Stock-Data-1](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), [2014-11-20-Download-Stock-Data-2](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-2/)); it aims to produce gross returns, standard deviations and correlations of multiple shares.

The following packages are used.


{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(reshape2)
library(plyr)
library(dplyr)
{% endhighlight %}

The script begins with creating a data folder in the format of *data_YYYY-MM-DD*.


{% highlight r %}
# create data folder
dataDir <- paste0("data","_",format(Sys.Date(),"%Y-%m-%d"))
if(file.exists(dataDir)) {
  unlink(dataDir, recursive = TRUE)
  dir.create(dataDir)
} else {
  dir.create(dataDir)
}
{% endhighlight %}

Given the company codes, URLs and file paths are created. Then the data files are downloaded by `Map`, which is a wrapper of `mapply`. Note that R's `download.file` function is wrapped by `downloadFile` so that the script does not break when an error occurs.


{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # forward slashes also work on Windows

# simple error handling in case a file doesn't exist
downloadFile <- function(url, path, ...) {
  # remove file if it exists already
  if(file.exists(path)) file.remove(path)
  # download file
  tryCatch(
    download.file(url, path, ...), error = function(c) {
      # remove file if an error occurred
      if(file.exists(path)) file.remove(path)
      # create error message from the file name
      c$message <- paste(sub("\\.csv$", "", basename(path)), "failed")
      message(c$message)
    }
  )
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}

Once the files are downloaded, they are read back and combined using `rbind_all`. Some more details about this step are listed below.

* only the Date, Close and Code columns are kept
* codes are extracted from the file paths by matching a regular expression
* data is arranged by date, as the raw files are sorted in descending order
* errors are handled by returning a dummy data frame whose code value is 'NA'
* individual data files are merged in a long format
    * rows with code 'NA' are filtered out


{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.name = TRUE)
dataList <- llply(files, function(file){
  # get code from file path
  pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
  code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
  tryCatch({
    data <- read.csv(file, stringsAsFactors = FALSE)
    # first column's name is funny
    names(data) <- c("Date","Open","High","Low","Close","Volume")
    data$Date <- dmy(data$Date)
    data$Close <- as.numeric(data$Close)
    data$Code <- code
    # optional
    data$Open <- as.numeric(data$Open)
    data$High <- as.numeric(data$High)
    data$Low <- as.numeric(data$Low)
    data$Volume <- as.integer(data$Volume)
    # select only 'Date', 'Close' and 'Code'
    # raw data should be arranged in an ascending order
    arrange(subset(data, select = c(Date, Close, Code)), Date)
  },
  error = function(c){
    c$message <- paste(code,"failed")
    message(c$message)
    # return a dummy data frame not to break function
    data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Close=0, Code="NA")
    data
  })
}, .progress = "text")

# data is combined to create a long format
# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}

Some values of this long-format data are shown below.


|Date       | Close|Code |
|:----------|-----:|:----|
|2013-11-29 | 38.13|MSFT |
|2013-12-02 | 38.45|MSFT |
|2013-12-03 | 38.31|MSFT |
|2013-12-04 | 38.94|MSFT |
|2013-12-05 | 38.00|MSFT |
|2013-12-06 | 38.36|MSFT |

The data is converted into a wide format where the x and y variables are Date and Code respectively (`Date ~ Code`), while the value variable is Close (`value.var="Close"`). Some values of the wide-format data are shown below.


{% highlight r %}
# data is converted into a wide format
data <- dcast(data, Date ~ Code, value.var="Close")
kable(head(data))
{% endhighlight %}



|Date       |  MSFT|  TCHC|
|:----------|-----:|-----:|
|2013-11-29 | 38.13| 13.52|
|2013-12-02 | 38.45| 13.81|
|2013-12-03 | 38.31| 13.48|
|2013-12-04 | 38.94| 13.71|
|2013-12-05 | 38.00| 13.55|
|2013-12-06 | 38.36| 13.95|

The remaining steps just difference the log of the close prices to get daily log returns, r_t = log(P_t) - log(P_{t-1}), and then apply `sum`, `sd` and `cor`.


{% highlight r %}
# select except for Date column
data <- select(data, -Date)

# apply log difference column wise
dailyRet <- apply(log(data), 2, diff, lag=1)

# obtain daily return, variance and correlation
returns <- apply(dailyRet, 2, sum, na.rm = TRUE)
std <- apply(dailyRet, 2, sd, na.rm = TRUE)
correlation <- cor(dailyRet)

returns
{% endhighlight %}



{% highlight text %}
##      MSFT      TCHC 
## 0.2249777 0.6293973
{% endhighlight %}



{% highlight r %}
std
{% endhighlight %}



{% highlight text %}
##       MSFT       TCHC 
## 0.01167381 0.03203031
{% endhighlight %}



{% highlight r %}
correlation
{% endhighlight %}



{% highlight text %}
##           MSFT      TCHC
## MSFT 1.0000000 0.1481043
## TCHC 0.1481043 1.0000000
{% endhighlight %}

Finally the data folder is deleted.


{% highlight r %}
# delete data folder
if(file.exists(dataDir)) { unlink(dataDir, recursive = TRUE) }
{% endhighlight %}