Eventlet/general async I/O task granularity

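I am working on a web backend / API provider that fetches real-time data from a third-party web API, puts it into a MySQL database, and serves it through an HTTP/JSON API.

I am serving the API with Flask and working with the DB using SQLAlchemy Core.

For the real-time data grabbing part, I have functions that wrap the third-party API by sending a request, parsing the returned XML into a Python dict, and returning it. We'll call these API wrappers.

I then call these functions from other methods, which take the respective data, do any processing if needed (time zone conversion and so on), and put it in the DB. We'll call these processors.

I've been reading about asynchronous I/O, and eventlet in particular, and I'm very impressed.

I'm going to incorporate it into my data grabbing code, but I have a few questions first: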

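  1. Is it safe for me to monkey patch everything? Considering I have Flask, SQLAlchemy and a bunch of other libs, are there any downsides to monkey patching (assuming there is no late binding)?

  2. What granularity should I divide my tasks into? I was thinking of creating a pool that periodically spawns processors. Then, when a processor reaches the part where it calls the API wrappers, the API wrappers will start a green thread that fetches the actual HTTP data using eventlet.green.urllib2. Is this a good approach?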

  3. Timeouts - I want to make sure no greenthreads ever hang. Is it a good approach to set the eventlet.Timeout to 10-15 seconds for every greenthread?

FYI, I have about 10 different sets of real-time data, and a processor is spawned every ~5-10 seconds.

Thanks!

Answers

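I don't think it's wise to mix Flask/SQLAlchemy with an asynchronous-style (or event-driven) programming model.

However, since you state that you are using an RDBMS (MySQL) as intermediate storage, why don't you just create asynchronous workers that store the results from the third-party web services in the RDBMS, and keep your frontend (Flask/SQLAlchemy) synchronous?

That way you wouldn't need to monkey patch Flask or SQLAlchemy.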

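Regarding granularity, you may want to use the mapreduce paradigm (http://en.wikipedia.org/wiki/MapReduce) to perform the web API calls and process the results. This pattern may give you some ideas on how to logically separate the consecutive steps and how to control the processes involved.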
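
To make the map/reduce split a little more concrete, here is a minimal sketch using only the standard library. The feed names, `fetch_feed`, `map_step` and `reduce_step` are all hypothetical; a real `fetch_feed` would send the HTTP request and parse the XML.

```python
from functools import reduce


def fetch_feed(name):
    # Hypothetical API wrapper: returns canned rows so the sketch runs offline.
    return [{"feed": name, "value": len(name)}]


def map_step(name):
    # "Map": fetch one feed and process its rows independently of the others.
    return [dict(row, value=row["value"] * 2) for row in fetch_feed(name)]


def reduce_step(acc, rows):
    # "Reduce": merge the per-feed results into one batch for the DB insert.
    return acc + rows


feeds = ["weather", "traffic", "prices"]
batch = reduce(reduce_step, map(map_step, feeds), [])
```

Because each `map_step` call is independent, the built-in `map` here could later be swapped for a process pool or a task queue without touching the reduce step.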

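Personally, I wouldn't use an asynchronous framework for this. It may be better to use multiprocessing, Celery (http://celeryproject.org/), or a real mapreduce kind of system like Hadoop.

Just a hint: start small, keep it simple and modular, and optimize later if you need better performance. This may also be heavily influenced by how real-time you want the information to be.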

It is safe to patch a module that is written in pure Python and uses only the standard lib.

  • There are a few pure-Python MySQL adapters:
  • PyMySQL has a SQLAlchemy test suite; you could run the tests for your cases.
  • There is a module named pymysql_sa that provides a dialect for SQLAlchemy.
  • Flask is written in pure Python and is 100% WSGI 1.0 compliant. Use eventlet.wsgi to provide the service.

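Use the green modules and break the work into single fetch tasks. Put the jobs into a queue, which eventlet also provides; each task worker takes a job from the queue and, once the fetch is finished, either saves the result to the db or sends it to an event.Event object to trigger the job that is waiting for the task to finish. Or both.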

Update:

The official eventlet documentation strongly recommends calling the patch on the first line of the main module, and it is safe to call monkey_patch multiple times. Read more at http://eventlet.net/doc/patching.html

There are some green modules that work with eventlet; all of them are in eventlet.green. There is a list on bitbucket. Make sure to use the green modules in your code, or patch them before importing third-party modules that use the standard libs.

But monkey_patch only accepts a few modules, so the other green modules have to be imported manually.

def monkey_patch(**on):
    """Globally patches certain system modules to be greenthread-friendly.

    The keyword arguments afford some control over which modules are patched.
    If no keyword arguments are supplied, all possible modules are patched.
    If keywords are set to True, only the specified modules are patched.  E.g.,
    ``monkey_patch(socket=True, select=True)`` patches only the select and
    socket modules.  Most arguments patch the single module of the same name
    (os, time, select).  The exceptions are socket, which also patches the ssl
    module if present; and thread, which patches thread, threading, and Queue.

    It's safe to call monkey_patch multiple times.
    """
    accepted_args = set(('os', 'select', 'socket',
                         'thread', 'time', 'psycopg', 'MySQLdb'))
    default_on = on.pop("all", None)



