How to share variables across scripts in python?

The following does not work

one.py

import shared
shared.value = 'Hello'
raw_input('A cheap way to keep process alive..')

two.py

import shared
print shared.value

run on two command lines as:

>>python one.py
>>python two.py

(the second one gets an attribute error, rightly so).

Is there a way to accomplish this, that is, share a variable between two scripts?

Answers

Hope it's OK to jot down my notes about this issue here.

First of all, I appreciate the example in the OP a lot, because that is where I started as well - although it made me think shared is some built-in Python module, until I found a complete example at [Tutor] Global Variables between Modules ??.

However, when I looked for "sharing variables between scripts" (or processes) - besides the case when a Python script needs to use variables defined in other Python source files (but not necessarily running processes) - I mostly stumbled upon two other use cases:

  • A script forks itself into multiple child processes, which then run in parallel (possibly on multiple processors) on the same PC
  • A script spawns multiple other child processes, which then run in parallel (possibly on multiple processors) on the same PC

As such, most hits regarding "shared variables" and "interprocess communication" (IPC) discuss cases like these two; however, in both of these cases one can observe a "parent", to which the "children" usually have a reference.

What I am interested in, however, is running multiple invocations of the same script, run independently, and sharing data between those (as in Python: how to share an object instance across multiple invocations of a script), in a singleton/single instance mode. That kind of problem is not really addressed by the above two cases - instead, it essentially reduces to the example in the OP (sharing variables across two scripts).

Now, when dealing with this problem in Perl, there is IPC::Shareable, which "allows you to tie a variable to shared memory", using "an integer number or 4 character string[1] that serves as a common identifier for data across process space". Thus, there are no temporary files nor networking setups, which I find great for my use case; so I was looking for the same in Python.

However, as the accepted answer by @Drewfer notes: "You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter"; or in other words: either you have to use a networking/socket setup - or you have to use temporary files (ergo, no shared RAM for "totally separate python sessions").

Now, even with these considerations, it is kinda difficult to find working examples (except for pickle) - also in the docs for mmap and multiprocessing. I have managed to find some other examples, which also describe some pitfalls that the docs do not mention.

Thanks to these examples, I came up with an example which essentially does the same as the mmap example, with approaches from the "synchronize a python dict" example - using BaseManager (via manager.start() through a file path address) with a shared list; both server and client read and write (pasted below). Note that:

  • multiprocessing managers can be started either via manager.start() or server.serve_forever()
    • serve_forever() locks - start() doesn't
    • There is an auto-logging facility in multiprocessing: it seems to work fine with start()ed processes - but seems to ignore the ones started with serve_forever()
  • The address specification in multiprocessing can be an IP (socket) or a temporary file (possibly a pipe?) path; in the multiprocessing docs:
    • Most examples use multiprocessing.Manager() - this is just a function (not a class instantiation) which returns a SyncManager, which is a special subclass of BaseManager; and uses start() - but not for IPC between independently run scripts; here a file path is used
    • A few other examples use the serve_forever() approach for IPC between independently run scripts; here an IP/socket address is used (a minimal sketch follows below)
    • If an address is not specified, then a temp file path is used automatically (see 16.6.2.12. Logging for an example of how to see this)
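
For reference, a minimal sketch of that serve_forever() variant (Python 3; roughly the remote-manager pattern from the multiprocessing docs - the port, authkey and script names here are arbitrary, and this is separate from my start()-based example further below):

queue_server.py:

# queue_server.py - serves a plain queue over a socket; start this first
from multiprocessing.managers import BaseManager
import queue

shared_queue = queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register('get_queue', callable=lambda: shared_queue)
m = QueueManager(address=('', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()   # blocks until the process is killed

queue_client.py:

# queue_client.py - run separately, in its own terminal
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

QueueManager.register('get_queue')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
q = m.get_queue()
q.put('hello from another script')
print(q.get())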

In addition to all the pitfalls in the "synchronize a python dict" post, there are further ones in the case of a list. That post notes:

All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)

The workaround for dict[key] getting and setting is to use the dict public methods get and update. The problem is that there are no such public methods as an alternative for list[index]; thus, for a shared list, we additionally have to register the __getitem__ and __setitem__ methods (which, starting with an underscore, are not exposed automatically) as exposed, which means we also have to re-register all the public methods for list as well :/

Well, I think those were the most critical things; here are the two scripts - they can just be run in separate terminals (server first); note that this was developed on Linux with Python 2.7:

a.py (server):

import multiprocessing
import multiprocessing.managers

import logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)


class MyListManager(multiprocessing.managers.BaseManager):
    pass


syncarr = []
def get_arr():
    return syncarr

def main():

    # print dir([]) # cannot do `exposed = dir([])`!! manually:
    MyListManager.register("syncarr", get_arr, exposed=['__getitem__', '__setitem__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'])

    manager = MyListManager(address=('/tmp/mypipe'), authkey='')
    manager.start()

    # we don't use the same name as `syncarr` here (although we could);
    # just to see that `syncarr_tmp` is actually <AutoProxy[syncarr] object>
    # so we also have to expose `__str__` method in order to print its list values!
    syncarr_tmp = manager.syncarr()
    print("syncarr (master):", syncarr, "syncarr_tmp:", syncarr_tmp)
    print("syncarr initial:", syncarr_tmp.__str__())

    syncarr_tmp.append(140)
    syncarr_tmp.append("hello")

    print("syncarr set:", str(syncarr_tmp))

    raw_input('Now run b.py and press ENTER')

    print
    print 'Changing [0]'
    syncarr_tmp.__setitem__(0, 250)

    print 'Changing [1]'
    syncarr_tmp.__setitem__(1, "foo")

    new_i = raw_input('Enter a new int value for [0]: ')
    syncarr_tmp.__setitem__(0, int(new_i))

    raw_input("Press any key (NOT Ctrl-C!) to kill server (but kill client first)".center(50, "-"))
    manager.shutdown()

if __name__ == '__main__':
  main()

b.py (client):

import time

import multiprocessing
import multiprocessing.managers

import logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)


class MyListManager(multiprocessing.managers.BaseManager):
    pass

MyListManager.register("syncarr")

def main():
  manager = MyListManager(address=('/tmp/mypipe'), authkey='')
  manager.connect()
  syncarr = manager.syncarr()

  print "arr = %s" % (dir(syncarr))

  # note here we need not bother with '__str__';
  # syncarr can be printed as a list without a problem:
  print "List at start:", syncarr
  print "Changing from client"
  syncarr.append(30)
  print "List now:", syncarr

  o0 = None
  o1 = None

  while 1:
    new_0 = syncarr.__getitem__(0) # syncarr[0]
    new_1 = syncarr.__getitem__(1) # syncarr[1]

    if o0 != new_0 or o1 != new_1:
      print 'o0: %s => %s' % (str(o0), str(new_0))
      print 'o1: %s => %s' % (str(o1), str(new_1))
      print "List is:", syncarr

      print 'Press Ctrl-C to exit'
      o0 = new_0
      o1 = new_1

    time.sleep(1)


if __name__ == '__main__':
    main()

As a final remark, on Linux /tmp/mypipe is created - but is 0 bytes, and has attributes srwxr-xr-x (for a socket); I guess this makes me happy, as I neither have to worry about network ports, nor about temporary files as such :)


You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter.
If it's just simple variables you want, you can easily dump a Python dict to a file with the pickle module in script one and then re-load it in script two. Example:

one.py

import pickle

shared = {"Foo":"Bar", "Parrot":"Dead"}
fp = open("shared.pkl","w")
pickle.dump(shared, fp)

two.py

import pickle

fp = open("shared.pkl")
shared = pickle.load(fp)
print shared["Foo"]
Alternatively, you can use memcached:

sudo apt-get install memcached python-memcache

one.py

import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
shared.set('Value', 'Hello')

two.py

import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
print shared.get('Value')

What you're trying to do here (store shared state in a Python module across separate Python interpreters) won't work.

A value in a module can be updated by one module and then read by another module, but this must be within the same Python interpreter. What you seem to be doing here is actually a sort of interprocess communication; this could be accomplished via socket communication between the two processes, but it is significantly less trivial than what you are expecting to have work here.
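
To illustrate, here is a minimal sketch of such a socket-based approach (Python 3; the script names and port number are arbitrary examples, not anything prescribed by the answer above):

value_server.py:

# value_server.py - holds the value and hands it to any script that connects
import socket

value = 'Hello'

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('127.0.0.1', 50007))   # port chosen arbitrarily
srv.listen(1)

while True:
    conn, _ = srv.accept()
    conn.sendall(value.encode('utf-8'))
    conn.close()

value_client.py:

# value_client.py - a separately started script that reads the value
import socket

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(('127.0.0.1', 50007))
print(cli.recv(1024).decode('utf-8'))
cli.close()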

You can use the relatively simple mmap module. You can use shared.py to store the common constants. The following code will work across different Python interpreters / scripts / processes:

shared.py:

MMAP_SIZE = 16*1024
MMAP_NAME = 'Global\\SHARED_MMAP_NAME'

* The "Global" prefix is Windows syntax for global names

one.py:

import sys
import mmap
from shared import MMAP_SIZE, MMAP_NAME

def write_to_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_WRITE)
    map_file.seek(0)
    map_file.write('hello\n')
    ret = map_file.flush() != 0
    if sys.platform.startswith('win'):
        assert(ret != 0)
    else:
        assert(ret == 0)

two.py:

import mmap
from shared import MMAP_SIZE, MMAP_NAME

def read_from_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_READ)
    map_file.seek(0)
    data = map_file.readline().rstrip('\n')
    map_file.close()
    print data

* This code was written for Windows; Linux might need slight adjustments

more info at - https://docs.python.org/2/library/mmap.html
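
Since tagname is Windows-only, a rough sketch of the same idea on Linux would back the mapping with an ordinary file instead (Python 3; the /tmp path and script names are arbitrary):

write_linux.py:

# write_linux.py - hypothetical Linux variant, backed by a real file
import mmap

MMAP_SIZE = 16*1024
MMAP_PATH = '/tmp/shared_mmap'   # both scripts must agree on this path

# create the backing file with the right size, then map it and write into it
with open(MMAP_PATH, 'wb') as f:
    f.write(b'\x00' * MMAP_SIZE)

with open(MMAP_PATH, 'r+b') as f:
    m = mmap.mmap(f.fileno(), MMAP_SIZE, access=mmap.ACCESS_WRITE)
    m.seek(0)
    m.write(b'hello\n')
    m.flush()
    m.close()

read_linux.py:

# read_linux.py - run separately; reads back what the writer stored
import mmap

MMAP_SIZE = 16*1024
MMAP_PATH = '/tmp/shared_mmap'

with open(MMAP_PATH, 'r+b') as f:
    m = mmap.mmap(f.fileno(), MMAP_SIZE, access=mmap.ACCESS_READ)
    m.seek(0)
    print(m.readline().rstrip(b'\n'))
    m.close()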

Share a dynamic variable via Redis:

script_one.py

from redis import Redis
from time import sleep

cli = Redis('localhost')
shared_var = 1

while True:
   cli.set('share_place', shared_var)
   shared_var += 1
   sleep(1)

Run script_one in a terminal (a process):

$ python script_one.py

script_two.py

from redis import Redis
from time import sleep

cli = Redis('localhost')

while True:
    print(int(cli.get('share_place')))
    sleep(1)

Run script_two in another terminal (another process):

$ python script_two.py

Out:

1
2
3
4
5
...

Dependencies:

$ pip install redis
$ apt-get install redis-server

I'd advise that you use the multiprocessing module. You can't run two scripts from the command line, but you can have two separate processes easily speak to each other.

From the docs' examples:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()

You need to store the variable in some sort of persistent file. There are several modules to do this, depending on your exact need.

The pickle and cPickle module can save and load most python objects to file.

The shelve module can store python objects in a dictionary-like structure (using pickle behind the scenes).

The dbm/bsddb/dbhash/gdbm modules can store string variables in a dictionary-like structure.

The sqlite3 module can store data in a lightweight SQL database.

The biggest problem with most of these is that they are not synchronised across different processes - if one process reads a value while another is writing to the datastore, then you may get incorrect data or data corruption. To get round this you will need to write your own file locking mechanism or use a full-blown database.
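
As a rough sketch of the sqlite3 option (Python 3; the file, table and key names are arbitrary), one script could write and another read, with SQLite itself handling the cross-process locking:

writer.py:

# writer.py - store a shared value in a small SQLite database
import sqlite3

conn = sqlite3.connect('shared.db')   # file name chosen arbitrarily
conn.execute('CREATE TABLE IF NOT EXISTS shared (key TEXT PRIMARY KEY, value TEXT)')
conn.execute('INSERT OR REPLACE INTO shared VALUES (?, ?)', ('greeting', 'Hello'))
conn.commit()
conn.close()

reader.py:

# reader.py - run separately; read the value back
import sqlite3

conn = sqlite3.connect('shared.db')
row = conn.execute('SELECT value FROM shared WHERE key = ?', ('greeting',)).fetchone()
print(row[0] if row else 'not set')
conn.close()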

If you want to read and modify shared data between 2 scripts which run separately, a good solution would be to take advantage of the Python multiprocessing module and use a Pipe() or a Queue() (see differences here). This way you get to sync the scripts and avoid problems regarding concurrency and global variables (like what happens if both scripts want to modify a variable at the same time).

The best part about using pipes/queues is that you can pass python objects through them.

Also, there are methods to avoid waiting for data if none has been passed yet (queue.empty() and pipeConn.poll()).

See an example using Queue() below:

    # main.py
    from multiprocessing import Process, Queue
    from stage1 import Stage1
    from stage2 import Stage2


    s1= Stage1()
    s2= Stage2()

    # S1 to S2 communication
    queueS1 = Queue()  # s1.stage1() writes to queueS1

    # S2 to S1 communication
    queueS2 = Queue()  # s2.stage2() writes to queueS2

    # start s2 as another process
    s2 = Process(target=s2.stage2, args=(queueS1, queueS2))
    s2.daemon = True
    s2.start()     # Launch the stage2 process

    s1.stage1(queueS1, queueS2) # start sending stuff from s1 to s2 
    s2.join() # wait till s2 daemon finishes

    # stage1.py
    import time
    import random

    class Stage1:

      def stage1(self, queueS1, queueS2):
        print("stage1")
        lala = []
        lis = [1, 2, 3, 4, 5]
        for i in range(len(lis)):
          # to avoid unnecessary waiting
          if not queueS2.empty():
            msg = queueS2.get()    # get msg from s2
            print("! ! ! stage1 RECEIVED from s2:", msg)
            lala = [6, 7, 8] # now that a msg was received, further msgs will be different
          time.sleep(1) # work
          random.shuffle(lis)
          queueS1.put(lis + lala)             
        queueS1.put('s1 is DONE')

    # stage2.py
    import time

    class Stage2:

      def stage2(self, queueS1, queueS2):
        print("stage2")
        while True:
            msg = queueS1.get()    # wait till there is a msg from s1
            print("- - - stage2 RECEIVED from s1:", msg)
            if msg == 's1 is DONE':
                break # ends loop
            time.sleep(1) # work
            queueS2.put("update lists")             

EDIT: I just found that you can use queue.get(False) to avoid blocking when receiving data. This way there's no need to check first if the queue is empty. This is not possible if you use pipes.
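
For illustration, a minimal sketch of that non-blocking get (Python 3; standalone, not part of the example above) catches the queue.Empty exception instead of checking empty() first:

# sketch: non-blocking read from a multiprocessing Queue
from multiprocessing import Queue
import queue   # only needed for the Empty exception class

q = Queue()
try:
    msg = q.get(False)   # do not block; raises queue.Empty if nothing is there yet
    print('received:', msg)
except queue.Empty:
    print('nothing there yet; carry on working')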

Use text files or environment variables. Since the two run separately, you can't really do what you are trying to do.

In your example, the first script runs to completion, and then the second script runs. That means you need some sort of persistent state. Other answers have suggested using text files or Python's pickle module. Personally I am lazy, and I wouldn't use a text file when I could use pickle; why should I write a parser to parse my own text file format?

Instead of pickle you could also use the json module to store it as JSON. This might be preferable if you want to share the data with non-Python programs, as JSON is a simple and common standard. If your Python doesn't have json, get simplejson.
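
A minimal sketch of that json variant, mirroring the pickle example above (Python 3; the file name is arbitrary):

one.py

# one.py - write the shared dict as JSON
import json

shared = {"Foo": "Bar", "Parrot": "Dead"}
with open("shared.json", "w") as fp:
    json.dump(shared, fp)

two.py

# two.py - read it back (any language with a JSON parser could do the same)
import json

with open("shared.json") as fp:
    shared = json.load(fp)
print(shared["Foo"])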

If your needs go beyond pickle or json -- say you actually want to have two Python programs executing at the same time and updating the persistent state variables in real time -- I suggest you use the SQLite database. Use an ORM to abstract the database away, and it's super easy. For SQLite and Python, I recommend Autumn ORM.

This method seems straightforward to me:

class SharedClass:

    def __init__(self):
        self.data = {}

    def set_data(self, name, value):
        self.data[name] = value

    def get_data(self, name):
        try:
            return self.data[name]
        except KeyError:
            return "none"

    def reset_data(self):
        self.data = {}

sharedClass = SharedClass()

PS: you can set the data with a parameter name and a value for it, and to access the value you can use the get_data method; below is an example:

To set the data:

example 1:
sharedClass.set_data("name", "Jon Snow")
example 2:
sharedClass.set_data("email", "jon@got.com")

To get the data:

sharedClass.get_data("email")

To reset the entire state, simply use:

sharedClass.reset_data()

It's kind of like accessing data from a JSON object (a dict in this case).

Hope this helps....

You could use a basic from ... import statement in Python to import the variable into two.py. For example:

from filename import variable

That should import the variable from the file. (Of course you should replace filename with one, i.e. the module name without the .py extension, and replace variable with the variable you want to share with two.py.)




