Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing

My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know whether I am doing these first steps in a way that would be considered idiomatic Python, and whether I have made any big mistakes that will hamper my efforts to expand on it later.

This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file, but I wanted to do it with Python.

    import requests

    def get_webpage(uri):
        return requests.get(uri)


    def save_webpagecontent(r, filename):
        """ This function saves the page retrieved by get_webpage. r is the
        response from the call to requests.get and
        filename is where we want to save the file to in the filesystem."""

        chunk_size = 8388608 # number of bytes to write to disk in each chunk
        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)


    def make_wordlist(filename):
        wordlist = []
        with open(filename) as fd:
            wordlist = fd.readlines()
        return wordlist


    def get_mylist(wordlist, num_lines=10):
        if len(wordlist) <= num_lines:
            return wordlist
        return wordlist[:num_lines]


    def print_mylist(mylist):
        for word in mylist:
            print(word.strip())
        return None

    """List of words collected and contributed to the public domain by
    Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
    """
    uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
    filename = 'wordlist.txt'

    r = get_webpage(uri)
    save_webpagecontent(r, filename)
    wordlist = make_wordlist(filename)
    mylist = get_mylist(wordlist)
    print_mylist(mylist)

My program works as I expect it to. I have basically found how to do each individual piece by reading this forum, but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but will also be easy to build larger programs and modules from.

I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.







Tags: python, array, file

asked Jun 7 at 5:56 by Duane Whitty
edited Jun 7 at 7:27 by AlexV







  • Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing. – Roland Illig, Jun 7 at 6:12

  • Thank you for your remarks. I feel more encouraged about the process now! – Duane Whitty, Jun 7 at 6:21

  • Don't worry, the SE network can be confusing for new users. You're in The Good Place now. – Mast, Jun 7 at 6:29

  • Might want to add an event in case the request can't be completed and you can't download the code. – BruceWayne, Jun 7 at 19:49
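BruceWayne's suggestion above, handling the case where the download cannot be completed, might look roughly like the sketch below. It is not the asker's code: to stay runnable without third-party packages it uses the standard library's urllib.request instead of requests, and the name fetch_to_file and the 10-second timeout are assumptions made for illustration. With requests, the analogous steps would be calling raise_for_status() on the response and catching requests.RequestException.

```python
import shutil
import urllib.error
import urllib.request


def fetch_to_file(uri: str, filename: str, timeout: float = 10.0) -> bool:
    """Download uri to filename; return True on success, False on failure.

    fetch_to_file is a hypothetical helper, not part of the original post.
    """
    try:
        # urlopen raises URLError (or its subclass HTTPError) when the
        # request cannot be completed or the server reports an error.
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            # The local file is only created once the request has succeeded.
            with open(filename, 'wb') as fd:
                shutil.copyfileobj(response, fd)
    except (urllib.error.URLError, OSError) as err:
        print(f"Could not download {uri}: {err}")
        return False
    return True
```

The caller can then decide what to do when the function returns False, for example fall back to a previously saved copy of the word list.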












3 Answers

Your code is nice and concise; however, there are some changes you can make:

  1. You can just return fd.readlines() in make_wordlist.

  2. If you assigned wordlist = [] first to show that the result is a list, then it'd be better to use the typing module.

    from typing import List


    def make_wordlist(filename: str) -> List[str]:
        ...

  3. get_mylist can be replaced with wordlist[:num_lines]. This is because if len(wordlist) is smaller than or equal to num_lines, the slice returns the entire list anyway.

  4. Performance-wise it's best to use print('\n'.join(list)) rather than for item in list: print(item).

  5. I would prefer to be able to change chunk_size in save_webpagecontent, so you can make it a parameter with a default value.

  6. IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

    import requests
    from typing import List

    Response = requests.Response


    def get_webpage(uri) -> Response:
        return requests.get(uri)


    def save_webpagecontent(r: Response, filename: str,
                            chunk_size: int = 8388608) -> None:
        """
        This function saves the page retrieved by get_webpage. r is the
        response from the call to requests.get and
        filename is where we want to save the file to in the filesystem.
        """
        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)


    def read_wordlist(filename: str) -> List[str]:
        with open(filename) as fd:
            return fd.readlines()


    def print_mylist(word_list: List[str]) -> None:
        print('\n'.join(word.strip() for word in word_list))


    """
    List of words collected and contributed to the public domain by
    Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
    """
    uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
    filename = 'wordlist.txt'

    r = get_webpage(uri)
    save_webpagecontent(r, filename)
    print_mylist(read_wordlist(filename)[:10])
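Point 3 relies on the fact that slicing never raises an error when the end index is beyond the length of the list. A quick check, using a made-up three-word list as a stand-in for the real word list:

```python
words = ['aa', 'aah', 'aahed']  # made-up sample, not the real word list

# Asking for more items than exist simply returns everything available.
first_ten = words[:10]
# Asking for fewer items returns just that many.
first_two = words[:2]

print(first_ten)  # ['aa', 'aah', 'aahed']
print(first_two)  # ['aa', 'aah']
```

One small difference from get_mylist: the slice always produces a new list, whereas get_mylist returned the original list object when it was already short enough.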








  • requests.get() returns a Response object, not a Request. – Lukasz Salitra, Jun 7 at 9:51

  • @LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/ – Peilonrayz, Jun 7 at 9:54

  • As per PEP 257, multi-line docstrings can start either right after the opening """ or on the line below. – Mathias Ettinger, Jun 7 at 13:35

  • @DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer. – Peilonrayz, Jun 7 at 18:27

  • @DuaneWhitty No problem. To use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments (they would look like a mess). Yeah, looks like you're correct. – Peilonrayz, Jun 7 at 22:38
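Peilonrayz's point that type hints are optional and have no effect at runtime can be illustrated with a small sketch. The function first_words is hypothetical, and the remark about a checker complaining is an expectation based on how tools such as mypy treat a tuple passed where List[str] is declared.

```python
from typing import List


def first_words(words: List[str], count: int = 10) -> List[str]:
    """Return the first count entries of words."""
    return words[:count]


# Python does not check annotations when the function is called, so passing
# a tuple runs without error. A static checker such as mypy is expected to
# flag this call, because a tuple is not a List[str].
result = first_words(('aa', 'aah', 'aahed'), 2)
print(result)  # ('aa', 'aah')
```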



















Your program is concise and readable. One thing that is probably less Pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
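Building the list directly from the downloaded lines, as suggested here, could look roughly like the sketch below. The helper wordlist_from_lines is hypothetical, the byte strings stand in for what iterating over a streamed requests response with iter_lines() would yield, and the UTF-8 encoding is an assumption about the file.

```python
from typing import Iterable, List


def wordlist_from_lines(lines: Iterable[bytes],
                        encoding: str = 'utf-8') -> List[str]:
    """Decode raw lines into a list of words, skipping blank lines."""
    words = []
    for raw in lines:
        word = raw.decode(encoding).strip()
        if word:
            words.append(word)
    return words


# Stand-in for: requests.get(uri, stream=True).iter_lines()
sample_lines = [b'aa', b'aah\r', b'', b'aahed']
print(wordlist_from_lines(sample_lines))  # ['aa', 'aah', 'aahed']
```

Because the helper only needs an iterable of byte strings, the same function also works on a file opened in binary mode, which keeps the option of saving the download open.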








  • Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around. – Mast, Jun 7 at 10:11

  • Thank you both. I saved the file for two reasons: 1) I knew that I would want to access the file over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient than downloading it over and over, and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again! – Duane Whitty, Jun 7 at 17:27

  • One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program. – Hans-Martin Mosner, Jun 7 at 20:48

  • Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors. – Duane Whitty, Jun 8 at 3:11
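The download-only-if-missing idea from Hans-Martin Mosner's comment can be sketched as below. The names ensure_downloaded and download are hypothetical; download stands for any callable that fetches the URI and writes the file, for example a small wrapper around the save_webpagecontent and get_webpage functions from the answers.

```python
from pathlib import Path
from typing import Callable


def ensure_downloaded(uri: str, filename: str,
                      download: Callable[[str, str], None]) -> bool:
    """Call download(uri, filename) only if filename does not exist yet.

    Returns True if a download was triggered and False if the cached file
    was reused. Deleting the file by hand invalidates the cache.
    """
    if Path(filename).exists():
        return False
    download(uri, filename)
    return True
```

Passing the downloader in as an argument keeps the caching decision separate from the network code, so the same check works whichever download function is used.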


















Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is print out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner's suggestion to separate the file download code, here is that module. I also made chunk_size a parameter of the save_webpagecontent() function, as suggested by Peilonrayz.

download.py

    import requests

    Response = requests.Response

    def get_webpage(uri) -> Response:
        return requests.get(uri)


    def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
        """
        This function saves the page retrieved by get_webpage.
        r is the response from the call to requests.get.
        filename is where we want to save the file to in the filesystem.
        chunk_size is the number of bytes to write to disk in each chunk
        """

        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)

config.py

    uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
    filename = 'wordlist.txt'

I feel I made the most gains in my Python proficiency as a result of implementing the changes suggested by Peilonrayz, where I did away with intermediate function calls and variables, and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get opened_w_error() working exactly as in the example from PEP 343. Figuring it out was very rewarding.

moby.py

    import download as df  # the module above is saved as download.py
    import config as cfg
    from contextlib import contextmanager
    from typing import List

    filename = cfg.filename
    uri = cfg.uri

    @contextmanager
    def opened_w_error(filename, mode="r"):
        try:
            f = open(filename, mode)
        except OSError as err:
            yield None, err
        else:
            try:
                yield f, None
            finally:
                f.close()


    def read_wordlist(filename: str) -> List[str]:
        with opened_w_error(filename, 'r') as (fd, err):
            if err is None:
                return fd.readlines()
        if isinstance(err, FileNotFoundError):
            # since it failed the first time we need to actually download it
            df.save_webpagecontent(df.get_webpage(uri), filename)
            with opened_w_error(filename, 'r') as (fd, err):
                if err is None:
                    return fd.readlines()
        # a different error, or it failed again after the download: abort
        print("OSError:", err)
        return []


    def print_mylist(wordlist: List[str]) -> None:
        print('\n'.join(word.strip() for word in wordlist))


    print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
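To see what the opened_w_error context manager from moby.py hands back in each case, here is a small self-contained sketch. The context manager is repeated from the answer; the helper describe_open and the temporary file names are made up for the demonstration.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def opened_w_error(filename, mode="r"):
    try:
        f = open(filename, mode)
    except OSError as err:
        # Opening failed: hand the error to the with-block instead of raising.
        yield None, err
    else:
        try:
            yield f, None
        finally:
            f.close()


def describe_open(filename: str) -> str:
    """Report what opened_w_error yields for filename (hypothetical helper)."""
    with opened_w_error(filename) as (fd, err):
        if err is not None:
            return type(err).__name__
        return f"{len(fd.readlines())} lines"


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'wordlist.txt')
    with open(present, 'w') as out:
        out.write('aa\naah\n')
    print(describe_open(present))                            # 2 lines
    print(describe_open(os.path.join(tmp, 'missing.txt')))   # FileNotFoundError
```

The with-block always runs, so the caller has to check err before touching fd; that is the trade-off of this pattern compared with a plain try/except around open().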















    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f221830%2fcode-downloads-a-text-file-from-a-website-saves-it-to-local-disk-and-then-load%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    7












    $begingroup$

    Your code is nice and concise however there are some changes you can make:



    1. You can just return f.readlines() in make_wordlist.


    2. If you've done this to show that the result is a list then it'd be better to use the typing module.



      from typing import List


      def make_wordlist(filename: str) -> List[str]:
      ...



    3. get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

    4. Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

    5. I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

    6. IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

    import requests
    from typing import List

    Response = requests.Response


    def get_webpage(uri) -> Response:
        return requests.get(uri)


    def save_webpagecontent(r: Response, filename: str,
                            chunk_size: int = 8388608) -> None:
        """
        This function saves the page retrieved by get_webpage. r is the
        response from the call to requests.get and
        filename is where we want to save the file to in the filesystem.
        """
        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)


    def read_wordlist(filename: str) -> List[str]:
        with open(filename) as fd:
            return fd.readlines()


    def print_mylist(word_list: List[str]) -> None:
        # Strip the trailing newline kept by readlines(), then print once.
        print('\n'.join(word.strip() for word in word_list))


    """
    List of words collected and contributed to the public domain by
    Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
    """
    uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
    filename = 'wordlist.txt'

    r = get_webpage(uri)
    save_webpagecontent(r, filename)
    print_mylist(read_wordlist(filename)[:10])
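Points 3 and 4 can be checked in isolation. The following is only a minimal sketch: the `words` list is made-up sample data standing in for the file contents, and the helper name `first_n` is illustrative rather than taken from the original code.

```python
from typing import List


def first_n(wordlist: List[str], numlines: int) -> List[str]:
    """Return at most numlines items; slicing never raises on short lists."""
    return wordlist[:numlines]


# Sample data shaped like the output of readlines() (trailing newlines kept).
words = ["aa\n", "aah\n", "aahed\n"]

# Asking for more items than exist simply returns a copy of the whole list.
short = first_n(words, 10)

# One print call on a joined string instead of one print call per item.
text = "\n".join(word.strip() for word in first_n(words, 2))
print(text)
```

Because the slice already handles the short-list case, no explicit length check is needed before taking the first few items.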
















    $endgroup$








    • 1




      $begingroup$
      requests.get() returns a Response object, not a Request.
      $endgroup$
      – Lukasz Salitra
      Jun 7 at 9:51






    • 1




      $begingroup$
      @LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
      $endgroup$
      – Peilonrayz
      Jun 7 at 9:54










    • $begingroup$
      As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
      $endgroup$
      – Mathias Ettinger
      Jun 7 at 13:35






    • 1




      $begingroup$
      @DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
      $endgroup$
      – Peilonrayz
      Jun 7 at 18:27






    • 1




      $begingroup$
      @DuaneWhitty No problem. To format text as code in a comment, wrap it in backticks: `this text`. Feel free to try that out here. You also can't put newlines in comments (they would look like a mess). Yeah, looks like you're correct.
      $endgroup$
      – Peilonrayz
      Jun 7 at 22:38

















    edited Jun 7 at 13:09 by yuri

    answered Jun 7 at 9:43 by Peilonrayz





















    3












    $begingroup$

    Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
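As a rough illustration of processing the data while receiving it, the sketch below builds the word list from any iterable of raw byte lines, so no intermediate file is needed. The function name `wordlist_from_lines` and the UTF-8 default are assumptions made for this example; the `sample` list stands in for a real network response.

```python
from typing import Iterable, List


def wordlist_from_lines(lines: Iterable[bytes], encoding: str = "utf-8") -> List[str]:
    """Build the word list directly from an iterable of raw lines."""
    words = []
    for raw in lines:
        word = raw.decode(encoding).strip()
        if word:  # skip blank lines
            words.append(word)
    return words


# Stand-in for a network response. The object returned by
# urllib.request.urlopen(), or requests' Response.iter_lines(),
# could be passed in the same way since both yield bytes lines.
sample = [b"aa\r\n", b"aah\r\n", b"\r\n", b"aahed\r\n"]
print(wordlist_from_lines(sample)[:10])
```

Keeping the parsing separate from the transport also makes the function easy to test without touching the network.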















    $endgroup$








    • 3




      $begingroup$
      Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
      $endgroup$
      – Mast
      Jun 7 at 10:11










    • $begingroup$
      Thank you both. I saved the file for two reasons: 1) I knew that I would want to access the file over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient than downloading it many times, and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
      $endgroup$
      – Duane Whitty
      Jun 7 at 17:27










    • $begingroup$
      One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
      $endgroup$
      – Hans-Martin Mosner
      Jun 7 at 20:48










    • $begingroup$
      Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
      $endgroup$
      – Duane Whitty
      Jun 8 at 3:11
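The download-only-once idea from Hans-Martin Mosner's comment could look roughly like the sketch below. It is a simple existence check with no invalidation policy; the name `ensure_downloaded` and the `download` callback parameter are hypothetical and chosen only for this example.

```python
import os
from typing import Callable


def ensure_downloaded(filename: str, download: Callable[[str], None]) -> bool:
    """Call download(filename) only if filename does not exist yet.

    Returns True if a download was triggered, False on a cache hit.
    """
    if os.path.exists(filename):
        return False
    download(filename)
    return True


# With the functions from the answers in this thread, the callback might be:
#   lambda name: save_webpagecontent(get_webpage(uri), name)
```

Deleting the file by hand invalidates the cache, which matches the manual policy described in the comment.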

























    answered Jun 7 at 8:24 by Hans-Martin Mosner


















    1












    $begingroup$

    Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0



    In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):



    download.py for handling downloading the text file from the website and saving it as a file to local storage;



    config.py for specifying the URI of the website and the filename for local storage;



    moby.py is the actual code that reads the words in the text file, one per line, into a list. For now all it does is print out the words from the file, one per line.



    The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.



    Motivated by Hans-Martin Mosner's suggestion to separate the file download code, here is that module. I also made chunk_size a parameter of the save_webpagecontent() function, as suggested by Peilonrayz.



    download.py



    import requests
    from typing import List

    Response = requests.Response


    def get_webpage(uri) -> Response:
        return requests.get(uri)


    def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
        """
        This function saves the page retrieved by get_webpage.
        r is the response from the call to requests.get.
        filename is where we want to save the file to in the filesystem.
        chunk_size is the number of bytes to write to disk in each chunk
        """

        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size):
                fd.write(chunk)


    config.py



    uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
    filename = 'wordlist.txt'
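Since the answer mentions a 12 factor app, one possible variation is sketched here: the twelve-factor methodology recommends keeping configuration in the environment, so config.py could let environment variables override the defaults. The variable names `MOBY_URI` and `MOBY_FILENAME` and the function `load_config` are hypothetical and not part of the original code.

```python
import os
from typing import Mapping, Tuple

DEFAULT_URI = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
DEFAULT_FILENAME = 'wordlist.txt'


def load_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (uri, filename), letting environment variables override defaults."""
    uri = env.get('MOBY_URI', DEFAULT_URI)                 # hypothetical variable name
    filename = env.get('MOBY_FILENAME', DEFAULT_FILENAME)  # hypothetical variable name
    return uri, filename


# Module-level names keep the same interface as the original config.py.
uri, filename = load_config()
```

Passing the mapping as a parameter keeps the function testable without modifying the real process environment.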


    I feel I made the most gains in my Python proficiency as a result of implementing the changes suggested by Peilonrayz, where I did away with intermediate function calls and variables, and by working on the suggestion by BruceWayne to handle the case where opening the file fails. The file opening code turned out to be the most challenging. I wasn't able to get opened_w_error() working exactly as in the example from PEP343. Figuring it out was very rewarding.



    moby.py



    import download as df  # the download module shown above is saved as download.py
    import config as cfg
    from contextlib import contextmanager
    from typing import List

    filename = cfg.filename
    uri = cfg.uri

    @contextmanager
    def opened_w_error(filename, mode="r"):
        try:
            f = open(filename, mode)
        except OSError as err:
            yield None, err
        else:
            try:
                yield f, None
            finally:
                f.close()


    def read_wordlist(filename: str) -> List[str]:
        with opened_w_error(filename, 'r') as (fd, err):
            if isinstance(err, FileNotFoundError):
                # since it failed the first time we need to actually download it
                df.save_webpagecontent(df.get_webpage(uri), filename)
                with opened_w_error(filename, 'r') as (fd, err):  # if it fails again abort
                    if err:
                        print("OSError:", err)
                    else:
                        return fd.readlines()
            elif err:
                # any other OSError (e.g. permissions): fd is None, so report it
                print("OSError:", err)
            else:
                return fd.readlines()
        return []  # nothing could be read; keeps the List[str] return type


    def print_mylist(wordlist: List[str]) -> None:
        print('\n'.join(word.strip() for word in wordlist))


    print_mylist(read_wordlist(filename)[:50])
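To see the two outcomes of opened_w_error() side by side, here is a minimal sketch that runs against a temporary directory instead of the real word list. The helper try_read and the file names are hypothetical and exist only for this illustration; the context manager itself is copied from moby.py.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def opened_w_error(filename, mode="r"):
    """Yield (file, None) on success or (None, error) if open() fails."""
    try:
        f = open(filename, mode)
    except OSError as err:
        yield None, err
    else:
        try:
            yield f, None
        finally:
            f.close()


def try_read(path):
    """Hypothetical helper: return the lines of path, or the error raised by open()."""
    with opened_w_error(path) as (fd, err):
        if err:
            return err
        return fd.readlines()


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.txt")
    present = os.path.join(tmp, "present.txt")
    with open(present, "w") as out:
        out.write("aa\nab\n")

    missing_result = try_read(missing)
    present_result = try_read(present)

print(type(missing_result).__name__)  # FileNotFoundError
print(present_result)                 # ['aa\n', 'ab\n']
```

The caller always gets a pair back, so the error check has to happen inside the with block before fd is used; when open() fails, fd is None.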


    Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!






    $endgroup$

















        edited Jun 8 at 4:54

        answered Jun 8 at 4:40

        Duane Whitty


























