Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processingSplitting a CSV file with headersMaking a list from user inputSplitting plain text dictionary data to multiple files, round 2.txt word-counterReusable, generic save methodFile copy in Python for slow networksFile copy in Python for slow networks (version 2)Reading a DHT11 to a fileMad Libs excerciseMinimising latency on a pyautogui responce when a given pixel is seen
Did Karl Marx ever use any example that involved cotton and dollars to illustrate the way capital and surplus value were generated?
Are all instances of trolls turning to stone ultimately references back to Tolkien?
How to get cool night-vision without lame drawbacks?
Was there ever a name for the weapons of the Others?
Change CPU MHz from Registry
Would a two-seat light aircaft with a landing speed of 20 knots and a top speed of 180 knots be technically possible?
Has there been any indication at all that further negotiation between the UK and EU is possible?
First-year PhD giving a talk among well-established researchers in the field
Distance Matrix (plugin) - QGIS
Does Marvel have an equivalent of the Green Lantern?
How precise do models need to be for 3d printing?
Can White Castle?
Inverse-quotes-quine
Why do textbooks often include the solutions to odd or even numbered problems but not both?
Hot coffee brewing solutions for deep woods camping
Can the negators "jamais, rien, personne, plus, ni, aucun" be used in a single sentence?
Why is there no havdallah when going from Yom Tov into Shabbat?
STM Microcontroller burns every time
Should I hide continue button until tasks are completed?
Why do some games show lights shine through walls?
Intuitively, why does putting capacitors in series decrease the equivalent capacitance?
Is there a maximum distance from a planet that a moon can orbit?
Change the boot order with no option in UEFI settings
How to add multiple ip address in destination ip in acl rule
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing
Splitting a CSV file with headersMaking a list from user inputSplitting plain text dictionary data to multiple files, round 2.txt word-counterReusable, generic save methodFile copy in Python for slow networksFile copy in Python for slow networks (version 2)Reading a DHT11 to a fileMad Libs excerciseMinimising latency on a pyautogui responce when a given pixel is seen
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know if I am doing these first steps in a way that would be considered idiomatic Python and have I made any big mistakes that will hamper my efforts to expand on it later.
This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.
import requests
def get_webpage(uri):
return requests.get(uri)
def save_webpagecontent(r, filename):
""" This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem."""
chunk_size = 8388608 # number of bytes to write to disk in each chunk
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def make_wordlist(filename):
wordlist = []
with open(filename) as fd:
wordlist = fd.readlines()
return wordlist
def get_mylist(wordlist, num_lines=10):
if len(wordlist) <= num_lines:
return wordlist
return wordlist[:num_lines]
def print_mylist(mylist):
for word in mylist:
print(word.strip())
return None
"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)
My program works as I expect it to. I have basically found how to do each individual piece by reading this forum and but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but also will be easy to build larger programs and modules from.
I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.
python array file
$endgroup$
add a comment |
$begingroup$
My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know if I am doing these first steps in a way that would be considered idiomatic Python and have I made any big mistakes that will hamper my efforts to expand on it later.
This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.
import requests
def get_webpage(uri):
return requests.get(uri)
def save_webpagecontent(r, filename):
""" This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem."""
chunk_size = 8388608 # number of bytes to write to disk in each chunk
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def make_wordlist(filename):
wordlist = []
with open(filename) as fd:
wordlist = fd.readlines()
return wordlist
def get_mylist(wordlist, num_lines=10):
if len(wordlist) <= num_lines:
return wordlist
return wordlist[:num_lines]
def print_mylist(mylist):
for word in mylist:
print(word.strip())
return None
"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)
My program works as I expect it to. I have basically found how to do each individual piece by reading this forum and but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but also will be easy to build larger programs and modules from.
I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.
python array file
$endgroup$
3
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around thechunk_sizeline. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12
1
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
3
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49
add a comment |
$begingroup$
My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know if I am doing these first steps in a way that would be considered idiomatic Python and have I made any big mistakes that will hamper my efforts to expand on it later.
This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.
import requests
def get_webpage(uri):
return requests.get(uri)
def save_webpagecontent(r, filename):
""" This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem."""
chunk_size = 8388608 # number of bytes to write to disk in each chunk
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def make_wordlist(filename):
wordlist = []
with open(filename) as fd:
wordlist = fd.readlines()
return wordlist
def get_mylist(wordlist, num_lines=10):
if len(wordlist) <= num_lines:
return wordlist
return wordlist[:num_lines]
def print_mylist(mylist):
for word in mylist:
print(word.strip())
return None
"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)
My program works as I expect it to. I have basically found how to do each individual piece by reading this forum and but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but also will be easy to build larger programs and modules from.
I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.
python array file
$endgroup$
My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know if I am doing these first steps in a way that would be considered idiomatic Python and have I made any big mistakes that will hamper my efforts to expand on it later.
This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.
import requests
def get_webpage(uri):
return requests.get(uri)
def save_webpagecontent(r, filename):
""" This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem."""
chunk_size = 8388608 # number of bytes to write to disk in each chunk
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def make_wordlist(filename):
wordlist = []
with open(filename) as fd:
wordlist = fd.readlines()
return wordlist
def get_mylist(wordlist, num_lines=10):
if len(wordlist) <= num_lines:
return wordlist
return wordlist[:num_lines]
def print_mylist(mylist):
for word in mylist:
print(word.strip())
return None
"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)
My program works as I expect it to. I have basically found how to do each individual piece by reading this forum and but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but also will be easy to build larger programs and modules from.
I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.
python array file
python array file
edited Jun 7 at 7:27
AlexV
2,7359 silver badges31 bronze badges
2,7359 silver badges31 bronze badges
asked Jun 7 at 5:56
Duane WhittyDuane Whitty
687 bronze badges
687 bronze badges
3
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around thechunk_sizeline. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12
1
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
3
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49
add a comment |
3
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around thechunk_sizeline. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12
1
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
3
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49
3
3
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the
chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.$endgroup$
– Roland Illig
Jun 7 at 6:12
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the
chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.$endgroup$
– Roland Illig
Jun 7 at 6:12
1
1
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
3
3
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49
add a comment |
3 Answers
3
active
oldest
votes
$begingroup$
Your code is nice and concise however there are some changes you can make:
- You can just return
f.readlines()inmake_wordlist. If you've done this to show that the result is a list then it'd be better to use the
typingmodule.from typing import List
def make_wordlist(filename: str) -> List[str]:
...get_mylistcan be replaced withwordlist[:numlines]. This is because iflen(wordlist)is smaller or equal tonumlines, then it will return the entire thing anyway.- Performance wise it's best to use
print('n'.join(list))rather thanfor item in list: print(item). - I would prefer to be able to change
chunk_sizeinsave_webpagecontentand so you can make it a default argument. - IIRC multi-line docstrings shouldn't start on the same line as the
""", nor should they end on the same line either.
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str,
chunk_size: int=8388608) -> None:
"""
This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem.
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def read_wordlist(filename: str) -> List[str]:
with open(filename) as fd:
return fd.readlines()
def print_mylist(word_list: List[str]) -> None:
print('n'.join(word.strip() for word in word_list))
"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])
$endgroup$
1
$begingroup$
requests.get()returns aResponseobject, not aRequest.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening"""or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
$begingroup$
@DuaneWhitty No problem to usethis textyou need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38
|
show 4 more comments
$begingroup$
Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
$endgroup$
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
add a comment |
$begingroup$
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0
In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):
download.py for handling downloading the text file from the website and saving it as a file to local storage;
config.py for specifying the URI of the website and the filename for local storage;
moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.
The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.
Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz
download.py
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
"""
This function saves the page retrieved by get_webpage.
r is the response from the call to requests.get.
filename is where we want to save the file to in the filesystem.
chunk_size is the number of bytes to write to disk in each chunk
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
config.py
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.
moby.py
import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List
filename = cfg.filename
uri = cfg.uri
@contextmanager
def opened_w_error(filename, mode="r"):
try:
f = open(filename, mode)
except OSError as err:
yield None, err
else:
try:
yield f, None
finally:
f.close()
def read_wordlist(filename: str) -> List[str]:
with opened_w_error(filename, 'r') as (fd, err):
if type(err) == FileNotFoundError:
df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
if err:
print("OSError:", err)
else:
return fd.readlines()
else:
return fd.readlines()
def print_mylist(wordlist: List[str]) -> None:
print('n'.join(word.strip() for word in wordlist))
print_mylist(read_wordlist(filename)[:50])
Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f221830%2fcode-downloads-a-text-file-from-a-website-saves-it-to-local-disk-and-then-load%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Your code is nice and concise however there are some changes you can make:
- You can just return
f.readlines()inmake_wordlist. If you've done this to show that the result is a list then it'd be better to use the
typingmodule.from typing import List
def make_wordlist(filename: str) -> List[str]:
...get_mylistcan be replaced withwordlist[:numlines]. This is because iflen(wordlist)is smaller or equal tonumlines, then it will return the entire thing anyway.- Performance wise it's best to use
print('n'.join(list))rather thanfor item in list: print(item). - I would prefer to be able to change
chunk_sizeinsave_webpagecontentand so you can make it a default argument. - IIRC multi-line docstrings shouldn't start on the same line as the
""", nor should they end on the same line either.
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str,
chunk_size: int=8388608) -> None:
"""
This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem.
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def read_wordlist(filename: str) -> List[str]:
with open(filename) as fd:
return fd.readlines()
def print_mylist(word_list: List[str]) -> None:
print('n'.join(word.strip() for word in word_list))
"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])
$endgroup$
1
$begingroup$
requests.get()returns aResponseobject, not aRequest.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening"""or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
$begingroup$
@DuaneWhitty No problem to usethis textyou need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38
|
show 4 more comments
$begingroup$
Your code is nice and concise however there are some changes you can make:
- You can just return
f.readlines()inmake_wordlist. If you've done this to show that the result is a list then it'd be better to use the
typingmodule.from typing import List
def make_wordlist(filename: str) -> List[str]:
...get_mylistcan be replaced withwordlist[:numlines]. This is because iflen(wordlist)is smaller or equal tonumlines, then it will return the entire thing anyway.- Performance wise it's best to use
print('n'.join(list))rather thanfor item in list: print(item). - I would prefer to be able to change
chunk_sizeinsave_webpagecontentand so you can make it a default argument. - IIRC multi-line docstrings shouldn't start on the same line as the
""", nor should they end on the same line either.
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str,
chunk_size: int=8388608) -> None:
"""
This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem.
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def read_wordlist(filename: str) -> List[str]:
with open(filename) as fd:
return fd.readlines()
def print_mylist(word_list: List[str]) -> None:
print('n'.join(word.strip() for word in word_list))
"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])
$endgroup$
1
$begingroup$
requests.get()returns aResponseobject, not aRequest.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening"""or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
$begingroup$
@DuaneWhitty No problem to usethis textyou need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38
|
show 4 more comments
$begingroup$
Your code is nice and concise however there are some changes you can make:
- You can just return
f.readlines()inmake_wordlist. If you've done this to show that the result is a list then it'd be better to use the
typingmodule.from typing import List
def make_wordlist(filename: str) -> List[str]:
...get_mylistcan be replaced withwordlist[:numlines]. This is because iflen(wordlist)is smaller or equal tonumlines, then it will return the entire thing anyway.- Performance wise it's best to use
print('n'.join(list))rather thanfor item in list: print(item). - I would prefer to be able to change
chunk_sizeinsave_webpagecontentand so you can make it a default argument. - IIRC multi-line docstrings shouldn't start on the same line as the
""", nor should they end on the same line either.
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str,
chunk_size: int=8388608) -> None:
"""
This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem.
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def read_wordlist(filename: str) -> List[str]:
with open(filename) as fd:
return fd.readlines()
def print_mylist(word_list: List[str]) -> None:
print('n'.join(word.strip() for word in word_list))
"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])
$endgroup$
Your code is nice and concise however there are some changes you can make:
- You can just return
f.readlines()inmake_wordlist. If you've done this to show that the result is a list then it'd be better to use the
typingmodule.from typing import List
def make_wordlist(filename: str) -> List[str]:
...get_mylistcan be replaced withwordlist[:numlines]. This is because iflen(wordlist)is smaller or equal tonumlines, then it will return the entire thing anyway.- Performance wise it's best to use
print('n'.join(list))rather thanfor item in list: print(item). - I would prefer to be able to change
chunk_sizeinsave_webpagecontentand so you can make it a default argument. - IIRC multi-line docstrings shouldn't start on the same line as the
""", nor should they end on the same line either.
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str,
chunk_size: int=8388608) -> None:
"""
This function saves the page retrieved by get_webpage. r is the
response from the call to requests.get and
filename is where we want to save the file to in the filesystem.
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
def read_wordlist(filename: str) -> List[str]:
with open(filename) as fd:
return fd.readlines()
def print_mylist(word_list: List[str]) -> None:
print('n'.join(word.strip() for word in word_list))
"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])
edited Jun 7 at 13:09
yuri
4,1072 gold badges12 silver badges37 bronze badges
4,1072 gold badges12 silver badges37 bronze badges
answered Jun 7 at 9:43
PeilonrayzPeilonrayz
29.4k4 gold badges45 silver badges119 bronze badges
29.4k4 gold badges45 silver badges119 bronze badges
1
$begingroup$
requests.get()returns aResponseobject, not aRequest.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening"""or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
$begingroup$
@DuaneWhitty No problem to usethis textyou need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38
|
show 4 more comments
1
$begingroup$
requests.get()returns aResponseobject, not aRequest.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening"""or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
$begingroup$
@DuaneWhitty No problem to usethis textyou need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38
1
1
$begingroup$
requests.get() returns a Response object, not a Request.$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
$begingroup$
requests.get() returns a Response object, not a Request.$endgroup$
– Lukasz Salitra
Jun 7 at 9:51
1
1
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening
""" or on the line below.$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening
""" or on the line below.$endgroup$
– Mathias Ettinger
Jun 7 at 13:35
1
1
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27
1
1
$begingroup$
@DuaneWhitty No problem to use
this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.$endgroup$
– Peilonrayz
Jun 7 at 22:38
$begingroup$
@DuaneWhitty No problem to use
this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.$endgroup$
– Peilonrayz
Jun 7 at 22:38
|
show 4 more comments
$begingroup$
Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
$endgroup$
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
add a comment |
$begingroup$
Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
$endgroup$
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
add a comment |
$begingroup$
Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
$endgroup$
Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.
answered Jun 7 at 8:24
Hans-Martin MosnerHans-Martin Mosner
1512 bronze badges
1512 bronze badges
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
add a comment |
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
3
3
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11
add a comment |
$begingroup$
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0
In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):
download.py for handling downloading the text file from the website and saving it as a file to local storage;
config.py for specifying the URI of the website and the filename for local storage;
moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.
The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.
Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz
download.py
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
"""
This function saves the page retrieved by get_webpage.
r is the response from the call to requests.get.
filename is where we want to save the file to in the filesystem.
chunk_size is the number of bytes to write to disk in each chunk
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
config.py
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.
moby.py
import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List
filename = cfg.filename
uri = cfg.uri
@contextmanager
def opened_w_error(filename, mode="r"):
try:
f = open(filename, mode)
except OSError as err:
yield None, err
else:
try:
yield f, None
finally:
f.close()
def read_wordlist(filename: str) -> List[str]:
with opened_w_error(filename, 'r') as (fd, err):
if type(err) == FileNotFoundError:
df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
if err:
print("OSError:", err)
else:
return fd.readlines()
else:
return fd.readlines()
def print_mylist(wordlist: List[str]) -> None:
print('n'.join(word.strip() for word in wordlist))
print_mylist(read_wordlist(filename)[:50])
Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
$endgroup$
add a comment |
$begingroup$
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0
In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):
download.py for handling downloading the text file from the website and saving it as a file to local storage;
config.py for specifying the URI of the website and the filename for local storage;
moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.
The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.
Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz
download.py
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
"""
This function saves the page retrieved by get_webpage.
r is the response from the call to requests.get.
filename is where we want to save the file to in the filesystem.
chunk_size is the number of bytes to write to disk in each chunk
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
config.py
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.
moby.py
import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List
filename = cfg.filename
uri = cfg.uri
@contextmanager
def opened_w_error(filename, mode="r"):
try:
f = open(filename, mode)
except OSError as err:
yield None, err
else:
try:
yield f, None
finally:
f.close()
def read_wordlist(filename: str) -> List[str]:
with opened_w_error(filename, 'r') as (fd, err):
if type(err) == FileNotFoundError:
df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
if err:
print("OSError:", err)
else:
return fd.readlines()
else:
return fd.readlines()
def print_mylist(wordlist: List[str]) -> None:
print('n'.join(word.strip() for word in wordlist))
print_mylist(read_wordlist(filename)[:50])
Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
$endgroup$
add a comment |
$begingroup$
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0
In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):
download.py for handling downloading the text file from the website and saving it as a file to local storage;
config.py for specifying the URI of the website and the filename for local storage;
moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.
The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.
Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz
download.py
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
"""
This function saves the page retrieved by get_webpage.
r is the response from the call to requests.get.
filename is where we want to save the file to in the filesystem.
chunk_size is the number of bytes to write to disk in each chunk
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
config.py
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.
moby.py
import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List
filename = cfg.filename
uri = cfg.uri
@contextmanager
def opened_w_error(filename, mode="r"):
try:
f = open(filename, mode)
except OSError as err:
yield None, err
else:
try:
yield f, None
finally:
f.close()
def read_wordlist(filename: str) -> List[str]:
with opened_w_error(filename, 'r') as (fd, err):
if type(err) == FileNotFoundError:
df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
if err:
print("OSError:", err)
else:
return fd.readlines()
else:
return fd.readlines()
def print_mylist(wordlist: List[str]) -> None:
print('n'.join(word.strip() for word in wordlist))
print_mylist(read_wordlist(filename)[:50])
Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
$endgroup$
Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0
In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):
download.py for handling downloading the text file from the website and saving it as a file to local storage;
config.py for specifying the URI of the website and the filename for local storage;
moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.
The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.
Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz
download.py
import requests
from typing import List
Response = requests.Response
def get_webpage(uri) -> Response:
return requests.get(uri)
def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
"""
This function saves the page retrieved by get_webpage.
r is the response from the call to requests.get.
filename is where we want to save the file to in the filesystem.
chunk_size is the number of bytes to write to disk in each chunk
"""
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
config.py
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'
I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.
moby.py
import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List
filename = cfg.filename
uri = cfg.uri
@contextmanager
def opened_w_error(filename, mode="r"):
try:
f = open(filename, mode)
except OSError as err:
yield None, err
else:
try:
yield f, None
finally:
f.close()
def read_wordlist(filename: str) -> List[str]:
with opened_w_error(filename, 'r') as (fd, err):
if type(err) == FileNotFoundError:
df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
if err:
print("OSError:", err)
else:
return fd.readlines()
else:
return fd.readlines()
def print_mylist(wordlist: List[str]) -> None:
print('n'.join(word.strip() for word in wordlist))
print_mylist(read_wordlist(filename)[:50])
Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!
edited Jun 8 at 4:54
answered Jun 8 at 4:40
Duane WhittyDuane Whitty
687 bronze badges
687 bronze badges
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f221830%2fcode-downloads-a-text-file-from-a-website-saves-it-to-local-disk-and-then-load%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the
chunk_sizeline. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.$endgroup$
– Roland Illig
Jun 7 at 6:12
1
$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21
3
$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29
$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49