Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processingSplitting a CSV file with headersMaking a list from user inputSplitting plain text dictionary data to multiple files, round 2.txt word-counterReusable, generic save methodFile copy in Python for slow networksFile copy in Python for slow networks (version 2)Reading a DHT11 to a fileMad Libs excerciseMinimising latency on a pyautogui responce when a given pixel is seen

Did Karl Marx ever use any example that involved cotton and dollars to illustrate the way capital and surplus value were generated?

Are all instances of trolls turning to stone ultimately references back to Tolkien?

How to get cool night-vision without lame drawbacks?

Was there ever a name for the weapons of the Others?

Change CPU MHz from Registry

Would a two-seat light aircaft with a landing speed of 20 knots and a top speed of 180 knots be technically possible?

Has there been any indication at all that further negotiation between the UK and EU is possible?

First-year PhD giving a talk among well-established researchers in the field

Distance Matrix (plugin) - QGIS

Does Marvel have an equivalent of the Green Lantern?

How precise do models need to be for 3d printing?

Can White Castle?

Inverse-quotes-quine

Why do textbooks often include the solutions to odd or even numbered problems but not both?

Hot coffee brewing solutions for deep woods camping

Can the negators "jamais, rien, personne, plus, ni, aucun" be used in a single sentence?

Why is there no havdallah when going from Yom Tov into Shabbat?

STM Microcontroller burns every time

Should I hide continue button until tasks are completed?

Why do some games show lights shine through walls?

Intuitively, why does putting capacitors in series decrease the equivalent capacitance?

Is there a maximum distance from a planet that a moon can orbit?

Change the boot order with no option in UEFI settings

How to add multiple ip address in destination ip in acl rule

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing

Splitting a CSV file with headersMaking a list from user inputSplitting plain text dictionary data to multiple files, round 2.txt word-counterReusable, generic save methodFile copy in Python for slow networksFile copy in Python for slow networks (version 2)Reading a DHT11 to a fileMad Libs excerciseMinimising latency on a pyautogui responce when a given pixel is seen

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

My program opens a website and downloads a text file. The text file is a simple file with one word per line. I save the file to local disk and then create a list to hold each line of the text file for later processing. I would like to know if I am doing these first steps in a way that would be considered idiomatic Python and have I made any big mistakes that will hamper my efforts to expand on it later.

This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.

import requests

def get_webpage(uri):
 return requests.get(uri)


def save_webpagecontent(r, filename):
 """ This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem."""

 chunk_size = 8388608 # number of bytes to write to disk in each chunk
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def make_wordlist(filename):
 wordlist = []
 with open(filename) as fd:
 wordlist = fd.readlines()
 return wordlist


def get_mylist(wordlist, num_lines=10):
 if len(wordlist) <= num_lines:
 return wordlist
 return wordlist[:num_lines]


def print_mylist(mylist):
 for word in mylist:
 print(word.strip())
 return None

"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)

My program works as I expect it to. I have basically found how to do each individual piece by reading this forum and but I would like to know if I'm putting all the pieces together correctly. By correctly I mean something that not only functions as expected but also will be easy to build larger programs and modules from.

I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

3

$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12

1

$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21

3

$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29

$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49

add a comment |

This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.

import requests

def get_webpage(uri):
 return requests.get(uri)


def save_webpagecontent(r, filename):
 """ This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem."""

 chunk_size = 8388608 # number of bytes to write to disk in each chunk
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def make_wordlist(filename):
 wordlist = []
 with open(filename) as fd:
 wordlist = fd.readlines()
 return wordlist


def get_mylist(wordlist, num_lines=10):
 if len(wordlist) <= num_lines:
 return wordlist
 return wordlist[:num_lines]


def print_mylist(mylist):
 for word in mylist:
 print(word.strip())
 return None

"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)

I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

3

$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12

1

$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21

3

$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29

$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49

add a comment |

This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.

import requests

def get_webpage(uri):
 return requests.get(uri)


def save_webpagecontent(r, filename):
 """ This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem."""

 chunk_size = 8388608 # number of bytes to write to disk in each chunk
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def make_wordlist(filename):
 wordlist = []
 with open(filename) as fd:
 wordlist = fd.readlines()
 return wordlist


def get_mylist(wordlist, num_lines=10):
 if len(wordlist) <= num_lines:
 return wordlist
 return wordlist[:num_lines]


def print_mylist(mylist):
 for word in mylist:
 print(word.strip())
 return None

"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)

I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

This is similar to an exercise in Think Python by Allen Downey. He suggests using a browser to download the text file but I wanted to do it with Python.

import requests

def get_webpage(uri):
 return requests.get(uri)


def save_webpagecontent(r, filename):
 """ This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem."""

 chunk_size = 8388608 # number of bytes to write to disk in each chunk
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def make_wordlist(filename):
 wordlist = []
 with open(filename) as fd:
 wordlist = fd.readlines()
 return wordlist


def get_mylist(wordlist, num_lines=10):
 if len(wordlist) <= num_lines:
 return wordlist
 return wordlist[:num_lines]


def print_mylist(mylist):
 for word in mylist:
 print(word.strip())
 return None

"""List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
wordlist = make_wordlist(filename)
mylist = get_mylist(wordlist)
print_mylist(mylist)

I hope it isn't wrong of me to post this much code. I wasn't sure how I could trim it down and still show what I am doing. Please let me know if I need to change the format of my question.

python array file

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

edited Jun 7 at 7:27

AlexV

2,7359 silver badges31 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

asked Jun 7 at 5:56

Duane Whitty

687 bronze badges

3

$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12

1

$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21

3

$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29

$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49

add a comment |

3

$begingroup$
Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.
$endgroup$
– Roland Illig
Jun 7 at 6:12

1

$begingroup$
Thank you for your remarks. I feel more encouraged about the process now!
$endgroup$
– Duane Whitty
Jun 7 at 6:21

3

$begingroup$
Don't worry, the SE network can be confusing for new users. You're in The Good Place now.
$endgroup$
– Mast
Jun 7 at 6:29

$begingroup$
Might want to add an event in case the request can't be completed and you can't download the code.
$endgroup$
– BruceWayne
Jun 7 at 19:49

Your code looks good, and it is not really long. Here on Code Review it is accepted to sometimes post around 1000 lines, if that's necessary for understanding the code. When pasting your code, you made a small mistake with the indentation around the chunk_size line. After you repaired this, your question is in a perfect format for this site, especially since you explained in detail what code you wrote and why. That's something essential that several other questions are missing.

– Roland Illig
Jun 7 at 6:12

Thank you for your remarks. I feel more encouraged about the process now!

– Duane Whitty
Jun 7 at 6:21

Don't worry, the SE network can be confusing for new users. You're in The Good Place now.

– Mast
Jun 7 at 6:29

Might want to add an event in case the request can't be completed and you can't download the code.

– BruceWayne
Jun 7 at 19:49

add a comment |

3 Answers
3

active

oldest

votes

Your code is nice and concise however there are some changes you can make:

You can just return f.readlines() in make_wordlist.

If you've done this to show that the result is a list then it'd be better to use the typing module.
```
from typing import List


def make_wordlist(filename: str) -> List[str]:
 ...
```

get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

import requests
from typing import List

Response = requests.Response


def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str,
 chunk_size: int=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem.
 """
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def read_wordlist(filename: str) -> List[str]:
 with open(filename) as fd:
 return fd.readlines()


def print_mylist(word_list: List[str]) -> None:
 print('n'.join(word.strip() for word in word_list))


"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

1

$begingroup$
requests.get() returns a Response object, not a Request.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51

1

$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54

$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35

1

$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27

1

$begingroup$
@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38

|
show 4 more comments

Your program is concise and well-readable. One thing that is probably less pythonic is storing the received data in a file. If you have no further use for the file, you could just process the data into a wordlist while receiving it. This saves one intermediate step, and your program would not leave a residual wordlist.txt file around.

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

3

$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11

$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27

$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48

$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11

add a comment |

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz

download.py

import requests
from typing import List

Response = requests.Response

def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage.
 r is the response from the call to requests.get.
 filename is where we want to save the file to in the filesystem.
 chunk_size is the number of bytes to write to disk in each chunk
 """

 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)

config.py

uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

I feel I made the most gains in my Python profiency as a result of implementing the changes suggested by Peilonrayz where I did away with intermediate function calls and variables and by working on the suggestion by BruceWayne to add an event for failing to open the file. The file opening code turned out to be the most challenging. I wasn't able to get `opened_w_error() working exactly based on the example from PEP343. Figuring it out was very rewarding.

moby.py

import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List

filename = cfg.filename
uri = cfg.uri

@contextmanager
def opened_w_error(filename, mode="r"):
 try:
 f = open(filename, mode)
 except OSError as err:
 yield None, err
 else:
 try:
 yield f, None
 finally:
 f.close()


def read_wordlist(filename: str) -> List[str]:
 with opened_w_error(filename, 'r') as (fd, err):
 if type(err) == FileNotFoundError:
 df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
 with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
 if err:
 print("OSError:", err)
 else:
 return fd.readlines()
 else:
 return fd.readlines()


def print_mylist(wordlist: List[str]) -> None:
 print('n'.join(word.strip() for word in wordlist))


print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f221830%2fcode-downloads-a-text-file-from-a-website-saves-it-to-local-disk-and-then-load%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Your code is nice and concise however there are some changes you can make:

You can just return f.readlines() in make_wordlist.

If you've done this to show that the result is a list then it'd be better to use the typing module.
```
from typing import List


def make_wordlist(filename: str) -> List[str]:
 ...
```

get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

import requests
from typing import List

Response = requests.Response


def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str,
 chunk_size: int=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem.
 """
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def read_wordlist(filename: str) -> List[str]:
 with open(filename) as fd:
 return fd.readlines()


def print_mylist(word_list: List[str]) -> None:
 print('n'.join(word.strip() for word in word_list))


"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

1

$begingroup$
requests.get() returns a Response object, not a Request.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51

1

$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54

$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35

1

$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27

1

$begingroup$
@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38

|
show 4 more comments

Your code is nice and concise however there are some changes you can make:

You can just return f.readlines() in make_wordlist.

If you've done this to show that the result is a list then it'd be better to use the typing module.
```
from typing import List


def make_wordlist(filename: str) -> List[str]:
 ...
```

get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

import requests
from typing import List

Response = requests.Response


def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str,
 chunk_size: int=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem.
 """
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def read_wordlist(filename: str) -> List[str]:
 with open(filename) as fd:
 return fd.readlines()


def print_mylist(word_list: List[str]) -> None:
 print('n'.join(word.strip() for word in word_list))


"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

1

$begingroup$
requests.get() returns a Response object, not a Request.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51

1

$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54

$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35

1

$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27

1

$begingroup$
@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38

|
show 4 more comments

Your code is nice and concise however there are some changes you can make:

You can just return f.readlines() in make_wordlist.

If you've done this to show that the result is a list then it'd be better to use the typing module.
```
from typing import List


def make_wordlist(filename: str) -> List[str]:
 ...
```

get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

import requests
from typing import List

Response = requests.Response


def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str,
 chunk_size: int=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem.
 """
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def read_wordlist(filename: str) -> List[str]:
 with open(filename) as fd:
 return fd.readlines()


def print_mylist(word_list: List[str]) -> None:
 print('n'.join(word.strip() for word in word_list))


"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

Your code is nice and concise however there are some changes you can make:

You can just return f.readlines() in make_wordlist.

If you've done this to show that the result is a list then it'd be better to use the typing module.
```
from typing import List


def make_wordlist(filename: str) -> List[str]:
 ...
```

get_mylist can be replaced with wordlist[:numlines]. This is because if len(wordlist) is smaller or equal to numlines, then it will return the entire thing anyway.

Performance wise it's best to use print('n'.join(list)) rather than for item in list: print(item).

I would prefer to be able to change chunk_size in save_webpagecontent and so you can make it a default argument.

IIRC multi-line docstrings shouldn't start on the same line as the """, nor should they end on the same line either.

import requests
from typing import List

Response = requests.Response


def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str,
 chunk_size: int=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage. r is the 
 response from the call to requests.get and
 filename is where we want to save the file to in the filesystem.
 """
 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)


def read_wordlist(filename: str) -> List[str]:
 with open(filename) as fd:
 return fd.readlines()


def print_mylist(word_list: List[str]) -> None:
 print('n'.join(word.strip() for word in word_list))


"""
List of words collected and contributed to the public domain by
Grady Ward as part of the Moby lexicon project. See https://en.wikipedia.org/wiki/Moby_Project
"""
uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

r = get_webpage(uri)
save_webpagecontent(r, filename)
print_mylist(read_wordlist(filename)[:10])

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

edited Jun 7 at 13:09

yuri

4,1072 gold badges12 silver badges37 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

answered Jun 7 at 9:43

Peilonrayz

29.4k4 gold badges45 silver badges119 bronze badges

1

$begingroup$
requests.get() returns a Response object, not a Request.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51

1

$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54

$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35

1

$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27

1

$begingroup$
@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38

|
show 4 more comments

1

$begingroup$
requests.get() returns a Response object, not a Request.
$endgroup$
– Lukasz Salitra
Jun 7 at 9:51

1

$begingroup$
@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/
$endgroup$
– Peilonrayz
Jun 7 at 9:54

$begingroup$
As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.
$endgroup$
– Mathias Ettinger
Jun 7 at 13:35

1

$begingroup$
@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.
$endgroup$
– Peilonrayz
Jun 7 at 18:27

1

$begingroup$
@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.
$endgroup$
– Peilonrayz
Jun 7 at 22:38

requests.get() returns a Response object, not a Request.

– Lukasz Salitra
Jun 7 at 9:51

@LukaszSalitra Thank you, I've updated the code with that. Thought I read the docs correctly :/

– Peilonrayz
Jun 7 at 9:54

As per PEP257 multi-line docstrings can start either right after the opening """ or on the line below.

– Mathias Ettinger
Jun 7 at 13:35

@DuaneWhitty It's 100% optional. They don't make the code faster. If you use one of mypy, pyright, pyre or pytype then they can perform static code analysis which should make your code safer.

– Peilonrayz
Jun 7 at 18:27

@DuaneWhitty No problem to use this text you need to write `this text`. Feel free to try that out here. You also can't put newlines in comments. (They would look like a mess) Yeah looks like you're correct.

– Peilonrayz
Jun 7 at 22:38

|
show 4 more comments

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

3

$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11

$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27

$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48

$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11

add a comment |

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

3

$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11

$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27

$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48

$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11

add a comment |

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

answered Jun 7 at 8:24

Hans-Martin Mosner

1512 bronze badges

3

$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11

$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27

$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48

$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11

add a comment |

3

$begingroup$
Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.
$endgroup$
– Mast
Jun 7 at 10:11

$begingroup$
Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!
$endgroup$
– Duane Whitty
Jun 7 at 17:27

$begingroup$
One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.
$endgroup$
– Hans-Martin Mosner
Jun 7 at 20:48

$begingroup$
Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.
$endgroup$
– Duane Whitty
Jun 8 at 3:11

Besides the gains mentioned, file I/O is expensive CPU-wise. However, it's unclear at the moment whether the further processing of the words ("and then create a list to hold each line of the text file for later processing") is going to be done by the same Python program or a different program altogether. Saving to file as an intermediary step for safety/redundancy/whatever reasons is still a good reason to keep it around.

– Mast
Jun 7 at 10:11

Thank you both. I saved the file for two reasons: 1) I knew that I would want to be accessing the file many times over and over to try different analysis techniques as my knowledge grows. I'll be running word length analysis and letter frequency analysis and then trying out different ways to display the results. I thought it would be more efficient that downloading it many times over and over and also more polite to the site I'm downloading from. 2) I am thinking about how I should write programs in case I have less than ideal network conditions. Thanks again!

– Duane Whitty
Jun 7 at 17:27

One way to download the file only once would be to check for its existence and only download it if the file is not present. That's basically a simple cache without an invalidation policy, and you can manually invalidate the cache by removing the file. Alternatively, you could write separate download and analysis programs, which would help to avoid code duplication with the associated risk of having slightly different versions of the download code in each program.

– Hans-Martin Mosner
Jun 7 at 20:48

Thank you to everyone for the reviews. I have made some big updates to my code using the suggestions I received here. Should I now open another request for a code review or should I answer my question with my updated code? The changes include splitting the file handling into another module and using contextmanager to handle file not found errors.

– Duane Whitty
Jun 8 at 3:11

add a comment |

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz

download.py

import requests
from typing import List

Response = requests.Response

def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage.
 r is the response from the call to requests.get.
 filename is where we want to save the file to in the filesystem.
 chunk_size is the number of bytes to write to disk in each chunk
 """

 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)

config.py

uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

moby.py

import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List

filename = cfg.filename
uri = cfg.uri

@contextmanager
def opened_w_error(filename, mode="r"):
 try:
 f = open(filename, mode)
 except OSError as err:
 yield None, err
 else:
 try:
 yield f, None
 finally:
 f.close()


def read_wordlist(filename: str) -> List[str]:
 with opened_w_error(filename, 'r') as (fd, err):
 if type(err) == FileNotFoundError:
 df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
 with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
 if err:
 print("OSError:", err)
 else:
 return fd.readlines()
 else:
 return fd.readlines()


def print_mylist(wordlist: List[str]) -> None:
 print('n'.join(word.strip() for word in wordlist))


print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

add a comment |

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz

download.py

import requests
from typing import List

Response = requests.Response

def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage.
 r is the response from the call to requests.get.
 filename is where we want to save the file to in the filesystem.
 chunk_size is the number of bytes to write to disk in each chunk
 """

 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)

config.py

uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

moby.py

import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List

filename = cfg.filename
uri = cfg.uri

@contextmanager
def opened_w_error(filename, mode="r"):
 try:
 f = open(filename, mode)
 except OSError as err:
 yield None, err
 else:
 try:
 yield f, None
 finally:
 f.close()


def read_wordlist(filename: str) -> List[str]:
 with opened_w_error(filename, 'r') as (fd, err):
 if type(err) == FileNotFoundError:
 df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
 with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
 if err:
 print("OSError:", err)
 else:
 return fd.readlines()
 else:
 return fd.readlines()


def print_mylist(wordlist: List[str]) -> None:
 print('n'.join(word.strip() for word in wordlist))


print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

add a comment |

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz

download.py

import requests
from typing import List

Response = requests.Response

def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage.
 r is the response from the call to requests.get.
 filename is where we want to save the file to in the filesystem.
 chunk_size is the number of bytes to write to disk in each chunk
 """

 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)

config.py

uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

moby.py

import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List

filename = cfg.filename
uri = cfg.uri

@contextmanager
def opened_w_error(filename, mode="r"):
 try:
 f = open(filename, mode)
 except OSError as err:
 yield None, err
 else:
 try:
 yield f, None
 finally:
 f.close()


def read_wordlist(filename: str) -> List[str]:
 with opened_w_error(filename, 'r') as (fd, err):
 if type(err) == FileNotFoundError:
 df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
 with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
 if err:
 print("OSError:", err)
 else:
 return fd.readlines()
 else:
 return fd.readlines()


def print_mylist(wordlist: List[str]) -> None:
 print('n'.join(word.strip() for word in wordlist))


print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

Code downloads a text file from a website, saves it to local disk, and then loads it into a list for further processing - Version 2.0

In my new version of this code I have separated my code into 3 modules (my start at a 12 factor app):

download.py for handling downloading the text file from the website and saving it as a file to local storage;

config.py for specifying the URI of the website and the filename for local storage;

moby.py is the actual code that reads the words in the text file, 1 per line, into a list. For now all it does is prints out the words from the file, one per line.

The review my code received provided valuable suggestions on how it could be made more Pythonic, more modular, and more efficient.

Motivated by Hans-Martin Mosner to separate the file download code here is that module. Also made the chunk_size a parameter to the save_webpagecontent() function based on as suggested by Peilonrayz

download.py

import requests
from typing import List

Response = requests.Response

def get_webpage(uri) -> Response:
 return requests.get(uri)


def save_webpagecontent(r: Response, filename: str, chunk_size=8388608) -> None:
 """
 This function saves the page retrieved by get_webpage.
 r is the response from the call to requests.get.
 filename is where we want to save the file to in the filesystem.
 chunk_size is the number of bytes to write to disk in each chunk
 """

 with open(filename, 'wb') as fd:
 for chunk in r.iter_content(chunk_size):
 fd.write(chunk)

config.py

uri = 'https://ia802308.us.archive.org/7/items/mobywordlists03201gut/CROSSWD.TXT'
filename = 'wordlist.txt'

moby.py

import download_file as df
import config as cfg
from contextlib import contextmanager
from typing import List

filename = cfg.filename
uri = cfg.uri

@contextmanager
def opened_w_error(filename, mode="r"):
 try:
 f = open(filename, mode)
 except OSError as err:
 yield None, err
 else:
 try:
 yield f, None
 finally:
 f.close()


def read_wordlist(filename: str) -> List[str]:
 with opened_w_error(filename, 'r') as (fd, err):
 if type(err) == FileNotFoundError:
 df.save_webpagecontent(df.get_webpage(uri), filename) #since it failed the first time we need to actually download it
 with opened_w_error(filename, 'r') as (fd, err): # if it fails again abort
 if err:
 print("OSError:", err)
 else:
 return fd.readlines()
 else:
 return fd.readlines()


def print_mylist(wordlist: List[str]) -> None:
 print('n'.join(word.strip() for word in wordlist))


print_mylist(read_wordlist(filename)[:50])

Thank you to everyone, especially Roland Illig, Hans-Martin Mosner, and Mast for all your help and encouragement and a safe place to learn!

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

edited Jun 8 at 4:54

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

answered Jun 8 at 4:40

Duane Whitty

687 bronze badges

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Otdfbt

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

3 Answers
3

3 Answers
3

3 Answers
3